[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-1602: -- Story Points: 1 (was: 0.5)

> Corrupted Avro schema extracted from parquet file
> -
>
> Key: HUDI-1602
> URL: https://issues.apache.org/jira/browse/HUDI-1602
> Project: Apache Hudi
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Alexander Filipchik
> Assignee: Sagar Sumit
> Priority: Blocker
> Labels: core-flow-ds, pull-request-available, sev:critical
> Fix For: 0.11.0
>
> We are running a Hudi DeltaStreamer on a very complex stream. The schema is
> deeply nested, with several levels of hierarchy (the Avro schema is around 6600
> LOC).
>
> The version of Hudi that writes the dataset is 0.5-SNAPSHOT, and we recently
> started attempting to upgrade to the latest. However, the latest Hudi can't read
> the dataset. The exception I get:
>
> {code:java}
> Got exception while parsing the arguments:
> Found recursive reference in Avro schema, which can not be processed by Spark:
> {
>   "type" : "record",
>   "name" : "array",
>   "fields" : [ {
>     "name" : "id",
>     "type" : [ "null", "string" ],
>     "default" : null
>   }, {
>     "name" : "type",
>     "type" : [ "null", "string" ],
>     "default" : null
>   }, {
>     "name" : "exist",
>     "type" : [ "null", "boolean" ],
>     "default" : null
>   } ]
> }
> Stack trace:
> org.apache.spark.sql.avro.IncompatibleSchemaException: Found recursive reference in Avro schema, which can not be processed by Spark: {"type":"record","name":"array","fields":[{"name":"id","type":["null","string"],"default":null},{"name":"type","type":["null","string"],"default":null},{"name":"exist","type":["null","boolean"],"default":null}]}
>   at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:75)
>   at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:89)
>   at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
>   at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
>   at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81)
>   at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
>   at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
>   at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81)
>   at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
>   at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
>   at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at scala.collection
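The exception above is raised because Spark's Avro-to-SQL conversion cannot translate a schema in which a named record type refers back to itself (directly or through nesting): such a schema has no finite struct representation, so the converter recurses until it detects the cycle and aborts. As a minimal, illustrative sketch (not Hudi or Spark code — the helper name and traversal are my own, operating on a schema parsed as plain JSON), the recursive reference can be found by tracking the chain of record names currently being expanded:

```python
import json

def find_recursive_refs(schema, seen=None):
    """Walk an Avro schema (parsed JSON) and return the names of record
    types that are referenced again while still being expanded — the kind
    of reference Spark's converter rejects."""
    seen = seen or frozenset()
    recursive = set()
    if isinstance(schema, str):
        # A bare string is a primitive type or a named-type reference;
        # a reference to a record we are still inside is recursive.
        if schema in seen:
            recursive.add(schema)
    elif isinstance(schema, list):  # union: check every branch
        for branch in schema:
            recursive |= find_recursive_refs(branch, seen)
    elif isinstance(schema, dict):
        t = schema.get("type")
        if t == "record":
            inner = seen | {schema["name"]}
            for field in schema.get("fields", []):
                recursive |= find_recursive_refs(field["type"], inner)
        elif t == "array":
            recursive |= find_recursive_refs(schema["items"], seen)
        elif t == "map":
            recursive |= find_recursive_refs(schema["values"], seen)
        else:
            recursive |= find_recursive_refs(t, seen)
    return recursive

# A record "node" whose array field contains the record type itself:
recursive_schema = json.loads("""
{"type": "record", "name": "node", "fields": [
  {"name": "id", "type": ["null", "string"], "default": null},
  {"name": "children", "type": {"type": "array", "items": "node"}}
]}
""")
print(find_recursive_refs(recursive_schema))  # → {'node'}
```

A schema with no such back-reference yields an empty set, which is why only some deeply nested schemas trigger the failure: depth alone is fine, cycles are not.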
[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-1602: -- Status: In Progress (was: Open)
[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1602: - Sprint: Hudi-Sprint-Apr-12, Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25, Hudi-Sprint-May-02 (was: Hudi-Sprint-Apr-12, Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25)
[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1602: - Sprint: Hudi-Sprint-Apr-12, Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25 (was: Hudi-Sprint-Apr-12, Hudi-Sprint-Apr-19)
[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1602: - Sprint: Hudi-Sprint-Apr-12, Hudi-Sprint-Apr-19 (was: Hudi-Sprint-Apr-12)
[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1602: - Story Points: 0.5
[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-1602:
-----------------------------
    Sprint: Hudi-Sprint-Apr-12

> Corrupted Avro schema extracted from parquet file
> -------------------------------------------------
>
>                 Key: HUDI-1602
>                 URL: https://issues.apache.org/jira/browse/HUDI-1602
>             Project: Apache Hudi
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>            Reporter: Alexander Filipchik
>            Priority: Blocker
>              Labels: core-flow-ds, pull-request-available, sev:critical
>             Fix For: 0.11.0
>
> We are running a Hudi DeltaStreamer on a very complex stream. The schema is
> deeply nested, with several levels of hierarchy (the Avro schema is around 6600
> LOC).
>
> The version of Hudi that wrote the dataset is 0.5-SNAPSHOT, and we recently
> started attempting to upgrade to the latest. However, the latest Hudi can't read
> the dataset. The exception I get:
>
> {code:java}
> Got exception while parsing the arguments:
> Found recursive reference in Avro schema, which can not be processed by Spark:
> {
>   "type" : "record",
>   "name" : "array",
>   "fields" : [ {
>     "name" : "id",
>     "type" : [ "null", "string" ],
>     "default" : null
>   }, {
>     "name" : "type",
>     "type" : [ "null", "string" ],
>     "default" : null
>   }, {
>     "name" : "exist",
>     "type" : [ "null", "boolean" ],
>     "default" : null
>   } ]
> }
> Stack trace:
> org.apache.spark.sql.avro.IncompatibleSchemaException: Found recursive reference in Avro schema, which can not be processed by Spark: (same record as above)
> 	at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:75)
> 	at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:89)
> 	at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
> 	at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
> 	at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
> 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> 	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
> 	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
> 	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> 	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> 	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> 	at scala.collection.AbstractTraversable.map(Traversable.scala:104)
> 	at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81)
> 	(the toSqlTypeHelper/map frames above repeat as the converter descends each level of the nested schema; trace truncated)
> {code}
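The `IncompatibleSchemaException` above is raised by Spark's Avro-to-SQL schema conversion, which walks the record tree and aborts when a named record type is referenced again inside its own definition. The sketch below is a hypothetical, simplified Python model of that detection logic (it is not the Spark or Hudi implementation, and it handles only records, unions, and named references, modeled as plain dicts/lists/strings):

```python
def find_recursion(schema, seen=()):
    """Return the name of the first recursively referenced record, or None.

    `schema` follows Avro JSON conventions: a dict for a record,
    a list for a union, a bare string for a primitive or a named reference.
    `seen` is the tuple of record names enclosing the current position.
    """
    if isinstance(schema, str):
        # A bare string may be a named reference back to an enclosing
        # record -- this is the recursive case the converter rejects.
        return schema if schema in seen else None
    if isinstance(schema, list):  # union, e.g. ["null", "string"]
        for branch in schema:
            hit = find_recursion(branch, seen)
            if hit is not None:
                return hit
        return None
    if isinstance(schema, dict) and schema.get("type") == "record":
        inner = seen + (schema["name"],)
        for field in schema["fields"]:
            hit = find_recursion(field["type"], inner)
            if hit is not None:
                return hit
        return None
    return None


# A record whose field type refers back to the record itself:
recursive = {
    "type": "record",
    "name": "node",
    "fields": [
        {"name": "value", "type": ["null", "string"]},
        {"name": "next", "type": ["null", "node"]},  # self-reference
    ],
}

# A flat record like the one in the error message, with no back-reference:
flat = {
    "type": "record",
    "name": "array",
    "fields": [
        {"name": "id", "type": ["null", "string"]},
        {"name": "exist", "type": ["null", "boolean"]},
    ],
}
```

Under this model, `find_recursion(recursive)` reports `"node"` while `find_recursion(flat)` reports nothing, which mirrors why the (non-recursive-looking) record in the error is suspicious: the reported schema appears to have been corrupted during extraction from the parquet file so that distinct nested records share the same name, making the tree look recursive to the converter.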
[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-1602:
-----------------------------
    Priority: Blocker  (was: Major)
[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-1602:
-----------------------------
    Fix Version/s: 0.11.0  (was: 0.12.0)
[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-1602:
-----------------------------
    Status: Open  (was: Patch Available)
[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-1602:
-----------------------------
    Fix Version/s: 0.12.0  (was: 0.11.0)
[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-1602:
--------------------------------------
    Labels: core-flow-ds pull-request-available sev:critical  (was: pull-request-available sev:critical)
[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-1602:
--------------------------------------
    Fix Version/s: 0.11.0  (was: 0.10.0)
[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Udit Mehrotra updated HUDI-1602:
--------------------------------
Fix Version/s: (was: 0.9.0) 0.10.0

> Corrupted Avro schema extracted from parquet file
> -------------------------------------------------
>
> Key: HUDI-1602
> URL: https://issues.apache.org/jira/browse/HUDI-1602
> Project: Apache Hudi
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Alexander Filipchik
> Assignee: Nishith Agarwal
> Priority: Major
> Labels: pull-request-available, sev:critical
> Fix For: 0.10.0
>
> We are running a Hudi DeltaStreamer on a very complex stream. The schema is deeply nested, with several levels of hierarchy (the Avro schema is around 6600 LOC).
>
> The version of Hudi that wrote the dataset is 0.5-SNAPSHOT, and we recently started attempting an upgrade to the latest version. However, the latest Hudi can't read the existing dataset. The exception I get:
>
> {code:java}
> Got exception while parsing the arguments:
> Found recursive reference in Avro schema, which can not be processed by Spark:
> {
>   "type" : "record",
>   "name" : "array",
>   "fields" : [
>     { "name" : "id",    "type" : [ "null", "string" ],  "default" : null },
>     { "name" : "type",  "type" : [ "null", "string" ],  "default" : null },
>     { "name" : "exist", "type" : [ "null", "boolean" ], "default" : null }
>   ]
> }
> Stack trace:
> org.apache.spark.sql.avro.IncompatibleSchemaException: Found recursive reference in Avro schema, which can not be processed by Spark: (schema as above)
>   at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:75)
>   at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:89)
>   at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
>   at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
>   at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81)
>   [the toSqlTypeHelper/map/foreach frames repeat as the converter recurses into the nested schema; trace truncated]
> {code}
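For context on the failure above: Spark's Avro-to-SQL conversion rejects a schema in which a record type encloses another record with the same fully qualified name, because such a schema cannot be mapped to a finite StructType. A minimal, illustrative sketch of that kind of check follows (this is not Spark's actual code; the class, method, and field names here are invented for illustration):

```java
import java.util.*;

// Toy model of recursive-reference detection while walking a nested schema,
// similar in spirit to the check in Spark's SchemaConverters.toSqlTypeHelper.
public class RecursionCheck {

    // A schema node: records carry a name, primitives have recordName == null.
    static final class Node {
        final String recordName;
        final List<Node> children;
        Node(String recordName, List<Node> children) {
            this.recordName = recordName;
            this.children = children;
        }
    }

    // Returns true if any record name repeats along a root-to-leaf path,
    // i.e. a record (transitively) contains a record of the same name.
    static boolean hasRecursiveReference(Node node, Set<String> seenOnPath) {
        if (node.recordName != null && !seenOnPath.add(node.recordName)) {
            return true; // the record encloses itself: recursion
        }
        for (Node child : node.children) {
            // Copy the path set so sibling subtrees don't see each other's names.
            if (hasRecursiveReference(child, new HashSet<>(seenOnPath))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Node leaf = new Node(null, List.of());
        // An "array" record nested inside another "array" record, mirroring
        // the duplicated record name reported in this issue.
        Node recursive = new Node("array", List.of(new Node("array", List.of(leaf))));
        // Distinct names along the path: fine.
        Node ok = new Node("top", List.of(new Node("array", List.of(leaf))));

        System.out.println(hasRecursiveReference(recursive, new HashSet<>())); // true
        System.out.println(hasRecursiveReference(ok, new HashSet<>()));        // false
    }
}
```

In this issue the schema is not genuinely recursive; the writer path appears to emit sibling nested records that all reuse the name "array", which this kind of name-tracking check then flags as recursion.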
[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-1602:
--------------------------------------
Status: In Progress (was: Open)

> Corrupted Avro schema extracted from parquet file
> -------------------------------------------------
>
> Key: HUDI-1602
> URL: https://issues.apache.org/jira/browse/HUDI-1602
> Project: Apache Hudi
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Alexander Filipchik
> Assignee: Vinoth Chandar
> Priority: Major
> Labels: pull-request-available, sev:critical
> Fix For: 0.9.0
[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-1602:
--------------------------------------
Status: Patch Available (was: In Progress)

> Corrupted Avro schema extracted from parquet file
> -------------------------------------------------
>
> Key: HUDI-1602
> URL: https://issues.apache.org/jira/browse/HUDI-1602
> Project: Apache Hudi
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Alexander Filipchik
> Assignee: Vinoth Chandar
> Priority: Major
> Labels: pull-request-available, sev:critical
> Fix For: 0.9.0
[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Li updated HUDI-1602:
--------------------------
Fix Version/s: (was: 0.8.0) 0.9.0

> Corrupted Avro schema extracted from parquet file
> -------------------------------------------------
>
> Key: HUDI-1602
> URL: https://issues.apache.org/jira/browse/HUDI-1602
> Project: Apache Hudi
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Alexander Filipchik
> Assignee: Vinoth Chandar
> Priority: Major
> Labels: pull-request-available, sev:critical
> Fix For: 0.9.0
[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Li updated HUDI-1602:
--------------------------
Affects Version/s: 0.9.0

> Corrupted Avro schema extracted from parquet file
> -------------------------------------------------
>
> Key: HUDI-1602
> URL: https://issues.apache.org/jira/browse/HUDI-1602
> Project: Apache Hudi
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Alexander Filipchik
> Assignee: Vinoth Chandar
> Priority: Major
> Labels: pull-request-available, sev:critical
> Fix For: 0.8.0
[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-1602:
---------------------------------
Labels: pull-request-available sev:critical (was: sev:critical)

> Corrupted Avro schema extracted from parquet file
> -------------------------------------------------
>
> Key: HUDI-1602
> URL: https://issues.apache.org/jira/browse/HUDI-1602
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Alexander Filipchik
> Assignee: Vinoth Chandar
> Priority: Major
> Labels: pull-request-available, sev:critical
> Fix For: 0.8.0
[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-1602:
---------------------------------
Fix Version/s: 0.8.0

> Corrupted Avro schema extracted from parquet file
> -------------------------------------------------
>
> Key: HUDI-1602
> URL: https://issues.apache.org/jira/browse/HUDI-1602
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Alexander Filipchik
> Priority: Major
> Fix For: 0.8.0
[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1602: - Labels: sev:critical (was: )

> Corrupted Avro schema extracted from parquet file
> -
>
> Key: HUDI-1602
> URL: https://issues.apache.org/jira/browse/HUDI-1602
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Alexander Filipchik
> Priority: Major
> Labels: sev:critical
> Fix For: 0.8.0
>
>
> We are running a Hudi DeltaStreamer on a very complex stream. The schema is
> deeply nested, with several levels of hierarchy (the Avro schema is around
> 6600 LOC).
>
> The version of Hudi that writes the dataset is 0.5-SNAPSHOT, and we recently
> started attempting to upgrade to the latest. However, the latest Hudi can't
> read the existing dataset. The exception I get:
>
> {code:java}
> Got exception while parsing the arguments:
> Found recursive reference in Avro schema, which can not be processed by Spark:
> {
>   "type" : "record",
>   "name" : "array",
>   "fields" : [ {
>     "name" : "id",
>     "type" : [ "null", "string" ],
>     "default" : null
>   }, {
>     "name" : "type",
>     "type" : [ "null", "string" ],
>     "default" : null
>   }, {
>     "name" : "exist",
>     "type" : [ "null", "boolean" ],
>     "default" : null
>   } ]
> }
> Stack trace:
> org.apache.spark.sql.avro.IncompatibleSchemaException: Found recursive reference in Avro schema, which can not be processed by Spark: { "type" : "record", "name" : "array", "fields" : [ { "name" : "id", "type" : [ "null", "string" ], "default" : null }, { "name" : "type", "type" : [ "null", "string" ], "default" : null }, { "name" : "exist", "type" : [ "null", "boolean" ], "default" : null } ] }
> at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:75)
> at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:89)
> at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
> at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
> at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.Iterator$class.foreach(Iterator.scala:891)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.AbstractTraversable.map(Traversable.scala:104)
> at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81)
> at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
> at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
> at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.Iterator$class.foreach(Iterator.scala:891)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.AbstractTraversable.map(Traversable.scala:104)
> at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81)
> at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
> at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
> at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.Iterator$class.foreach(Iterator.scala:891)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.AbstractTraversable.
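The check that fires here walks the Avro schema and rejects any record whose name is already on the current traversal path. A minimal Python sketch of that detection logic (an illustrative reimplementation, not Spark's actual `SchemaConverters.toSqlTypeHelper` code) shows why a parquet-extracted schema that reuses the record name "array" at different nesting levels trips the check even when the schema is not truly recursive:

```python
# Illustrative sketch of the recursion check in Spark's
# SchemaConverters.toSqlTypeHelper -- not Spark's actual code.
def to_sql_type(schema, path=frozenset()):
    if isinstance(schema, str):            # primitive, e.g. "string"
        return schema
    if isinstance(schema, list):           # union, e.g. ["null", "string"]
        non_null = [s for s in schema if s != "null"]
        return to_sql_type(non_null[0], path)
    if schema["type"] == "record":
        # A record name already seen on this path is treated as recursion.
        if schema["name"] in path:
            raise ValueError(
                "Found recursive reference in Avro schema, "
                "which can not be processed by Spark: " + schema["name"])
        inner = path | {schema["name"]}
        return {f["name"]: to_sql_type(f["type"], inner)
                for f in schema["fields"]}
    if schema["type"] == "array":
        return [to_sql_type(schema["items"], path)]
    return schema["type"]

# Not truly recursive, but both records carry the name "array" -- the shape
# a parquet-derived Avro schema can take for nested list elements.
nested = {
    "type": "record", "name": "array",
    "fields": [{"name": "children", "type": {
        "type": "array", "items": {
            "type": "record", "name": "array",
            "fields": [{"name": "id", "type": ["null", "string"]}]}}}]}

try:
    to_sql_type(nested)
    tripped = False
except ValueError:
    tripped = True   # the converter rejects the schema as "recursive"
```

Renaming the inner record (e.g. giving each nesting level a distinct record name when the schema is extracted) makes the same structure convert cleanly, which is consistent with the error pointing at name reuse rather than genuine recursion.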
[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Filipchik updated HUDI-1602: -- Description: