[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2022-05-08 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-1602:
--
Story Points: 1  (was: 0.5)

> Corrupted Avro schema extracted from parquet file
> -
>
> Key: HUDI-1602
> URL: https://issues.apache.org/jira/browse/HUDI-1602
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Alexander Filipchik
>Assignee: Sagar Sumit
>Priority: Blocker
>  Labels: core-flow-ds, pull-request-available, sev:critical
> Fix For: 0.11.0
>
>
> We are running a Hudi DeltaStreamer on a very complex stream. The schema is 
> deeply nested, with several levels of hierarchy (the Avro schema is around 
> 6600 LOC).
>  
> The version of Hudi that writes the dataset is 0.5-SNAPSHOT, and we recently 
> started attempting to upgrade to the latest. However, the latest Hudi cannot 
> read the provided dataset. The exception I get: 
>  
>  
> {code:java}
> Got exception while parsing the arguments:
> Found recursive reference in Avro schema, which can not be processed by Spark:
> {
>   "type" : "record",
>   "name" : "array",
>   "fields" : [
>     { "name" : "id",    "type" : [ "null", "string" ],  "default" : null },
>     { "name" : "type",  "type" : [ "null", "string" ],  "default" : null },
>     { "name" : "exist", "type" : [ "null", "boolean" ], "default" : null }
>   ]
> }
>
> Stack trace:
> org.apache.spark.sql.avro.IncompatibleSchemaException: Found recursive
> reference in Avro schema, which can not be processed by Spark:
> {
>   "type" : "record",
>   "name" : "array",
>   "fields" : [
>     { "name" : "id",    "type" : [ "null", "string" ],  "default" : null },
>     { "name" : "type",  "type" : [ "null", "string" ],  "default" : null },
>     { "name" : "exist", "type" : [ "null", "boolean" ], "default" : null }
>   ]
> }
>  at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:75)
>  at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:89)
>  at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
>  at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
>  at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
>  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>  at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>  at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>  at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>  at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81)
>  at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
>  at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
>  at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
>  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>  at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>  at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>  at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>  at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81)
>  at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
>  at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
>  at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
>  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>  at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>  at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>  at scala.collection
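
The exception above is thrown because the Avro-to-Catalyst converter tracks record names already seen on the current traversal path and refuses any schema in which a record name recurs inside its own definition. As an illustration only (this is not Hudi or Spark code; `find_recursive_refs` is a hypothetical helper), a minimal stdlib-Python sketch of that kind of check over a parsed Avro schema:

```python
import json

def find_recursive_refs(schema, seen=None):
    """Walk an Avro schema (parsed JSON) and return the names of records
    that are referenced again inside their own definition -- the condition
    that makes Spark's converter reject the schema."""
    seen = seen or set()
    recursive = set()
    if isinstance(schema, dict):
        t = schema.get("type")
        if t == "record":
            name = schema["name"]
            # Descend into each field with this record's name on the path.
            for field in schema.get("fields", []):
                recursive |= find_recursive_refs(field["type"], seen | {name})
        elif t == "array":
            recursive |= find_recursive_refs(schema.get("items"), seen)
        elif t == "map":
            recursive |= find_recursive_refs(schema.get("values"), seen)
    elif isinstance(schema, list):  # union: check every branch
        for branch in schema:
            recursive |= find_recursive_refs(branch, seen)
    elif isinstance(schema, str) and schema in seen:
        # A bare name referring back to an enclosing record: recursion.
        recursive.add(schema)
    return recursive

# A schema where record "node" refers to itself through a union field:
recursive_schema = json.loads("""
{
  "type": "record", "name": "node",
  "fields": [
    {"name": "id", "type": ["null", "string"], "default": null},
    {"name": "child", "type": ["null", "node"], "default": null}
  ]
}
""")
print(find_recursive_refs(recursive_schema))  # {'node'}
```

Running a check like this on the extracted schema before handing it to Spark would pinpoint which nested record introduces the cycle, instead of failing deep inside `SchemaConverters`.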

[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2022-05-08 Thread Sagar Sumit (Jira)



Sagar Sumit updated HUDI-1602:
--
Status: In Progress  (was: Open)


[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2022-05-03 Thread Raymond Xu (Jira)



Raymond Xu updated HUDI-1602:
-
Sprint: Hudi-Sprint-Apr-12, Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25, 
Hudi-Sprint-May-02  (was: Hudi-Sprint-Apr-12, Hudi-Sprint-Apr-19, 
Hudi-Sprint-Apr-25)


[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2022-04-26 Thread Raymond Xu (Jira)



Raymond Xu updated HUDI-1602:
-
Sprint: Hudi-Sprint-Apr-12, Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25  (was: 
Hudi-Sprint-Apr-12, Hudi-Sprint-Apr-19)


[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2022-04-19 Thread Raymond Xu (Jira)



Raymond Xu updated HUDI-1602:
-
Sprint: Hudi-Sprint-Apr-12, Hudi-Sprint-Apr-19  (was: Hudi-Sprint-Apr-12)


[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2022-04-11 Thread Raymond Xu (Jira)



Raymond Xu updated HUDI-1602:
-
Story Points: 0.5


[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2022-04-11 Thread Raymond Xu (Jira)



Raymond Xu updated HUDI-1602:
-
Sprint: Hudi-Sprint-Apr-12

> Corrupted Avro schema extracted from parquet file
> -
>
> Key: HUDI-1602
> URL: https://issues.apache.org/jira/browse/HUDI-1602
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Alexander Filipchik
>Priority: Blocker
>  Labels: core-flow-ds, pull-request-available, sev:critical
> Fix For: 0.11.0
>
>
> we are running a HUDI deltastreamer on a very complex stream. Schema is 
> deeply nested, with several levels of hierarchy (avro schema is around 6600 
> LOC).
>  
> The version of HUDI that writes the dataset if 0.5-SNAPTHOT and we recently 
> started attempts to upgrade to the latest. Hovewer, latest HUDI can't read 
> the provided dataset. Exception I get: 
>  
>  
> {code:java}
> Got exception while parsing the arguments:
> Found recursive reference in Avro schema, which can not be processed by Spark:
>
> Stack trace:
> org.apache.spark.sql.avro.IncompatibleSchemaException: Found recursive reference in Avro schema, which can not be processed by Spark:
> {
>   "type" : "record",
>   "name" : "array",
>   "fields" : [ {
>     "name" : "id",
>     "type" : [ "null", "string" ],
>     "default" : null
>   }, {
>     "name" : "type",
>     "type" : [ "null", "string" ],
>     "default" : null
>   }, {
>     "name" : "exist",
>     "type" : [ "null", "boolean" ],
>     "default" : null
>   } ]
> }
>  at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:75)
>  at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:89)
>  at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
>  at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
>  at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
>  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>  at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>  at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>  at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>  at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81)
>  at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
>  at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
>  at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
>  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>  at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>  at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>  at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>  at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81)
>  at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
>  at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
>  at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
>  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>  at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>  at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
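For context on the recursion in the trace: Spark's SchemaConverters.toSqlTypeHelper walks the Avro schema and rejects any record type whose name it has already seen on the current path. A minimal pure-Python sketch of that style of check over plain Avro JSON (the function name find_recursive_record and the Node schema below are illustrative, not taken from Spark or Hudi):

```python
import json

def find_recursive_record(schema, seen=()):
    """Return the name of the first record that references itself
    (directly or through nesting), or None if the schema is acyclic."""
    if isinstance(schema, str):
        # A bare string is a primitive or a reference to a named type;
        # a reference to a record already on the path is the case Spark rejects.
        return schema if schema in seen else None
    if isinstance(schema, list):  # union, e.g. ["null", "string"]
        for branch in schema:
            hit = find_recursive_record(branch, seen)
            if hit:
                return hit
        return None
    t = schema.get("type")
    if t == "record":
        name = schema["name"]
        if name in seen:
            return name
        for field in schema["fields"]:
            hit = find_recursive_record(field["type"], seen + (name,))
            if hit:
                return hit
        return None
    if t == "array":
        return find_recursive_record(schema["items"], seen)
    if t == "map":
        return find_recursive_record(schema["values"], seen)
    return None  # primitives and fixed/enum types cannot recurse

# A record whose "children" field points back at the "Node" record itself.
recursive = json.loads("""
{"type": "record", "name": "Node", "fields": [
  {"name": "id", "type": ["null", "string"], "default": null},
  {"name": "children", "type": {"type": "array", "items": "Node"}}
]}
""")
print(find_recursive_record(recursive))  # -> Node
```

Notably, the record in the exception above is named `array`, which suggests the parquet-to-Avro extraction may have reused the same generated name for different list-element records at different nesting levels, making an otherwise non-recursive schema look recursive to a check of this kind.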

[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1602:
-
Priority: Blocker  (was: Major)

> Corrupted Avro schema extracted from parquet file
> -
>
> Key: HUDI-1602
> URL: https://issues.apache.org/jira/browse/HUDI-1602
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Alexander Filipchik
>Priority: Blocker
>  Labels: core-flow-ds, pull-request-available, sev:critical
> Fix For: 0.11.0
>
>

[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2022-03-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1602:
-
Fix Version/s: 0.11.0
   (was: 0.12.0)

> Corrupted Avro schema extracted from parquet file
> -
>
> Key: HUDI-1602
> URL: https://issues.apache.org/jira/browse/HUDI-1602
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Alexander Filipchik
>Assignee: Nishith Agarwal
>Priority: Major
>  Labels: core-flow-ds, pull-request-available, sev:critical
> Fix For: 0.11.0
>
>

[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2022-03-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1602:
-
Status: Open  (was: Patch Available)

> Corrupted Avro schema extracted from parquet file
> -
>
> Key: HUDI-1602
> URL: https://issues.apache.org/jira/browse/HUDI-1602
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Alexander Filipchik
>Assignee: Nishith Agarwal
>Priority: Major
>  Labels: core-flow-ds, pull-request-available, sev:critical
> Fix For: 0.12.0
>
>

[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2022-03-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1602:
-
Fix Version/s: 0.12.0
   (was: 0.11.0)

> Corrupted Avro schema extracted from parquet file
> -
>
> Key: HUDI-1602
> URL: https://issues.apache.org/jira/browse/HUDI-1602
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Alexander Filipchik
>Assignee: Nishith Agarwal
>Priority: Major
>  Labels: core-flow-ds, pull-request-available, sev:critical
> Fix For: 0.12.0
>
>

[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2021-12-12 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1602:
--
Labels: core-flow-ds pull-request-available sev:critical  (was: 
pull-request-available sev:critical)

> Corrupted Avro schema extracted from parquet file
> -
>
> Key: HUDI-1602
> URL: https://issues.apache.org/jira/browse/HUDI-1602
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Alexander Filipchik
>Assignee: Nishith Agarwal
>Priority: Major
>  Labels: core-flow-ds, pull-request-available, sev:critical
> Fix For: 0.11.0
>
>

[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2021-11-26 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1602:
--
Fix Version/s: 0.11.0
   (was: 0.10.0)

> Corrupted Avro schema extracted from parquet file
> -
>
> Key: HUDI-1602
> URL: https://issues.apache.org/jira/browse/HUDI-1602
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Alexander Filipchik
>Assignee: Nishith Agarwal
>Priority: Major
>  Labels: pull-request-available, sev:critical
> Fix For: 0.11.0
>
>

[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2021-08-24 Thread Udit Mehrotra (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udit Mehrotra updated HUDI-1602:

Fix Version/s: 0.10.0  (was: 0.9.0)

> Corrupted Avro schema extracted from parquet file
> -
>
> Key: HUDI-1602
> URL: https://issues.apache.org/jira/browse/HUDI-1602
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Alexander Filipchik
>Assignee: Nishith Agarwal
>Priority: Major
>  Labels: pull-request-available, sev:critical
> Fix For: 0.10.0
>
>

[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2021-04-23 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1602:
--
Status: In Progress  (was: Open)

> Corrupted Avro schema extracted from parquet file
> -
>
> Key: HUDI-1602
> URL: https://issues.apache.org/jira/browse/HUDI-1602
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Alexander Filipchik
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: pull-request-available, sev:critical
> Fix For: 0.9.0
>
>

[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2021-04-23 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1602:
--
Status: Patch Available  (was: In Progress)

> Corrupted Avro schema extracted from parquet file
> -
>
> Key: HUDI-1602
> URL: https://issues.apache.org/jira/browse/HUDI-1602
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Alexander Filipchik
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: pull-request-available, sev:critical
> Fix For: 0.9.0
>
>

[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2021-03-23 Thread Gary Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Li updated HUDI-1602:
--
Fix Version/s: 0.9.0  (was: 0.8.0)

> Corrupted Avro schema extracted from parquet file
> -
>
> Key: HUDI-1602
> URL: https://issues.apache.org/jira/browse/HUDI-1602
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Alexander Filipchik
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: pull-request-available, sev:critical
> Fix For: 0.9.0
>
>

[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2021-03-23 Thread Gary Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Li updated HUDI-1602:
--
Affects Version/s: 0.9.0

> Corrupted Avro schema extracted from parquet file
> -
>
> Key: HUDI-1602
> URL: https://issues.apache.org/jira/browse/HUDI-1602
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Alexander Filipchik
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: pull-request-available, sev:critical
> Fix For: 0.8.0
>
>

[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2021-02-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1602:
-
Labels: pull-request-available sev:critical  (was: sev:critical)

> Corrupted Avro schema extracted from parquet file
> -
>
> Key: HUDI-1602
> URL: https://issues.apache.org/jira/browse/HUDI-1602
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Alexander Filipchik
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: pull-request-available, sev:critical
> Fix For: 0.8.0
>
>

[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2021-02-09 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1602:
-
Fix Version/s: 0.8.0

> Corrupted Avro schema extracted from parquet file
> -
>
> Key: HUDI-1602
> URL: https://issues.apache.org/jira/browse/HUDI-1602
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Alexander Filipchik
>Priority: Major
> Fix For: 0.8.0
>
>
> we are running a HUDI deltastreamer on a very complex stream. Schema is 
> deeply nested, with several levels of hierarchy (avro schema is around 6600 
> LOC).
>  
> The version of HUDI that writes the dataset if 0.5-SNAPTHOT and we recently 
> started attempts to upgrade to the latest. Hovewer, latest HUDI can't read 
> the provided dataset. Exception I get: 
>  
>  
> {code:java}
> Got exception while parsing the arguments:Got exception while parsing the 
> arguments:Found recursive reference in Avro schema, which can not be 
> processed by Spark:{  "type" : "record",  "name" : "array",  "fields" : [ {   
>  "name" : "id",    "type" : [ "null", "string" ],    "default" : null  }, {   
>  "name" : "type",    "type" : [ "null", "string" ],    "default" : null  }, { 
>    "name" : "exist",    "type" : [ "null", "boolean" ],    "default" : null  
> } ]}          Stack 
> trace:org.apache.spark.sql.avro.IncompatibleSchemaException:Found recursive 
> reference in Avro schema, which can not be processed by Spark:{  "type" : 
> "record",  "name" : "array",  "fields" : [ {    "name" : "id",    "type" : [ 
> "null", "string" ],    "default" : null  }, {    "name" : "type",    "type" : 
> [ "null", "string" ],    "default" : null  }, {    "name" : "exist",    
> "type" : [ "null", "boolean" ],    "default" : null  } ]}
>  at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:75)
>  at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:89)
>  at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
>  at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
>  at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
>  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>  at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>  at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>  at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>  at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81)
>  at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
>  at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
>  at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
>  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>  at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>  at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>  at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>  at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81)
>  at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
>  at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
>  at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
>  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>  at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>  at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>  at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>  at org.apache.s
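
The error quoted above arises because Spark's `SchemaConverters.toSqlTypeHelper` tracks record names already seen on the current path and treats a re-encountered name as a recursive reference; parquet-avro commonly emits the generic record name "array" for list element types, so two nested lists can collide on that name even though the schema is not truly recursive. The following Python sketch approximates that name-tracking check (it is an illustration of the idea, not the actual Scala implementation):

```python
# Hypothetical sketch: approximate how a converter can flag "recursion" in an
# Avro schema (given as parsed JSON) by tracking record names on the path.
def find_repeated_record(schema, seen=frozenset()):
    """Return the first record name re-encountered on a path, or None."""
    if isinstance(schema, list):  # union, e.g. ["null", "string"]
        for branch in schema:
            hit = find_repeated_record(branch, seen)
            if hit:
                return hit
        return None
    if not isinstance(schema, dict):
        return None  # primitive type name such as "string"
    t = schema.get("type")
    if t == "record":
        name = schema["name"]
        if name in seen:
            return name  # the situation reported as a recursive reference
        seen = seen | {name}
        for field in schema.get("fields", []):
            hit = find_repeated_record(field["type"], seen)
            if hit:
                return hit
        return None
    if t == "array":
        return find_repeated_record(schema["items"], seen)
    if t == "map":
        return find_repeated_record(schema["values"], seen)
    return None

# Two nested lists whose element records both got the generated name "array":
nested = {
    "type": "record", "name": "top", "fields": [{
        "name": "a",
        "type": {"type": "array", "items": {
            "type": "record", "name": "array", "fields": [{
                "name": "b",
                "type": {"type": "array", "items": {
                    "type": "record", "name": "array", "fields": []}},
            }],
        }},
    }],
}
```

Here `find_repeated_record(nested)` reports `"array"`, mirroring the failure above, while a schema without colliding record names passes.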

[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2021-02-09 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1602:
-
Labels: sev:critical  (was: )

> Corrupted Avro schema extracted from parquet file
> -
>
> Key: HUDI-1602
> URL: https://issues.apache.org/jira/browse/HUDI-1602
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Alexander Filipchik
>Priority: Major
>  Labels: sev:critical
> Fix For: 0.8.0
>

[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2021-02-08 Thread Alexander Filipchik (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Filipchik updated HUDI-1602:
--
Description: 
We are running a HUDI deltastreamer on a very complex stream. The schema is 
deeply nested, with several levels of hierarchy (the Avro schema is around 
6600 LOC).

The version of HUDI that writes the dataset is 0.5-SNAPSHOT, and we recently 
started attempting to upgrade to the latest. However, the latest HUDI can't 
read the provided dataset. The exception I get:

