hudi-bot opened a new issue, #14771:
URL: https://github.com/apache/hudi/issues/14771

   I was trying to read a MOR Hudi table incrementally using DeltaStreamer, and ran into the following error:
   {code:java}
   Found recursive reference in Avro schema, which can not be processed by Spark:{code}
   Spark Version: 2.4
   
   Hudi Version: 0.7.0-SNAPSHOT or the latest master
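   For reference, the incremental read that leads into `MergeOnReadIncrementalRelation` (the constructor visible in the stack trace below) is configured roughly as follows. This is a sketch, not the exact job config from this issue; the table path and begin instant are placeholders:

   ```python
   # Hudi incremental-read options (documented option keys as of Hudi 0.7.x).
   # BEGIN_INSTANT and TABLE_PATH are placeholders, not values from this issue.
   BEGIN_INSTANT = "000"  # pull all commits after this instant time
   TABLE_PATH = "/path/to/mor/table"

   incr_read_opts = {
       "hoodie.datasource.query.type": "incremental",
       "hoodie.datasource.read.begin.instanttime": BEGIN_INSTANT,
   }

   # With a live SparkSession, this load() goes through
   # DefaultSource.createRelation and, for a MOR table read incrementally,
   # constructs MergeOnReadIncrementalRelation -- the point where the
   # Avro-to-SQL schema conversion below fails:
   # df = spark.read.format("hudi").options(**incr_read_opts).load(TABLE_PATH)
   ```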
   
    
   
   Full Stack Trace:
   {code:java}
   Found recursive reference in Avro schema, which can not be processed by Spark:
   {
     "type" : "record",
     "name" : "meta",
     "fields" : [ {
       "name" : "verified",
       "type" : [ "null", "boolean" ],
       "default" : null
     }, {
       "name" : "zip",
       "type" : [ "null", "string" ],
       "default" : null
     }, {
       "name" : "lname",
       "type" : [ "null", "string" ],
       "default" : null
     } ]
   }
        at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:75)
        at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
        at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
        at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.Iterator$class.foreach(Iterator.scala:891)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.AbstractTraversable.map(Traversable.scala:104)
        at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81)
        at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
        at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
        at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.Iterator$class.foreach(Iterator.scala:891)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.AbstractTraversable.map(Traversable.scala:104)
        at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81)
        at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:95)
        at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
        at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
        at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.Iterator$class.foreach(Iterator.scala:891)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.AbstractTraversable.map(Traversable.scala:104)
        at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81)
        at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
        at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
        at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.Iterator$class.foreach(Iterator.scala:891)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.AbstractTraversable.map(Traversable.scala:104)
        at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81)
        at org.apache.spark.sql.avro.SchemaConverters$.toSqlType(SchemaConverters.scala:46)
        at org.apache.hudi.AvroConversionUtils$.convertAvroSchemaToStructType(AvroConversionUtils.scala:56)
        at org.apache.hudi.MergeOnReadIncrementalRelation.<init>(MergeOnReadIncrementalRelation.scala:77)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:109)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:63)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
        at org.apache.hudi.utilities.sources.HoodieIncrSource.fetchNextBatch(HoodieIncrSource.java:122)
        at org.apache.hudi.utilities.sources.RowSource.fetchNewData(RowSource.java:43)
        at org.apache.hudi.utilities.sources.Source.fetchNext(Source.java:75)
        at org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInRowFormat(SourceFormatAdapter.java:94)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:350)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:259)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:170)
        at org.apache.hudi.common.util.Option.ifPresent(Option.java:96)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:168)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:470)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:690)
   {code}
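   For context, the error comes from Spark's `SchemaConverters`, which tracks the record names already seen on the path from the schema root and throws as soon as a name repeats. Note that the `meta` fields shown above are not themselves recursive; since the logged schema is truncated, the offending reference is presumably in a field that is not shown. A toy model of the check in Python (the dict-based schema representation is illustrative, not Spark's actual Scala code, and unlike Spark it does not model union nullability):

   ```python
   def to_sql_type(schema, seen=()):
       """Toy version of Spark's SchemaConverters.toSqlTypeHelper: walk an
       Avro schema (dicts/lists/strings) and fail on recursive records."""
       if isinstance(schema, str):           # primitive type, e.g. "string"
           return schema
       if isinstance(schema, list):          # union, e.g. ["null", "boolean"]
           return [to_sql_type(s, seen) for s in schema if s != "null"]
       if schema.get("type") == "record":
           name = schema["name"]
           if name in seen:                  # record reachable from itself
               raise ValueError("Found recursive reference in Avro schema, "
                                "which can not be processed by Spark: " + name)
           return {f["name"]: to_sql_type(f["type"], seen + (name,))
                   for f in schema["fields"]}
       return schema["type"]

   # The truncated `meta` record from the trace is not recursive, so it converts:
   meta = {"type": "record", "name": "meta", "fields": [
       {"name": "verified", "type": ["null", "boolean"], "default": None},
       {"name": "zip", "type": ["null", "string"], "default": None},
       {"name": "lname", "type": ["null", "string"], "default": None},
   ]}
   print(to_sql_type(meta))
   # -> {'verified': ['boolean'], 'zip': ['string'], 'lname': ['string']}

   # A record whose field type refers back to the record triggers the error:
   node = {"type": "record", "name": "node", "fields": []}
   node["fields"].append({"name": "next", "type": node})
   ```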
    
   
    
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-1747
   - Type: Bug
   
   
   ---
   
   
   ## Comments
   
   **20/Apr/21 03:27 - shivnarayan:** [~vino]: Can you help me understand the use case and how to reproduce it?
    * Do you see the exception the very first time you incrementally read from the Hudi MOR table via DeltaStreamer, or only after a few incremental pulls?
    * I assume the schema in the stack trace is the schema of the DeltaStreamer source table. Can you confirm that?
    * I need to set aside some time to try to reproduce this, but if you have reproduction steps with configs, that would be great; if that would take too much of your time, never mind.
   
   ---
   
   **20/Apr/21 04:32 - vino:** [~shivnarayan] - Here are the answers:
    * Do you see the exception the very first time you incrementally read from the Hudi MOR table via DeltaStreamer, or only after a few incremental pulls?
    ** I see it the very first time.
    * I assume the schema in the stack trace is the schema of the DeltaStreamer source table. Can you confirm that?
    ** It's a big table; the schema above is just the first few fields, not all of the fields are shown in the log.
    * Do you have reproduction steps with configs?
    ** I have a test pipeline you can use for debugging; I will share the details on Slack.
   
   ---
   
   **20/Apr/21 12:56 - shivnarayan:** Awesome, thanks.
   
   ---
   
   **13/Dec/21 14:30 - shivnarayan:** [~harsh1231] [~codope]
   
   ---
   
   **12/Jan/22 03:43 - shivnarayan:** [~vino]: Can you help with more info on reproducing this?

