[ https://issues.apache.org/jira/browse/HUDI-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinoth Chandar updated HUDI-1205:
---------------------------------
    Labels: user-support-issues  (was: )

> Serialization fail when log file is larger than 2GB
> ---------------------------------------------------
>
>                 Key: HUDI-1205
>                 URL: https://issues.apache.org/jira/browse/HUDI-1205
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Yanjia Gary Li
>            Priority: Major
>              Labels: user-support-issues
>
> When scanning a log file, if the log file (or log file group) is larger than
> 2GB, serialization fails because Hudi uses an Integer to store the log file
> size in bytes. The largest byte count an Integer can represent is 2GB
> (2^31 - 1 bytes).
> Caused by: com.esotericsoftware.kryo.KryoException: Unable to find class: org.apache.hudi.common.model.OverwriteWithLatestAvroPayload$$Lambda$45/62103784
> Serialization trace:
> orderingVal (org.apache.hudi.common.model.OverwriteWithLatestAvroPayload)
> data (org.apache.hudi.common.model.HoodieRecord)
>     at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:160)
>     at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
>     at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:693)
>     at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:118)
>     at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543)
>     at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731)
>     at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
>     at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543)
>     at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813)
>     at org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.deserialize(SerializationUtils.java:107)
>     at org.apache.hudi.common.util.SerializationUtils.deserialize(SerializationUtils.java:81)
>     at org.apache.hudi.common.util.collection.DiskBasedMap.get(DiskBasedMap.java:217)
>     at org.apache.hudi.common.util.collection.DiskBasedMap.get(DiskBasedMap.java:211)
>     at org.apache.hudi.common.util.collection.DiskBasedMap.get(DiskBasedMap.java:207)
>     at org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:168)
>     at org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:55)
>     at org.apache.hudi.HoodieMergeOnReadRDD$$anon$1.hasNext(HoodieMergeOnReadRDD.scala:128)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithoutKey_0$(Unknown Source)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
>     at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11$$anon$1.hasNext(WholeStageCodegenExec.scala:624)
>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>     at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>     at org.apache.spark.scheduler.Task.run(Task.scala:121)
>     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:407)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1408)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:413)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.ClassNotFoundException: org.apache.hudi.common.model.OverwriteWithLatestAvroPayload$$Lambda$45/62103784
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:348)
>     at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
>     ... 31 more

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
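
The issue description attributes the failure to byte sizes being stored in an Integer. The following is a minimal Java sketch of that general hazard, not Hudi's actual code: the class `OffsetOverflowDemo` and the helpers `narrowUnchecked` and `narrowChecked` are hypothetical names introduced only for illustration. It shows that a byte offset just past 2GB wraps to a negative number when narrowed to `int`, and that `Math.toIntExact` rejects the same value instead of wrapping.

```java
// Illustrative only: this is not Hudi's code. All names here are hypothetical.
class OffsetOverflowDemo {

    // Narrowing cast: silently keeps only the low 32 bits of the long value.
    static int narrowUnchecked(long byteOffset) {
        return (int) byteOffset;
    }

    // Checked narrowing: throws ArithmeticException if the value does not fit in an int.
    static int narrowChecked(long byteOffset) {
        return Math.toIntExact(byteOffset);
    }

    public static void main(String[] args) {
        long twoGiB = 1L << 31;      // 2147483648, one more than Integer.MAX_VALUE
        long offset = twoGiB + 100;  // a position just past the 2GB mark

        System.out.println("max int     = " + Integer.MAX_VALUE);
        System.out.println("long offset = " + offset);
        // The unchecked cast wraps around to a negative value.
        System.out.println("cast to int = " + narrowUnchecked(offset));

        try {
            narrowChecked(offset);
        } catch (ArithmeticException e) {
            System.out.println("checked cast rejected the offset: " + e.getMessage());
        }
    }
}
```

If a reader were to seek to such a wrapped offset, it would deserialize bytes from the wrong position, which is one plausible way to end up with an unrelated error such as the ClassNotFoundException in the trace; whether that is the exact mechanism here is an assumption, not something the report confirms.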