Kabeer,

Here is a good resource to quickly understand some Avro schema evolution rules: http://dekarlab.de/wp/?p=489 (apart from the Avro spec itself, which I usually work with). On the note of default values, I agree: it's not backwards compatible to add them later. I think the two versions of the schema are incorrectly pasted in the JIRA; my understanding is that both schema versions *did* have defaults, and the discrepancy is just an error on the part of the person who opened the JIRA. If you read the JIRA more closely, there is another linked JIRA: https://issues.apache.org/jira/browse/PARQUET-1681. There is a legitimate bug in the parquet-avro reader that causes this issue in certain cases of schema evolution.

To explain the exact issue that led to the above ticket: if you create a collection-type field in Avro, say an array, and it has only one sub-entry (which could be a record, a primitive type, etc.), Parquet treats it as a primitive type and creates a corresponding schema. When the schema later evolves and another sub-entry is added to the array type, Avro now considers the field a complex/collection type, but Parquet still thinks it's primitive. That's where the parquet-avro reader fails and throws a type-mismatch exception (the "required binary key (UTF8) is not a group" error in Ethan's case as well).
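For a concrete picture of that kind of evolution, here is a hypothetical sketch (the "tags"/"Tag" field and record names are made up for illustration and are not taken from Ethan's table or the JIRA; whether this exact shape trips the reader also depends on the Parquet version and list write mode). Version 1 declares an array whose element record carries a single sub-entry:

{
  "name": "tags",
  "type": {
    "type": "array",
    "items": {
      "type": "record",
      "name": "Tag",
      "fields": [
        { "name": "value", "type": "string" }
      ]
    }
  }
}

Version 2 adds a second sub-entry to the element record, so a reader built from the new Avro schema no longer lines up with the primitive representation Parquet recorded for the version-1 files:

{
  "name": "tags",
  "type": {
    "type": "array",
    "items": {
      "type": "record",
      "name": "Tag",
      "fields": [
        { "name": "value", "type": "string" },
        { "name": "source", "type": [ "null", "string" ], "default": null }
      ]
    }
  }
}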
But +1 on replicating the issue with a reduced schema and getting to the bottom of it if it's something else!

Thanks,
Nishith

On Sat, Dec 21, 2019 at 11:19 AM Y Ethan Guo <ethan.guoyi...@gmail.com> wrote:

> Kabeer and Vinoth, thanks for the suggestion. Yes, I'm spending time on
> using a subset of the schema and sample inserts/updates to reproduce the
> problem locally. I'd like to get to the bottom of this. Once I have some
> findings, I'll follow up here.
>
> On Sat, Dec 21, 2019 at 8:06 AM Vinoth Chandar <vin...@apache.org> wrote:
>
> > +1 for trimming the schema down and iterating
> >
> > On Thu, Dec 19, 2019 at 10:07 PM Kabeer Ahmed <kab...@linuxmail.org> wrote:
> >
> > > Hi Nishith,
> > >
> > > I do not want to diverge this thread. I looked into the JIRA link that
> > > you have sent, which is
> > > https://issues.apache.org/jira/projects/PARQUET/issues/PARQUET-1656?filter=allopenissues
> > > In the example of this JIRA, I see that the schema has changed, especially
> > > with respect to the addition of a default value. This has never worked for
> > > me in the past, and the rules of Avro schema evolution are as below (I
> > > can't put an authoritative link, but these are derived from experience):
> > >
> > > 1. Always provide a default value for the field to start with.
> > > 2. Never change a field's data type.
> > > 3. If a new field is added in the future, always provide a default value
> > >    (back to rule 1).
> > > 4. Never rename an existing field; use aliases instead.
> > >
> > > In the example given in the JIRA, we see that there is a default value
> > > being added later when it was missed in the first step. I have highlighted
> > > one field in the PS section of the email below. This has not worked for me,
> > > or at least I can say that it has given issues down the line.
> > >
> > > Coming back to this thread, I think the issue Ethan is having is not (or
> > > should not be) directly related to that JIRA. Ethan seems to be using
> > > DeltaStreamer from a source but encountering issues in the 2nd load, whilst
> > > the 1st one goes through without issues. So he is not necessarily changing
> > > the definition of the schemas manually.
> > >
> > > If you have other thoughts, please feel free. We can discuss in another
> > > thread.
> > >
> > > Ethan - please see if you can replicate the issue with a subset of the
> > > schema. This involves effort, but from past experience I can always vouch
> > > for this practice, as it enables others to help you faster, or at least we
> > > understand, file and fix a bug faster.
> > >
> > > Thanks,
> > > Kabeer.
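As a minimal sketch of the rules above, with hypothetical field names (these are illustrative only and are distinct from the schemas Kabeer pastes below): the field carries a default from its very first version, a later addition brings its own default, the type never changes, and a rename is expressed through an alias rather than a brand-new name.

First version:

{ "name": "email", "type": [ "null", "string" ], "default": null }

Evolved version (new field added with a default; the rename is done via an alias to the old name):

{ "name": "email_address", "aliases": [ "email" ], "type": [ "null", "string" ], "default": null },
{ "name": "phone", "type": [ "null", "string" ], "default": null }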
> > > ____________________________
> > > First version of schema:
> > >
> > > {
> > >   "default": null,
> > >   "name": "master_cluster",
> > >   "type": [
> > >     "null",
> > >     {
> > >       "fields": [
> > >         { "name": "uuid", "type": "string" },
> > >         { "name": "namespace", "type": "string" },
> > >         { "name": "version", "type": "long" }
> > >       ],
> > >       "name": "master_cluster",
> > >       "type": "record"
> > >     }
> > >   ]
> > > },
> > >
> > > ___________________________________
> > > Second version of the schema:
> > >
> > > {
> > >   "default": null,
> > >   "name": "master_cluster",
> > >   "type": [
> > >     "null",
> > >     {
> > >       "fields": [
> > >         { "default": null, "name": "uuid", "type": [ "null", "string" ] },
> > >         { "default": null, "name": "namespace", "type": [ "null", "string" ] },
> > >         { "default": null, "name": "version", "type": [ "null", "long" ] }
> > >       ],
> > >       "name": "VORGmaster_cluster",
> > >       "type": "record"
> > >     }
> > >   ]
> > > },
> > >
> > > On Dec 19 2019, at 10:59 pm, Y Ethan Guo <ethan.guoyi...@gmail.com> wrote:
> > > > Got it. Thanks for the clarification.
> > > >
> > > > On Thu, Dec 19, 2019 at 2:54 PM nishith agarwal <n3.nas...@gmail.com> wrote:
> > > > > Ethan,
> > > > > There isn't one available in the open-source, it's an internal build we
> > > > > have.
> > > > >
> > > > > -Nishith
> > > > > On Thu, Dec 19, 2019 at 2:50 PM Y Ethan Guo <ethan.guoyi...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thanks Kabeer and Nishith for the responses.
> > > > > > The schema hasn't been changed. I'm now trying to reproduce the problem
> > > > > > locally with some synthetic data, given that it's expensive to iterate
> > > > > > in my cluster testing setup.
> > > > > >
> > > > > > @Nishith, thanks for the pointer. Is there an existing build of
> > > > > > parquet-avro v1.8.1 with the fix? I don't see the fix attached to the
> > > > > > ticket. I suppose that I also need to rebuild Hudi utilities bundle to
> > > > > > pick that up.
> > > > > >
> > > > > > On Thu, Dec 19, 2019 at 1:51 PM nishith agarwal <n3.nas...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Ethan,
> > > > > > > Unless this is a backwards incompatible schema change, this seems
> > > > > > > related to a parquet-avro reader bug we've seen before, find more
> > > > > > > details here:
> > > > > > > https://issues.apache.org/jira/projects/PARQUET/issues/PARQUET-1656?filter=allopenissues
> > > > > > > There's a fix for the parquet-avro reader for 1.8.1 which we patched
> > > > > > > and works for us in production. There isn't a proper fix available
> > > > > > > for later parquet versions since the code has changed quite a bit.
> > > > > > > Which is also the reason why the patch for 1.8.1 is not being
> > > > > > > upstreamed since fewer people are using it.
> > > > > > >
> > > > > > > -Nishith
> > > > > > > On Thu, Dec 19, 2019 at 12:34 PM Kabeer Ahmed <kab...@linuxmail.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Ethan,
> > > > > > > > It is often tricky to debug or help with issues when I do not have
> > > > > > > > an idea of the data. My "guess" is that your schema is changing.
> > > > > > > > This could be related to:
> > > > > > > > https://stackoverflow.com/a/42946528/4517001
> > > > > > > > It may help if you could send sample data of the first pass that
> > > > > > > > went without issues and the second set of data that caused an
> > > > > > > > issue. Please check that you are allowed to share the data.
> > > > > > > > Thanks
> > > > > > > > Kabeer.
> > > > > > > >
> > > > > > > > On Dec 19 2019, at 7:33 pm, Y Ethan Guo <ethan.guoyi...@gmail.com> wrote:
> > > > > > > > > Hi folks,
> > > > > > > > >
> > > > > > > > > I'm testing a new Deltastreamer job in a cluster which incrementally
> > > > > > > > > pulls data from an upstream Hudi table and upserts the dataset into
> > > > > > > > > another table. The first run of the Deltastreamer job, which involves
> > > > > > > > > only inserts, succeeded. The second run of the job, which involves
> > > > > > > > > updates, throws the following exception. I'm using a snapshot build
> > > > > > > > > of Hudi: 0.4.8-SNAPSHOT[1]. I believe this is related to schema, but
> > > > > > > > > I don't know how I should debug and fix this. Any suggestion is
> > > > > > > > > appreciated.
> > > > > > > > > > > > > > > > > > org.apache.spark.SparkException: Job aborted due to stage > > > failure: > > > > > > > > > Task 33 in stage 78.0 failed 4 times, most recent failure: > > Lost > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > task > > > > > > > > > 33.3 in stage 78.0 (TID 13283, executor 5): > > > > > > > > > com.uber.hoodie.exception.HoodieUpsertException: Error > > > upserting > > > > > > > > > bucketType UPDATE for partition :33 > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:271) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.HoodieWriteClient.lambda$upsertRecordsInternal$7ef77fd$1(HoodieWriteClient.java:442) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:844) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:844) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > > > > > > > > > at > > > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > > > > > > > > > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > > > > > > > > > at > > > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > > > > > > > > > at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336) > > > > > > > > > at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1055) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1029) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:969) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1029) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:760) > > > > > > > > > at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334) > > > > > > > > > at org.apache.spark.rdd.RDD.iterator(RDD.scala:285) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
> > > > > > > > > at > > > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > > > > > > > > > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > > > > > > > > > at org.apache.spark.scheduler.Task.run(Task.scala:108) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > > > > > > > > > at java.lang.Thread.run(Thread.java:748) > > > > > > > > > Caused by: com.uber.hoodie.exception.HoodieException: > > > > > > > > > com.uber.hoodie.exception.HoodieException: > > > > > > > > > java.util.concurrent.ExecutionException: > > > > > > > > > com.uber.hoodie.exception.HoodieException: operation has > > failed > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:206) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:181) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:263) > > > > > > > > > ... 28 more > > > > > > > > > Caused by: com.uber.hoodie.exception.HoodieException: > > > > > > > > > java.util.concurrent.ExecutionException: > > > > > > > > > com.uber.hoodie.exception.HoodieException: operation has > > failed > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:146) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:204) > > > > > > > > > ... 30 more > > > > > > > > > Caused by: java.util.concurrent.ExecutionException: > > > > > > > > > com.uber.hoodie.exception.HoodieException: operation has > > failed > > > > > > > > > at > > java.util.concurrent.FutureTask.report(FutureTask.java:122) > > > > > > > > > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:144) > > > > > > > > > ... 
31 more > > > > > > > > > Caused by: com.uber.hoodie.exception.HoodieException: > > > operation has > > > > > > > > > > > > > > > > failed > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.common.util.queue.BoundedInMemoryQueue.throwExceptionIfFailed(BoundedInMemoryQueue.java:232) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:211) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.common.util.queue.BoundedInMemoryQueue.access$100(BoundedInMemoryQueue.java:49) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.common.util.queue.BoundedInMemoryQueue$QueueIterator.hasNext(BoundedInMemoryQueue.java:262) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:124) > > > > > > > > > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > > > > > > > > > ... 3 more > > > > > > > > > Caused by: java.lang.ClassCastException: required binary > key > > > (UTF8) > > > > > > > > > > > > > > > > > > > > > > > > > > > is > > > > > > > > > not a group > > > > > > > > > at > org.apache.parquet.schema.Type.asGroupType(Type.java:202) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.avro.AvroRecordConverter$MapConverter.<init>(AvroRecordConverter.java:821) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:210) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.avro.AvroRecordConverter.access$200(AvroRecordConverter.java:63) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.<init>(AvroRecordConverter.java:435) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.<init>(AvroRecordConverter.java:385) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:206) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.avro.AvroRecordConverter.access$200(AvroRecordConverter.java:63) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.avro.AvroRecordConverter$MapConverter$MapKeyValueConverter.<init>(AvroRecordConverter.java:872) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.avro.AvroRecordConverter$MapConverter.<init>(AvroRecordConverter.java:822) > > > > > > > > > at > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:210) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:112) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:79) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:132) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:175) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:149) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:125) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.func.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:45) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:44) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:94) > > > > > > > > > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > > > > > > > > > ... 
4 more > > > > > > > > > > > > > > > > > > Driver stacktrace: > > > > > > > > > at org.apache.spark.scheduler.DAGScheduler.org > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1517) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1505) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1504) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1504) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814) > > > > > > > > > at scala.Option.foreach(Option.scala:257) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1732) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1687) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1676) > > > > > > > > > at > > > org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630) > > > > > > > > > at > > > org.apache.spark.SparkContext.runJob(SparkContext.scala:2029) > > > > > > > > > at > > > org.apache.spark.SparkContext.runJob(SparkContext.scala:2126) > > > > > > > > > at > > > org.apache.spark.rdd.RDD$$anonfun$fold$1.apply(RDD.scala:1089) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) > > > > > > > > > at org.apache.spark.rdd.RDD.withScope(RDD.scala:362) > > > > > > > > > at org.apache.spark.rdd.RDD.fold(RDD.scala:1083) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 
org.apache.spark.rdd.DoubleRDDFunctions$$anonfun$sum$1.apply$mcD$sp(DoubleRDDFunctions.scala:35) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.rdd.DoubleRDDFunctions$$anonfun$sum$1.apply(DoubleRDDFunctions.scala:35) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.rdd.DoubleRDDFunctions$$anonfun$sum$1.apply(DoubleRDDFunctions.scala:35) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) > > > > > > > > > at org.apache.spark.rdd.RDD.withScope(RDD.scala:362) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.rdd.DoubleRDDFunctions.sum(DoubleRDDFunctions.scala:34) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.api.java.JavaDoubleRDD.sum(JavaDoubleRDD.scala:165) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:279) > > > > > > > > > ... 47 elided > > > > > > > > > Caused by: com.uber.hoodie.exception.HoodieUpsertException: > > > Error > > > > > > > > > upserting bucketType UPDATE for partition :33 > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:271) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.HoodieWriteClient.lambda$upsertRecordsInternal$7ef77fd$1(HoodieWriteClient.java:442) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:844) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:844) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > > > > > > > > > at > > > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > > > > > > > > > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > > > > > > > > > at > > > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > > > > > > > > > at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336) > > > > > > > > > at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1055) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1029) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:969) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1029) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:760) > > > > > > > > > at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334) > > > > > > > > > at org.apache.spark.rdd.RDD.iterator(RDD.scala:285) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > > > > > > > > > at > > > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > > > > > > > > > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > > > > > > > > > at org.apache.spark.scheduler.Task.run(Task.scala:108) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338) > > > > > > > > > ... 3 more > > > > > > > > > Caused by: com.uber.hoodie.exception.HoodieException: > > > > > > > > > com.uber.hoodie.exception.HoodieException: > > > > > > > > > java.util.concurrent.ExecutionException: > > > > > > > > > com.uber.hoodie.exception.HoodieException: operation has > > failed > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:206) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:181) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:263) > > > > > > > > > ... 28 more > > > > > > > > > Caused by: com.uber.hoodie.exception.HoodieException: > > > > > > > > > java.util.concurrent.ExecutionException: > > > > > > > > > com.uber.hoodie.exception.HoodieException: operation has > > failed > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:146) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:204) > > > > > > > > > ... 
30 more > > > > > > > > > Caused by: java.util.concurrent.ExecutionException: > > > > > > > > > com.uber.hoodie.exception.HoodieException: operation has > > failed > > > > > > > > > at > > java.util.concurrent.FutureTask.report(FutureTask.java:122) > > > > > > > > > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:144) > > > > > > > > > ... 31 more > > > > > > > > > Caused by: com.uber.hoodie.exception.HoodieException: > > > operation has > > > > > > > > > > > > > > > > failed > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.common.util.queue.BoundedInMemoryQueue.throwExceptionIfFailed(BoundedInMemoryQueue.java:232) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:211) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.common.util.queue.BoundedInMemoryQueue.access$100(BoundedInMemoryQueue.java:49) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.common.util.queue.BoundedInMemoryQueue$QueueIterator.hasNext(BoundedInMemoryQueue.java:262) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:124) > > > > > > > > > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > > > > > > > > > ... 
3 more > > > > > > > > > Caused by: java.lang.ClassCastException: required binary > key > > > (UTF8) > > > > > > > > > > > > > > > > > > > > > > > > > > > is > > > > > > > > > not a group > > > > > > > > > at > org.apache.parquet.schema.Type.asGroupType(Type.java:202) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.avro.AvroRecordConverter$MapConverter.<init>(AvroRecordConverter.java:821) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:210) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.avro.AvroRecordConverter.access$200(AvroRecordConverter.java:63) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.<init>(AvroRecordConverter.java:435) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.<init>(AvroRecordConverter.java:385) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:206) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.avro.AvroRecordConverter.access$200(AvroRecordConverter.java:63) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.avro.AvroRecordConverter$MapConverter$MapKeyValueConverter.<init>(AvroRecordConverter.java:872) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.avro.AvroRecordConverter$MapConverter.<init>(AvroRecordConverter.java:822) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:210) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:112) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:79) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:132) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:175) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:149) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:125) > > > > > > > > > at > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.func.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:45) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:44) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > com.uber.hoodie.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:94) > > > > > > > > > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > > > > > > > > > at > > > > > > > > > > > > > > > > > > > > > > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > > > > > > > > > ... 4 more > > > > > > > > > > > > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://oss.sonatype.org/content/repositories/snapshots/com/uber/hoodie/hoodie-utilities-bundle/0.4.8-SNAPSHOT/hoodie-utilities-bundle-0.4.8-20190531.060546-1.jar > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > - Ethan > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >