soumilshah1995 opened a new issue, #7451:
URL: https://github.com/apache/hudi/issues/7451

   Hello, I am using Glue 4.0 and I am putting together a tutorial for the community, titled "Move data from DynamoDB into Apache Hudi".
   
   ##### Table 
   
![image](https://user-images.githubusercontent.com/39345855/207491290-6a23b4ce-1ed9-4e2c-8839-03afdb11b883.png)
   
   ##### Sample JSON
   ```json
   {
     "id": {
       "S": "bbbe96f6-f762-4750-a946-a5e0dc5d0fa5"
     },
     "address": {
       "S": "361 Terri Rapids\nDonnaberg, ID 55397"
     },
     "city": {
       "S": "361 Terri Rapids\nDonnaberg, ID 55397"
     },
     "first_name": {
       "S": "Melissa"
     },
     "last_name": {
       "S": "Frederick"
     },
     "state": {
       "S": "Miss what personal door country energy rate. Court school its 
indicate. Remember finally because debate role hospital appear."
     },
     "text": {
       "S": "Miss what personal door country energy rate. Court school its 
indicate. Remember finally because debate role hospital appear."
     }
   }
   ```
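   For anyone reproducing the tutorial: the sample above is DynamoDB-JSON, where every value is wrapped in an attribute-type map such as `{"S": ...}`. The Glue crawler unwraps these for you, but the transformation is easy to sketch in plain Python (the helper name below is made up for illustration and is not part of the Glue job):

   ```python
   # Flatten a DynamoDB-JSON item ({"S": ...}, {"N": ...}, etc.) into plain values,
   # the shape the Glue table / Hudi write ultimately works with.
   from decimal import Decimal

   def flatten_ddb_item(item):
       """Convert a DynamoDB-JSON item into a plain dict (handles S, N, BOOL, NULL, L, M)."""
       def convert(av):
           (tag, value), = av.items()  # each attribute value is a single-key map
           if tag == "S":
               return value
           if tag == "N":
               return Decimal(value)   # DynamoDB numbers arrive as strings
           if tag == "BOOL":
               return value
           if tag == "NULL":
               return None
           if tag == "L":
               return [convert(v) for v in value]
           if tag == "M":
               return {k: convert(v) for k, v in value.items()}
           raise ValueError(f"unsupported attribute type: {tag}")
       return {k: convert(v) for k, v in item.items()}

   record = {
       "id": {"S": "bbbe96f6-f762-4750-a946-a5e0dc5d0fa5"},
       "first_name": {"S": "Melissa"},
   }
   flat = flatten_ddb_item(record)
   ```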
   
   #### Glue table 
   
![image](https://user-images.githubusercontent.com/39345855/207491453-876af9d6-2be9-4cd2-a1d5-8c0e5b6e9565.png)
   
   ```python
   import sys
   from awsglue.transforms import *
   from awsglue.utils import getResolvedOptions
   from pyspark.context import SparkContext
   from awsglue.context import GlueContext
   from awsglue.job import Job
   
   args = getResolvedOptions(sys.argv, ["JOB_NAME"])
   sc = SparkContext()
   glueContext = GlueContext(sc)
   spark = glueContext.spark_session
   job = Job(glueContext)
   job.init(args["JOB_NAME"], args)
   
    # Script generated for node AWS Glue Data Catalog
    AWSGlueDataCatalog_node1670979530578 = glueContext.create_dynamic_frame.from_catalog(
        database="dev.dynamodbdb",
        table_name="dev_users",
        transformation_ctx="AWSGlueDataCatalog_node1670979530578",
    )
   
   # Script generated for node Rename Field
   RenameField_node1670981729330 = RenameField.apply(
       frame=AWSGlueDataCatalog_node1670979530578,
       old_name="id",
       new_name="pk",
       transformation_ctx="RenameField_node1670981729330",
   )
   
   # Script generated for node Change Schema (Apply Mapping)
   ChangeSchemaApplyMapping_node1670981753064 = ApplyMapping.apply(
       frame=RenameField_node1670981729330,
       mappings=[
           ("address", "string", "address", "string"),
           ("city", "string", "city", "string"),
           ("last_name", "string", "last_name", "string"),
           ("text", "string", "text", "string"),
           ("pk", "string", "pk", "string"),
           ("state", "string", "state", "string"),
           ("first_name", "string", "first_name", "string"),
       ],
       transformation_ctx="ChangeSchemaApplyMapping_node1670981753064",
   )
   
    additional_options = {
        "hoodie.datasource.hive_sync.database": "hudidb",
        "hoodie.table.name": "hudi_table",
        "hoodie.datasource.hive_sync.table": "hudi_table",

        "hoodie.datasource.write.storage.type": "COPY_ON_WRITE",
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": "pk",
        "hoodie.datasource.write.precombine.field": "pk",
        "hoodie.combine.before.delete": "false",

        "hoodie.datasource.write.hive_style_partitioning": "true",
        "hoodie.datasource.hive_sync.enable": "true",
        "hoodie.datasource.hive_sync.sync_as_datasource": "false",
        "hoodie.datasource.hive_sync.use_jdbc": "false",
        "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
        "hoodie.datasource.hive_sync.mode": "hms",
        "path": "s3://glue-learn-begineers/data/",
    }
    df = ChangeSchemaApplyMapping_node1670981753064.toDF()
    df.write.format("hudi").options(**additional_options).mode("overwrite").save()
   
   job.commit()
   
   ```
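   For reference, on Glue 4.0 I understand Hudi is enabled through job parameters rather than extra jars; roughly the following is expected on the job (values assumed, adjust to your setup):

   ```
   --datalake-formats  hudi
   --conf  spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.sql.hive.convertMetastoreParquet=false
   ```

   If these are missing, the write can fail at `save()` with serialization or class-loading problems before the Hudi-specific cause is surfaced.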
   
   #### Error Message 
   
   An error occurred while calling o138.save. Failed to upsert for commit time 20221214022409792
   ##### Detailed Logs 
   ```
   2022-12-14 02:24:16,947 INFO [main] spark.SecurityManager 
(Logging.scala:logInfo(61)): Changing modify acls to: spark
   2022-12-14 02:24:16,948 INFO [main] spark.SecurityManager 
(Logging.scala:logInfo(61)): Changing view acls groups to: 
   2022-12-14 02:24:16,949 INFO [main] spark.SecurityManager 
(Logging.scala:logInfo(61)): Changing modify acls groups to: 
   2022-12-14 02:24:16,949 INFO [main] spark.SecurityManager 
(Logging.scala:logInfo(61)): SecurityManager: authentication enabled; ui acls 
disabled; users  with view permissions: Set(spark); groups with view 
permissions: Set(); users  with modify permissions: Set(spark); groups with 
modify permissions: Set()
   2022-12-14 02:24:17,452 INFO [netty-rpc-connection-0] 
client.TransportClientFactory (TransportClientFactory.java:createClient(310)): 
Successfully created connection to /172.35.86.171:37187 after 201 ms (126 ms 
spent in bootstraps)
   2022-12-14 02:24:17,552 INFO [main] spark.SecurityManager 
(Logging.scala:logInfo(61)): Changing view acls to: spark
   2022-12-14 02:24:17,552 INFO [main] spark.SecurityManager 
(Logging.scala:logInfo(61)): Changing modify acls to: spark
   2022-12-14 02:24:17,553 INFO [main] spark.SecurityManager 
(Logging.scala:logInfo(61)): Changing view acls groups to: 
   2022-12-14 02:24:17,553 INFO [main] spark.SecurityManager 
(Logging.scala:logInfo(61)): Changing modify acls groups to: 
   2022-12-14 02:24:17,556 INFO [main] spark.SecurityManager 
(Logging.scala:logInfo(61)): SecurityManager: authentication enabled; ui acls 
disabled; users  with view permissions: Set(spark); groups with view 
permissions: Set(); users  with modify permissions: Set(spark); groups with 
modify permissions: Set()
   2022-12-14 02:24:17,649 INFO [netty-rpc-connection-0] 
client.TransportClientFactory (TransportClientFactory.java:createClient(310)): 
Successfully created connection to /172.35.86.171:37187 after 12 ms (6 ms spent 
in bootstraps)
   2022-12-14 02:24:17,727 INFO [main] storage.DiskBlockManager 
(Logging.scala:logInfo(61)): Created local directory at 
/tmp/blockmgr-a35a422c-50cc-41bd-a5b6-404abcb60907
   2022-12-14 02:24:17,746 INFO [main] memory.MemoryStore 
(Logging.scala:logInfo(61)): MemoryStore started with capacity 5.8 GiB
   2022-12-14 02:24:17,813 INFO [main] sink.GlueCloudwatchSink 
(GlueCloudwatchSink.scala:<init>(53)): GlueCloudwatchSink: get cloudwatch 
client using proxy: host null, port -1
   2022-12-14 02:24:17,896 INFO [main] sink.GlueCloudwatchSink 
(GlueCloudwatchSink.scala:logInfo(22)): CloudwatchSink: Obtained credentials 
from the Instance Profile
   2022-12-14 02:24:17,943 INFO [main] sink.GlueCloudwatchSink 
(GlueCloudwatchSink.scala:logInfo(22)): CloudwatchSink: jobName: hudi-glue-4 
jobRunId: jr_b30b8eca9bb8e2c66af24af7fdee89dc3935906b7d44cd20791bc373bd250065
   2022-12-14 02:24:18,006 INFO [main] subresultcache.SubResultCacheManager 
(Logging.scala:logInfo(61)): Sub-result caches are disabled.
   2022-12-14 02:24:18,025 INFO [dispatcher-Executor] 
executor.CoarseGrainedExecutorBackend (Logging.scala:logInfo(61)): Connecting 
to driver: spark://CoarseGrainedScheduler@172.35.86.171:37187
   2022-12-14 02:24:18,548 INFO [dispatcher-Executor] resource.ResourceUtils 
(Logging.scala:logInfo(61)): 
==============================================================
   2022-12-14 02:24:18,549 INFO [dispatcher-Executor] resource.ResourceUtils 
(Logging.scala:logInfo(61)): No custom resources configured for spark.executor.
   2022-12-14 02:24:18,550 INFO [dispatcher-Executor] resource.ResourceUtils 
(Logging.scala:logInfo(61)): 
==============================================================
   2022-12-14 02:24:18,604 INFO [dispatcher-Executor] 
executor.CoarseGrainedExecutorBackend (Logging.scala:logInfo(61)): Successfully 
registered with driver
   2022-12-14 02:24:18,613 INFO [dispatcher-Executor] executor.Executor 
(Logging.scala:logInfo(61)): Starting executor ID 1 on host 172.35.38.197
   2022-12-14 02:24:18,682 INFO [dispatcher-Executor] util.Utils 
(Logging.scala:logInfo(61)): Successfully started service 
'org.apache.spark.network.netty.NettyBlockTransferService' on port 39803.
   2022-12-14 02:24:18,683 INFO [dispatcher-Executor] 
netty.NettyBlockTransferService (NettyBlockTransferService.scala:init(82)): 
Server created on 172.35.38.197:39803
   2022-12-14 02:24:18,685 INFO [dispatcher-Executor] storage.BlockManager 
(Logging.scala:logInfo(61)): Using 
org.apache.spark.storage.RandomBlockReplicationPolicy for block replication 
policy
   2022-12-14 02:24:18,695 INFO [dispatcher-Executor] 
storage.BlockManagerMaster (Logging.scala:logInfo(61)): Registering 
BlockManager BlockManagerId(1, 172.35.38.197, 39803, None)
   2022-12-14 02:24:18,707 INFO [dispatcher-Executor] 
storage.BlockManagerMaster (Logging.scala:logInfo(61)): Registered BlockManager 
BlockManagerId(1, 172.35.38.197, 39803, None)
   2022-12-14 02:24:18,708 INFO [dispatcher-Executor] storage.BlockManager 
(Logging.scala:logInfo(61)): Initialized BlockManager: BlockManagerId(1, 
172.35.38.197, 39803, None)
   2022-12-14 02:24:18,718 INFO [dispatcher-Executor] executor.Executor 
(Logging.scala:logInfo(61)): Starting executor with user classpath 
(userClassPathFirst = false): ''
   2022-12-14 02:24:18,905 INFO [pool-11-thread-3] util.PlatformInfo 
(PlatformInfo.java:getJobFlowId(56)): Unable to read clusterId from 
http://localhost:8321/configuration, trying extra instance data file: 
/var/lib/instance-controller/extraInstanceData.json
   2022-12-14 02:24:18,906 INFO [pool-11-thread-3] util.PlatformInfo 
(PlatformInfo.java:getJobFlowId(63)): Unable to read clusterId from 
/var/lib/instance-controller/extraInstanceData.json, trying EMR job-flow data 
file: /var/lib/info/job-flow.json
   2022-12-14 02:24:18,907 INFO [pool-11-thread-3] util.PlatformInfo 
(PlatformInfo.java:getJobFlowId(71)): Unable to read clusterId from 
/var/lib/info/job-flow.json, out of places to look
   2022-12-14 02:24:20,069 INFO [pool-11-thread-3] 
executor.CoarseGrainedExecutorBackend (Logging.scala:logInfo(61)): eagerFSInit: 
Eagerly initialized FileSystem at s3://does/not/exist in 1956 ms
   2022-12-14 02:24:38,121 INFO [dispatcher-Executor] 
executor.CoarseGrainedExecutorBackend (Logging.scala:logInfo(61)): Got assigned 
task 0
   2022-12-14 02:24:38,211 INFO [Executor task launch worker for task 0.0 in 
stage 0.0 (TID 0)] executor.Executor (Logging.scala:logInfo(61)): Running task 
0.0 in stage 0.0 (TID 0)
   2022-12-14 02:24:39,166 INFO [Executor task launch worker for task 0.0 in 
stage 0.0 (TID 0)] broadcast.TorrentBroadcast (Logging.scala:logInfo(61)): 
Started reading broadcast variable 4 with 1 pieces (estimated total size 4.0 
MiB)
   2022-12-14 02:24:39,212 INFO [Executor task launch worker for task 0.0 in 
stage 0.0 (TID 0)] client.TransportClientFactory 
(TransportClientFactory.java:createClient(310)): Successfully created 
connection to /172.35.86.171:39523 after 7 ms (5 ms spent in bootstraps)
   2022-12-14 02:24:39,267 INFO [Executor task launch worker for task 0.0 in 
stage 0.0 (TID 0)] memory.MemoryStore (Logging.scala:logInfo(61)): Block 
broadcast_4_piece0 stored as bytes in memory (estimated size 36.9 KiB, free 5.8 
GiB)
   2022-12-14 02:24:39,277 INFO [Executor task launch worker for task 0.0 in 
stage 0.0 (TID 0)] broadcast.TorrentBroadcast (Logging.scala:logInfo(61)): 
Reading broadcast variable 4 took 111 ms
   2022-12-14 02:24:39,321 INFO [Executor task launch worker for task 0.0 in 
stage 0.0 (TID 0)] memory.MemoryStore (Logging.scala:logInfo(61)): Block 
broadcast_4 stored as values in memory (estimated size 103.0 KiB, free 5.8 GiB)
   2022-12-14 02:24:41,268 INFO [Executor task launch worker for task 0.0 in 
stage 0.0 (TID 0)] executor.Executor (Logging.scala:logInfo(61)): Finished task 
0.0 in stage 0.0 (TID 0). 1012 bytes result sent to driver
   2022-12-14 02:24:41,481 INFO [dispatcher-Executor] 
executor.CoarseGrainedExecutorBackend (Logging.scala:logInfo(61)): Got assigned 
task 1
   2022-12-14 02:24:41,482 INFO [Executor task launch worker for task 0.0 in 
stage 1.0 (TID 1)] executor.Executor (Logging.scala:logInfo(61)): Running task 
0.0 in stage 1.0 (TID 1)
   2022-12-14 02:24:41,538 INFO [Executor task launch worker for task 0.0 in 
stage 1.0 (TID 1)] broadcast.TorrentBroadcast (Logging.scala:logInfo(61)): 
Started reading broadcast variable 5 with 1 pieces (estimated total size 4.0 
MiB)
   2022-12-14 02:24:41,544 INFO [Executor task launch worker for task 0.0 in 
stage 1.0 (TID 1)] memory.MemoryStore (Logging.scala:logInfo(61)): Block 
broadcast_5_piece0 stored as bytes in memory (estimated size 1882.0 B, free 5.8 
GiB)
   2022-12-14 02:24:41,550 INFO [Executor task launch worker for task 0.0 in 
stage 1.0 (TID 1)] broadcast.TorrentBroadcast (Logging.scala:logInfo(61)): 
Reading broadcast variable 5 took 12 ms
   2022-12-14 02:24:41,552 INFO [Executor task launch worker for task 0.0 in 
stage 1.0 (TID 1)] memory.MemoryStore (Logging.scala:logInfo(61)): Block 
broadcast_5 stored as values in memory (estimated size 3.1 KiB, free 5.8 GiB)
   2022-12-14 02:24:41,558 INFO [Executor task launch worker for task 0.0 in 
stage 1.0 (TID 1)] executor.Executor (Logging.scala:logInfo(61)): Finished task 
0.0 in stage 1.0 (TID 1). 803 bytes result sent to driver
   2022-12-14 02:24:44,355 INFO [dispatcher-Executor] 
executor.CoarseGrainedExecutorBackend (Logging.scala:logInfo(61)): Got assigned 
task 2
   2022-12-14 02:24:44,408 INFO [Executor task launch worker for task 0.0 in 
stage 2.0 (TID 2)] executor.Executor (Logging.scala:logInfo(61)): Running task 
0.0 in stage 2.0 (TID 2)
   2022-12-14 02:24:44,414 INFO [Executor task launch worker for task 0.0 in 
stage 2.0 (TID 2)] broadcast.TorrentBroadcast (Logging.scala:logInfo(61)): 
Started reading broadcast variable 6 with 1 pieces (estimated total size 4.0 
MiB)
   2022-12-14 02:24:44,420 INFO [Executor task launch worker for task 0.0 in 
stage 2.0 (TID 2)] memory.MemoryStore (Logging.scala:logInfo(61)): Block 
broadcast_6_piece0 stored as bytes in memory (estimated size 5.3 KiB, free 5.8 
GiB)
   2022-12-14 02:24:44,424 INFO [Executor task launch worker for task 0.0 in 
stage 2.0 (TID 2)] broadcast.TorrentBroadcast (Logging.scala:logInfo(61)): 
Reading broadcast variable 6 took 10 ms
   2022-12-14 02:24:44,426 INFO [Executor task launch worker for task 0.0 in 
stage 2.0 (TID 2)] memory.MemoryStore (Logging.scala:logInfo(61)): Block 
broadcast_6 stored as values in memory (estimated size 9.8 KiB, free 5.8 GiB)
   2022-12-14 02:24:44,556 INFO [Executor task launch worker for task 0.0 in 
stage 2.0 (TID 2)] memory.MemoryStore (Logging.scala:logInfo(61)): Block 
rdd_22_0 stored as values in memory (estimated size 224.0 B, free 5.8 GiB)
   2022-12-14 02:24:44,624 INFO [Executor task launch worker for task 0.0 in 
stage 2.0 (TID 2)] executor.Executor (Logging.scala:logInfo(61)): Finished task 
0.0 in stage 2.0 (TID 2). 1096 bytes result sent to driver
   2022-12-14 02:24:44,684 INFO [dispatcher-Executor] 
executor.CoarseGrainedExecutorBackend (Logging.scala:logInfo(61)): Got assigned 
task 3
   2022-12-14 02:24:44,687 INFO [Executor task launch worker for task 0.0 in 
stage 3.0 (TID 3)] executor.Executor (Logging.scala:logInfo(61)): Running task 
0.0 in stage 3.0 (TID 3)
   2022-12-14 02:24:44,687 INFO [dispatcher-Executor] 
executor.CoarseGrainedExecutorBackend (Logging.scala:logInfo(61)): Got assigned 
task 4
   2022-12-14 02:24:44,688 INFO [Executor task launch worker for task 1.0 in 
stage 3.0 (TID 4)] executor.Executor (Logging.scala:logInfo(61)): Running task 
1.0 in stage 3.0 (TID 4)
   2022-12-14 02:24:44,688 INFO [dispatcher-Executor] 
executor.CoarseGrainedExecutorBackend (Logging.scala:logInfo(61)): Got assigned 
task 5
   2022-12-14 02:24:44,691 INFO [Executor task launch worker for task 2.0 in 
stage 3.0 (TID 5)] executor.Executor (Logging.scala:logInfo(61)): Running task 
2.0 in stage 3.0 (TID 5)
   2022-12-14 02:24:44,691 INFO [dispatcher-Executor] 
executor.CoarseGrainedExecutorBackend (Logging.scala:logInfo(61)): Got assigned 
task 6
   2022-12-14 02:24:44,693 INFO [Executor task launch worker for task 1.0 in 
stage 3.0 (TID 4)] spark.MapOutputTrackerWorker (Logging.scala:logInfo(61)): 
Updating epoch to 1 and clearing cache
   2022-12-14 02:24:44,695 INFO [Executor task launch worker for task 3.0 in 
stage 3.0 (TID 6)] executor.Executor (Logging.scala:logInfo(61)): Running task 
3.0 in stage 3.0 (TID 6)
   2022-12-14 02:24:44,696 INFO [Executor task launch worker for task 1.0 in 
stage 3.0 (TID 4)] broadcast.TorrentBroadcast (Logging.scala:logInfo(61)): 
Started reading broadcast variable 7 with 1 pieces (estimated total size 4.0 
MiB)
   2022-12-14 02:24:44,705 INFO [Executor task launch worker for task 1.0 in 
stage 3.0 (TID 4)] memory.MemoryStore (Logging.scala:logInfo(61)): Block 
broadcast_7_piece0 stored as bytes in memory (estimated size 3.3 KiB, free 5.8 
GiB)
   2022-12-14 02:24:44,709 INFO [Executor task launch worker for task 1.0 in 
stage 3.0 (TID 4)] broadcast.TorrentBroadcast (Logging.scala:logInfo(61)): 
Reading broadcast variable 7 took 12 ms
   2022-12-14 02:24:44,711 INFO [Executor task launch worker for task 1.0 in 
stage 3.0 (TID 4)] memory.MemoryStore (Logging.scala:logInfo(61)): Block 
broadcast_7 stored as values in memory (estimated size 5.7 KiB, free 5.8 GiB)
   2022-12-14 02:24:44,717 INFO [Executor task launch worker for task 0.0 in 
stage 3.0 (TID 3)] spark.MapOutputTrackerWorker (Logging.scala:logInfo(61)): 
Don't have map outputs for shuffle 0, fetching them
   2022-12-14 02:24:44,717 INFO [Executor task launch worker for task 1.0 in 
stage 3.0 (TID 4)] spark.MapOutputTrackerWorker (Logging.scala:logInfo(61)): 
Don't have map outputs for shuffle 0, fetching them
   2022-12-14 02:24:44,719 INFO [Executor task launch worker for task 2.0 in 
stage 3.0 (TID 5)] spark.MapOutputTrackerWorker (Logging.scala:logInfo(61)): 
Don't have map outputs for shuffle 0, fetching them
   2022-12-14 02:24:44,719 INFO [Executor task launch worker for task 0.0 in 
stage 3.0 (TID 3)] spark.MapOutputTrackerWorker (Logging.scala:logInfo(61)): 
Doing the fetch; tracker endpoint = 
NettyRpcEndpointRef(spark://MapOutputTracker@172.35.86.171:37187)
   2022-12-14 02:24:44,722 INFO [Executor task launch worker for task 3.0 in 
stage 3.0 (TID 6)] spark.MapOutputTrackerWorker (Logging.scala:logInfo(61)): 
Don't have map outputs for shuffle 0, fetching them
   2022-12-14 02:24:44,896 INFO [Executor task launch worker for task 0.0 in 
stage 3.0 (TID 3)] spark.MapOutputTrackerWorker (Logging.scala:logInfo(61)): 
Got the map output locations
   2022-12-14 02:24:44,939 INFO [Executor task launch worker for task 3.0 in 
stage 3.0 (TID 6)] storage.ShuffleBlockFetcherIterator 
(Logging.scala:logInfo(61)): Getting 0 (0.0 B) non-empty blocks including 0 
(0.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 
(0.0 B) remote blocks
   2022-12-14 02:24:44,939 INFO [Executor task launch worker for task 1.0 in 
stage 3.0 (TID 4)] storage.ShuffleBlockFetcherIterator 
(Logging.scala:logInfo(61)): Getting 0 (0.0 B) non-empty blocks including 0 
(0.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 
(0.0 B) remote blocks
   2022-12-14 02:24:44,940 INFO [Executor task launch worker for task 2.0 in 
stage 3.0 (TID 5)] storage.ShuffleBlockFetcherIterator 
(Logging.scala:logInfo(61)): Getting 0 (0.0 B) non-empty blocks including 0 
(0.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 
(0.0 B) remote blocks
   2022-12-14 02:24:44,939 INFO [Executor task launch worker for task 0.0 in 
stage 3.0 (TID 3)] storage.ShuffleBlockFetcherIterator 
(Logging.scala:logInfo(61)): Getting 1 (142.0 B) non-empty blocks including 1 
(142.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 
(0.0 B) remote blocks
   2022-12-14 02:24:44,943 INFO [Executor task launch worker for task 0.0 in 
stage 3.0 (TID 3)] storage.ShuffleBlockFetcherIterator 
(Logging.scala:logInfo(61)): Started 0 remote fetches in 20 ms
   2022-12-14 02:24:44,943 INFO [Executor task launch worker for task 1.0 in 
stage 3.0 (TID 4)] storage.ShuffleBlockFetcherIterator 
(Logging.scala:logInfo(61)): Started 0 remote fetches in 15 ms
   2022-12-14 02:24:44,943 INFO [Executor task launch worker for task 3.0 in 
stage 3.0 (TID 6)] storage.ShuffleBlockFetcherIterator 
(Logging.scala:logInfo(61)): Started 0 remote fetches in 16 ms
   2022-12-14 02:24:44,949 INFO [Executor task launch worker for task 2.0 in 
stage 3.0 (TID 5)] storage.ShuffleBlockFetcherIterator 
(Logging.scala:logInfo(61)): Started 0 remote fetches in 24 ms
   2022-12-14 02:24:44,978 INFO [Executor task launch worker for task 2.0 in 
stage 3.0 (TID 5)] executor.Executor (Logging.scala:logInfo(61)): Finished task 
2.0 in stage 3.0 (TID 5). 1278 bytes result sent to driver
   2022-12-14 02:24:44,978 INFO [Executor task launch worker for task 3.0 in 
stage 3.0 (TID 6)] executor.Executor (Logging.scala:logInfo(61)): Finished task 
3.0 in stage 3.0 (TID 6). 1278 bytes result sent to driver
   2022-12-14 02:24:44,983 INFO [Executor task launch worker for task 1.0 in 
stage 3.0 (TID 4)] executor.Executor (Logging.scala:logInfo(61)): Finished task 
1.0 in stage 3.0 (TID 4). 1321 bytes result sent to driver
   2022-12-14 02:24:44,993 INFO [Executor task launch worker for task 0.0 in 
stage 3.0 (TID 3)] executor.Executor (Logging.scala:logInfo(61)): Finished task 
0.0 in stage 3.0 (TID 3). 1406 bytes result sent to driver
   2022-12-14 02:24:45,198 INFO [dispatcher-Executor] 
executor.CoarseGrainedExecutorBackend (Logging.scala:logInfo(61)): Got assigned 
task 7
   2022-12-14 02:24:45,199 INFO [Executor task launch worker for task 0.0 in 
stage 4.0 (TID 7)] executor.Executor (Logging.scala:logInfo(61)): Running task 
0.0 in stage 4.0 (TID 7)
   2022-12-14 02:24:45,202 INFO [Executor task launch worker for task 0.0 in 
stage 4.0 (TID 7)] broadcast.TorrentBroadcast (Logging.scala:logInfo(61)): 
Started reading broadcast variable 8 with 1 pieces (estimated total size 4.0 
MiB)
   2022-12-14 02:24:45,209 INFO [Executor task launch worker for task 0.0 in 
stage 4.0 (TID 7)] memory.MemoryStore (Logging.scala:logInfo(61)): Block 
broadcast_8_piece0 stored as bytes in memory (estimated size 122.9 KiB, free 
5.8 GiB)
   2022-12-14 02:24:45,213 INFO [Executor task launch worker for task 0.0 in 
stage 4.0 (TID 7)] broadcast.TorrentBroadcast (Logging.scala:logInfo(61)): 
Reading broadcast variable 8 took 10 ms
   2022-12-14 02:24:45,216 INFO [Executor task launch worker for task 0.0 in 
stage 4.0 (TID 7)] memory.MemoryStore (Logging.scala:logInfo(61)): Block 
broadcast_8 stored as values in memory (estimated size 344.0 KiB, free 5.8 GiB)
   2022-12-14 02:24:45,390 INFO [Executor task launch worker for task 0.0 in 
stage 4.0 (TID 7)] executor.Executor (Logging.scala:logInfo(61)): Finished task 
0.0 in stage 4.0 (TID 7). 848 bytes result sent to driver
   2022-12-14 02:24:47,412 INFO [dispatcher-Executor] 
executor.CoarseGrainedExecutorBackend (Logging.scala:logInfo(61)): Got assigned 
task 8
   2022-12-14 02:24:47,413 INFO [Executor task launch worker for task 0.0 in 
stage 5.0 (TID 8)] executor.Executor (Logging.scala:logInfo(61)): Running task 
0.0 in stage 5.0 (TID 8)
   2022-12-14 02:24:47,415 INFO [Executor task launch worker for task 0.0 in 
stage 5.0 (TID 8)] broadcast.TorrentBroadcast (Logging.scala:logInfo(61)): 
Started reading broadcast variable 9 with 1 pieces (estimated total size 4.0 
MiB)
   2022-12-14 02:24:47,422 INFO [Executor task launch worker for task 0.0 in 
stage 5.0 (TID 8)] memory.MemoryStore (Logging.scala:logInfo(61)): Block 
broadcast_9_piece0 stored as bytes in memory (estimated size 124.6 KiB, free 
5.8 GiB)
   2022-12-14 02:24:47,425 INFO [Executor task launch worker for task 0.0 in 
stage 5.0 (TID 8)] broadcast.TorrentBroadcast (Logging.scala:logInfo(61)): 
Reading broadcast variable 9 took 9 ms
   2022-12-14 02:24:47,427 INFO [Executor task launch worker for task 0.0 in 
stage 5.0 (TID 8)] memory.MemoryStore (Logging.scala:logInfo(61)): Block 
broadcast_9 stored as values in memory (estimated size 348.2 KiB, free 5.8 GiB)
   2022-12-14 02:24:47,455 INFO [Executor task launch worker for task 0.0 in 
stage 5.0 (TID 8)] storage.BlockManager (Logging.scala:logInfo(61)): Found 
block rdd_22_0 locally
   2022-12-14 02:24:47,481 INFO [Executor task launch worker for task 0.0 in 
stage 5.0 (TID 8)] executor.Executor (Logging.scala:logInfo(61)): Finished task 
0.0 in stage 5.0 (TID 8). 1050 bytes result sent to driver
   2022-12-14 02:24:47,533 INFO [dispatcher-Executor] 
executor.CoarseGrainedExecutorBackend (Logging.scala:logInfo(61)): Got assigned 
task 9
   2022-12-14 02:24:47,533 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] executor.Executor (Logging.scala:logInfo(61)): Running task 
0.0 in stage 6.0 (TID 9)
   2022-12-14 02:24:47,534 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] spark.MapOutputTrackerWorker (Logging.scala:logInfo(61)): 
Updating epoch to 2 and clearing cache
   2022-12-14 02:24:47,536 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] broadcast.TorrentBroadcast (Logging.scala:logInfo(61)): 
Started reading broadcast variable 10 with 1 pieces (estimated total size 4.0 
MiB)
   2022-12-14 02:24:47,542 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] memory.MemoryStore (Logging.scala:logInfo(61)): Block 
broadcast_10_piece0 stored as bytes in memory (estimated size 168.1 KiB, free 
5.8 GiB)
   2022-12-14 02:24:47,545 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] broadcast.TorrentBroadcast (Logging.scala:logInfo(61)): 
Reading broadcast variable 10 took 8 ms
   2022-12-14 02:24:47,548 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] memory.MemoryStore (Logging.scala:logInfo(61)): Block 
broadcast_10 stored as values in memory (estimated size 474.8 KiB, free 5.8 GiB)
   2022-12-14 02:24:47,597 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] spark.MapOutputTrackerWorker (Logging.scala:logInfo(61)): 
Don't have map outputs for shuffle 1, fetching them
   2022-12-14 02:24:47,597 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] spark.MapOutputTrackerWorker (Logging.scala:logInfo(61)): 
Doing the fetch; tracker endpoint = 
NettyRpcEndpointRef(spark://MapOutputTracker@172.35.86.171:37187)
   2022-12-14 02:24:47,601 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] spark.MapOutputTrackerWorker (Logging.scala:logInfo(61)): 
Got the map output locations
   2022-12-14 02:24:47,603 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] storage.ShuffleBlockFetcherIterator 
(Logging.scala:logInfo(61)): Getting 1 (276.0 B) non-empty blocks including 1 
(276.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 
(0.0 B) remote blocks
   2022-12-14 02:24:47,603 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] storage.ShuffleBlockFetcherIterator 
(Logging.scala:logInfo(61)): Started 0 remote fetches in 1 ms
   2022-12-14 02:24:47,624 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] deltacommit.BaseSparkDeltaCommitActionExecutor 
(BaseSparkDeltaCommitActionExecutor.java:handleUpdate(76)): Merging updates for 
commit 00000000000000 for file files-0000
   2022-12-14 02:24:47,735 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] view.FileSystemViewManager 
(FileSystemViewManager.java:createViewManager(245)): Creating View Manager with 
storage type :MEMORY
   2022-12-14 02:24:47,738 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] view.FileSystemViewManager 
(FileSystemViewManager.java:createViewManager(257)): Creating in-memory based 
Table View
   2022-12-14 02:24:47,740 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] view.FileSystemViewManager 
(FileSystemViewManager.java:createInMemoryFileSystemView(165)): Creating 
InMemory based view for basePath s3://glue-learn-begineers/data/.hoodie/metadata
   2022-12-14 02:24:47,746 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] view.AbstractTableFileSystemView 
(AbstractTableFileSystemView.java:resetFileGroupsReplaced(244)): Took 2 ms to 
read  0 instants, 0 replaced file groups
   2022-12-14 02:24:48,232 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] util.ClusteringUtils 
(ClusteringUtils.java:getAllFileGroupsInPendingClusteringPlans(137)): Found 0 
files in pending clustering operations
   2022-12-14 02:24:48,233 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] view.AbstractTableFileSystemView 
(AbstractTableFileSystemView.java:lambda$ensurePartitionLoadedCorrectly$9(302)):
 Building file system view for partition (files)
   2022-12-14 02:24:48,400 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] view.AbstractTableFileSystemView 
(AbstractTableFileSystemView.java:addFilesToView(151)): addFilesToView: 
NumFiles=1, NumFileGroups=1, FileGroupsCreationTime=6, StoreTimeTaken=0
   2022-12-14 02:24:48,401 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] io.HoodieAppendHandle (HoodieAppendHandle.java:init(156)): 
New AppendHandle for partition :files
   2022-12-14 02:24:49,944 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] s3n.MultipartUploadOutputStream 
(MultipartUploadOutputStream.java:close(421)): close closed:false 
s3://glue-learn-begineers/data/.hoodie/metadata/files/.hoodie_partition_metadata_0
   2022-12-14 02:24:50,282 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] s3n.S3NativeFileSystem 
(S3NativeFileSystem.java:rename(966)): rename 
s3://glue-learn-begineers/data/.hoodie/metadata/files/.hoodie_partition_metadata_0
 
s3://glue-learn-begineers/data/.hoodie/metadata/files/.hoodie_partition_metadata
 using algorithm version 1
   2022-12-14 02:24:53,020 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] marker.DirectWriteMarkers 
(DirectWriteMarkers.java:create(173)): Creating Marker 
Path=s3://glue-learn-begineers/data/.hoodie/metadata/.hoodie/.temp/00000000000000/files/files-0000_0-6-9_00000000000000.hfile.marker.APPEND
   2022-12-14 02:24:53,255 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] s3n.MultipartUploadOutputStream 
(MultipartUploadOutputStream.java:close(421)): close closed:false 
s3://glue-learn-begineers/data/.hoodie/metadata/.hoodie/.temp/00000000000000/files/files-0000_0-6-9_00000000000000.hfile.marker.APPEND
   2022-12-14 02:24:53,340 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] marker.DirectWriteMarkers 
(DirectWriteMarkers.java:create(178)): [direct] Created marker file 
s3://glue-learn-begineers/data/.hoodie/metadata/.hoodie/.temp/00000000000000/files/files-0000_0-6-9_00000000000000.hfile.marker.APPEND
 in 1963 ms
   2022-12-14 02:24:53,346 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] log.HoodieLogFormat$WriterBuilder 
(HoodieLogFormat.java:build(208)): Building HoodieLogFormat Writer
   2022-12-14 02:24:53,347 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] log.HoodieLogFormat$WriterBuilder 
(HoodieLogFormat.java:build(255)): HoodieLogFile on path 
s3://glue-learn-begineers/data/.hoodie/metadata/files/.files-0000_00000000000000.log.1_0-6-9
   2022-12-14 02:24:53,757 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] log.HoodieLogFormatWriter 
(HoodieLogFormatWriter.java:getOutputStream(125)): 
HoodieLogFile{pathStr='s3://glue-learn-begineers/data/.hoodie/metadata/files/.files-0000_00000000000000.log.1_0-6-9',
 fileLen=0} does not exist. Create a new file
   2022-12-14 02:24:54,305 WARN [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] impl.MetricsConfig (MetricsConfig.java:loadFirst(136)): 
Cannot locate configuration: tried 
hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
   2022-12-14 02:24:54,319 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] impl.MetricsSystemImpl 
(MetricsSystemImpl.java:startTimer(378)): Scheduled Metric snapshot period at 
10 second(s).
   2022-12-14 02:24:54,319 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] impl.MetricsSystemImpl (MetricsSystemImpl.java:start(191)): 
HBase metrics system started
   2022-12-14 02:24:54,358 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] metrics.MetricRegistries 
(MetricRegistriesLoader.java:load(63)): Loaded MetricRegistries class 
org.apache.hudi.org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl
   2022-12-14 02:24:54,419 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] zlib.ZlibFactory (ZlibFactory.java:loadNativeZLib(59)): 
Successfully loaded & initialized native-zlib library
   2022-12-14 02:24:54,421 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] compress.CodecPool (CodecPool.java:getCompressor(153)): Got 
brand-new compressor [.gz]
   2022-12-14 02:24:54,422 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] compress.CodecPool (CodecPool.java:getCompressor(153)): Got 
brand-new compressor [.gz]
   2022-12-14 02:24:54,579 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] io.HoodieAppendHandle 
(HoodieAppendHandle.java:processAppendResult(370)): AppendHandle for 
partitionPath files filePath files/.files-0000_00000000000000.log.1_0-6-9, took 
6892 ms.
   2022-12-14 02:24:54,580 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] s3n.MultipartUploadOutputStream 
(MultipartUploadOutputStream.java:close(421)): close closed:false 
s3://glue-learn-begineers/data/.hoodie/metadata/files/.files-0000_00000000000000.log.1_0-6-9
   2022-12-14 02:24:54,829 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] memory.MemoryStore (Logging.scala:logInfo(61)): Block 
rdd_32_0 stored as values in memory (estimated size 279.0 B, free 5.8 GiB)
   2022-12-14 02:24:54,901 INFO [Executor task launch worker for task 0.0 in 
stage 6.0 (TID 9)] executor.Executor (Logging.scala:logInfo(61)): Finished task 
0.0 in stage 6.0 (TID 9). 1563 bytes result sent to driver
   2022-12-14 02:24:55,049 INFO [dispatcher-Executor] 
executor.CoarseGrainedExecutorBackend (Logging.scala:logInfo(61)): Got assigned 
task 10
   2022-12-14 02:24:55,102 INFO [Executor task launch worker for task 0.0 in 
stage 8.0 (TID 10)] executor.Executor (Logging.scala:logInfo(61)): Running task 
0.0 in stage 8.0 (TID 10)
   2022-12-14 02:24:55,105 INFO [Executor task launch worker for task 0.0 in 
stage 8.0 (TID 10)] broadcast.TorrentBroadcast (Logging.scala:logInfo(61)): 
Started reading broadcast variable 11 with 1 pieces (estimated total size 4.0 
MiB)
   2022-12-14 02:24:55,112 INFO [Executor task launch worker for task 0.0 in 
stage 8.0 (TID 10)] memory.MemoryStore (Logging.scala:logInfo(61)): Block 
broadcast_11_piece0 stored as bytes in memory (estimated size 168.1 KiB, free 
5.8 GiB)
   2022-12-14 02:24:55,114 INFO [Executor task launch worker for task 0.0 in 
stage 8.0 (TID 10)] broadcast.TorrentBroadcast (Logging.scala:logInfo(61)): 
Reading broadcast variable 11 took 9 ms
   2022-12-14 02:24:55,117 INFO [Executor task launch worker for task 0.0 in 
stage 8.0 (TID 10)] memory.MemoryStore (Logging.scala:logInfo(61)): Block 
broadcast_11 stored as values in memory (estimated size 474.8 KiB, free 5.8 GiB)
   2022-12-14 02:24:55,133 INFO [Executor task launch worker for task 0.0 in 
stage 8.0 (TID 10)] storage.BlockManager (Logging.scala:logInfo(61)): Found 
block rdd_32_0 locally
   2022-12-14 02:24:55,138 INFO [Executor task launch worker for task 0.0 in 
stage 8.0 (TID 10)] executor.Executor (Logging.scala:logInfo(61)): Finished 
task 0.0 in stage 8.0 (TID 10). 1176 bytes result sent to driver
   2022-12-14 02:24:55,749 INFO [dispatcher-Executor] 
executor.CoarseGrainedExecutorBackend (Logging.scala:logInfo(61)): Got assigned 
task 11
   2022-12-14 02:24:55,750 INFO [Executor task launch worker for task 0.0 in 
stage 9.0 (TID 11)] executor.Executor (Logging.scala:logInfo(61)): Running task 
0.0 in stage 9.0 (TID 11)
   2022-12-14 02:24:55,752 INFO [Executor task launch worker for task 0.0 in 
stage 9.0 (TID 11)] broadcast.TorrentBroadcast (Logging.scala:logInfo(61)): 
Started reading broadcast variable 12 with 1 pieces (estimated total size 4.0 
MiB)
   2022-12-14 02:24:55,758 INFO [Executor task launch worker for task 0.0 in 
stage 9.0 (TID 11)] memory.MemoryStore (Logging.scala:logInfo(61)): Block 
broadcast_12_piece0 stored as bytes in memory (estimated size 42.5 KiB, free 
5.8 GiB)
   2022-12-14 02:24:55,760 INFO [Executor task launch worker for task 0.0 in 
stage 9.0 (TID 11)] broadcast.TorrentBroadcast (Logging.scala:logInfo(61)): 
Reading broadcast variable 12 took 7 ms
   2022-12-14 02:24:55,762 INFO [Executor task launch worker for task 0.0 in 
stage 9.0 (TID 11)] memory.MemoryStore (Logging.scala:logInfo(61)): Block 
broadcast_12 stored as values in memory (estimated size 125.6 KiB, free 5.8 GiB)
   2022-12-14 02:24:55,933 INFO [Executor task launch worker for task 0.0 in 
stage 9.0 (TID 11)] executor.Executor (Logging.scala:logInfo(61)): Finished 
task 0.0 in stage 9.0 (TID 11). 804 bytes result sent to driver
   2022-12-14 02:24:56,769 INFO [dispatcher-Executor] 
executor.CoarseGrainedExecutorBackend (Logging.scala:logInfo(61)): Got assigned 
task 12
   2022-12-14 02:24:56,770 INFO [Executor task launch worker for task 0.0 in 
stage 10.0 (TID 12)] executor.Executor (Logging.scala:logInfo(61)): Running 
task 0.0 in stage 10.0 (TID 12)
   2022-12-14 02:24:56,774 INFO [Executor task launch worker for task 0.0 in 
stage 10.0 (TID 12)] broadcast.TorrentBroadcast (Logging.scala:logInfo(61)): 
Started reading broadcast variable 13 with 1 pieces (estimated total size 4.0 
MiB)
   2022-12-14 02:24:56,779 INFO [Executor task launch worker for task 0.0 in 
stage 10.0 (TID 12)] memory.MemoryStore (Logging.scala:logInfo(61)): Block 
broadcast_13_piece0 stored as bytes in memory (estimated size 42.5 KiB, free 
5.8 GiB)
   2022-12-14 02:24:56,781 INFO [Executor task launch worker for task 0.0 in 
stage 10.0 (TID 12)] broadcast.TorrentBroadcast (Logging.scala:logInfo(61)): 
Reading broadcast variable 13 took 6 ms
   2022-12-14 02:24:56,782 INFO [Executor task launch worker for task 0.0 in 
stage 10.0 (TID 12)] memory.MemoryStore (Logging.scala:logInfo(61)): Block 
broadcast_13 stored as values in memory (estimated size 125.8 KiB, free 5.8 GiB)
   2022-12-14 02:24:57,352 INFO [Executor task launch worker for task 0.0 in 
stage 10.0 (TID 12)] executor.Executor (Logging.scala:logInfo(61)): Finished 
task 0.0 in stage 10.0 (TID 12). 895 bytes result sent to driver
   2022-12-14 02:24:58,483 INFO [dispatcher-Executor] 
executor.CoarseGrainedExecutorBackend (Logging.scala:logInfo(61)): Got assigned 
task 13
   2022-12-14 02:24:58,484 INFO [Executor task launch worker for task 0.0 in 
stage 12.0 (TID 13)] executor.Executor (Logging.scala:logInfo(61)): Running 
task 0.0 in stage 12.0 (TID 13)
   2022-12-14 02:24:58,486 INFO [Executor task launch worker for task 0.0 in 
stage 12.0 (TID 13)] broadcast.TorrentBroadcast (Logging.scala:logInfo(61)): 
Started reading broadcast variable 14 with 1 pieces (estimated total size 4.0 
MiB)
   2022-12-14 02:24:58,491 INFO [Executor task launch worker for task 0.0 in 
stage 12.0 (TID 13)] memory.MemoryStore (Logging.scala:logInfo(61)): Block 
broadcast_14_piece0 stored as bytes in memory (estimated size 168.1 KiB, free 
5.8 GiB)
   2022-12-14 02:24:58,494 INFO [Executor task launch worker for task 0.0 in 
stage 12.0 (TID 13)] broadcast.TorrentBroadcast (Logging.scala:logInfo(61)): 
Reading broadcast variable 14 took 7 ms
   2022-12-14 02:24:58,496 INFO [Executor task launch worker for task 0.0 in 
stage 12.0 (TID 13)] memory.MemoryStore (Logging.scala:logInfo(61)): Block 
broadcast_14 stored as values in memory (estimated size 474.3 KiB, free 5.8 GiB)
   2022-12-14 02:24:58,511 INFO [Executor task launch worker for task 0.0 in 
stage 12.0 (TID 13)] storage.BlockManager (Logging.scala:logInfo(61)): Found 
block rdd_32_0 locally
   2022-12-14 02:24:58,513 INFO [Executor task launch worker for task 0.0 in 
stage 12.0 (TID 13)] executor.Executor (Logging.scala:logInfo(61)): Finished 
task 0.0 in stage 12.0 (TID 13). 1256 bytes result sent to driver
   2022-12-14 02:25:03,187 INFO [dispatcher-Executor] 
executor.CoarseGrainedExecutorBackend (Logging.scala:logInfo(61)): Got assigned 
task 14
   2022-12-14 02:25:03,247 INFO [Executor task launch worker for task 0.0 in 
stage 13.0 (TID 14)] executor.Executor (Logging.scala:logInfo(61)): Running 
task 0.0 in stage 13.0 (TID 14)
   2022-12-14 02:25:03,295 INFO [Executor task launch worker for task 0.0 in 
stage 13.0 (TID 14)] broadcast.TorrentBroadcast (Logging.scala:logInfo(61)): 
Started reading broadcast variable 15 with 1 pieces (estimated total size 4.0 
MiB)
   2022-12-14 02:25:03,300 INFO [Executor task launch worker for task 0.0 in 
stage 13.0 (TID 14)] memory.MemoryStore (Logging.scala:logInfo(61)): Block 
broadcast_15_piece0 stored as bytes in memory (estimated size 3.9 KiB, free 5.8 
GiB)
   2022-12-14 02:25:03,303 INFO [Executor task launch worker for task 0.0 in 
stage 13.0 (TID 14)] broadcast.TorrentBroadcast (Logging.scala:logInfo(61)): 
Reading broadcast variable 15 took 7 ms
   2022-12-14 02:25:03,305 INFO [Executor task launch worker for task 0.0 in 
stage 13.0 (TID 14)] memory.MemoryStore (Logging.scala:logInfo(61)): Block 
broadcast_15 stored as values in memory (estimated size 6.8 KiB, free 5.8 GiB)
   2022-12-14 02:25:03,359 INFO [Executor task launch worker for task 0.0 in 
stage 13.0 (TID 14)] broadcast.TorrentBroadcast (Logging.scala:logInfo(61)): 
Started reading broadcast variable 3 with 1 pieces (estimated total size 4.0 
MiB)
   2022-12-14 02:25:03,366 INFO [Executor task launch worker for task 0.0 in 
stage 13.0 (TID 14)] memory.MemoryStore (Logging.scala:logInfo(61)): Block 
broadcast_3_piece0 stored as bytes in memory (estimated size 105.0 B, free 5.8 
GiB)
   2022-12-14 02:25:03,368 INFO [Executor task launch worker for task 0.0 in 
stage 13.0 (TID 14)] broadcast.TorrentBroadcast (Logging.scala:logInfo(61)): 
Reading broadcast variable 3 took 8 ms
   2022-12-14 02:25:03,373 INFO [Executor task launch worker for task 0.0 in 
stage 13.0 (TID 14)] memory.MemoryStore (Logging.scala:logInfo(61)): Block 
broadcast_3 stored as values in memory (estimated size 40.0 B, free 5.8 GiB)
   2022-12-14 02:25:03,377 ERROR [Executor task launch worker for task 0.0 in 
stage 13.0 (TID 14)] executor.Executor (Logging.scala:logError(98)): Exception 
in task 0.0 in stage 13.0 (TID 14)
   java.lang.NullPointerException: null
        at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:842) 
~[hadoop-client-api-3.3.3-amzn-0.jar:?]
        at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:458) 
~[hadoop-client-api-3.3.3-amzn-0.jar:?]
        at 
com.amazonaws.services.glue.connections.DynamoConnection.getJobConf(DynamoConnection.scala:61)
 ~[AWSGlueDataplane-1.0.jar:?]
        at 
com.amazonaws.services.glue.connections.DynamoConnection.getReader(DynamoConnection.scala:136)
 ~[AWSGlueDataplane-1.0.jar:?]
        at 
com.amazonaws.services.glue.DynamicRecordRDD.compute(DataSource.scala:644) 
~[AWSGlueDataplane-1.0.jar:?]
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365) 
~[spark-core_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:329) 
~[spark-core_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) 
~[spark-core_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365) 
~[spark-core_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:329) 
~[spark-core_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
        at 
org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
 ~[spark-core_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) 
~[spark-core_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52) 
~[spark-core_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
        at org.apache.spark.scheduler.Task.run(Task.scala:138) 
~[spark-core_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
        at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
 ~[spark-core_2.12-3.3.0-amzn-1.jar:?]
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1516) 
~[spark-core_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
        at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551) 
~[spark-core_2.12-3.3.0-amzn-1.jar:?]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[?:1.8.0_352]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[?:1.8.0_352]
        at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_352]
   2022-12-14 02:25:03,413 INFO [dispatcher-Executor] 
executor.CoarseGrainedExecutorBackend (Logging.scala:logInfo(61)): Got assigned 
task 15
   ... (the same java.lang.NullPointerException stack trace, from 
com.amazonaws.services.glue.connections.DynamoConnection.getJobConf, repeats 
for task attempts 0.1, 0.2 and 0.3 in stage 13.0, TIDs 15-17) ...
   2022-12-14 02:25:03,558 INFO [block-manager-storage-async-thread-pool-39] 
storage.BlockManager (Logging.scala:logInfo(61)): Removing RDD 41
   2022-12-14 02:25:03,570 INFO [block-manager-storage-async-thread-pool-46] 
storage.BlockManager (Logging.scala:logInfo(61)): Removing RDD 22
   2022-12-14 02:25:03,574 INFO [block-manager-storage-async-thread-pool-49] 
storage.BlockManager (Logging.scala:logInfo(61)): Removing RDD 32
   2022-12-14 02:25:03,878 INFO [dispatcher-Executor] 
executor.CoarseGrainedExecutorBackend (Logging.scala:logInfo(61)): Driver 
commanded a shutdown
   2022-12-14 02:25:03,880 INFO [CoarseGrainedExecutorBackend-stop-executor] 
sink.GlueCloudwatchSink (GlueCloudwatchSink.scala:logInfo(22)): CloudwatchSink: 
SparkContext stopped - not reporting metrics now.
   2022-12-14 02:25:03,900 INFO [CoarseGrainedExecutorBackend-stop-executor] 
memory.MemoryStore (Logging.scala:logInfo(61)): MemoryStore cleared
   2022-12-14 02:25:03,901 INFO [CoarseGrainedExecutorBackend-stop-executor] 
storage.BlockManager (Logging.scala:logInfo(61)): BlockManager stopped
   Continuous Logging: Shutting down cloudwatch appender.
   Continuous Logging: Shutting down cloudwatch appender.
   
   2022-12-14 02:25:03,956 INFO [shutdown-hook-0] util.ShutdownHookManager 
(Logging.scala:logInfo(61)): Shutdown hook called
   ```
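   The NullPointerException above is thrown inside Glue's `DynamoConnection.getJobConf` while the DynamoDB reader is being configured, i.e. before any Hudi code runs. As a hedged sketch (the table name, throughput percentage, and split count below are illustrative assumptions, not values from the original job), the same read can be expressed with explicit `from_options` connection options instead of the catalog table, which makes the reader configuration visible:

   ```python
   # Sketch only: explicit DynamoDB read options for a Glue DynamicFrame.
   # The table name "users" and the throughput/split values are assumptions.
   dynamodb_options = {
       "dynamodb.input.tableName": "users",        # source table (assumed name)
       "dynamodb.throughput.read.percent": "0.5",  # cap read-capacity usage
       "dynamodb.splits": "1",                     # parallelism of the scan
   }

   # Inside the Glue job this dict would be passed as:
   # dyf = glueContext.create_dynamic_frame.from_options(
   #     connection_type="dynamodb",
   #     connection_options=dynamodb_options,
   # )
   ```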
   
   
   ##### Note 
   I am not running parallel jobs; only a single job is running.
   Please let me know if any settings in my code look off. I am happy to 
correct them and try again.
   ```
       "hoodie.datasource.write.recordkey.field": "pk",
       "hoodie.datasource.write.precombine.field": "pk",
   ```
   Can I use the same field for both the record key and the precombine field?
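   For context on that question: reusing the same column for both is allowed, but the precombine field is what Hudi compares to keep the newest of two records with the same key, so a timestamp-like column is the usual choice. A hedged sketch, assuming a hypothetical `updated_at` column that is not in the original table:

   ```python
   # Sketch: Hudi write options, assuming a hypothetical "updated_at"
   # timestamp column as the precombine field. Using "pk" for both
   # fields also works, but then ties between duplicate keys are not
   # broken by recency.
   hudi_options = {
       "hoodie.datasource.write.recordkey.field": "pk",
       "hoodie.datasource.write.precombine.field": "updated_at",  # assumed column
       "hoodie.datasource.write.operation": "upsert",
   }
   ```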
   
   The steps and code were taken from the AWS documentation: 
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-hudi.html
   If I am doing something wrong, I am happy to make changes based on your 
suggestions and try again. 
   Looking forward to your help. 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
