In my setup, when I run queries with mt_dop=24, some exchange operators take a long time to process data.
A lot of data sits in the deferred queue waiting to be deserialized. Why is no data being dequeued from the queue?
To speed up deserialization of the deferred queue, I changed the following configuration:
datastream_service_num_deserialization_threads=80
datastream_service_deserialization_queue_size=50000
At the same time, CPU utilization is only about 20%.
EXCHANGE_NODE (id=17):(Total: 12s519ms, non-child: 7s804ms, % non-child: 62.34%)
Node Lifecycle Event Timeline: 20s865ms
- Open Started: 1s104ms (1s104ms)
- Open Finished: 5s716ms (4s612ms)
- First Batch Requested: 5s717ms (107.811us)
- First Batch Returned: 5s739ms (22.135ms)
- Last Batch Returned: 20s863ms (15s124ms)
- Closed: 20s865ms (1.429ms)
- ConvertRowBatchTime: 15.023ms
- PeakMemoryUsage: 40.31 MB (42271020)
- RowsReturned: 1.18M (1183889)
- RowsReturnedRate: 94.56 K/sec
Buffer pool:
- AllocTime: 121.084ms
- CumulativeAllocationBytes: 263.20 MB (275980288)
- CumulativeAllocations: 8.28K (8276)
- PeakReservation: 19.44 MB (20381696)
- PeakUnpinnedBytes: 0
- PeakUsedReservation: 19.44 MB (20381696)
- ReadIoBytes: 0
- ReadIoOps: 0 (0)
- ReadIoWaitTime: 0.000ns
- SystemAllocTime: 61.052ms
- WriteIoBytes: 0
- WriteIoOps: 0 (0)
- WriteIoWaitTime: 0.000ns
Dequeue:
- BytesDequeued (500.000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 155.30 KB, 155.30 KB, 155.30 KB, 155.30 KB, 155.30 KB, 155.30 KB, 155.30 KB, 155.30 KB, 155.30 KB, 155.30 KB, 155.30 KB, 155.30 KB, 155.30 KB, 5.69 MB, 13.08 MB, 20.32 MB, 27.79 MB, 35.03 MB, 42.62 MB, 50.28 MB, 57.71 MB, 65.44 MB, 73.14 MB, 80.80 MB, 88.12 MB, 95.52 MB, 103.14 MB, 110.16 MB, 120.44 MB, 129.98 MB
- FirstBatchWaitTime: 4s612ms
- TotalBytesDequeued: 145.28 MB (152335476)
- TotalGetBatchTime: 12s500ms
- DataWaitTime: 4s714ms
Enqueue:
- BytesReceived (500.000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11.02 MB, 23.38 MB, 24.85 MB, 24.85 MB, 24.85 MB, 24.85 MB, 24.85 MB, 24.85 MB, 24.85 MB, 24.85 MB, 24.85 MB, 24.85 MB, 24.85 MB, 25.05 MB, 25.78 MB, 27.08 MB, 30.66 MB, 37.86 MB, 45.58 MB, 50.16 MB, 51.69 MB, 52.16 MB, 52.65 MB, 54.85 MB, 61.37 MB, 69.15 MB, 70.72 MB, 71.36 MB, 71.50 MB, 71.56 MB
- DeferredQueueSize (500.000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 324, 987, 1.07K, 1.07K, 1.07K, 1.07K, 1.07K, 1.07K, 1.07K, 1.07K, 1.07K, 1.07K, 1.07K, 933, 775, 654, 649, 844, 1.06K, 1.10K, 988, 809, 635, 567, 799, 1.17K, 1.07K, 888, 563, 209
- DispatchTime: (Avg: 52.816us ; Min: 7.746us ; Max: 3.283ms ; Number of samples: 4138)
- DeserializeRowBatchTime: 293.881ms
- TotalBatchesEnqueued: 4.14K (4138)
- TotalBatchesReceived: 4.14K (4138)
- TotalBytesReceived: 71.56 MB (75036662)
- TotalEarlySenders: 0 (0)
- TotalEosReceived: 1.32K (1325)
- TotalHasDeferredRPCsTime: 14s833ms
- TotalRPCsDeferred: 3.87K (3871)
CodeGen:(Total: 1s000ms, non-child: 1s000ms, % non-child: 100.00%)
- CodegenInvoluntaryContextSwitches: 45 (45)
- CodegenTotalWallClockTime: 1s000ms
- CodegenSysTime: 4.000ms
- CodegenUserTime: 828.000ms
- CodegenVoluntaryContextSwitches: 487 (487)
- CompileTime: 252.544ms
- IrGenerationTime: 65.293ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 2.52 MB (2645424)
- NumFunctions: 165 (165)
- NumInstructions: 7.29K (7290)
- OptimizationTime: 607.741ms
- PeakMemoryUsage: 3.56 MB (3732480)
- PrepareTime: 69.221ms
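To put numbers on the stall, here is a rough Python sketch that re-reads the sampled counters of EXCHANGE_NODE (id=17) by hand. It counts the 500 ms samples in which DeferredQueueSize was non-zero while BytesDequeued did not grow. It also computes the peak dequeue rate and the ratio of TotalRPCsDeferred to TotalBatchesReceived. Several things here are assumptions and are not stated in the profile itself. The time series are assumed to be cumulative and sampled every 500 ms, as the "(500.000ms)" label suggests. Size suffixes are assumed to be base 1024, which is consistent with 145.28 MB = 152335476 bytes. "1.07K" is read as roughly 1070. The helper names `parse_size` and `parse_count` are hypothetical.

```python
# Sketch: quantify the stall in the EXCHANGE_NODE (id=17) counters.
# All values are copied by hand from the profile; helper names are hypothetical.

UNITS = {"KB": 1024, "MB": 1024 ** 2}  # assumption: base-1024 size suffixes
SAMPLE_PERIOD_S = 0.5  # assumption: one sample every 500.000ms


def parse_size(text):
    """Turn '0', '155.30 KB' or '5.69 MB' into a byte count."""
    parts = text.split()
    if len(parts) == 1:
        return float(parts[0])
    value, unit = parts
    return float(value) * UNITS[unit]


def parse_count(text):
    """Turn '324' or '1.07K' into an (approximate) number."""
    if text.endswith("K"):
        return float(text[:-1]) * 1000
    return float(text)


# Cumulative BytesDequeued samples (42 points).
bytes_dequeued = [parse_size(s) for s in
                  ["0"] * 12 + ["155.30 KB"] * 13 + [
                      "5.69 MB", "13.08 MB", "20.32 MB", "27.79 MB", "35.03 MB",
                      "42.62 MB", "50.28 MB", "57.71 MB", "65.44 MB", "73.14 MB",
                      "80.80 MB", "88.12 MB", "95.52 MB", "103.14 MB", "110.16 MB",
                      "120.44 MB", "129.98 MB"]]

# DeferredQueueSize samples (42 points).
deferred = [parse_count(s) for s in
            ["0"] * 12 + ["324", "987"] + ["1.07K"] * 11 + [
                "933", "775", "654", "649", "844", "1.06K", "1.10K", "988",
                "809", "635", "567", "799", "1.17K", "1.07K", "888", "563", "209"]]

# Bytes dequeued during each 500 ms interval (the first interval starts at 0).
deltas = [b - a for a, b in zip([0.0] + bytes_dequeued[:-1], bytes_dequeued)]

# Samples where RPCs were waiting in the deferred queue but nothing was dequeued.
stalled = [i for i, (d, q) in enumerate(zip(deltas, deferred)) if q > 0 and d == 0]

# Scalar counters from the Enqueue section.
total_batches_received = 4138
total_rpcs_deferred = 3871

print("stalled samples:", len(stalled), "->", len(stalled) * SAMPLE_PERIOD_S, "s")
print("peak dequeue rate: %.1f MB/s" % (max(deltas) / 1024 ** 2 / SAMPLE_PERIOD_S))
print("deferred / received: %.1f%%"
      % (100 * total_rpcs_deferred / total_batches_received))
```

With the values above, it reports 12 stalled samples (about 6 s). During those samples the deferred queue held roughly 1K entries while BytesDequeued stayed flat at 155.30 KB. The sketch also reports a peak dequeue rate of about 20.6 MB/s once dequeuing resumes. TotalRPCsDeferred works out to about 93.5% of TotalBatchesReceived.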