soullkk opened a new issue, #14685:
URL: https://github.com/apache/druid/issues/14685
Historical errors when loading segments because the segment is too large for
storage
### Affected Version
druid 24.0.1
### Description
- The cluster has 3 nodes in total.
- historical/runtime.properties:7:druid.segmentCache.locations=[{"path":"/srv/druid/var/druid9","maxSize":32862064640},{"path":"/srv/druid/var/druid8","maxSize":32862064640},{"path":"/srv/druid/var/druid7","maxSize":32862064640},{"path":"/srv/druid/var/druid6","maxSize":32862064640},{"path":"/srv/druid/var/druid5","maxSize":32862064640},{"path":"/srv/druid/var/druid4","maxSize":32862064640},{"path":"/srv/druid/var/druid3","maxSize":32862064640},{"path":"/srv/druid/var/druid2","maxSize":32862064640},{"path":"/srv/druid/var/druid1","maxSize":32862064640},{"path":"/srv/druid/var/druid10","maxSize":32862064640},{"path":"/srv/druid/var/druid12","maxSize":32862064640},{"path":"/srv/druid/var/druid11","maxSize":32862064640}]
- I have not found a way to reproduce this problem.
- 2023-07-27 01:42:21,740 WARN
[ZKCoordinator--8][ROOT][org.apache.druid.segment.loading.StorageLocation]
Segment[ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z:6,994]
too large for storage[/srv/druid/var/druid5:1,542]. Check your
druid.segmentCache.locations maxSize param
2023-07-27 01:42:21,740 WARN
[ZKCoordinator--8][ROOT][org.apache.druid.segment.loading.StorageLocation]
Segment[ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z:6,994]
too large for storage[/srv/druid/var/druid1:1,393]. Check your
druid.segmentCache.locations maxSize param
2023-07-27 01:42:21,740 WARN
[ZKCoordinator--8][ROOT][org.apache.druid.segment.loading.StorageLocation]
Segment[ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z:6,994]
too large for storage[/srv/druid/var/druid10:808]. Check your
druid.segmentCache.locations maxSize param
2023-07-27 01:42:21,740 WARN
[ZKCoordinator--8][ROOT][org.apache.druid.segment.loading.StorageLocation]
Segment[ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z:6,994]
too large for storage[/srv/druid/var/druid3:781]. Check your
druid.segmentCache.locations maxSize param
2023-07-27 01:42:21,740 WARN
[ZKCoordinator--8][ROOT][org.apache.druid.segment.loading.StorageLocation]
Segment[ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z:6,994]
too large for storage[/srv/druid/var/druid4:573]. Check your
druid.segmentCache.locations maxSize param
2023-07-27 01:42:21,740 WARN
[ZKCoordinator--8][ROOT][org.apache.druid.segment.loading.StorageLocation]
Segment[ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z:6,994]
too large for storage[/srv/druid/var/druid6:459]. Check your
druid.segmentCache.locations maxSize param
2023-07-27 01:42:21,740 WARN
[ZKCoordinator--8][ROOT][org.apache.druid.segment.loading.StorageLocation]
Segment[ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z:6,994]
too large for storage[/srv/druid/var/druid9:451]. Check your
druid.segmentCache.locations maxSize param
2023-07-27 01:42:21,740 WARN
[ZKCoordinator--8][ROOT][org.apache.druid.segment.loading.StorageLocation]
Segment[ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z:6,994]
too large for storage[/srv/druid/var/druid8:420]. Check your
druid.segmentCache.locations maxSize param
2023-07-27 01:42:21,741 WARN
[ZKCoordinator--8][ROOT][org.apache.druid.segment.loading.SegmentLocalCacheManager]
Asked to cleanup
something[ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z]
that didn't exist. Skipping.
2023-07-27 01:42:21,741 WARN
[ZKCoordinator--8][ROOT][org.apache.druid.server.coordination.BatchDataSegmentAnnouncer]
No path to unannounce
segment[ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z]
2023-07-27 01:42:21,741 INFO
[ZKCoordinator--8][ROOT][org.apache.druid.server.SegmentManager] Told to delete
a queryable on dataSource[ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT] for
interval[2023-07-24T02:00:00.000Z/2023-07-24T02:15:00.000Z] and
version[2023-07-24T02:15:05.987Z] that I don't have.
2023-07-27 01:42:21,741 WARN
[ZKCoordinator--8][ROOT][org.apache.druid.segment.loading.SegmentLocalCacheManager]
Asked to cleanup
something[ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z]
that didn't exist. Skipping.
2023-07-27 01:42:21,741 WARN
[ZKCoordinator--8][ROOT][org.apache.druid.server.coordination.SegmentLoadDropHandler]
Unable to delete
segmentInfoCacheFile[/srv/druid/var/druid9/info_dir/ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z]
2023-07-27 01:42:21,741 ERROR
[ZKCoordinator--8][ROOT][org.apache.druid.server.coordination.SegmentLoadDropHandler]
Failed to load segment for dataSource:
{class=org.apache.druid.server.coordination.SegmentLoadDropHandler,
exceptionType=class org.apache.druid.segment.loading.SegmentLoadingException,
exceptionMessage=Exception loading
segment[ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z],
segment=DataSegment{binaryVersion=9,
id=ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z,
loadSpec={type=>hdfs,
path=>hdfs://hacluster/srv/bigdata/druid/segments/ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT/20230724T020000.000Z_20230724T021500.000Z/2023-07-24T02_15_05.987Z/0_bff76d6a-fbec-46bf-b0a5-cc94c50ea9ec_index.zip},
dimensions=[fabric_id, ne_dn, ne_name, ne_ip, slot_id, slot_name,
slot_uniq_id, is_multi_slot, mac, slot_query_id, device_role]
, metrics=[cpu_usage, cpu_effcnt, mem_usage, mem_effcnt, period_effect,
period_ctn, deviceTime, count], shardSpec=NumberedShardSpec{partitionNum=0,
partitions=0}, lastCompactionState=null, size=6994}}
org.apache.druid.segment.loading.SegmentLoadingException: Exception loading
segment[ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z]
at
org.apache.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:289)
~[druid-server-24.0.1-htrunk6.jar:?]
at
org.apache.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:266)
~[druid-server-24.0.1-htrunk6.jar:?]
at
org.apache.druid.server.coordination.SegmentLoadDropHandler.addSegment(SegmentLoadDropHandler.java:343)
~[druid-server-24.0.1-htrunk6.jar:?]
at
org.apache.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:61)
~[druid-server-24.0.1-htrunk6.jar:?]
at
org.apache.druid.server.coordination.ZkCoordinator.lambda$childAdded$2(ZkCoordinator.java:150)
~[druid-server-24.0.1-htrunk6.jar:?]
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
~[?:1.8.0_372]
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
~[?:1.8.0_372]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
~[?:1.8.0_372]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
~[?:1.8.0_372]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_372]
Caused by: org.apache.druid.segment.loading.SegmentLoadingException: Failed
to load segment
ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z
in all locations.
at
org.apache.druid.segment.loading.SegmentLocalCacheManager.loadSegmentWithRetry(SegmentLocalCacheManager.java:279)
~[druid-server-24.0.1-htrunk6.jar:?]
at
org.apache.druid.segment.loading.SegmentLocalCacheManager.getSegmentFiles(SegmentLocalCacheManager.java:229)
~[druid-server-24.0.1-htrunk6.jar:?]
at
org.apache.druid.segment.loading.SegmentLocalCacheLoader.getSegment(SegmentLocalCacheLoader.java:56)
~[druid-server-24.0.1-htrunk6.jar:?]
at
org.apache.druid.server.SegmentManager.getSegmentReference(SegmentManager.java:325)
~[druid-server-24.0.1-htrunk6.jar:?]
at
org.apache.druid.server.SegmentManager.loadSegment(SegmentManager.java:268)
~[druid-server-24.0.1-htrunk6.jar:?]
at
org.apache.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:281)
~[druid-server-24.0.1-htrunk6.jar:?]
... 9 more
- The log indicates that storageLocation.currSizeBytes is close to
maxSizeBytes, so no storage location has room to load the segment. However,
there is still a lot of free space in the cache directories. The cache
directory information is as follows:


I exported a heap dump of the Historical for analysis and found duplicate
directories across different storageLocation.files sets, which I do not think
is expected.
SELECT location.files.map.size, location.currSizeBytes, location.files FROM
org.apache.druid.segment.loading.StorageLocation location WHERE
(location.currSizeBytes > 0)

SELECT file.path.toString(), file.path.toString().substring(21) FROM
java.io.File file WHERE ((file.path.toString().contains("/srv/druid/var/druid")
= true) and (file.path.toString().contains("smoosh") = false))
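
The same duplicate check can also be approximated directly on disk, without a heap dump. The following is a minimal sketch, not part of Druid: the function names are hypothetical, and it assumes that each cache location stores segments four directory levels deep (dataSource/interval/version/partition), which should be verified against the actual layout before relying on the result.

```python
import os
from collections import defaultdict


def list_relative_dirs(root, depth):
    """Return relative paths of directories exactly `depth` levels below root."""
    found = set()
    root = os.path.abspath(root)
    for dirpath, dirnames, _ in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        level = 0 if rel == "." else rel.count(os.sep) + 1
        if level == depth:
            found.add(rel)
            dirnames[:] = []  # stop descending once the target depth is reached
    return found


def find_duplicate_dirs(locations, depth=4):
    """Map each relative directory seen in more than one location to those locations.

    depth=4 is an assumption about the cache layout, not a documented constant.
    """
    owners = defaultdict(list)
    for location in locations:
        for rel in list_relative_dirs(location, depth):
            owners[rel].append(location)
    return {rel: sorted(locs) for rel, locs in owners.items() if len(locs) > 1}
```

Calling `find_duplicate_dirs` with the twelve paths from `druid.segmentCache.locations` would list every segment directory that is present under more than one location. A non-empty result would match what the heap dump suggests.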



There are 66347 segments in the database, and the total segment size is 54 GB.
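
To put the numbers in this report side by side, here is a small arithmetic sketch. It assumes that the second number in each `storage[path:N]` log entry is the available byte count the Historical tracks for that location, and that "54 GB" means 54 * 10^9 bytes. Only the eight locations that appear in the log excerpt are included, and the function name is hypothetical.

```python
MAX_SIZE = 32_862_064_640   # maxSize of each location in runtime.properties
NUM_LOCATIONS = 12          # number of entries in druid.segmentCache.locations
SEGMENT_SIZE = 6_994        # size of the segment that failed to load

# Available bytes per location as printed in the WARN lines (assumed meaning).
AVAILABLE = {
    "druid5": 1_542, "druid1": 1_393, "druid10": 808, "druid3": 781,
    "druid4": 573, "druid6": 459, "druid9": 451, "druid8": 420,
}

TOTAL_SEGMENT_BYTES = 54 * 10**9  # assumption: decimal gigabytes


def capacity_report(max_size, num_locations, available, segment_size, total_bytes):
    """Compare configured capacity with tracked usage and the total segment size."""
    configured = max_size * num_locations
    return {
        "configured_bytes": configured,
        # Usage the Historical believes each location has: maxSize - available.
        "tracked_bytes": {name: max_size - a for name, a in available.items()},
        "fits_nowhere": all(a < segment_size for a in available.values()),
        "configured_over_total": configured / total_bytes,
    }


if __name__ == "__main__":
    report = capacity_report(
        MAX_SIZE, NUM_LOCATIONS, AVAILABLE, SEGMENT_SIZE, TOTAL_SEGMENT_BYTES
    )
    for key, value in report.items():
        print(key, value)
```

Under these assumptions, one node is configured for 394,344,775,680 bytes of cache, several times the total segment size, yet every logged location is tracked as within a few kilobytes of its maxSize. That gap is consistent with the tracked size being inflated, for example by entries counted in more than one location.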