[ 
https://issues.apache.org/jira/browse/TS-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Call updated TS-2447:
---------------------------

    Fix Version/s:     (was: 5.0.0)
                   5.1.0

> Cache fails to load / initialize, seems stuck on directory entry cleanup
> ------------------------------------------------------------------------
>
>                 Key: TS-2447
>                 URL: https://issues.apache.org/jira/browse/TS-2447
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>            Reporter: Leif Hedstrom
>            Assignee: Alan M. Carroll
>             Fix For: 5.1.0
>
>
> We had an issue where a number of machines would not startup properly. They 
> get stuck on reading / initializing the cache. It initializes the caches with
> {code}
> [Dec 20 16:45:40.195] Server {0x7f3f2f4ef7e0} DEBUG: (cache_hosting) Disk: 0: 
> Vol Blocks: 1: Free space: 0
> [Dec 20 16:45:40.195] Server {0x7f3f2f4ef7e0} DEBUG: (cache_hosting)    Vol: 
> 0 Size: 62509342
> [Dec 20 16:45:40.195] Server {0x7f3f2f4ef7e0} DEBUG: (cache_hosting)    Block 
> No: 0 Size: 62509342 Free: 0
> [Dec 20 16:45:40.195] Server {0x7f3f2f4ef7e0} DEBUG: (cache_hosting) Disk: 1: 
> Vol Blocks: 1: Free space: 0
> [Dec 20 16:45:40.195] Server {0x7f3f2f4ef7e0} DEBUG: (cache_hosting)    Vol: 
> 0 Size: 62509342
> [Dec 20 16:45:40.195] Server {0x7f3f2f4ef7e0} DEBUG: (cache_hosting)    Block 
> No: 0 Size: 62509342 Free: 0
> [Dec 20 16:45:40.195] Server {0x7f3f2f4ef7e0} DEBUG: (cache_hosting) Disk: 2: 
> Vol Blocks: 1: Free space: 0
> [Dec 20 16:45:40.195] Server {0x7f3f2f4ef7e0} DEBUG: (cache_hosting)    Vol: 
> 0 Size: 62509342
> [Dec 20 16:45:40.195] Server {0x7f3f2f4ef7e0} DEBUG: (cache_hosting)    Block 
> No: 0 Size: 62509342 Free: 0
> [Dec 20 16:45:40.195] Server {0x7f3f2f4ef7e0} DEBUG: (cache_hosting) Disk: 3: 
> Vol Blocks: 1: Free space: 0
> [Dec 20 16:45:40.195] Server {0x7f3f2f4ef7e0} DEBUG: (cache_hosting)    Vol: 
> 0 Size: 62509342
> [Dec 20 16:45:40.195] Server {0x7f3f2f4ef7e0} DEBUG: (cache_hosting)    Block 
> No: 0 Size: 62509342 Free: 0
> [Dec 20 16:45:40.195] Server {0x7f3f2f4ef7e0} DEBUG: (cache_hosting) Disk: 4: 
> Vol Blocks: 1: Free space: 0
> [Dec 20 16:45:40.195] Server {0x7f3f2f4ef7e0} DEBUG: (cache_hosting)    Vol: 
> 0 Size: 62509342
> [Dec 20 16:45:40.195] Server {0x7f3f2f4ef7e0} DEBUG: (cache_hosting)    Block 
> No: 0 Size: 62509342 Free: 0
> [Dec 20 16:45:40.195] Server {0x7f3f2f4ef7e0} DEBUG: (cache_hosting) Disk: 5: 
> Vol Blocks: 1: Free space: 0
> [Dec 20 16:45:40.195] Server {0x7f3f2f4ef7e0} DEBUG: (cache_hosting)    Vol: 
> 0 Size: 62509342
> [Dec 20 16:45:40.195] Server {0x7f3f2f4ef7e0} DEBUG: (cache_hosting)    Block 
> No: 0 Size: 62509342 Free: 0
> [Dec 20 16:45:40.195] Server {0x7f3f2f4ef7e0} DEBUG: (cache_hosting) Disk: 6: 
> Vol Blocks: 1: Free space: 0
> [Dec 20 16:45:40.195] Server {0x7f3f2f4ef7e0} DEBUG: (cache_hosting)    Vol: 
> 0 Size: 62509342
> [Dec 20 16:45:40.195] Server {0x7f3f2f4ef7e0} DEBUG: (cache_hosting)    Block 
> No: 0 Size: 62509342 Free: 0
> [Dec 20 16:45:40.195] Server {0x7f3f2f4ef7e0} DEBUG: (cache_init) Cache::open 
> - proxy.config.cache.min_average_object_size = 65536
> [Dec 20 16:45:40.198] Server {0x7f3f2f4ef7e0} DEBUG: (cache_init) allocating 
> 78135296 directory bytes for a 512076529664 byte volume (0.015259%)
> [Dec 20 16:45:40.201] Server {0x7f3f2f4ef7e0} DEBUG: (cache_init) allocating 
> 78135296 directory bytes for a 512076529664 byte volume (0.015259%)
> [Dec 20 16:45:40.204] Server {0x7f3f2f4ef7e0} DEBUG: (cache_init) allocating 
> 78135296 directory bytes for a 512076529664 byte volume (0.015259%)
> [Dec 20 16:45:40.208] Server {0x7f3f2f4ef7e0} DEBUG: (cache_init) allocating 
> 78135296 directory bytes for a 512076529664 byte volume (0.015259%)
> [Dec 20 16:45:40.213] Server {0x7f3f2f4ef7e0} DEBUG: (cache_init) allocating 
> 78135296 directory bytes for a 512076529664 byte volume (0.015259%)
> [Dec 20 16:45:40.219] Server {0x7f3f2f4ef7e0} DEBUG: (cache_init) allocating 
> 78135296 directory bytes for a 512076529664 byte volume (0.015259%)
> [Dec 20 16:45:40.224] Server {0x7f3f2f4ef7e0} DEBUG: (cache_init) allocating 
> 78135296 directory bytes for a 512076529664 byte volume (0.015259%)
> {code}
> After this, it enters a stage where it’s doing a *lot* of dir_clean events:
> {code}
> [Dec 20 16:45:40.314] Server {0x7f3f26f91700} DEBUG: (dir_clean) cleaning 
> 0x7f3edcd6f028 tag 0 boffset 0 b 0x7f3edcd6f028 p (nil) l 1
> [Dec 20 16:45:40.314] Server {0x7f3f26f91700} DEBUG: (dir_clean) cleaning 
> 0x7f3edcd6f050 tag 0 boffset 0 b 0x7f3edcd6f050 p (nil) l 1
> [Dec 20 16:45:40.314] Server {0x7f3f26f91700} DEBUG: (dir_clean) cleaning 
> 0x7f3edcd6f078 tag 0 boffset 0 b 0x7f3edcd6f078 p (nil) l 1
> [Dec 20 16:45:40.314] Server {0x7f3f26f91700} DEBUG: (dir_clean) cleaning 
> 0x7f3edcd6f0a0 tag 0 boffset 0 b 0x7f3edcd6f0a0 p (nil) l 1
> [Dec 20 16:45:40.314] Server {0x7f3f26f91700} DEBUG: (dir_clean) cleaning 
> 0x7f3edcd6f0c8 tag 0 boffset 0 b 0x7f3edcd6f0c8 p (nil) l 1
> [Dec 20 16:45:40.314] Server {0x7f3f26f91700} DEBUG: (dir_clean) cleaning 
> 0x7f3edcd6f0f0 tag 0 boffset 0 b 0x7f3edcd6f0f0 p (nil) l 1
> [Dec 20 16:45:40.314] Server {0x7f3f26f91700} DEBUG: (dir_clean) cleaning 
> 0x7f3edcd6f140 tag 0 boffset 0 b 0x7f3edcd6f140 p (nil) l 1
> {code}
> This goes on for a while, until it reaches some state where one entry is 
> cyclical, interspersed with other entries:
> {code}
> [Dec 20 16:45:51.651] Server {0x7f3f282a4700} DEBUG: (dir_clean) cleaning 
> 0x7f3eeb78c3f0 tag 2A7 boffset 0 b 0x7f3eeb78c3f0 p (nil) l -1
> [Dec 20 16:45:51.651] Server {0x7f3f27d9f700} DEBUG: (dir_clean) cleaning 
> 0x7f3ece462c98 tag 0 boffset 0 b 0x7f3ece462c98 p (nil) l 1
> [Dec 20 16:45:51.651] Server {0x7f3f285a7700} DEBUG: (dir_clean) cleaning 
> 0x7f3ed81d22b0 tag 0 boffset 0 b 0x7f3ed81d22b0 p (nil) l 1
> [Dec 20 16:45:51.651] Server {0x7f3f2f4ef7e0} DEBUG: (dir_clean) cleaning 
> 0x7f3ed32fb2a0 tag 0 boffset 0 b 0x7f3ed32fb2a0 p (nil) l 1
> [Dec 20 16:45:51.651] Server {0x7f3f27ea0700} DEBUG: (dir_clean) cleaning 
> 0x7f3ee1f66060 tag 0 boffset 0 b 0x7f3ee1f66060 p (nil) l 1
> [Dec 20 16:45:51.651] Server {0x7f3f26f91700} DEBUG: (dir_clean) cleaning 
> 0x7f3edd093420 tag 0 boffset 0 b 0x7f3edd093420 p (nil) l 1
> [Dec 20 16:45:51.651] Server {0x7f3f280a2700} DEBUG: (dir_clean) cleaning 
> 0x7f3ee6e0ea18 tag 0 boffset 0 b 0x7f3ee6e0ea18 p (nil) l 1
> [Dec 20 16:45:51.651] Server {0x7f3f282a4700} DEBUG: (dir_clean) cleaning 
> 0x7f3eeb78c3f0 tag C0 boffset 0 b 0x7f3eeb78c3f0 p (nil) l -1
> [Dec 20 16:45:51.651] Server {0x7f3f27d9f700} DEBUG: (dir_clean) cleaning 
> 0x7f3ece462ce8 tag 0 boffset 0 b 0x7f3ece462ce8 p (nil) l 1
> [Dec 20 16:45:51.651] Server {0x7f3f285a7700} DEBUG: (dir_clean) cleaning 
> 0x7f3ed81d22d8 tag 0 boffset 0 b 0x7f3ed81d22d8 p (nil) l 1
> [Dec 20 16:45:51.651] Server {0x7f3f2f4ef7e0} DEBUG: (dir_clean) cleaning 
> 0x7f3ed32fb2f0 tag 0 boffset 0 b 0x7f3ed32fb2f0 p (nil) l 1
> [Dec 20 16:45:51.651] Server {0x7f3f27ea0700} DEBUG: (dir_clean) cleaning 
> 0x7f3ee1f660b0 tag 0 boffset 0 b 0x7f3ee1f660b0 p (nil) l 1
> {code}
> Note how 0x7f3eeb78c3f0 is repeated once in a while. After this goes on for 
> quite a while, it ends up in a state like this:
> {code}
> [Dec 20 17:02:28.914] Server {0x7f3f282a4700} DEBUG: (dir_clean) cleaning 
> 0x7f3eeb78c3f0 tag FFE boffset 0 b 0x7f3eeb78c3f0 p (nil) l -1
> [Dec 20 17:02:28.914] Server {0x7f3f282a4700} DEBUG: (dir_clean) cleaning 
> 0x7f3eeb78c3f0 tag FFA boffset 0 b 0x7f3eeb78c3f0 p (nil) l -1
> [Dec 20 17:02:28.914] Server {0x7f3f282a4700} DEBUG: (dir_clean) cleaning 
> 0x7f3eeb78c3f0 tag FF6 boffset 0 b 0x7f3eeb78c3f0 p (nil) l -1
> [Dec 20 17:02:28.914] Server {0x7f3f282a4700} DEBUG: (dir_clean) cleaning 
> 0x7f3eeb78c3f0 tag FF2 boffset 0 b 0x7f3eeb78c3f0 p (nil) l -1
> [Dec 20 17:02:28.914] Server {0x7f3f282a4700} DEBUG: (dir_clean) cleaning 
> 0x7f3eeb78c3f0 tag FEE boffset 0 b 0x7f3eeb78c3f0 p (nil) l -1
> [Dec 20 17:02:28.914] Server {0x7f3f282a4700} DEBUG: (dir_clean) cleaning 
> 0x7f3eeb78c3f0 tag FEA boffset 0 b 0x7f3eeb78c3f0 p (nil) l -1
> .
> .
> .
> [Dec 20 17:02:28.922] Server {0x7f3f282a4700} DEBUG: (dir_clean) cleaning 
> 0x7f3eeb78c3f0 tag 16 boffset 0 b 0x7f3eeb78c3f0 p(nil) l -1
> [Dec 20 17:02:28.922] Server {0x7f3f282a4700} DEBUG: (dir_clean) cleaning 
> 0x7f3eeb78c3f0 tag 12 boffset 0 b 0x7f3eeb78c3f0 p(nil) l -1
> [Dec 20 17:02:28.922] Server {0x7f3f282a4700} DEBUG: (dir_clean) cleaning 
> 0x7f3eeb78c3f0 tag E boffset 0 b 0x7f3eeb78c3f0 p (nil) l -1
> [Dec 20 17:02:28.922] Server {0x7f3f282a4700} DEBUG: (dir_clean) cleaning 
> 0x7f3eeb78c3f0 tag A boffset 0 b 0x7f3eeb78c3f0 p (nil) l -1
> [Dec 20 17:02:28.922] Server {0x7f3f282a4700} DEBUG: (dir_clean) cleaning 
> 0x7f3eeb78c3f0 tag 6 boffset 0 b 0x7f3eeb78c3f0 p (nil) l -1
> [Dec 20 17:02:28.922] Server {0x7f3f282a4700} DEBUG: (dir_clean) cleaning 
> 0x7f3eeb78c3f0 tag 2 boffset 0 b 0x7f3eeb78c3f0 p (nil) l -1
> [Dec 20 17:02:28.922] Server {0x7f3f282a4700} DEBUG: (dir_clean) cleaning 
> 0x7f3eeb78c3f0 tag FFE boffset 0 b 0x7f3eeb78c3f0 p (nil) l -1
> [Dec 20 17:02:28.922] Server {0x7f3f282a4700} DEBUG: (dir_clean) cleaning 
> 0x7f3eeb78c3f0 tag FFA boffset 0 b 0x7f3eeb78c3f0 p (nil) l -1
> .
> .
> .
> {code}
> This last state never finishes, it seems to loop through the tag’s repeatedly 
> forever. It consumes 100% of one CPU while doing this. I have no idea what 
> this actually means yet, other than the debug tag comes from a test where we 
> verify 
> {code}
>    if (!dir_valid(vol, e) || !dir_offset(e)) {
> {code}
> I don’t know if this is a regression, a new bug, or something we’ve always 
> had. In all my years running at Yahoo, we never ran into this, and we’ve now 
> run into this 3-4 times at least in a short time here. I think I had it 
> happen in my lab box once, unfortunately I didn’t recognize it as such, and 
> cleared the cache.It happened once while debugging something else, and I was 
> restarting ATS frequently.
> Oh yeah, doing a “traffic_server -Cclear” obviously fixes the broken cache.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to