ctubbsii commented on issue #2322: URL: https://github.com/apache/accumulo/issues/2322#issuecomment-1136765560
This seems related to, or perhaps a duplicate of, #608. Should we close one of them?

Also, I was thinking: the mapping of table IDs, which are BigInteger values, to their serialized form in the metadata table is not a monotonic function (it doesn't preserve order). So, when a new table is created, there's a good chance its metadata is injected somewhere in the middle of the metadata table, and not at the end of the table in the last tablet. This might temporarily reduce initial hotspotting for new tables being created, but it could also be related to some potential garbage collection bugs if we don't account for it properly.

Interestingly, the avoidance of hotspotting would only occur for the first few tables created, when the most significant digit is still changing frequently. We go back to hotspotting again quickly after a bunch of tables are created. This hotspotting is probably not substantial, but if we want to avoid it, that could be a separate ticket to try to generate table IDs that are more spread out.

However, if it turns out that inserting new table metadata into the middle of a candidate garbage collection scan is the source of any bugs, we may want to just accept the hotspotting and generate table IDs that are always appended to the end of the table, so their metadata is never inserted into the middle of a candidate scan. I'm particularly concerned about cloned tables, which have duplicate file references to existing tables.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
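
To make the ordering point in the comment above concrete, here is a minimal Java sketch. It assumes that a table ID is a BigInteger counter rendered as a base-36 string (`Character.MAX_RADIX`) and that metadata rows sort lexicographically by that string. The comment does not state the radix, so treat that as an assumption. The class and method names are hypothetical and are not part of Accumulo.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class TableIdOrderSketch {

    // Assumption: IDs are serialized as base-36 strings of a counter.
    static String serialize(long counter) {
        return BigInteger.valueOf(counter).toString(Character.MAX_RADIX);
    }

    // Serialized IDs for counters 1..n, in lexicographic (row sort) order.
    static List<String> sortedIds(long n) {
        TreeSet<String> sorted = new TreeSet<>();
        for (long i = 1; i <= n; i++) {
            sorted.add(serialize(i));
        }
        return new ArrayList<>(sorted);
    }

    // True if the ID for counter n sorts after every ID for 1..n-1,
    // i.e. the new table's rows would land at the end of the sort order.
    static boolean sortsLast(long n) {
        String id = serialize(n);
        for (long i = 1; i < n; i++) {
            if (serialize(i).compareTo(id) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Counter 36 serializes to "10", which sorts between "1" and "2".
        System.out.println(sortedIds(36));
        System.out.println("36 sorts last? " + sortsLast(36));
    }
}
```

Under these assumptions, the ID for counter 35 (`z`) sorts last, while the ID for counter 36 (`10`) lands between `1` and `2`, so a larger numeric ID does not imply a later row.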
