ctubbsii commented on PR #3511: URL: https://github.com/apache/accumulo/pull/3511#issuecomment-1601646574
We need to be sure this validation method only checks prefixes for file names that we've stored, and that this code path doesn't get executed when a user provides their own file names as input (to rfile-info, or as a file that is being bulk imported, for example), because users can name their RFiles anything they like.

I'm also concerned about doing this in general. For a long time, when HDFS or another issue caused an RFile to be corrupt or missing, we have recommended that users perform metadata surgery to place an RFile directly. I can imagine other scenarios where a user has done that for surgical or maintenance reasons (maybe manually compacting some files that are inconvenient for them to compact using our built-in compactors), and may have their own naming convention for the file they place (and for which they add an entry to the metadata table). This validation could break things for users who have done that, and not for any great reason.

These file names are just conventions; they aren't a strict requirement. By default, Accumulo should work regardless of the file name, and these names would only matter for custom compaction strategies, trash policies, etc. that the user chose to deploy. Maybe a warning could be logged for unexpected name prefixes instead of a hard failure?
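
To make the "warn instead of fail" suggestion concrete, here is a minimal sketch. It is not the code in this PR: the class name `FilePrefixCheck`, the method `hasConventionalPrefix`, and the set of prefix letters are hypothetical and chosen for illustration only, and it uses `java.util.logging` just to stay self-contained (Accumulo itself logs through SLF4J).

```java
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical helper, for illustration only; not part of Accumulo.
final class FilePrefixCheck {
  private static final Logger LOG = Logger.getLogger(FilePrefixCheck.class.getName());

  // Assumed set of conventional single-letter prefixes. The real list
  // should come from the project's own definitions, not this sketch.
  private static final Set<Character> CONVENTIONAL_PREFIXES = Set.of('A', 'C', 'F', 'I', 'M');

  private FilePrefixCheck() {}

  // Returns true when the file name starts with a conventional prefix.
  // For any other name it logs a warning and returns false; it never
  // throws, so a user-named file is still accepted by the caller.
  static boolean hasConventionalPrefix(String fileName) {
    if (fileName == null || fileName.isEmpty()) {
      LOG.warning("Empty or missing RFile name; cannot check its prefix");
      return false;
    }
    char prefix = fileName.charAt(0);
    if (CONVENTIONAL_PREFIXES.contains(prefix)) {
      return true;
    }
    LOG.warning("RFile name '" + fileName + "' has an unexpected prefix '" + prefix
        + "'; continuing, since file names are only a convention");
    return false;
  }
}
```

With this shape, a file placed by hand during metadata surgery (say, `my-repaired-file.rf`) would produce a log line but would not stop the tablet from loading. Callers that care about the prefix, such as a custom compaction strategy, can still branch on the boolean result.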
