The policy code uses a more or less standard Linux regexp library, so the regular expressions you use for grep should work. The catch is that the policy file is preprocessed with M4, which makes writing regexes a bit tricky. I grabbed a comment from the code:
Note: The policy SQL parser normally does M4 macro preprocessing with [ ] set as the quote characters. SOOOO.... we highly recommend you add an extra set of [ ] around your REGEX pattern string. Like this:

    ... WHERE REGEX(name,['^[a-z]*$'])    only accept lowercase alphabetic file names

Once you've gotten past M4, you can either match for not-good characters or directly for bad characters:

    REGEX(FILENAME, ['[^a-zA-Z0-9\_\-\.]'])    ### match when you find a character not in the good set
    REGEX(FILENAME, ['[\n\*\\]'])    ### match when you find a bad character

I am not sure which is more difficult to enumerate.

The ESCAPE clause described by Olaf is the trick we use to pass file names with bad characters through the surrounding scripts (like mmbackup, mmxcp, etc). There is code in samples/ilm that shows how to use it.

-Wayne

From: gpfsug-discuss <gpfsug-discuss-boun...@gpfsug.org> on behalf of Olaf Weiser <olaf.wei...@de.ibm.com>
Date: Wednesday, July 5, 2023 at 7:06 PM
To: gpfsug main discussion list <gpfsug-discuss@gpfsug.org>
Subject: [EXTERNAL] Re: [gpfsug-discuss] Special characters in filenames

Hello Jonathan,
I haven't used it for a while, but I can remember a customer where we masked "all" special characters with ESCAPE. In fact, as far as I remember, this was an iterative process ... 😉 😉
You're right, the docs are not really self-explaining here.
From my personal notes I found a little better example:

In GPFS 3.5 we introduce an (optional) ESCAPE clause to the EXTERNAL LIST and EXTERNAL POOL rules, which allows the user-administrator to specify that path names and SHOW(strings) within the associated file lists are encoded using an encoding based on the RFC3986 URI-percent-encoding scheme. For example:

    RULE 'xp' EXTERNAL POOL 'pool-name' EXEC 'script-name' ESCAPE '%'
    RULE 'xl' EXTERNAL LIST 'list-name' EXEC 'script-name' ESCAPE '%/+@#'

ESCAPE '%' specifies that all characters except the "unreserved" characters in the set a-zA-Z0-9-_.~ are encoded as %XX, where XX comprises 2 hexadecimal digits.

The GPFS ESCAPE clause allows you to add to the set of "unreserved" characters. For example, ESCAPE '%/+@#' specifies that none of the characters in "/+@#" are escaped, so that a path name like "/root/directory/@abc+def#ghi.jkl" will appear in a file list with no escape sequences, whereas under ESCAPE '%', specifying a rigorous RFC3986 encoding yields "%2Froot%2Fdirectory%2F%40abc%2Bdef%23ghi.jkl".

At least for us, using ESCAPE did the trick (back then). Maybe it is useful for your case here as well.

cheers
laff

________________________________
From: gpfsug-discuss <gpfsug-discuss-boun...@gpfsug.org> on behalf of Jonathan Buzzard <jonathan.buzz...@strath.ac.uk>
Sent: Thursday, July 6, 2023 00:20
To: gpfsug main discussion list <gpfsug-discuss@gpfsug.org>
Subject: [EXTERNAL] [gpfsug-discuss] Special characters in filenames

After another support incident that eventually transpired to be down to the user using what I will call stupid characters in their filenames (we include a section on not doing this in our mandatory training so no excuse) I have been musing on using the policy engine to periodically produce lists of files that have stupid characters in their filenames so we can proactively educate the users and get them to rename their files to something sensible :-)

The issue is of course that the stupid characters include all the regular expression wildcard characters in addition to \n, \r and backticks. I am coming up short on escaping them correctly in REGEX() for the policy engine. The documentation appears to be devoid of help on the subject, because of course only a fool would be including these characters in their filenames...

Anyone any idea on how to do this?

JAB.

--
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org
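Before putting a pattern into a policy file, it can help to try the two strategies from Wayne's reply (match anything outside a good set, or match specific bad characters) against a handful of sample names. The following is only a rough Python sketch of that idea. It uses Python's `re` module as a stand-in, which is an assumption: the regexp library used by the policy engine may treat backslashes inside bracket expressions differently, and the M4 [ ] quoting does not apply here at all. The function names and sample file names are made up for illustration.

```python
import re

# Allow-list strategy: flag any character that is NOT in the "good" set
# (letters, digits, underscore, dot, hyphen). The hyphen is placed last
# so it is taken literally rather than as a range operator.
NOT_GOOD = re.compile(r"[^a-zA-Z0-9_.-]")

# Deny-list strategy: flag specific troublesome characters. Here: newline,
# carriage return, backtick, asterisk and backslash (an illustrative set).
BAD = re.compile(r"[\n\r`*\\]")


def has_not_good_char(name: str) -> bool:
    """Return True if the name contains a character outside the good set."""
    return NOT_GOOD.search(name) is not None


def has_bad_char(name: str) -> bool:
    """Return True if the name contains one of the listed bad characters."""
    return BAD.search(name) is not None


if __name__ == "__main__":
    # Hypothetical sample names, not taken from any real file system.
    samples = ["report_2023.txt", "my file.txt", "data*.csv", "line\nbreak"]
    for name in samples:
        print(repr(name), has_not_good_char(name), has_bad_char(name))
```

The allow-list is stricter: a space is flagged by `has_not_good_char` but not by `has_bad_char`, so the choice mostly comes down to which set is easier to enumerate for your users.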
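The ESCAPE behaviour described in Olaf's notes can be approximated in a few lines of Python, which may be handy when writing or testing the script that consumes the file list. This is a sketch under assumptions, not the GPFS implementation: it uses `urllib.parse.quote` as the RFC 3986 style encoder, the helper names `escape_path` and `unescape_path` are hypothetical, and how GPFS itself encodes non-ASCII bytes is not stated in the thread (Python encodes the UTF-8 bytes).

```python
from urllib.parse import quote, unquote


def escape_path(path: str, escape_clause: str = "%") -> str:
    """Percent-encode a path roughly as the ESCAPE clause is described.

    The first character of the clause is expected to be '%'; any further
    characters are treated as additions to the "unreserved" set and are
    left unencoded. (Hypothetical helper, for illustration only.)
    """
    if not escape_clause.startswith("%"):
        raise ValueError("this sketch only handles clauses starting with '%'")
    extra_unreserved = escape_clause[1:]
    # quote() never encodes a-zA-Z0-9 and the characters _ . - ~
    return quote(path, safe=extra_unreserved)


def unescape_path(encoded: str) -> str:
    """Decode %XX sequences, as a consuming script would need to do."""
    return unquote(encoded)


if __name__ == "__main__":
    path = "/root/directory/@abc+def#ghi.jkl"
    print(escape_path(path, "%"))
    print(escape_path(path, "%/+@#"))
    print(unescape_path(escape_path(path, "%")) == path)
```

With the path from the notes, the strict `'%'` clause yields the fully encoded form quoted there, while `'%/+@#'` leaves the path unchanged. A newline or backtick in a name becomes a harmless `%0A` or `%60`, which is what lets such names pass safely through line-oriented scripts.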