Tamar Christina <tamar.christ...@arm.com> writes:
> Hi Richard,
>
> Thanks for the review,
>
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandif...@arm.com>
>> Sent: Thursday, July 9, 2020 1:35 PM
>> To: Tamar Christina <tamar.christ...@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw
>> <richard.earns...@arm.com>; Marcus Shawcroft
>> <marcus.shawcr...@arm.com>; Kyrylo Tkachov <kyrylo.tkac...@arm.com>
>> Subject: Re: [PATCH 1/6] AArch64: Fix bugs in -mcpu=native detection.
>>
>> Tamar Christina <tamar.christ...@arm.com> writes:
>> > Hi All,
>> >
>> > This patch fixes a couple of issues in AArch64's -mcpu=native processing:
>> >
>> > The buffer used to read the lines from /proc/cpuinfo is 128 bytes
>> > long.  While this was enough in the past, with the increase in
>> > architecture extensions it is no longer enough.  It results in two bugs:
>> >
>> > 1) No option string longer than 127 characters is correctly parsed.
>> >    Features that are supported are silently ignored.
>> >
>> > 2) It incorrectly enables features that are not present on the machine:
>> >    a) It checks for substring matching instead of full word matching.
>> >       This makes it incorrectly detect sb support when ssbs is provided
>> >       instead.
>> >    b) Due to the truncation at the 127 char border it also incorrectly
>> >       enables features, because the full feature is cut off and the part
>> >       that is left accidentally enables something else.
>> >
>> > This breaks -mcpu=native detection on some of our newer systems.
>> >
>> > The patch fixes these issues by reading full lines up to the \n in a
>> > string.  This gives us the full feature line.  Secondly it creates a
>> > set from this string to:
>> >
>> > 1) Reduce matching complexity from O(n*m) to O(n*logm).
>> > 2) Perform whole word matching instead of substring matching.
>> >
>> > To make this code somewhat cleaner I also changed from using char* to
>> > using std::string and std::set.
>> >
>> > Note that I have intentionally avoided the use of ifstream and
>> > stringstream to make it easier to backport.  I have also not changed
>> > the substring matching for the initial line classification, as I cannot
>> > find a documented cpuinfo format, which leads me to believe there may
>> > be kernels out there that require this, which may be why the original
>> > code does this.
>> >
>> > I also do not want this to break if the kernel adds a new line that is
>> > long and indents the file by two tabs to keep everything aligned.  In
>> > short, I think an imprecise match is the right thing here.
>> >
>> > A test for this is added as the last thing in this series, as it
>> > requires some changes to be made to be able to test this.
>> >
>> > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>>
>> Sorry to be awkward.  I know Kyrill's already approved this, but I
>> kind-of object.
>>
>> I think we should split this into two: fixing the limit bug, and fixing
>> the complexity.  It's easier to justify backporting the fix for the limit
>> bug than the fix for the complexity.
>
> It was never the intention to fix the complexity.  The change in complexity
> is simply because I split the feature strings into actual words now.  I do
> so because, in my opinion, this is simpler than doing memcmp and checking
> that the character after the one you matched is a space or a null
> character, and then checking that you are one character away from the
> previous space or at the start of the line.
>
> Do you know of a simpler way to do word matches in C?
This isn't an anti-C++ thing :-)  I just think any clean-up should be
separate from the bugfix, and only the bugfix should be backported.

>> For fixing the limit bug, it seems better to use an obstack to read the
>> file instead.  This should actually be much easier than what the patch
>> does. :-)  It should also be more efficient.
>>
>
> Sorry... I genuinely don't understand.  I had looked at obstack and the
> interface seemed to be more work to me.  I am probably wrong, since this
> interface has zero documentation, but it looks like to use obstack I have
> to (based on guessing from what other code is doing):
>
> 1. call gcc_obstack_init
> 2. call obstack_alloc
> 3. call obstack_grow any time I need a bigger buffer.
> 4. call obstack_free
>
> So I am missing why this is simpler than calling realloc...
> But obviously I am missing something.

Yeah, (1), (3) and (4) are the right steps.  But the simplifying thing is
that obstack has “optimised” single-character insertions.  So rather than
going by fgets, we can just do something like:

  while ((ch = getc (f)) != EOF)
    {
      obstack_1grow (&ob, ch);
      if (ch == '\n')
        break;
    }
  obstack_1grow (&ob, 0);
  return obstack_finish (&ob);

to read a zero-terminated line.  Then use:

  obstack_free (&ob, line);

after each line.  (Optional, but more efficient, since it reuses memory.)
This saves doing a fresh xmalloc+free for each line.
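For reference, a complete reader along those lines might look roughly like
the sketch below.  It is untested and illustrative only: the name
“readline”, the file-scope obstack and the empty-string-at-EOF convention
are assumptions, not part of the patch or of the loop above.

  /* Illustrative sketch only.  */
  static struct obstack line_obstack;

  /* Read the next line of F into LINE_OBSTACK and return it as a
     zero-terminated string; at EOF the returned string is empty.
     The caller is assumed to have run gcc_obstack_init (&line_obstack)
     once up front and to call obstack_free (&line_obstack, line) once
     it has finished with each returned line.  */

  static char *
  readline (FILE *f)
  {
    int ch;
    while ((ch = getc (f)) != EOF)
      {
        obstack_1grow (&line_obstack, ch);
        if (ch == '\n')
          break;
      }
    obstack_1grow (&line_obstack, 0);
    return (char *) obstack_finish (&line_obstack);
  }
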
>> For fixing the complexity: does this actually make things noticeably
>> faster in practice?  The “m” in the O(n*m) is a fairly small constant.
>> The cost of making it O(n*logm) this way involves many more memory
>> allocations, so it isn't obvious that it's actually more efficient in
>> practice.  Do you have any timings?
>
> Well, no... I didn't run any benchmarks.  Did you have any in mind that
> you wanted me to run?
>
> As I mentioned, the complexity change was just a byproduct of doing word
> matching in a way that is much easier to see is correct.

Ah, OK.

> But if you want me to just add extra checks to the existing implementation
> to see if a match meets all the criteria of being a standalone word, then
> sure, I can do that.  I don't particularly find that implementation easier
> to read, and I would have thought correctness was to be favored over memory
> allocation in something that is called once.
>
>>
>> If search time is a problem, building a hash_map might be better.
>
> I would have thought the red-black tree of a set is fine... But if I'm
> only allowed to use GCC internal structures then I will change it...

No, using C++ is fine.  The reason for suggesting hash_map (or hash_set,
as I should have said) was that, if complexity was an issue, hash_set/map
is supposed to have amortised constant lookup time for a good hash.

As far as the patch itself goes:

> @@ -126,6 +127,57 @@ parse_field (const char *field)
>    return fint;
>  }
>
> +/* Splits and returns a string based on whitespace and return it as
> +   part of a set.  Empty strings are ignored.  */
> +
> +static void
> +split_words (const std::string val, std::set<std::string> &result)

Not sure there's any benefit to marking “val” const, and adding it makes
it look like it was supposed to be “const std::string &” instead (which
should be more efficient).

> +{
> +  size_t cur, prev = 0;
> +  std::string word;
> +  while ((cur = val.find_first_of (" \n", prev)) != std::string::npos)
> +  {

The { … } should be indented two spaces relative to the “while”.

> +    word = val.substr (prev, cur - prev);
> +    /* Skip adding empty words.  */
> +    if (!word.empty ())
> +      result.insert (word);

Guess this is just personal preference, sorry, but:

  if (prev != cur)
    result.insert (std::string (prev, cur));

seems simpler.

> +    prev = cur+1;

Should be spaces around the “+”.

> +  }
> […]
> +
> +  word = val.substr (prev, cur - prev);
> +  if (!word.empty ())
> +    result.insert (word);
> +}
> +
> […]
> @@ -164,7 +216,6 @@ host_detect_local_cpu (int argc, const char **argv)
>  {
>    const char *res = NULL;
>    static const int num_exts = ARRAY_SIZE (aarch64_extensions);
> -  char buf[128];
>    FILE *f = NULL;
>    bool arch = false;
>    bool tune = false;
> @@ -178,6 +229,7 @@ host_detect_local_cpu (int argc, const char **argv)
>    bool processed_exts = false;
>    uint64_t extension_flags = 0;
>    uint64_t default_flags = 0;
> +  std::string buf;
>
>    gcc_assert (argc);
>
> @@ -202,9 +254,9 @@ host_detect_local_cpu (int argc, const char **argv)
>
>    /* Look through /proc/cpuinfo to determine the implementer
>       and then the part number that identifies a particular core.  */
> -  while (fgets (buf, sizeof (buf), f) != NULL)
> +  while (!(buf = readline (f)).empty ())
>      {
> -      if (strstr (buf, "implementer") != NULL)
> +      if (buf.find ("implementer") != std::string::npos)
>         {
>           unsigned cimp = parse_field (buf);
>           if (cimp == INVALID_IMP)
> @@ -216,8 +268,7 @@ host_detect_local_cpu (int argc, const char **argv)
>           else if (imp != cimp)
>             goto not_found;
>         }
> -
> -      if (strstr (buf, "variant") != NULL)
> +      else if (buf.find ("variant") != std::string::npos)
>         {
>           unsigned cvariant = parse_field (buf);
>           if (!contains_core_p (variants, cvariant))
> @@ -229,8 +280,7 @@ host_detect_local_cpu (int argc, const char **argv)
>             }
>           continue;
>         }
> -
> -      if (strstr (buf, "part") != NULL)
> +      else if (buf.find ("part") != std::string::npos)
>         {
>           unsigned ccore = parse_field (buf);
>           if (!contains_core_p (cores, ccore))
> @@ -242,39 +292,36 @@ host_detect_local_cpu (int argc, const char **argv)
>             }
>           continue;
>         }
> -      if (!tune && !processed_exts && strstr (buf, "Features") != NULL)
> +      else if (!tune && !processed_exts
> +               && buf.find ("Features") != std::string::npos)
>         {
> +          /* First create the list of features in the buffer.  */
> +          std::set<std::string> features;
> +          /* Drop everything till the :.  */
> +          buf = buf.substr (buf.find (':') + 1);

We should check the result of the find, rather than just assume that the
':' exists.  Looks like the current parse_field has the same problem
though…

Thanks,
Richard
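For illustration, the kind of guarded lookup meant in that last comment
might look roughly like this (an untested sketch, not part of the patch;
it assumes the surrounding host_detect_local_cpu context with its
not_found label, and the name “colon” is made up):

  /* If find returns npos, the unguarded substr would start at
     npos + 1 == 0 and silently keep the whole line.  */
  size_t colon = buf.find (':');
  if (colon == std::string::npos)
    goto not_found;
  buf = buf.substr (colon + 1);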