Tamar Christina <tamar.christ...@arm.com> writes:
> Hi Richard,
>
> Thanks for the review,
>
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandif...@arm.com>
>> Sent: Thursday, July 9, 2020 1:35 PM
>> To: Tamar Christina <tamar.christ...@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw
>> <richard.earns...@arm.com>; Marcus Shawcroft
>> <marcus.shawcr...@arm.com>; Kyrylo Tkachov <kyrylo.tkac...@arm.com>
>> Subject: Re: [PATCH 1/6] AArch64: Fix bugs in -mcpu=native detection.
>>
>> Tamar Christina <tamar.christ...@arm.com> writes:
>> > Hi All,
>> >
>> > This patch fixes a couple of issues in AArch64's -mcpu=native processing:
>> >
>> > The buffer used to read the lines from /proc/cpuinfo is 128 bytes
>> > long.  While this was enough in the past, with the increase in
>> > architecture extensions it is no longer enough.  It results in two bugs:
>> >
>> > 1) No option string longer than 127 characters is correctly parsed.
>> >    Features that are supported are silently ignored.
>> >
>> > 2) It incorrectly enables features that are not present on the machine:
>> >    a) It checks for substring matching instead of full word matching.
>> >       This makes it incorrectly detect sb support when ssbs is provided
>> >       instead.
>> >    b) Due to the truncation at the 127 char border it also incorrectly
>> >       enables features, because the full feature is cut off and the part
>> >       that is left accidentally enables something else.
>> >
>> > This breaks -mcpu=native detection on some of our newer systems.
>> >
>> > The patch fixes these issues by reading full lines up to the \n in a
>> > string.  This gives us the full feature line.  Secondly it creates a
>> > set from this string to:
>> >
>> > 1) Reduce matching complexity from O(n*m) to O(n*logm).
>> > 2) Perform whole word matching instead of substring matching.
>> >
>> > To make this code somewhat cleaner I also changed from using char* to
>> > using std::string and std::set.
>> >
>> > Note that I have intentionally avoided the use of ifstream and
>> > stringstream to make it easier to backport.  I have also not changed
>> > the substring matching for the initial line classification, as I cannot
>> > find a documented cpuinfo format, which leads me to believe there may
>> > be kernels out there that require this, which may be why the original
>> > code does this.
>> >
>> > I also do not want this to break if the kernel adds a new line that is
>> > long and indents the file by two tabs to keep everything aligned.  In
>> > short, I think an imprecise match is the right thing here.
>> >
>> > A test for this is added as the last thing in this series, as it
>> > requires some changes to be made to be able to test this.
>> >
>> > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>>
>> Sorry to be awkward.  I know Kyrill's already approved this, but I
>> kind-of object.
>>
>> I think we should split this into two: fixing the limit bug, and fixing
>> the complexity.  It's easier to justify backporting the fix for the limit
>> bug than the fix for the complexity.
>
> It was never the intention to fix the complexity.  The change in complexity
> is simply because I split the feature strings into actual words now.  I do
> so because, in my opinion, this is simpler than doing memcmp and checking
> that the character after the one you matched is a space or a null
> character, and then checking that you are one character away from the
> previous space or at the start of the line.
>
> Do you know of a simpler way to do word matches in C?
This isn't an anti-C++ thing :-)  I just think any clean-up should be
separate from the bugfix, and only the bugfix should be backported.

>> For fixing the limit bug, it seems better to use an obstack to read the
>> file instead.  This should actually be much easier than what the patch
>> does. :-)  It should also be more efficient.
>>
>
> Sorry... I genuinely don't understand.  I had looked at obstack and the
> interface seemed to be more work to me.  I am probably wrong, since this
> interface has zero documentation, but it looks like to use obstack I have
> to (based on guessing from what other code is doing):
>
> 1. call gcc_obstack_init
> 2. call obstack_alloc
> 3. call obstack_grow any time I need a bigger buffer.
> 4. call obstack_free
>
> So I am missing why this is simpler than calling realloc...
> But obviously I am missing something.

Yeah, (1), (3) and (4) are the right steps.  But the simplifying thing is
that obstack has “optimised” single-character insertions.  So rather than
going by fgets, we can just do something like:

  while ((ch = getc (f)) != EOF)
    {
      obstack_1grow (&ob, ch);
      if (ch == '\n')
        break;
    }
  obstack_1grow (&ob, 0);
  return obstack_finish (&ob);

to read a zero-terminated line.  Then use:

  obstack_free (&ob, line);

after each line.  (Optional, but more efficient, since it reuses memory.)
This saves doing a fresh xmalloc+free for each line.
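For reference, a complete reader along those lines might look roughly like
the sketch below.  It is untested and illustrative only: the name
“readline”, the file-scope obstack and the empty-string-at-EOF convention
are assumptions, not part of the patch or of the loop above.

  /* Illustrative sketch only.  */
  static struct obstack line_obstack;

  /* Read the next line of F into LINE_OBSTACK and return it as a
     zero-terminated string; at EOF the returned string is empty.
     The caller is assumed to have run gcc_obstack_init (&line_obstack)
     once up front and to call obstack_free (&line_obstack, line) once
     it has finished with each returned line.  */

  static char *
  readline (FILE *f)
  {
    int ch;
    while ((ch = getc (f)) != EOF)
      {
        obstack_1grow (&line_obstack, ch);
        if (ch == '\n')
          break;
      }
    obstack_1grow (&line_obstack, 0);
    return (char *) obstack_finish (&line_obstack);
  }
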
>> For fixing the complexity: does this actually make things noticeably
>> faster in practice?  The “m” in the O(n*m) is a fairly small constant.
>> The cost of making it O(n*logm) this way involves many more memory
>> allocations, so it isn't obvious that it's actually more efficient in
>> practice.  Do you have any timings?
>
> Well, no... I didn't run any benchmarks.  Did you have any in mind that
> you wanted me to run?
>
> As I mentioned, the complexity change was just a byproduct of doing word
> matching in a way that is much easier to see is correct.

Ah, OK.

> But if you want me to just add extra checks to the existing implementation
> to see if a match meets all the criteria of being a standalone word, then
> sure, I can do that.  I don't particularly find that implementation easier
> to read, and I would have thought correctness was to be favored over memory
> allocation in something that is called once.
>
>>
>> If search time is a problem, building a hash_map might be better.
>
> I would have thought the red-black tree of a set is fine... But if I'm
> only allowed to use GCC internal structures then I will change it...

No, using C++ is fine.  The reason for suggesting hash_map (or hash_set,
as I should have said) was that, if complexity was an issue, hash_set/map
is supposed to have amortised constant lookup time for a good hash.

As far as the patch itself goes:

> @@ -126,6 +127,57 @@ parse_field (const char *field)
>    return fint;
>  }
>
> +/* Splits and returns a string based on whitespace and return it as
> +   part of a set.  Empty strings are ignored.  */
> +
> +static void
> +split_words (const std::string val, std::set<std::string> &result)

Not sure there's any benefit to marking “val” const, and adding it makes
it look like it was supposed to be “const std::string &” instead (which
should be more efficient).

> +{
> +  size_t cur, prev = 0;
> +  std::string word;
> +  while ((cur = val.find_first_of (" \n", prev)) != std::string::npos)
> +  {

The { … } should be indented two spaces relative to the “while”.

> +    word = val.substr (prev, cur - prev);
> +    /* Skip adding empty words.  */
> +    if (!word.empty ())
> +      result.insert (word);

Guess this is just personal preference, sorry, but:

  if (prev != cur)
    result.insert (std::string (prev, cur));

seems simpler.

> +    prev = cur+1;

Should be spaces around the “+”.

> +  }
> […]
> +
> +  word = val.substr (prev, cur - prev);
> +  if (!word.empty ())
> +    result.insert (word);
> +}
> +
> […]
> @@ -164,7 +216,6 @@ host_detect_local_cpu (int argc, const char **argv)
>  {
>    const char *res = NULL;
>    static const int num_exts = ARRAY_SIZE (aarch64_extensions);
> -  char buf[128];
>    FILE *f = NULL;
>    bool arch = false;
>    bool tune = false;
> @@ -178,6 +229,7 @@ host_detect_local_cpu (int argc, const char **argv)
>    bool processed_exts = false;
>    uint64_t extension_flags = 0;
>    uint64_t default_flags = 0;
> +  std::string buf;
>
>    gcc_assert (argc);
>
> @@ -202,9 +254,9 @@ host_detect_local_cpu (int argc, const char **argv)
>
>    /* Look through /proc/cpuinfo to determine the implementer
>       and then the part number that identifies a particular core.  */
> -  while (fgets (buf, sizeof (buf), f) != NULL)
> +  while (!(buf = readline (f)).empty ())
>      {
> -      if (strstr (buf, "implementer") != NULL)
> +      if (buf.find ("implementer") != std::string::npos)
>         {
>           unsigned cimp = parse_field (buf);
>           if (cimp == INVALID_IMP)
> @@ -216,8 +268,7 @@ host_detect_local_cpu (int argc, const char **argv)
>           else if (imp != cimp)
>             goto not_found;
>         }
> -
> -      if (strstr (buf, "variant") != NULL)
> +      else if (buf.find ("variant") != std::string::npos)
>         {
>           unsigned cvariant = parse_field (buf);
>           if (!contains_core_p (variants, cvariant))
> @@ -229,8 +280,7 @@ host_detect_local_cpu (int argc, const char **argv)
>             }
>           continue;
>         }
> -
> -      if (strstr (buf, "part") != NULL)
> +      else if (buf.find ("part") != std::string::npos)
>         {
>           unsigned ccore = parse_field (buf);
>           if (!contains_core_p (cores, ccore))
> @@ -242,39 +292,36 @@ host_detect_local_cpu (int argc, const char **argv)
>             }
>           continue;
>         }
> -      if (!tune && !processed_exts && strstr (buf, "Features") != NULL)
> +      else if (!tune && !processed_exts
> +               && buf.find ("Features") != std::string::npos)
>         {
> +          /* First create the list of features in the buffer.  */
> +          std::set<std::string> features;
> +          /* Drop everything till the :.  */
> +          buf = buf.substr (buf.find (':') + 1);

We should check the result of the find, rather than just assume that the
':' exists.  Looks like the current parse_field has the same problem
though…

Thanks,
Richard
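For illustration, the kind of guarded lookup meant in that last comment
might look roughly like this (an untested sketch, not part of the patch;
it assumes the surrounding host_detect_local_cpu context with its
not_found label, and the name “colon” is made up):

  /* If find returns npos, the unguarded substr would start at
     npos + 1 == 0 and silently keep the whole line.  */
  size_t colon = buf.find (':');
  if (colon == std::string::npos)
    goto not_found;
  buf = buf.substr (colon + 1);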