Re: [HarfBuzz] HB_CLOSURE_MAX_STAGES (was: harfbuzz: Branch 'master')

2018-07-30 Thread Richard Wordingham
On Mon, 30 Jul 2018 17:04:42 -0700
Behdad Esfahbod  wrote:

> On Thu, Jul 26, 2018 at 12:06 AM, Richard Wordingham <
> richard.wording...@ntlworld.com> wrote:  
> 
> > On Tue, 24 Jul 2018 16:31:50 + (UTC)
> > beh...@kemper.freedesktop.org (Behdad Esfahbod) wrote:
> >
> > The following change bothers me:
> >  
> > >  src/hb-ot-layout-common-private.hh |7 +++
> > >  src/hb-ot-layout.cc|5 -
> > >  2 files changed, 11 insertions(+), 1 deletion(-)
> > >
> > > New commits:
> > > commit 85646fdadb2f102333485e07425361795b4e0412
> > > Author: Garret Rieger 
> > > Date:   Mon Jul 23 15:37:18 2018 -0700
> > >
> > > [subset] Limit the iterations of the closure algorithm.
> > > Prevents O(n^2) run times.
> > >
> > > diff --git a/src/hb-ot-layout-common-private.hh
> > > b/src/hb-ot-layout-common-private.hh index 21caf9e9..7ff0dbeb
> > > 100644 --- a/src/hb-ot-layout-common-private.hh
> > > +++ b/src/hb-ot-layout-common-private.hh
> > > @@ -41,6 +41,13 @@
> > >  #ifndef HB_MAX_CONTEXT_LENGTH
> > >  #define HB_MAX_CONTEXT_LENGTH64
> > >  #endif
> > > +#ifndef HB_CLOSURE_MAX_STAGES
> > > +/*
> > > + * The maximum number of times a lookup can be applied during
> > > shaping.
> > > + * Used to limit the number of iterations of the closure
> > > algorithm.
> > > + */
> > > +#define HB_CLOSURE_MAX_STAGES8
> > > +#endif  
> >
> > I presume that this is intended to prevent a denial of service
> > attack, 
> 
> Correct.
> 
> 
> > at the cost of trashing a subset font.
> >  
> 
> Not really.
> 
> 
> > In non-malicious use, how is the victim supposed to detect that and
> > then how he needs to change HarfBuzz or his font?  Does he have to
> > read all the text using the subset font simply to detect a
> > problem?  How does one test that a font does not hit this limit?  
> 
> 
> It's impossible to hit that limit...  Ok, it would be impossible if we
> increase it to 32.  I'll do that.

That'll probably work, but I'm now intrigued.  Why have a limit that
will never be hit?  Are you just catering for HarfBuzz's logic simply
going badly wrong in very unusual circumstances?


The further points is just nit-picking and can be safely ignored.

> >   Does one have to
> > iterate over the power set of the supported characters for each
> > script?  That's O(2^n) - impossible to do!
> >
> > The description of HB_CLOSURE_MAX_STAGES is completely wrong.  I was
> > initially alarmed because I have lookups that are invoked in more
> > than 8 places in substitution subtables.  A more accurate, but
> > still not perfect, definition, would be 'the maximum number of
> > times lookup can change a bit of text'.
> >  
> 
> Nope.  Stage is a technical term in HarfBuzz GSUB processing.
> 
> According to OpenType spec, lookups are processed in increasing order
> of their indices.  This implies that each lookup is processed one.
> But then the script shaping specs say some features are applied
> separately.  Each of those separated list of features/lookups applied
> are called one stage.  The total number of stages in any shaper is
> the total number of times a lookup can be applied in theory.

That applies to lookups that are always formally unconditionally
applied. It doesn't apply to lookups invoked in response to context or
chaincontext lookups.

> Note
> that this does NOT limit recursion through Context and ChainContext
> lookups.

Richard.
___
HarfBuzz mailing list
HarfBuzz@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/harfbuzz


Re: [HarfBuzz] HB_CLOSURE_MAX_STAGES (was: harfbuzz: Branch 'master')

2018-07-30 Thread Behdad Esfahbod
On Thu, Jul 26, 2018 at 7:39 AM, Petr Kobalíček 
wrote:

> I initially thought that this was to prevent an infinite recursion of
> contextual lookups.
>
> I'm working with OpenType myself (not harfbuzz) and this is something that
> I think is not clarified in the specification. Can a contextual
> substitution invoke another contextual substitution and recurse?
>

Yes it can.



> Is HB_CLOSURE_MAX_STAGES here to enforce hard limit?
>

No.  But neighboring HB_MAX_NESTING_LEVEL does that.  Currently set to 6.

The font Noto Nastaliq Urdu uses nested recursive lookups extensively.


> To write a bit more about it. I thought that contextual lookup has
> basically 3 parts:
>
>   - backtrack sequence
>   - input sequence
>   - lookahead sequence
>
> I would imagine that "input" sequence would be pretty short, like one
> character most of the time, and the lookup applied if we have a match would
> only consist of "input sequence". So the question is, does it make sense to
> apply another contextual lookup to just the isolated "input sequence" in
> case we had a match?
>

Yes.


> Do you guys here know any material that would further explain how such
> cases of GSUB should be correctly handled?
>

How HarfBuzz does it is my best understanding of how it should be done (not
necessarily how Microsoft does it, but compatible-enough).


> Best,
> Petr.
>
>
> On Thu, Jul 26, 2018 at 9:06 AM, Richard Wordingham <
> richard.wording...@ntlworld.com> wrote:
>
>> On Tue, 24 Jul 2018 16:31:50 + (UTC)
>> beh...@kemper.freedesktop.org (Behdad Esfahbod) wrote:
>>
>> The following change bothers me:
>>
>> >  src/hb-ot-layout-common-private.hh |7 +++
>> >  src/hb-ot-layout.cc|5 -
>> >  2 files changed, 11 insertions(+), 1 deletion(-)
>> >
>> > New commits:
>> > commit 85646fdadb2f102333485e07425361795b4e0412
>> > Author: Garret Rieger 
>> > Date:   Mon Jul 23 15:37:18 2018 -0700
>> >
>> > [subset] Limit the iterations of the closure algorithm.
>> > Prevents O(n^2) run times.
>> >
>> > diff --git a/src/hb-ot-layout-common-private.hh
>> > b/src/hb-ot-layout-common-private.hh index 21caf9e9..7ff0dbeb 100644
>> > --- a/src/hb-ot-layout-common-private.hh
>> > +++ b/src/hb-ot-layout-common-private.hh
>> > @@ -41,6 +41,13 @@
>> >  #ifndef HB_MAX_CONTEXT_LENGTH
>> >  #define HB_MAX_CONTEXT_LENGTH64
>> >  #endif
>> > +#ifndef HB_CLOSURE_MAX_STAGES
>> > +/*
>> > + * The maximum number of times a lookup can be applied during
>> > shaping.
>> > + * Used to limit the number of iterations of the closure algorithm.
>> > + */
>> > +#define HB_CLOSURE_MAX_STAGES8
>> > +#endif
>>
>> I presume that this is intended to prevent a denial of service attack,
>> at the cost of trashing a subset font.
>>
>> In non-malicious use, how is the victim supposed to detect that and
>> then how he needs to change HarfBuzz or his font?  Does he have to read
>> all the text using the subset font simply to detect a problem?  How
>> does one test that a font does not hit this limit?  Does one have to
>> iterate over the power set of the supported characters for each
>> script?  That's O(2^n) - impossible to do!
>>
>> The description of HB_CLOSURE_MAX_STAGES is completely wrong.  I was
>> initially alarmed because I have lookups that are invoked in more than
>> 8 places in substitution subtables.  A more accurate, but still not
>> perfect, definition, would be 'the maximum number of times lookup can
>> change a bit of text'.
>>
>> A limit of 8 does not strike me as obviously generous.  Some contextual
>> changes can ripple through a string, and I would not be totally
>> surprised to find that 8+1 or more lookups act on some irreducible
>> strings in my Da Lekh font.  The consolations are that there are
>> probably shorter paths to create the resultant glyphs from the input
>> set, and one iteration will often process several lookups in the
>> correct sequence.
>>
>> Richard.
>> ___
>> HarfBuzz mailing list
>> HarfBuzz@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/harfbuzz
>>
>
>
> ___
> HarfBuzz mailing list
> HarfBuzz@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/harfbuzz
>
>


-- 
behdad
http://behdad.org/
___
HarfBuzz mailing list
HarfBuzz@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/harfbuzz


Re: [HarfBuzz] HB_CLOSURE_MAX_STAGES (was: harfbuzz: Branch 'master')

2018-07-30 Thread Behdad Esfahbod
On Thu, Jul 26, 2018 at 12:06 AM, Richard Wordingham <
richard.wording...@ntlworld.com> wrote:

> On Tue, 24 Jul 2018 16:31:50 + (UTC)
> beh...@kemper.freedesktop.org (Behdad Esfahbod) wrote:
>
> The following change bothers me:
>
> >  src/hb-ot-layout-common-private.hh |7 +++
> >  src/hb-ot-layout.cc|5 -
> >  2 files changed, 11 insertions(+), 1 deletion(-)
> >
> > New commits:
> > commit 85646fdadb2f102333485e07425361795b4e0412
> > Author: Garret Rieger 
> > Date:   Mon Jul 23 15:37:18 2018 -0700
> >
> > [subset] Limit the iterations of the closure algorithm.
> > Prevents O(n^2) run times.
> >
> > diff --git a/src/hb-ot-layout-common-private.hh
> > b/src/hb-ot-layout-common-private.hh index 21caf9e9..7ff0dbeb 100644
> > --- a/src/hb-ot-layout-common-private.hh
> > +++ b/src/hb-ot-layout-common-private.hh
> > @@ -41,6 +41,13 @@
> >  #ifndef HB_MAX_CONTEXT_LENGTH
> >  #define HB_MAX_CONTEXT_LENGTH64
> >  #endif
> > +#ifndef HB_CLOSURE_MAX_STAGES
> > +/*
> > + * The maximum number of times a lookup can be applied during
> > shaping.
> > + * Used to limit the number of iterations of the closure algorithm.
> > + */
> > +#define HB_CLOSURE_MAX_STAGES8
> > +#endif
>
> I presume that this is intended to prevent a denial of service attack,
>

Correct.


> at the cost of trashing a subset font.
>

Not really.


> In non-malicious use, how is the victim supposed to detect that and
> then how he needs to change HarfBuzz or his font?  Does he have to read
> all the text using the subset font simply to detect a problem?  How
> does one test that a font does not hit this limit?


It's impossible to hit that limit...  Ok, it would be impossible if we
increase it to 32.  I'll do that.


>   Does one have to
> iterate over the power set of the supported characters for each
> script?  That's O(2^n) - impossible to do!
>
> The description of HB_CLOSURE_MAX_STAGES is completely wrong.  I was
> initially alarmed because I have lookups that are invoked in more than
> 8 places in substitution subtables.  A more accurate, but still not
> perfect, definition, would be 'the maximum number of times lookup can
> change a bit of text'.
>

Nope.  Stage is a technical term in HarfBuzz GSUB processing.

According to OpenType spec, lookups are processed in increasing order of
their indices.  This implies that each lookup is processed one.  But then
the script shaping specs say some features are applied separately.  Each of
those separated list of features/lookups applied are called one stage.  The
total number of stages in any shaper is the total number of times a lookup
can be applied in theory.  Note that this does NOT limit recursion through
Context and ChainContext lookups.

commit 66ccd8ac405c9c25b37de9eb467a7382880dda35 (HEAD -> master,
origin/master)
Author: Behdad Esfahbod 
Date:   Mon Jul 30 17:03:06 2018 -0700

[serialize] Increase stage count from 8 to 32

Indic shaper uses many stages.  Now we are provably not limiting
functionality whereas the previous limit of 8 was assuming real-world
practices.

diff --git a/src/hb-ot-layout-common-private.hh
b/src/hb-ot-layout-common-private.hh
index dec237c8..1cf530ea 100644
--- a/src/hb-ot-layout-common-private.hh
+++ b/src/hb-ot-layout-common-private.hh
@@ -45,8 +45,10 @@
 /*
  * The maximum number of times a lookup can be applied during shaping.
  * Used to limit the number of iterations of the closure algorithm.
+ * This must be larger than the number of times add_pause() is
+ * called in a collect_features call of any shaper.
  */
-#define HB_CLOSURE_MAX_STAGES  8
+#define HB_CLOSURE_MAX_STAGES  32
 #endif





> A limit of 8 does not strike me as obviously generous.  Some contextual
> changes can ripple through a string, and I would not be totally
> surprised to find that 8+1 or more lookups act on some irreducible
> strings in my Da Lekh font.  The consolations are that there are
> probably shorter paths to create the resultant glyphs from the input
> set, and one iteration will often process several lookups in the
> correct sequence.
>
> Richard.
>

-- 
behdad
http://behdad.org/
___
HarfBuzz mailing list
HarfBuzz@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/harfbuzz


[HarfBuzz] harfbuzz: Branch 'master' - 2 commits

2018-07-30 Thread Behdad Esfahbod
 src/hb-machinery-private.hh|6 --
 src/hb-ot-layout-common-private.hh |4 +++-
 2 files changed, 3 insertions(+), 7 deletions(-)

New commits:
commit 66ccd8ac405c9c25b37de9eb467a7382880dda35
Author: Behdad Esfahbod 
Date:   Mon Jul 30 17:03:06 2018 -0700

[serialize] Increase stage count from 8 to 32

Indic shaper uses many stages.  Now we are provably not limiting
functionality whereas the previous limit of 8 was assuming real-world
practices.

diff --git a/src/hb-ot-layout-common-private.hh 
b/src/hb-ot-layout-common-private.hh
index dec237c8..1cf530ea 100644
--- a/src/hb-ot-layout-common-private.hh
+++ b/src/hb-ot-layout-common-private.hh
@@ -45,8 +45,10 @@
 /*
  * The maximum number of times a lookup can be applied during shaping.
  * Used to limit the number of iterations of the closure algorithm.
+ * This must be larger than the number of times add_pause() is
+ * called in a collect_features call of any shaper.
  */
-#define HB_CLOSURE_MAX_STAGES  8
+#define HB_CLOSURE_MAX_STAGES  32
 #endif
 
 
commit ee8cf919654cb191e955fe1f89b1ebfb2b8b32ee
Author: Behdad Esfahbod 
Date:   Mon Jul 30 16:59:41 2018 -0700

[serialize] Remove unused truncate() method

diff --git a/src/hb-machinery-private.hh b/src/hb-machinery-private.hh
index 653d7c6f..ff56b1dc 100644
--- a/src/hb-machinery-private.hh
+++ b/src/hb-machinery-private.hh
@@ -467,12 +467,6 @@ struct hb_serialize_context_t
 return reinterpret_cast (&obj);
   }
 
-  inline void truncate (void *new_head)
-  {
-assert (this->start < new_head && new_head <= this->head);
-this->head = (char *) new_head;
-  }
-
   unsigned int debug_depth;
   char *start, *end, *head;
   bool ran_out_of_room;
___
HarfBuzz mailing list
HarfBuzz@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/harfbuzz


[HarfBuzz] harfbuzz: Branch 'master'

2018-07-30 Thread Behdad Esfahbod
 src/hb-ot-layout.cc
|   26 --
 
test/api/fonts/clusterfuzz-testcase-minimized-hb-subset-fuzzer-5670861909524480 
   |binary
 
test/api/fonts/clusterfuzz-testcase-minimized-hb-subset-get-codepoints-fuzzer-6136125075750912
 |binary
 3 files changed, 21 insertions(+), 5 deletions(-)

New commits:
commit 5edf454aa64aad461c90bd991e7eaf27668b7e6b
Author: Garret Rieger 
Date:   Thu Jul 26 17:42:02 2018 -0700

[subset] During lookup collection remember the features we've already 
processed.

diff --git a/src/hb-ot-layout.cc b/src/hb-ot-layout.cc
index 01ca5143..b38dbc38 100644
--- a/src/hb-ot-layout.cc
+++ b/src/hb-ot-layout.cc
@@ -663,6 +663,7 @@ _hb_ot_layout_collect_lookups_features (hb_face_t  
*face,
unsigned intscript_index,
unsigned intlanguage_index,
const hb_tag_t *features,
+hb_set_t   *visited_features,
hb_set_t   *lookup_indexes /* OUT 
*/)
 {
   if (!features)
@@ -673,11 +674,15 @@ _hb_ot_layout_collect_lookups_features (hb_face_t  
*face,
script_index,
language_index,
&required_feature_index,
-   nullptr))
-  _hb_ot_layout_collect_lookups_lookups (face,
-table_tag,
-required_feature_index,
-lookup_indexes);
+   nullptr)
+&& !visited_features->has (required_feature_index))
+{
+_hb_ot_layout_collect_lookups_lookups (face,
+   table_tag,
+   required_feature_index,
+   lookup_indexes);
+visited_features->add (required_feature_index);
+}
 
 /* All features */
 unsigned int feature_indices[32];
@@ -694,10 +699,14 @@ _hb_ot_layout_collect_lookups_features (hb_face_t  
*face,
 feature_indices);
 
   for (unsigned int i = 0; i < len; i++)
+  {
+if (visited_features->has (feature_indices[i])) continue;
_hb_ot_layout_collect_lookups_lookups (face,
   table_tag,
   feature_indices[i],
   lookup_indexes);
+visited_features->add (feature_indices[i]);
+  }
 
   offset += len;
 } while (len == ARRAY_LENGTH (feature_indices));
@@ -727,6 +736,7 @@ _hb_ot_layout_collect_lookups_languages (hb_face_t  
*face,
 unsigned intscript_index,
 const hb_tag_t *languages,
 const hb_tag_t *features,
+ hb_set_t   *visited_features,
 hb_set_t   *lookup_indexes /* OUT 
*/)
 {
   _hb_ot_layout_collect_lookups_features (face,
@@ -734,6 +744,7 @@ _hb_ot_layout_collect_lookups_languages (hb_face_t  
*face,
  script_index,
  HB_OT_LAYOUT_DEFAULT_LANGUAGE_INDEX,
  features,
+  visited_features,
  lookup_indexes);
 
   if (!languages)
@@ -749,6 +760,7 @@ _hb_ot_layout_collect_lookups_languages (hb_face_t  
*face,
  script_index,
  language_index,
  features,
+  visited_features,
  lookup_indexes);
   }
   else
@@ -766,6 +778,7 @@ _hb_ot_layout_collect_lookups_languages (hb_face_t  
*face,
script_index,
language_index,
features,
+visited_features,
lookup_indexes);
 }
   }
@@ -784,6 +797,7 @@ hb_ot_layout_collect_lookups (hb_face_t  *face,
  const hb_tag_t *features,
  hb_set_t   *lookup_indexes /* OUT */)
 {
+  hb_auto_t visited_features;
   if (!scripts)
   {
 /* All scripts */
@@ -796,6 +810,7 @@ hb_ot_lay