On Monday, July 27, 2020 5:12 AM, Wojtek Porczyk <w...@invisiblethingslab.com> 
wrote:

> On Sat, Jul 25, 2020 at 10:46:40AM +0000, WillyPillow wrote:
> 

> > On Saturday, July 25, 2020 12:18 AM, Wojtek Porczyk 
> > w...@invisiblethingslab.com wrote:
> > 

> > > On Thu, Jul 23, 2020 at 05:45:56PM +0000, WillyPillow wrote:
> > 

> > > > One issue is that from the qrexec client side it is basically 
> > > > impossible to
> > > > distinguish between the two. (Consider the case where a field contains
> > > > `xxx\\\\na:b:c`.)
> > 

> > > If there are more colons that there are supposed to be, there is no need 
> > > to
> > > distinguish anything anymore, just error out for "malformed input" or
> > > something.
> > 

> > > In Python I like to do it with tuple assingment:
> > 

> > > try:
> > > field1, field2, field3 = untrusted_line.split(':')
> > > except TypeError:
> > > raise ParseError('error message')
> > 

> > By "distinguish between the two" I meant correct input and malformed input.
> > Sorry for the confusion.
> > The point I was trying to make in that example was that when there are 
> > colons
> > and newlines, there could be cases like the following:
> > 

> >     field1 = 'a'
> >     field2 = 'b'
> >     field3 = 'c\\nd:e:f'
> >     encoded = ':'.join([field1, field2, field3]) # a single entry
> >     encoded_list = '\\n'.join([encoded] * 10) # simulate multiple entries
> >     # on the other side...
> >     decoded_list = encoded_list.split('\\n') # becomes 20 rows
> >     for untrusted_line in decoded_list:
> >         field1, field2, field3 = untrusted_line.split(':') # this does not 
> > error out
> >     

> 

> -   Yes, this breaks something and is not distinguishable from template 
> manager.
> -   No, it's not malformed, because it conforms to specification.
> -   It's OK from security perspective: You cannot cause installation of 
> anything
>     the user didn't explicitly mean (by choosing an item on a list and/or
>     accepting a signing key).

I agree. To be clear, I was only bringing that up from a UX perspective, i.e.,
not always able to detect the error.

(To be pedantic though, to deal with wildcards and stuff, all the templates on
the returned list *are* installed in some cases, so it is possible for some
unwanted templates to be installed currently. That being said, i) it's fairly
easy to check that the template names indeed matches the given patterns and ii)
an accepted signing key is still required, so no big issue here.)

>     To reiterate: in optimal case this should be avoided from the sending 
> side,
>     but it't not a very high priority. It would be also OK if the tool and
>     specification did something like "only the first line with ':' skipped".
>     

>     The worst case is something can be shown on the list of available 
> templates
>     that actually cannot be installed.

Again, I agree. Hence the analogy to undefined behavior below.

> > > It's as simple as that. The big advantage is that there aren't many ways 
> > > to do
> > > something wrong.
> > 

> > > > Security-wise, this is unlikely to cause issues as an entity that can 
> > > > do this
> > > > can probably modify the repo contents directly.
> > 

> > > The point is, we don't know. The repo content is untrusted, and yes, 
> > > attacker
> > > can modify it. What counts is signature on RPM.
> > 

> > > > However, if the repo, by accident, does contain packages with, say, 
> > > > colons in
> > > > summaries, it may be an issue usability-wise as it's hard to give 
> > > > meaningful
> > > > error messages when things break.
> > 

> > > "Malformed input" is OK. If we break loudly, template maintainers (the 
> > > honest
> > > among them) won't publish such summary, because it will break.
> > 

> > The issue is that it's not always possible to detect such errors, as 
> > elaborated
> > above. Of course, as long as we keep this "wart" explicit, I'm okay with 
> > having
> > this as an "undefined behavior". (After all, the repo listing is considered
> > untrusted anyway.)
> 

> Sure.
> 

> > > > There's also the original issue with descriptions (assuming that we 
> > > > don't omit
> > > > them), which contains newlines a lot of the time.
> > > > That being said, if we treat such errors as "repo errors" and leave to 
> > > > the repo
> > > > maintainers to ensure that the fields follow a certain format, then we 
> > > > can just
> > > > use a special character for the separator [5] and ban the character 
> > > > from the
> > > > fields.
> > 

> > > Yes, and IIUC the current proposal is to have ':' as that special 
> > > character.
> > > Am I missing something?
> > 

> > The reason I'm bringing this up is twofold:
> > 

> > 1.  The idea of having the repo maintainer ensure such constraints have not 
> > been
> >     explicitly brought up before AFAIK.
> >     

> > 2.  I'm unsure whether `:` is the best choice for this, as it's not an 
> > uncommon
> >     character, and banning it might be undesirable. Something like ASCII 
> > 28-31
> >     are possible alternatives.
> >     

> 

> I think this should be one of the printable characters, so the output is easy
> to read and audit. Though ':' might well not be the best character, but there
> are others like '|' or even '!' (and I'd argue banning '!' from descriptions
> would do good to avoid shouting at people).

I see :)

> > > > [5]: The separator may also need to be placed at the end of the format 
> > > > string.
> > 

> > > I don't think so.
> > 

> > If there is no such character at EOLs, then `\\n` becomes another "special
> > character", as stuff would break when doing the `.split('\\n')` step. If, 
> > on the
> > other hand, there is such a character, we can be sure that every entry 
> > consists
> > of exactly k (some positive number) <separator>-seperated entries.
> 

> Having separator at the end gains nothing.

Maybe I'm missing something, but I don't see why this is the case.

My understanding of the situation is this: (I'm also attempting to summarize
the context for anyone else reading this; the directly relevant points start
from number 4.)

1. To allow for better features on the `search` and `info` operations, we want
   to incorporate additional metadata like URL, description, ...etc in the
results of `qubes.TemplateSearch`.
2. Doing this requires us to rethink the output format. Previously, as the only
   field that may contain `:` is the summary field, which i) does not contain
`\n` and ii) is the last field in the output format, parsing it like [commit]
works.
3. As the result of previous discussions, changing the separator from ':' to
   something less common like '|', and having a guideline that bans the latter
from the package metadata is an acceptable solution.
4. However, the description field still may contain newlines [6], making
   splitting the output into separate records not exactly trivial. For
instance, a description field containing `abc\ndef` will be broken up from the
`\n`, resulting in a description only containing `abc` and a broken row
containing `def` (assuming that the description is the last field). This causes
us to raise an exception for an otherwise conformant package.
   (Yes, technically, we can take advantage of the fact that there are multiple
fields (i.e., those other than the description) that are guaranteed not to
contain newlines and place them at "strategic locations" to mitigate this
(e.g., `Name|...|Description` and something like `re.split('\n(?=[^\n|]*\\|)',
output)`). However, I feel that this is additional complexity just for saving a
character.)
5. As such, the idea is to include a separator character at the end of the
   line. This allows us to know precisely where the description field ends, and
parse the entire output like the following:

    ``` py
    def parse(output):
        split = output.split('|')
        return [split[i:i+NUM_FIELDS]
            for i in range(0, len(split), NUM_FIELDS)] 

    ```

[6]: Note that it is probably not desirable to ban newlines directly. Not only
does it serve semantic purposes, existing tools like rpmlint also explicitly
complain about description lines longer than 79 characters.

[commit]:
https://github.com/QubesOS/qubes-core-admin-client/pull/145/commits/efda82eaa2bcaebdbc1dfd1c5dce717be46ae63e

> > Thanks,
> > WillyPillow
> 

> Thank you for this week! Tomorrow Marmarek returns, so he'll carry on.

Thanks for the awesome help!

Thanks,
WillyPillow

> https://blog.nerde.pw/
>
> PGP fingerprint = 6CCF 3FC7 32AC 9D83 D154 217F 1C16 C70E E7C3 1C84
>
> Protonmail PGP = D02D CEFF ACE5 5A7B FF5D 871E 4004 1CB1 F52B 127E

-- 
You received this message because you are subscribed to the Google Groups 
"qubes-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to qubes-devel+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/qubes-devel/tiy070qIdml0XSBCwexsJDTjIJJPH9Kyq9UQZ5OiJcv5cB5AHQCX1mMkWuwnoswhzzG6qruVJ29StvBKqQ-w9PZqZ_hYZ5H_tmknHpTyK2Y%3D%40nerde.pw.

Attachment: publickey - wp@nerde.pw - 0xD02DCEFF.asc
Description: application/pgp-keys

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to