On Wed, Mar 18, 2020 at 12:09 PM Brian Hulette <[email protected]> wrote:

> In Beam schemas we don't seem to have a well-defined policy around special
> characters (like $.[]) in field names. There's never any explicit
> validation, but we do have some ad-hoc rules (e.g. we use _ rather than the
> more natural . when concatenating field names in a nested select [1])
>
> I think we should explicitly allow any special character (any valid UTF-8
> character?) to be used in Beam schema field names. But in order to do this
> we will need to provide solutions for some edge cases. To my knowledge
> there are two problems that arise with some special characters in field
> names:
>
> 1. They can't be mapped to language types (e.g. Java Classes, and
> NamedTuples in python).
>

We already have this problem today, e.g. if you name a schema field int or
any other reserved word. We should disambiguate.
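
To make the Python side of this concrete, here is a rough sketch of one way
to disambiguate. It is only an illustration: `to_namedtuple_field` is a
hypothetical helper, not an existing Beam API, and the renaming rules (replace
invalid characters with `_`, prefix with `f`, append `_` to keywords) are
assumptions. Note that `int` is reserved in Java but is a legal field name in
Python, where the problem shows up with keywords such as `class` instead.

```python
import keyword
from collections import namedtuple


def to_namedtuple_field(field_name: str) -> str:
    """Hypothetical helper: derive a namedtuple-safe name from a schema field name."""
    # Replace every character that cannot appear inside a Python identifier.
    cleaned = "".join(
        ch if ("a" + ch).isidentifier() else "_" for ch in field_name
    )
    # namedtuple rejects empty names, names starting with a digit,
    # and names starting with an underscore, so prefix those with "f".
    if not cleaned.isidentifier() or cleaned.startswith("_"):
        cleaned = "f" + cleaned
    # Python keywords such as "class" get a trailing underscore.
    if keyword.iskeyword(cleaned):
        cleaned += "_"
    return cleaned


# Usage: build a row type from schema field names that are not valid as-is.
schema_fields = ["$col1", "class", "parent.child"]
Row = namedtuple("Row", [to_namedtuple_field(name) for name in schema_fields])
```

Two different schema fields can still collapse to the same name under this
scheme (`a.b` and `a_b`, for instance), so a real solution would also need a
collision rule or an explicit mapping like @SchemaFieldName.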


> 2. It can make field accesses ambiguous (i.e. does
> `FieldAccessDescriptor.withFieldNames("parent.child")` reference a field
> with that exact name or a nested field?).
>

I still think that we should reserve _some_ special characters. I'm not
sure what the use is for allowing any character to be used.


> We already have some precedent for (1) - Beam SQL produces field names
> like `$col1` for unaliased fields in query outputs, and this is allowed. If
> a user wants to map a schema with a field like this to a POJO, they have to
> first rename the incompatible field(s), or use an @SchemaFieldName
> annotation to map the field name. I think these are reasonable solutions.
>
> We do not have a solution for (2) though. I think we should allow the use
> of a backslash to escape characters that otherwise have special meaning for
> FieldAccessDescriptors (based on [2] this is .[]{}*).
>
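
For reference, the backslash escaping proposed above could behave roughly
like the sketch below. This is not the FieldAccessDescriptor parser: the
function names are hypothetical, and only the dot separator is interpreted
(brackets, braces and wildcards are treated as plain characters unless
escaped, which is an assumption made to keep the example short).

```python
from typing import List

# Characters that would need escaping, per the grammar referenced in [2],
# plus the backslash itself.
SPECIAL_CHARS = set(".[]{}*\\")


def escape_field_name(name: str) -> str:
    """Hypothetical helper: prefix each special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in name)


def split_field_path(path: str) -> List[str]:
    """Hypothetical helper: split a path on unescaped dots.

    A backslash makes the following character literal.
    """
    parts: List[str] = []
    current: List[str] = []
    i = 0
    while i < len(path):
        ch = path[i]
        if ch == "\\":
            if i + 1 >= len(path):
                raise ValueError("dangling escape at end of path")
            # Keep the escaped character, drop the backslash.
            current.append(path[i + 1])
            i += 2
        elif ch == ".":
            # An unescaped dot ends the current path component.
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts
```

With this, `split_field_path("parent.child")` yields two components (a nested
access), while `split_field_path(r"parent\.child")` yields the single field
name `parent.child`, which is the distinction item (2) asks for.
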
> Does anyone have any objection to this proposal, or is there anything I'm
> overlooking? If not, I'm happy to take the task to implement the escape
> character change.
>
> Brian
>
> [1]
> https://github.com/apache/beam/blob/8abc90b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Select.java#L186-L189
> [2]
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/antlr/org/apache/beam/sdk/schemas/parser/generated/FieldSpecifierNotation.g4
>
