Ahmet Altay created BEAM-2386:
---------------------------------

             Summary: Change regex used for splitting words
                 Key: BEAM-2386
                 URL: https://issues.apache.org/jira/browse/BEAM-2386
             Project: Beam
          Issue Type: Bug
          Components: sdk-py
            Reporter: Ahmet Altay
            Priority: Minor

The regex used for splitting words ({{[A-Za-z\']+}}) matches only ASCII Latin letters and the apostrophe, so it works only on Latin input. Change it so that it also works on non-Latin input. For an example, see the Java version: https://github.com/apache/beam/blob/367fcb28d544934797d25cb34d54136b2d7d6e99/examples/java/src/main/java/org/apache/beam/examples/common/ExampleUtils.java#L75
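
A minimal sketch of one possible change, not necessarily the fix that will be applied. It assumes Python 3 and the standard {{re}} module, which does not support {{\p{L}}}, so {{\w}} is used as an approximation. Note that {{\w}} also matches digits and underscores, so it is broader than a letters-only class. The names {{LATIN_ONLY}}, {{UNICODE_WORDS}} and {{split_words}} are hypothetical and exist only for this illustration.

```python
import re

# Current behaviour: only ASCII Latin letters and the apostrophe are kept.
LATIN_ONLY = re.compile(r"[A-Za-z']+")

# Assumed replacement: \w matches Unicode word characters for str patterns.
# re.UNICODE is already the default in Python 3 and is spelled out for clarity.
UNICODE_WORDS = re.compile(r"[\w']+", re.UNICODE)


def split_words(line, pattern=UNICODE_WORDS):
    """Return the list of word tokens found in `line` (hypothetical helper)."""
    return pattern.findall(line)


if __name__ == "__main__":
    sample = "caf\u00e9 \u4f60\u597d don't"
    print(split_words(sample, LATIN_ONLY))   # accented and CJK characters are lost
    print(split_words(sample))               # non-Latin words are kept whole
```

With the Latin-only pattern, accented letters cut a word short and CJK text is dropped entirely. The {{\w}}-based pattern keeps both, at the cost of also treating digits and underscores as word characters.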