Looks fine, except that processing all Unicode whitespace characters might
add overhead to the parsing process, potentially impacting performance,
although I think this is a moot point.

+1

Mich Talebzadeh,
Technologist | Solutions Architect | Data Engineer  | Generative AI
London
United Kingdom


   View my LinkedIn profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice, "one test result is worth one-thousand
expert opinions" (Wernher von Braun
<https://en.wikipedia.org/wiki/Wernher_von_Braun>).


On Wed, 27 Mar 2024 at 22:57, Gengliang Wang <ltn...@gmail.com> wrote:

> +1, this is a reasonable change.
>
> Gengliang
>
> On Wed, Mar 27, 2024 at 9:54 AM serge rielau.com <se...@rielau.com> wrote:
>
>> Going once, going twice, …. last call for objections
>> On Mar 23, 2024 at 5:29 PM -0700, serge rielau.com <se...@rielau.com>,
>> wrote:
>>
>> Hello,
>>
>> I have a PR https://github.com/apache/spark/pull/45620 ready to go that
>> will extend the definition of whitespace (what separates tokens) from the
>> small set of ASCII characters (space, tab, linefeed) to those defined in
>> Unicode.
>> While this is a small and safe change, it is one where we would have a
>> hard time changing our minds later.
>> It is also a change that, AFAIK, cannot be controlled under a config.
>>
>> What does the community think?
>>
>> Cheers
>> Serge
>> SQL Architect at Databricks
>>
>>
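For readers following the thread, here is a minimal sketch of the distinction
being discussed: splitting tokens on the three ASCII separators named above
versus splitting on Unicode whitespace. It is not the code from the PR. The
helper split_tokens and the sample query are hypothetical, and Python's
str.isspace is used only as a stand-in that is close to, but not identical to,
the Unicode White_Space property.

```python
# ASCII separators named in the thread: space, tab, linefeed.
ASCII_WS = {" ", "\t", "\n"}


def split_tokens(text, is_ws):
    """Split text into tokens; any char for which is_ws(ch) is true separates tokens.

    Hypothetical helper for illustration only, not Spark's lexer.
    """
    tokens, current = [], []
    for ch in text:
        if is_ws(ch):
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


# U+00A0 (NO-BREAK SPACE) and U+3000 (IDEOGRAPHIC SPACE) are whitespace in
# Unicode but are not among the ASCII separators above.
query = "SELECT\u00a01\u3000AS x"

# ASCII-only definition: the non-ASCII spaces stay glued to their neighbours.
ascii_tokens = split_tokens(query, lambda ch: ch in ASCII_WS)

# Unicode-aware definition (assumption: str.isspace approximates it).
unicode_tokens = split_tokens(query, str.isspace)

print(ascii_tokens)
print(unicode_tokens)
```

Under the ASCII-only rule the sample yields two tokens, while the
Unicode-aware rule yields four, which is the kind of behaviour change that
would be hard to reverse once queries depend on it.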
