Willi Raschkowski created SPARK-47307:
-----------------------------------------

             Summary: Spark 3.3 breaks base64
                 Key: SPARK-47307
                 URL: https://issues.apache.org/jira/browse/SPARK-47307
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: Willi Raschkowski


SPARK-37820 was introduced in Spark 3.3 and changes the behavior of {{base64}}.
The change itself is fine, but it shouldn't happen between minor versions.

{code:title=Spark 3.2}
In [1]: lorem = """
   ...: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc ac laoreet metus. Curabitur sollicitudin magna ac lacinia ornare. Pellentesque semper elit nunc, vestibulum ultricies elit bibendum sed. Praesent vehicula sodales odio, tincidunt laoreet diam laoreet non. Mauris condimentum lacinia laoreet. Mauris ultrices urna ut sapien dictum commodo faucibus nec nisl. Nulla mattis tincidunt orci eget semper. Etiam dignissim finibus mi et lacinia. Curabitur vitae sem commodo, euismod nisl at, molestie tortor. Quisque ornare, tortor a vulputate molestie, augue lectus blandit erat, nec efficitur justo metus ut dui. Morbi purus lectus, accumsan vitae sem vitae, faucibus aliquet quam. Donec euismod, nulla a porta hendrerit, lorem magna vestibulum nunc, et eleifend quam metus quis purus.
   ...: 
   ...: Praesent id velit scelerisque, varius eros ac, cursus quam. Duis mollis facilisis ante a dictum. Nunc nisl sem, fermentum non sagittis non, convallis nec lectus. Praesent nec nulla sed velit interdum tristique sit amet non nisl. Pellentesque rhoncus libero urna, eget condimentum orci tristique in. Donec a felis eu nisl laoreet efficitur. Integer velit justo, elementum a faucibus ac, fringilla ac nibh.
   ...: """

In [2]: spark.sql(f"""SELECT base64('{lorem}') AS base64""").collect()[0][0]
Out[2]: 
'CkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0LCBjb25zZWN0ZXR1ciBhZGlwaXNjaW5nIGVsaXQuIE51bmMgYWMgbGFvcmVldCBtZXR1cy4gQ3VyYWJpdHVyIHNvbGxpY2l0dWRpbiBtYWduYSBhYyBsYWNpbmlhIG9ybmFyZS4gUGVsbGVudGVzcXVlIHNlbXBlciBlbGl0IG51bmMsIHZlc3RpYnVsdW0gdWx0cmljaWVzIGVsaXQgYmliZW5kdW0gc2VkLiBQcmFlc2VudCB2ZWhpY3VsYSBzb2RhbGVzIG9kaW8sIHRpbmNpZHVudCBsYW9yZWV0IGRpYW0gbGFvcmVldCBub24uIE1hdXJpcyBjb25kaW1lbnR1bSBsYWNpbmlhIGxhb3JlZXQuIE1hdXJpcyB1bHRyaWNlcyB1cm5hIHV0IHNhcGllbiBkaWN0dW0gY29tbW9kbyBmYXVjaWJ1cyBuZWMgbmlzbC4gTnVsbGEgbWF0dGlzIHRpbmNpZHVudCBvcmNpIGVnZXQgc2VtcGVyLiBFdGlhbSBkaWduaXNzaW0gZmluaWJ1cyBtaSBldCBsYWNpbmlhLiBDdXJhYml0dXIgdml0YWUgc2VtIGNvbW1vZG8sIGV1aXNtb2QgbmlzbCBhdCwgbW9sZXN0aWUgdG9ydG9yLiBRdWlzcXVlIG9ybmFyZSwgdG9ydG9yIGEgdnVscHV0YXRlIG1vbGVzdGllLCBhdWd1ZSBsZWN0dXMgYmxhbmRpdCBlcmF0LCBuZWMgZWZmaWNpdHVyIGp1c3RvIG1ldHVzIHV0IGR1aS4gTW9yYmkgcHVydXMgbGVjdHVzLCBhY2N1bXNhbiB2aXRhZSBzZW0gdml0YWUsIGZhdWNpYnVzIGFsaXF1ZXQgcXVhbS4gRG9uZWMgZXVpc21vZCwgbnVsbGEgYSBwb3J0YSBoZW5kcmVyaXQsIGxvcmVtIG1hZ25hIHZlc3RpYnVsdW0gbnVuYywgZXQgZWxlaWZlbmQgcXVhbSBtZXR1cyBxdWlzIHB1cnVzLgoKUHJhZXNlbnQgaWQgdmVsaXQgc2NlbGVyaXNxdWUsIHZhcml1cyBlcm9zIGFjLCBjdXJzdXMgcXVhbS4gRHVpcyBtb2xsaXMgZmFjaWxpc2lzIGFudGUgYSBkaWN0dW0uIE51bmMgbmlzbCBzZW0sIGZlcm1lbnR1bSBub24gc2FnaXR0aXMgbm9uLCBjb252YWxsaXMgbmVjIGxlY3R1cy4gUHJhZXNlbnQgbmVjIG51bGxhIHNlZCB2ZWxpdCBpbnRlcmR1bSB0cmlzdGlxdWUgc2l0IGFtZXQgbm9uIG5pc2wuIFBlbGxlbnRlc3F1ZSByaG9uY3VzIGxpYmVybyB1cm5hLCBlZ2V0IGNvbmRpbWVudHVtIG9yY2kgdHJpc3RpcXVlIGluLiBEb25lYyBhIGZlbGlzIGV1IG5pc2wgbGFvcmVldCBlZmZpY2l0dXIuIEludGVnZXIgdmVsaXQganVzdG8sIGVsZW1lbnR1bSBhIGZhdWNpYnVzIGFjLCBmcmluZ2lsbGEgYWMgbmliaC4K'
{code}

Note the different output in Spark 3.3: the encoded string now contains a {{\r\n}} line break after every 76 characters.

{code:title=Spark 3.3}
In [1]: lorem = """
   ...: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc ac laoreet metus. Curabitur sollicitudin magna ac lacinia ornare. Pellentesque semper elit nunc, vestibulum ultricies elit bibendum sed. Praesent vehicula sodales odio, tincidunt laoreet diam laoreet non. Mauris condimentum lacinia laoreet. Mauris ultrices urna ut sapien dictum commodo faucibus nec nisl. Nulla mattis tincidunt orci eget semper. Etiam dignissim finibus mi et lacinia. Curabitur vitae sem commodo, euismod nisl at, molestie tortor. Quisque ornare, tortor a vulputate molestie, augue lectus blandit erat, nec efficitur justo metus ut dui. Morbi purus lectus, accumsan vitae sem vitae, faucibus aliquet quam. Donec euismod, nulla a porta hendrerit, lorem magna vestibulum nunc, et eleifend quam metus quis purus.
   ...: 
   ...: Praesent id velit scelerisque, varius eros ac, cursus quam. Duis mollis facilisis ante a dictum. Nunc nisl sem, fermentum non sagittis non, convallis nec lectus. Praesent nec nulla sed velit interdum tristique sit amet non nisl. Pellentesque rhoncus libero urna, eget condimentum orci tristique in. Donec a felis eu nisl laoreet efficitur. Integer velit justo, elementum a faucibus ac, fringilla ac nibh.
   ...: """

In [2]: spark.sql(f"""SELECT base64('{lorem}') AS base64""").collect()[0][0]
Out[2]: 
'CkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0LCBjb25zZWN0ZXR1ciBhZGlwaXNjaW5nIGVsaXQu\r\nIE51bmMgYWMgbGFvcmVldCBtZXR1cy4gQ3VyYWJpdHVyIHNvbGxpY2l0dWRpbiBtYWduYSBhYyBs\r\nYWNpbmlhIG9ybmFyZS4gUGVsbGVudGVzcXVlIHNlbXBlciBlbGl0IG51bmMsIHZlc3RpYnVsdW0g\r\ndWx0cmljaWVzIGVsaXQgYmliZW5kdW0gc2VkLiBQcmFlc2VudCB2ZWhpY3VsYSBzb2RhbGVzIG9k\r\naW8sIHRpbmNpZHVudCBsYW9yZWV0IGRpYW0gbGFvcmVldCBub24uIE1hdXJpcyBjb25kaW1lbnR1\r\nbSBsYWNpbmlhIGxhb3JlZXQuIE1hdXJpcyB1bHRyaWNlcyB1cm5hIHV0IHNhcGllbiBkaWN0dW0g\r\nY29tbW9kbyBmYXVjaWJ1cyBuZWMgbmlzbC4gTnVsbGEgbWF0dGlzIHRpbmNpZHVudCBvcmNpIGVn\r\nZXQgc2VtcGVyLiBFdGlhbSBkaWduaXNzaW0gZmluaWJ1cyBtaSBldCBsYWNpbmlhLiBDdXJhYml0\r\ndXIgdml0YWUgc2VtIGNvbW1vZG8sIGV1aXNtb2QgbmlzbCBhdCwgbW9sZXN0aWUgdG9ydG9yLiBR\r\ndWlzcXVlIG9ybmFyZSwgdG9ydG9yIGEgdnVscHV0YXRlIG1vbGVzdGllLCBhdWd1ZSBsZWN0dXMg\r\nYmxhbmRpdCBlcmF0LCBuZWMgZWZmaWNpdHVyIGp1c3RvIG1ldHVzIHV0IGR1aS4gTW9yYmkgcHVy\r\ndXMgbGVjdHVzLCBhY2N1bXNhbiB2aXRhZSBzZW0gdml0YWUsIGZhdWNpYnVzIGFsaXF1ZXQgcXVh\r\nbS4gRG9uZWMgZXVpc21vZCwgbnVsbGEgYSBwb3J0YSBoZW5kcmVyaXQsIGxvcmVtIG1hZ25hIHZl\r\nc3RpYnVsdW0gbnVuYywgZXQgZWxlaWZlbmQgcXVhbSBtZXR1cyBxdWlzIHB1cnVzLgoKUHJhZXNl\r\nbnQgaWQgdmVsaXQgc2NlbGVyaXNxdWUsIHZhcml1cyBlcm9zIGFjLCBjdXJzdXMgcXVhbS4gRHVp\r\ncyBtb2xsaXMgZmFjaWxpc2lzIGFudGUgYSBkaWN0dW0uIE51bmMgbmlzbCBzZW0sIGZlcm1lbnR1\r\nbSBub24gc2FnaXR0aXMgbm9uLCBjb252YWxsaXMgbmVjIGxlY3R1cy4gUHJhZXNlbnQgbmVjIG51\r\nbGxhIHNlZCB2ZWxpdCBpbnRlcmR1bSB0cmlzdGlxdWUgc2l0IGFtZXQgbm9uIG5pc2wuIFBlbGxl\r\nbnRlc3F1ZSByaG9uY3VzIGxpYmVybyB1cm5hLCBlZ2V0IGNvbmRpbWVudHVtIG9yY2kgdHJpc3Rp\r\ncXVlIGluLiBEb25lYyBhIGZlbGlzIGV1IG5pc2wgbGFvcmVldCBlZmZpY2l0dXIuIEludGVnZXIg\r\ndmVsaXQganVzdG8sIGVsZW1lbnR1bSBhIGZhdWNpYnVzIGFjLCBmcmluZ2lsbGEgYWMgbmliaC4K'
{code}
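
For readers comparing the two outputs, the difference can be reproduced outside Spark with a small standalone sketch. It uses only Python's standard {{base64}} module; the helper names ({{encode_unchunked}}, {{encode_chunked}}, {{strip_line_breaks}}) are hypothetical and this is not Spark's implementation. It assumes only what the outputs above show: the same base64 text, with a {{\r\n}} inserted after every 76 characters and none at the end.

```python
import base64


def encode_unchunked(data: bytes) -> str:
    """Base64 without any line breaks (the shape of the Spark 3.2 output)."""
    return base64.b64encode(data).decode("ascii")


def encode_chunked(data: bytes, width: int = 76) -> str:
    """Base64 split into `width`-character lines joined by CRLF.

    Assumption: this mimics the shape of the Spark 3.3 output shown above
    (CRLF between lines, no trailing CRLF). It is an illustration only.
    """
    flat = encode_unchunked(data)
    lines = [flat[i:i + width] for i in range(0, len(flat), width)]
    return "\r\n".join(lines)


def strip_line_breaks(encoded: str) -> str:
    """Remove the CRLF separators to recover the unchunked form."""
    return encoded.replace("\r\n", "")


# The first sentences of the test string from the report (leading newline included).
sample = b"\nLorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc ac laoreet metus."

flat = encode_unchunked(sample)
chunked = encode_chunked(sample)

print(repr(flat))
print(repr(chunked))
print(strip_line_breaks(chunked) == flat)
```

A consumer that must accept output from both versions could apply something like {{strip_line_breaks}} before comparing or storing encoded values; whether that is acceptable depends on the downstream system.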


