[ https://issues.apache.org/jira/browse/SPARK-29041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941289#comment-16941289 ]
Ruslan Dautkhanov commented on SPARK-29041: ------------------------------------------- Thank you [~hyukjin.kwon] Our users say this issue exists in 2.3 too.. could it be possible to apply that patch to 2.3 branch as well? > Allow createDataFrame to accept bytes as binary type > ---------------------------------------------------- > > Key: SPARK-29041 > URL: https://issues.apache.org/jira/browse/SPARK-29041 > Project: Spark > Issue Type: Bug > Components: PySpark > Affects Versions: 2.4.4, 3.0.0 > Reporter: Hyukjin Kwon > Assignee: Hyukjin Kwon > Priority: Major > Fix For: 3.0.0 > > > {code} > spark.createDataFrame([[b"abcd"]], "col binary") > {code} > simply fails as below: > in Python 3 > {code} > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > File "/.../spark/python/pyspark/sql/session.py", line 787, in > createDataFrame > rdd, schema = self._createFromLocal(map(prepare, data), schema) > File "/.../spark/python/pyspark/sql/session.py", line 442, in > _createFromLocal > data = list(data) > File "/.../spark/python/pyspark/sql/session.py", line 769, in prepare > verify_func(obj) > File "/.../forked/spark/python/pyspark/sql/types.py", line 1403, in verify > verify_value(obj) > File "/.../spark/python/pyspark/sql/types.py", line 1384, in verify_struct > verifier(v) > File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify > verify_value(obj) > File "/.../spark/python/pyspark/sql/types.py", line 1397, in verify_default > verify_acceptable_types(obj) > File "/.../spark/python/pyspark/sql/types.py", line 1282, in > verify_acceptable_types > % (dataType, obj, type(obj)))) > TypeError: field col: BinaryType can not accept object b'abcd' in type <class > 'bytes'> > {code} > in Python 2: > {code} > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > File "/.../spark/python/pyspark/sql/session.py", line 787, in > createDataFrame > rdd, schema = self._createFromLocal(map(prepare, data), schema) > File "/.../spark/python/pyspark/sql/session.py", line 442, in > _createFromLocal > data = list(data) > File "/.../spark/python/pyspark/sql/session.py", line 769, in prepare > verify_func(obj) > File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify > verify_value(obj) > File "/.../spark/python/pyspark/sql/types.py", line 1384, in verify_struct > verifier(v) > File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify > verify_value(obj) > File "/.../spark/python/pyspark/sql/types.py", line 1397, in verify_default > verify_acceptable_types(obj) > File "/.../spark/python/pyspark/sql/types.py", line 1282, in > verify_acceptable_types > % (dataType, obj, type(obj)))) > TypeError: field col: BinaryType can not accept object 'abcd' in type <type > 'str'> > {code} > {{bytes}} should also be able to accepted as binary type -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org