Indeed, in spark-shell I always omit the parentheses:
scala> sc.parallelize(List(3,2,1,4)).toDF.show
+-----+
|value|
+-----+
| 3|
| 2|
| 1|
| 4|
+-----+
So I thought the same would work in pyspark. But this still doesn't work. Why?
sc.parallelize([3,2,1,4]).toDF().show()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/spark/python/pyspark/sql/session.py", line 66, in toDF
    return sparkSession.createDataFrame(self, schema, sampleRatio)
  File "/opt/spark/python/pyspark/sql/session.py", line 675, in createDataFrame
    return self._create_dataframe(data, schema, samplingRatio, verifySchema)
  File "/opt/spark/python/pyspark/sql/session.py", line 698, in _create_dataframe
    rdd, schema = self._createFromRDD(data.map(prepare), schema, samplingRatio)
  File "/opt/spark/python/pyspark/sql/session.py", line 486, in _createFromRDD
    struct = self._inferSchema(rdd, samplingRatio, names=schema)
  File "/opt/spark/python/pyspark/sql/session.py", line 466, in _inferSchema
    schema = _infer_schema(first, names=names)
  File "/opt/spark/python/pyspark/sql/types.py", line 1067, in _infer_schema
    raise TypeError("Can not infer schema for type: %s" % type(row))
TypeError: Can not infer schema for type: <class 'int'>
Spark 3.2.0
On 07/02/2022 11:44, Sean Owen wrote:
This is just basic Python - you're missing parentheses on toDF, so you
are not calling the function or getting its result.
On Sun, Feb 6, 2022 at 9:39 PM <capitnfrak...@free.fr> wrote:
I am a bit confused: why doesn't this work in pyspark?
x = sc.parallelize([3,2,1,4])
x.toDF.show()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'function' object has no attribute 'show'
Thank you.
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org