David Fagnan created SPARK-13179:
------------------------------------
Summary: pyspark row name collision 'count'
Key: SPARK-13179
URL: https://issues.apache.org/jira/browse/SPARK-13179
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 1.6.0
Reporter: David Fagnan
The following example from the documentation results in a name collision:
{code:none}
>>> from pyspark.sql import Row
>>> df = sc.parallelize([Row(name='Alice', age=5, height=80),
...                      Row(name='Alice', age=10, height=140)]).toDF()
>>> alice_counts = df.groupby(df.name).count().collect()
>>> print(alice_counts[0])
Row(name=u'Alice',count=2)
>>> print(alice_counts[0].name)
Alice
{code}
This output is correct. However, accessing the count column as an attribute returns a method instead of the value:
{code:none}
>>> print(alice_counts[0].count)
<built-in method count of Row object at 0x...>
{code}
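The collision occurs because Row subclasses Python's built-in tuple, which already defines a count() method. Row resolves field names in __getattr__, and Python calls __getattr__ only when normal attribute lookup fails, so the inherited tuple.count method shadows the count field.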
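The shadowing can be reproduced without Spark. The sketch below uses a hypothetical MiniRow class that mimics only the two properties that matter here: like PySpark's Row, it subclasses tuple and looks up field names in __getattr__. It is an illustration under that assumption, not PySpark's actual Row implementation.

{code:python}
class MiniRow(tuple):
    """Hypothetical stand-in for pyspark.sql.Row: a tuple subclass that
    resolves field names in __getattr__ (illustration only)."""

    def __new__(cls, **kwargs):
        names = list(kwargs)  # keyword order is preserved on Python 3.7+
        row = tuple.__new__(cls, [kwargs[n] for n in names])
        row.__fields__ = names  # stored in the instance __dict__
        return row

    def __getattr__(self, item):
        # Python calls this only when normal lookup fails, i.e. for names
        # that are not already attributes of tuple or of this class.
        try:
            return self[self.__fields__.index(item)]
        except ValueError:
            raise AttributeError(item)


row = MiniRow(name="Alice", count=2)
print(row.name)             # Alice: resolved through __getattr__
print(callable(row.count))  # True: tuple.count is found first
print(row.count(2))         # 1: counts occurrences of 2 in the tuple
print(row[row.__fields__.index("count")])  # 2: the field value itself
{code}

Because __getattr__ is only a fallback, any field named after a tuple method (count, index) cannot be read as an attribute. With a real PySpark Row, positional access such as alice_counts[0][1], or alice_counts[0].asDict()['count'], should still return the value.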