Max Moroz created SPARK-16204:
---------------------------------

Summary: Row() interface
Key: SPARK-16204
URL: https://issues.apache.org/jira/browse/SPARK-16204
Project: Spark
Issue Type: Improvement
Components: PySpark
Affects Versions: 2.0.0
Reporter: Max Moroz
Priority: Trivial

Row('a', 'b') creates a Row-like class, which is slightly unexpected. To create an actual Row, one needs Row(field1='a', field2='b'). Of course, Row('a', 'b')('a', 'b') does create a row.

I understand the logic; it's similar to namedtuple. But there is a difference: namedtuple *only* creates classes, while Row creates both Row-like classes and record-like instances.

Wouldn't it be possible to do something slightly safer? For example, expose the class-creation interface through something else, such as a global function, a Row class method, or a brand new class like RowFactory. Overloading __init__ to create both records and classes seems unnecessarily dangerous.

In addition, Row('a', 'b') allows the creation of invalid classes (where the field names are not strings); it would be better to catch that early rather than let it happen silently and fail later (for example, when someone tries print(Row('a', 42))).

And finally, key in Row(field1='a', field2='b') seems to search through the values instead of the keys as promised in the documentation, at least in 1.6.1 (admittedly the docs only mention it in 2.0.0, but I thought this had not changed between the versions?).
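
To make the suggestion concrete, here is a rough sketch of what a separate class-creation entry point could look like. It is plain Python built on namedtuple, not PySpark code; the names row_factory and Record are hypothetical and only illustrate the idea of keeping class creation apart from instance creation and rejecting non-string field names up front.

```python
from collections import namedtuple


def row_factory(*field_names):
    """Hypothetical helper (not a PySpark API): build a record class.

    Field names are validated here, so a bad name fails at class-creation
    time instead of later, e.g. when the object is printed.
    """
    for name in field_names:
        if not isinstance(name, str):
            raise TypeError(
                f"field names must be strings, got {name!r} "
                f"({type(name).__name__})"
            )
    return namedtuple("Record", field_names)


# Class creation is one explicit step ...
Person = row_factory("name", "age")
# ... and instance creation is another.
alice = Person("Alice", 11)

# A non-string field name is rejected immediately.
try:
    row_factory("a", 42)
except TypeError as exc:
    error_message = str(exc)
```

With an interface along these lines, a call with positional arguments could never be mistaken for creating a record, since only the factory produces classes.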