Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21654#discussion_r217808692

--- Diff: python/pyspark/sql/dataframe.py ---
@@ -375,6 +375,9 @@ def _truncate(self):
         return int(self.sql_ctx.getConf(
             "spark.sql.repl.eagerEval.truncate", "20"))
 
+    def __len__(self):
--- End diff --

Well, those are a bit harder to say. I _think_ `iter` might be reasonable (my main concern is if folks tried to use `map(lambda x: ..., df)`), but those aren't the parts of the API we're talking about right now, and this is starting to border on a broader design decision we should consider taking to the list. Given the timeline for Spark 3, this seems like a good time to have these discussions anyway; maybe we can look at Dask for some inspiration on how to provide a more Python-friendly API while still encouraging good design on the part of our users.

That being said, I think the potential confusion of `iter` or indexing into a DF shouldn't block adding other, more reasonable helpers.
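To make the `map` concern concrete, here is a minimal sketch using a hypothetical `ToyDataFrame` class (not PySpark's API or the implementation in this PR). It assumes `__len__` would delegate to a `count()`-style action and `__iter__` to a `collect()`-style one. Once `__iter__` exists, built-ins like `map` happily accept the DataFrame and run the lambda locally over every collected row, rather than as a distributed transformation.

```python
class ToyDataFrame:
    """Hypothetical stand-in for a distributed DataFrame (not PySpark).

    count() and collect() play the role of expensive, job-triggering
    actions; actions_run records how many of them were triggered.
    """

    def __init__(self, rows):
        self._rows = list(rows)
        self.actions_run = 0

    def count(self):
        # In a real engine this would launch a distributed job.
        self.actions_run += 1
        return len(self._rows)

    def collect(self):
        # In a real engine this would pull every row back to the driver.
        self.actions_run += 1
        return list(self._rows)

    def __len__(self):
        # Assumed design: len(df) delegates to count().
        return self.count()

    def __iter__(self):
        # Assumed design: iterating materialises all rows locally.
        return iter(self.collect())


df = ToyDataFrame([1, 2, 3])
print(len(df))                             # triggers one count()-style action
doubled = list(map(lambda x: x * 2, df))   # lambda runs locally on collected rows
print(doubled)
print(df.actions_run)                      # both calls above were actions
```

The point of the sketch is that nothing in the `map` call signals that a full collect happened; on a large DataFrame that silent materialisation is exactly the kind of surprise worth discussing on the list before adding `__iter__`.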