timsaucer commented on code in PR #1015:
URL: https://github.com/apache/datafusion-python/pull/1015#discussion_r1957173874

##########
src/dataframe.rs:
##########
@@ -90,8 +91,16 @@ impl PyDataFrame {
     }

     fn __repr__(&self, py: Python) -> PyDataFusionResult<String> {
-        let df = self.df.as_ref().clone().limit(0, Some(10))?;
-        let batches = wait_for_future(py, df.collect())?;
+        let df = self.df.as_ref().clone();
+
+        let stream = wait_for_future(py, df.execute_stream()).map_err(py_datafusion_err)?;
+
+        let batches: Vec<RecordBatch> = wait_for_future(
+            py,
+            stream.take(10).collect::<Vec<_>>())
+            .into_iter()
+            .collect::<Result<Vec<_>,_>>()?;
+

Review Comment:
   I ran a test, and this changes how `__repr__` behaves compared to what we currently have. With this change it appears to return the first 10 record batches rather than the first 10 rows, which is what I would expect. The idea behind putting `limit(0, Some(10))` into the logical plan was to get a small sample of the data. I think we need to change this so that it addresses the bug without changing the output here. I suspect we have the same problem in `_repr_html_`.

##########
src/dataframe.rs:
##########
@@ -90,8 +91,16 @@ impl PyDataFrame {
     }

     fn __repr__(&self, py: Python) -> PyDataFusionResult<String> {
-        let df = self.df.as_ref().clone().limit(0, Some(10))?;
-        let batches = wait_for_future(py, df.collect())?;
+        let df = self.df.as_ref().clone();
+
+        let stream = wait_for_future(py, df.execute_stream()).map_err(py_datafusion_err)?;
+
+        let batches: Vec<RecordBatch> = wait_for_future(
+            py,
+            stream.take(10).collect::<Vec<_>>())
+            .into_iter()
+            .collect::<Result<Vec<_>,_>>()?;
+

Review Comment:
   As a side note, I wonder if we want to enhance `__repr__` to also report the total number of rows in the DataFrame. My guess is that we don't want to do that, since it would mean counting every row. But if we did, we could add a line in the returned string along the lines of `... {} additional rows`. A lighter-weight option would be to change the limit to 11, check how many rows come back, show the first 10, and, if 11 were returned, add `... and additional rows` so the end user knows they are only seeing a portion of the DataFrame.
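
   To make the two points above concrete, here is a minimal sketch that uses only the standard library. It is not the code from this PR: `Batch` is a hypothetical stand-in for an Arrow `RecordBatch` (a plain vector of rows), and `preview_rows` and `render_preview` are illustrative names. It shows why taking 10 items from a stream of batches is not the same as taking 10 rows, and how fetching `limit + 1` rows is enough to tell whether the output was truncated.

```rust
/// Stand-in for an Arrow `RecordBatch`: here a batch is just a vector of rows.
/// (Hypothetical simplification so the sketch needs only the standard library.)
type Batch = Vec<i64>;

/// Pull batches from `stream` until `limit + 1` rows have been seen, then stop.
/// Returns at most `limit` rows plus a flag saying whether more rows exist.
fn preview_rows<I>(stream: I, limit: usize) -> (Vec<i64>, bool)
where
    I: IntoIterator<Item = Batch>,
{
    let mut rows = Vec::with_capacity(limit + 1);
    for batch in stream {
        // Only take as many rows as are still needed to reach `limit + 1`.
        let needed = limit + 1 - rows.len();
        rows.extend(batch.into_iter().take(needed));
        if rows.len() > limit {
            // One extra row is enough to know the preview is truncated.
            break;
        }
    }
    let has_more = rows.len() > limit;
    rows.truncate(limit);
    (rows, has_more)
}

/// Render the preview, appending a trailer line when rows were cut off.
fn render_preview(rows: &[i64], has_more: bool) -> String {
    let mut out: Vec<String> = rows.iter().map(|r| r.to_string()).collect();
    if has_more {
        out.push("... additional rows".to_string());
    }
    out.join("\n")
}
```

   With five batches of three rows each, taking the first 10 batches yields all 15 rows, whereas `preview_rows(batches, 10)` yields exactly 10 rows and reports that more are available. The loop also stops pulling batches as soon as the extra row is seen, so the rest of the stream is never consumed.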