Spaarsh commented on issue #1026:
URL:
https://github.com/apache/datafusion-python/issues/1026#issuecomment-2689730430
I'd like to work on this issue. Adding a few lines of code along these
lines:
```
fn __repr__(&self, py: Python) -> PyDataFusionResult<String> {
    // Fetch 11 rows: 10 to display, plus one to detect that more exist.
    let df = self.df.as_ref().clone().limit(0, Some(11))?;
    let batches = wait_for_future(py, df.collect())?;
    let num_rows = batches.iter().map(|batch| batch.num_rows()).sum::<usize>();
    // Cap the output at 10 rows (not 10 batches) by slicing each batch.
    let mut remaining = 10;
    let limited_batches = batches
        .iter()
        .map(|batch| {
            let n = batch.num_rows().min(remaining);
            remaining -= n;
            batch.slice(0, n)
        })
        .collect::<Vec<_>>();
    let batches_as_string = pretty::pretty_format_batches(&limited_batches);
    match batches_as_string {
        Ok(batch) => {
            if num_rows > 10 {
                Ok(format!("DataFrame()\n{batch}\nand more..."))
            } else {
                Ok(format!("DataFrame()\n{batch}"))
            }
        }
        Err(err) => Ok(format!("Error: {:?}", err.to_string())),
    }
}
```
Should suffice, I suppose?
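The underlying pattern (request one row more than you display, and use the extra row only to decide whether to print the trailer) can be illustrated without Arrow or DataFusion. This is a minimal sketch on plain strings; `preview` and its parameters are hypothetical names, not part of datafusion-python.

```rust
/// Render at most `max_rows` items, appending a trailer when more exist.
/// `rows` stands in for the result of a query limited to `max_rows + 1` rows.
fn preview(rows: &[String], max_rows: usize) -> String {
    // Only the first `max_rows` items are ever displayed.
    let shown = rows
        .iter()
        .take(max_rows)
        .cloned()
        .collect::<Vec<_>>()
        .join("\n");
    // The extra row is never printed; it only signals truncation.
    if rows.len() > max_rows {
        format!("{shown}\nand more...")
    } else {
        shown
    }
}
```

Calling `preview(&rows, 10)` on 11 fetched rows would print the first 10 followed by `and more...`, while 10 or fewer rows would be printed with no trailer.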
> You could also implement a "config" system like pandas uses, so the user
> can opt-in to displaying more columns or rows
> https://pandas.pydata.org/docs/user_guide/options.html#overview
As for the config, we'd need to decide on a format. I would suggest `toml`,
since `Cargo` already uses it. That probably deserves its own issue, though,
since I'm sure a host of other things could benefit from such a system.
We could also start on it as part of this issue, if that's all right.
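To make the idea concrete, here is a rough sketch of what such options might look like on the Rust side, independent of how they end up being loaded (TOML or otherwise). Everything here is an assumption for discussion: `DisplayOptions`, its fields, and the default values are hypothetical, with the field names loosely borrowed from pandas' `display.max_rows` and `display.max_columns`.

```rust
/// Hypothetical display options; not an existing datafusion-python type.
#[derive(Debug, Clone, PartialEq)]
struct DisplayOptions {
    /// Maximum number of rows printed by `__repr__`.
    max_rows: usize,
    /// Maximum number of columns printed by `__repr__`.
    max_columns: usize,
}

impl Default for DisplayOptions {
    fn default() -> Self {
        // 10 matches the hard-coded row count above; 20 is an arbitrary placeholder.
        Self { max_rows: 10, max_columns: 20 }
    }
}

impl DisplayOptions {
    /// Rows to request from the query: one more than displayed,
    /// so that truncation can be detected.
    fn fetch_limit(&self) -> usize {
        self.max_rows.saturating_add(1)
    }
}
```

With something like this, the `Some(11)` in `__repr__` could become `Some(options.fetch_limit())`, and the row cap could read `options.max_rows` instead of a literal.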
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]