timsaucer opened a new issue, #713:
URL: https://github.com/apache/datafusion-python/issues/713

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   Many users, especially those trying DataFusion for the first 
time, work in notebooks such as Jupyter or Databricks. It would 
be a nice feature to have dataframes shown in these notebooks rendered as 
HTML, as some other dataframe libraries do.
   
   **Describe the solution you'd like**
   
   In order to do this, we need to implement `_repr_html_` on the `PyDataFrame` 
object. This can operate in the same manner as `show()` and limit the output to 
a few lines. Additional enhancements could include setting config parameters 
for how much data to show.
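
A minimal sketch of what such a configurable limit could look like, using only the Rust standard library. `HtmlReprConfig`, its `max_rows` field, and `render_html_table` are hypothetical names chosen for illustration and are not part of DataFusion; the rows are assumed to be already formatted as strings.

```rust
/// Hypothetical display settings; the field name is an assumption for illustration.
pub struct HtmlReprConfig {
    pub max_rows: usize,
}

impl Default for HtmlReprConfig {
    fn default() -> Self {
        // Assumed default of ten rows, matching `limit(0, Some(10))` in the issue's sketch.
        Self { max_rows: 10 }
    }
}

/// Render a header and already-formatted rows as an HTML table,
/// emitting at most `config.max_rows` data rows.
pub fn render_html_table(
    header: &[&str],
    rows: &[Vec<String>],
    config: &HtmlReprConfig,
) -> String {
    let mut html = String::from("<table border='1'>\n");

    // Header row: one <th> per column name.
    let head: String = header.iter().map(|h| format!("<th>{}</th>", h)).collect();
    html.push_str(&format!("<tr>{}</tr>\n", head));

    // Data rows: stop once the configured limit is reached.
    for row in rows.iter().take(config.max_rows) {
        let cells: String = row.iter().map(|c| format!("<td>{}</td>", c)).collect();
        html.push_str(&format!("<tr>{}</tr>\n", cells));
    }

    html.push_str("</table>\n");
    html
}
```

In a real implementation the limit would more likely be pushed into the query itself, so that only the rows to be displayed are collected.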
   
   **Describe alternatives you've considered**
   
   The alternative is to continue to use `show()` to inspect the data. 
Users can also convert the dataframe to pandas and then use its rendering capability.
   
   **Additional context**
   
   Here is a minimal demonstrable version we could start with in `PyDataFrame`:
   
   ```
       fn _repr_html_(&self, py: Python) -> PyResult<String> {
           let mut html_str = "<table border='1'>\n".to_string();
   
   
           let df = self.df.as_ref().clone().limit(0, Some(10))?;
           let batches = wait_for_future(py, df.collect())?;
   
           if batches.is_empty() {
               html_str.push_str("</table>\n");
               return Ok(html_str);
           }
   
           let schema = batches[0].schema();
   
           let mut header = Vec::new();
           for field in schema.fields() {
               header.push(format!("<th>{}</th>", field.name()));
           }
           let header_str = header.join("");
           html_str.push_str(&format!("<tr>{}</tr>\n", header_str));
   
           for batch in batches {
               let formatters = batch
                   .columns()
                   .iter()
                   .map(|c| ArrayFormatter::try_new(c.as_ref(), &FormatOptions::default()))
                   .map(|c| c.map_err(|e| PyValueError::new_err(format!("Error: {:?}", e.to_string()))))
                   .collect::<Result<Vec<_>, _>>()?;
   
               for row in 0..batch.num_rows() {
                   let mut cells = Vec::new();
                   for formatter in &formatters {
                       cells.push(format!("<td>{}</td>", formatter.value(row)));
                   }
                   let row_str = cells.join("");
                   html_str.push_str(&format!("<tr>{}</tr>\n", row_str));
               }
           }
   
           html_str.push_str("</table>\n");
   
           Ok(html_str)
       }
   ```
   
   This produces the following example:
   ![Screenshot 2024-05-22 at 3 02 07 PM](https://github.com/apache/datafusion-python/assets/24943992/b69a1522-a711-4173-a570-d2e136d461e7)
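
One gap in the sketch above is that `field.name()` and `formatter.value(row)` are written into the markup unescaped, so a column name or string value containing `<` or `&` would be interpreted as HTML by the notebook. A small helper along these lines could wrap both calls; `escape_html` is a hypothetical name, and this is only a sketch using the standard library.

```rust
/// Escape the characters that are significant in HTML text and attribute values.
pub fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            // Every other character is copied through unchanged.
            _ => out.push(ch),
        }
    }
    out
}
```

With that in place, a cell could be built as `format!("<td>{}</td>", escape_html(&formatter.value(row).to_string()))`, assuming the formatted value implements `Display` as it is used in the sketch.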
   

