[ https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=397104&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397104 ]
ASF GitHub Bot logged work on BEAM-7926:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Mar/20 22:50
            Start Date: 03/Mar/20 22:50
    Worklog Time Spent: 10m
      Work Description: KevinGG commented on pull request #11020: [BEAM-7926] Update Data Visualization
URL: https://github.com/apache/beam/pull/11020#discussion_r387281998

 ##########
 File path: sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py
 ##########
 @@ -238,31 +322,57 @@ def _display_dive(self, data, update=None):
      display(HTML(html))

    def _display_overview(self, data, update=None):
 +    if (not data.empty and self._include_window_info and
 +        all(column in data.columns
 +            for column in ('event_time', 'windows', 'pane_info'))):
 +      data = data.drop(['event_time', 'windows', 'pane_info'], axis=1)
 +
      gfsg = GenericFeatureStatisticsGenerator()
      proto = gfsg.ProtoFromDataFrames([{'name': 'data', 'table': data}])
      protostr = base64.b64encode(proto.SerializeToString()).decode('utf-8')
      if update:
        script = _OVERVIEW_SCRIPT_TEMPLATE.format(
 -          display_id=update, protostr=protostr)
 +          display_id=update._overview_display_id, protostr=protostr)
        display_javascript(Javascript(script))
      else:
        html = _OVERVIEW_HTML_TEMPLATE.format(
            display_id=self._overview_display_id, protostr=protostr)
        display(HTML(html))

    def _display_dataframe(self, data, update=None):
 -    if update:
 -      table_id = 'table_{}'.format(update)
 -      html = _DATAFRAME_PAGINATION_TEMPLATE.format(
 -          dataframe_html=data.to_html(notebook=True, table_id=table_id),
 -          table_id=table_id)
 -      update_display(HTML(html), display_id=update)
 +    table_id = 'table_{}'.format(
 +        update._df_display_id if update else self._df_display_id)
 +    columns = [{
 +        'title': ''
 +    }] + [{
 +        'title': str(column)
 +    } for column in data.columns]
 +    format_window_info_in_dataframe(data)
 +    rows = data.applymap(lambda x: str(x)).to_dict('split')['data']

 Review comment:
   First, we get all the string `data` from the `split`
[orient](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_dict.html) of `dataframe.to_dict`.
   Now `rows` is a `list` of `row`s of values. Each `row` looks like `[column_1_val, column_2_val, ...]`.

   Then we add the datatable column index to the values in each `row`. The index starts from 1 because we are also going to add a column `0` later, so we have `{k+1: v}`.
   Each `row` now becomes `{1: column_1_val, 2: column_2_val, ...}`.

   Then we add column `0` (`row[0] = k`) of the datatable, holding the int-based index (which will be the default order column, just as in the original dataframe).
   Each `row` now becomes `{1: column_1_val, 2: column_2_val, ..., 0: int_index_in_dataframe}`.

   Then the list of the above `row`s is supplied as a string in the Javascript to load the data into the table.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 397104)
    Time Spent: 51h 50m  (was: 51h 40m)

> Show PCollection with Interactive Beam in a data-centric user flow
> ------------------------------------------------------------------
>
>                 Key: BEAM-7926
>                 URL: https://issues.apache.org/jira/browse/BEAM-7926
>             Project: Beam
>          Issue Type: New Feature
>          Components: runner-py-interactive
>            Reporter: Ning Kang
>            Assignee: Ning Kang
>            Priority: Major
>          Time Spent: 51h 50m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection
> with Interactive Beam.
> Say an Interactive Beam pipeline is defined as
>
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The user can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>
> Detailed
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)