WweiL commented on code in PR #46996: URL: https://github.com/apache/spark/pull/46996#discussion_r1649448417
########## python/pyspark/sql/connect/dataframe.py: ##########

@@ -206,7 +208,10 @@ def _repr_html_(self) -> Optional[str]:
     @property
     def write(self) -> "DataFrameWriter":
-        return DataFrameWriter(self._plan, self._session)
+        def cb(qe: "ExecutionInfo") -> None:
+            self._execution_info = qe
+
+        return DataFrameWriter(self._plan, self._session, cb)

Review Comment:
   Looks like writeStream is not overridden here, so I imagine streaming queries are not supported yet. In streaming, a query can have multiple data frames; what we do in Scala is access it with query.explain(), which uses lastExecution:
   https://github.com/apache/spark/blob/94763438943e21ee8156a4f1a3facddbf8d45797/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L192
   That is, as its name suggests, the QueryExecution (`IncrementalExecution`) of the last execution. We could also add a similar mechanism to the StreamingQuery object. This sounds like an interesting followup that I'm interested in.
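The followup idea above — a lastExecution-style slot on the streaming query, overwritten by each micro-batch — could be sketched roughly as follows. This is a hypothetical illustration, not Spark's actual API: the class `StreamingQueryLike`, the `_on_batch_completed` hook, and the `ExecutionInfo` fields are all made up for the example; they only mirror the shape of Scala's `StreamExecution.lastExecution` and the `cb` callback in the diff.

```python
from typing import Optional


class ExecutionInfo:
    """Placeholder for per-execution metadata (hypothetical fields)."""

    def __init__(self, batch_id: int, plan: str) -> None:
        self.batch_id = batch_id
        self.plan = plan


class StreamingQueryLike:
    """Sketch of a lastExecution-style holder: each completed micro-batch
    overwrites the single slot, so callers always see the most recent
    execution rather than an unbounded history."""

    def __init__(self) -> None:
        self._last_execution: Optional[ExecutionInfo] = None

    def _on_batch_completed(self, info: ExecutionInfo) -> None:
        # Hypothetical hook the engine would invoke after each micro-batch,
        # analogous to the cb callback passed to DataFrameWriter in the diff.
        self._last_execution = info

    @property
    def last_execution(self) -> Optional[ExecutionInfo]:
        return self._last_execution


# Simulate three micro-batches; only the last one is retained.
q = StreamingQueryLike()
for batch_id in range(3):
    q._on_batch_completed(
        ExecutionInfo(batch_id, plan=f"IncrementalExecution #{batch_id}")
    )
print(q.last_execution.batch_id)  # 2
```

The design choice being illustrated is the same one the Scala side makes: because a streaming query runs many incremental executions, exposing only the latest one keeps the accessor O(1) in memory while still supporting `explain()`-style introspection.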
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org