TheNeuralBit commented on code in PR #21803:
URL: https://github.com/apache/beam/pull/21803#discussion_r894921012


##########
sdks/python/apache_beam/ml/inference/sklearn_inference.py:
##########
@@ -94,9 +91,6 @@ class SklearnModelHandlerPandas(ModelHandler[pandas.DataFrame,
                                              BaseEstimator]):
   """ Implementation of the ModelHandler interface for scikit-learn that
       supports pandas dataframes.
-
-      NOTE: This API and its implementation are under development and
-      do not provide backward compatibility guarantees.

Review Comment:
   To be more specific about why this might change: now that the batching DoFn 
infrastructure is in, I'd like the pandas sklearn implementation to leverage 
it. We'd move to a model where the element type is a Beam Row (with a schema) 
and the batch type is a pandas DataFrame, as opposed to the current model, 
where the batch type is a list of single-element DataFrames.
   
   Once we do that, we could pass data from the DataFrame API (under the hood a 
`PCollection[pd.DataFrame]`) directly to RunInference without having to 
unbatch it and then batch it back up.
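
   A rough sketch of the difference between the two batch shapes, using pandas 
only. This is not the Beam implementation: `SumModel`, 
`predict_from_single_row_frames` and `predict_from_frame` are hypothetical 
names made up for illustration, and `SumModel` just stands in for a fitted 
scikit-learn estimator. It assumes the list-of-DataFrames batch has to be 
concatenated before prediction.

```python
import pandas as pd


class SumModel:
    """Hypothetical stand-in for a fitted estimator with a predict() method."""

    def predict(self, frame):
        # Sum each row's feature values; a real estimator would do inference here.
        return frame.sum(axis=1).tolist()


def predict_from_single_row_frames(model, batch):
    # Current model: the batch is a list of one-row DataFrames, so the rows
    # are concatenated into one frame before calling predict().
    combined = pd.concat(batch, ignore_index=True)
    return model.predict(combined)


def predict_from_frame(model, batch):
    # Proposed model: the batch already is a DataFrame, so it is passed
    # straight through with no unbatch/rebatch step.
    return model.predict(batch)


rows = [{"a": 1.0, "b": 2.0}, {"a": 3.0, "b": 4.0}]
single_row_frames = [pd.DataFrame([row]) for row in rows]
whole_frame = pd.DataFrame(rows)

current = predict_from_single_row_frames(SumModel(), single_row_frames)
proposed = predict_from_frame(SumModel(), whole_frame)
```

   Both paths should yield the same predictions; the second simply skips the 
concatenation step.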


