TheNeuralBit commented on code in PR #21803:
URL: https://github.com/apache/beam/pull/21803#discussion_r894926019


##########
sdks/python/apache_beam/ml/inference/sklearn_inference.py:
##########
@@ -43,13 +41,26 @@ class ModelFileType(enum.Enum):
   JOBLIB = 2
 
 
-class SklearnModelHandler(ModelHandler[Union[numpy.ndarray, pandas.DataFrame],
-                                       PredictionResult,
-                                       BaseEstimator]):
-  """ Implementation of the ModelHandler interface for scikit-learn.
+def _load_model(model_uri, file_type):
+  file = FileSystems.open(model_uri, 'rb')
+  if file_type == ModelFileType.PICKLE:
+    return pickle.load(file)
+  elif file_type == ModelFileType.JOBLIB:
+    if not joblib:
+      raise ImportError(
+          'Could not import joblib in this execution environment. '
+          'For help with managing dependencies on Python workers.'
+          'see 
https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/'  # 
pylint: disable=line-too-long

Review Comment:
   The rule is here: 
https://github.com/apache/beam/blob/b8e2e85ab1fb37a2f89ed20d88730e591ea3bf7e/sdks/python/.pylintrc#L180
   
   It looks like that regex won't actually work in this case, but it could be 
updated to handle string literals (in addition to comments) if you want



##########
sdks/python/apache_beam/ml/inference/sklearn_inference.py:
##########
@@ -43,13 +41,26 @@ class ModelFileType(enum.Enum):
   JOBLIB = 2
 
 
-class SklearnModelHandler(ModelHandler[Union[numpy.ndarray, pandas.DataFrame],
-                                       PredictionResult,
-                                       BaseEstimator]):
-  """ Implementation of the ModelHandler interface for scikit-learn.
+def _load_model(model_uri, file_type):
+  file = FileSystems.open(model_uri, 'rb')
+  if file_type == ModelFileType.PICKLE:
+    return pickle.load(file)
+  elif file_type == ModelFileType.JOBLIB:
+    if not joblib:
+      raise ImportError(
+          'Could not import joblib in this execution environment. '
+          'For help with managing dependencies on Python workers.'
+          'see 
https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/'  # 
pylint: disable=line-too-long

Review Comment:
   (no action needed here right now, just FYI) I *think* we have a pylint rule 
that makes it so the line-too-long check is disabled if it starts with a url, 
so breaking it out like this might help:
   ```suggestion
             'see '
             
'https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/'
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to