TheNeuralBit commented on code in PR #21803: URL: https://github.com/apache/beam/pull/21803#discussion_r894926019
########## sdks/python/apache_beam/ml/inference/sklearn_inference.py: ########## @@ -43,13 +41,26 @@ class ModelFileType(enum.Enum): JOBLIB = 2 -class SklearnModelHandler(ModelHandler[Union[numpy.ndarray, pandas.DataFrame], - PredictionResult, - BaseEstimator]): - """ Implementation of the ModelHandler interface for scikit-learn. +def _load_model(model_uri, file_type): + file = FileSystems.open(model_uri, 'rb') + if file_type == ModelFileType.PICKLE: + return pickle.load(file) + elif file_type == ModelFileType.JOBLIB: + if not joblib: + raise ImportError( + 'Could not import joblib in this execution environment. ' + 'For help with managing dependencies on Python workers.' + 'see https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/' # pylint: disable=line-too-long Review Comment: The rule is here: https://github.com/apache/beam/blob/b8e2e85ab1fb37a2f89ed20d88730e591ea3bf7e/sdks/python/.pylintrc#L180 It looks like that regex won't actually work in this case, but it could be updated to handle string literals (in addition to comments) if you want ########## sdks/python/apache_beam/ml/inference/sklearn_inference.py: ########## @@ -43,13 +41,26 @@ class ModelFileType(enum.Enum): JOBLIB = 2 -class SklearnModelHandler(ModelHandler[Union[numpy.ndarray, pandas.DataFrame], - PredictionResult, - BaseEstimator]): - """ Implementation of the ModelHandler interface for scikit-learn. +def _load_model(model_uri, file_type): + file = FileSystems.open(model_uri, 'rb') + if file_type == ModelFileType.PICKLE: + return pickle.load(file) + elif file_type == ModelFileType.JOBLIB: + if not joblib: + raise ImportError( + 'Could not import joblib in this execution environment. ' + 'For help with managing dependencies on Python workers.' + 'see https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/' # pylint: disable=line-too-long Review Comment: (no action needed here right now, just FYI) I *think* we have a pylint rule that makes it so the line-too-long check is disabled if it starts with a url, so breaking it out like this might help: ```suggestion 'see ' 'https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/' ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
