fgerlits commented on code in PR #1721:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1721#discussion_r1512842925


##########
extensions/python/PYTHON.md:
##########
@@ -155,10 +158,33 @@ In the flow configuration these Python processors can be 
referenced by their ful
 
 Due to some differences between the NiFi and MiNiFi C++ processors and 
implementation, there are some limitations using the NiFi Python processors:
 - Record based processors are not yet supported in MiNiFi C++, so the NiFi 
Python processors inherited from RecordTransform are not supported.
-- Virtualenv support is not yet available in MiNiFi C++, so all required 
packages must be installed on the system.
 - Controller properties are not supported at the moment.
 - There are some validators in NiFi that are not present in MiNiFi C++, so 
some property validations will be missing using the NiFi Python processors.
 - Allowable values specified in NiFi Python processors are ignored in MiNiFi 
C++ (due to MiNiFi C++ requiring them to be specified at compile time), so the 
property values are not pre-verified.
 - MiNiFi C++ only supports expression language with flow file attributes, so 
only FLOWFILE_ATTRIBUTES expression language scope is supported, otherwise the 
expression language will not be evaluated.
 - MiNiFi C++ does not support property dependencies, so the property 
dependencies will be ignored. If a property depends on another property, the 
property will not be required.
 - MiNiFi C++ does not support the use of self.jvm member in Python processors 
that provides JVM bindings in NiFi, it is set to None in MiNiFi C++.
+- Inline definition of Python package dependencies, defined in the 
ProcessorDetails nested class are not supported as in NiFi, so the dependencies 
must be defined in the requirements.txt files. If a processor's dependencies 
are defined in the ProcessorDetails class, the dependencies should be copied to 
the requirements.txt file.
+
+## Use Python processors from virtualenv
+
+It is possible to set a virtualenv to be used by the Python processors in 
Apache MiNiFi C++. If the virtualenv directory is set, the Python processors 
will be executed using the packages installed in the virtualenv. If the 
virtualenv directory is not set, the Python processors will be executed using 
the packages installed on the system.
+
+    # in minifi.properties
+    nifi.python.virtualenv.directory=${MINIFI_HOME}/minifi-python-env
+
+**NOTE:* Using different python versions for the system and the virtualenv is 
not supported. The virtualenv must be created using the same python version as 
the system python.
+
+## Automatically install dependencies from requirements.txt files
+
+It is possible to automatically install the dependencies of the Python 
processors in the virtualenv defined in requirements.txt files. To enable this 
feature, the `nifi.python.install.packages.automatically` property must be set 
to true. If this property is set to true, then all requirements.txt files that 
appear under the MiNiFi Python directory and its subdirectories (defined by the 
`nifi.python.processor.dir` property) will be used to install the Python 
packages. If the `nifi.python.virtualenv.directory` property is set, the 
packages are installed in the virtualenv, otherwise this option is ignored. Due 
to install schema differences in different platforms, system level packages are 
expected to be installed manually by the user.

Review Comment:
   I find "... otherwise this option is ignored" unclear (which option is 
"this"?)  If it means that for automatic installation, both properties need to 
be set, I would write:
   
   ```suggestion
   It is possible to automatically install the dependencies of the Python 
processors defined in requirements.txt files into a virtualenv. To enable this 
feature, the `nifi.python.install.packages.automatically` property must be set 
to true, and the `nifi.python.virtualenv.directory` property must be set to a 
directory where a virtualenv either already exists, or it can be set up. In 
this case, all requirements.txt files that appear under the MiNiFi Python 
directory (defined by the `nifi.python.processor.dir` property) and its 
subdirectories will be used to install the Python packages into the given 
virtualenv. Due to install schema differences in different platforms, system 
level packages are expected to be installed manually by the user.
   ```



##########
extensions/python/PythonScriptEngine.cpp:
##########
@@ -68,6 +67,73 @@ void initThreads() {
 #pragma warning(pop)
 #endif
 }
+
+std::string encapsulateCommandInQuotesIfNeeded(const std::string& command) {
+#if WIN32
+    return "\"" + command + "\"";
+#else
+    return command;
+#endif
+}
+
+std::vector<std::filesystem::path> getRequirementsFilePaths(const 
std::shared_ptr<Configure> &configuration) {
+  std::vector<std::filesystem::path> paths;
+  if (auto python_processor_path = 
configuration->get(minifi::Configuration::nifi_python_processor_dir)) {
+    for (const auto& entry : 
std::filesystem::recursive_directory_iterator(std::filesystem::path{*python_processor_path}))
 {
+      if (std::filesystem::is_regular_file(entry.path()) && 
entry.path().filename() == "requirements.txt") {
+        paths.push_back(entry.path());
+      }
+    }
+  }
+  return paths;
+}
+
+std::string getPythonBinary(const std::shared_ptr<Configure> &configuration) {
+#if WIN32
+  std::string python_binary = "python";
+#else
+  std::string python_binary = "python3";
+#endif
+  if (auto binary = 
configuration->get(minifi::Configuration::nifi_python_env_setup_binary)) {
+    python_binary = *binary;
+  }
+  return python_binary;
+}
+
+void createVirtualEnvIfSpecified(const std::shared_ptr<Configure> 
&configuration) {
+  if (auto path = 
configuration->get(minifi::Configuration::nifi_python_virtualenv_directory)) {
+    PythonConfigState::getInstance().virtualenv_path = *path;
+    if 
(!std::filesystem::exists(PythonConfigState::getInstance().virtualenv_path) || 
!std::filesystem::is_empty(PythonConfigState::getInstance().virtualenv_path)) {

Review Comment:
   why is the virtualenv creation skipped if the directory is empty?



##########
extensions/python/PythonConfigState.h:
##########
@@ -0,0 +1,67 @@
+/**
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#pragma once
+
+#include <vector>
+#include <utility>
+#include <algorithm>
+#include <string>
+#include <memory>
+#include <filesystem>
+#include "core/Core.h"
+#include "core/logging/LoggerConfiguration.h"
+#include "core/Resource.h"
+#include "ExecutePythonProcessor.h"
+#include "PythonObjectFactory.h"
+#include "agent/agent_version.h"
+#include "agent/build_description.h"
+#include "utils/file/FileUtils.h"
+#include "utils/StringUtils.h"
+#include "range/v3/algorithm.hpp"
+#include "properties/Configuration.h"
+#include "utils/file/FilePattern.h"
+#include "range/v3/view/filter.hpp"

Review Comment:
   are all these `#include`s needed in this header file?



##########
docker/test/integration/cluster/containers/MinifiContainer.py:
##########
@@ -33,7 +33,11 @@ def __init__(self):
         self.enable_prometheus = False
         self.enable_prometheus_with_ssl = False
         self.enable_sql = False
-        self.use_nifi_python_processors = False
+        self.use_nifi_python_processors_with_system_python_packages_installed 
= False
+        self.use_nifi_python_processors_with_virtualenv = False
+        self.use_nifi_python_processors_with_virtualenv_packages_installed = 
False
+        self.create_python_virtualenv = False
+        self.install = False

Review Comment:
   Where are these two used?



##########
docker/python-verify/conda.Dockerfile:
##########
@@ -31,14 +31,12 @@ USER root
 RUN wget https://repo.anaconda.com/archive/Anaconda3-2023.09-0-Linux-x86_64.sh 
-P /tmp \
     && echo "6c8a4abb36fbb711dc055b7049a23bbfd61d356de9468b41c5140f8a11abd851 
/tmp/Anaconda3-2023.09-0-Linux-x86_64.sh" | sha256sum -c \
     && bash /tmp/Anaconda3-2023.09-0-Linux-x86_64.sh -b -p /opt/conda  \
-    && chown -R ${USER}:${USER} /opt/conda \
-    && mkdir /home/${USER}  \
-    && chown -R ${USER}:${USER} /home/${USER}
+    && chown -R ${USER}:${USER} /opt/conda
 
 USER ${USER}
 
 RUN ${CONDA_HOME}/bin/conda init bash
-RUN ${CONDA_HOME}/bin/conda install langchain -c conda-forge
+RUN ${CONDA_HOME}/bin/conda install "langchain<=0.17.0" -c conda-forge

Review Comment:
   Why `<= 0.17.0`? A comment could be useful so we can remove this restriction 
when it is no longer needed.



##########
extensions/python/PythonScriptEngine.cpp:
##########
@@ -68,6 +67,73 @@ void initThreads() {
 #pragma warning(pop)
 #endif
 }
+
+std::string encapsulateCommandInQuotesIfNeeded(const std::string& command) {
+#if WIN32
+    return "\"" + command + "\"";
+#else
+    return command;
+#endif
+}
+
+std::vector<std::filesystem::path> getRequirementsFilePaths(const 
std::shared_ptr<Configure> &configuration) {
+  std::vector<std::filesystem::path> paths;
+  if (auto python_processor_path = 
configuration->get(minifi::Configuration::nifi_python_processor_dir)) {
+    for (const auto& entry : 
std::filesystem::recursive_directory_iterator(std::filesystem::path{*python_processor_path}))
 {
+      if (std::filesystem::is_regular_file(entry.path()) && 
entry.path().filename() == "requirements.txt") {
+        paths.push_back(entry.path());
+      }
+    }
+  }
+  return paths;
+}
+
+std::string getPythonBinary(const std::shared_ptr<Configure> &configuration) {
+#if WIN32
+  std::string python_binary = "python";
+#else
+  std::string python_binary = "python3";
+#endif
+  if (auto binary = 
configuration->get(minifi::Configuration::nifi_python_env_setup_binary)) {
+    python_binary = *binary;
+  }
+  return python_binary;
+}
+
+void createVirtualEnvIfSpecified(const std::shared_ptr<Configure> 
&configuration) {
+  if (auto path = 
configuration->get(minifi::Configuration::nifi_python_virtualenv_directory)) {
+    PythonConfigState::getInstance().virtualenv_path = *path;
+    if 
(!std::filesystem::exists(PythonConfigState::getInstance().virtualenv_path) || 
!std::filesystem::is_empty(PythonConfigState::getInstance().virtualenv_path)) {
+      auto venv_command = "\"" + 
PythonConfigState::getInstance().python_binary + "\" -m venv \"" + 
PythonConfigState::getInstance().virtualenv_path.string() + "\"";
+      auto return_value = 
std::system(encapsulateCommandInQuotesIfNeeded(venv_command).c_str());
+      if (return_value != 0) {
+        throw PythonScriptException(fmt::format("The following command 
creating python virtual env failed: '{}'", venv_command));
+      }
+    }
+  }
+}
+
+void installPythonPackagesIfRequested(const std::shared_ptr<Configure> 
&configuration, const std::shared_ptr<core::logging::Logger>& logger) {
+  std::string automatic_install_str;
+  if (!PythonConfigState::getInstance().isPackageInstallationNeeded()) {
+    return;
+  }
+  auto requirement_file_paths = getRequirementsFilePaths(configuration);
+  for (const auto& requirements_file_path : requirement_file_paths) {
+    logger->log_info("Installing python packages from the following 
requirements.txt file: {}", requirements_file_path.string());
+    std::string pip_command;
+#if WIN32
+    
pip_command.append("\"").append((PythonConfigState::getInstance().virtualenv_path
 / "Scripts" / "activate.bat").string()).append("\" && ");
+#else
+    pip_command.append(". 
\"").append((PythonConfigState::getInstance().virtualenv_path / "bin" / 
"activate").string()).append("\" && ");
+#endif
+    
pip_command.append("\"").append(PythonConfigState::getInstance().python_binary).append("\"
 -m pip install --no-cache-dir -r 
\"").append(requirements_file_path.string()).append("\"");
+    auto return_value = 
std::system(encapsulateCommandInQuotesIfNeeded(pip_command).c_str());

Review Comment:
   Do we need `encapsulateCommandInQuotesIfNeeded`?  There are already quotes 
around the executable path-and-filename above.  In fact, is it going to work?  
We'll end up with something like `""\venv_path\Scripts\activate.bat" && 
"python" -m pip install --no-cache-dir -r 
"\requirements_path\requirements.txt""`, ie. double quotes at the beginning and 
the end, which looks strange.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@nifi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to