[ 
https://issues.apache.org/jira/browse/BEAM-9007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-9007:
----------------------------------

    Assignee:     (was: Aizhamal Nurmamat kyzy)

> beam.DoFn setup() will call several times when using python subprocess
> ----------------------------------------------------------------------
>
>                 Key: BEAM-9007
>                 URL: https://issues.apache.org/jira/browse/BEAM-9007
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>    Affects Versions: 2.15.0, 2.16.0
>         Environment: python 3.5
> apache-beam[gcp] == 2.16.*
> google-cloud-storage == 1.23.*
> google-resumable-media == 0.5.*
> googleapis-common-protos == 1.6.*
> grpc-google-logging-v2 == 0.11.*
>            Reporter: Hokuto Tateyama
>            Priority: Minor
>
> Hello.
>  I'm trying to run a make command on Dataflow to build the OpenCV source, 
> which is written in C++.
> I assumed that the *setup()* function of *beam.DoFn* runs only once, 
> before processing starts.
>  So I ran the build commands in the setup() function, and they completed 
> successfully.
> h1. Problem
> After that run, the setup() function is called again and reruns the 
> build commands several times. I confirmed this in my Stackdriver logs.
> h1. Code
> This is the code I run on Dataflow. I defined command_list in a class 
> that inherits from beam.DoFn, and I call run_cmd() from setup().
> ・Running the command lines.
> {code:python}
> import logging
> import subprocess
> from typing import Any, Dict, List
>
> def run_cmd(command_list: List[List[str]], shell: bool = False) -> List[Dict[str, Any]]:
>     outputs = []
>     try:
>         for cmd in command_list:
>             logging.info(cmd)
>             # Run the command and capture stdout and stderr together.
>             proc = subprocess.check_output(
>                 cmd, shell=shell, stderr=subprocess.STDOUT,
>                 universal_newlines=True)
>             outputs.append({"Input: ": cmd, "Output: ": proc})
>     except subprocess.CalledProcessError as e:
>         # A failing command stops the loop; later commands are skipped.
>         logging.warning("Return code:{}, Output:{}".format(e.returncode, e.output))
>     return outputs
> {code}
> ・Command list to pass run_cmd() function.
> {code:python}
> command_list = [
>     ["cat /etc/issue"],
>     ["apt-get --assume-yes update"],
>     [
>         "apt-get --assume-yes install --no-install-recommends ffmpeg git software-properties-common"
>     ],
>     ["apt-get install -y software-properties-common"],
>     [
>         'add-apt-repository -s "deb http://security.ubuntu.com/ubuntu bionic-security main"'
>     ],
>     [
>         "apt-get install -y build-essential checkinstall cmake unzip pkg-config yasm unzip"
>     ],
>     ["apt-get -y install git gfortran python3-dev"],
>     [
>         "apt-get -y install libjpeg62-turbo-dev libpng-dev libpng16-16 libavcodec-dev libavformat-dev libswscale-dev libdc1394-22-dev libxine2-dev libv4l-dev"
>     ],
>     ["apt-get -y install libjpeg-dev libpng-dev libtiff-dev libtbb-dev"],
>     [
>         "apt-get -y install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev libatlas-base-dev libxvidcore-dev libx264-dev libgtk-3-dev"
>     ],
>     ["apt-get clean"],
>     ["rm -rf /var/lib/apt/lists/*"],
>     ["git clone https://github.com/opencv/opencv.git"],
>     ["git clone https://github.com/opencv/opencv_contrib.git"],
>     ["cd opencv_contrib"],
>     ["git checkout -b 3.4.3 refs/tags/3.4.3"],
>     ["cd ../opencv/"],
>     ["git checkout -b 3.4.3 refs/tags/3.4.3"],
>     ["mkdir build"],
>     ["cd build"],
>     [
>         "cmake -D CMAKE_BUILD_TYPE=Release \
>                 -D CMAKE_INSTALL_PREFIX=/usr/local \
>                 -D WITH_TBB=ON \
>                 -D OPENCV_EXTRA_MODULES_PATH=../../opencv_contrib/modules .."
>     ],
>     ["make -j8"],
>     ["make install"],
>     ["echo /usr/local/lib > /etc/ld.so.conf.d/opencv.conf"],
>     ["ldconfig -v"]
> ]
> {code}
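
A note on the list above: each subprocess call starts its own process, so a separate `cd` entry does not change the directory for the commands that follow, and with shell=False a one-element list such as ["cat /etc/issue"] is looked up as a program name. The sketch below shows one possible way around both, passing an argument list and an explicit `cwd` per step. `Step`, `run_steps` and `demo` are hypothetical names that do not appear in the report, and this is only an illustration under those assumptions, not a fix for the repeated setup() calls.

```python
import os
import subprocess
import sys
import tempfile
from typing import List, Optional, Tuple

# Hypothetical step type: (working directory or None, argument list).
Step = Tuple[Optional[str], List[str]]


def run_steps(steps: List[Step]) -> List[str]:
    """Run each step in its own process, passing its directory explicitly."""
    outputs = []
    for cwd, argv in steps:
        # check_output raises CalledProcessError on a non-zero exit code.
        out = subprocess.check_output(
            argv, cwd=cwd, stderr=subprocess.STDOUT, universal_newlines=True)
        outputs.append(out)
    return outputs


def demo() -> bool:
    """Check that a child process really starts in the requested directory."""
    with tempfile.TemporaryDirectory() as tmp:
        show_cwd = [sys.executable, "-c", "import os; print(os.getcwd())"]
        out = run_steps([(tmp, show_cwd)])
        return os.path.realpath(out[0].strip()) == os.path.realpath(tmp)
```

With this shape, a pair such as `("opencv/build", ["make", "-j8"])` would replace the separate `cd` and `make` entries, assuming that directory exists.
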
> h1. Question
> In summary, I'm wondering whether this is a bug in Apache Beam.
>  # Why is setup() called several times?
>  # Is there a way to run these commands only once over the whole 
> job? These are the approaches I have tried:
>  ## Using os.system() instead of subprocess. I suspect that subprocess 
> starts another process inside setup(), so the successful completion of 
> that process is not detected.
>  ## Writing the commands in setup.py and running them as a CustomCommand:
>  [https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/]
>  
> Regards, Collonville
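
On question 2, one possible guard is a module-level lock and flag, so that the build runs at most once per Python process however many times setup() is invoked. This is a minimal sketch, assuming the repeated calls come from several DoFn instances living in the same worker process, which the report does not confirm. `run_once` and `BuildOnceFn` are hypothetical names, and a plain class stands in for a beam.DoFn subclass so the sketch runs without Beam installed.

```python
import threading

# Module-level state is shared by every instance in the same Python process.
_setup_lock = threading.Lock()
_setup_done = False


def run_once(build) -> bool:
    """Call build() at most once per process; return True if it ran now."""
    global _setup_done
    # Holding the lock during the build makes concurrent callers wait
    # until the first build has finished.
    with _setup_lock:
        if _setup_done:
            return False
        build()
        # Only reached if build() did not raise, so a failed build is retried.
        _setup_done = True
        return True


class BuildOnceFn:
    """Stand-in for a beam.DoFn subclass (assumption for this sketch)."""

    build_calls = 0

    def _build(self):
        # In the report this is where run_cmd(command_list) would go.
        BuildOnceFn.build_calls += 1

    def setup(self):
        run_once(self._build)
```

The guard only covers one process: separate worker processes or VMs would each run the build once, so a build that must happen once per machine would need something else, such as the setup.py approach mentioned above.
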



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
