Re: [Question] Dataproc 1.5 - Flink version conflict

2022-02-24 Thread Andoni Guzman Becerra
Yes, I tried with Dataproc 2.0 and Flink 1.12. Cluster creation was fine,
but the Groovy test fails at startup. This is the error:

> Task :sdks:python:apache_beam:testing:load_tests:run
16:44:49 INFO:apache_beam.testing.load_tests.load_test_metrics_utils:Missing InfluxDB options. Metrics will not be published to InfluxDB
16:44:50 WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
16:44:50 INFO:root:Default Python SDK image for environment is apache/beam_python3.7_sdk:2.38.0.dev
16:44:50 INFO:root:Using provided Python SDK container image: gcr.io/apache-beam-testing/beam_portability/beam_python3.7_sdk:latest
16:44:50 INFO:root:Python SDK container image set to "gcr.io/apache-beam-testing/beam_portability/beam_python3.7_sdk:latest" for Docker environment
16:44:50 WARNING:apache_beam.options.pipeline_options:Discarding unparseable args: ['--fanout=4', '--top_count=20', '--use_stateful_load_generator']
16:44:51 INFO:apache_beam.runners.portability.portable_runner:Job state changed to STOPPED
16:44:51 INFO:apache_beam.runners.portability.portable_runner:Job state changed to STARTING
16:44:51 INFO:apache_beam.runners.portability.portable_runner:Job state changed to RUNNING
16:45:14 ERROR:root:java.lang.NoClassDefFoundError: Could not initialize class org.apache.beam.runners.core.construction.SerializablePipelineOptions
16:45:15 INFO:apache_beam.runners.portability.portable_runner:Job state changed to FAILED
16:45:15 Traceback (most recent call last):
16:45:15   File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
16:45:15     "__main__", mod_spec)
16:45:15   File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
16:45:15     exec(code, run_globals)
16:45:15   File "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_Combine_Flink_Streaming_PR/src/sdks/python/apache_beam/testing/load_tests/combine_test.py", line 129, in <module>
16:45:15     CombineTest().run()
16:45:15   File "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_Combine_Flink_Streaming_PR/src/sdks/python/apache_beam/testing/load_tests/load_test.py", line 151, in run
16:45:15     self.result.wait_until_finish(duration=self.timeout_ms)
16:45:15   File "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_Combine_Flink_Streaming_PR/src/sdks/python/apache_beam/runners/portability/portable_runner.py", line 600, in wait_until_finish
16:45:15     raise self._runtime_exception
16:45:15 RuntimeError: Pipeline load-tests-python-flink-streaming-combine-4-0223222352_89cc297d-d8b8-45cc-8ebc-97f5a50f43a8 failed in state FAILED: java.lang.NoClassDefFoundError: Could not initialize class org.apache.beam.runners.core.construction.SerializablePipelineOptions

> Task :sdks:python:apache_beam:testing:load_tests:run FAILED



I'm trying to find the reason for this error. Could it be a version
mismatch, or a missing dependency? So far I haven't found much information
about it. This might help:
https://stackoverflow.com/questions/62438382/run-beam-pipeline-with-flink-yarn-session-on-emr
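My current suspicion is that this kind of NoClassDefFoundError means the Beam job server jar was built for a different Flink version than the one on the cluster. A rough sketch of what I'd try locally, just to check that theory (the Gradle task follows Beam's per-version Flink runner modules; the endpoint and flags below are placeholders, not taken from the failing job):

```shell
# Build and start a job server that matches the cluster's Flink 1.12
# (Beam ships one Flink runner module per supported Flink version).
./gradlew :runners:flink:1.12:job-server:runShadow &

# Submit the load test through that job server.
# The endpoint and environment_config are placeholders for illustration.
python -m apache_beam.testing.load_tests.combine_test \
  --runner=PortableRunner \
  --job_endpoint=localhost:8099 \
  --environment_type=DOCKER \
  --environment_config=gcr.io/apache-beam-testing/beam_portability/beam_python3.7_sdk:latest
```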

Any help would be appreciated, thanks a lot.

On Mon, Feb 14, 2022 at 2:42 PM Kyle Weaver  wrote:

> Can we use Dataproc 2.0, which supports Flink 1.12? (Since Flink 1.12 is
> still supported by Beam)
>
> On Mon, Feb 14, 2022 at 11:20 AM Andoni Guzman Becerra <
> andoni.guz...@wizeline.com> wrote:
>
>> Hi All, I'm working on re-enabling some tests like
>> LoadTests_Combine_Flink_Python.groovy and fixing some leaked VMs in those
>> tests. https://issues.apache.org/jira/browse/BEAM-12898
>> The Dataproc version used before was 1.2, and now it's 1.5.
>> The problem is that Dataproc 1.5 ships Flink 1.9, while we actually use
>> Flink 1.13, causing a mismatch and errors when running the tests.
>> In Dataproc 1.2 an init script was passed with all the Flink version
>> information, but now the optional components flag only tells Dataproc
>> which component to install.
>>
>> This was the way to create a cluster in Dataproc 1.2:
>>
>> gcloud dataproc clusters create $CLUSTER_NAME --region=global \
>>   --num-workers=$num_dataproc_workers \
>>   --initialization-actions $DOCKER_INIT,$BEAM_INIT,$FLINK_INIT \
>>   --metadata "${metadata}" \
>>   --image-version=$image_version --zone=$GCLOUD_ZONE --quiet
>>
>> And this is the way to do it in Dataproc 1.5:
>>
>> gcloud dataproc clusters create $CLUSTER_NAME --region=global \
>>   --num-workers=$num_dataproc_workers \
>>   --metadata "${metadata}" \
>>   --image-version=$image_version --zone=$GCLOUD_ZONE \
>>   --optional-components=FLINK,DOCKER --quiet
>>
>> Is there a way to force the Flink version in Dataproc? I tried to use
>> FLINK_INIT as an initialization action, but it didn't work.
>>
>> Any help would be appreciated.
>>
>> Thank you!
>>
>> Andoni Guzman | WIZELINE
>>
>> Software Engineer II
>>
>> andoni.guz...@wizeline.com
>>
>> Amado Nervo 2200, Esfera P6, Col. Ciudad del Sol, 45050 Zapopan, Jal.
>>

[Question] Dataproc 1.5 - Flink version conflict

2022-02-14 Thread Andoni Guzman Becerra
Hi All, I'm working on re-enabling some tests like
LoadTests_Combine_Flink_Python.groovy and fixing some leaked VMs in those
tests. https://issues.apache.org/jira/browse/BEAM-12898
The Dataproc version used before was 1.2, and now it's 1.5.
The problem is that Dataproc 1.5 ships Flink 1.9, while we actually use
Flink 1.13, causing a mismatch and errors when running the tests.
In Dataproc 1.2 an init script was passed with all the Flink version
information, but now the optional components flag only tells Dataproc which
component to install.
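To make the constraint concrete, here is a toy sketch of the check as I understand it. Both tables are approximate, drawn from what I've seen so far (Dataproc 1.5 bundles Flink 1.9; Dataproc 2.0 reportedly bundles 1.12), not an authoritative mapping:

```python
# Toy illustration of the mismatch: each Dataproc image bundles one fixed
# Flink version, and the Beam Flink runner only supports a narrow range.
# Both tables below are approximate and for illustration only.
DATAPROC_BUNDLED_FLINK = {"1.5": "1.9", "2.0": "1.12"}
BEAM_SUPPORTED_FLINK = {"1.12", "1.13"}  # roughly, for recent Beam releases

def image_flink_supported(image_version: str) -> bool:
    """True if the Flink bundled with this Dataproc image is one Beam supports."""
    return DATAPROC_BUNDLED_FLINK.get(image_version) in BEAM_SUPPORTED_FLINK

print(image_flink_supported("1.5"))  # Flink 1.9 -> False
print(image_flink_supported("2.0"))  # Flink 1.12 -> True
```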

This was the way to create a cluster in Dataproc 1.2:

gcloud dataproc clusters create $CLUSTER_NAME --region=global \
  --num-workers=$num_dataproc_workers \
  --initialization-actions $DOCKER_INIT,$BEAM_INIT,$FLINK_INIT \
  --metadata "${metadata}" \
  --image-version=$image_version --zone=$GCLOUD_ZONE --quiet

And this is the way to do it in Dataproc 1.5:

gcloud dataproc clusters create $CLUSTER_NAME --region=global \
  --num-workers=$num_dataproc_workers \
  --metadata "${metadata}" \
  --image-version=$image_version --zone=$GCLOUD_ZONE \
  --optional-components=FLINK,DOCKER --quiet

Is there a way to force the Flink version in Dataproc? I tried to use
FLINK_INIT as an initialization action, but it didn't work.
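One variant I'm considering is dropping the optional component entirely and installing Flink only through the initialization action, as we did on 1.2, so the script rather than the image controls the version. A sketch (the "flink-version" metadata key is a guess at what the init script might read, not something I've verified):

```shell
# Sketch: skip --optional-components and install Flink via the init action,
# passing the desired version through metadata.
# The "flink-version" key is hypothetical; check what $FLINK_INIT expects.
gcloud dataproc clusters create "$CLUSTER_NAME" --region=global \
  --num-workers="$num_dataproc_workers" \
  --metadata "flink-version=1.13.0,${metadata}" \
  --image-version="$image_version" --zone="$GCLOUD_ZONE" \
  --initialization-actions "$DOCKER_INIT,$BEAM_INIT,$FLINK_INIT" \
  --quiet
```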

Any help would be appreciated.

Thank you!

Andoni Guzman | WIZELINE

Software Engineer II

andoni.guz...@wizeline.com

Amado Nervo 2200, Esfera P6, Col. Ciudad del Sol, 45050 Zapopan, Jal.

-- 
*This email and its contents (including any attachments) are being sent to
you on the condition of confidentiality and may be protected by legal
privilege. Access to this email by anyone other than the intended recipient
is unauthorized. If you are not the intended recipient, please immediately
notify the sender by replying to this message and delete the material
immediately from your system. Any further use, dissemination, distribution
or reproduction of this email is strictly prohibited. Further, no
representation is made with respect to any content contained in this email.*