Yes, I tried with Dataproc 2.0 and Flink 1.12. Cluster creation was fine,
but the Groovy test fails as soon as the job starts. This is the error:
> Task :sdks:python:apache_beam:testing:load_tests:run
16:44:49 INFO:apache_beam.testing.load_tests.load_test_metrics_utils:Missing InfluxDB options. Metrics will not be published to InfluxDB
16:44:50 WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
16:44:50 INFO:root:Default Python SDK image for environment is apache/beam_python3.7_sdk:2.38.0.dev
16:44:50 INFO:root:Using provided Python SDK container image: gcr.io/apache-beam-testing/beam_portability/beam_python3.7_sdk:latest
16:44:50 INFO:root:Python SDK container image set to "gcr.io/apache-beam-testing/beam_portability/beam_python3.7_sdk:latest" for Docker environment
16:44:50 WARNING:apache_beam.options.pipeline_options:Discarding unparseable args: ['--fanout=4', '--top_count=20', '--use_stateful_load_generator']
16:44:51 INFO:apache_beam.runners.portability.portable_runner:Job state changed to STOPPED
16:44:51 INFO:apache_beam.runners.portability.portable_runner:Job state changed to STARTING
16:44:51 INFO:apache_beam.runners.portability.portable_runner:Job state changed to RUNNING
16:45:14 ERROR:root:java.lang.NoClassDefFoundError: Could not initialize class org.apache.beam.runners.core.construction.SerializablePipelineOptions
16:45:15 INFO:apache_beam.runners.portability.portable_runner:Job state changed to FAILED
16:45:15 Traceback (most recent call last):
16:45:15   File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
16:45:15     "__main__", mod_spec)
16:45:15   File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
16:45:15     exec(code, run_globals)
16:45:15   File "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_Combine_Flink_Streaming_PR/src/sdks/python/apache_beam/testing/load_tests/combine_test.py", line 129, in <module>
16:45:15     CombineTest().run()
16:45:15   File "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_Combine_Flink_Streaming_PR/src/sdks/python/apache_beam/testing/load_tests/load_test.py", line 151, in run
16:45:15     self.result.wait_until_finish(duration=self.timeout_ms)
16:45:15   File "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_Combine_Flink_Streaming_PR/src/sdks/python/apache_beam/runners/portability/portable_runner.py", line 600, in wait_until_finish
16:45:15     raise self._runtime_exception
16:45:15 RuntimeError: Pipeline load-tests-python-flink-streaming-combine-4-0223222352_89cc297d-d8b8-45cc-8ebc-97f5a50f43a8 failed in state FAILED: java.lang.NoClassDefFoundError: Could not initialize class org.apache.beam.runners.core.construction.SerializablePipelineOptions
16:45:15
16:45:15 > Task :sdks:python:apache_beam:testing:load_tests:run FAILED
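One note on reading this: "Could not initialize class X" is a symptom, not the root cause — it means X's static initializer already threw earlier in the run. The first ExceptionInInitializerError in the taskmanager log usually carries the real cause, so grepping for that first occurrence is a quick triage step. A small sketch (the sample log text below is fabricated for illustration; on the cluster you would pipe in the actual Flink taskmanager log):

```shell
# "Could not initialize class X" means X's static initializer already
# failed once. Find the FIRST ExceptionInInitializerError; the line
# after it usually names the real root cause.
# The sample log below is fabricated purely for illustration.
log='16:45:01 ERROR java.lang.ExceptionInInitializerError
Caused by: java.lang.UnsupportedClassVersionError: class file version mismatch
16:45:14 ERROR java.lang.NoClassDefFoundError: Could not initialize class org.apache.beam.runners.core.construction.SerializablePipelineOptions'

# Print the first initializer failure plus the line that follows it.
printf '%s\n' "$log" | grep -m1 -A1 'ExceptionInInitializerError'
```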
I'm trying to find the cause of this error. Could it be a version
mismatch, or some dependency? So far I haven't found much information about it.
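On the version question, one cheap guard is to compare the cluster's Flink minor version against the one the Beam Flink job server targets before submitting, so a mismatch surfaces as a clear message instead of a runtime classpath error. A sketch, with both version strings hard-coded as assumptions — on a real cluster the first would be parsed from `flink --version` on the Dataproc master:

```shell
# Hypothetical pre-flight check: compare the cluster's Flink minor
# version against the one the Beam Flink job server was built for.
# Both values are hard-coded assumptions for illustration only.
cluster_flink="1.12.3"   # assumed: what the Dataproc image installs
beam_flink="1.12"        # assumed: the job server's target version

cluster_minor="${cluster_flink%.*}"   # strip patch level: "1.12.3" -> "1.12"
if [ "$cluster_minor" = "$beam_flink" ]; then
  echo "Flink versions match: $cluster_minor"
else
  echo "Flink mismatch: cluster=$cluster_minor beam=$beam_flink"
fi
```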
This could help:
https://stackoverflow.com/questions/62438382/run-beam-pipeline-with-flink-yarn-session-on-emr
Any help would be appreciated, thanks a lot.
On Mon, Feb 14, 2022 at 2:42 PM Kyle Weaver wrote:
> Can we use Dataproc 2.0, which supports Flink 1.12? (Since Flink 1.12 is
> still supported by Beam)
>
> On Mon, Feb 14, 2022 at 11:20 AM Andoni Guzman Becerra <
> andoni.guz...@wizeline.com> wrote:
>
>> Hi All, I'm working on re-enabling some tests, like
>> LoadTests_Combine_Flink_Python.groovy, and fixing some VMs leaked by those
>> tests. https://issues.apache.org/jira/browse/BEAM-12898
>> The version of Dataproc used before was 1.2 and now it's 1.5.
>> The problem is that the Flink version in Dataproc 1.5 is 1.9, while we
>> actually use Flink 1.13, causing a mismatch and errors when running the tests.
>> In Dataproc 1.2 an init script was passed with all the info related to the
>> Flink version, but now the optional components only say which component to
>> install.
>>
>> This was the way to create a cluster in dataproc 1.2
>>
>> gcloud dataproc clusters create $CLUSTER_NAME --region=global \
>>     --num-workers=$num_dataproc_workers \
>>     --initialization-actions $DOCKER_INIT,$BEAM_INIT,$FLINK_INIT \
>>     --metadata "${metadata}" \
>>     --image-version=$image_version --zone=$GCLOUD_ZONE --quiet
>>
>> And this is the way to do it in dataproc 1.5
>>
>> gcloud dataproc clusters create $CLUSTER_NAME --region=global \
>>     --num-workers=$num_dataproc_workers \
>>     --metadata "${metadata}" \
>>     --image-version=$image_version --zone=$GCLOUD_ZONE \
>>     --optional-components=FLINK,DOCKER --quiet
>>
>> Is there a way to force the Flink version in Dataproc? I tried to use
>> $FLINK_INIT as an initialization action, but it didn't work.
>>
>> Any help would be appreciated.
>>
>> Thank you!
>>
>> Andoni Guzman | WIZELINE
>>
>> Software Engineer II
>>
>> andoni.guz...@wizeline.com
>>
>> Amado Nervo 2200, Esfera P6, Col. Ciudad del Sol, 45050 Zapopan, Jal.
>>