Problem with pickler

2024-02-23 Thread Juan Romero
Hi guys.

Currently I have a class SaptmCustomerUpsertPipeline that inherits from
the GenericPipeline class. Locally the pipeline works fine, but when I try
to run it on Dataflow I get this error:

severity: "ERROR"
textPayload: "Error message from worker: generic::unknown: Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/apache_beam/internal/dill_pickler.py", line 285, in loads
    return dill.loads(s)
  File "/usr/local/lib/python3.9/site-packages/dill/_dill.py", line 275, in loads
    return load(file, ignore, **kwds)
  File "/usr/local/lib/python3.9/site-packages/dill/_dill.py", line 270, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
  File "/usr/local/lib/python3.9/site-packages/dill/_dill.py", line 472, in load
    obj = StockUnpickler.load(self)
  File "/usr/local/lib/python3.9/site-packages/dill/_dill.py", line 462, in find_class
    return StockUnpickler.find_class(self, module, name)
AttributeError: Can't get attribute 'SaptmCustomerUpsertPipeline' on


It seems there is an error in the serialization of the child class.
Has anyone faced this problem? The pipeline works fine with the
direct runner.
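
This looks like the common "class defined in __main__" pickling failure: Dataflow workers deserialize the pipeline with dill, which pickles classes by reference (module name plus attribute name), so a class that exists only in the launcher script's __main__ cannot be resolved on the worker. The usual fixes are passing --save_main_session=True, or, more robustly, moving SaptmCustomerUpsertPipeline into a named module that is shipped to the workers (e.g. via --setup_file). A minimal plain-pickle sketch of the mechanism (no Beam needed; the class here is just a stand-in):

```python
import pickle

class SaptmCustomerUpsertPipeline:  # stand-in for the real class
    pass

# Pickling an instance does not embed the class body -- only a reference
# to the defining module and the class name:
payload = pickle.dumps(SaptmCustomerUpsertPipeline())
assert b"SaptmCustomerUpsertPipeline" in payload

# Unpickling works here because this process can resolve that reference;
# a Dataflow worker that cannot import the defining module instead raises
# AttributeError: Can't get attribute 'SaptmCustomerUpsertPipeline' on ...
obj = pickle.loads(payload)
assert isinstance(obj, SaptmCustomerUpsertPipeline)
```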


Re: Beam portable runner setup for Flink + Python on Kubernetes

2024-02-23 Thread Sam Bourne
Hey Jaehyeon,

Docker is the default environment type when using the PortableRunner. I
included them just for reference because we found it useful to override
the default SDK container with our own.

It is pretty complicated, especially to debug sometimes, but we had
good success running some simple pipelines in production for around a year.
I was more wary about maintaining my own Flink cluster, so eventually we
decided to shed the technical debt and pay for Dataflow. Runners already
rely on Docker to support the portability framework, so I don't think that
is much of a concern.
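
For reference, overriding the default SDK container with the PortableRunner comes down to a handful of pipeline flags. A sketch of the shape involved (the job endpoint and image name are placeholders, not values from this thread):

```python
# Hypothetical flag set for submitting a Python pipeline to a Flink job
# server through the PortableRunner with a custom SDK container image.
flags = [
    "--runner=PortableRunner",
    "--job_endpoint=localhost:8099",       # Flink job server endpoint
    "--environment_type=DOCKER",           # the default, shown explicitly
    "--environment_config=my-registry/my-beam-sdk:latest",  # custom image
]
# These would typically be handed to
# apache_beam.options.pipeline_options.PipelineOptions(flags)
# and then to beam.Pipeline(options=...) as usual.
```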

On Thu, Feb 22, 2024 at 7:49 PM Jaehyeon Kim  wrote:

> Hi Sam
>
> Thanks for the GitHub repo link. In your example, the environment type is
> set to DOCKER and it requires a docker container running together with the
> task manager. Would you think it is acceptable in a production environment?
>
> Cheers,
> Jaehyeon
>
> On Fri, 23 Feb 2024 at 13:57, Sam Bourne  wrote:
>
>> I made this a few years ago to help people like yourself.
>>
>> https://github.com/sambvfx/beam-flink-k8s
>>
>> Hopefully it's insightful and I'm happy to accept any MRs to update any
>> outdated information or to flesh it out more.
>>
>> On Thu, Feb 22, 2024 at 3:48 PM Jaehyeon Kim  wrote:
>>
>>> Hello,
>>>
>>> I'm playing with the Beam portable runner to read/write data from Kafka.
>>> I see a Spark runner example on Kubernetes (
>>> https://beam.apache.org/documentation/runners/spark/#kubernetes) but
>>> the Flink runner section doesn't include such an example.
>>>
>>> Is there a resource I can learn from? Ideally it would be good if this
>>> were added to the documentation.
>>>
>>> Cheers,
>>> Jaehyeon
>>>
>>


Re: Beam portable runner setup for Flink + Python on Kubernetes

2024-02-23 Thread Jan Lukavský

Hi,

I have set up such a configuration for a local environment (minikube); it
can be found at [1] and [2]. It is somewhat older, but it might serve as
inspiration. If you would like to write up your solution for the
documentation, that would be awesome; I'd be happy to review it. :)


Best,
 Jan

[1] https://github.com/PacktPublishing/Building-Big-Data-Pipelines-with-Apache-Beam/blob/main/env/manifests/flink.yaml
[2] https://github.com/PacktPublishing/Building-Big-Data-Pipelines-with-Apache-Beam/blob/main/env/docker/flink/Dockerfile
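
On the "container running alongside the task manager" point: a pattern often used in setups like these is to run the Beam SDK harness as a worker-pool sidecar in the taskmanager pod and submit the pipeline with --environment_type=EXTERNAL --environment_config=localhost:50000, instead of spawning Docker from within the worker. A hypothetical pod-spec excerpt (image tags and the port are placeholders):

```yaml
# Sketch of a Flink taskmanager pod with a Beam worker-pool sidecar.
spec:
  containers:
    - name: taskmanager
      image: flink:1.16
      args: ["taskmanager"]
    - name: beam-worker-pool
      image: apache/beam_python3.9_sdk:2.53.0
      args: ["--worker_pool"]
      ports:
        - containerPort: 50000   # matches --environment_config
```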


On 2/23/24 00:48, Jaehyeon Kim wrote:

Hello,

I'm playing with the Beam portable runner to read/write data from
Kafka. I see a Spark runner example on Kubernetes
(https://beam.apache.org/documentation/runners/spark/#kubernetes) but
the Flink runner section doesn't include such an example.


Is there a resource I can learn from? Ideally it would be good if this
were added to the documentation.


Cheers,
Jaehyeon