Re: how to enable debugging mode for python worker harness

2024-03-18 Thread XQ Hu via user
I did not do anything special but ran `docker-compose -f
docker-compose.yaml up` from your repo.

On Sun, Mar 17, 2024 at 11:38 PM Lydian Lee  wrote:

> Hi XQ,
>
> The code is simplified from my previous work, so it is still using the
> old version. But I've tested with Beam 2.54.0 and the code still works
> (I mean using my company's image). If it runs well on your Linux machine,
> I suspect the problem is in how I build the Docker image. Could you push
> the image you built to docker.io so that I can confirm whether the problem
> is limited to the image? Thanks.
>
> The goal for this repo is to complete my previous talk:
> https://www.youtube.com/watch?v=XUz90LpGAgc&ab_channel=ApacheBeam
>
> On Sun, Mar 17, 2024 at 8:07 AM XQ Hu via user 
> wrote:
>
>> I cloned your repo, which made this easy to reproduce. Not sure why you
>> use Beam 2.41, but anyway, I tried this on my Linux machine:
>>
>> python t.py \
>>   --topic test --group test-group --bootstrap-server localhost:9092 \
>>   --job_endpoint localhost:8099 \
>>   --artifact_endpoint localhost:8098 \
>>   --environment_type=EXTERNAL \
>>   --environment_config=localhost:5
>>
>> Note I replaced host.docker.internal with localhost and it runs well.
>>
>> I then tried to use host.docker.internal and it also runs well.
>>
>> Maybe this is related to your Mac settings?
>>
>> On Sun, Mar 17, 2024 at 8:34 AM Lydian Lee 
>> wrote:
>>
>>>
>>> Hi,
>>>
>>> Just FYI, the same thing works on a different image, one built using my
>>> company's image as the base image. For this repo I've only replaced that
>>> base image with Ubuntu. But given that the error log is completely
>>> unhelpful, it's really hard for me to continue debugging the issue.
>>>
>>> Docker is not required in my base image, as I've already added extra
>>> args to ReadFromKafka setting the default environment to PROCESS. This
>>> is proven to work with my company's docker image. As for
>>> host.docker.internal, which is also supported by Docker for Mac: the
>>> only thing I need to do is configure /etc/hosts so that I can submit the
>>> job directly from the laptop and not the Flink master.
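>>>
>>> For reference, a minimal sketch of what I mean by the extra args (the
>>> consumer settings mirror my compose setup; I'm assuming the stock
>>> default_io_expansion_service helper here, and in my case the /etc/hosts
>>> entry simply maps host.docker.internal to 127.0.0.1):
>>>
>>> ```
>>> from apache_beam.io.kafka import ReadFromKafka, default_io_expansion_service
>>>
>>> # Ask the Kafka expansion service to use a PROCESS environment, so the
>>> # task manager does not need Docker to run the Java Kafka transform.
>>> kafka_records = ReadFromKafka(
>>>     consumer_config={
>>>         "bootstrap.servers": "host.docker.internal:9092",
>>>         "group.id": "test-group",
>>>     },
>>>     topics=["test"],
>>>     expansion_service=default_io_expansion_service(
>>>         append_args=["--defaultEnvironmentType=PROCESS"],
>>>     ),
>>> )
>>> ```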
>>>
>>>
>>>
>>>
>>> On Sun, Mar 17, 2024 at 2:40 AM Jaehyeon Kim  wrote:
>>>
 Hello,

 The pipeline runs on the host, while host.docker.internal is only
 resolved inside containers that run with the host network mode. I guess
 the pipeline cannot resolve host.docker.internal and fails to run.

 If everything before ReadFromKafka works successfully, a Docker
 container will be launched with the host network mode so that
 host.docker.internal:9092 can be resolved inside the container. As far as
 I've checked, however, this fails when I start the Flink cluster on
 Docker, and I had to rely on a local Flink cluster. If you'd like to try
 Docker, you should have Docker installed in your custom Docker image and
 volume-map /var/run/docker.sock into the Flink task manager. Otherwise, it
 won't be able to launch a Docker container for reading Kafka messages.
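
 If it helps, this is the kind of docker-compose fragment I mean (the
 service name and image tag are only illustrative):

 ```
 taskmanager:
   image: flink:1.16
   command: taskmanager
   volumes:
     # Expose the host Docker daemon so the task manager can launch the
     # Beam SDK container that reads from Kafka.
     - /var/run/docker.sock:/var/run/docker.sock
 ```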

 Cheers,
 Jaehyeon


 On Sun, 17 Mar 2024 at 18:21, Lydian Lee 
 wrote:

> Hi,
>
> I have an issue setting up a POC of the Python SDK with the Flink runner
> running in docker-compose. The Python worker harness does not return any
> error, only:
> ```
> python-worker-harness-1  | 2024/03/17 07:10:17 Executing: python -m
> apache_beam.runners.worker.sdk_worker_main
> python-worker-harness-1  | 2024/03/17 07:10:24 Python exited: 
> ```
> and then dies. The error message is totally unhelpful, and I am
> wondering if there's a way to make the harness script show more debug
> logging.
>
> I started my harness via:
> ```
> /opt/apache/beam/boot --worker_pool
> ```
> and configured my script to use the harness:
> ```
> python docker/src/example.py \
>   --topic test --group test-group --bootstrap-server
> host.docker.internal:9092 \
>   --job_endpoint host.docker.internal:8099 \
>   --artifact_endpoint host.docker.internal:8098 \
>   --environment_type=EXTERNAL \
>   --environment_config=host.docker.internal:5
> ```
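>
> Related: I was hoping the pipeline option below would make the harness
> more verbose, but I am not sure whether it applies to the portable Flink
> runner (it may also need a recent Beam version):
> ```
>   --default_sdk_harness_log_level=DEBUG
> ```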
> The full setup is available at:
> https://github.com/lydian/beam-python-flink-runner-examples
> Thanks for your help
>
>


Re: [QUESTION] about Metric REST spec

2024-03-18 Thread LDesire
Thank you for your reply.

I found the MetricsHttpSink class in the Apache Beam project.

However, I realized that I need to add an additional dependency to use this
class in a normal pipeline.

I also think that the definition of the MetricsOptions interface is ambiguous.

After all, in order to use the `--metricsHttpSinkUrl` option, we need to provide
the FQCN of MetricsHttpSink as a string in the `--metricsSink` option, which
is where the beam-runners-extensions-java-metrics dependency comes in.

I think we should look into this a bit more and consider moving the
MetricsHttpSink class to core if necessary.
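
For the record, the combination I am describing looks like this (assuming
the FQCN below is correct; the sink URL is only an example):

```
--metricsSink=org.apache.beam.runners.extensions.metrics.MetricsHttpSink \
--metricsHttpSinkUrl=http://localhost:8080/metrics
```

with beam-runners-extensions-java-metrics on the pipeline's classpath.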

Thanks.



> On Mar 15, 2024, at 9:33 PM, Puertos tavares, Jose J (Canada) via user
> wrote:
> 
> Hello Beamers:
>  
> We have been aware of some of the shortcomings with reporting, so our
> batch code includes Beam counters for the things we want to track (we
> standardize on “elements” and “errors” as the main ones). Then, with the
> outcome of
>  
> result = p.run()
>  
> we call the following function, which stores a JSON representation of the
> counters for the Direct, Flink, and Dataflow runners in a somewhat
> standardized way. Notice we added support to POST the JSON to an endpoint,
> and the function returns it so it can be printed on stdout if desired.
>  
> # =====================================================================
> #
> # Report Metrics in a JSON payload to a local path or remote POST
> #
> # =====================================================================
> def reportMetrics(result,
>                   reportTarget=None,
>                   counterNames=["errors", "elements"]):
>     import re
>     import json
>     import requests
>     import apache_beam as beam  # for MetricsFilter below
>  
>     # Provide the JSON structure for a single counter
>     def toJsonEntry(metric):
>         step = metric.key.step
>  
>         # Check to see if it has Flink runner naming
>         m = re.match(r"ref_AppliedPTransform_(.+)_\d+$", step)
>         if m is not None:
>             step = m.group(1).replace("-", " ")
>  
>         # Standardize with a lower-case step name
>         step = step.lower().strip()
>  
>         return {
>             "kind": "counter",
>             "step": step,
>             "name": metric.key.metric.name,
>             "value": metric.committed
>         }
>  
>     counters = result.metrics().query(
>         beam.metrics.MetricsFilter())['counters']
>     metrics = [
>         toJsonEntry(counter)
>         for counter in counters
>         if counter.key.metric.name in counterNames
>     ]
>  
>     # Report to an external resource
>     if reportTarget is not None:
>         payload = json.dumps(metrics)
>  
>         # If HTTP(S) then POST, otherwise write to a local file
>         if reportTarget.startswith("http"):
>             requests.post(reportTarget, data=payload)
>         else:
>             with open(reportTarget, "w") as fd:
>                 fd.write(payload)
>  
>     return metrics
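>  
> For example, we invoke it along these lines (the endpoint URL here is
> illustrative):
>  
> result = p.run()
> result.wait_until_finish()
> reportMetrics(result, reportTarget="https://metrics.example.com/beam")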
>  
>  
>  
> It sure would be nice to have a standardized way to get this OOTB.
>  
> 
> 
> From: LDesire <two_som...@icloud.com>
> Sent: Friday, March 15, 2024 5:30 AM
> To: user@beam.apache.org 
> Subject: [EXTERNAL] [QUESTION] about Metric REST spec
> 
>  
>  
> Hello Apache Beam community.
> I'm reading the programming guide in the official documentation,
> specifically the section on metrics export.
> 
> There it states the following:
>  
> """
> As for now only the REST HTTP and the Graphite sinks are supported and only 
> Flink and Spark runners support metrics export.
> """
> Is there any specification for this REST?
>  
> Thank you.
>  