Re: [FEEDBACK] Metadata Platforms / Catalogs / Lineage integration

2022-01-13 Thread Pedro Silva
Hello,

I'm part of the DataHub community and working in collaboration with the
company behind it: http://acryldata.io
Happy to have a conversation or clarify any questions you may have on
DataHub :)

Have a nice day!

On Thu, Jan 13, 2022 at 15:33, Andrew Otto wrote:

> Hello!  The Wikimedia Foundation is currently doing a similar evaluation
> (although we are not currently including any Flink considerations).
>
>
> https://wikitech.wikimedia.org/wiki/Data_Catalog_Application_Evaluation_Rubric
>
> More details will be published there as folks keep working on this.
> Hope that helps a little bit! :)
>
> -Andrew Otto
>
> On Thu, Jan 13, 2022 at 10:27 AM Martijn Visser 
> wrote:
>
>> Hi everyone,
>>
>> I'm currently checking out different metadata platforms, such as Amundsen
>> [1] and Datahub [2]. In short, these types of tools try to address problems
>> related to topics such as data discovery, data lineage and an overall data
>> catalogue.
>>
>> I'm reaching out to the Dev and User mailing lists to get some feedback.
>> It would really help if you could spend a couple of minutes to let me know
>> whether you already use one of the two metadata platforms mentioned, or
>> another one, or whether you are evaluating such tools. If so, is that for
>> use as a catalogue, for lineage, or for something else? Any type of
>> feedback on these types of tools is appreciated.
>>
>> Best regards,
>>
>> Martijn
>>
>> [1] https://github.com/amundsen-io/amundsen/
>> [2] https://github.com/linkedin/datahub
>>
>>
>>


Re: CEP library support in Python

2021-09-15 Thread Pedro Silva
Understood. I was looking for a way to define these metrics that
non-programmers could develop themselves.

Thank you for the answer, Seth.

Pedro

> On 15 Sep 2021, at 18:38, Seth Wiesman  wrote:
> 
> 
> Honestly, I don't think you need CEP or MATCH_RECOGNIZE for that use case. It 
> can be solved with a simple process function that tracks the state for each 
> id. Output a 1 when a job completes and a -1 if canceled. Output the sum. You 
> can use a simple timer to clear the state for a job after 6 months have 
> passed. 
> 
> Seth 
> 
>> On Wed, Sep 15, 2021 at 12:34 PM Pedro Silva  wrote:
>> Hello,
>> 
>> Has anyone used streaming SQL pattern matching, as shown in this email
>> thread, to count certain transitions on a stream?
>> Is it feasible?
>> 
>> Thank you,
>> Pedro Silva
>> 
>>>> On 13 Sep 2021, at 11:16, Pedro Silva  wrote:
>>>> 
>>> 
>>> Hello Seth,
>>> 
>>> Thank you very much for your reply. I've taken a look at MATCH_RECOGNIZE
>>> but I have the following question. Can I implement a state machine that
>>> detects patterns with multiple end states?
>>> To give you a concrete example:
>>> 
>>> I'm trying to count the number of Jobs that have been cancelled and 
>>> completed. The state machine associated with this Job concept is as follows:
>>> Started -> On-Going (Multiple Progress messages) -> Closed -> Completed
>>>    |          |                                       |           |
>>>    +----------+---------------------------------------+-----------+--> Cancelled
>>> 
>>> At any point the Job can be cancelled from the previous state. 
>>> This cancel message can take anywhere from 1-2 weeks to be received.
>>> The duration of this state machine (Job lifecycle) is roughly 6 months.
>>> 
>>> How can I keep a count of the Jobs that have been completed but not
>>> cancelled, such that a cancel arriving for a previously completed or
>>> closed Job decrements the counter, while a cancel arriving after a
>>> started or progress state leaves the counter unchanged?
>>> 
>>> I hope this example was clear.
>>> 
>>> Thank you for your time!
>>> Pedro Silva
>>> 
>>> 
>>>> On Fri, Sep 10, 2021 at 20:18, Seth Wiesman wrote:
>>>> Hi Pedro, 
>>>> 
>>>> The DataStream CEP library is not available in Python, but from Python
>>>> you can use `MATCH_RECOGNIZE` in the Table API, which is implemented on
>>>> top of the CEP library.
>>>> 
>>>> https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/table/sql/queries/match_recognize/
>>>>  
>>>> 
>>>> Seth 
>>>> 
>>>>> On Fri, Sep 10, 2021 at 11:34 AM Pedro Silva  
>>>>> wrote:
>>>>> Hello,
>>>>> 
>>>>> Is Flink's CEP library available in Python? From the documentation I see
>>>>> no references, so I'm guessing the answer is no, but I wanted some
>>>>> confirmation from the community or developers.
>>>>>
>>>>> Are there plans to support this library in Python or, alternatively, is
>>>>> there another library that can be used from Python?
>>>>> 
>>>>> Thank you and have a nice weekend,
>>>>> Pedro Silva
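
Note: the approach Seth describes in this message (track state per job id,
output a 1 when a job completes and a -1 if it is later cancelled, sum the
outputs, and clear the state with a timer) can be sketched as below. This is
a plain-Python simulation of the logic only, not PyFlink or Flink API code.
The names JobCounter, JobState, running_total, the status strings, and the
180-day TTL are all assumptions made for illustration, as is the choice to
count a job as soon as it reaches either Closed or Completed. Events are
assumed to arrive in timestamp order.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, Tuple

# Assumption: "roughly 6 months" expressed as 180 days in seconds.
STATE_TTL_SECONDS = 180 * 24 * 60 * 60

# Assumption: a job is counted once it reaches either of these states.
COUNTED_STATES = {"CLOSED", "COMPLETED"}


@dataclass
class JobState:
    counted: bool = False     # True once this job has contributed +1
    cancelled: bool = False   # True once a cancel has been seen
    expires_at: float = 0.0   # stands in for the cleanup timer


@dataclass
class JobCounter:
    """Hypothetical stand-in for a keyed process function."""
    jobs: Dict[str, JobState] = field(default_factory=dict)

    def process(self, job_id: str, status: str, ts: float) -> int:
        """Return +1, -1 or 0 for a single (job_id, status, ts) event."""
        self._expire(ts)
        state = self.jobs.get(job_id)
        if state is None:
            # First event for this job: create state and arm the "timer".
            state = JobState(expires_at=ts + STATE_TTL_SECONDS)
            self.jobs[job_id] = state
        if state.cancelled:
            return 0  # nothing changes after a cancel
        if status in COUNTED_STATES and not state.counted:
            state.counted = True
            return 1
        if status == "CANCELLED":
            state.cancelled = True
            # Decrement only if the job had already been counted.
            return -1 if state.counted else 0
        return 0

    def _expire(self, now: float) -> None:
        """Drop state for jobs whose timer has fired."""
        expired = [k for k, s in self.jobs.items() if s.expires_at <= now]
        for k in expired:
            del self.jobs[k]


def running_total(
    events: Iterable[Tuple[str, str, float]]
) -> Iterator[int]:
    """Yield the running sum of the +1/-1 outputs after each event."""
    counter = JobCounter()
    total = 0
    for job_id, status, ts in events:
        total += counter.process(job_id, status, ts)
        yield total
```

In this sketch a cancel that arrives after only Started or Progress events
yields 0, so the counter is neither incremented nor decremented, while a
cancel after Closed or Completed yields -1. In a real Flink job the
per-job dictionary would be keyed state and the expiry check would be a
registered timer.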


Re: CEP library support in Python

2021-09-15 Thread Pedro Silva
Hello,

Has anyone used streaming SQL pattern matching, as shown in this email thread,
to count certain transitions on a stream?
Is it feasible?

Thank you,
Pedro Silva

> On 13 Sep 2021, at 11:16, Pedro Silva  wrote:
> 
> 
> Hello Seth,
> 
> Thank you very much for your reply. I've taken a look at MATCH_RECOGNIZE but
> I have the following question. Can I implement a state machine that detects
> patterns with multiple end states?
> To give you a concrete example:
> 
> I'm trying to count the number of Jobs that have been cancelled and 
> completed. The state machine associated with this Job concept is as follows:
> Started -> On-Going (Multiple Progress messages) -> Closed -> Completed
>    |          |                                       |           |
>    +----------+---------------------------------------+-----------+--> Cancelled
> 
> At any point the Job can be cancelled from the previous state. 
> This cancel message can take anywhere from 1-2 weeks to be received.
> The duration of this state machine (Job lifecycle) is roughly 6 months.
> 
> How can I keep a count of the Jobs that have been completed but not
> cancelled, such that a cancel arriving for a previously completed or closed
> Job decrements the counter, while a cancel arriving after a started or
> progress state leaves the counter unchanged?
> 
> I hope this example was clear.
> 
> Thank you for your time!
> Pedro Silva
> 
> 
>> On Fri, Sep 10, 2021 at 20:18, Seth Wiesman wrote:
>> Hi Pedro, 
>> 
>> The DataStream CEP library is not available in Python, but from Python you
>> can use `MATCH_RECOGNIZE` in the Table API, which is implemented on top of
>> the CEP library.
>> 
>> https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/table/sql/queries/match_recognize/
>>  
>> 
>> Seth 
>> 
>>> On Fri, Sep 10, 2021 at 11:34 AM Pedro Silva  wrote:
>>> Hello,
>>> 
>>> Is Flink's CEP library available in Python? From the documentation I see no
>>> references, so I'm guessing the answer is no, but I wanted some
>>> confirmation from the community or developers.
>>>
>>> Are there plans to support this library in Python or, alternatively, is
>>> there another library that can be used from Python?
>>> 
>>> Thank you and have a nice weekend,
>>> Pedro Silva


FlinkSQL Sinks

2021-09-15 Thread Pedro Silva
Hello,

Is it possible to configure a sink for SQL Client queries other than the
terminal/stdout?
Looking at the SQL Client Configuration, it seems that the client's output
is only ever used to visualize the results.

If I wanted to sink a changelog stream to a database like Postgres or to a
Kafka topic, would I have to create a streaming application, hardcode the SQL
query, and configure the sink in Java/Scala/Python code?

Thank you.


Re: CEP library support in Python

2021-09-13 Thread Pedro Silva
Hello Seth,

Thank you very much for your reply. I've taken a look at MATCH_RECOGNIZE
but I have the following question. Can I implement a state machine that
detects patterns with multiple end states?
To give you a concrete example:

I'm trying to count the number of *Jobs* that have been *cancelled* and
*completed*. The state machine associated with this Job concept is as
follows:
Started -> On-Going (Multiple Progress messages) -> Closed -> Completed
   |          |                                       |           |
   +----------+---------------------------------------+-----------+--> Cancelled

At any point the Job can be cancelled from the previous state.
This cancel message can take anywhere from 1-2 weeks to be received.
The duration of this state machine (Job lifecycle) is roughly 6 months.

How can I keep a count of the Jobs that have been completed but not
cancelled, such that a cancel arriving for a previously completed or closed
Job decrements the counter, while a cancel arriving after a started or
progress state leaves the counter unchanged?

I hope this example was clear.

Thank you for your time!
Pedro Silva


On Fri, Sep 10, 2021 at 20:18, Seth Wiesman wrote:

> Hi Pedro,
>
> The DataStream CEP library is not available in Python, but from Python you
> can use `MATCH_RECOGNIZE` in the Table API, which is implemented on top of
> the CEP library.
>
>
> https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/table/sql/queries/match_recognize/
>
>
> Seth
>
> On Fri, Sep 10, 2021 at 11:34 AM Pedro Silva 
> wrote:
>
>> Hello,
>>
>> Is Flink's CEP library
>> <https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/libs/cep/>
>> available in Python? From the documentation I see no references, so I'm
>> guessing the answer is no, but I wanted some confirmation from the
>> community or developers.
>>
>> Are there plans to support this library in Python or, alternatively, is
>> there another library that can be used from Python?
>>
>> Thank you and have a nice weekend,
>> Pedro Silva
>>
>


CEP library support in Python

2021-09-10 Thread Pedro Silva
Hello,

Is Flink's CEP library
<https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/libs/cep/>
available in Python? From the documentation I see no references, so I'm
guessing the answer is no, but I wanted some confirmation from the community
or developers.

Are there plans to support this library in Python or, alternatively, is
there another library that can be used from Python?

Thank you and have a nice weekend,
Pedro Silva


Helm chart for Flink

2021-05-17 Thread Pedro Silva
Hello,

Forwarding this question from the dev mailing list in case this is a more
appropriate list.

Does Flink have an official Helm chart? I haven't been able to find one;
the closest and most up-to-date option seems to be
https://github.com/GoogleCloudPlatform/flink-on-k8s-operator.
Is this correct, or is there a more mature and/or recommended Helm chart to
use?

Thank you.