https://beamsummit.org/sessions/event-driven-movie-magic/

^^ The question made me think of that use case, though it's unclear how
close it is to what you're thinking about.

Cheers -

On Fri, Dec 15, 2023 at 7:01 AM Byron Ellis via user <user@beam.apache.org>
wrote:

> As Jan says, theoretically possible? Sure. That particular set of
> operations? Overkill. If you don't have it already set up I'd say even
> something like Airflow is overkill here. If all you need to do is "launch
> job and wait" when a file arrives... that's a small script and not
> something that particularly requires a distributed data processing system.
>
> On Fri, Dec 15, 2023 at 4:58 AM Jan Lukavský <je...@seznam.cz> wrote:
>
>> Hi,
>>
>> Apache Beam describes itself as "Apache Beam is an open-source, unified
>> programming model for batch and streaming data processing pipelines, ...".
>> As such, it is possible to use it to express essentially arbitrary logic
>> and run it as a streaming pipeline. A streaming pipeline processes input
>> data and produces output data and/or actions. Given these assumptions, it
>> is technically feasible to use Apache Beam for orchestrating other
>> workflows; the problem is that it will very likely not be efficient.
>> Apache Beam does a lot of heavy lifting because it is designed
>> to process large volumes of data in a scalable way, which is probably not
>> what one would need for workflow orchestration. So my two cents would be
>> that although it _could_ be done, it probably _should not_ be done.
>>
>> Best,
>>
>>  Jan
>> On 12/15/23 13:39, Mikhail Khludnev wrote:
>>
>> Hello,
>> I think this page https://beam.apache.org/documentation/ml/orchestration/
>> might answer your question.
>> Frankly speaking: use a workflow engine such as GCP Workflows or Apache
>> Airflow. Beam itself is a data-stream/flow or batch processor, not a
>> workflow engine (IMHO).
>>
>> On Fri, Dec 15, 2023 at 3:13 PM data_nerd_666 <dataner...@gmail.com>
>> wrote:
>>
>>> I know it is technically possible, but my case may be a little special.
>>> Say I have 3 steps for my control flow (ETL workflow):
>>> Step 1. upstream file watching
>>> Step 2. call some external service to run one job, e.g. run a notebook,
>>> run a python script
>>> Step 3. notify downstream workflow
>>> Can I use Apache Beam to build a DAG with 3 nodes and run it as either
>>> a Flink or a Spark job? It might be a little weird, but I just want to
>>> learn from the community whether this is the right way to use Apache Beam,
>>> and whether anyone has done this before. Thanks
>>>
>>>
>>>
>>> On Fri, Dec 15, 2023 at 10:28 AM Byron Ellis via user <
>>> user@beam.apache.org> wrote:
>>>
>>>> It’s technically possible, but the closest thing I can think of would be
>>>> triggering work based on events like file watching.
>>>>
>>>> On Thu, Dec 14, 2023 at 2:46 PM data_nerd_666 <dataner...@gmail.com>
>>>> wrote:
>>>>
>>>>> Not using Beam as a time-based scheduler, but just using it to control
>>>>> the execution order of an ETL workflow DAG, because Beam's abstraction is
>>>>> also a DAG.
>>>>> I know it is a little weird; I just want to confirm with the community:
>>>>> has anyone used Beam like this before?
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Dec 14, 2023 at 10:59 PM Jan Lukavský <je...@seznam.cz> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> can you give an example of what you mean for better understanding? Do
>>>>>> you mean using Beam as a scheduler of other ETL workflows?
>>>>>>
>>>>>>   Jan
>>>>>>
>>>>>> On 12/14/23 13:17, data_nerd_666 wrote:
>>>>>> > Hi all,
>>>>>> >
>>>>>> > I am new to Apache Beam, and am very excited to find Beam in the
>>>>>> > Apache community. I see lots of use cases for Apache Beam in data
>>>>>> > flow (processing large amounts of batch/streaming data). I am just
>>>>>> > wondering whether I can use Apache Beam for control flow (ETL
>>>>>> > workflow). I don't mean the Spark/Flink job in the ETL workflow, I
>>>>>> > mean the ETL workflow itself. An ETL workflow is also a DAG, which
>>>>>> > is very similar to the abstraction of Apache Beam, but unfortunately
>>>>>> > I didn't find such use cases on the internet. So I'd like to ask the
>>>>>> > Beam community to confirm whether I can use Apache Beam for control
>>>>>> > flow (ETL workflow). If yes, please let me know of some success
>>>>>> > stories. Thanks
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>>
>>>>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>
>>
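
For reference, the "launch job and wait when a file arrives" script
mentioned in the thread could look roughly like the sketch below. It is
only an illustration of the three steps (watch for a file, run an external
job, notify downstream) using the Python standard library. All function
names are hypothetical, polling is just one possible way to watch for a
file, and the "notification" is assumed here to be an empty marker file,
which may not match what a real downstream workflow expects.

```python
import subprocess
import time
from pathlib import Path


def wait_for_file(path, poll_seconds=1.0, timeout_seconds=None):
    """Step 1: poll until `path` exists; return False if the timeout expires."""
    deadline = None
    if timeout_seconds is not None:
        deadline = time.monotonic() + timeout_seconds
    while not Path(path).exists():
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True


def run_job(command):
    """Step 2: launch the external job, wait for it, and return its exit code."""
    return subprocess.run(command).returncode


def notify_downstream(marker_path):
    """Step 3 (assumed mechanism): create an empty marker file for downstream."""
    Path(marker_path).touch()


def run_workflow(input_path, command, marker_path,
                 poll_seconds=1.0, timeout_seconds=None):
    """Run the three steps in order; return True only if every step succeeds."""
    if not wait_for_file(input_path, poll_seconds, timeout_seconds):
        return False
    if run_job(command) != 0:
        return False
    notify_downstream(marker_path)
    return True
```

Something like run_workflow("incoming/data.csv", ["python", "job.py"],
"incoming/data.done") would then cover the whole flow, with the paths and
the job command being placeholders.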
