Hi Yarden,

Since it's a bounded source, you could try the Sql transform,
grouping by the timestamp column. Here are some examples of grouping:

https://github.com/apache/beam/tree/master/sdks/python/apache_beam/yaml
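As a rough sketch of the Sql option, something along these lines might
work. The column name event_time and the alias num_records are
hypothetical, standing in for whatever your CSV actually contains:

```yaml
# Sketch only: event_time is an assumed column name, not from your file.
- type: Sql
  name: CountPerTimestamp
  config:
    query: >
      SELECT event_time, COUNT(*) AS num_records
      FROM PCOLLECTION
      GROUP BY event_time
```

This groups on the exact timestamp value, so it only helps if records
that belong together share the same value in that column.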

However, if you want to add a timestamp column in addition to the
original CSV records, there are multiple ways to achieve that.

1) MapToFields:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/yaml/yaml_mapping.md
[Your timestamp column could be a callable that returns the current
timestamp for each record]

2) If you need an extra layer of transformation complexity, I would
recommend creating a custom transformation:

# - type: MyCustomTransform
#   name: AddDateTimeColumn
#   config:
#     prefix: 'whatever'

providers:
  - type: 'javaJar'
    config:
      jar: 'gs://path/of/the/java.jar'
    transforms:
      MyCustomTransform: 'beam:transform:org.apache.beam:javatransformation:v1'

Here is a good example of how to do that in Java:
https://github.com/apache/beam/blob/master/examples/multi-language/src/main/java/org/apache/beam/examples/multilanguage/JavaPrefixRegistrar.java
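
For option 1, a minimal sketch of what the MapToFields step could look
like is below. The field name processing_time and the function name
current_timestamp are made up for illustration, and I am assuming you
want the existing CSV columns kept (hence append: true):

```yaml
# Sketch only: processing_time and current_timestamp are hypothetical names.
- type: MapToFields
  name: AddDateTimeColumn
  config:
    language: python
    append: true   # keep the original CSV columns, add the new one
    fields:
      processing_time:
        callable: |
          import datetime
          def current_timestamp(row):
            # Wall-clock time at which this record is processed, in UTC.
            return datetime.datetime.now(datetime.timezone.utc).isoformat()
```

Note that this adds the processing time as a regular column; it does
not by itself change the element timestamps that windowing uses.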

Best,
Ferran

On Mon, 8 Jan 2024 at 19:53, Yarden BenMoshe (<yarde...@gmail.com>) wrote:
>
> Hi all,
> I'm quite new to using Beam YAML. I am working with a CSV file and want to
> apply some windowing logic to it.
> I was wondering what the right way is to add timestamps to each element,
> assuming I have a column that contains a timestamp.
>
> I am aware of the relevant part of the Beam Programming Guide (apache.org),
> but I am not sure how this can be implemented and used from a YAML perspective.
>
> Thanks
> Yarden
