Just created https://github.com/apache/beam/pull/29969

On Mon, Jan 8, 2024 at 2:49 PM Robert Bradshaw <rober...@google.com> wrote:
>
> This does appear to be a significant missing feature. I'll try to make
> sure something easier gets in by the next release. See also below.
>
> On Mon, Jan 8, 2024 at 11:30 AM Ferran Fernández Garrido
> <ffernandez....@gmail.com> wrote:
> >
> > Hi Yarden,
> >
> > Since it's a bounded source you could try with Sql transformation
> > grouping by the timestamp column. Here are some examples of grouping:
> >
> > https://github.com/apache/beam/tree/master/sdks/python/apache_beam/yaml
> >
> > However, if you want to add a timestamp column in addition to the
> > original CSV records then, there are multiple ways to achieve that.
> >
> > 1) MapToFields:
> > https://github.com/apache/beam/blob/master/sdks/python/apache_beam/yaml/yaml_mapping.md
> > [Your timestamp column could be a callable to get the current
> > timestamp on each record]
> >
> > 2) If you need an extra layer of transformation complexity I would
> > recommend creating a custom transformation:
> >
> > # - type: MyCustomTransform
> > # name: AddDateTimeColumn
> > # config:
> > # prefix: 'whatever'
> >
> > providers:
> > - type: 'javaJar'
> > config:
> > jar: 'gs://path/of/the/java.jar'
> > transforms:
> > MyCustomTransform: 'beam:transform:org.apache.beam:javatransformation:v1'
>
> Alternatively you can use PyTransform, if you're more comfortable with
> that by invoking it via its fully qualified name.
>
> pipeline:
>   transforms:
>     ...
>     - type: MyAssignTimestamps
>       config:
>           kwarg1: ...
>           kwarg2: ...
>
> providers:
>   type:python
>   config:
>     packages: ['py_py_package_identifier']
>   transforms:
>     MyAssignTimestamps:
> fully_qualified_package.module.AssignTimestampsPTransform
>
>
>
> > Best,
> > Ferran
> >
> > El lun, 8 ene 2024 a las 19:53, Yarden BenMoshe (<yarde...@gmail.com>) 
> > escribió:
> > >
> > > Hi all,
> > > Im quite new to using beam yaml. I am working with a CSV file and want to 
> > > implement some windowing logic to it.
> > > Was wondering what is the right way to add timestamps to each element, 
> > > assuming I have a column including a timestamp.
> > >
> > > I am aware of Beam Programming Guide (apache.org) part but not sure how 
> > > this can be implemented and used from yaml prespective.
> > >
> > > Thanks
> > > Yarden

Reply via email to