RE: Re: Re: Re: Re: [Question][Contribution] Python SDK ByteKeyRange

2022-02-17 Thread Sami Niemi
LexicographicKeyRangeTracker supports both string and byte keys, so it is more 
complex than a tracker that supports only byte keys. This is why I would 
make ByteKeyRestrictionTracker, and if someone wants to support string keys they 
could make another contribution.
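
To illustrate the difference in complexity, here is a rough sketch of how a 
byte-key-only tracker could estimate progress. The names key_to_fraction and 
precision are hypothetical; this is not code from Beam or from 
LexicographicKeyRangeTracker, only an assumption about what the bytes-only 
case might look like when no string handling is needed.

```python
def key_to_fraction(key: bytes, start: bytes, end: bytes,
                    precision: int = 8) -> float:
    """Estimate where `key` lies in [start, end) as a value in [0.0, 1.0].

    Hypothetical helper: each key is truncated or zero-padded to `precision`
    bytes and read as a big-endian integer, so only bytes are handled.
    """
    def as_int(k: bytes) -> int:
        return int.from_bytes(k[:precision].ljust(precision, b"\x00"), "big")

    lo, hi, pos = as_int(start), as_int(end), as_int(key)
    if hi <= lo:
        raise ValueError("end must be greater than start at this precision")
    # Clamp so that keys outside the range still give a valid fraction.
    return min(1.0, max(0.0, (pos - lo) / (hi - lo)))
```

A tracker that also accepts string keys would need an encoding step before 
this point, which is the extra complexity I would like to leave out.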

On 2022/02/15 22:17:37 Chamikara Jayalath wrote:
> Agree with Robert that sharing code with existing
> LexicographicKeyRangeTracker is more important than trying to stay close to
> the Java implementation. This code is relatively complicated and the
> interface difference between restriction and range trackers is not too
> large so we should be able to share most of the logic between Python
> implementations.
>
> Thanks,
> Cham
>
> On Tue, Feb 15, 2022 at 2:14 PM Sami Niemi <sa...@solita.fi> wrote:
>
> > That tracker is not a restriction tracker, which is what I need for my
> > Bigtable reader SDF. When I started working on this tracker I noticed that
> > it was implemented in Java, and I figured it would be best to make a
> > functionally similar implementation in Python. LexicographicKeyRangeTracker
> > is not that different, except that it can also handle strings as keys. I did
> > not need the tracker to do this, so I left it out to keep it simpler and
> > closer to the Java implementation.
> >
> > I’m open to changes in the implementation, but I would like to keep it simple
> > and not too far from the Java implementation.

RE: Re: Re: Re: [Question][Contribution] Python SDK ByteKeyRange

2022-02-15 Thread Sami Niemi
That tracker is not a restriction tracker, which is what I need for my Bigtable 
reader SDF. When I started working on this tracker I noticed that it was 
implemented in Java, and I figured it would be best to make a functionally 
similar implementation in Python. LexicographicKeyRangeTracker is not that 
different, except that it can also handle strings as keys. I did not need the 
tracker to do this, so I left it out to keep it simpler and closer to the Java 
implementation.

I’m open to changes in the implementation, but I would like to keep it simple 
and not too far from the Java implementation.

On 2022/02/15 16:42:35 Robert Bradshaw wrote:
> On Tue, Feb 15, 2022 at 2:03 AM Sami Niemi <sa...@solita.fi> wrote:
> >
> > Hi Ismaël,
> >
> >
> >
> > What I’ve currently been working on locally is almost 100% based on that 
> > Java implementation.
>
> Did the existing LexicographicKeyRangeTracker not meet your needs?
>
> > I suppose I need to create a Jira issue and make the contribution.

RE: Re: Contributor permission for Jira tickets

2022-02-15 Thread Sami Niemi
My username is samnisol.

On 2022/02/15 18:52:33 Ahmet Altay wrote:
> What is your jira username?
>
> On Tue, Feb 15, 2022 at 2:12 AM Sami Niemi <sa...@solita.fi> wrote:
>
> > Hello,
> >
> >
> >
> > This is Sami from Solita. I’m working on ByteKeyRange and
> > ByteKeyRestrictionTracker for the Python SDK and need contributor
> > permissions so that I can create and assign tickets in Jira.
> >
> >
> >
> > Thank you,
> >
> > Sami Niemi
> >
>


Contributor permission for Jira tickets

2022-02-15 Thread Sami Niemi
Hello,

This is Sami from Solita. I’m working on ByteKeyRange and 
ByteKeyRestrictionTracker for the Python SDK and need contributor 
permissions so that I can create and assign tickets in Jira.

Thank you,
Sami Niemi


RE: Re: Re: [Question][Contribution] Python SDK ByteKeyRange

2022-02-15 Thread Sami Niemi
Hi Ismaël,

What I’ve been working on locally is almost 100% based on that Java 
implementation. I suppose I need to create a Jira issue and make the 
contribution.

On 2022/02/15 09:19:33 Ismaël Mejía wrote:
> Oh, forgot to add also the link to the tests that cover most of those
> unexpected cases:
> [2]
> https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTrackerTest.java
>
>
> On Tue, Feb 15, 2022 at 10:17 AM Ismaël Mejía <ie...@gmail.com> wrote:
>
> > Great idea, please take a look at the Java ByteKeyRestrictionTracker
> > implementation for consistency [1]
> > I remember we had to deal with lots of corner cases so probably worth a
> > look.
> >
> > [1]
> > https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTracker.java
> >
> >
> > On Mon, Feb 14, 2022 at 6:39 PM Robert Bradshaw <ro...@google.com> wrote:
> >
> >> +1 to being forward looking and making restriction trackers.
> >> Hopefully the restriction tracker and existing range tracker could share
> >> 90% of their code.


RE: Re: [Question][Contribution] Python SDK ByteKeyRange

2022-02-14 Thread Sami Niemi
Hello Robert,

Beam has documented only OffsetRangeTracker [1] for the new SDF API. Since Beam 
is moving away from the Source API, I thought it would be nice to develop IO 
connectors using the new SDFs. For this I need to create a restriction tracker 
that follows the new SDF API.

So I propose adding ByteKeyRange as a new restriction class and 
ByteKeyRestrictionTracker as a new restriction tracker class. In my 
implementation I’ve also used a ByteKey class, instances of which are given to 
the restriction.


[1] https://github.com/apache/beam/blob/7eb7fd017a43353204eb8037603409dda7e0414a/sdks/python/apache_beam/io/restriction_trackers.py#L76
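
To make the proposal above more concrete, here is a minimal sketch of what the 
two classes might look like. It is not my actual implementation and it does 
not subclass Beam's RestrictionTracker; the method names only loosely follow 
the restriction tracker interface, the ByteKey wrapper is replaced by plain 
bytes, and try_split is simplified so that it handles checkpointing 
(fraction 0) only. All of these are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ByteKeyRange:
    """Half-open key range [start, end); end=None means no upper bound."""
    start: bytes = b""
    end: Optional[bytes] = None

    def contains(self, key: bytes) -> bool:
        # Python compares bytes lexicographically by unsigned byte value.
        return key >= self.start and (self.end is None or key < self.end)


class ByteKeyRestrictionTracker:
    """Hypothetical sketch; it does not subclass Beam's RestrictionTracker."""

    def __init__(self, restriction: ByteKeyRange):
        self._range = restriction
        self._last_claimed: Optional[bytes] = None
        self._done = False

    def current_restriction(self) -> ByteKeyRange:
        return self._range

    def try_claim(self, key: bytes) -> bool:
        if key < self._range.start:
            raise ValueError("key is before the start of the restriction")
        if self._last_claimed is not None and key <= self._last_claimed:
            raise ValueError("keys must be claimed in strictly increasing order")
        if not self._range.contains(key):
            self._done = True  # the first key past the end finishes the range
            return False
        self._last_claimed = key
        return True

    def try_split(self, fraction_of_remainder: float
                  ) -> Optional[Tuple[ByteKeyRange, ByteKeyRange]]:
        # Simplification: only checkpointing (fraction 0) is handled here.
        if fraction_of_remainder != 0 or self._done:
            return None
        if self._last_claimed is None:
            split_key = self._range.start
        else:
            # Smallest key strictly greater than the last claimed key.
            split_key = self._last_claimed + b"\x00"
        primary = ByteKeyRange(self._range.start, split_key)
        residual = ByteKeyRange(split_key, self._range.end)
        self._range = primary
        return primary, residual

    def check_done(self) -> None:
        end = self._range.end
        empty = end is not None and end <= self._range.start
        exhausted = (self._last_claimed is not None and end is not None
                     and self._last_claimed + b"\x00" >= end)
        if not (self._done or empty or exhausted):
            raise ValueError("unclaimed keys remain in %r" % (self._range,))
```

In an SDF the reader would call try_claim for each row key before emitting the 
row, and stop as soon as it returns False.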

On 2022/02/11 18:27:23 Robert Bradshaw wrote:
> Hi Sami! Glad to hear you're willing to contribute.
>
> Though the name is a bit different, I'm wondering if this is already
> present as LexicographicKeyRangeTracker.
> https://github.com/apache/beam/blob/release-2.35.0/sdks/python/apache_beam/io/range_trackers.py#L349
>
> On Fri, Feb 11, 2022 at 9:54 AM Ahmet Altay <al...@google.com> wrote:
> >
> > Hi Sami. Thank you for your interest.
> >
> > Adding people who might be able to comment: @Chamikara Jayalath @Lukasz Cwik
> >
> > On Thu, Feb 10, 2022 at 8:38 AM Sami Niemi <sa...@solita.fi> wrote:
> >>
> >> Hello,
> >>
> >> I noticed that the Python SDK only has implementations of OffsetRangeTracker
> >> and OffsetRange, while Java also has ByteKeyRange and ByteKeyRangeTracker.
> >>
> >> I have created simple implementations of the following Python classes:
> >>
> >> ByteKey
> >> ByteKeyRange
> >> ByteKeyRestrictionTracker
> >>
> >> I would like to contribute these so that they are available in the Python SDK
> >> alongside OffsetRange and OffsetRangeTracker. I would like to hear any
> >> thoughts about this and whether I should make a contribution.
> >>
> >> Thank you,
> >>
> >> Sami Niemi


[Question][Contribution] Python SDK ByteKeyRange

2022-02-10 Thread Sami Niemi
Hello,

I noticed that the Python SDK only has implementations of OffsetRangeTracker 
and OffsetRange, while Java also has ByteKeyRange and ByteKeyRangeTracker.

I have created simple implementations of the following Python classes:

  *   ByteKey
  *   ByteKeyRange
  *   ByteKeyRestrictionTracker

I would like to contribute these so that they are available in the Python SDK 
alongside OffsetRange and OffsetRangeTracker. I would like to hear any thoughts 
about this and whether I should make a contribution.

Thank you,
Sami Niemi