RE: Re: Re: Re: Re: [Question][Contribution] Python SDK ByteKeyRange
LexicographicKeyRangeTracker supports both string and byte keys, so it is more complex than a tracker that only supports byte keys. This is why I would add ByteKeyRestrictionTracker on its own; if someone wants to support string keys, they could make another contribution.

On 2022/02/15 22:17:37 Chamikara Jayalath wrote:
> Agree with Robert that sharing code with existing
> LexicographicKeyRangeTracker is more important than trying to stay close to
> the Java implementation. This code is relatively complicated and the
> interface difference between restriction and range trackers is not too
> large so we should be able to share most of the logic between Python
> implementations.
>
> Thanks,
> Cham
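To illustrate the point about key types, here is a minimal sketch of the difference between comparing byte keys directly and accepting string keys as well. It is not taken from LexicographicKeyRangeTracker or from the proposed contribution; `to_byte_key` is a hypothetical helper, and UTF-8 is an assumed encoding.

```python
def to_byte_key(key) -> bytes:
    """Hypothetical helper: normalize a str or bytes key to bytes.

    A bytes-only tracker could skip this step entirely; a tracker that
    also accepts strings needs some normalization like this (UTF-8 is
    an assumption made for this sketch).
    """
    if isinstance(key, bytes):
        return key
    if isinstance(key, str):
        return key.encode("utf-8")
    raise TypeError("key must be bytes or str, got %s" % type(key).__name__)


# Python compares bytes lexicographically by unsigned byte value,
# so byte keys can be ordered without any extra logic.
keys = [b"b", b"ab", b"a", b"\xff"]
sorted_keys = sorted(keys)
print(sorted_keys)  # [b'a', b'ab', b'b', b'\xff']

# A string key has to be encoded before it can be compared with byte keys.
print(to_byte_key("row-1") == b"row-1")  # True
```

With byte keys only, the ordering comes for free from the `bytes` type, which is one reason a bytes-only tracker can stay small.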
RE: Re: Re: Re: [Question][Contribution] Python SDK ByteKeyRange
That tracker is not a restriction tracker, which is what I need for my Bigtable reader SDF. When I started working on this tracker I noticed that one was already implemented in Java, and I figured it would be best to make a functionally similar implementation in Python. LexicographicKeyRangeTracker is not that different, except that it can also handle strings as keys. I did not need the tracker to do this, so I left it out to keep the implementation simpler and closer to the Java implementation.

I’m open to changes in the implementation, but I would like to keep it simple and not too far from the Java implementation.

On 2022/02/15 16:42:35 Robert Bradshaw wrote:
> On Tue, Feb 15, 2022 at 2:03 AM Sami Niemi <sa...@solita.fi> wrote:
> >
> > Hi Ismaël,
> >
> > What I’ve currently been working on locally is almost 100% based on that
> > Java implementation.
>
> Did the existing LexicographicKeyRangeTracker not meet your needs?
>
> > I suppose I need to create Jira issue and make the contribution.
RE: Re: Contributor permission for Jira tickets
My username is samnisol.

On 2022/02/15 18:52:33 Ahmet Altay wrote:
> What is your jira username?
>
> On Tue, Feb 15, 2022 at 2:12 AM Sami Niemi <sa...@solita.fi> wrote:
>
> > Hello,
> >
> > This is Sami from Solita. I’m working on ByteKeyRange and
> > ByteKeyRestrictionTracker for Python SDK and I would need contributor
> > permissions so I could create/assign tickets in Jira.
> >
> > Thank you,
> >
> > Sami Niemi
Contributor permission for Jira tickets
Hello,

This is Sami from Solita. I’m working on ByteKeyRange and ByteKeyRestrictionTracker for the Python SDK, and I need contributor permissions so that I can create and assign tickets in Jira.

Thank you,

Sami Niemi
RE: Re: Re: [Question][Contribution] Python SDK ByteKeyRange
Hi Ismaël,

What I’ve currently been working on locally is almost 100% based on that Java implementation. I suppose I need to create a Jira issue and make the contribution.

On 2022/02/15 09:19:33 Ismaël Mejía wrote:
> Oh, forgot to add also the link to the tests that cover most of those
> unexpected cases:
> [2]
> https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTrackerTest.java
>
> On Tue, Feb 15, 2022 at 10:17 AM Ismaël Mejía <ie...@gmail.com> wrote:
>
> > Great idea, please take a look at the Java ByteKeyRestrictionTracker
> > implementation for consistency [1]
> > I remember we had to deal with lots of corner cases so probably worth a
> > look.
> >
> > [1]
> > https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTracker.java
> >
> > On Mon, Feb 14, 2022 at 6:39 PM Robert Bradshaw <ro...@google.com> wrote:
> >
> >> +1 to being forward looking and making restriction trackers.
> >> Hopefully the restriction tracker and existing range tracker could share
> >> 90% of their code.
RE: Re: [Question][Contribution] Python SDK ByteKeyRange
Hello Robert,

Beam has documented only OffsetRangeTracker [1] for the new SDF API. Since Beam is moving away from the Source API, I thought it would be nice to develop IO connectors using the new SDFs. For this I need to create a restriction tracker that follows the new SDF API.

So I propose adding ByteKeyRange as a new restriction class and ByteKeyRestrictionTracker as a new restriction tracker class. My implementation also uses a ByteKey class, instances of which are given to the restriction.

[1] https://github.com/apache/beam/blob/7eb7fd017a43353204eb8037603409dda7e0414a/sdks/python/apache_beam/io/restriction_trackers.py#L76

On 2022/02/11 18:27:23 Robert Bradshaw wrote:
> Hi Sam! Glad to hear you're willing to contribute.
>
> Though the name is a bit different, I'm wondering if this is already
> present as LexicographicKeyRangeTracker.
> https://github.com/apache/beam/blob/release-2.35.0/sdks/python/apache_beam/io/range_trackers.py#L349
>
> On Fri, Feb 11, 2022 at 9:54 AM Ahmet Altay <al...@google.com> wrote:
> >
> > Hi Sami. Thank you for your interest.
> >
> > Adding people who might be able to comment: @Chamikara Jayalath @Lukasz Cwik

SAMI NIEMI
Data Engineer
+358 50 412 2115
sami.ni...@solita.fi

SOLITA
Eteläesplanadi 8
00130 Helsinki
solita.fi <https://www.solita.fi>
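For readers unfamiliar with what a restriction tracker over byte keys has to do, here is a minimal, self-contained sketch of the claiming logic. It is an illustration under stated assumptions, not the implementation proposed in this thread: the class does not subclass any Beam class, the method names `try_claim` and `current_restriction` are borrowed from Beam's restriction tracker interface, an empty end key is assumed to mean "no upper bound", and splitting, progress reporting and `check_done` are left out entirely.

```python
class ByteKeyRestrictionTracker:
    """Hypothetical sketch of claim tracking over [start_key, end_key).

    Does not depend on Apache Beam; names and semantics are assumptions
    made for this example only.
    """

    def __init__(self, start_key: bytes, end_key: bytes):
        self._start_key = start_key
        self._end_key = end_key  # assumption: b"" means unbounded
        self._last_attempted = None
        self._last_claimed = None

    def current_restriction(self):
        return (self._start_key, self._end_key)

    def try_claim(self, key: bytes) -> bool:
        # Keys must be attempted in strictly increasing order.
        if self._last_attempted is not None and key <= self._last_attempted:
            raise ValueError("keys must be claimed in increasing order")
        if key < self._start_key:
            raise ValueError("key is before the start of the restriction")
        self._last_attempted = key
        # A key at or past the end of the range cannot be claimed.
        if self._end_key and key >= self._end_key:
            return False
        self._last_claimed = key
        return True


tracker = ByteKeyRestrictionTracker(b"a", b"c")
results = [tracker.try_claim(k) for k in (b"a", b"b", b"c")]
print(results)  # [True, True, False]
```

The reader processes a row only when its key is claimed successfully, and stops at the first `False`. The corner cases mentioned elsewhere in the thread, such as splitting and deciding when the work is complete, are exactly the parts this sketch omits.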
[Question][Contribution] Python SDK ByteKeyRange
Hello,

I noticed that the Python SDK only has implementations of OffsetRangeTracker and OffsetRange, while Java also has ByteKeyRange and ByteKeyRangeTracker.

I have currently created simple implementations of the following Python classes:

* ByteKey
* ByteKeyRange
* ByteKeyRestrictionTracker

I would like to contribute these and make them available in the Python SDK in addition to OffsetRange and OffsetRangeTracker. I would like to hear any thoughts about this and whether I should make a contribution.

Thank you,

Sami Niemi
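As a rough idea of what a byte key range restriction could look like, here is a minimal sketch. It is not the code described in this message: it uses plain `bytes` in place of a separate ByteKey class, `contains_key` and `ALL_KEYS` are names chosen for this example, and treating an empty end key as "no upper bound" is an assumption modelled on the Java ByteKeyRange.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ByteKeyRange:
    """Hypothetical half-open key range [start_key, end_key)."""

    start_key: bytes = b""
    end_key: bytes = b""  # assumption: empty end key means "no upper bound"

    def __post_init__(self):
        if self.end_key and self.start_key > self.end_key:
            raise ValueError("start_key must not be greater than end_key")

    def contains_key(self, key: bytes) -> bool:
        # Python compares bytes lexicographically by unsigned byte value.
        if key < self.start_key:
            return False
        return not self.end_key or key < self.end_key


ALL_KEYS = ByteKeyRange()
row_range = ByteKeyRange(b"user#1000", b"user#2000")

print(row_range.contains_key(b"user#1500"))  # True
print(row_range.contains_key(b"user#2000"))  # False, the end key is exclusive
print(ALL_KEYS.contains_key(b"\xff\xff"))    # True
```

Making the range immutable keeps it safe to share between a tracker and whatever code splits the work, since a split would produce new range objects rather than modify an existing one.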