Thanks Eugene. I will tackle some Jira when back next week.
Regards JB On Feb 7, 2017, 18:16, at 18:16, Eugene Kirpichov <[email protected]> wrote: >Hey all, > >I bit the bullet and audited all PTransform classes in Beam Java SDK >and >filed JIRA issues for all violations I could find. >I linked all them to the master JIRA issue >https://issues.apache.org/jira/browse/BEAM-1353 > >In general, all of these should be fixed before declaring Beam stable >API. >Would appreciate if some senior folks looked at the issues and >confirmed >that my suggested changes make sense. > >PRs very welcome :) (though I'll be gone for the next few weeks so I >can't >review right now) >Many of these are very easy to fix (a few lines of code); some require >a >little more code, but as far as I can tell all of them are mechanical. > > >On Tue, Jan 31, 2017 at 4:10 PM Eugene Kirpichov <[email protected]> >wrote: > >> On Mon, Jan 30, 2017 at 7:56 PM Dan Halperin ><[email protected]> >> wrote: >> >> On Mon, Jan 30, 2017 at 5:42 PM, Eugene Kirpichov < >> [email protected]> wrote: >> >> > Hello, >> > >> > The PTransform Style Guide is live >> > https://beam.apache.org/contribute/ptransform-style-guide/ - a >natural >> > next >> > step is to audit Beam libraries for compliance and file JIRAs for >places >> > that need to be fixed. It'd be great to finish these cleanups >before >> > declaring Beam stable API. >> > >> > Please take a look and file JIRAs / post suggestions on this >thread! >> > >> > I think it'll also make a great source of easy and useful work for >new >> > contributors. >> > >> > Some things I remember off the top of my head: >> > - TextIO, KafkaIO use coders improperly - coders should not be used >as a >> > general-purpose byte parsing mechanism. >> > >> >> Can you say more about Kafka? Kafka actually exports byte[] by >default, >> whereas Text files are String by default. So it does not seem nearly >as >> egregious for Kafka as it is for Text. >> >> Agreed that KafkaIO is less egregious, but it still has methods >> withKeyCoder and withValueCoder - these should be replaced with >something >> that doesn't take Coder. >> >> >> >> - HadoopFileSource is not packaged as a PTransform >> > - Some connectors, e.g. KafkaIO, should use AutoValue for their >parameter >> > builders, but don't >> > >> >> Isn't AutoValue entirely an internal implementation detail that is >not >> exposed(*) to users? I think this is irrelevant to a stable API. >> >> Agreed - doesn't block stable API, but still a good thing to do >because it >> makes the code cleaner (for KafkaIO there's a long-standing PR that >was >> blocked on ratifying the style guide >> https://github.com/apache/beam/pull/1048) >> >> >> >> (*) except that it makes transforms not able to be final, which is a >> regression. >> >> I think AutoValue use should generally be considered *very* optional. >In >> transforms I author, I prefer not to use AutoValue because it makes >the >> code more complex and less readable. >> >> Yeah, guidance on when to use / not use AutoValue could be improved. >I >> think it makes a lot of sense when the transform has more than one or >two >> parameters or when the set of parameters can grow. >> >> >> >> >> > - A few connectors improperly use >> > - Some transforms expose their transform type as "Something.Bound" >and >> > "Something.Unbound", e.g. TextIO.Read.Bound - such names are banned >> > >> >> "banned" is a strong word to use here. All of these are just >> recommendations. >> >> In general yes; the goal of the style guide is to be the default, >where if >> you deviate from it, you should have a good reason. I don't think >there >> ever exists a good reason to name a transform Something.Bound/Unbound >> though. >> >> >> >> >> > >> > I filed an umbrella JIRA >https://issues.apache.org/jira/browse/BEAM-1353 >> > about >> > making existing Beam transforms comply with the guide - let's >crowdsource >> > this! >> > >> > Thanks. >> > >> >>
