On Mon, Jan 30, 2017 at 7:56 PM Dan Halperin <dhalp...@google.com.invalid> wrote:
> On Mon, Jan 30, 2017 at 5:42 PM, Eugene Kirpichov < > kirpic...@google.com.invalid> wrote: > > > Hello, > > > > The PTransform Style Guide is live > > https://beam.apache.org/contribute/ptransform-style-guide/ - a natural > > next > > step is to audit Beam libraries for compliance and file JIRAs for places > > that need to be fixed. It'd be great to finish these cleanups before > > declaring Beam stable API. > > > > Please take a look and file JIRAs / post suggestions on this thread! > > > > I think it'll also make a great source of easy and useful work for new > > contributors. > > > > Some things I remember off the top of my head: > > - TextIO, KafkaIO use coders improperly - coders should not be used as a > > general-purpose byte parsing mechanism. > > > > Can you say more about Kafka? Kafka actually exports byte[] by default, > whereas Text files are String by default. So it does not seem nearly as > egregious for Kafka as it is for Text. > Agreed that KafkaIO is less egregious, but it still has methods withKeyCoder and withValueCoder - these should be replaced with something that doesn't take Coder. > > - HadoopFileSource is not packaged as a PTransform > > - Some connectors, e.g. KafkaIO, should use AutoValue for their parameter > > builders, but don't > > > > Isn't AutoValue entirely an internal implementation detail that is not > exposed(*) to users? I think this is irrelevant to a stable API. > Agreed - doesn't block stable API, but still a good thing to do because it makes the code cleaner (for KafkaIO there's a long-standing PR that was blocked on ratifying the style guide https://github.com/apache/beam/pull/1048) > > (*) except that it makes transforms not able to be final, which is a > regression. > > I think AutoValue use should generally be considered *very* optional. In > transforms I author, I prefer not to use AutoValue because it makes the > code more complex and less readable. > Yeah, guidance on when to use / not use AutoValue could be improved. I think it makes a lot of sense when the transform has more than one or two parameters or when the set of parameters can grow. > > > > - A few connectors improperly use > > - Some transforms expose their transform type as "Something.Bound" and > > "Something.Unbound", e.g. TextIO.Read.Bound - such names are banned > > > > "banned" is a strong word to use here. All of these are just > recommendations. > In general yes; the goal of the style guide is to be the default, where if you deviate from it, you should have a good reason. I don't think there ever exists a good reason to name a transform Something.Bound/Unbound though. > > > > > > I filed an umbrella JIRA https://issues.apache.org/jira/browse/BEAM-1353 > > about > > making existing Beam transforms comply with the guide - let's crowdsource > > this! > > > > Thanks. > > >