Hi,

Do you use specific/complex coders in your pipeline?
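The question matters because a copy that goes through a coder costs one encode plus one decode of the whole value, so its cost grows with the size of the state and the complexity of the coder, whereas a plain reference assignment is constant time. Below is a minimal sketch of that difference, not Beam's actual code: it assumes Java serialization as a stand-in for a Beam coder, and the names `CloneSketch`, `copyByReference` and `cloneViaEncoding` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; Java serialization stands in for a coder.
final class CloneSketch {

  // Analogue of "that.value = this.value": both holders share one instance.
  static <T> T copyByReference(T value) {
    return value;
  }

  // Analogue of a coder-based clone: encode to bytes, then decode a new
  // instance. The cost is proportional to the encoded size of the value.
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T cloneViaEncoding(T value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(value);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (T) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("clone via encoding failed", e);
    }
  }

  public static void main(String[] args) {
    ArrayList<Integer> state = new ArrayList<>();
    for (int i = 0; i < 100_000; i++) {
      state.add(i);
    }

    List<Integer> shared = copyByReference(state);
    List<Integer> cloned = cloneViaEncoding(state);

    System.out.println("shared is same instance: " + (shared == state));
    System.out.println("cloned is same instance: " + (cloned == state));
    System.out.println("cloned equals original: " + cloned.equals(state));
  }
}
```

If copy() is invoked often on large state values, the encode/decode round trip would be paid on every call, which would be consistent with the kind of slowdown reported in the quoted message; profiling the pipeline is the way to confirm whether that is the actual cause.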
I'm sure Eugene will offer some insights about this change: AFAIR, the purpose is to have a cleaner use of coders and identify identity copy.

Regards
JB

On 09/07/2018 16:22, Vojtech Janota wrote:
> Hi,
>
> We are using Apache Beam in our project for some time now. Since our
> datasets are of modest size, we have so far used DirectRunner as the
> computation easily fits onto a single machine. Recently we upgraded Beam
> from 2.2 to 2.4 and found out that performance of our pipelines
> drastically deteriorated. Pipelines that took ~3 minutes with 2.2 do not
> finish within hours now. We tried to isolate the change that causes the
> slowdown and came to the commits into the "InMemoryStateInternals" class:
>
> * https://github.com/apache/beam/commit/32a427c
> * https://github.com/apache/beam/commit/8151d82
>
> In a nutshell where previously the copy() method simply assigned:
>
>   that.value = this.value
>
> There is now coder encode/decode combo hidden behind:
>
>   that.value = uncheckedClone(coder, this.value)
>
> Can somebody explain the purpose of this change? Is it meant as an
> additional "enforcement" point, similar to DirectRunner's
> enforceImmutability and enforceEncodability? Or is it something that is
> genuinely needed to provide correct behaviour of the pipeline?
>
> Any hints or thoughts are appreciated.
>
> Regards,
> Vojta

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com