Thanks for the information. I will take a look.

Best Regards,
Pulasthi

On Fri, Nov 15, 2019 at 2:07 PM Luke Cwik <[email protected]> wrote:

> They are serialized but not with Java serialization. There is a
> CloudObject serialization[1] layer that only Dataflow uses while all other
> runners that need to serialize are using the Coder -> Proto serialization
> layer[2]. The CloudObject representation is slated for deletion once we can
> migrate Dataflow to the pure proto representation. Feel free to send a PR
> to improve the wording in the Javadoc.
>
> 1:
> https://github.com/apache/beam/blob/5de27d5c0cb86962e28b5168b7f1dec62352230b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/CloudObjects.java#L87
> 2:
> https://github.com/apache/beam/blob/5de27d5c0cb86962e28b5168b7f1dec62352230b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CoderTranslation.java#L68
>
> On Fri, Nov 15, 2019 at 11:00 AM Pulasthi Supun Wickramasinghe <
> [email protected]> wrote:
>
>> Hi Luke,
>>
>> Aren't the coders supposed to be serializable? The doc on the Coder
>> interface has the following Javadoc comment, which seems to mean that they
>> should be, and most of the basic coders seem to be serializable.
>>
>> "{@link Coder} instances are serialized during job creation and
>> deserialized before use. This will generally be performed by serializing
>> the object via Java Serialization."
>>
>> However, "TimestampedValueCoder" does not have a default constructor, so it
>> cannot be deserialized. Adding a private no-arg constructor would solve
>> the problem, but this may not be the only class that has this issue. I can
>> work on a pull request to add private no-arg constructors to the coders
>> that are missing them if this was not done intentionally. WDYT?
>>
>> Best Regards,
>> Pulasthi
>>
>> On Fri, Nov 15, 2019 at 12:05 AM Pulasthi Supun Wickramasinghe <
>> [email protected]> wrote:
>>
>>> Hi Luke,
>>>
>>> That is the approach I am taking currently to handle the functions. I
>>> might have to do the same for Coders as well since some coders have the
>>> same issue of not having default constructors.
>>>
>>> I also initially considered converting the pipeline into a JSON format
>>> and sending that over to the workers. I will take a look at the option you
>>> have mentioned since we do plan to implement a portable pipeline runner for
>>> Twister2 as well. Thanks for the information.
>>>
>>> Best Regards,
>>> Pulasthi
>>>
>>> On Thu, Nov 14, 2019 at 2:30 PM Luke Cwik <[email protected]> wrote:
>>>
>>>> You should create placeholders inside of your Twister2/OpenMPI
>>>> implementation that represent these functions and then instantiate actual
>>>> instances of them on the workers if you want to write your own pipeline
>>>> representation and format for OpenMPI/Twister2.
>>>>
>>>> Or consider converting the pipeline to its proto representation and
>>>> building a portable pipeline runner. This way you could run Go and Python
>>>> pipelines as well. The best example of this is the current Flink
>>>> integration[1].
>>>>
>>>> 1:
>>>> https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkBatchPortablePipelineTranslator.java
>>>>
>>>> On Wed, Nov 13, 2019 at 7:44 PM Pulasthi Supun Wickramasinghe <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Devs,
>>>>>
>>>>> Currently, the Pipeline class in Beam is not Serializable. This is not
>>>>> a problem for the current runners since the pipeline is translated and
>>>>> submitted through a centralized, driver-like model. However, if the runner
>>>>> has a decentralized model similar to OpenMPI (MPI), which is also the case
>>>>> with Twister2, for which I am currently developing a runner, it would be
>>>>> better if the pipeline itself were Serializable.
>>>>>
>>>>> Currently, I am trying to transform the Pipeline into a Twister2 graph
>>>>> and then send it over to the workers; however, since there are some
>>>>> functions such as "SystemReduceFn" that are not serializable, this is
>>>>> also somewhat troublesome.
>>>>>
>>>>> Was the decision to make Pipelines not Serializable made for some
>>>>> specific reason, or because none of the current use cases presented a
>>>>> valid requirement to make them Serializable?
>>>>>
>>>>> Best Regards,
>>>>> Pulasthi
>>>>>

--
Pulasthi S. Wickramasinghe
PhD Candidate | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington
cell: 224-386-9035
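
For reference, the placeholder approach suggested in the Nov 14 reply could look roughly like the sketch below. It is only an illustration in plain Java, not Beam or Twister2 code: `PlaceholderSketch`, `FnPlaceholder`, `SumFn`, `toBytes`, and `fromBytes` are all hypothetical names, and `SumFn` merely stands in for a non-serializable function such as "SystemReduceFn". The sketch assumes the real function can be rebuilt from its class name through a no-arg constructor; a function that needs configuration would have to carry that configuration in the placeholder as well.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class PlaceholderSketch {

  /** Hypothetical stand-in for a function that is NOT Serializable. */
  static final class SumFn implements Function<List<Integer>, Integer> {
    @Override
    public Integer apply(List<Integer> values) {
      int sum = 0;
      for (int v : values) {
        sum += v;
      }
      return sum;
    }
  }

  /** Serializable placeholder: only the class name travels to the worker. */
  static final class FnPlaceholder implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String fnClassName;
    // Transient: never serialized, rebuilt lazily on the receiving side.
    private transient Function<List<Integer>, Integer> fn;

    FnPlaceholder(Class<? extends Function<List<Integer>, Integer>> fnClass) {
      this.fnClassName = fnClass.getName();
    }

    /** Instantiates the actual function on first use (assumes a no-arg constructor). */
    @SuppressWarnings("unchecked")
    Function<List<Integer>, Integer> get() {
      if (fn == null) {
        try {
          fn = (Function<List<Integer>, Integer>)
              Class.forName(fnClassName).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
          throw new IllegalStateException("Cannot instantiate " + fnClassName, e);
        }
      }
      return fn;
    }
  }

  /** Serializes an object with plain Java serialization. */
  static byte[] toBytes(Serializable value) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value);
    }
    return bytes.toByteArray();
  }

  /** Deserializes an object written by toBytes. */
  static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    // "Driver" side: wrap the non-serializable function and ship the placeholder.
    byte[] wire = toBytes(new FnPlaceholder(SumFn.class));

    // "Worker" side: restore the placeholder and build the real function there.
    FnPlaceholder received = (FnPlaceholder) fromBytes(wire);
    System.out.println(received.get().apply(Arrays.asList(1, 2, 3))); // prints 6
  }
}
```

The design choice here is that the non-serializable instance never crosses the wire; the same pattern might also apply to coders that cannot be serialized directly, though that is an assumption rather than something confirmed in this thread.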
