Re: Schema Discovery Support in Apex Applications

Sergey Golovko Mon, 30 Jan 2017 14:32:03 -0800

Sorry, I'm new to the Apex team, and I don't clearly understand who the
consumers of the output port operator schema(s) are.


1. If the consumers are non-run-time callers like the application manager or UI 
designer, maybe it makes sense to use Java static method(s) to retrieve the 
output port operator schema(s). I guess the performance of a single call of a 
static method via reflection can be ignored.

2. If the consumer is the next downstream operator, maybe it makes sense to
send the output port operator schema from the upstream operator to the next
downstream operator via the stream. The corresponding methods that send and
receive the schema should be declared in the interface/abstract class of the
upstream and downstream operators. The output schema should be sent and
received right before the first data record is sent via the stream.
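
To make the two options concrete, here is a minimal Java sketch. It is only an
illustration under my own assumptions: every name in it (CsvParser,
SchemaSink, SchemaLookup, Collector, outputSchema) is hypothetical and not
part of the Apex API, and a schema is reduced to a list of field names.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo; none of these types exist in Apex.
class SchemaHandoffDemo {
    public static void main(String[] args) {
        // Option 1: a design-time caller finds a static method via reflection.
        System.out.println(SchemaLookup.describe(CsvParser.class));

        // Option 2: the schema travels in-band, ahead of the first record.
        Collector sink = new Collector();
        CsvParser parser = new CsvParser(sink);
        parser.emit("1,alice");
        parser.emit("2,bob");
        System.out.println(sink.schema + " " + sink.records);
    }
}

/** Receiving side of a stream: the schema arrives once, then the records. */
interface SchemaSink {
    void onSchema(List<String> fieldNames);
    void onRecord(String record);
}

/** Hypothetical upstream operator that supports both options. */
class CsvParser {
    private final SchemaSink downstream;
    private boolean schemaSent = false;

    CsvParser(SchemaSink downstream) {
        this.downstream = downstream;
    }

    /** Option 1: static, so no operator instance is needed to ask for it. */
    public static List<String> outputSchema() {
        return Arrays.asList("id", "name");
    }

    /** Option 2: send the schema right before the first data record. */
    void emit(String record) {
        if (!schemaSent) {
            downstream.onSchema(outputSchema());
            schemaSent = true;
        }
        downstream.onRecord(record);
    }
}

/** Non-run-time caller, e.g. an application manager or UI designer. */
class SchemaLookup {
    @SuppressWarnings("unchecked")
    static List<String> describe(Class<?> operatorClass) {
        try {
            Method m = operatorClass.getMethod("outputSchema");
            return (List<String>) m.invoke(null); // null target: static method
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no static outputSchema()", e);
        }
    }
}

/** Downstream stand-in that just remembers what it received. */
class Collector implements SchemaSink {
    List<String> schema;
    final List<String> records = new ArrayList<>();

    public void onSchema(List<String> fieldNames) {
        schema = fieldNames;
    }

    public void onRecord(String record) {
        records.add(record);
    }
}
```

In this sketch the reflective lookup happens once per operator class, which is
why I think its cost can be ignored, while the in-band variant costs one extra
message per stream.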

A typical example of sending metadata along with a regular result set is
JDBC, where the metadata is available as part of the JDBC result set. And I
hope the output schema (the metadata of the streamed data) in the
implementation will contain not only the signature of the streamed objects
(like field names and data types), but also any other properties of the data
that can be useful to the schema receiver when processing the data (for
instance, a delimiter for a CSV record stream).
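
As a rough sketch of what I mean by a schema that carries extra properties:
the StreamSchema and CsvReceiver classes and the "csv.delimiter" key below are
my own hypothetical names, not an existing Apex or JDBC API. The idea is
loosely modelled on how java.sql.ResultSetMetaData describes the columns of a
result set.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical demo; none of these types exist in Apex.
class SchemaPropertiesDemo {
    public static void main(String[] args) {
        Map<String, Class<?>> fields = new LinkedHashMap<>();
        fields.put("id", Integer.class);
        fields.put("name", String.class);
        StreamSchema schema = new StreamSchema(
                fields, Collections.singletonMap("csv.delimiter", "|"));
        System.out.println(CsvReceiver.parse(schema, "7|carol"));
    }
}

/** Field signature plus free-form properties describing the stream. */
final class StreamSchema {
    final Map<String, Class<?>> fields;      // field name -> type, in order
    final Map<String, String> properties;    // e.g. "csv.delimiter" -> "|"

    StreamSchema(Map<String, Class<?>> fields, Map<String, String> properties) {
        this.fields = Collections.unmodifiableMap(new LinkedHashMap<>(fields));
        this.properties =
                Collections.unmodifiableMap(new LinkedHashMap<>(properties));
    }
}

/** Receiver that relies on the schema instead of hard-coded settings. */
final class CsvReceiver {
    static Map<String, Object> parse(StreamSchema schema, String line) {
        // The delimiter comes from the schema; "," is only a fallback.
        String delimiter = schema.properties.getOrDefault("csv.delimiter", ",");
        String[] parts = line.split(Pattern.quote(delimiter), -1);
        if (parts.length != schema.fields.size()) {
            throw new IllegalArgumentException(
                    "expected " + schema.fields.size() + " fields: " + line);
        }
        Map<String, Object> row = new LinkedHashMap<>();
        int i = 0;
        for (Map.Entry<String, Class<?>> field : schema.fields.entrySet()) {
            String raw = parts[i++];
            // Only Integer and String are handled in this sketch.
            Object value = Integer.class.equals(field.getValue())
                    ? (Object) Integer.valueOf(raw)
                    : raw;
            row.put(field.getKey(), value);
        }
        return row;
    }
}
```

The receiver needs no configuration of its own here: the field order, the
types and the delimiter all come from the schema it was handed.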

Thanks,
Sergey

On 2017-01-25 01:47 (-0800), Chinmay Kolhatkar <[email protected]> wrote: 
> Thank you all for the feedback.
> 
> I've created a Jira for this: APEXCORE-623 and I'll attach the same
> document and link to this mailchain there.
> 
> As a first part of this Jira, there are 2 steps I would like to propose:
> 1. Add following interface at com.datatorrent.common.util.SchemaAware.
> 
> interface SchemaAware {
> 
> Map<OutputPort, Schema> registerSchema(Map<InputPort, Schema> inputSchema);
> }
> 
> This interface can be implemented by Operators to communicate its output
> schema(s) to engine.
> Input to this schema will be schema at its input port.
> 
> 2. After LogicalPlan is created call SchemaAware method from upstream to
> downstream operator in the DAG to propagate the Schema.
> 
> Once this is done, changes can be done in Malhar for the operators in
> question.
> 
> Please share your opinion on this approach.
> 
> Thanks,
> Chinmay.
> 
> 
> 
> 
> On Wed, Jan 18, 2017 at 2:31 PM, Priyanka Gugale <[email protected]> wrote:
> 
> > +1 to have this feature.
> >
> > -Priyanka
> >
> > On Tue, Jan 17, 2017 at 9:18 PM, Pramod Immaneni <[email protected]>
> > wrote:
> >
> > > +1
> > >
> > > On Mon, Jan 16, 2017 at 1:23 AM, Chinmay Kolhatkar <[email protected]>
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > Currently a DAG that is generated by user, if contains any POJOfied
> > > > operators, TUPLE_CLASS attribute needs to be set on each and every port
> > > > which receives or sends a POJO.
> > > >
> > > > For e.g., if a DAG is like File -> Parser -> Transform -> Dedup ->
> > > > Formatter -> Kafka, then TUPLE_CLASS attribute needs to be set by user
> > > > on both input and output ports of transform, dedup operators and also
> > > > on parser output and formatter input.
> > > >
> > > > The proposal here is to reduce work that is required by user to
> > > > configure the DAG. Technically speaking if an operators knows input
> > > > schema and processing properties, it can determine output schema and
> > > > convey it to downstream operators. This way the complete pipeline can
> > > > be configured without user setting TUPLE_CLASS or even creating POJOs
> > > > and adding them to classpath.
> > > >
> > > > On the same idea, I want to propose an approach where the pipeline can
> > > > be configured without user setting TUPLE_CLASS or even creating POJOs
> > > > and adding them to classpath.
> > > > Here is the document which at a high level explains the idea and a high
> > > > level design:
> > > > https://docs.google.com/document/d/1ibLQ1KYCLTeufG7dLoHyN_tRQXEM3LR-7o_S0z_porQ/edit?usp=sharing
> > > >
> > > > I would like to get opinion from community about feasibility and
> > > > applications of this proposal.
> > > > Once we get some consensus we can discuss the design in details.
> > > >
> > > > Thanks,
> > > > Chinmay.
> > > >
> > >
> >
>
