Hi Christophe,

I think this is a very good idea!

I agree with Enrico that the body should depend on the record schema, but
it could also be done as a follow-up task.
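
To make the schema-dependent body concrete, here is a rough sketch of the mapping Enrico suggests inline below. It is Python only for brevity; the schema labels, the `build_body` name and its parameters are made up for illustration and are not Pulsar APIs. The AVRO, PROTOBUF and KEY-VALUE cases are left undecided, as they are in this thread.

```python
import json

# Hypothetical schema labels for this sketch; they are not Pulsar API constants.
RAW_SCHEMAS = {"BYTES", "JSON"}
PRIMITIVE_SCHEMAS = {"INT32", "INT64", "STRING"}


def build_body(schema_type, value, raw_payload):
    """Return the HTTP request body (bytes) for one record."""
    if schema_type in RAW_SCHEMAS:
        # BYTES and JSON: forward the raw message payload untouched.
        return raw_payload
    if schema_type in PRIMITIVE_SCHEMAS:
        # Primitive values: send their JSON representation.
        return json.dumps(value).encode("utf-8")
    # AVRO, PROTOBUF, KEY_VALUE: behaviour is still an open question here.
    raise NotImplementedError(f"no body mapping decided for {schema_type}")
```

A configuration flag could later replace the `NotImplementedError` branch once the behaviour for the non-trivial schemas is agreed.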

Another thing to think about could be an optional batching mechanism that
would take a batch of records and send them as a list of JSON objects in a
single HTTP request.
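
Roughly what I have in mind, as a sketch only: the `batch_requests` name and the `max_batch_size` setting are assumptions of mine, not part of the PIP, and a real sink would probably also need a time-based flush.

```python
import json


def batch_requests(values, max_batch_size):
    """Yield one JSON-array request body (bytes) per batch of record values."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    batch = []
    for value in values:
        batch.append(value)
        if len(batch) == max_batch_size:
            # Batch is full: emit it as a single JSON list.
            yield json.dumps(batch).encode("utf-8")
            batch = []
    if batch:
        # Flush the final, partially filled batch.
        yield json.dumps(batch).encode("utf-8")
```

Each yielded body would then be sent in one HTTP request instead of one request per record.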

Best,
Alex

On Tue, Sep 20, 2022 at 2:16 PM Enrico Olivelli <eolive...@gmail.com> wrote:

> Christophe,
> very good initiative!
>
> I support it
> Some comments inline below
>
>
> Enrico
>
> On Mon, Sep 19, 2022 at 7:10 PM Christophe Bornet
> <bornet.ch...@gmail.com> wrote:
> >
> > Hi all,
> >
> > I have drafted PIP-208: HTTP Sink
> >
> > PIP link:
> > https://github.com/apache/pulsar/issues/17719
> >
> > Here's a copy of the contents of the GH issue for your reference:
> >
> > ### Motivation
> >
> > Currently, when you want to consume from Pulsar topics in applications
> > written in languages that don't have a Pulsar driver supported, you need
> > to run some type of proxy like the WebSocket Proxy or Pulsar Beam. In
> > production this needs additional effort to deploy, scale, load balance,
> > monitor, and so on...
> > Pulsar IO is a framework that deals with all these operational subjects
> > and can be leveraged to provide a way to push messages to external
> > systems using HTTP, a protocol supported by every existing language and OS.
> >
> > ### Goal
> >
> > This proposal defines an HTTP Sink that sends the messages to a
> > configured URL.
> > It takes inspiration from [Pulsar Beam](https://github.com/kafkaesque-io/pulsar-beam)
> > and the [Confluent HTTP Sink connector](https://docs.confluent.io/kafka-connectors/http/current/overview.html).
> >
> >
> > ### Implementation
> >
> > A `pulsar-io-http` module will be added to `pulsar-io`.
> > When the project is built, `pulsar-io-http-{version}.nar` will be produced
> > and added to the `pulsar-all` distribution.
> > The name of the Sink will be `http`.
> >
> > The HTTP Sink pushes records to any HTTP server with the record value in
> > the body of a POST method.
> > The body of the HTTP request is the JSON representation of the record
> > value.
>
> What do you mean?
> I think that this should depend on the Schema.
>
> BYTES SCHEMA -> I would push the raw message payload
> PRIMITIVE VALUES (long, integer, string) -> I would push the JSON
> representation
> JSON SCHEMA -> push the raw message payload
> AVRO -> ? convert to JSON?
> PROTOBUF -> ? convert to JSON?
> KEY-VALUE -> ?
>
> We probably need some flag to define the behaviour for the non-trivial
> cases.
>
>
> >
> > Some headers are added to the HTTP request:
> > * `PulsarTopic`: the topic of the record
> > * `PulsarKey`: the key of the record
> > * `PulsarEventTime`: the event time of the record
> > * `PulsarPublishTime`: the publish time of the record
> > * `PulsarMessageId`: the ID of the message contained in the record
> > * `PulsarProperties-*`: each record property is passed with the property
> > name prefixed by `PulsarProperties-`
> >
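
To illustrate the header list above, a small sketch of how the headers could be assembled per record. The header names come from the proposal; the `build_headers` function, its parameters, the string formatting of the timestamps and the choice to omit `PulsarKey` for key-less records are my assumptions.

```python
def build_headers(topic, key, event_time, publish_time, message_id, properties):
    """Build the Pulsar* HTTP headers listed in the proposal for one record."""
    headers = {
        "PulsarTopic": topic,
        "PulsarEventTime": str(event_time),
        "PulsarPublishTime": str(publish_time),
        "PulsarMessageId": message_id,
    }
    if key is not None:
        # Assumption: records without a key simply omit the header.
        headers["PulsarKey"] = key
    for name, value in properties.items():
        # Each record property is prefixed with "PulsarProperties-".
        headers[f"PulsarProperties-{name}"] = value
    return headers
```
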
>
> Can we make the "Content-Type" configurable?
> Can we make the HTTP method configurable?
>
>
> > ### Alternatives
> >
> > Creating a separate project for this Sink is rejected since:
> > * this Sink is very useful for developers to test the Pulsar IO
> > framework, transform functions, and to make demos.
> > * the code has a very small footprint with no external dependencies.
> > * it should be visible at the same level as other sinks
>
> 100% agreed!
>
> >
> > I'm looking forward to the discussion.
> >
> > Best regards,
> >
> > Christophe Bornet
>
