Re: New comer to the Apache Streampipes

Zike Yang Thu, 25 Aug 2022 08:45:31 -0700

Hi, Dominik,

Thanks for your reply.


> 2. The schema guess step is currently mandatory (in most cases where we 
> connect to machines, data is already available), but it would be nice to 
> support both - e.g., users could upload an example or just manually define 
> the schema. We can work on a concept for this if you want!

Got it. That's a good idea. I will investigate it and start a
discussion once I have an initial concept.

> I forgot to mention that we already have some e2e tests for third-party 
> components under [1]. These tests create a data sink and an adapter and check 
> that data sent by the sink is received by the adapter.

That's great. I can start by adding an integration test for the Pulsar component.

I have created a PR to refactor the Pulsar sink component: [0]. Please
review it and feel free to comment when you have time.
Thanks.

In addition, while writing this PR I found another point we can improve.
In the current implementation, the Pulsar producer publishes messages
synchronously, which prevents us from taking advantage of the
producer's batch sending. I think we can add an option that lets the
user choose whether to send synchronously or asynchronously.
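
To make the idea concrete, here is a minimal sketch of such an option.
It does not use the real Pulsar client: MessageSender, PulsarSinkSketch
and SendMode are hypothetical names, and the interface only stands in
for a producer with a blocking send call and a non-blocking sendAsync
call. Please read it as an illustration under those assumptions, not as
the actual sink code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for a producer that offers a blocking send
// and a non-blocking sendAsync (this is NOT the Pulsar client API itself).
interface MessageSender {
    CompletableFuture<Long> sendAsync(byte[] payload);

    // Blocking variant: waits for the broker acknowledgement.
    default long send(byte[] payload) {
        return sendAsync(payload).join();
    }
}

// Hypothetical sink wrapper; SendMode would be the new user-facing option.
class PulsarSinkSketch {
    enum SendMode { SYNC, ASYNC }

    private final MessageSender sender;
    private final SendMode mode;
    // Futures of async sends that have not been acknowledged yet.
    private final List<CompletableFuture<Long>> pending = new ArrayList<>();

    PulsarSinkSketch(MessageSender sender, SendMode mode) {
        this.sender = sender;
        this.mode = mode;
    }

    // Called once per incoming event.
    void onEvent(byte[] payload) {
        if (mode == SendMode.SYNC) {
            // Blocks per message, so the producer never gets to build a batch.
            sender.send(payload);
        } else {
            // Drop futures that are already done, then hand off without waiting.
            pending.removeIf(CompletableFuture::isDone);
            pending.add(sender.sendAsync(payload));
        }
    }

    // Called when the pipeline stops: wait for outstanding async sends.
    void flush() {
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        pending.clear();
    }

    int pendingCount() {
        return pending.size();
    }
}
```

In asynchronous mode the sink would have to flush before the producer
is closed, otherwise messages that are still in flight could be lost.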

[0] https://github.com/apache/incubator-streampipes/pull/107

Thanks,
Zike Yang

On Thu, Aug 25, 2022 at 10:41 PM Dominik Riemer
<[email protected]> wrote:
>
> Hi Zike,
>
> just one addition concerning tests:
> I forgot to mention that we already have some e2e tests for third-party 
> components under [1]. These tests create a data sink and an adapter and check 
> that data sent by the sink is received by the adapter.
> We would just need to add Pulsar to the validation docker compose file in the 
> project root which is used for running the e2e tests.
>
> Cheers
> Dominik
>
> [1] 
> https://github.com/apache/incubator-streampipes/tree/dev/ui/cypress/tests/thirdparty
>
>
> -----Original Message-----
> From: Zike Yang <[email protected]>
> Sent: Wednesday, August 24, 2022 4:53 PM
> To: [email protected]
> Subject: Re: New comer to the Apache Streampipes
>
> Hi, Dominik
>
> Thanks for your feedback and your helpful information. Today I set up my data 
> sinks development environment, and it worked fine. Next, I will start working 
> on refactoring the old three-class implementation.
>
> However, I have some questions:
> 1. I see that other data sink modules also use the old three-class 
> implementation like the kafka data sink or rabbitmq data sink. We will 
> refactor them too, right?
> 2. When I tried to create my adapter, I found that it must guess the schema 
> before creating the adapter. It's mandatory, and it only works when 
> there is some data in the data source. Why do we need to guess the schema 
> when creating the adapter? Can we make it optional? I think that in many 
> cases the data source does not have pre-stored data when the adapter is 
> created.
> 3. How could I write tests to verify my code changes? For my current task, I 
> think I could write some unit tests by mocking the Pulsar client. What do 
> you think?
>
> Thanks for your kind guidance. I am excited to contribute to this project.
>
> Thanks,
> Zike Yang
>
> On Wed, Aug 24, 2022 at 4:20 AM Dominik Riemer <[email protected]> 
> wrote:
> >
> > Hi Zike,
> >
> > great to hear that you want to contribute!
> >
> > A good start can be to set up your development environment and to improve 
> > some connectors or sinks. Improving the pulsar components would be great as 
> > these are some rather basic implementations we did at ApacheCon.
> >
> > For development, the best setup is to use the CLI tool [1] which can be 
> > configured for different development targets. E.g., "dev" mode can be 
> > used to develop core services, UI and extensions so that only other 
> > mandatory services are started in Docker, while "pipeline-element" mode 
> > starts also the core and UI in Docker which is useful in case you are only 
> > developing pipeline elements. There are a few environment variables which 
> > might need to be set and I'm happy to help with that.
> >
> > If you are interested in improving the Pulsar components, a good start 
> > could be to take the Pulsar sink [2] and refactor the old three-class 
> > implementation used there to the one-class-implementation described in the 
> > documentation [3]. Other cool things would be to upgrade the Pulsar version 
> > to the latest version and it would be also great to support more advanced 
> > options, e.g., authentication or topic discovery so that users see a list 
> > of available topics, or whatever you think would be a good configuration 
> > option for Pulsar users. Similar things would also be great for the 
> > connector.
> >
> > I'm happy to guide you through the first steps, feel free to ask here or 
> > join the ASF StreamPipes Slack channel for quick questions!
> >
> > Cheers
> > Dominik
> >
> > [1]
> > https://github.com/apache/incubator-streampipes/tree/dev/installer/cli
> > [2]
> > https://github.com/apache/incubator-streampipes/tree/dev/streampipes-extensions/streampipes-sinks-brokers-jvm/src/main/java/org/apache/streampipes/sinks/brokers/jvm/pulsar
> > [3]
> > https://streampipes.apache.org/docs/docs/extend-tutorial-data-sinks.html
> >
> > -----Original Message-----
> > From: Zike Yang <[email protected]>
> > Sent: Tuesday, August 23, 2022 5:50 PM
> > To: [email protected]
> > Subject: New comer to the Apache Streampipes
> >
> > Hi, Apache Streampipes community
> >
> > I am a software engineer and a committer from the Apache Pulsar community.
> >
> > I recently learned about this project. I read some documentation, tried 
> > it out myself, and found that it's an excellent project. I'm interested 
> > in it and would like to contribute.
> >
> > Currently, I'm familiar with Apache Pulsar. I see that this project 
> > also uses Apache Pulsar as a data source connector and sink. I think I can 
> > get started by contributing to those modules.
> >
> > Are there other learning resources that can help me deepen my understanding 
> > of this project?
> > Is there anything else I can start working on?
> > I would very much appreciate it if you could provide me with more information.
> >
> > Thanks,
> > Zike Yang
