RE: New comer to the Apache Streampipes

Dominik Riemer Thu, 25 Aug 2022 12:36:00 -0700

Hi Zike,

thanks for this very cool first PR! I've already merged it into dev and also 
added it to the 0.70.0 release branch.
Adding an option for synchronous and asynchronous publishing is a cool idea!


Cheers
Dominik 


-----Original Message-----
From: Zike Yang <[email protected]> 
Sent: Thursday, August 25, 2022 5:45 PM
To: [email protected]
Subject: Re: New comer to the Apache Streampipes

Hi, Dominik,

Thanks for your reply.

> 2. The schema guess step is currently mandatory (in most cases where we 
> connect to machines, data is already available), but it would be nice to 
> support both - e.g., users could upload an example or just manually define 
> the schema. We can work on a concept for this if you want!

Got it. That's a good idea. I will look into it and start a discussion once I 
have an initial concept.

> I forgot to mention that we already have some e2e tests for third-party 
> components under [1]. These tests create a data sink and an adapter and check 
> that data sent by the sink is received by the adapter.

That's great. I can start adding an integration test for the Pulsar component.

I have created a PR to refactor the Pulsar sink component [0]. Please review it 
and feel free to comment on it when you have time.
Thanks.

In addition, while writing this PR I found another point we can improve.
In the current implementation, the Pulsar producer publishes messages 
synchronously, which prevents us from taking advantage of the Pulsar producer's 
batch sending. I think we can add an option that lets the user choose whether 
to send synchronously or asynchronously.
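
A minimal sketch of how such an option could behave, under stated assumptions: 
`PublishMode`, `MessageProducer`, `FakeBatchingProducer`, and `SinkPublisher` are 
hypothetical names, not the code in [0] and not the Pulsar client API. The real 
Pulsar Java `Producer` does offer a blocking `send`, a non-blocking `sendAsync`, 
and `flush`; the in-memory fake below only models the batching effect, where a 
blocking send pushes each message out on its own while asynchronous sends can 
fill a batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Entry point first so the single-file launcher (java PublishModeDemo.java) picks it up.
class PublishModeDemo {
    // Publishes `events` messages in the given mode and returns the recorded batch sizes.
    static List<Integer> run(PublishMode mode, int events, int maxBatchSize) {
        FakeBatchingProducer producer = new FakeBatchingProducer(maxBatchSize);
        SinkPublisher publisher = new SinkPublisher(producer, mode);
        for (int i = 0; i < events; i++) {
            publisher.publish(("event-" + i).getBytes());
        }
        publisher.close(); // flush whatever is still pending
        return producer.batchSizes;
    }

    public static void main(String[] args) {
        System.out.println("SYNC batches:  " + run(PublishMode.SYNC, 10, 4));
        System.out.println("ASYNC batches: " + run(PublishMode.ASYNC, 10, 4));
    }
}

// Hypothetical user-facing option for the sink.
enum PublishMode { SYNC, ASYNC }

// Hypothetical, simplified producer interface (NOT the Pulsar client API).
interface MessageProducer {
    void send(byte[] payload);                         // blocks until acknowledged
    CompletableFuture<Void> sendAsync(byte[] payload); // returns immediately
    void flush();                                      // pushes out any pending batch
}

// In-memory fake: groups messages into batches and records each batch size.
class FakeBatchingProducer implements MessageProducer {
    private final int maxBatchSize;
    private final List<CompletableFuture<Void>> pending = new ArrayList<>();
    final List<Integer> batchSizes = new ArrayList<>();

    FakeBatchingProducer(int maxBatchSize) { this.maxBatchSize = maxBatchSize; }

    @Override
    public CompletableFuture<Void> sendAsync(byte[] payload) {
        CompletableFuture<Void> ack = new CompletableFuture<>();
        pending.add(ack);
        if (pending.size() >= maxBatchSize) {
            flush(); // batch is full, ship it
        }
        return ack;
    }

    @Override
    public void send(byte[] payload) {
        CompletableFuture<Void> ack = sendAsync(payload);
        flush();    // the caller is blocked, so no further messages can join this batch
        ack.join(); // wait for the acknowledgement
    }

    @Override
    public void flush() {
        if (pending.isEmpty()) {
            return;
        }
        batchSizes.add(pending.size());
        List<CompletableFuture<Void>> batch = new ArrayList<>(pending);
        pending.clear();
        batch.forEach(ack -> ack.complete(null)); // acknowledge the whole batch at once
    }
}

// Hypothetical sink-side publisher that honours the chosen mode.
class SinkPublisher {
    private final MessageProducer producer;
    private final PublishMode mode;

    SinkPublisher(MessageProducer producer, PublishMode mode) {
        this.producer = producer;
        this.mode = mode;
    }

    void publish(byte[] event) {
        if (mode == PublishMode.SYNC) {
            producer.send(event);
        } else {
            // Do not wait; failures surface on the future and are only logged here.
            producer.sendAsync(event).exceptionally(ex -> {
                System.err.println("publish failed: " + ex);
                return null;
            });
        }
    }

    void close() { producer.flush(); }
}
```

With 10 events and a batch size of 4, the SYNC run records ten batches of one 
message each, while the ASYNC run records [4, 4, 2]. The trade-off is that in 
asynchronous mode errors no longer show up at the call site, so the sink needs 
a callback for failures and has to flush on close.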

[0] https://github.com/apache/incubator-streampipes/pull/107

Thanks,
Zike Yang

On Thu, Aug 25, 2022 at 10:41 PM Dominik Riemer <[email protected]> wrote:
>
> Hi Zike,
>
> just one addition concerning tests:
> I forgot to mention that we already have some e2e tests for third-party 
> components under [1]. These tests create a data sink and an adapter and check 
> that data sent by the sink is received by the adapter.
> We would just need to add Pulsar to the validation Docker Compose file in the 
> project root, which is used for running the e2e tests.
>
> Cheers
> Dominik
>
> [1] https://github.com/apache/incubator-streampipes/tree/dev/ui/cypress/tests/thirdparty
>
>
> -----Original Message-----
> From: Zike Yang <[email protected]>
> Sent: Wednesday, August 24, 2022 4:53 PM
> To: [email protected]
> Subject: Re: New comer to the Apache Streampipes
>
> Hi, Dominik
>
> Thanks for your feedback and the helpful information. Today I set up my 
> development environment for data sinks, and it worked fine. Next, I will start 
> refactoring the old three-class implementation.
>
> I also have some questions:
> 1. I see that other data sink modules, such as the Kafka and RabbitMQ data 
> sinks, also use the old three-class implementation. Will we refactor them too?
> 2. When I tried to create an adapter, I found that the schema must be guessed 
> before the adapter can be created. This step is mandatory, and it only works 
> when there is already some data in the data source. Why do we need to guess 
> the schema when creating the adapter? Can we make it optional? I think that in 
> many cases the data source does not have any pre-stored data when the adapter 
> is created.
> 3. How can I write tests to verify my code changes? For my current task, I 
> think I could write some unit tests that mock the Pulsar client. What do you 
> think?
>
> Thanks for your kind guidance. I am excited to contribute to this project.
>
> Thanks,
> Zike Yang
>
> On Wed, Aug 24, 2022 at 4:20 AM Dominik Riemer <[email protected]> wrote:
> >
> > Hi Zike,
> >
> > great to hear that you want to contribute!
> >
> > A good start would be to set up your development environment and improve 
> > some connectors or sinks. Improving the Pulsar components would be great, as 
> > these are rather basic implementations we did at ApacheCon.
> >
> > For development, the best setup is to use the CLI tool [1], which can be 
> > configured for different development targets. For example, "dev" mode can 
> > be used to develop the core services, UI, and extensions, so that only the 
> > other mandatory services are started in Docker, while "pipeline-element" 
> > mode also starts the core and UI in Docker, which is useful if you are only 
> > developing pipeline elements. You might need to set a few environment 
> > variables, and I'm happy to help with that.
> >
> > If you are interested in improving the Pulsar components, a good start 
> > could be to take the Pulsar sink [2] and refactor the old three-class 
> > implementation used there into the one-class implementation described in the 
> > documentation [3]. Other cool things would be to upgrade Pulsar to the 
> > latest version and to support more advanced options, e.g., authentication, 
> > topic discovery so that users see a list of available topics, or whatever 
> > you think would be a good configuration option for Pulsar users. Similar 
> > improvements would also be great for the connector.
> >
> > I'm happy to guide you through the first steps, feel free to ask here or 
> > join the ASF StreamPipes Slack channel for quick questions!
> >
> > Cheers
> > Dominik
> >
> > [1] https://github.com/apache/incubator-streampipes/tree/dev/installer/cli
> > [2] https://github.com/apache/incubator-streampipes/tree/dev/streampipes-extensions/streampipes-sinks-brokers-jvm/src/main/java/org/apache/streampipes/sinks/brokers/jvm/pulsar
> > [3] https://streampipes.apache.org/docs/docs/extend-tutorial-data-sinks.html
> >
> > -----Original Message-----
> > From: Zike Yang <[email protected]>
> > Sent: Tuesday, August 23, 2022 5:50 PM
> > To: [email protected]
> > Subject: New comer to the Apache Streampipes
> >
> > Hi Apache StreamPipes community,
> >
> > I am a software engineer and a committer from the Apache Pulsar community.
> >
> > I recently learned about this project. I read some of the documentation, 
> > tried it out myself, and found that it's an excellent project. I'm very 
> > interested in it and would like to contribute.
> >
> > I'm familiar with Apache Pulsar, and I see that this project also uses 
> > Apache Pulsar as a data source connector and sink. I think I can get started 
> > by contributing to those modules.
> >
> > Are there other learning resources that can help me deepen my understanding 
> > of this project?
> > Is there anything else I can start working on?
> > I would really appreciate it if you could provide me with more information.
> >
> > Thanks,
> > Zike Yang
