sharanf commented on pull request #8: URL: https://github.com/apache/kibble/pull/8#issuecomment-851610895
> @kaxil @sharanf @Humbedooh @michalslowikowski00 happy to get your opinion.
>
> As suggested on the dev list, I introduced the concepts of `DataSource` and `DataType`. For now these can be configured in a YAML configuration file:
>
> ```yaml
> data_sources:
>   - name: github_kibble
>     class: kibble.data_sources.github.GithubDataSource
>     config:
>       repo_owner: apache
>       repo_name: kibble
>       enabled_data_types:
>         - pr_issues
> ```
>
> This form allows users to specify any external data source as long as the class path points to an importable object.
>
> The role of `DataSource` is to provide authentication methods for the external service it represents. A `DataType` represents a single type of information we can get from this source; in this PR those are GitHub issues (which also include PRs). The role of `DataType` is to define:
>
> * how to process the raw data from the external source and how to persist it into the database (to be done)
> * how to read the data from the database, including aggregation, filters etc.
>
> In general this is the rough idea I have in mind:
> ![Kibble-2](https://user-images.githubusercontent.com/9528307/120119311-4eba8080-c197-11eb-9e81-acce6e650c10.png)

@turbaszek Thanks for working on this. My initial thought is that this looks a lot more granular than what we have in place now, which is good, as we have sometimes missed being able to get to the right level of granularity. For GitHub the data types seem fairly organised and can pretty much already be allocated. How do you see this working for, say, our project mailing lists? Would each list then be a data source and the conversations the data type?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
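
To make the "class path points to an importable object" idea from the quoted configuration concrete, here is a rough sketch of how such an entry could be resolved. It is not taken from the PR: `import_from_path` and `build_data_source` are hypothetical helper names, passing `config` as keyword arguments is an assumption, and `types.SimpleNamespace` stands in for `kibble.data_sources.github.GithubDataSource` so the snippet runs without Kibble installed.

```python
import importlib
from typing import Any


def import_from_path(class_path: str) -> Any:
    """Resolve a dotted path such as 'package.module.ClassName' to the object it names."""
    module_path, _, attr_name = class_path.rpartition(".")
    if not module_path:
        raise ImportError(f"{class_path!r} is not a dotted path")
    module = importlib.import_module(module_path)
    try:
        return getattr(module, attr_name)
    except AttributeError as err:
        raise ImportError(f"{module_path!r} has no attribute {attr_name!r}") from err


def build_data_source(entry: dict) -> Any:
    """Instantiate the configured class, passing `config` as keyword arguments (an assumption)."""
    cls = import_from_path(entry["class"])
    return cls(**entry.get("config", {}))


# Stand-in entry mirroring the YAML above; the class is a stdlib placeholder,
# not the real GithubDataSource.
entry = {
    "name": "github_kibble",
    "class": "types.SimpleNamespace",
    "config": {
        "repo_owner": "apache",
        "repo_name": "kibble",
        "enabled_data_types": ["pr_issues"],
    },
}

source = build_data_source(entry)
print(source.repo_owner, source.repo_name)
```

Under this scheme a mailing-list source would only need its own importable class and a matching `config` block. Whether each list maps to one data source is the open question raised above.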