Hi Daniel
+1
I am extremely happy to see this thread about rebooting Kibble :-)
I think anything that makes it easier to contribute is a good thing. We
have been talking about making the visualisations better (and more
modern!), so adding the ability to improve them and make them more
flexible is great. They say a picture is worth a thousand words, so
while discussion is good, perhaps an architecture diagram similar to
what our friends at DevLake
<https://devlake.apache.org/docs/Overview/Architecture> have done
would help.
The data part did need some re-organisation, and I remember the
discussion we had previously <https://github.com/apache/kibble/pull/8>
about trying to separate the sources and the types, so I am happy to
see that tidying this up is still part of the roadmap.
I like the idea of not abandoning mailing lists, Subversion and other
data sources in favour of only Git and GitHub, as there are many
projects out there that still use them.
The mono-repo idea is good; we already tried keeping two repos, so
let's see if one works better.
The only other thing I would bring up now is the possibility of making
a release of Kibble-1, to give people coming along something to look
at, work with and download. I will start a new thread for that. If
necessary, I'd be willing to have a go at organising it (with any help
I can get!) :-)
Thanks
Sharan
On 2022-09-11 18:36, Daniel Gruno wrote:
Hi folks,
a while back we attempted a complete redesign of the Kibble platform,
which unfortunately fizzled out. I'd like to restart this process, but
perhaps simplify and condense our goals a bit, so as to lower the bar
for participation and implementation.
I'd like to propose we divide Kibble into three components:
- A management service that purely exists to manage sources, data
access, and delegate jobs
- A scanner service that uses the sources/jobs from the above and
gathers data points
- An optional visualization service that can latch onto the database
and visualize the data gathered by the scanners. It would be optional
in the sense that the base server and scanners would work independently
of it, and any other visualization platform (Jupyter, perhaps DevLake
or Kibana) could be used instead of the default option.
For the data part of it all, I'd also like to propose we split source
types from data types. That is to say, a source type could be a Git
repo, a GitHub repo, a Subversion repo, a Pony Mail list, a JIRA
instance, etc., whereas a data type could be a commit, an email, a
ticket, etc. Data types could have one or more associated sources, and
each of these sources would have its own way of obtaining the required
data. Thus, an issue scanner could essentially be source-type-agnostic,
as the data type plugin itself would supply the API for grabbing a
pre-determined base set of common data for that data type. As an
example, a ticket scanner could work off GitHub Issues, JIRA and
Bugzilla alike. The scanner module would not need to know how to handle
these individual calls, as the data type module would abstract that
part and provide a standardized interface. This would also mean we
could expand into Subversion/Mercurial/etc. territory by abstracting
the repository calls and providing a unified way of interacting with
commits of any repository type.
Provided this is agreeable, I am willing to spend both time and
resources on this (both my own and my company's).
WDYT?
With regards,
Daniel.
PS: I'd propose we start off with a mono-repo strategy, to ease the
deployment and release workflow. If we later feel that a split into
server/client is better, then we can do that.