On 12/03/2017 01:02 PM, Daniel Gruno wrote:
> On 12/02/2017 10:41 PM, Steve Blackmon wrote:
>> Sorry about that! Here's a link to the notebook that doesn't require
>> registration.
>>
>> https://www.zepl.com/viewer/notebooks/bm90ZTovL3N0ZXZlYmxhY2ttb24vYXBhY2hlLXplcHBlbGluLWRhc2hib2FyZC84YjQ5YmY3MWIxYTU0ZTE2YjlkMDQyMTliMzNlMjQzYS9ub3RlLmpzb24
>>
>> In this notebook we used the %spark interpreter to collect the data, but
>> most of the work is done as Scala in the driver process. The Streams code
>> base is Java and does not depend on Spark or any other framework external
>> to the jar file.
>>
>> The easiest integration I can think of, given the Python/Java language
>> gap, would use Docker: Streams could prepare a Docker container packaged
>> with all the necessary code, and Kibble installations could use it to run
>> ad hoc or scheduled data processes. The collected data could be written
>> as newline-delimited JSON on container-mounted volumes, or directly to an
>> Elasticsearch index.
>>
>> Docker isn't strictly necessary, though: if the system where Kibble runs
>> has a JRE configured and a Streams distribution available locally, that
>> would work too.
>
> Right, but that is probably the easiest entry point for people who just
> "want to get things done" :). I could also imagine us setting up a remote
> service that handles this via an HTTP API as an alternative solution, akin
> to how you would use the GitHub API; that is to say, we'd have a VM you
> could query, with all the Java in place for speedy access to these sorts
> of things. Either or both would work for me, and if Streams is willing to
> sort out the actual data gathering, we could have this put into ES quickly
> and get started on using the data gathered.
>
> I'll have to ponder how we're going to present this, and which charts
> would be most informative here. There is a lot of potential.
>
> If Streams can provide us with a "run this" sort of container that can
> spit out JSON, that would be awesome.
> While writing to ES directly might be easier, there's the use case where
> ES is not local to the system (Kibble is intended to support both local ES
> and remote-via-JSON-API setups), so JSON output might be the best choice
> for now.
>
> With regards,
> Daniel.
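To make that JSON handoff concrete, here is a minimal sketch of the Kibble-side step: turning a newline-delimited JSON feed into an Elasticsearch `_bulk` request body. This is not existing Kibble or Streams code; the index name and the document fields are invented purely for illustration.

```python
import json

def ndjson_to_bulk(ndjson_text, index="kibble-social"):
    """Build an Elasticsearch _bulk request body from newline-delimited JSON.

    Each input line becomes an action/metadata line followed by the
    document itself, which is the pairing the _bulk endpoint expects.
    """
    lines = []
    for raw in ndjson_text.splitlines():
        if not raw.strip():
            continue  # tolerate blank lines in the feed
        doc = json.loads(raw)
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

# A two-record feed, as a container might write it to a mounted volume.
feed = ('{"verb": "share", "actor": "twitter:jane_doe"}\n'
        '{"verb": "like", "actor": "twitter:john_roe"}\n')
print(ndjson_to_bulk(feed))
```

The returned body could then be POSTed to a cluster's `/_bulk` endpoint with `Content-Type: application/x-ndjson`, regardless of whether that cluster is local or reached over the network, which keeps the container's output format identical in both deployment modes.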
A request: could we get this JSON output as a single document per
repost/like? That is to say, every time Jane Doe does a retweet etc. of one
of our tweets, that should be one document with the various data fields.
This would allow for some interesting mappings instead of just bar charts :)

>>
>> Steve
>>
>> On Dec 2, 2017 at 2:10 PM, Daniel Gruno <humbed...@apache.org> wrote:
>>
>> On 12/02/2017 09:07 PM, Steve Blackmon wrote:
>>
>> Hi Kibble Team,
>>
>> I've been checking out the code and the demo site this weekend.
>> I'm interested in joining the team and integrating some of the data
>> sources maintained at http://streams.apache.org
>>
>> Specifically, activity streams from the social media presences of
>> projects and contributors (who opt in), as well as statistics derived
>> from them, could make a nice addition to Kibble.
>>
>> Here's an example: an analysis of the Twitter accounts of Apache
>> projects using Streams and Zeppelin:
>> https://www.zepl.com/UvGWgAZb7/spaces/Sb9ElZuDD/8b49bf71b1a54e16b9d04219b33e243a
>>
>> Cheers,
>>
>> Steve Blackmon
>> sblack...@apache.org
>>
>>
>> Hi Steve,
>> I like the idea, but I am unable to see the link you shared; it shows a
>> 404 for me :(. Having said that, looking into the social media space is
>> definitely something worth doing!
>>
>> With regards,
>> Daniel.
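For what it's worth, the "one document per repost/like" shape requested at the top of this message might look something like the sketch below. Every field name here is hypothetical, chosen only to show the granularity; this is not a proposed or agreed schema.

```json
{
  "type": "retweet",
  "actor": "twitter:jane_doe",
  "original_author": "twitter:SomeApacheProject",
  "tweet_id": "123456789",
  "published": "2017-12-03T13:02:00Z"
}
```

One such document per event would let Kibble map individual actors, timestamps, and targets, rather than only rendering pre-aggregated counts.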