Re: Hello from Apache Streams

Steve Blackmon Sun, 03 Dec 2017 08:53:38 -0800

if Streams is willing to sort out the actual data gathering, we could have
this put into ES quickly and get started on using the data gathered.

I’ll re-run that notebook and share all of the raw historical data as a
zip file that the ASF deployment of Kibble could incorporate to aid
development of the front-end.

A request: Could we get this JSON output as a single document per
retweet/like? That is to say, every time Jane Doe retweets, likes, etc.
one of our tweets, that should be one document with the various data
fields. This would allow for some interesting mappings instead of just
bar charts :)
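
A minimal sketch of what "one document per retweet" could look like, in
Python since that is the Kibble side of the language gap. Every field name
here (`id`, `account`, `retweets`, `user`, `created_at`) is a hypothetical
placeholder, not the actual Streams output format, and the sample data is
made up.

```python
import json


def explode_retweets(tweet):
    """Yield one flat document per retweet of `tweet`.

    All field names are hypothetical placeholders for illustration.
    """
    for retweet in tweet.get("retweets", []):
        yield {
            "type": "retweet",
            "tweet_id": tweet["id"],
            "account": tweet["account"],
            "actor": retweet["user"],
            "created_at": retweet["created_at"],
        }


# Made-up sample input: one tweet that was retweeted twice.
sample_tweet = {
    "id": "1",
    "account": "example_project",
    "retweets": [
        {"user": "janedoe", "created_at": "2017-12-01T10:00:00Z"},
        {"user": "johndoe", "created_at": "2017-12-02T11:30:00Z"},
    ],
}

documents = list(explode_retweets(sample_tweet))
for document in documents:
    print(json.dumps(document, sort_keys=True))
```

With one document per interaction, each record carries its own actor and
timestamp, so it can be aggregated by person or by time rather than only
counted per tweet.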

It’s not currently possible using the REST API to get a list of everyone
who liked a tweet, but it is possible for retweets. I created STREAMS-550
to enable that capability. I also created STREAMS-551 to get all the needed
pieces into a container.

I'll have to ponder how we're going to present this, and which charts
would be most informative here. There is a lot of potential here.

Much more than just bar charts for sure.

Steve

On Dec 3, 2017 at 6:17 AM, Daniel Gruno <[email protected]> wrote:

On 12/03/2017 01:02 PM, Daniel Gruno wrote:

On 12/02/2017 10:41 PM, Steve Blackmon wrote:

Sorry about that! Here’s a link to the notebook that doesn’t require
registration.

https://www.zepl.com/viewer/notebooks/bm90ZTovL3N0ZXZlYmxhY2ttb24vYXBhY2hlLXplcHBlbGluLWRhc2hib2FyZC84YjQ5YmY3MWIxYTU0ZTE2YjlkMDQyMTliMzNlMjQzYS9ub3RlLmpzb24

In this notebook we used the %spark interpreter to collect the data, but
most of the work is done as Scala in the driver process. The Streams code
base is Java and not dependent on Spark or other frameworks external to the
jar file.

The easiest integration I can think of given the Python/Java language gap
would use Docker - Streams could prepare a Docker container packaged with
all the necessary code, and Kibble installations could use it to run ad-hoc
or scheduled data processes. The data collected could be written as
newline-delimited JSON on container-mounted volumes, or directly to an
Elasticsearch index.
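
As a rough illustration of the newline-delimited JSON hand-off, here is a
small Python sketch using only the standard library. The helper names
`write_ndjson` and `read_ndjson` are made up for this example, and an
in-memory buffer stands in for a file on a mounted volume.

```python
import io
import json


def write_ndjson(documents, handle):
    """Write each document as one JSON object per line."""
    for document in documents:
        handle.write(json.dumps(document) + "\n")


def read_ndjson(handle):
    """Yield one parsed document per non-empty line."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


# An in-memory buffer stands in for a file on a container-mounted volume.
buffer = io.StringIO()
write_ndjson([{"actor": "janedoe"}, {"actor": "johndoe"}], buffer)
buffer.seek(0)
loaded = list(read_ndjson(buffer))
print(loaded)
```

Because every line is a complete JSON object, the reader can process the
file as a stream without loading it all at once.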

Docker’s not really necessary though: if the system where Kibble’s running
has a JRE configured and a local Streams distribution, that could work too.

Right, but probably the easiest entry point for people just "wanting to
get things done" :). I could also imagine us setting up a remote service
that could handle this via HTTP API as an alternate solution, akin to
how you would use a GitHub API - that is to say, we'd have a VM that you
could query and it'd have all the Java in place for speedy access to
this sort of thing. Either or both would work for me, and if Streams
is willing to sort out the actual data gathering, we could have this put
into ES quickly and get started on using the data gathered.

I'll have to ponder how we're going to present this, and which charts
would be most informative here. There is a lot of potential here.

If Streams can provide us with a "run this" sort of container that can
spit out JSON, that would be awesome. While ES directly might be easier,
there's the use-case scenario where ES is not local to the system
(Kibble is intended to support both local ES and remote-via-json-api
systems), so a JSON output might be the best for now.
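
One reason plain JSON output keeps both options open: newline-delimited
JSON can be turned into an Elasticsearch bulk request body later, wherever
ES happens to live. The sketch below only builds that body (an action line
followed by the document, one pair per record) and does not send it. The
function name `to_bulk_body` and the index name `kibble_social` are
hypothetical, and depending on the Elasticsearch version the action line
may also need a document type.

```python
import json


def to_bulk_body(ndjson_lines, index_name):
    """Build an Elasticsearch bulk request body from NDJSON lines."""
    action = json.dumps({"index": {"_index": index_name}})
    parts = []
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        json.loads(line)  # fail early on malformed input
        parts.append(action)
        parts.append(line)
    # The bulk format requires a trailing newline after the last line.
    return ("\n".join(parts) + "\n") if parts else ""


# Hypothetical index name and made-up documents.
body = to_bulk_body(
    ['{"actor": "janedoe"}', "", '{"actor": "johndoe"}'],
    "kibble_social",
)
print(body, end="")
```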

With regards,
Daniel.

Steve

On Dec 2, 2017 at 2:10 PM, Daniel Gruno <[email protected]> wrote:

On 12/02/2017 09:07 PM, Steve Blackmon wrote:

Hi Kibble Team,

I've been checking out the code and the demo site this weekend.

I'm interested in joining the team and integrating some of the data
sources maintained in http://streams.apache.org

Specifically, activity streams from the social media presences of
projects and contributors (who opt in) as well as statistics derived
from them could make a nice addition to Kibble.

Here's an example: analysis of Twitter accounts of Apache projects
using Streams and Zeppelin:
https://www.zepl.com/UvGWgAZb7/spaces/Sb9ElZuDD/8b49bf71b1a54e16b9d04219b33e243a

Cheers,

Steve Blackmon
[email protected]

Hi Steve,
I like the idea, but I am unable to see the link you shared; it shows a
404 for me :(. Having said that, looking into the social media space is
definitely something worth doing!

With regards,
Daniel.
