On 12/03/2017 01:02 PM, Daniel Gruno wrote:
> On 12/02/2017 10:41 PM, Steve Blackmon wrote:
>> Sorry about that! Here's a link to the notebook that doesn't require
>> registration.
>>
>> https://www.zepl.com/viewer/notebooks/bm90ZTovL3N0ZXZlYmxhY2ttb24vYXBhY2hlLXplcHBlbGluLWRhc2hib2FyZC84YjQ5YmY3MWIxYTU0ZTE2YjlkMDQyMTliMzNlMjQzYS9ub3RlLmpzb24
>>
>> In this notebook we used the %spark interpreter to collect the data, but
>> most of the work is done as Scala in the driver process. The Streams code
>> base is Java and does not depend on Spark or any other framework external
>> to the jar file.
>>
>> The easiest integration I can think of, given the Python/Java language
>> gap, would use Docker: Streams could prepare a Docker container packaged
>> with all the necessary code, and Kibble installations could use it to run
>> ad hoc or scheduled data processes. The collected data could be written
>> as newline-delimited JSON on container-mounted volumes, or directly to an
>> Elasticsearch index.
>>
>> Docker isn't strictly necessary, though: if the system where Kibble runs
>> has a JRE configured and a Streams distribution available locally, that
>> would work too.
>
> Right, but that is probably the easiest entry point for people who just
> "want to get things done" :). I could also imagine us setting up a remote
> service that handles this via an HTTP API as an alternative solution, akin
> to how you would use the GitHub API; that is to say, we'd have a VM you
> could query, with all the Java in place for speedy access to these sorts
> of things. Either or both would work for me, and if Streams is willing to
> sort out the actual data gathering, we could have this put into ES quickly
> and get started on using the data gathered.
>
> I'll have to ponder how we're going to present this, and which charts
> would be most informative here. There is a lot of potential.
>
> If Streams can provide us with a "run this" sort of container that can
> spit out JSON, that would be awesome.
> While writing to ES directly might be easier, there's the use case where
> ES is not local to the system (Kibble is intended to support both local ES
> and remote-via-JSON-API setups), so JSON output might be the best choice
> for now.
>
> With regards,
> Daniel.
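To make that JSON handoff concrete, here is a minimal sketch of the Kibble-side step: turning a newline-delimited JSON feed into an Elasticsearch `_bulk` request body. This is not existing Kibble or Streams code; the index name and the document fields are invented purely for illustration.

```python
import json

def ndjson_to_bulk(ndjson_text, index="kibble-social"):
    """Build an Elasticsearch _bulk request body from newline-delimited JSON.

    Each input line becomes an action/metadata line followed by the
    document itself, which is the pairing the _bulk endpoint expects.
    """
    lines = []
    for raw in ndjson_text.splitlines():
        if not raw.strip():
            continue  # tolerate blank lines in the feed
        doc = json.loads(raw)
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

# A two-record feed, as a container might write it to a mounted volume.
feed = ('{"verb": "share", "actor": "twitter:jane_doe"}\n'
        '{"verb": "like", "actor": "twitter:john_roe"}\n')
print(ndjson_to_bulk(feed))
```

The returned body could then be POSTed to a cluster's `/_bulk` endpoint with `Content-Type: application/x-ndjson`, regardless of whether that cluster is local or reached over the network, which keeps the container's output format identical in both deployment modes.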
A request: could we get this JSON output as a single document per
repost/like? That is to say, every time Jane Doe does a retweet etc. of one
of our tweets, that should be one document with the various data fields.
This would allow for some interesting mappings instead of just bar charts :)

>>
>> Steve
>>
>> On Dec 2, 2017 at 2:10 PM, Daniel Gruno <humbed...@apache.org> wrote:
>>
>> On 12/02/2017 09:07 PM, Steve Blackmon wrote:
>>
>> Hi Kibble Team,
>>
>> I've been checking out the code and the demo site this weekend.
>> I'm interested in joining the team and integrating some of the data
>> sources maintained at http://streams.apache.org
>>
>> Specifically, activity streams from the social media presences of
>> projects and contributors (who opt in), as well as statistics derived
>> from them, could make a nice addition to Kibble.
>>
>> Here's an example: an analysis of the Twitter accounts of Apache
>> projects using Streams and Zeppelin:
>> https://www.zepl.com/UvGWgAZb7/spaces/Sb9ElZuDD/8b49bf71b1a54e16b9d04219b33e243a
>>
>> Cheers,
>>
>> Steve Blackmon
>> sblack...@apache.org
>>
>>
>> Hi Steve,
>> I like the idea, but I am unable to see the link you shared; it shows a
>> 404 for me :(. Having said that, looking into the social media space is
>> definitely something worth doing!
>>
>> With regards,
>> Daniel.
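For what it's worth, the "one document per repost/like" shape requested at the top of this message might look something like the sketch below. Every field name here is hypothetical, chosen only to show the granularity; this is not a proposed or agreed schema.

```json
{
  "type": "retweet",
  "actor": "twitter:jane_doe",
  "original_author": "twitter:SomeApacheProject",
  "tweet_id": "123456789",
  "published": "2017-12-03T13:02:00Z"
}
```

One such document per event would let Kibble map individual actors, timestamps, and targets, rather than only rendering pre-aggregated counts.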