Re: Streaming Data

2019-04-09 Thread Kenny Gorman
Nick,

Have you looked at Apache Flink? It has very powerful APIs: you can 
stream aggregations, filters, etc. right into Druid, and it also has very robust 
state management that might be a good fit for your use case.

https://flink.apache.org/
https://github.com/druid-io/tranquility

Thanks,
Kenny Gorman
https://www.eventador.io

> On Apr 9, 2019, at 3:26 PM, Nick Torenvliet  wrote:
> 
> Hi all,
> 
> Just looking for some general guidance.
> 
> We have a kafka -> druid pipeline we intend to use in an industrial setting
> to monitor process data.
> 
> Our kafka system receives messages on a single topic.
> 
> The messages are {"timestamp": "yyyy-mm-ddThh:mm:ss.mmm", "plant_equipment_id":
> "id_string", "sensorvalue": float}
> 
> For our POC there are about 2,000 unique plant_equipment_ids; this will
> quickly grow to 20,000.
> 
> The kafka topic streams into druid.
> 
> We are building some node.js/react browser based apps for analytics and
> real time stream monitoring.
> 
> We are thinking that for visualizing historical data sets we will hit druid
> for data.
> 
> For real time streaming we are wondering what our best option is.
> 
> One option is to just hit druid semi-regularly and update the on-screen
> visualization as data arrives from there.
> 
> Another option is to stream a subset of the topic (somehow) from kafka using
> some streams interface.
> 
> With all the stock ticker apps out there, I have to imagine this is a
> really common use case.
> 
> Anyone have any thoughts as to what we are best to do?
> 
> Nick


Re: Kafka Monitoring

2017-06-20 Thread Kenny Gorman
Similar to other approaches, our service uses JMX via Jolokia and then we save 
the time-series data in Redis. Then we expose this in a number of ways 
including our dashboard, etc. We have found Redis to be quite good for a 
time-series backend for this purpose. This all gets setup automatically as part 
of our service, but it would also work very well stand-alone if you wanted to 
rig something similar yourself.
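
If you do rig something similar yourself, the Jolokia-to-time-series step can
be sketched roughly like this (the response envelope follows Jolokia's read
protocol; the bean and attribute names are the usual Kafka/Yammer meter fields,
but this is an illustration, not Eventador's actual code):

```javascript
// Illustrative: flatten a Jolokia read response for a Kafka meter bean
// (e.g. kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec) into a
// (timestamp, value) sample suitable for a Redis sorted set or similar.
function toSample(jolokiaResponse) {
  if (jolokiaResponse.status !== 200) {
    throw new Error("jolokia read failed: " + jolokiaResponse.status);
  }
  return {
    ts: jolokiaResponse.timestamp,               // unix seconds, set by Jolokia
    oneMinuteRate: jolokiaResponse.value.OneMinuteRate,
    count: jolokiaResponse.value.Count,          // cumulative since broker start
  };
}
```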

Ping me if you go this way, we can help.

Thanks,
Kenny Gorman
Founder and CEO
www.eventador.io

> On Jun 20, 2017, at 9:51 AM, Todd Palino  wrote:
> 
> Not for monitoring Kafka. We pull the JMX metrics two ways - one is a
> container that wraps around the Kafka application and annotates the beans
> to be emitted to Kafka as metrics, which gets pulled into our
> autometrics/InGraphs system for graphing. But for alerting, we use an agent
> that polls the critical metrics via JMX and pushes them into a separate
> system (that doesn’t use Kafka). ELK is used for log analysis for other
> applications.
> 
> Kafka-monitor is what we built/use for synthetic traffic monitoring for
> availability. And Burrow for monitoring consumers.
> 
> -Todd
> 
> 
> On Tue, Jun 20, 2017 at 9:53 AM, Andrew Hoblitzell <
> ahoblitz...@salesforce.com> wrote:
> 
>> Using Elasticsearch, Logstash, and Kibana is a pretty popular pattern at
>> LinkedIn.
>> 
>> Also giving honorable mentions to Kafka Monitor and Kafka Manager since
>> they hadn't been mentioned yet
>> https://github.com/yahoo/kafka-manager
>> https://github.com/linkedin/kafka-monitor
>> 
>> Thanks,
>> 
>> Andrew Hoblitzell
>> Sr. Software Engineer, Salesforce
>> 
>> 
>> On Tue, Jun 20, 2017 at 9:37 AM, Todd S  wrote:
>> 
>>> You can look at enabling JMX on kafka (
>>> https://stackoverflow.com/questions/36708384/enable-jmx-on-kafka-brokers)
>>> using JMXTrans (https://github.com/jmxtrans/jmxtrans) and a config (
>>> https://github.com/wikimedia/puppet-kafka/blob/master/kafka-jmxtrans.json.md)
>>> to gather stats, and insert them into influxdb (
>>> https://www.digitalocean.com/community/tutorials/how-to-monitor-system-metrics-with-the-tick-stack-on-centos-7)
>>> then graph the results with grafana (
>>> https://softwaremill.com/monitoring-apache-kafka-with-influxdb-grafana/,
>>> https://grafana.com/dashboards/721)
>>> 
>>> This is likely a solid day of work to get working nicely, but it also
>>> enables you to do a lot of extra cool stuff for monitoring, more than just
>>> Kafka.  JMXTrans can be a bit of a pain, because Kafka's JMX metrics are
>>> ... plentiful ... but the example configuration above should get you started.
>>> Using Telegraf to collect system stats and graph them with Grafana is
>>> really simple and powerful, as the Grafana community has a lot of pre-built
>>> content you can steal and make quick wins with.
>>> 
>>> Monitoring Kafka can be a beast, but there is a lot of useful data there
>>> for if (when?) there is a problem.  The more time you spend with the
>>> metrics, the more you start to get a feel for the internals.
>>> 
>>> On Mon, Jun 19, 2017 at 6:52 PM, Muhammad Arshad <
>>> muhammad.ars...@alticeusa.com> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> Wanted to see if there is Kafka monitoring available. I am looking
>>>> for the following:
>>>> 
>>>> how much data came in at a certain time.
>>>> 
>>>> 
>>>> 
>>>> Thanks,
>>>> 
>>>> *Muhammad Faisal Arshad*
>>>> 
>>>> Manager, Enterprise Data Quality
>>>> 
>>>> Data Services & Architecture
>>>> 
>>> 
>> 
> 
> 
> 
> -- 
> *Todd Palino*
> Senior Staff Engineer, Site Reliability
> Data Infrastructure Streaming
> 
> 
> 
> linkedin.com/in/toddpalino



Re: [ANNOUNCE] Apache Kafka 0.10.2.0 Released

2017-02-22 Thread Kenny Gorman
We are excited about this release! Excellent work!

Thanks
Kenny Gorman
www.eventador.io

> On Feb 22, 2017, at 2:33 AM, Ewen Cheslack-Postava  wrote:
> 
> The Apache Kafka community is pleased to announce the release for Apache
> Kafka 0.10.2.0. This is a feature release which includes the completion
> of 15 KIPs, over 200 bug fixes and improvements, and more than 500 pull
> requests merged.
> 
> All of the changes in this release can be found in the release notes:
> https://archive.apache.org/dist/kafka/0.10.2.0/RELEASE_NOTES.html
> 
> Apache Kafka is a distributed streaming platform with four core
> APIs:
> 
> ** The Producer API allows an application to publish a stream of records to
> one or more Kafka topics.
> 
> ** The Consumer API allows an application to subscribe to one or more
> topics and process the stream of records produced to them.
> 
> ** The Streams API allows an application to act as a stream processor,
> consuming an input stream from one or more topics and producing an output
> stream to one or more output topics, effectively transforming the input
> streams to output streams.
> 
> ** The Connector API allows building and running reusable producers or
> consumers that connect Kafka topics to existing applications or data
> systems. For example, a connector to a relational database might capture
> every change to a table.
> 
> 
> With these APIs, Kafka can be used for two broad classes of application:
> 
> ** Building real-time streaming data pipelines that reliably get data
> between systems or applications.
> 
> ** Building real-time streaming applications that transform or react to the
> streams of data.
> 
> 
> You can download the source release from
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.0/kafka-0.10.2.0-src.tgz
> 
> and binary releases from
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.0/kafka_2.11-0.10.2.0.tgz
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.0/kafka_2.10-0.10.2.0.tgz
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.0/kafka_2.12-0.10.2.0.tgz
> (experimental 2.12 artifact)
> 
> Thanks to the 101 contributors on this release!
> 
> Akash Sethi, Alex Loddengaard, Alexey Ozeritsky, amethystic, Andrea
> Cosentino, Andrew Olson, Andrew Stevenson, Anton Karamanov, Antony
> Stubbs, Apurva Mehta, Arun Mahadevan, Ashish Singh, Balint Molnar, Ben
> Stopford, Bernard Leach, Bill Bejeck, Colin P. Mccabe, Damian Guy, Dan
> Norwood, Dana Powers, dasl, Derrick Or, Dong Lin, Dustin Cote, Edoardo
> Comar, Edward Ribeiro, Elias Levy, Emanuele Cesena, Eno Thereska, Ewen
> Cheslack-Postava, Flavio Junqueira, fpj, Geoff Anderson, Guozhang Wang,
> Gwen Shapira, Hikiko Murakami, Himani Arora, himani1, Hojjat Jafarpour,
> huxi, Ishita Mandhan, Ismael Juma, Jakub Dziworski, Jan Lukavsky, Jason
> Gustafson, Jay Kreps, Jeff Widman, Jeyhun Karimov, Jiangjie Qin, Joel
> Koshy, Jon Freedman, Joshi, Jozef Koval, Json Tu, Jun He, Jun Rao,
> Kamal, Kamal C, Kamil Szymanski, Kim Christensen, Kiran Pillarisetty,
> Konstantine Karantasis, Lihua Xin, LoneRifle, Magnus Edenhill, Magnus
> Reftel, Manikumar Reddy O, Mark Rose, Mathieu Fenniak, Matthias J. Sax,
> Mayuresh Gharat, MayureshGharat, Michael Schiff, Mickael Maison,
> MURAKAMI Masahiko, Nikki Thean, Olivier Girardot, pengwei-li, pilo,
> Prabhat Kashyap, Qian Zheng, Radai Rosenblatt, radai-rosenblatt, Raghav
> Kumar Gautam, Rajini Sivaram, Rekha Joshi, rnpridgeon, Ryan Pridgeon,
> Sandesh K, Scott Ferguson, Shikhar Bhushan, steve, Stig Rohde Døssing,
> Sumant Tambe, Sumit Arrawatia, Theo, Tim Carey-Smith, Tu Yang, Vahid
> Hashemian, wangzzu, Will Marshall, Xavier Léauté, Xavier Léauté, Xi Hu,
> Yang Wei, yaojuncn, Yuto Kawamura
> 
> We welcome your help and feedback. For more information on how to
> report problems, and to get involved, visit the project website at
> http://kafka.apache.org/
> 
> Thanks,
> Ewen


Re: Regarding Connection Problem

2016-12-17 Thread Kenny Gorman
Here are some examples, hope they help:

https://github.com/Eventador/examples/tree/master/node

Thanks
Kenny Gorman
www.eventador.io



Sent from my iPhone
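
For the REST Proxy route recommended in the quoted reply below, a hedged
sketch of what a produce request looks like (the URL path and JSON envelope
follow the Confluent REST Proxy docs; the v2 content type, topic name, and
record contents here are illustrative assumptions):

```javascript
// Illustrative only: build the HTTP request pieces for producing JSON
// messages through the Confluent Kafka REST Proxy (POST /topics/<topic>).
// A real client would send this with http.request, fetch, etc.
function buildProduceRequest(topic, messages) {
  return {
    url: "/topics/" + encodeURIComponent(topic),
    headers: { "Content-Type": "application/vnd.kafka.json.v2+json" },
    body: JSON.stringify({
      records: messages.map((m) => ({ key: m.key, value: m.value })),
    }),
  };
}
```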
> On Dec 17, 2016, at 1:56 PM, Hans Jespersen  wrote:
> 
> I would recommend you use either the Blizzard node-rdkafka module ( see 
> https://github.com/Blizzard/node-rdkafka 
> <https://github.com/Blizzard/node-rdkafka>) or the Confluent kafka-rest-node 
> module ( see https://github.com/confluentinc/kafka-rest-node 
> <https://github.com/confluentinc/kafka-rest-node>)
> 
> The Blizzard code is a node.js wrapper on the excellent C-language librdkafka 
> library, so it speaks the native Kafka wire protocol.
> 
> The Confluent code is a pure javascript node.js interface to the Confluent 
> Kafka REST Proxy (see 
> http://docs.confluent.io/3.1.0/kafka-rest/docs/index.html 
> <http://docs.confluent.io/3.1.0/kafka-rest/docs/index.html>) so it uses 
> HTTP(S) between your node.js app and the Kafka REST Proxy which makes it a 
> good choice if your code is running outside the datacenter and behind 
> firewalls that might not allow direct Kafka TCP client connections to remote 
> Kafka Brokers.
> 
> I have personally used both of these node.js libraries in a number of my own 
> node Kafka applications, and they are both up to date and stable.
> 
> -hans
> 
> 
> 
> 
>> On Dec 16, 2016, at 7:01 PM, Chintan Bhatt  
>> wrote:
>> 
>> Hi
>> I want to store continuous output (avg. temperature) generated from node.js
>> on Hadoop and then retrieve it for visualization.
>> Please guide me on how to send continuous output from node.js to Kafka.
>> 
>> -- 
>> CHINTAN BHATT <http://in.linkedin.com/pub/chintan-bhatt/22/b31/336/>
>> 
> 


Re: Kafka as a database/repository question

2016-12-15 Thread Kenny Gorman
A couple thoughts..

- If you plan on fetching old messages in a non-contiguous manner then this may 
not be the best design. For instance, “give me messages from mondays for the 
last 3 quarters” is better served with a database. But if you want to say “give 
me messages from the last month until now” that works great.

- I am not sure what you mean by updating messages. You would need to have some 
sort of key and push in new messages with that key. Then when you read by key, 
the application should understand that the latest is the version it should use.

- Alternatively, you can consume into something like a DB and select what 
you want using regular SQL. We see this pattern a lot.

- For storing messages indefinitely it’s mostly making sure the config options 
are set appropriately and you have enough storage space. Set replication to 
something that makes you comfortable, maybe take backups as was mentioned.
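
The "latest version per key" reading pattern from the second point can be
sketched as a replay-time reduction (illustrative; with cleanup.policy=compact,
Kafka's log cleaner does the broker-side equivalent, retaining at least the
latest record per key):

```javascript
// Illustrative: reduce a replayed stream to the latest value per key,
// which is what a consumer does client-side when later records supersede
// earlier ones for the same key.
function latestByKey(messages) {
  const latest = new Map();
  for (const m of messages) {
    latest.set(m.key, m.value); // later records win
  }
  return latest;
}
```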

Hope this helps some

Kenny Gorman
Founder
www.eventador.io


> On Dec 15, 2016, at 12:00 PM, Susheel Kumar  wrote:
> 
> Hello Folks,
> 
> I am going thru an existing design where Kafka is planned to be utilised in
> below manner
> 
> 
>   1. Messages will pushed to Kafka by producers
>   2. There will be updates to existing messages on an ongoing basis.  The
>   expectation is that all the updates are consolidated in Kafka and the
>   latest and greatest version/copy is kept
>   3. Consumers will read the messages from Kafka and push to Solr for
>   ingestion purposes
>   4. There will be no purging/removal of messages, since it is expected to
>   replay the messages in the future and perform full re-ingestion.  So
>   messages will be kept in Kafka for an indefinite period, similar to a
>   database where data once stored remains there and can be used later.
> 
> 
> Do you see any pitfalls / issues with this design, especially with regard to
> storing the messages indefinitely?
> 
> 
> Thanks,
> Susheel



Re: Added to Wiki please

2016-11-07 Thread Kenny Gorman
Gwen,

Makes total sense! Sorry for the wide distribution then, my apologies to the 
list.

Kenny Gorman
Founder
www.eventador.io


> On Nov 5, 2016, at 9:30 PM, Gwen Shapira  wrote:
> 
> Hi Kenny,
> 
> First, thank you for letting the community know about your valuable service.
> Second, the wiki you pointed to is for companies using Kafka, not for
> vendors selling Kafka. We are trying to avoid commercializing the
> community Wiki.  Notice that Heroku, Cloudera, Hortonworks, and
> Confluent are all missing from the wiki.
> 
> If you have customers who are interested in being listed, we will
> gladly include them - since they are using Kafka through your service.
> 
> Hope this clarifies the use of the Powered-By page.
> 
> Gwen
> 
> On Wed, Nov 2, 2016 at 12:11 PM, Kenny Gorman  wrote:
>> Per the wiki, I am emailing the list for this. Can you please add us to 
>> https://cwiki.apache.org/confluence/display/KAFKA/Powered+By?
>> 
>> Eventador.io (https://www.eventador.io/) is a whole stack Kafka as-a-service 
>> company. We enable developers to quickly create and painlessly manage 
>> real-time data pipelines on Apache Kafka.
>> 
>> Thx!!
>> Kenny Gorman
>> Founder
>> www.eventador.io
> 
> 
> 
> -- 
> Gwen Shapira
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog



Added to Wiki please

2016-11-02 Thread Kenny Gorman
Per the wiki, I am emailing the list for this. Can you please add us to 
https://cwiki.apache.org/confluence/display/KAFKA/Powered+By?

Eventador.io (https://www.eventador.io/) is a whole stack Kafka as-a-service 
company. We enable developers to quickly create and painlessly manage real-time 
data pipelines on Apache Kafka.

Thx!!
Kenny Gorman
Founder 
www.eventador.io

Re: Benchmarking kafka performance

2016-09-23 Thread Kenny Gorman
Vadim,

We mostly made this little script as a joke. Remember the unix utility ‘yes’?

It does in fact work if you want to simply direct some random load at Kafka to 
test things. Throw it into Docker and run a bunch of them. ;-)

https://github.com/Eventador/evtools/tree/master/yesbench

In terms of metrics, we measure through JMX:
- BytesInPerSec/BytesOutPerSec
- TotalTimeMs

This is also a good post:
https://www.datadoghq.com/blog/monitoring-kafka-performance-metrics/
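
As a small illustration of turning those JMX readings into a throughput
number (sample shape is assumed; the BytesInPerSec meter also exposes
pre-computed rates such as OneMinuteRate, but a delta over its cumulative
Count gives an exact average over your own benchmark window):

```javascript
// Illustrative: average bytes/sec between two samples of a cumulative
// counter (e.g. the Count attribute of BytesInPerSec), taken via JMX.
function avgBytesPerSec(prev, curr) {
  const dtMs = curr.timeMs - prev.timeMs;
  if (dtMs <= 0) throw new Error("samples must be increasing in time");
  return ((curr.count - prev.count) * 1000) / dtMs;
}
```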

Thanks
Kenny Gorman
http://www.eventador.io


> On Sep 22, 2016, at 3:21 PM, Vadim Keylis  wrote:
> 
> Good afternoon. Any suggestions regarding benchmark tool would be greatly
> appreciated.
> 
> Thanks
> 
> On Mon, Sep 19, 2016 at 8:18 AM, Vadim Keylis  wrote:
> 
>> Good morning. Which benchmarking tools should we use to compare
>> performance of 0.8 and 0.10 versions? Which metrics should we monitor?
>> 
>> Thanks in advance,
>> Vadim
>>