Re: Streaming Data
Nick, Have you looked at Apache Flink? It has very powerful APIs and you can stream aggregations, filters, etc. right into Druid, and it also has very robust state management that might be a good fit for your use case. https://flink.apache.org/ https://github.com/druid-io/tranquility Thanks, Kenny Gorman https://www.eventador.io > On Apr 9, 2019, at 3:26 PM, Nick Torenvliet wrote: > > Hi all, > > Just looking for some general guidance. > > We have a kafka -> druid pipeline we intend to use in an industrial setting > to monitor process data. > > Our kafka system receives messages on a single topic. > > The messages are {"timestamp": yy:mm:ddThh:mm:ss.mmm, "plant_equipment_id": > "id_string", "sensorvalue": float} > > For our POC there are about 2000 unique plant_equipment ids; this will > quickly grow to 20,000. > > The kafka topic streams into druid. > > We are building some node.js/react browser-based apps for analytics and > real time stream monitoring. > > We are thinking that for visualizing historical data sets we will hit druid > for data. > > For real time streaming we are wondering what our best option is. > > One option is to just hit druid semi-regularly and update the on-screen > visualization as data arrives from there. > > Another option is to stream a subset of the topics (somehow) from kafka using > some streams interface. > > With all the stock ticker apps out there, I have to imagine this is a > really common use case. > > Anyone have any thoughts as to what we are best to do? > > Nick
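The "stream a subset of the topics" option Nick asks about can be sketched directly in node.js. A minimal sketch only, assuming the kafkajs and ws npm packages; the topic name, port, and the idea that each WebSocket client carries a `wantedIds` subscription set are illustrative, not from the thread.

```javascript
// Sketch: consume the sensor topic and fan matching readings out to
// browsers over WebSocket. Package names (kafkajs, ws), topic, and
// port are assumptions, not from the thread.

// Pure helper: does this reading belong to equipment the client asked for?
function matchesEquipment(reading, wantedIds) {
  return wantedIds.has(reading.plant_equipment_id);
}

// Messages look like {"timestamp": ..., "plant_equipment_id": ..., "sensorvalue": ...}
function parseReading(rawValue) {
  return JSON.parse(rawValue.toString());
}

async function streamToBrowsers({ brokers, topic, port }) {
  // Lazy requires so the helpers above stay usable without the packages.
  const { Kafka } = require('kafkajs');
  const WebSocket = require('ws');

  const wss = new WebSocket.Server({ port });
  const consumer = new Kafka({ clientId: 'monitor', brokers })
    .consumer({ groupId: 'browser-fanout' });

  await consumer.connect();
  await consumer.subscribe({ topic });
  await consumer.run({
    eachMessage: async ({ message }) => {
      const reading = parseReading(message.value);
      for (const client of wss.clients) {
        // Assume each client stored a Set of equipment ids on connect.
        if (client.readyState === WebSocket.OPEN &&
            matchesEquipment(reading, client.wantedIds || new Set())) {
          client.send(JSON.stringify(reading));
        }
      }
    },
  });
}

// Only start when explicitly configured, e.g. KAFKA_BROKERS=localhost:9092.
if (process.env.KAFKA_BROKERS) {
  streamToBrowsers({
    brokers: process.env.KAFKA_BROKERS.split(','),
    topic: 'plant-sensors',
    port: 8080,
  });
}
```

The other option Nick mentions (polling Druid on an interval) is simpler to build; the trade-off is latency versus moving parts.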
Re: Kafka Monitoring
Similar to other approaches, our service uses JMX via Jolokia, and then we save the time-series data in Redis. Then we expose this in a number of ways, including our dashboard, etc. We have found Redis to be quite good as a time-series backend for this purpose. This all gets set up automatically as part of our service, but it would also work very well stand-alone if you wanted to rig something similar yourself. Ping me if you go this way, we can help. Thanks, Kenny Gorman Founder and CEO www.eventador.io > On Jun 20, 2017, at 9:51 AM, Todd Palino wrote: > > Not for monitoring Kafka. We pull the JMX metrics two ways - one is a > container that wraps around the Kafka application and annotates the beans > to be emitted to Kafka as metrics, which gets pulled into our > autometrics/InGraphs system for graphing. But for alerting, we use an agent > that polls the critical metrics via JMX and pushes them into a separate > system (that doesn’t use Kafka). ELK is used for log analysis for other > applications. > > Kafka-monitor is what we built/use for synthetic traffic monitoring for > availability. And Burrow for monitoring consumers. > > -Todd > > > On Tue, Jun 20, 2017 at 9:53 AM, Andrew Hoblitzell < > ahoblitz...@salesforce.com> wrote: > >> Using Elasticsearch, Logstash, and Kibana is a pretty popular pattern at >> LinkedIn. >> >> Also giving honorable mentions to Kafka Monitor and Kafka Manager, since >> they hadn't been mentioned yet: >> https://github.com/yahoo/kafka-manager >> https://github.com/linkedin/kafka-monitor >> >> Thanks, >> >> Andrew Hoblitzell >> Sr. 
Software Engineer, Salesforce >> >> >> On Tue, Jun 20, 2017 at 9:37 AM, Todd S wrote: >> >>> You can look at enabling JMX on kafka ( >>> https://stackoverflow.com/questions/36708384/enable-jmx-on-kafka-brokers ) >>> using JMXTrans (https://github.com/jmxtrans/jmxtrans) and a config ( >>> https://github.com/wikimedia/puppet-kafka/blob/master/kafka-jmxtrans.json.md ) >>> to gather stats, and insert them into influxdb ( >>> https://www.digitalocean.com/community/tutorials/how-to-monitor-system-metrics-with-the-tick-stack-on-centos-7 ) >>> then graph the results with grafana ( >>> https://softwaremill.com/monitoring-apache-kafka-with-influxdb-grafana/, >>> https://grafana.com/dashboards/721) >>> >>> This is likely a solid day of work to get working nicely, but it also >>> enables you to do a lot of extra cool stuff for monitoring, more than just >>> Kafka. JMXTrans can be a bit of a pain, because Kafka's JMX metrics are .. >>> plentiful ... but the example configuration above should get you started. >>> Using Telegraf to collect system stats and graph them with Grafana is >>> really simple and powerful, as the Grafana community has a lot of pre-built >>> content you can steal and make quick wins with. >>> >>> Monitoring Kafka can be a beast, but there is a lot of useful data there >>> for if (when?) there is a problem. The more time you spend with the >>> metrics, the more you start to get a feel for the internals. >>> >>> On Mon, Jun 19, 2017 at 6:52 PM, Muhammad Arshad < >>> muhammad.ars...@alticeusa.com> wrote: >>> >>>> Hi, >>>> >>>> wanted to see if there is Kafka monitoring which is available. I am >>>> looking for the following: >>>> >>>> >>>> >>>> how much data came in at a certain time. 
>>>> >>>> >>>> Thanks, >>>> >>>> *Muhammad Faisal Arshad* >>>> >>>> Manager, Enterprise Data Quality >>>> >>>> Data Services & Architecture >>>> >>>> >>> >> > > > > -- > *Todd Palino* > Senior Staff Engineer, Site Reliability > Data Infrastructure Streaming > > > > linkedin.com/in/toddpalino
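The Jolokia-plus-Redis pattern Kenny describes can be roughed out in a few lines of node.js. A sketch under stated assumptions: the redis npm package, Node 18+'s global fetch, and the sorted-set key layout are mine; only the BrokerTopicMetrics bean name comes from Kafka itself.

```javascript
// Sketch: poll one broker MBean over Jolokia's HTTP bridge and append
// samples to a Redis sorted set. Key names and poll shape are assumptions.

// Jolokia accepts POSTed JSON read requests of this shape.
function jolokiaReadRequest(mbean, attribute) {
  return { type: 'read', mbean, attribute };
}

// One sample for ZADD: score = timestamp; the member embeds the timestamp
// so members stay unique even when two samples share a value.
function sampleMember(tsMillis, value) {
  return { score: tsMillis, value: `${tsMillis}:${value}` };
}

async function pollOnce({ jolokiaBase, redisUrl }) {
  const { createClient } = require('redis'); // lazy require, helpers stay testable
  const redis = createClient({ url: redisUrl });
  await redis.connect();

  const mbean = 'kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec';
  const res = await fetch(`${jolokiaBase}/jolokia/`, {
    method: 'POST',
    body: JSON.stringify(jolokiaReadRequest(mbean, 'OneMinuteRate')),
  });
  const body = await res.json();
  await redis.zAdd('metrics:BytesInPerSec', sampleMember(Date.now(), body.value));
  await redis.quit();
}
```

Range queries for the dashboard then become ZRANGEBYSCORE over the timestamp scores.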
Re: [ANNOUNCE] Apache Kafka 0.10.2.0 Released
We are excited about this release! Excellent work! Thanks Kenny Gorman www.eventador.io > On Feb 22, 2017, at 2:33 AM, Ewen Cheslack-Postava wrote: > > The Apache Kafka community is pleased to announce the release for Apache > Kafka 0.10.2.0. This is a feature release which includes the completion > of 15 KIPs, over 200 bug fixes and improvements, and more than 500 pull > requests merged. > > All of the changes in this release can be found in the release notes: > https://archive.apache.org/dist/kafka/0.10.2.0/RELEASE_NOTES.html > > Apache Kafka is a distributed streaming platform with four core > APIs: > > ** The Producer API allows an application to publish a stream of records to > one or more Kafka topics. > > ** The Consumer API allows an application to subscribe to one or more > topics and process the stream of records produced to them. > > ** The Streams API allows an application to act as a stream processor, > consuming an input stream from one or more topics and producing an > output > stream to one or more output topics, effectively transforming the input > streams to output streams. > > ** The Connector API allows building and running reusable producers or > consumers that connect Kafka topics to existing applications or data > systems. For example, a connector to a relational database might capture > every change to a table. > > > With these APIs, Kafka can be used for two broad classes of application: > > ** Building real-time streaming data pipelines that reliably get data > between systems or applications. > > ** Building real-time streaming applications that transform or react to > the > streams of data. 
> > > You can download the source release from > https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.0/kafka-0.10.2.0-src.tgz > > and binary releases from > https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.0/kafka_2.11-0.10.2.0.tgz > https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.0/kafka_2.10-0.10.2.0.tgz > https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.0/kafka_2.12-0.10.2.0.tgz > (experimental 2.12 artifact) > > Thanks to the 101 contributors on this release! > > Akash Sethi, Alex Loddengaard, Alexey Ozeritsky, amethystic, Andrea > Cosentino, Andrew Olson, Andrew Stevenson, Anton Karamanov, Antony > Stubbs, Apurva Mehta, Arun Mahadevan, Ashish Singh, Balint Molnar, Ben > Stopford, Bernard Leach, Bill Bejeck, Colin P. Mccabe, Damian Guy, Dan > Norwood, Dana Powers, dasl, Derrick Or, Dong Lin, Dustin Cote, Edoardo > Comar, Edward Ribeiro, Elias Levy, Emanuele Cesena, Eno Thereska, Ewen > Cheslack-Postava, Flavio Junqueira, fpj, Geoff Anderson, Guozhang Wang, > Gwen Shapira, Hikiko Murakami, Himani Arora, himani1, Hojjat Jafarpour, > huxi, Ishita Mandhan, Ismael Juma, Jakub Dziworski, Jan Lukavsky, Jason > Gustafson, Jay Kreps, Jeff Widman, Jeyhun Karimov, Jiangjie Qin, Joel > Koshy, Jon Freedman, Joshi, Jozef Koval, Json Tu, Jun He, Jun Rao, > Kamal, Kamal C, Kamil Szymanski, Kim Christensen, Kiran Pillarisetty, > Konstantine Karantasis, Lihua Xin, LoneRifle, Magnus Edenhill, Magnus > Reftel, Manikumar Reddy O, Mark Rose, Mathieu Fenniak, Matthias J. 
Sax, > Mayuresh Gharat, MayureshGharat, Michael Schiff, Mickael Maison, > MURAKAMI Masahiko, Nikki Thean, Olivier Girardot, pengwei-li, pilo, > Prabhat Kashyap, Qian Zheng, Radai Rosenblatt, radai-rosenblatt, Raghav > Kumar Gautam, Rajini Sivaram, Rekha Joshi, rnpridgeon, Ryan Pridgeon, > Sandesh K, Scott Ferguson, Shikhar Bhushan, steve, Stig Rohde Døssing, > Sumant Tambe, Sumit Arrawatia, Theo, Tim Carey-Smith, Tu Yang, Vahid > Hashemian, wangzzu, Will Marshall, Xavier Léauté, Xavier Léauté, Xi Hu, > Yang Wei, yaojuncn, Yuto Kawamura > > We welcome your help and feedback. For more information on how to > report problems, and to get involved, visit the project website at > http://kafka.apache.org/ > > Thanks, > Ewen
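The Streams API described in the announcement ships as a Java library; purely as an illustration of the same consume-transform-produce shape from node.js (kafkajs assumed, which is a separate project and not part of this release):

```javascript
// Illustration only: the consume-transform-produce loop the Streams API
// formalizes, written against kafkajs. Client ids, group ids, and the
// trivial transform are assumptions.

// Example transform, deliberately trivial: uppercase the payload.
function transformRecord(value) {
  return value.toString().toUpperCase();
}

async function pipe({ brokers, inputTopic, outputTopic }) {
  const { Kafka } = require('kafkajs'); // lazy require, helper stays testable
  const kafka = new Kafka({ clientId: 'pipe', brokers });
  const consumer = kafka.consumer({ groupId: 'pipe' });
  const producer = kafka.producer();

  await Promise.all([consumer.connect(), producer.connect()]);
  await consumer.subscribe({ topic: inputTopic });
  await consumer.run({
    eachMessage: async ({ message }) => {
      // Forward with the original key so partitioning is preserved.
      await producer.send({
        topic: outputTopic,
        messages: [{ key: message.key, value: transformRecord(message.value) }],
      });
    },
  });
}
```

Unlike the real Streams API, this sketch has no state stores, windowing, or processing guarantees; it only shows the data flow.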
Re: Regarding Connection Problem
Here are some examples, hope they help: https://github.com/Eventador/examples/tree/master/node Thanks Kenny Gorman www.eventador.io Sent from my iPhone > On Dec 17, 2016, at 1:56 PM, Hans Jespersen wrote: > > I would recommend you use either the Blizzard node-rdkafka module (see > https://github.com/Blizzard/node-rdkafka) or the Confluent kafka-rest-node > module (see https://github.com/confluentinc/kafka-rest-node) > > The Blizzard code is a node.js wrapper on the excellent C-language librdkafka > library, so it talks the native Kafka wire protocol. > > The Confluent code is a pure-javascript node.js interface to the Confluent > Kafka REST Proxy (see > http://docs.confluent.io/3.1.0/kafka-rest/docs/index.html), so it uses > HTTP(S) between your node.js app and the Kafka REST Proxy, which makes it a > good choice if your code is running outside the datacenter and behind > firewalls that might not allow direct Kafka TCP client connections to remote > Kafka Brokers. > > I have personally used both of these node.js libraries in a number of my own > node Kafka applications and they are both up to date and stable. > > -hans > > > > >> On Dec 16, 2016, at 7:01 PM, Chintan Bhatt >> wrote: >> >> Hi >> I want to give continuous output (avg. temperature) generated from node.js >> to store on Hadoop and then retrieve it for visualization. >> Please guide me how to give continuous output of node.js to kafka. >> >> -- >> CHINTAN BHATT <http://in.linkedin.com/pub/chintan-bhatt/22/b31/336/> >
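Building on Hans's node-rdkafka recommendation, here is a minimal producer sketch for Chintan's continuous-temperature case. The topic name, broker list, interval, and reading format are assumptions; `produce(topic, partition, message, key)` is node-rdkafka's Producer call.

```javascript
// Sketch: push a temperature reading into a Kafka topic once a second
// using node-rdkafka. Names and config values are illustrative.

// Pure helper: serialize one reading for the topic.
function encodeReading(sensorId, celsius, tsMillis) {
  return Buffer.from(JSON.stringify({ sensorId, celsius, ts: tsMillis }));
}

function startProducing({ brokerList, topic, readTemperature }) {
  const Kafka = require('node-rdkafka'); // lazy require, helper stays testable
  const producer = new Kafka.Producer({ 'metadata.broker.list': brokerList });

  producer.connect();
  producer.on('ready', () => {
    setInterval(() => {
      const payload = encodeReading('sensor-1', readTemperature(), Date.now());
      // null partition = let the default partitioner pick from the key.
      producer.produce(topic, null, payload, 'sensor-1');
    }, 1000);
  });
  producer.on('event.error', (err) => console.error('producer error', err));
  return producer;
}
```

From there a Kafka-to-HDFS connector (e.g. via the Connect framework) can land the stream on Hadoop for the visualization step.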
Re: Kafka as a database/repository question
A couple of thoughts: - If you plan on fetching old messages in a non-contiguous manner, then this may not be the best design. For instance, “give me messages from Mondays for the last 3 quarters” is better served with a database. But if you want to say “give me messages from the last month until now,” that works great. - I am not sure what you mean by updating messages. You would need to have some sort of key and push in new messages with that key. Then when you read by key, the application should understand that the latest version is the one it should use. - Alternatively, you can consume into something like a DB and use regular SQL to select what you want. We see this pattern a lot. - For storing messages indefinitely, it’s mostly making sure the config options are set appropriately and you have enough storage space. Set replication to something that makes you comfortable, and maybe take backups as was mentioned. Hope this helps some. Kenny Gorman Founder www.eventador.io > On Dec 15, 2016, at 12:00 PM, Susheel Kumar wrote: > > Hello Folks, > > I am going through an existing design where Kafka is planned to be utilized in > the manner below: > > > 1. Messages will be pushed to Kafka by producers. > 2. There will be updates to existing messages on an ongoing basis. The > expectation is that all the updates are consolidated in Kafka and the > latest and greatest version/copy is kept. > 3. Consumers will read the messages from Kafka and push to Solr for > ingestion purposes. > 4. There will be no purging/removal of messages, since it is expected to > replay the messages in the future and perform full re-ingestion. So > messages will be kept in Kafka for an indefinite period, similar to a database > where data once stored remains there and can be used later in the future. > > > Do you see any pitfalls / any issues with this design, especially with respect to > storing the messages indefinitely? > > > Thanks, > Susheel
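Kenny's read-by-key point in miniature: the reader keeps only the newest value per key. Kafka's broker-side log compaction (`cleanup.policy=compact`) applies the same last-write-wins rule to keyed topics; the tiny app-side sketch below is mine, not from the thread.

```javascript
// Sketch: reduce a stream of keyed messages to the latest version per key.
// Messages arrive in offset order, so a plain overwrite leaves the newest
// value for each key.
function latestByKey(messages) {
  const current = new Map();
  for (const { key, value } of messages) {
    current.set(key, value); // last write for a key wins
  }
  return current;
}
```

This is exactly the view the Solr consumer in Susheel's step 3 would build before ingesting.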
Re: Added to Wiki please
Gwen, Makes total sense! Sorry for the wide distribution then, my apologies to the list. Kenny Gorman Founder www.eventador.io > On Nov 5, 2016, at 9:30 PM, Gwen Shapira wrote: > > Hi Kenny, > > First, thank you for letting the community know about your valuable service. > Second, the wiki you pointed to is for companies using Kafka, not for > vendors selling Kafka. We are trying to avoid commercializing the > community Wiki. Notice that Heroku, Cloudera, Hortonworks, and > Confluent are all missing from the wiki. > > If you have customers who are interested in being listed, we will > gladly include them - since they are using Kafka through your service. > > Hope this clarifies the use of the Powered-By page. > > Gwen > > On Wed, Nov 2, 2016 at 12:11 PM, Kenny Gorman wrote: >> Per the wiki, I am emailing the list for this. Can you please add us to >> https://cwiki.apache.org/confluence/display/KAFKA/Powered+By? >> >> Eventador.io (https://www.eventador.io/) is a whole stack Kafka as-a-service >> company. We enable developers to quickly create and painlessly manage >> real-time data pipelines on Apache Kafka. >> >> Thx!! >> Kenny Gorman >> Founder >> www.eventador.io > > > > -- > Gwen Shapira > Product Manager | Confluent > 650.450.2760 | @gwenshap > Follow us: Twitter | blog
Added to Wiki please
Per the wiki, I am emailing the list for this. Can you please add us to https://cwiki.apache.org/confluence/display/KAFKA/Powered+By? Eventador.io (https://www.eventador.io/) is a whole-stack Kafka-as-a-service company. We enable developers to quickly create and painlessly manage real-time data pipelines on Apache Kafka. Thx!! Kenny Gorman Founder www.eventador.io
Re: Benchmarking kafka performance
Vadim, We mostly made this little script as a joke. Remember the unix utility ‘yes’? It does in fact work if you want to simply direct some random load at Kafka to test things. Throw it into Docker and run a bunch of them. ;-) https://github.com/Eventador/evtools/tree/master/yesbench In terms of metrics, we measure through JMX: - BytesInPerSec/BytesOutPerSec - TotalTimeMs This is also a good post: https://www.datadoghq.com/blog/monitoring-kafka-performance-metrics/ Thanks Kenny Gorman http://www.eventador.io > On Sep 22, 2016, at 3:21 PM, Vadim Keylis wrote: > > Good afternoon. Any suggestions regarding benchmark tool would be greatly > appreciated. > > Thanks > > On Mon, Sep 19, 2016 at 8:18 AM, Vadim Keylis wrote: > >> Good morning. Which benchmarking tools we should use to compare >> performance of 0.8 and 0.10 versions? Which metrics should we monitor ? >> >> Thanks in advance, >> Vadim >>
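For a roll-your-own benchmark in the spirit of yesbench, the arithmetic is just counts over elapsed time. The sketch below assumes kafkajs and an existing topic (both assumptions on my part); Kafka's bundled kafka-producer-perf-test.sh is the more rigorous tool for comparing 0.8 and 0.10.

```javascript
// Sketch: drive a produce loop and compute throughput from counts and
// elapsed time. Client id, topic, and payload size are illustrative.

function throughput(totalMsgs, totalBytes, elapsedMs) {
  const secs = elapsedMs / 1000;
  return {
    msgsPerSec: totalMsgs / secs,
    mbPerSec: totalBytes / (1024 * 1024) / secs,
  };
}

async function benchProduce({ brokers, topic, count, payloadBytes }) {
  const { Kafka } = require('kafkajs'); // lazy require, helper stays testable
  const producer = new Kafka({ clientId: 'bench', brokers }).producer();
  await producer.connect();

  const payload = Buffer.alloc(payloadBytes, 'x');
  const start = Date.now();
  for (let i = 0; i < count; i++) {
    // One message per send keeps the sketch simple; batching several
    // messages per send() changes the numbers considerably.
    await producer.send({ topic, messages: [{ value: payload }] });
  }
  const stats = throughput(count, count * payloadBytes, Date.now() - start);
  await producer.disconnect();
  return stats;
}
```

Cross-checking these client-side numbers against the broker's BytesInPerSec/BytesOutPerSec JMX metrics mentioned above is a good sanity check.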