Re: RRDTool, there is rrd4j, a Java implementation licensed under Apache 2.0 (https://code.google.com/p/rrd4j/).

On Apr 22, 2013, at 11:03 AM, Eric Newton <[email protected]> wrote:
> Presently the information is stored in memory, and it certainly could be
> stored in tables.
>
> This reminds me of an idea that I've been thinking about for a long time.
> It's a little aggressive to do in a single summer.
>
> ----
>
> RRDTool stores time series data in fixed-length files. One important
> feature is the ability to compress time-series data into less fine-grained
> results over time.
>
> However, updating many RRD files with periodic updates requires making
> lots of small seeks and updates to individual files. It works well when
> all the files fit in the disk cache. It falls down hard when they don't.
>
> My idea is to put updates into an Accumulo row for one collected data
> point, along with some recent version in RRD format:
>
>     Key                   Value
>     row, cf:cq
>     --------------------------------------------------------
>     point rrd:            [RRDTool data]
>     point ts:timestamp    value
>     point ts:timestamp    value
>     point ts:timestamp    value
>     point ts:timestamp    value
>     point ts:timestamp    value
>
> When the tablet compacts, you use a Combiner to push the updates into the
> RRD data:
>
>     Key                   Value
>     row, cf:cq
>     --------------------------------------------------------
>     point rrd:            [Updated RRDTool data]
>     point ts:timestamp    value
>
> Further, when you scan the data, you could use an RRD iterator to perform
> queries on the RRD format, which would extract only the
> summary/graph/data you want.
>
> This leverages the Accumulo write-ahead log and the efficiency of
> log-structured merge trees to defer RRD updates to a point where they can
> be done efficiently (with respect to disk seeks), and even the block cache
> to access recently read information quickly. And the data won't grow
> indefinitely, due to the properties of the RRD storage format.
>
> Sadly, RRDTool does not have a Java API. But there appear to be Java-based
> substitutes; I have no idea if they are license compatible.
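A minimal sketch of the fold described above, in plain Java and under stated assumptions: `RoundRobinArchive` and `RrdFold` are hypothetical names, not rrd4j or Accumulo APIs, and the archive keeps a single resolution with averaging, whereas a real RRD file can hold several archives with different consolidation functions. It only illustrates how buffered `ts:timestamp` points could be folded into fixed-size storage that never grows. One caveat: Accumulo's Combiner reduces versions of a single key (same row, column, and visibility, differing only in timestamp), so folding the `ts:` columns into the `rrd:` column would likely need a row-level iterator; `fold` here stands in for that logic only.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical, simplified stand-in for one RRD archive: a fixed number of
 * slots, each covering `step` seconds and holding the average of the points
 * that fell into it. Slots are reused round-robin, so storage never grows.
 */
class RoundRobinArchive {
    private final long step;        // seconds covered by one slot
    private final double[] sums;    // running sum of values per slot
    private final long[] counts;    // number of values per slot
    private final long[] slotStart; // start of the interval each slot currently holds

    RoundRobinArchive(long step, int slots) {
        this.step = step;
        this.sums = new double[slots];
        this.counts = new long[slots];
        this.slotStart = new long[slots];
        Arrays.fill(slotStart, Long.MIN_VALUE); // marks a slot as empty
    }

    private long intervalStart(long timestamp) {
        return timestamp - Math.floorMod(timestamp, step);
    }

    private int slotFor(long start) {
        return (int) Math.floorMod(start / step, (long) sums.length);
    }

    /** Adds one point; a point older than the interval its slot now holds is dropped. */
    void update(long timestamp, double value) {
        long start = intervalStart(timestamp);
        int i = slotFor(start);
        if (slotStart[i] != start) {
            if (slotStart[i] > start) {
                return; // slot was already reused for a newer interval
            }
            slotStart[i] = start; // overwrite the oldest interval
            sums[i] = 0;
            counts[i] = 0;
        }
        sums[i] += value;
        counts[i]++;
    }

    /** Average for the interval containing timestamp, or NaN if it is no longer retained. */
    double average(long timestamp) {
        long start = intervalStart(timestamp);
        int i = slotFor(start);
        if (slotStart[i] != start || counts[i] == 0) {
            return Double.NaN;
        }
        return sums[i] / counts[i];
    }
}

/** Hypothetical compaction-time fold: apply buffered points in time order, then drop them. */
class RrdFold {
    static void fold(RoundRobinArchive archive, TreeMap<Long, Double> pending) {
        for (Map.Entry<Long, Double> point : pending.entrySet()) {
            archive.update(point.getKey(), point.getValue());
        }
        pending.clear();
    }
}
```

Keeping the pending points sorted by timestamp (a `TreeMap` here) matters because this sketch drops a point that is older than the interval its slot currently holds.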
>
> OpenTSDB does something similar: they compress updates into hourly
> blocks, converting many small records into one larger one. Their scheme
> does not lose data, which was important to them.
>
> -Eric
>
>
> On Mon, Apr 22, 2013 at 10:33 AM, Supun Kamburugamuva
> <[email protected]> wrote:
>
>> I can see how summaries are very helpful to a user. We can introduce new
>> fields to the existing table/tablet summary tables that display problem
>> information, etc.
>>
>> To make the JMX polling time configurable, we can introduce configuration
>> parameters.
>>
>> For the JMX statistics, we can keep data at the server for a constant time
>> to avoid memory growth. I think the stats are stored in memory (please
>> correct me if I'm wrong). If that is the case, is it possible to store them
>> in Accumulo tables?
>>
>> Thanks,
>> Supun...
>>
>> On Mon, Apr 22, 2013 at 9:05 AM, Eric Newton <[email protected]>
>> wrote:
>>
>>> Another thing to consider is scale. On large clusters (many hundreds of
>>> nodes), more data is not helpful for visualization. Instead, summaries,
>>> averages and outliers are important.
>>>
>>> For example, if one node is consistently slow, it is better to know that
>>> than to see one graph with low numbers in a sea of graphs.
>>>
>>> If the monitor collects information using JMX, collection time for each
>>> node would be a good thing to know, too.
>>>
>>> -Eric
>>>
>>>
>>> On Sun, Apr 21, 2013 at 10:00 PM, Josh Elser <[email protected]>
>>> wrote:
>>>
>>>> Supun,
>>>>
>>>> Yup, very much so. Having a way to consume any and all metrics via JMX
>>>> would simplify things for any consumers (internal or external).
>>>>
>>>>
>>>> On 04/21/2013 02:15 PM, Supun Kamburugamuva wrote:
>>>>
>>>>> Hi Josh,
>>>>>
>>>>> Thanks for the suggestions. I'll incorporate these into the proposal.
>>>>>
>>>>> Another area I would like to work on is JMX.
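On Eric's point above about knowing the JMX collection time for each node: a minimal sketch, assuming the monitor reads each server's metrics as MBean attributes over an `MBeanServerConnection`. `TimedJmxPoll` is a hypothetical helper; the calls it makes (`ObjectName`, `MBeanServerConnection.getAttributes`, `AttributeList.asList`) are standard `javax.management`. For a remote tablet server the connection would typically come from `JMXConnectorFactory.connect(...)`; how Accumulo servers would expose their MBeans is left open in the thread.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.management.Attribute;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

/** Hypothetical helper: reads MBean attributes and records how long the read took. */
class TimedJmxPoll {
    /** Attribute values from one poll plus the elapsed time of the poll. */
    static final class Result {
        final Map<String, Object> values;
        final long elapsedNanos;

        Result(Map<String, Object> values, long elapsedNanos) {
            this.values = Collections.unmodifiableMap(values);
            this.elapsedNanos = elapsedNanos;
        }
    }

    static Result poll(MBeanServerConnection conn, String objectName, List<String> attributes)
            throws JMException, IOException {
        ObjectName name = new ObjectName(objectName);
        long start = System.nanoTime();
        // A single getAttributes call is one round trip per MBean on a remote connection.
        // Attributes that cannot be read are left out of the returned list.
        List<Attribute> read = conn.getAttributes(name, attributes.toArray(new String[0])).asList();
        long elapsed = System.nanoTime() - start;

        Map<String, Object> values = new LinkedHashMap<>();
        for (Attribute a : read) {
            values.put(a.getName(), a.getValue());
        }
        return new Result(values, elapsed);
    }
}
```

For example, polling `ThreadCount` from `java.lang:type=Threading` on the local platform MBean server (`ManagementFactory.getPlatformMBeanServer()`) returns the thread count along with the elapsed time; per-node elapsed times could then be summarized like any other metric.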
>>>>> There is a JIRA issue (ACCUMULO-694) that proposes replacing the
>>>>> Monitor's Thrift calls with JMX. Do you think this is a good addition to
>>>>> the Monitor?
>>>>>
>>>>> Thanks,
>>>>> Supun..
>>>>>
>>>>>
>>>>> On Sun, Apr 21, 2013 at 1:45 PM, Josh Elser <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Supun,
>>>>>>
>>>>>> Looks good! Can I make some suggestions/comments?
>>>>>>
>>>>>> For "Per table plots: ACCUMULO-594", I'd also like to see minor
>>>>>> compactions, major compactions, index cache hit rate, and data cache
>>>>>> hit rate per table (the same graphs that are displayed system-wide when
>>>>>> you visit http://${MONITOR_HOST}:50095/).
>>>>>>
>>>>>> For "Per tablet [server] plots", it would be neat if you could also
>>>>>> extract some general statistics like the top N least performing, top N
>>>>>> highest performing, etc. tablet servers. Ideally, this could correlate
>>>>>> with servers that may be having problems :).
>>>>>>
>>>>>> Do you see these proposed changes as being sufficient for 3-4 months of
>>>>>> 40 hrs/week work? If you plan to really dig into these changes (perhaps
>>>>>> reworking components of the monitor itself), I could perhaps see this.
>>>>>> Do you have any ideas for more lofty goals that you could pursue as
>>>>>> well? I don't want you/us to get one month into things and see you
>>>>>> complete everything we initially planned to accomplish :)
>>>>>>
>>>>>> - Josh
>>>>>>
>>>>>>
>>>>>> On 04/21/2013 10:37 AM, Supun Kamburugamuva wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I would like to start writing the proposal for GSoC. I've put
>>>>>>> together some initial high-level goals of the project. Please let me
>>>>>>> know what I can improve.
>>>>>>>
>>>>>>> Per table plots: ACCUMULO-594
>>>>>>> -----------------------------
>>>>>>>
>>>>>>> The goal of this is to display plots that explain the various
>>>>>>> activities that happen per table.
>>>>>>> When we go to the tables page of the monitor and select a specific
>>>>>>> table, it displays some information in a table format. We can augment
>>>>>>> this information by showing graphs for:
>>>>>>>
>>>>>>> 1. Ingest entries
>>>>>>> 2. Ingest data size
>>>>>>> 3. Scan entries
>>>>>>> 4. Scan data size
>>>>>>>
>>>>>>> Per tablet plots
>>>>>>> ----------------
>>>>>>>
>>>>>>> As with the table plots, we can display information about tablet
>>>>>>> servers on the tablet server page. The plots will display the same
>>>>>>> information as the table plots, but per tablet server.
>>>>>>>
>>>>>>> Trace Visualization: ACCUMULO-1198
>>>>>>> ----------------------------------
>>>>>>>
>>>>>>> Since we are displaying graphs for each tablet and each table, we can
>>>>>>> add major and minor compaction graphs to each table and each tablet.
>>>>>>>
>>>>>>> Another option is to display this in a single graph on the overview
>>>>>>> page, with different lines for different tables and tablets.
>>>>>>>
>>>>>>> Server type information: ACCUMULO-807
>>>>>>> -------------------------------------
>>>>>>>
>>>>>>> To display this information, we can add a new page that shows it as a
>>>>>>> table. The table should specify the network address of the server, the
>>>>>>> server type, whether it is active or inactive, etc.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Supun...
>>
>>
>>
>> --
>> Supun Kamburugamuva
>> Member, Apache Software Foundation; http://www.apache.org
>> E-mail: [email protected]; Mobile: +1 812 369 6762
>> Blog: http://supunk.blogspot.com
>>
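On Josh's suggestion of listing the top N least and highest performing tablet servers (and Eric's point that outliers matter more than a sea of graphs): a minimal sketch of picking the N highest and N lowest servers by a single metric. `TopN`, the server names, and the metric are hypothetical; which metric defines "performing" (ingest rate, scan latency, and so on) is left open in the thread. A heap of size N keeps selection at O(m log N) for m servers, which matters little at hundreds of nodes but avoids sorting every server on each refresh.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical helper: picks the N highest or lowest servers by a single metric. */
class TopN {
    private static final Comparator<Map.Entry<String, Double>> BY_VALUE =
            Map.Entry.comparingByValue();

    /** The n servers with the largest metric, largest first. */
    static List<Map.Entry<String, Double>> largest(Map<String, Double> metric, int n) {
        return select(metric, n, BY_VALUE);
    }

    /** The n servers with the smallest metric, smallest first. */
    static List<Map.Entry<String, Double>> smallest(Map<String, Double> metric, int n) {
        return select(metric, n, BY_VALUE.reversed());
    }

    /** Keeps the n greatest entries under `order` in a heap whose head is the least of them. */
    private static List<Map.Entry<String, Double>> select(
            Map<String, Double> metric, int n, Comparator<Map.Entry<String, Double>> order) {
        PriorityQueue<Map.Entry<String, Double>> heap = new PriorityQueue<>(order);
        for (Map.Entry<String, Double> e : metric.entrySet()) {
            // Copy the entry so the result does not depend on the map's own entry objects.
            heap.offer(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
            if (heap.size() > n) {
                heap.poll(); // evict the least of the current candidates
            }
        }
        List<Map.Entry<String, Double>> result = new ArrayList<>(heap);
        result.sort(order.reversed()); // greatest under `order` first
        return result;
    }
}
```

For example, with hypothetical ingest rates tserver1=10, tserver2=50, tserver3=5, and tserver4=30, `largest(rates, 2)` returns tserver2 then tserver4, and `smallest(rates, 2)` returns tserver3 then tserver1.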
