Hi George,

So the idea here is for the tool to gather the metrics and then to have
either documentation or some kind of script that crunches through those
metrics and produces a configuration recommendation. What you mention would
be the outcome of this analysis.
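
To make this concrete, below is a rough sketch of what the metric-gathering
side could look like: just a plain Kafka consumer that counts events and
message sizes per second. The broker address, topic name, and the one-second
window are placeholders for illustration, not anything we have settled on.

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class TelemetryProfiler {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:6667");   // placeholder broker
        props.put("group.id", "metron-assessment");
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("bro"));  // placeholder topic

        long windowStart = System.currentTimeMillis();
        long eventsInWindow = 0;
        long charsInWindow = 0;

        while (true) {
          ConsumerRecords<String, String> records = consumer.poll(1000);
          for (ConsumerRecord<String, String> record : records) {
            eventsInWindow++;
            charsInWindow += record.value().length();   // rough message size
          }
          long now = System.currentTimeMillis();
          if (now - windowStart >= 1000) {
            // one sample per second; the real tool would feed these into
            // average/max/std-dev aggregates and peak detection over days
            System.out.printf("eps=%d avgMsgSize=%.1f%n", eventsInWindow,
                eventsInWindow == 0 ? 0.0 : (double) charsInWindow / eventsInWindow);
            windowStart = now;
            eventsInWindow = 0;
            charsInWindow = 0;
          }
        }
      }
    }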

An example would be: if your messages are of size A, the average EPS is B,
and you have C peaks, then that corresponds to a particular Storm
configuration (whatever the Storm parallelism and worker setup should be).
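
Something along these lines is what I would expect the analysis script to
produce for Storm. This assumes the newer org.apache.storm package name, and
the per-executor throughput, executors-per-worker ratio, and heap sizes below
are made-up placeholders just to show the shape of the mapping, not real
guidance:

    import org.apache.storm.Config;

    public class StormSizingHeuristic {

      // Rough sketch: map observed ingest metrics onto a Storm config.
      public static Config recommend(double avgMsgBytes, double avgEps,
                                     double peakEps) {
        // size for the peaks, not the average, so the topology keeps up
        // during bursts
        double targetEps = Math.max(avgEps, peakEps);

        // placeholder assumption: one executor handles ~5,000 events/sec
        int parallelism = (int) Math.ceil(targetEps / 5000.0);

        // placeholder assumption: ~4 executors per worker
        int numWorkers = Math.max(1, (int) Math.ceil(parallelism / 4.0));

        Config conf = new Config();
        conf.setNumWorkers(numWorkers);
        conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 500);
        // worker heap scaled by message size (illustrative numbers only)
        conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS,
            avgMsgBytes > 2048 ? "-Xmx2048m" : "-Xmx1024m");
        // "recommended.parallelism" is a made-up key; in practice this value
        // would become the parallelism hint passed to setBolt(...)
        conf.put("recommended.parallelism", parallelism);
        return conf;
      }
    }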

Another example: if you have X fields per message, with keys of size Y and
values of size Z, then that corresponds to needing a specific number of
search heads.
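
And on the indexing side, the recommendation could come out of a simple
worked calculation along these lines; the per-head daily indexing capacity
(250 GB/day below) is a number I made up purely for illustration:

    public class IndexSizingHeuristic {

      // Rough sketch: estimate search heads from parsed-message shape and rate.
      public static int recommendSearchHeads(int fieldsPerMessage,
                                             int avgKeyBytes,
                                             int avgValueBytes,
                                             double avgEps,
                                             double dailyCapacityPerHeadGb) {
        // approximate indexed size of one parsed message
        long bytesPerMessage = (long) fieldsPerMessage * (avgKeyBytes + avgValueBytes);

        // volume indexed per day at the observed average rate
        double gbPerDay = avgEps * bytesPerMessage * 86400 / (1024.0 * 1024 * 1024);

        return (int) Math.max(1, Math.ceil(gbPerDay / dailyCapacityPerHeadGb));
      }

      public static void main(String[] args) {
        // X=30 fields, Y=12-byte keys, Z=50-byte values, at 10,000 EPS,
        // assuming a made-up 250 GB/day indexing capacity per search head
        System.out.println(recommendSearchHeads(30, 12, 50, 10000, 250));  // prints 6
      }
    }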


Thanks,
James 




On 4/13/16, 9:40 AM, "George Vetticaden" <gvettica...@hortonworks.com> wrote:

>+ 1 to James suggestion.
>We also need to consider not just the data volume and storage requirements
>for proper cluster sizing, but also the processing requirements. Given
>that in the new architecture we have moved to a single enrichment topology
>that will support all data sources, proper sizing of the enrichment
>topology will be even more crucial to maintaining SLAs and HA requirements.
>The following key questions will apply to each parser topology and to the
>single enrichment topology:
>
>1. Number of workers?
>2. Number of workers per machine?
>3. Size of each worker (memory)?
>4. Supervisor memory settings?
>
>The assessment tool should also be used to size topologies correctly.
>
>Tuning Kafka, HBase, and Solr/Elasticsearch should also be governed by the
>Metron assessment tool.
>
>
>-- 
>George Vetticaden
>
>
>
>
>
>
>
>On 4/13/16, 11:28 AM, "James Sirota" <jsir...@hortonworks.com> wrote:
>
>>Prior to adoption of Metron, each adopting entity needs to guesstimate
>>its data volume and data storage requirements so they can size their
>>cluster properly.  I propose the creation of an assessment tool that can
>>plug in to a Kafka topic for a given telemetry and over time produce
>>statistics for ingest volumes and storage requirements.  The idea is that
>>prior to adoption of Metron someone can set up all the feeds and Kafka
>>topics, but instead of deploying Metron right away they would deploy this
>>tool.  This tool would then produce statistics for data ingest/storage
>>requirements, and all relevant information needed for cluster sizing.
>>
>>Some of the metrics that can be recorded are:
>>
>>  *   Number of system events per second (average, max, mean, standard
>>dev)
>>  *   Message size  (average, max, mean, standard dev)
>>  *   Average number of peaks
>>  *   Duration of peaks  (average, max, mean, standard dev)
>>
>>If the parser for a telemetry exists, the tool can produce additional
>>statistics:
>>
>>  *   Number of keys/fields parsed (average, max, mean, standard dev)
>>  *   Length of field parsed (average, max, mean, standard dev)
>>  *   Length of key parsed (average, max, mean, standard dev)
>>
>>The tool can run for a week or a month and produce these kinds of
>>statistics.  Then once the statistics are available we can come up with
>>guidance documentation for a recommended cluster setup.  Otherwise it's
>>hard to properly size a cluster and set up streaming parallelism without
>>knowing these metrics.
>>
>>
>>Thoughts/ideas?
>>
>>Thanks,
>>James
>
>
