Hi Mark,
Sorry for the late reply, but we've been on vacation lately.

Please see below

> On 20 Dec 2017, at 13:25, Mark Petronic <markpetro...@gmail.com> wrote:
> 
> I am running with nprobe 8.2 in collector mode. I am currently designing a 
> collection infrastructure so I want to try to understand what nprobe is doing 
> internally, so as to better understand how data is being processed. I have a number of 
> questions in regard to this. I have read the latest version of the user guide 
> PDF but still have some questions. I tried to organize my questions in blocks 
> to hopefully allow for easier commenting on each question. This is fairly 
> long but I figured asking this all together, in context, would be better. 
> Thanks in advance to whoever takes this on - I really appreciate it. :)
> 
> Is there any detailed documentation on what is going on internally with 
> nprobe? In particular, I am using it as a collector to forward UDP netflow v9 
> from our Cisco routers to Kafka. I am particularly interested in 
> understanding some of these stats and what they "infer" is happening under 
> the hood:
> 
> 19/Dec/2017 13:36:09 [nprobe.c:3202] Average traffic: [0.00 pps][All Traffic 0 b/sec][IP Traffic 0 b/sec][ratio -nan]
> 19/Dec/2017 13:36:09 [nprobe.c:3210] Current traffic: [0.00 pps][0 b/sec]
> 19/Dec/2017 13:36:09 [nprobe.c:3216] Current flow export rate: [1818.5 flows/sec]
> 19/Dec/2017 13:36:09 [nprobe.c:3219] Flow drops: [export queue too long=0][too many flows=0][ELK queue flow drops=0]
> 19/Dec/2017 13:36:09 [nprobe.c:3224] Export Queue: 0/512000 [0.0 %]
> 19/Dec/2017 13:36:09 [nprobe.c:3229] Flow Buckets: [active=92792][allocated=92792][toBeExported=0]
> 19/Dec/2017 13:36:09 [nprobe.c:3235] Kafka [flows exported=366299/1818.5 flows/sec][msgs sent=366299/1.0 flows/msg][send errors=0]
> 19/Dec/2017 13:36:09 [nprobe.c:3260] Collector Threads: [167203 pkts@0]
> 19/Dec/2017 13:36:09 [nprobe.c:3052] Processed packets: 0 (max bucket search: 8)
> 19/Dec/2017 13:36:09 [nprobe.c:3035] Fragment queue length: 0
> 19/Dec/2017 13:36:09 [nprobe.c:3061] Flow export stats: [0 bytes/0 pkts][0 flows/0 pkts sent]
> 19/Dec/2017 13:36:09 [nprobe.c:3068] Flow collection:   [collected pkts: 167203][processed flows: 4561802]
> 19/Dec/2017 13:36:09 [nprobe.c:3071] Flow drop stats:   [0 bytes/0 pkts][0 flows]
> 19/Dec/2017 13:36:09 [nprobe.c:3076] Total flow stats:  [0 bytes/0 pkts][0 flows/0 pkts sent]
> 19/Dec/2017 13:36:09 [nprobe.c:3087] Kafka [flows exported=366299][msgs sent=366299/1.0 flows/msg][send errors=0]
> 
> For these two stats:
> 
> Flow collection:   [collected pkts: 167203][processed flows: 4561802]
> Kafka [flows exported=366299][msgs sent=366299/1.0 flows/msg][send errors=0]
> 
> I am thinking they mean that 167203 UDP packets were received from routers 
> comprising a total of 4561802 individual flow records. However, I see only 
> 366299 flows exported to Kafka. So, am I correct in assuming that nprobe is 
> doing some internal aggregation of flow records that is essentially squashing 
> the 4561802 received flow records into 366299 aggregates?
Yes, your assumption is correct. If you want to avoid that, please use 
--disable-cache.
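
For example (a sketch only; the collector port is a placeholder, and the Kafka 
export options are whatever you use today):

nprobe -i none -3 2055 --disable-cache <your existing Kafka options>

With the cache disabled, each collected flow record is exported as-is instead 
of being merged with others first.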

> 
> A follow-on question to this, then, is related to:
> 
> Flow Buckets: [active=92792][allocated=92792][toBeExported=0]
> 
> What are these and how are they utilized? Again, I am assuming these are hash 
> buckets used for internal aggregation per the user's guide. I have seen 
> warnings indicating that the allotment of these buckets is too small and to 
> expect drops. So, my guess is, based on flows/sec ingested, these have to be 
> sized appropriately to support the flow volume. Is that a correct assumption? 

When you see these messages we need to investigate. This happens, for instance, 
when too many flows fall into the same hash bucket. Enlarging the hash (-w) can 
help if it is too small compared to the number of collected flows, but to reply 
in more detail I need some extra context.
> 
> I also notice that, when I start up nprobe in collector mode publishing to 
> Kafka, it takes about 30 or more seconds before any flows actually are 
> published to Kafka. This leads me to believe internal aggregations are 
> occurring that are delaying publishing of data. If I crank up the --verbose 
> to 2, I can see UDP packets being processed and then, after some time, I 
> start to see log messages indicating flows are being exported to Kafka. It is 
> not so much the latency issue I am concerned with here but rather just 
> understanding what is happening so that I can properly monitor and 
> configure/size the system.
Yes, correct. By default flows are aggregated in the cache and, as you write 
below, the minimum timeout is 30 seconds.
> 
> Do these parameters impact the utilization of the flow buckets in collector 
> mode or just when running in sniffer mode? I ask because I know the routers 
> are already doing aggregation, meaning they accumulate counts for flows over 
> time before emitting a flow record that is active. Does it mean that nprobe 
> is then doing the same thing again for these flows and essentially 
> aggregating already aggregated flow records coming from my routers?
> 
> [--lifetime-timeout|-t] <timeout>   It specifies the maximum (seconds) flow lifetime [default=120]
> [--idle-timeout|-d] <timeout>       It specifies the maximum (seconds) flow idle lifetime [default=30]
> [--queue-timeout|-l] <timeout>      It specifies how long expired flows (queued before delivery) are emitted [default=30]
> 
They affect the cache regardless of the mode (collector or probe). Since you use 
the cache (unless --disable-cache is specified), these defaults also apply to you.
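
For traffic dominated by short-lived transactions you could also lower the 
timeouts, e.g. -t 60 -d 10 -l 5 (illustrative values only; tune them against 
your routers' own active/inactive timeouts), or bypass the cache entirely with 
--disable-cache as said above.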
> 
> Also, based on the assumption of aggregating already aggregated data and the 
> type of traffic on the network I am monitoring (lots of short-lived 
> transactions, like credit card swipe processing by vendors and DNS lookups), 
> does it even make sense to have nprobe aggregating this traffic that I know 
> is NOT going to consist of more than one flow record anyway?

The answer depends on the environment you are monitoring.
> 
> The user document does not mention anything about monitoring nprobe 
> programmatically. What is the best way to monitor nprobe for internal packet 
> drops? I can get various OS stats from /proc/xxx, like UDP queue size, drops, 
> etc, but I need nprobe internal stats to round out the picture. I see that 
> there is information like this on stdout:
> 
> Flow drops: [export queue too long=0][too many flows=0][ELK queue flow drops=0]
> 
> However, I want to monitor my nprobe instances with Nagios and generate 
> alerts on threshold checks as well as track utilization over time by posting 
> periodic stats to our InfluxDB/Grafana setup. Is there some way (other than 
> parsing stdout in a log) to gain programmatic access to these stats for 
> monitoring tools to use?

Nobody has asked for this before, so in short no API is available. Instead, 
people use --dump-stats to generate dumps, or the /proc stats. If these are not 
enough, please file a ticket on https://github.com/ntop/nProbe/issues and 
explain what you need. Please open one ticket per request.
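
That said, the counters nprobe prints can be scraped in the meantime. Below is 
a minimal sketch of a Nagios-style check (my assumptions: Python, nprobe's 
output redirected to a log file at a placeholder path, and the "Flow drops" 
line format you quoted above):

#!/usr/bin/env python
# Sketch: extract the most recent "Flow drops" counters from an nprobe log and
# exit with Nagios-style codes (0=OK, 2=CRITICAL, 3=UNKNOWN).
import re
import sys

LOGFILE = "/var/log/nprobe.log"  # placeholder path
PATTERN = re.compile(r"Flow drops: \[export queue too long=(\d+)\]"
                     r"\[too many flows=(\d+)\]\[ELK queue flow drops=(\d+)\]")

last = None
with open(LOGFILE) as f:
    for line in f:
        m = PATTERN.search(line)
        if m:
            last = tuple(int(x) for x in m.groups())

if last is None:
    print("UNKNOWN: no 'Flow drops' line found in %s" % LOGFILE)
    sys.exit(3)
if any(last):
    print("CRITICAL: drops [export queue=%d][too many flows=%d][ELK=%d]" % last)
    sys.exit(2)
print("OK: no flow drops")
sys.exit(0)

The same approach works for feeding InfluxDB: emit the three counters 
periodically instead of exiting with a status code.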

> 
> Regarding Kafka, the producer has many configuration options but only very 
> few are exposed for configuration in nprobe. Let me ask these one by one:
> 
> batch.size, linger.ms, buffer.memory - These are 
> essential to controlling batching in Kafka. nprobe has options 
> --kafka-enable-batch and --kafka-batch-len. However, these end up wrapping N 
> messages into a JSON array of size N and publishing that to Kafka. I feel 
> this is the wrong approach. Consider the downstream Kafka consumer. It expects 
> to receive a series of messages off a topic. The format of those messages 
> should not change due to batching. When batching is not enabled in nprobe, 
> the consumer sees a series of JSON dictionaries - each a single flow record. 
> When batching is enabled, the consumer now sees a series of JSON arrays, each 
> with N JSON dictionaries. IMO, the proper way to do this is to use the Kafka 
> configuration values to control batching. In that case, the producer simply 
> queues up messages (each a dictionary) and, when configured thresholds are 
> met, emits those messages. This results in a batch of dictionaries being sent 
> and the consumer ONLY sees dictionaries. Changing the message structure due 
> to batching complicates things for consumers and is not a typical pattern in 
> Kafka processing.
> Options topic - Your documentation does not even mention this (nprobe --help 
> does), but I don't understand what it means. What is a Kafka options topic?
> Partitioning - If we want to perform stream processing of netflow data, then we 
> want to ensure that all flow records from a given n-tuple are placed on the 
> same Kafka partition. We need to partition the data because it is the only 
> way to scale consumers in Kafka. If I want to perform some aggregations on 
> the data stream then I have to be sure that all netflow records for a given 
> conversation, for example, are on the same topic partition. A simple example 
> that will make that happen would be to use the IPV4_SRC_ADDR field of the 
> flow record as the partition key. Or, maybe an N-tuple of (IPV4_SRC_ADDR, 
> IPV4_DST_ADDR, L4_SRC_PORT, L4_DST_PORT) as the partition key. In Java, a 
> producer would do this by hashing the string that comprises the partition key 
> desired then doing a hash % num-partitions to figure out the partition to 
> send the message on. I am guessing that nprobe relies on the default 
> partitioning scheme in the producer which is a simple round-robin approach 
> based on the number of partitions that exist for the topic being used. This, 
> however, would randomly distribute flow records for a given conversation 
> across multiple partitions and, therefore across multiple consumers in a 
> downstream consumer group. That would break the aggregations. So, my request 
> is that you consider allowing a configuration option that enables the user to 
> define the partition key. This might be done, for example, by allowing the 
> user to define a CSV list of template fields to use to form the partition key 
> string. You could just concatenate them together and hash that value then 
> modulo divide by the number of partitions for the topic being used and use 
> that to enable the producer to publish on the appropriate topic partition. 
> This gives the user the freedom to define the partition key while making the 
> implementation in nprobe fairly generic. Maybe this could also be done via some 
> sort of "partition plugin" to make it even more extensible? Have you 
> considered any such capability? Without such a capability, we will have to 
> initially publish all flows on, say, a "netflow-raw" topic (using round-robin), 
> then consume this topic in a consumer group only to republish it by 
> repartitioning it (as described above, using some N-tuple of fields), only to 
> then be consumed by another consumer group that will do the aggregations 
> and enrichments needed. Sure, we can make it work, but partitioning should 
> "really" be done at the source. The approach I just described necessarily 
> doubles our broker traffic, which I would not like to have to do.
> Producer Options in General - Why not just make them all configurable? For 
> example, allow the user to define a name=value config file using any 
> supported producer configuration options and provide the path to the file as 
> an nprobe Kafka configuration option. Then, when you instantiate the producer 
> in nprobe, read in those configuration values and pass them into the 
> producer. This gives the users access to all options available and not just 
> the current topic, acks, and compression values.
> 
> Miscellaneous Notes:
> The v8.1 user's guide lists "New Options --kafka-enable-batch 
> and --kafka-batch-len to batch flow export to kafka" but does not provide any 
> detailed documentation on these. It looks like someone forgot to add the 
> description of these later in the document.
> nprobe --help shows this under the Kafka options: "<options topic> Flow 
> options topic" but the v8.1 user's guide makes no mention of it. I have no 
> idea what an options topic is.
As for the above notes on Kafka, I will let my colleague Simone answer you, as 
he is the Kafka expert on our team.

Simone, can you please answer Mark and, if there are changes to be made (I 
think so, from what I understand), file individual tickets?
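
In the meantime, here is a rough sketch of the two-topic repartitioning 
workaround you describe (assumptions: kafka-python, your "netflow-raw" topic, 
a made-up "netflow-keyed" output topic and broker address, and the template 
field names from your example). It also unwraps the JSON arrays produced when 
--kafka-enable-batch is used:

#!/usr/bin/env python
# Sketch: consume round-robin flow records and re-publish them keyed by the
# flow tuple, so all records of a conversation land on the same partition
# (kafka-python hashes the message key to pick the partition by default).
import json
from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer("netflow-raw",
                         bootstrap_servers="broker:9092",  # placeholder
                         value_deserializer=json.loads)
producer = KafkaProducer(bootstrap_servers="broker:9092",  # placeholder
                         value_serializer=lambda d: json.dumps(d).encode())

KEY_FIELDS = ("IPV4_SRC_ADDR", "IPV4_DST_ADDR", "L4_SRC_PORT", "L4_DST_PORT")

for rec in consumer:
    # With --kafka-enable-batch each message is a JSON array of flow records;
    # without it, each message is a single JSON dictionary.
    flows = rec.value if isinstance(rec.value, list) else [rec.value]
    for flow in flows:
        key = "|".join(str(flow.get(f, "")) for f in KEY_FIELDS)
        producer.send("netflow-keyed", key=key.encode(), value=flow)

As you note, having nprobe key the messages at the source would avoid this 
extra hop and the doubled broker traffic.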

Thanks Luca


_______________________________________________
Ntop-misc mailing list
Ntop-misc@listgateway.unipi.it
http://listgateway.unipi.it/mailman/listinfo/ntop-misc