[email protected] wrote:

> 1. CAIDA backscatter dataset (contains reflected suspicious traffic) 

This is just reflected backscatter from DoS attacks; how you would use
it to evaluate an IDS is beyond me.

> 2. LBNL/ICSI enterprise router dataset (contains segregated scan and
> benign traffic)

Just packet traces, anonymized, with no payload content.

> 3. DEFCON 8-10 CTF datasets (contain only attack
> traffic during DEFCON competition)

Only attacks, on an unrealistic network.

> 4. UMASS gateway link dataset (is
> manually labeled by Yu Gu at University of Massachusetts) 

See 2

> 5. Endpoint
> worm dataset (both benign and worm traffic, logged by argus --
> probably the only data available at endpoints)

I can't understand how this was generated or collected. Pointers?

> tricky task. There are two standard ways to create labeled IDS
> datasets: (1) separately collecting benign and malicious traffic and
> then injecting to create infected traffic profiles, 

Which generates exactly the kind of artifacts present in IDEVAL.
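
For concreteness, a minimal sketch (Python with Scapy; the trace names
are hypothetical) of the kind of check that exposed the TTL artifacts
Mahoney and Chan documented in the IDEVAL data: when a field as dumb as
the IP TTL separates the injected attacks from the background traffic,
the labels leak straight through the merge.

    from collections import Counter
    from scapy.all import rdpcap, IP

    def ttl_histogram(pcap_path):
        # Count the IP TTL values seen in a trace.
        counts = Counter()
        for pkt in rdpcap(pcap_path):
            if IP in pkt:
                counts[pkt[IP].ttl] += 1
        return counts

    benign = ttl_histogram("background.pcap")  # hypothetical file names
    attack = ttl_histogram("injected.pcap")

    # TTL values that occur only in the injected traffic are merge
    # artifacts: a "detector" keying on them flags every attack without
    # modeling any behavior at all.
    print(sorted(set(attack) - set(benign)))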

>(2) collecting
> data and then labeling it via manual inspection or a combination of
> heuristics.

Which is a tedious task that no one wants to do manually. What do you
mean by heuristics?

> have been working on some semi-automated procedures to label
> anomalies in network traffic.

Which is the same as developing an anomaly detector. Thus, you are
effectively using one anomaly detector to evaluate another, opening up
a can of worms you really don't want to open.

SZ


