I cannot agree more with Terry.
However, in my view, the situation is not that bleak! Some other openly available network datasets are:

1. CAIDA backscatter dataset (contains reflected suspicious traffic)
2. LBNL/ICSI enterprise router dataset (contains segregated scan and benign traffic)
3. DEFCON 8-10 CTF datasets (contain only attack traffic captured during the DEFCON competition)
4. UMASS gateway link dataset (manually labeled by Yu Gu at the University of Massachusetts)
5. Endpoint worm dataset (both benign and worm traffic, logged by argus; probably the only data available at endpoints)

The links to download these datasets are available at http://www.nexginrc.org/~zubair/research.htm.

Labeling network datasets (or establishing "ground truth") can be a tricky task. There are two standard ways to create labeled IDS datasets: (1) collecting benign and malicious traffic separately and then injecting one into the other to create infected traffic profiles, and (2) collecting data and then labeling it via manual inspection or a combination of heuristics.

The first method has been used in a number of papers published at SIGCOMM and S&P (for example, the "Mining Anomalies" paper by Lakhina). However, some reviews that I received from S&P 2008 indicate that this method is no longer trusted and that it may introduce unwanted artifacts. The second method can be laborious and is prone to errors. However, I have been working on some semi-automated procedures to label anomalies in network traffic. Let me know if you have any ideas in this regard.
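
For anyone unfamiliar with the first labeling method (separately collected benign and malicious traffic, merged by injection), here is a minimal sketch of the idea. It is only an illustration under my own assumptions: the `Flow` record, the `inject` function, and the `offset` parameter are hypothetical names, not taken from any of the datasets or papers above, and real traces would need far more care (address remapping, rate matching, and so on) to avoid the artifacts the reviewers worried about.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow record; real traces carry many more fields."""
    ts: float   # seconds since the start of the flow's own trace
    src: str
    dst: str


def inject(benign: List[Flow], attack: List[Flow],
           offset: float) -> List[Tuple[Flow, int]]:
    """Merge attack flows into a benign trace.

    Attack timestamps are shifted by `offset` seconds so the attack
    starts partway through the benign trace. Each flow is paired with
    a ground-truth label: 0 for benign, 1 for malicious.
    """
    labeled = [(f, 0) for f in benign]
    labeled += [(Flow(f.ts + offset, f.src, f.dst), 1) for f in attack]
    # Sort by timestamp only, so the merged trace is in time order.
    labeled.sort(key=lambda pair: pair[0].ts)
    return labeled


# Toy traces using documentation/private address ranges.
benign = [Flow(0.0, "10.0.0.1", "10.0.0.2"), Flow(5.0, "10.0.0.3", "10.0.0.2")]
attack = [Flow(0.0, "192.0.2.7", "10.0.0.2"), Flow(1.0, "192.0.2.7", "10.0.0.4")]

merged = inject(benign, attack, offset=2.0)
labels = [label for _, label in merged]
print(labels)
```

The labels come for free because the two sources were never mixed before injection, which is exactly why the method is convenient, and also why the merged trace may not look like traffic captured during a real infection.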
