Package: wnpp
X-Debbugs-Cc: debian-devel@lists.debian.org, 
debian-security-to...@lists.debian.org

Owner: Jan Gru <j4n...@gmail.com>
Severity: wishlist

* Package name    : bulk-extractor
  Version         : 1.6.0
  Upstream Author : Simson L. Garfinkel <slgar...@nps.edu>
* URL             : https://github.com/simsong/bulk_extractor
* License         : MIT and CC0
  Programming Lang: C++, Python (and Java for the BEViewer, which will probably 
not be packaged)
  Description     : A stream-based forensics tool for triage and cross-evidence 
analysis, which scans the media and extracts recognizable content


bulk_extractor is a program for bulk data extraction and analysis. It carves 
relevant features such as email addresses, credit card numbers, URLs and other 
types of information from digital evidence files. It works in a stream-based 
manner, processing blocks in parallel to avoid disk seeking.
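
The block-wise, parallel scanning described above can be sketched roughly as 
follows. This is only an illustration in Python, not bulk_extractor's actual 
C++ implementation: the names scan_block and scan_image, the block size, the 
margin and the simplified email pattern are all assumptions made for this 
example, and the image is held in memory instead of being streamed from disk.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Simplified email pattern (an assumption for this sketch, not the real scanner).
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def scan_block(data, start, block_size, margin):
    """Scan one block plus a margin on each side.

    Only matches that begin inside the block itself are reported, so a
    feature that straddles a block boundary is reported exactly once.
    """
    lo = max(0, start - margin)
    hi = min(len(data), start + block_size + margin)
    window = data[lo:hi]
    hits = []
    for match in EMAIL_RE.finditer(window):
        pos = lo + match.start()  # absolute offset in the image
        if start <= pos < start + block_size:
            hits.append((pos, match.group().decode("ascii")))
    return hits


def scan_image(data, block_size=4096, margin=256, workers=4):
    """Scan all blocks with a pool of workers; results are ordered by offset."""
    starts = range(0, len(data), block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_block = pool.map(
            lambda s: scan_block(data, s, block_size, margin), starts
        )
        return [hit for hits in per_block for hit in hits]
```

Every block is visited once and in order, so no seeking back and forth is 
needed. The margin only works for features shorter than the margin itself.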

** Why is this package relevant?
It is a useful tool for forensic investigations, because it is way more than 
just another file carver. The program provides several unusual capabilities 
including:

- It finds email addresses, URLs and credit card numbers that other tools miss 
because it can process compressed data (like ZIP, PDF and GZIP files) and 
incomplete or partially corrupted data.
- It can carve JPEGs, office documents and other kinds of files out of 
fragments of compressed data. It will detect and carve encrypted RAR files.
- It builds word lists based on all of the words found within the data, even 
those in compressed files that are in unallocated space. Those word lists can 
be useful for password cracking.
- It is multi-threaded; running bulk_extractor on a computer with twice the 
number of cores typically makes it complete a run in half the time.
- It creates histograms showing the most common email addresses, URLs, domains, 
search terms and other kinds of information on the drive.
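
To illustrate the first and the last point, here is a minimal sketch in Python 
of how features inside embedded compressed data can be found and then counted. 
It is not taken from bulk_extractor: the function names, the simplified email 
pattern and the restriction to gzip streams are assumptions made for this 
example.

```python
import re
import zlib
from collections import Counter

# Simplified email pattern (an assumption for this sketch, not the real scanner).
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
GZIP_MAGIC = b"\x1f\x8b\x08"  # gzip header with the deflate method


def inflate_embedded_gzip(data):
    """Yield the decompressed content of every gzip stream found in data."""
    pos = data.find(GZIP_MAGIC)
    while pos != -1:
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        inflater = zlib.decompressobj(16 + zlib.MAX_WBITS)
        try:
            # A truncated stream does not raise here; whatever zlib could
            # already decode is returned.
            recovered = inflater.decompress(data[pos:])
        except zlib.error:
            recovered = b""  # false positive or corrupted stream
        if recovered:
            yield recovered
        pos = data.find(GZIP_MAGIC, pos + 1)


def email_histogram(data):
    """Count email addresses in the raw data and in embedded gzip streams."""
    counts = Counter()
    for chunk in (data, *inflate_embedded_gzip(data)):
        for match in EMAIL_RE.finditer(chunk):
            counts[match.group().decode("ascii").lower()] += 1
    return counts.most_common()
```

A scanner that only looks at the raw bytes would miss every address stored in 
the compressed part; the histogram then shows which addresses dominate.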

The program is authored by the renowned forensics researcher Simson L. 
Garfinkel, who is probably most recognized for his work on DFXML at the Naval 
Postgraduate School (NPS) and the National Institute of Standards and 
Technology (NIST). The project provides rich documentation, both for end users 
and for potential contributors [0].

To sum it up, bulk_extractor has great potential for improving triage and 
automation workflows within digital forensics and should therefore be 
included in Debian's package sources.

** Resolved issues
bulk_extractor is already packaged in Kali [1], but had licensing issues until 
recently.
To be more precise, it linked code with OpenSSL without explicitly permitting 
this, and it used the modified MIT license from the JSON project, which is 
considered non-free and not DFSG-compliant. I resolved these issues in 
cooperation with upstream by sending two patches [2], which have already been 
accepted.

** Maintenance plan
I plan to maintain it within the pkg-security-team's repository on salsa, where 
a lot of forensics packages live [3].
I am looking for a sponsor for this package, ideally a member of the 
above-mentioned team.

Best regards
   Jan

[0] See http://digitalcorpora.org/downloads/bulk_extractor/BEUsersManual.pdf, 
https://digitalcorpora.s3.amazonaws.com/downloads/bulk_extractor/BEProgrammersManual.pdf
 and 
https://digitalcorpora.s3.amazonaws.com/downloads/bulk_extractor/BEWorkedExamplesStandalone.pdf
[1] See https://tools.kali.org/forensics/bulk-extractor
[2] See https://github.com/simsong/bulk_extractor/issues/168, 
https://github.com/simsong/bulk_extractor/pull/169 and 
https://github.com/simsong/bulk_extractor/pull/170
[3] See https://salsa.debian.org/pkg-security-team/

  
