README.md            |  227 +++++++++++++++++++++++++++++++++++++++++++++++++++
 msodumper/globals.py |    2 
 2 files changed, 228 insertions(+), 1 deletion(-)

New commits:
commit 1aa6b6e71227bd5864a965a49db9ed74cdcf61ae
Author:     Samuel Mehrbrodt <[email protected]>
AuthorDate: Mon Jan 26 16:38:40 2026 +0100
Commit:     Samuel Mehrbrodt <[email protected]>
CommitDate: Mon Jan 26 16:40:03 2026 +0100

    Add README
    
    Thanks Claude for generating this
    
    Change-Id: If929efb337140c60574f84a02eb121199d7d18f8

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..eadc358
--- /dev/null
+++ b/README.md
@@ -0,0 +1,227 @@
+# MSO-Dumper
+
+A comprehensive set of tools for analyzing and dumping Microsoft Office file formats.
+
+## Description
+
+MSO-Dumper is a package for analyzing and dumping various Microsoft Office file formats, including binary formats like DOC, XLS, PPT, and graphics formats like EMF, WMF. It provides detailed structural analysis and can extract content from these files.
+
+## Author Information
+
+- **Authors**: See https://github.com/LibreOffice/mso-dumper/graphs/contributors
+- **Email**: [email protected]
+- **License**: Mozilla Public License 2.0
+
+## Installation
+
+```bash
+python setup.py install
+```
+
+## Tools and Usage
+
+### Document Format Dumpers
+
+#### ppt-dump.py - PowerPoint File Dumper
+Analyzes and dumps PowerPoint (.ppt) binary format files.
+
+```bash
+./ppt-dump.py [options] [ppt file]
+```
+
+**Options:**
+- `--help` - displays help message
+- `--no-struct-output` - suppress normal structure analysis output
+- `--dump-text` - extract and print textual content
+- `--no-raw-dumps` - suppress raw hex dumps of uninterpreted areas
+- `--id-select=id1[,id2 ...]` - limit output to selected record IDs
+
+**Example:**
+```bash
+./ppt-dump.py presentation.ppt
+./ppt-dump.py --dump-text --no-raw-dumps slides.ppt
+```
+
+#### doc-dump.py - Word Document Dumper
+Analyzes and dumps Word (.doc) binary format files.
+
+```bash
+./doc-dump.py [doc file]
+```
+
+**Example:**
+```bash
+./doc-dump.py document.doc
+```
+
+#### xls-dump.py - Excel Spreadsheet Dumper
+Analyzes and dumps Excel (.xls) binary format files with extensive options.
+
+```bash
+./xls-dump.py [options] [xls file]
+```
+
+**Options:**
+- `-d, --debug` - turn on debug mode
+- `--show-sector-chain` - show sector chain information at start of output
+- `--show-stream-pos` - show position of each record relative to the stream
+- `--dump-mode MODE` - specify dump mode: 'flat' (default), 'xml', or 'canonical-xml'
+- `--catch` - catch exceptions and try to continue
+- `--utf-8` - output strings as UTF-8
+
+**Examples:**
+```bash
+./xls-dump.py spreadsheet.xls
+./xls-dump.py --dump-mode xml --debug workbook.xls
+./xls-dump.py --show-stream-pos --utf-8 data.xls
+```
+
+#### vsd-dump.py - Visio Document Dumper
+Analyzes and dumps Visio (.vsd) format files.
+
+```bash
+./vsd-dump.py [vsd file]
+```
+
+**Example:**
+```bash
+./vsd-dump.py diagram.vsd
+```
+
+### Graphics Format Dumpers
+
+#### emf-dump.py - Enhanced Metafile Dumper
+Analyzes and dumps Enhanced Metafile (.emf) format files.
+
+```bash
+./emf-dump.py [emf file]
+```
+
+**Example:**
+```bash
+./emf-dump.py image.emf
+```
+
+#### wmf-dump.py - Windows Metafile Dumper
+Analyzes and dumps Windows Metafile (.wmf) format files.
+
+```bash
+./wmf-dump.py [wmf file]
+```
+
+**Example:**
+```bash
+./wmf-dump.py graphic.wmf
+```
+
+### OLE Format Dumpers
+
+#### ole1-dump.py - OLE1 Embedded Object Dumper
+Dumps OLE1 embedded objects according to [MS-OLEDS] 2.2.5 specification.
+
+```bash
+./ole1-dump.py [ole1 file]
+```
+
+**Example:**
+```bash
+./ole1-dump.py embedded_object.ole1
+```
+
+#### ole2preview-dump.py - OLE2 Preview Stream Dumper
+Dumps OLE2 preview streams according to [MS-OLEDS] 2.3.4 specification.
+
+```bash
+./ole2preview-dump.py [ole2 file]
+```
+
+**Example:**
+```bash
+./ole2preview-dump.py preview_stream.ole2
+```
+
+### VBA and Macro Analysis
+
+#### vbadump.py - VBA Project Dumper
+Extracts and analyzes VBA (Visual Basic for Applications) code from Office documents.
+
+```bash
+./vbadump.py [office file with VBA]
+```
+
+**Example:**
+```bash
+./vbadump.py macro_document.xls
+```
+
+### Special Format Tools
+
+#### swlaycache-dump.py - StarWriter Layout Cache Dumper
+Dumps Star Writer binary layout cache format.
+
+```bash
+./swlaycache-dump.py [cache file]
+```
+
+**Example:**
+```bash
+./swlaycache-dump.py layout.cache
+```
+
+### Utility Scripts
+
+#### compress.py - VBA Stream Compressor
+Compresses VBA streams using Microsoft's compression algorithm.
+
+```bash
+./compress.py [offset]
+```
+Takes input from stdin and outputs compressed stream to stdout. Optional offset parameter.
+
+#### decompress.py - VBA Stream Decompressor
+Decompresses VBA streams.
+
+```bash
+./decompress.py [offset]
+```
+Takes compressed input from stdin and outputs decompressed stream to stdout. Optional offset parameter.
+
+#### pptx-kill-uuid.py - PowerPoint UUID Replacement Tool
+Replaces UUIDs in PowerPoint XML streams with sequential integers for easier analysis.
+
+```bash
+cat ppt/diagrams/data1.xml | ./pptx-kill-uuid.py
+```
+
+#### convert-enum.py
+Utility script for converting enumerations (see source for specific usage).
+
+## Output Formats
+
+Most dump tools output XML-formatted analysis data that includes:
+
+- File structure information
+- Record-by-record analysis
+- Raw hex dumps of binary data
+- Extracted text content (where applicable)
+- Stream hierarchies for compound document formats
+
+## Development
+
+The core parsing logic is contained in the `msodumper/` package with specialized modules for each format:
+
+- `docstream.py`, `docrecord.py` - Word document parsing
+- `xlsstream.py`, `xlsrecord.py`, `xlsmodel.py` - Excel parsing
+- `pptstream.py`, `pptrecord.py` - PowerPoint parsing
+- `emfrecord.py`, `wmfrecord.py` - Graphics format parsing
+- `ole.py`, `olestream.py` - OLE compound document parsing
+- `vbahelper.py` - VBA macro analysis
+- etc.
+
+Submit Patches to LibreOffice Gerrit:
+* https://gerrit.libreoffice.org
+* https://wiki.documentfoundation.org/Development/gerrit
+
+## License
+
+This project is licensed under the Mozilla Public License 2.0 - see the license header in each source file for details.
commit e71c36202d76c8123301bb59b1c8cf4d0746f97d
Author:     Samuel Mehrbrodt <[email protected]>
AuthorDate: Mon Jan 26 16:36:11 2026 +0100
Commit:     Samuel Mehrbrodt <[email protected]>
CommitDate: Mon Jan 26 16:40:02 2026 +0100

    Add missing os import (for Windows)
    
    See https://github.com/LibreOffice/mso-dumper/issues/7
    
    Change-Id: I705762dd580b15f678ec1b0a49744a816b214d94

diff --git a/msodumper/globals.py b/msodumper/globals.py
index 993cc56..7a0ac1a 100644
--- a/msodumper/globals.py
+++ b/msodumper/globals.py
@@ -5,7 +5,7 @@
 # file, You can obtain one at http://mozilla.org/MPL/2.0/.
 #
 from builtins import range
-import sys, struct, math, zipfile, io
+import sys, struct, math, zipfile, io, os
 from . import xmlpp
 
 PY3 = sys.version > '3'
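
A note on the `globals.py` change above: a name that is used only in a platform-specific branch fails with `NameError` only when that branch runs, so a missing import can go unnoticed on other platforms. The sketch below shows one way such a gap could be caught statically with the standard `ast` module. It is an illustration under stated assumptions, not code from this repository: `unbound_names` and `set_binary_mode` are hypothetical names, and the function body is invented. Only the two import lines are taken from the diff.

```python
import ast
import builtins


def unbound_names(source):
    """Return names that are read somewhere in `source` but never bound in it.

    Crude and flow-insensitive: scoping and statement order are ignored.
    """
    tree = ast.parse(source)
    bound = set(dir(builtins))  # range, print, ... are always available
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "import a as b" binds "b"
                bound.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, ast.ExceptHandler) and node.name:
            bound.add(node.name)
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                used.add(node.id)
            else:
                bound.add(node.id)
    return sorted(used - bound)


# Hypothetical function body, invented for illustration only.
BODY = (
    "def set_binary_mode(fd):\n"
    "    if sys.platform == 'win32':\n"
    "        return os.O_BINARY\n"
    "    return 0\n"
)
# Import lines as they appear before and after the patch.
BEFORE = "import sys, struct, math, zipfile, io\n" + BODY
AFTER = "import sys, struct, math, zipfile, io, os\n" + BODY

if __name__ == "__main__":
    print(unbound_names(BEFORE))
    print(unbound_names(AFTER))
```

With the old import line the check reports `os`; with the new one it reports nothing. Linters such as pyflakes perform a more thorough version of this check.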
