Document / Content Harvesting (Word, Excel.. others)
----------------------------------------------------
Key: MSITE-514
URL: http://jira.codehaus.org/browse/MSITE-514
Project: Maven 2.x Site Plugin
Issue Type: New Feature
Affects Versions: 3.0-beta-2
Reporter: Andrew Hughes
Hi Guys,
I have an idea, but I wouldn't know where to start on it... besides, I think
this is more than a one-person job. Just as we have reporting plugins and the
project-info reports, I think a "project documents" site plugin would be an
excellent idea.
Purpose:
The primary purpose is to make it easy to integrate documents in formats other
than APT, xdoc... (e.g. Word, Excel, PDF) into Maven sites.
Objectives:
The primary objective should be to create a menu on the site that lists all of
the discovered documents in the project source.
Example (extending the normal "Project Documentation" menu):
* Project Documentation
** Project Information
*** Continuous Integration
*** Issue Tracking
*** Project Team
*** Source Repository
** Project Reports
*** Maven Surefire Report
*** Other Report
** Documents <- NEW (name TBD). Clicking on this should open a page with a table of all documents and their harvested metadata.
*** Acme Project SRS (doc) <- New, showing a harvested Word document; the link title is the document title.
*** Contract (pdf) <- New, showing a harvested PDF document.
*** Estimates (xls) <- New, an Excel spreadsheet.
*** Risk Register (xls) <- New, another Excel spreadsheet.
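The discovery step behind such a menu could look roughly like the following. This is a minimal Python sketch, not plugin code: discover_documents and menu_label are hypothetical names, pathlib's glob is used to approximate Ant-style include patterns such as **/*.doc, and the label falls back to the file name when no title has been harvested.

```python
from pathlib import Path


def discover_documents(directory, includes):
    """Return the files under `directory` that match any include pattern, sorted by path."""
    root = Path(directory)
    found = set()
    for pattern in includes:
        # pathlib treats "**" as "this directory and all subdirectories",
        # which approximates Ant-style patterns like "**/*.doc".
        found.update(p for p in root.glob(pattern) if p.is_file())
    return sorted(found)


def menu_label(path, title=None):
    """Build a menu label such as 'Acme Project SRS (doc)'.

    Uses the harvested title if there is one, otherwise the file name
    without its extension; the format is the lower-cased extension.
    """
    fmt = path.suffix.lstrip(".").lower()
    return f"{title or path.stem} ({fmt})"
```

For example, menu_label(Path("APD-ACME-SRS.doc"), "Acme Project SRS") gives the "Acme Project SRS (doc)" entry from the menu above. Sorting the results keeps the menu order stable between builds.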
Hopefully the index page could gather enough metadata about the documents to
create something like:
||Title||Filename||Format||Author||Last Modified||Last Modified By||
|Acme Project SRS|APD-ACME-SRS.doc|doc|John Smith|14-10-2010|A Hughes|
|Contract|APC-ACME-CONTACT-23489345.pdf|pdf|N/A|22-02-2010|N/A|
|Estimates|APE-ACME-Estimates.xls|xls|A Schwarzenegger|22-02-2010|JP Freely|
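Once the metadata has been harvested, building that index table is mostly string assembly. A minimal Python sketch, assuming each document's metadata is a plain dictionary keyed by names like those in the proposed metaData setting; render_index_table and the heading map are hypothetical, and a real site plugin would more likely emit the table through a Doxia sink or a Velocity template than as raw HTML.

```python
from html import escape

# Hypothetical mapping from metadata keys to column headings.
COLUMN_HEADINGS = {
    "title": "Title",
    "filename": "Filename",
    "format": "Format",
    "author": "Author",
    "lastModifiedDate": "Last Modified",
    "lastModifiedBy": "Last Modified By",
}


def render_index_table(documents, columns):
    """Render one HTML table row per document; missing values are shown as N/A."""
    header = "".join(f"<th>{escape(COLUMN_HEADINGS.get(c, c))}</th>" for c in columns)
    rows = ["<table>", f"  <tr>{header}</tr>"]
    for doc in documents:
        # Escape every value, since titles and names come straight from the documents.
        cells = "".join(f"<td>{escape(str(doc.get(c) or 'N/A'))}</td>" for c in columns)
        rows.append(f"  <tr>{cells}</tr>")
    rows.append("</table>")
    return "\n".join(rows)
```

Passing the column list separately means the same harvested data can be shown with whatever subset of columns the configuration selects.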
Implementation:
I have very little idea how this kind of thing could be integrated into the
site, from menu creation to Velocity templates etc. Sorry, I'm not much help
there. I do know that we have libraries like Apache POI (http://poi.apache.org/)
to help gather metadata from Microsoft Office documents, and similar support is
available for PDF. Hopefully LaTeX and other formats have similar APIs.
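For the newer Office Open XML formats (.docx, .xlsx) some of this metadata can be read without any extra library: those files are ZIP packages whose core properties (title, creator, last modified by, modified date, version) live in an XML part, conventionally docProps/core.xml. The Python sketch below uses that approach as a stand-in; it does not handle the legacy binary .doc/.xls formats from the example, which would still need something like POI. It also assumes the conventional part name instead of resolving it through the package relationships in _rels/.rels, and read_ooxml_metadata is a hypothetical name.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespaces used by the OOXML core-properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Metadata key -> element in docProps/core.xml.
CORE_FIELDS = {
    "title": "dc:title",
    "author": "dc:creator",
    "lastModifiedBy": "cp:lastModifiedBy",
    "lastModifiedDate": "dcterms:modified",
    "version": "cp:version",
}


def read_ooxml_metadata(path):
    """Return the core properties of a .docx/.xlsx file as a dict.

    Assumes the conventional part name docProps/core.xml rather than
    resolving it via _rels/.rels.
    """
    with zipfile.ZipFile(path) as package:
        try:
            core_xml = package.read("docProps/core.xml")
        except KeyError:
            # The core-properties part is optional, so there may be no metadata at all.
            return {}
    root = ET.fromstring(core_xml)
    metadata = {}
    for key, tag in CORE_FIELDS.items():
        element = root.find(tag, NS)
        if element is not None and element.text:
            metadata[key] = element.text.strip()
    return metadata
```

Properties that are absent are simply left out of the result, so the index page can show N/A for them.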
Configuration:
I think sketching out the POM configuration might help define how this could
work and what options and functionality it could offer:
{noformat}
<plugin>
  <!-- ...omitting the usual plugin elements... -->
  <configuration>
    <resources>
      <resource>
        <!-- override the default of ./src/site/resources -->
        <directory>${basedir}/documents</directory>
        <!-- override the default set of files to include -->
        <includes>
          <include>**/*.doc</include>
          <include>**/*.xls</include>
        </includes>
      </resource>
    </resources>
    <!-- override the default label shown on the menu -->
    <menuTitle>Documentz</menuTitle>
    <!-- select the metadata harvested from documents to show on the index page -->
    <metaData>title,version,author,lastModifiedBy,lastModifiedDate</metaData>
  </configuration>
</plugin>
{noformat}
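To make the proposed options concrete, here is a minimal Python sketch that reads the configuration block above and falls back to defaults for anything left out. Only the ./src/site/resources default comes from the proposal; the other defaults and the DocumentsConfig/parse_configuration names are hypothetical. In a real Maven plugin, Maven would inject these values into the mojo's parameter fields and expressions such as ${basedir} would already be interpolated; this sketch leaves them as-is.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class DocumentsConfig:
    directory: str = "./src/site/resources"  # default named in the proposal
    includes: list[str] = field(default_factory=lambda: ["**/*"])  # hypothetical default
    menu_title: str = "Documents"  # hypothetical; the real name is still TBD
    metadata: list[str] = field(
        default_factory=lambda: ["title", "author", "lastModifiedDate"]  # hypothetical
    )


def parse_configuration(xml_text):
    """Read the proposed <configuration> block, keeping defaults for omitted options.

    Only the first <resource> is considered in this sketch.
    """
    root = ET.fromstring(xml_text)
    config = DocumentsConfig()
    directory = root.findtext("resources/resource/directory")
    if directory:
        config.directory = directory.strip()
    includes = [
        e.text.strip()
        for e in root.findall("resources/resource/includes/include")
        if e.text
    ]
    if includes:
        config.includes = includes
    menu_title = root.findtext("menuTitle")
    if menu_title:
        config.menu_title = menu_title.strip()
    metadata = root.findtext("metaData")
    if metadata:
        # metaData is a comma-separated list of metadata keys.
        config.metadata = [k.strip() for k in metadata.split(",") if k.strip()]
    return config
```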
What do you think? Is this a practical idea? Is it achievable, and how much
work would be involved?
CHEERS :)