[ https://issues.apache.org/jira/browse/TIKA-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185001#comment-13185001 ]

Nick Burch commented on TIKA-695:
---------------------------------

Thanks for the sample files. Based on them, I've added support for all the 
common custom property types (to both Tika and POI), and added unit tests for 
custom properties on both OLE2 and OOXML files.
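For illustration only, here is a JDK-only sketch of the same type dispatch, reading the OOXML custom properties XML directly with the DOM parser instead of going through POI's XMLBeans classes (which is what Tika and POI actually use). Each `<property>` element holds one typed `vt:` child, and scalar values become strings under a `custom:` key, as in the reporter's snippet quoted in the issue. The class name `CustomPropertySketch` and the exact set of handled types are assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch, not Tika's or POI's implementation.
public class CustomPropertySketch {
    static final String CUSTOM_NS =
            "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties";
    static final String VT_NS =
            "http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes";

    /** Maps "custom:" + property name to a string form of each scalar value. */
    public static Map<String, String> extract(String customXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(customXml.getBytes(StandardCharsets.UTF_8)));

        Map<String, String> metadata = new LinkedHashMap<>();
        NodeList props = doc.getElementsByTagNameNS(CUSTOM_NS, "property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            Element typed = firstChildElement(prop);
            if (typed == null || !VT_NS.equals(typed.getNamespaceURI())) {
                continue; // no typed value to read
            }
            String text = typed.getTextContent();
            String value;
            switch (typed.getLocalName()) {
                case "lpwstr":
                case "lpstr":
                    value = text; // strings are kept verbatim
                    break;
                case "i4":
                case "int":
                case "decimal":
                case "filetime":
                case "date":
                    value = text.trim(); // numbers and ISO 8601 dates kept as written
                    break;
                case "bool":
                    String b = text.trim(); // xsd:boolean allows true/false and 1/0
                    value = Boolean.toString("true".equals(b) || "1".equals(b));
                    break;
                default:
                    continue; // vectors, arrays, blobs, ... are skipped
            }
            metadata.put("custom:" + prop.getAttribute("name"), value);
        }
        return metadata;
    }

    /** Returns the first element child, skipping whitespace text nodes. */
    private static Element firstChildElement(Node parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) n;
            }
        }
        return null;
    }
}
```

Unsupported types are silently skipped rather than raising an error, which matches the behaviour of the reporter's code for anything outside its list.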

As of r1230576, custom properties from OOXML files are correctly extracted.
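An OOXML file (.docx, .xlsx, .pptx) is a ZIP package, and Office normally stores the custom properties in the part `docProps/custom.xml`. A minimal sketch for pulling that part out with `java.util.zip`, assuming the conventional part name (a robust reader would resolve it through the package relationships in `_rels/.rels`) and UTF-8 encoding; `CustomPartReader` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of Tika or POI.
public class CustomPartReader {
    // Assumption: the conventional part name; real packages should be
    // resolved via the relationships in _rels/.rels.
    static final String CUSTOM_PART = "docProps/custom.xml";

    /**
     * Returns the XML of the custom properties part, or null if the package
     * has none. Closes the given stream when done.
     */
    public static String readCustomPart(InputStream ooxml) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(ooxml)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (CUSTOM_PART.equals(entry.getName())) {
                    // read() returns -1 at the end of the current entry.
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = zip.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                    // Assumption: the part is UTF-8 encoded.
                    return new String(out.toByteArray(), StandardCharsets.UTF_8);
                }
            }
        }
        return null;
    }
}
```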

The only parts not yet supported are Vectors/Arrays (where a property can have 
multiple values), and the byte-based blobs/streams. I don't think we're likely 
to be able to do much with the byte-based ones, but the vectors/arrays could 
be worth adding later. If you're able to create files with these custom 
properties, please open a new enhancement issue for it!
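In an OOXML custom properties part, a multi-valued property is stored as a `vt:vector` element whose child elements each hold one value. Purely as a hypothetical sketch of what later support might look like (Tika does not do this as of this comment), such a property could be flattened into a list of strings. `VectorPropertySketch` and `extractVectors` are illustrative names only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch of vector support; not existing Tika behaviour.
public class VectorPropertySketch {
    static final String CUSTOM_NS =
            "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties";
    static final String VT_NS =
            "http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes";

    /** Maps "custom:" + name to all values of each vt:vector property. */
    public static Map<String, List<String>> extractVectors(String customXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(customXml.getBytes(StandardCharsets.UTF_8)));

        Map<String, List<String>> result = new LinkedHashMap<>();
        NodeList props = doc.getElementsByTagNameNS(CUSTOM_NS, "property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            // Skip whitespace to the first element child of <property>.
            Node vector = prop.getFirstChild();
            while (vector != null && vector.getNodeType() != Node.ELEMENT_NODE) {
                vector = vector.getNextSibling();
            }
            if (vector == null || !VT_NS.equals(vector.getNamespaceURI())
                    || !"vector".equals(vector.getLocalName())) {
                continue; // scalar or missing value: not a vector
            }
            // Each element child holds one value, e.g. <vt:lpwstr>, or a
            // <vt:variant> wrapping a typed value when baseType="variant".
            List<String> values = new ArrayList<>();
            for (Node c = vector.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c.getNodeType() == Node.ELEMENT_NODE) {
                    values.add(c.getTextContent());
                }
            }
            result.put("custom:" + prop.getAttribute("name"), values);
        }
        return result;
    }
}
```

Each list could then be stored with repeated `Metadata.add(name, value)` calls, since Tika's `Metadata` can hold several values under one name.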
                
> Custom properties on xlsx, docx, pptx
> -------------------------------------
>
>                 Key: TIKA-695
>                 URL: https://issues.apache.org/jira/browse/TIKA-695
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.10, 1.0
>         Environment: All OS
>            Reporter: Etienne Jouvin
>            Priority: Minor
>             Fix For: 1.1
>
>
> The parsers for the Office XML formats (xlsx, docx, pptx) do not extract 
> custom properties.
> In class MetadataExtractor, method extract, only core and extended properties 
> are retrieved.
> I added something like this:
> extractMetadata(extractor.getCustomProperties(), metadata);
> {quote}
>     import java.math.BigDecimal;
>     import org.apache.commons.lang.BooleanUtils;
>     import org.apache.poi.POIXMLProperties.CustomProperties;
>     import org.openxmlformats.schemas.officeDocument.x2006.customProperties.CTProperty;
>     /* DateUtils is assumed to be a helper from the reporter's own code base (not shown). */
>
>     /**
>      * Add this method to read custom properties on document.
>      *
>      * @param properties All custom properties.
>      * @param metadata Metadata to complete with read properties.
>      */
>     private void extractMetadata(CustomProperties properties, Metadata metadata) {
>         org.openxmlformats.schemas.officeDocument.x2006.customProperties.CTProperties propsHolder =
>                 properties.getUnderlyingProperties();
>         String value = null;
>         DateUtils dateUtils = DateUtils.getInstance();
>         BigDecimal bigDecimal;
>         for (CTProperty property : propsHolder.getPropertyList()) {
>             /* Parse each property according to its variant type. */
>             if (property.isSetLpwstr()) {
>                 value = property.getLpwstr();
>             } else if (property.isSetFiletime()) {
>                 value = dateUtils.convertDate(property.getFiletime(), null);
>             } else if (property.isSetDate()) {
>                 value = dateUtils.convertDate(property.getDate(), null);
>             } else if (property.isSetDecimal()) {
>                 bigDecimal = property.getDecimal();
>                 value = null == bigDecimal ? null : bigDecimal.toString();
>             } else if (property.isSetBool()) {
>                 value = BooleanUtils.toStringTrueFalse(property.getBool());
>             } else if (property.isSetInt()) {
>                 value = Integer.toString(property.getInt());
>             } else if (property.isSetLpstr()) {
>                 value = property.getLpstr();
>             } else if (property.isSetI4()) {
>                 /* Numbers in Excel, for example, are stored as vt:i4 (4-byte signed integer). */
>                 value = Integer.toString(property.getI4());
>             } else {
>                 /* Other types (e.g. vectors, blobs) are skipped. */
>                 continue;
>             }
>             /* Add the "custom:" prefix, as is done for the old (OLE2) Office formats. */
>             addProperty(metadata, "custom:" + property.getName(), value);
>         }
>     }
> {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
