[ https://issues.apache.org/jira/browse/TIKA-712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112801#comment-13112801 ]
Michael McCandless commented on TIKA-712:
-----------------------------------------

Good idea!  Nice how approachable OOXML is...

In theory the answer is here:

    http://www.ecma-international.org/publications/standards/Ecma-376.htm

but I have not tried to dig.

So, here's a boilerplate-only chunk from the master slide (PowerPoint does not display this on the slide):

{noformat}
<p:sp>
  <p:nvSpPr>
    <p:cNvPr id="2" name="Title Placeholder 1"/>
    <p:cNvSpPr>
      <a:spLocks noGrp="1"/>
    </p:cNvSpPr>
    <p:nvPr>
      <p:ph type="title"/>
    </p:nvPr>
  </p:nvSpPr>
  <p:spPr>
    <a:xfrm>
      <a:off x="457200" y="274638"/>
      <a:ext cx="8229600" cy="1143000"/>
    </a:xfrm>
    <a:prstGeom prst="rect">
      <a:avLst/>
    </a:prstGeom>
  </p:spPr>
  <p:txBody>
    <a:bodyPr vert="horz" lIns="91440" tIns="45720" rIns="91440" bIns="45720" rtlCol="0" anchor="ctr">
      <a:normAutofit/>
    </a:bodyPr>
    <a:lstStyle/>
    <a:p>
      <a:r>
        <a:rPr lang="en-US" smtClean="0"/>
        <a:t>Click to edit Master title style </a:t>
      </a:r>
      <a:endParaRPr lang="en-US"/>
    </a:p>
  </p:txBody>
</p:sp>
{noformat}

And here's the footer I edited (PowerPoint does display this on the slide):

{noformat}
<p:sp>
  <p:nvSpPr>
    <p:cNvPr id="5" name="Footer Placeholder 4"/>
    <p:cNvSpPr>
      <a:spLocks noGrp="1"/>
    </p:cNvSpPr>
    <p:nvPr>
      <p:ph type="ftr" sz="quarter" idx="3"/>
    </p:nvPr>
  </p:nvSpPr>
  <p:spPr>
    <a:xfrm>
      <a:off x="3124200" y="6356350"/>
      <a:ext cx="2895600" cy="365125"/>
    </a:xfrm>
    <a:prstGeom prst="rect">
      <a:avLst/>
    </a:prstGeom>
  </p:spPr>
  <p:txBody>
    <a:bodyPr vert="horz" lIns="91440" tIns="45720" rIns="91440" bIns="45720" rtlCol="0" anchor="ctr"/>
    <a:lstStyle>
      <a:lvl1pPr algn="ctr">
        <a:defRPr sz="1200">
          <a:solidFill>
            <a:schemeClr val="tx1">
              <a:tint val="75000"/>
            </a:schemeClr>
          </a:solidFill>
        </a:defRPr>
      </a:lvl1pPr>
    </a:lstStyle>
    <a:p>
      <a:r>
        <a:rPr lang="en-US" smtClean="0"/>
        <a:t>Slide footer is right here </a:t>
      </a:r>
      <a:endParaRPr lang="en-US"/>
    </a:p>
  </p:txBody>
</p:sp>
{noformat}

I can't spot any obvious difference on a quick glance...
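
One way to hunt for the difference is to compare the placeholder attributes of the two shapes programmatically. What follows is only a minimal Python sketch, not anything Tika or POI does: it assumes the standard PresentationML and DrawingML namespace URIs, uses trimmed copies of the two shapes quoted in this comment, and the helper names (parse_shape, summarize, diff_attrs) are hypothetical. It surfaces which <p:ph> attributes differ; it does not establish which of them, if any, controls whether PowerPoint displays the text.

```python
import xml.etree.ElementTree as ET

# Standard OOXML namespace URIs for the p: and a: prefixes (assumed here,
# since the quoted fragments omit the declarations).
NS = {
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
}

# Trimmed copies of the two shapes: only the parts inspected below.
TITLE_SP = (
    '<p:sp><p:nvSpPr><p:cNvPr id="2" name="Title Placeholder 1"/>'
    '<p:nvPr><p:ph type="title"/></p:nvPr></p:nvSpPr>'
    '<p:txBody><a:p><a:r><a:t>Click to edit Master title style </a:t>'
    '</a:r></a:p></p:txBody></p:sp>'
)
FOOTER_SP = (
    '<p:sp><p:nvSpPr><p:cNvPr id="5" name="Footer Placeholder 4"/>'
    '<p:nvPr><p:ph type="ftr" sz="quarter" idx="3"/></p:nvPr></p:nvSpPr>'
    '<p:txBody><a:p><a:r><a:t>Slide footer is right here </a:t>'
    '</a:r></a:p></p:txBody></p:sp>'
)


def parse_shape(fragment):
    """Parse a bare <p:sp> fragment by wrapping it in a root element
    that declares the p: and a: prefixes."""
    wrapped = '<root xmlns:p="%s" xmlns:a="%s">%s</root>' % (
        NS["p"], NS["a"], fragment)
    return ET.fromstring(wrapped).find("p:sp", NS)


def summarize(sp):
    """Collect the shape name, its <p:ph> attributes, and its text runs."""
    ph = sp.find("p:nvSpPr/p:nvPr/p:ph", NS)
    text = "".join(t.text or "" for t in sp.iterfind(".//a:t", NS)).strip()
    return {
        "name": sp.find("p:nvSpPr/p:cNvPr", NS).get("name"),
        "placeholder": dict(ph.attrib) if ph is not None else {},
        "text": text,
    }


def diff_attrs(a, b):
    """Return {attribute: (value_in_a, value_in_b)} for attributes that differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


title = summarize(parse_shape(TITLE_SP))
footer = summarize(parse_shape(FOOTER_SP))
diff = diff_attrs(title["placeholder"], footer["placeholder"])

for info in (title, footer):
    print(info["name"], info["placeholder"], repr(info["text"]))
print("placeholder attribute differences:", diff)
```

The same comparison could be run over every <p:sp> in the full master slide XML to see whether the displayed and non-displayed shapes separate cleanly on any attribute.
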
I'll attach the full master slide XML (there's lots of other stuff); could be the difference is elsewhere in there.

> Master slide text isn't extracted
> ---------------------------------
>
>                 Key: TIKA-712
>                 URL: https://issues.apache.org/jira/browse/TIKA-712
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Michael McCandless
>         Attachments: TIKA-712.patch, testPPT_masterFooter.ppt,
>                      testPPT_masterFooter.pptx, testPPT_masterFooter2.ppt,
>                      testPPT_masterFooter2.pptx
>
>
> It looks like we are not getting text from the master slide for PPT
> and PPTX.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira