[jira] [Commented] (TIKA-3731) Tika CAD DWG reader not pulling meta data from new cad files

2022-04-26 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528205#comment-17528205
 ] 

Dan Coldrick commented on TIKA-3731:


[~tallison] Fine by me regarding the custom metadata keys.

> Tika CAD DWG reader not pulling meta data from new cad files
> 
>
> Key: TIKA-3731
> URL: https://issues.apache.org/jira/browse/TIKA-3731
> Project: Tika
> Issue Type: Improvement
> Components: metadata
> Affects Versions: 2.3.0
> Reporter: Dan Coldrick
> Priority: Minor
> Fix For: 2.4.0
>
> Attachments: AutoCAD 2018 format (1).dwg, testDWG-AC1027.dwg
>
>
>  
> The Tika DWG reader only extracts metadata from drawing formats up to 
> AC1024 (see code snippet), but it looks like AC1027 and AC1032 can also be 
> read by the same get2007and2010Props metadata extractor.
> {code:java}
> switch (version) {
>     case "AC1015":
>         metadata.set(Metadata.CONTENT_TYPE, TYPE.toString());
>         if (skipTo2000PropertyInfoSection(stream, header)) {
>             get2000Props(stream, metadata, xhtml);
>         }
>         break;
>     case "AC1018":
>         metadata.set(Metadata.CONTENT_TYPE, TYPE.toString());
>         if (skipToPropertyInfoSection(stream, header)) {
>             get2004Props(stream, metadata, xhtml);
>         }
>         break;
>     case "AC1021":
>     case "AC1024":
>         metadata.set(Metadata.CONTENT_TYPE, TYPE.toString());
>         if (skipToPropertyInfoSection(stream, header)) {
>             get2007and2010Props(stream, metadata, xhtml);
>         }
>         break;
>     default:
>         throw new TikaException("Unsupported AutoCAD drawing version: " + version);
> }
> {code}
> It looks like the case statement just needs extending, and example files 
> need to be created for AC1027/AC1032. 
> The current AutoCAD drawing version codes are listed here:
> https://knowledge.autodesk.com/support/autocad/learn-explore/caas/sfdcarticles/sfdcarticles/drawing-version-codes-for-autocad.html
>  
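
The extension proposed in the description could look roughly like the sketch below. It is a self-contained illustration, not the committed DWGParser change: the class name, the `PropsReader` enum, and the `readerFor` method are hypothetical, and `IllegalArgumentException` stands in for `TikaException` so that the example compiles without Tika on the classpath. It also assumes, as the description suggests, that AC1027 and AC1032 can share the reader used for AC1021 and AC1024.

```java
class DwgVersionDispatchSketch {

    // Hypothetical stand-ins for get2000Props, get2004Props and get2007and2010Props.
    enum PropsReader { PROPS_2000, PROPS_2004, PROPS_2007_AND_LATER }

    // Maps a DWG version code to the property reader that would handle it.
    static PropsReader readerFor(String version) {
        switch (version) {
            case "AC1015":
                return PropsReader.PROPS_2000;
            case "AC1018":
                return PropsReader.PROPS_2004;
            case "AC1021":
            case "AC1024":
            case "AC1027": // proposed addition
            case "AC1032": // proposed addition
                return PropsReader.PROPS_2007_AND_LATER;
            default:
                // The real parser throws TikaException here.
                throw new IllegalArgumentException(
                        "Unsupported AutoCAD drawing version: " + version);
        }
    }

    public static void main(String[] args) {
        for (String v : new String[] {"AC1015", "AC1018", "AC1024", "AC1027", "AC1032"}) {
            System.out.println(v + " -> " + readerFor(v));
        }
    }
}
```

Falling through from the new case labels keeps a single code path for all four versions, so a fix to the shared reader would apply to every one of them.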



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (TIKA-3731) Tika CAD DWG reader not pulling meta data from new cad files

2022-04-26 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528198#comment-17528198
 ] 

Hudson commented on TIKA-3731:

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #525 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/525/])
TIKA-3731 -- expand metadata extraction for DWG AC1027 and AC1032; add prefix for custom metadata (tallison: [https://github.com/apache/tika/commit/079db8d8286d681dd05568b532e11fbf02f23fd0])
* (add) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-cad-module/src/test/resources/test-documents/testDWG-AC1027.dwg
* (edit) CHANGES.txt
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-cad-module/src/test/java/org/apache/tika/parser/dwg/DWGParserTest.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-cad-module/src/main/java/org/apache/tika/parser/dwg/DWGParser.java
* (add) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-cad-module/src/test/resources/test-documents/testDWG-AC1032.dwg




[jira] [Commented] (TIKA-3731) Tika CAD DWG reader not pulling meta data from new cad files

2022-04-26 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528158#comment-17528158
 ] 

Tim Allison commented on TIKA-3731:

+1.  Thank you, [~nick]!



[jira] [Commented] (TIKA-3731) Tika CAD DWG reader not pulling meta data from new cad files

2022-04-26 Thread Nick Burch (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528157#comment-17528157
 ] 

Nick Burch commented on TIKA-3731:

We already add a prefix to custom metadata keys for several other formats, so 
this makes sense to me.



[jira] [Commented] (TIKA-3731) Tika CAD DWG reader not pulling meta data from new cad files

2022-04-26 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528107#comment-17528107
 ] 

Tim Allison commented on TIKA-3731:

Unrelated, but while I'm looking at the DWGParser, I'd like to add a prefix 
(dwg-custom:) to custom metadata keys.  It is dangerous not to prefix 
user-generated metadata keys.  Any objections?
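
One likely reason unprefixed keys are risky is that a user-defined property could share a name with a standard key and overwrite it. The sketch below illustrates the idea with a plain `Map` rather than Tika's `Metadata` class, so it runs on its own. Only the `dwg-custom:` prefix comes from this thread; the class and method names are hypothetical and do not describe the actual DWGParser code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CustomKeyPrefixSketch {

    // Prefix proposed in this thread for user-defined DWG properties.
    static final String CUSTOM_PREFIX = "dwg-custom:";

    // Stores a user-defined property under a prefixed key so that it cannot
    // overwrite a standard key with the same name.
    static void putCustom(Map<String, String> metadata, String userKey, String value) {
        metadata.put(CUSTOM_PREFIX + userKey, value);
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("Content-Type", "image/vnd.dwg");

        // A drawing might define a custom property that happens to be named "Content-Type".
        putCustom(metadata, "Content-Type", "something user-defined");

        metadata.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

With the prefix, both entries survive side by side. Without it, the second `put` would silently replace the detected content type.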

> Tika CAD DWG reader not pulling meta data from new cad files
> 
>
> Key: TIKA-3731
> URL: https://issues.apache.org/jira/browse/TIKA-3731
> Project: Tika
> Issue Type: Improvement
> Components: metadata
> Affects Versions: 2.3.0
> Reporter: Dan Coldrick
> Priority: Minor
> Attachments: AutoCAD 2018 format (1).dwg, testDWG-AC1027.dwg


[jira] [Commented] (TIKA-3731) Tika CAD DWG reader not pulling meta data from new cad files

2022-04-25 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17527787#comment-17527787
 ] 

Dan Coldrick commented on TIKA-3731:


I've attached AC1027 and AC1032 DWG files to extend the tests.

> Tika CAD DWG reader not pulling meta data from new cad files
> 
>
> Key: TIKA-3731
> URL: https://issues.apache.org/jira/browse/TIKA-3731
> Project: Tika
> Issue Type: Bug
> Components: metadata
> Affects Versions: 2.3.0
> Reporter: Dan Coldrick
> Priority: Major
> Attachments: AutoCAD 2018 format (1).dwg, testDWG-AC1027.dwg


[jira] [Commented] (TIKA-3731) Tika CAD DWG reader not pulling meta data from new cad files

2022-04-25 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17527786#comment-17527786
 ] 

Dan Coldrick commented on TIKA-3731:


Related to https://issues.apache.org/jira/browse/TIKA-1735, but that issue 
also appeared to include adding a parser, so I thought it would be good to 
split the two issues and get this bug fixed. 

> Tika CAD DWG reader not pulling meta data from new cad files
> 
>
> Key: TIKA-3731
> URL: https://issues.apache.org/jira/browse/TIKA-3731
> Project: Tika
> Issue Type: Bug
> Components: metadata
> Affects Versions: 2.3.0
> Reporter: Dan Coldrick
> Priority: Major