[jira] [Commented] (TIKA-1353) OpenDocumentParser doesn't correctly process metadata

2014-06-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042352#comment-14042352
 ] 

Hudson commented on TIKA-1353:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #64 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/64/])
TIKA-1353 If a File is available, parse ODF documents with it, so that the 
metadata can always be processed first (nick: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1605124)
* 
/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/odf/OpenDocumentParser.java
* 
/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/odf/ODFParserTest.java


 OpenDocumentParser doesn't correctly process metadata
 -

 Key: TIKA-1353
 URL: https://issues.apache.org/jira/browse/TIKA-1353
 Project: Tika
  Issue Type: Bug
  Components: metadata, parser
Affects Versions: 1.5
Reporter: Steve R
 Fix For: 1.6

   Original Estimate: 24h
  Remaining Estimate: 24h

 When using OpenDocumentParser, the metadata isn't set correctly. When using 
 it to write an HTML file, the only metadata it knows about is the content 
 type, because that is set ahead of time.
 The problem is that, when iterating over the zip contents, meta.xml isn't 
 processed before content.xml. The metadata set on the parse object is correct 
 after parse() returns; however, the contents of the resulting HTML file are 
 missing all of the metadata.
 Changing the code to be 
     boolean parsedMetaData = false;
     boolean delayLoadContent = false;
     while (entry != null) {
         ...
         } else if (entry.getName().equals("meta.xml")) {
             meta.parse(zip, new DefaultHandler(), metadata, context);
             parsedMetaData = true;
             if (delayLoadContent) {
                 if (content instanceof OpenDocumentContentParser) {
                     ((OpenDocumentContentParser) content).parseInternal(zip, handler, metadata, context);
                 } else {
                     // Foreign content parser was set:
                     content.parse(zip, handler, metadata, context);
                 }
             }
         } else if (entry.getName().endsWith("content.xml")) {
             if (!parsedMetaData) {
                 delayLoadContent = true;
             } else {
                 if (content instanceof OpenDocumentContentParser) {
                     ((OpenDocumentContentParser) content).parseInternal(zip, handler, metadata, context);
                 } else {
                     // Foreign content parser was set:
                     content.parse(zip, handler, metadata, context);
                 }
             }
         }
 works as expected.
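
 The ordering problem described above can be reproduced without Tika. The 
 following is a minimal sketch using only java.util.zip; the class name 
 ZipOrderDemo and its helper methods are hypothetical and are not part of 
 Tika. It builds a small archive in which content.xml is stored before 
 meta.xml and shows that a streaming reader sees the entries in that same 
 order, so a handler that writes output while reading content.xml has not 
 yet seen meta.xml.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; not part of Tika.
public class ZipOrderDemo {

    // Builds an in-memory ZIP whose entries are stored in exactly the given order.
    static byte[] buildZip(String... names) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (String name : names) {
                out.putNextEntry(new ZipEntry(name));
                // Placeholder payload; real ODF entries contain XML documents.
                out.write(("body of " + name).getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    // Returns the entry names in the order a streaming reader encounters them.
    static List<String> entryOrder(byte[] zip) throws IOException {
        List<String> order = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip))) {
            for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry()) {
                order.add(e.getName());
            }
        }
        return order;
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = buildZip("mimetype", "content.xml", "meta.xml");
        // prints [mimetype, content.xml, meta.xml]
        System.out.println(entryOrder(zip));
    }
}
```

 Because ZipInputStream is forward-only, a reader that has already moved past 
 content.xml cannot go back to it once meta.xml turns up later in the stream.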



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TIKA-1353) OpenDocumentParser doesn't correctly process metadata

2014-06-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042376#comment-14042376
 ] 

Hudson commented on TIKA-1353:
--

SUCCESS: Integrated in tika-trunk-jdk1.6 #64 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.6/64/])
TIKA-1353 If a File is available, parse ODF documents with it, so that the 
metadata can always be processed first (nick: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1605124)
* 
/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/odf/OpenDocumentParser.java
* 
/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/odf/ODFParserTest.java




[jira] [Commented] (TIKA-1353) OpenDocumentParser doesn't correctly process metadata

2014-06-23 Thread Steve R (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041639#comment-14041639
 ] 

Steve R commented on TIKA-1353:
---

Ignore my suggested code example; it clearly doesn't work.

My question now is this: why is the following code commented out? It seems to 
work.

/*
 * ZipFile zipFile;
 * if (stream instanceof TikaInputStream) {
 *     TikaInputStream tis = (TikaInputStream) stream;
 *     Object container = ((TikaInputStream) stream).getOpenContainer();
 *     if (container instanceof ZipFile) {
 *         zipFile = (ZipFile) container;
 *     } else if (tis.hasFile()) {
 *         zipFile = new ZipFile(tis.getFile());
 *     }
 * }
 */

// TODO: if incoming IS is a TIS with a file
// associated, we should open ZipFile so we can
// visit metadata, mimetype first; today we lose
// all the metadata if meta.xml is hit after
// content.xml in the stream. Then we can still
// read-once for the content.xml.
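
The idea in that TODO can be sketched outside Tika. The example below is only 
an illustration under stated assumptions: it uses java.util.zip.ZipFile 
directly, the class MetaFirstDemo and its helpers are hypothetical, and it is 
not the code that OpenDocumentParser uses. It shows that, once a file is 
available, entries can be looked up by name, so meta.xml can be read before 
content.xml no matter how the archive stores them.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; not part of Tika.
public class MetaFirstDemo {

    // Writes a temporary ZIP file whose entries are stored in the given order.
    static File writeZip(String... names) throws IOException {
        File file = File.createTempFile("odf-demo", ".zip");
        file.deleteOnExit();
        try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(file))) {
            for (String name : names) {
                out.putNextEntry(new ZipEntry(name));
                // Placeholder payload; real ODF entries contain XML documents.
                out.write(("body of " + name).getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
        return file;
    }

    // Reads one named entry fully, or returns null if the entry is absent.
    static String readEntry(ZipFile zip, String name) throws IOException {
        ZipEntry entry = zip.getEntry(name);
        if (entry == null) {
            return null;
        }
        try (InputStream in = zip.getInputStream(entry)) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            for (int n = in.read(chunk); n != -1; n = in.read(chunk)) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Visits meta.xml first and content.xml second, regardless of archive order.
    static List<String> visitMetaFirst(File file) throws IOException {
        List<String> visited = new ArrayList<>();
        try (ZipFile zip = new ZipFile(file)) {
            for (String name : new String[] {"meta.xml", "content.xml"}) {
                String body = readEntry(zip, name);
                if (body != null) {
                    visited.add(name + " -> " + body);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) throws IOException {
        // content.xml is deliberately stored before meta.xml.
        File file = writeZip("mimetype", "content.xml", "meta.xml");
        System.out.println(visitMetaFirst(file));
    }
}
```

The trade-off is that random access needs a real file on disk, which is why 
the commented-out code only takes this path when the TikaInputStream already 
has an open container or a backing file.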
