[ https://issues.apache.org/jira/browse/TIKA-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955231#comment-15955231 ]

ASF GitHub Bot commented on TIKA-2309:
--------------------------------------

Shinobi75 commented on issue #161: fix for TIKA-2309 contributed by Shinobi@75
URL: https://github.com/apache/tika/pull/161#issuecomment-291531208
 
 
   @tballison, OK, you're right. TSD is actually a crypto wrapper format for
any other type of data file. I've tried to create a private method inside the
TSDParser class to extract the metadata of the file embedded in the TSD:
   
       private void parseTSDContent(InputStream stream, ContentHandler handler,
                                    Metadata metadata, ParseContext context) {

           EmbeddedDocumentExtractor embeddedDocumentExtractor =
                   new ParsingEmbeddedDocumentExtractor(context);

           if (embeddedDocumentExtractor.shouldParseEmbedded(metadata)) {
               // Unwrap the payload of the timestamped envelope and pass it on
               try (InputStream is = TikaInputStream.get(
                       new CMSTimeStampedData(stream).getContent())) {
                   embeddedDocumentExtractor.parseEmbedded(is, handler, metadata, true);
               } catch (Exception ex) {
                   // Pass the exception itself so the stack trace is logged
                   LOG.error("Error in TSDParser.parseTSDContent", ex);
               }
           }
       }
   
   However, after the parseEmbedded method call, the metadata map contains the
same data as before the call. Did you intend for the EmbeddedDocumentExtractor
to be called inside the TSDParser class, or only for testing purposes inside
the TSDParserTest class?
   
   You can find the updated code in the TIKA-2309 branch.
   
   Thank you for your patience
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> New Detector and Parser classes for Time Stamped Data Envelope file format
> --------------------------------------------------------------------------
>
>                 Key: TIKA-2309
>                 URL: https://issues.apache.org/jira/browse/TIKA-2309
>             Project: Tika
>          Issue Type: Improvement
>          Components: detector, parser
>    Affects Versions: 1.13, 1.14
>            Reporter: Fabio
>            Priority: Minor
>         Attachments: MANIFEST.XML.TSD
>
>
> Hello,
> I'm Fabio Evangelista from Rome. I work for an Italian Public
> Administration company and I use Apache Tika in my Java applications to
> detect and parse a broad range of file formats. During that work, after
> following your good guide on the Tika project page, I successfully wrote new
> Detector and Parser classes for a particular crypto timestamp format
> with these characteristics:
> Format name:               Time Stamped Data Envelope
> Mime Type:                   application/timestamped-data
> File extension:              .tsd
> TSD file hex magic code at the start of the file:   30 80 06 0B 2A 86 48 86 F7
> I've integrated these new classes into the Tika 1.13 tika-core.jar and
> tika-parsers.jar and tested them successfully with my applications. What
> should I do to submit my new classes to you? Should I push them to a
> particular git branch, or is there a particular process to follow?
> Thank you for your patience and best regards.
> Fabio.
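
For illustration only, the magic-byte check described in the issue could be sketched in plain Java as below. This is a minimal sketch and not the code from the pull request: the class name TsdMagic and the method looksLikeTsd are hypothetical, and only the nine byte values come from the description above.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Tika: checks for the TSD magic prefix.
final class TsdMagic {

    // Magic bytes quoted in the issue: 30 80 06 0B 2A 86 48 86 F7
    private static final byte[] MAGIC = {
        0x30, (byte) 0x80, 0x06, 0x0B, 0x2A,
        (byte) 0x86, 0x48, (byte) 0x86, (byte) 0xF7
    };

    // Returns true if the given header starts with the TSD magic bytes.
    static boolean looksLikeTsd(byte[] header) {
        if (header == null || header.length < MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(header, MAGIC.length), MAGIC);
    }

    public static void main(String[] args) {
        byte[] sample = {
            0x30, (byte) 0x80, 0x06, 0x0B, 0x2A,
            (byte) 0x86, 0x48, (byte) 0x86, (byte) 0xF7, 0x0D
        };
        System.out.println(looksLikeTsd(sample));
    }
}
```

A prefix check like this only suggests that a file may be a TSD envelope; it does not validate the structure that follows.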



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
