[ 
https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529517#comment-17529517
 ] 

Nick Burch commented on TIKA-3742:
----------------------------------

I've updated the code in the gist to use Commons IO and skipFully / readFully, 
as well as fetching a few more values so it isn't blindly skipping as much.

I'm not sure if the text element type is always at the top level, or if we need 
to go hunting inside complex elements for them. Are you able to check some of 
your larger test files, [~monkmachine]?
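
For checking that across a pile of files, here is a minimal sketch (not the 
code from the gist) that walks the element headers and reports each element's 
type, level and complex bit. Assumptions, all taken from my reading of the 
dgnlib docs rather than from the gist: a 4-byte header (level in the low 6 
bits of byte 0, complex flag in its top bit, type in the low 7 bits of byte 1, 
then a little-endian "words to follow" count), an 0xFFFF end marker, and type 
17 for text. The class and helper names are made up, and the local skipFully 
only stands in for the Commons IO one so the sketch runs on the plain JDK.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name; a header-only scan, not a Tika parser.
class DgnElementScan {
    static final int TYPE_TEXT = 17; // assumed from the dgnlib docs

    /** One element header as read from the stream. */
    static final class Header {
        final int type;
        final int level;
        final boolean complex;
        final int bodyBytes;

        Header(int type, int level, boolean complex, int bodyBytes) {
            this.type = type;
            this.level = level;
            this.complex = complex;
            this.bodyBytes = bodyBytes;
        }
    }

    // Stand-in for Commons IO's skipFully: skip exactly n bytes or fail.
    static void skipFully(InputStream in, long n) throws IOException {
        byte[] buf = new byte[4096];
        while (n > 0) {
            int r = in.read(buf, 0, (int) Math.min(buf.length, n));
            if (r < 0) {
                throw new EOFException("truncated element body");
            }
            n -= r;
        }
    }

    static List<Header> scan(InputStream stream) throws IOException {
        DataInputStream in = new DataInputStream(stream);
        List<Header> headers = new ArrayList<>();
        while (true) {
            int b0 = in.read();
            if (b0 < 0) {
                break; // stream ended without an end marker
            }
            int b1 = in.readUnsignedByte();
            if (b0 == 0xff && b1 == 0xff) {
                break; // assumed end-of-file marker
            }
            // "Words to follow" is a little-endian count of 16-bit words.
            int words = in.readUnsignedByte() | (in.readUnsignedByte() << 8);
            int bodyBytes = words * 2;
            headers.add(new Header(b1 & 0x7f, b0 & 0x3f, (b0 & 0x80) != 0, bodyBytes));
            skipFully(in, bodyBytes); // body is not decoded in this sketch
        }
        return headers;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(Paths.get(args[0])))) {
            for (Header h : scan(in)) {
                System.out.println("type=" + h.type + " level=" + h.level
                        + " complex=" + h.complex + " bytes=" + h.bodyBytes
                        + (h.type == TYPE_TEXT ? " (text)" : ""));
            }
        }
    }
}
```

If that header reading is right, any text element printed with complex=true 
would be one that belongs to a complex element rather than sitting on its own 
at the top level.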

We might be able to get some useful info out of the tags, at least based on 
[http://dgnlib.maptools.org/dgn.html#type37]. Do you have, or could you create, 
a test file containing some tags, Dan?

Finally, the code needs converting to an actual parser. 
[https://tika.apache.org/2.3.0/parser_guide.html] has the steps if you want to 
give it a whirl, Dan!

> Advice around DGN7 parser and whether to add to TIKA
> ----------------------------------------------------
>
>                 Key: TIKA-3742
>                 URL: https://issues.apache.org/jira/browse/TIKA-3742
>             Project: Tika
>          Issue Type: Task
>          Components: parser
>            Reporter: Dan Coldrick
>            Priority: Minor
>         Attachments: DGN.zip, ExampleOutput.txt
>
>
> Hi [~tallison] & whoever else. 
> I managed to compile the C/C++ library [http://dgnlib.maptools.org/] for 
> DGN7, which produces a dgndump.exe that will dump all the data from the DGN. 
> From my initial testing it looks pretty good. 
> Do you think it would be worth adding this, or should it stay a custom 
> parser rather than go in the main source code? It's under the MIT license. 
> I've attached the exe (zipped), a copy of the output from the dump, and my 
> very dirty test code that calls the exe (I was only interested in the 
> strings, so at the moment it only pulls those into a string array to check 
> it's extracting the correct data).


