[jira] [Commented] (TIKA-1663) Add a DigestingParser to add MD5/SHA-X hashes as fields in Metadata
    [ https://issues.apache.org/jira/browse/TIKA-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604476#comment-14604476 ]

Hudson commented on TIKA-1663:
------------------------------

SUCCESS: Integrated in tika-trunk-jdk1.7 #769 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/769/])
TIKA-1663 add a DigestingParser (tallison: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1687981)
* /tika/trunk/CHANGES.txt
* /tika/trunk/tika-app/src/main/java/org/apache/tika/batch
* /tika/trunk/tika-app/src/main/java/org/apache/tika/batch/DigestingAutoDetectParserFactory.java
* /tika/trunk/tika-app/src/main/java/org/apache/tika/batch/builders
* /tika/trunk/tika-app/src/main/java/org/apache/tika/batch/builders/AppParserFactoryBuilder.java
* /tika/trunk/tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java
* /tika/trunk/tika-app/src/main/java/org/apache/tika/gui/TikaGUI.java
* /tika/trunk/tika-app/src/main/resources/tika-app-batch-config.xml
* /tika/trunk/tika-app/src/test/java/org/apache/tika/cli/TikaCLIBatchIntegrationTest.java
* /tika/trunk/tika-app/src/test/java/org/apache/tika/cli/TikaCLITest.java
* /tika/trunk/tika-app/src/test/resources/log4j_batch_process_test.properties
* /tika/trunk/tika-batch/src/main/java/org/apache/tika/batch/AutoDetectParserFactory.java
* /tika/trunk/tika-batch/src/main/java/org/apache/tika/batch/ParserFactory.java
* /tika/trunk/tika-batch/src/main/java/org/apache/tika/batch/builders/IParserFactoryBuilder.java
* /tika/trunk/tika-batch/src/main/java/org/apache/tika/batch/builders/ParserFactoryBuilder.java
* /tika/trunk/tika-batch/src/main/java/org/apache/tika/batch/fs/FSBatchProcessCLI.java
* /tika/trunk/tika-batch/src/main/java/org/apache/tika/batch/fs/builders/BasicTikaFSConsumersBuilder.java
* /tika/trunk/tika-batch/src/main/resources/org/apache/tika/batch/fs/default-tika-batch-config.xml
* /tika/trunk/tika-batch/src/test/java/org/apache/tika/parser/mock/MockParserFactory.java
* /tika/trunk/tika-batch/src/test/resources/tika-batch-config-MockConsumersBuilder.xml
* /tika/trunk/tika-batch/src/test/resources/tika-batch-config-broken.xml
* /tika/trunk/tika-batch/src/test/resources/tika-batch-config-test.xml
* /tika/trunk/tika-core/src/main/java/org/apache/tika/parser/DigestingParser.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/utils
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/utils/CommonsDigester.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/TikaTest.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/DigestingParserTest.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/RecursiveParserWrapperTest.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/TikaServerCli.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/DetectorResource.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/LanguageResource.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/MetadataResource.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/RecursiveMetadataResource.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/TikaDetectors.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/TikaMimeTypes.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/TikaParsers.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/TikaResource.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/TikaUtils.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/TikaVersion.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/TikaWelcome.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/TranslateResource.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/UnpackerResource.java
* /tika/trunk/tika-server/src/test/java/org/apache/tika/server/CXFTestBase.java
* /tika/trunk/tika-server/src/test/java/org/apache/tika/server/DetectorResourceTest.java
* /tika/trunk/tika-server/src/test/java/org/apache/tika/server/LanguageResourceTest.java
* /tika/trunk/tika-server/src/test/java/org/apache/tika/server/MetadataResourceTest.java
* /tika/trunk/tika-server/src/test/java/org/apache/tika/server/RecursiveMetadataResourceTest.java
* /tika/trunk/tika-server/src/test/java/org/apache/tika/server/StackTraceOffTest.java
* /tika/trunk/tika-server/src/test/java/org/apache/tika/server/StackTraceTest.java
* /tika/trunk/tika-server/src/test/java/org/apache/tika/server/TikaDetectorsTest.java
* /tika/trunk/tika-server/src/test/java/org/apache/tika/server/TikaMimeTypesTest.java
* /tika/trunk/tika-server/src/test/java/org/apache/tika/server/TikaParsersTest.java
* /tika/trunk/tika-server/src/test/java/org/apache/tika/server/TikaResourceTest.java
[jira] [Resolved] (TIKA-1663) Add a DigestingParser to add MD5/SHA-X hashes as fields in Metadata
    [ https://issues.apache.org/jira/browse/TIKA-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison resolved TIKA-1663.
-------------------------------
    Resolution: Fixed

r1687981.

> Add a DigestingParser to add MD5/SHA-X hashes as fields in Metadata
> -------------------------------------------------------------------
>
>                 Key: TIKA-1663
>                 URL: https://issues.apache.org/jira/browse/TIKA-1663
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>         Attachments: digesting_parser_v1.patch
>
>
> It might be useful to integrate commons' DigestUtils and allow users to easily add the MD5 or other supported hashes to the Metadata object.
> Anyone else find this of use?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
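For readers skimming the archive, the idea in the TIKA-1663 description (hash the document bytes and store the result as metadata fields) can be sketched roughly as follows. This is only an illustration, not the committed DigestingParser or CommonsDigester code: it uses the JDK's java.security.MessageDigest instead of commons' DigestUtils so that it runs without extra dependencies, a plain Map stands in for Tika's Metadata object, and the class name, method names, and key prefix are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch class; not part of Tika.
final class DigestSketch {

    // Hypothetical key prefix chosen for this sketch only.
    static final String PREFIX = "X-Sketch-Digest:";

    // Streams the input through the named digest and returns lowercase hex.
    static String hexDigest(String algorithm, InputStream in) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                // Mask to an unsigned value so each byte yields two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported algorithm: " + algorithm, e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Computes one field per requested algorithm; the Map stands in for a metadata object.
    static Map<String, String> digestFields(byte[] content, String... algorithms) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String algorithm : algorithms) {
            fields.put(PREFIX + algorithm,
                    hexDigest(algorithm, new ByteArrayInputStream(content)));
        }
        return fields;
    }

    public static void main(String[] args) {
        byte[] content = "abc".getBytes(StandardCharsets.UTF_8);
        digestFields(content, "MD5", "SHA-256")
                .forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

The sketch re-reads an in-memory byte array once per algorithm, which is fine for a toy example. A parser wrapper working on a real stream would presumably need to buffer or spool the input so the wrapped parser can still read it afterwards.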
[jira] [Comment Edited] (TIKA-1601) Integrate Jackcess to handle MSAccess files
    [ https://issues.apache.org/jira/browse/TIKA-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603946#comment-14603946 ]

Tim Allison edited comment on TIKA-1601 at 6/27/15 6:44 PM:
------------------------------------------------------------

Not anywhere near committing, but this is a rough start.

Some TODOs:
* -Figure out how to get non-ascii text out correctly-
* Figure out how to grab attachments from the accdb file
* Figure out if there's a flag for html-marked up text cells so that we can strip the markup [0]
* Figure out if there's a way to prevent Jackcess from trying to open linked files [0]
* Add unit tests :)

I used [~centic]'s code [1] to pull ~3k mdb files from CommonCrawl for testing.

[0]: https://sourceforge.net/p/jackcess/discussion/456474/thread/038878e6/
[1]: https://github.com/centic9/CommonCrawlDocumentDownload


was (Author: talli...@mitre.org):
Not anywhere near committing, but this is a rough start.

Some TODOs:
* Figure out how to get non-ascii text out correctly
* Figure out how to grab attachments from the accdb file
* Add unit tests :)

> Integrate Jackcess to handle MSAccess files
> -------------------------------------------
>
>                 Key: TIKA-1601
>                 URL: https://issues.apache.org/jira/browse/TIKA-1601
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>         Attachments: jackcess_nocommit_v1.patch, testAccess2.zip
>
>
> Recently, James Ahlborn, the current maintainer of [Jackcess|http://jackcess.sourceforge.net/], kindly agreed to relicense Jackcess to Apache 2.0. [~boneill], the CTO at [Health Market Science, a LexisNexis® Company|https://www.healthmarketscience.com/], also agreed with this relicensing and led the charge to obtain all necessary corporate approval to deliver a [CCLA|https://www.apache.org/licenses/cla-corporate.txt] for Jackcess to Apache. As anyone who has tried to get corporate approval for anything knows, this can sometimes require not a small bit of effort.
> If I may speak on behalf of Tika and the larger Apache community, I offer a sincere thanks to James, Brian and the other developers and contributors to Jackcess!!!
> Once the licensing info has been changed in Jackcess and the new release is available in maven, we can integrate Jackcess into Tika and add a capability to process MSAccess.
> As a side note, I reached out to the developers and contributors to determine if there were any objections. I couldn't find addresses for everyone, and not everyone replied, but those who did offered their support to this move.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)