[jira] [Commented] (TIKA-1663) Add a DigestingParser to add MD5/SHA-X hashes as fields in Metadata

2015-06-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604476#comment-14604476
 ] 

Hudson commented on TIKA-1663:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #769 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/769/])
TIKA-1663 add a DigestingParser (tallison: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1687981)
* /tika/trunk/CHANGES.txt
* /tika/trunk/tika-app/src/main/java/org/apache/tika/batch
* /tika/trunk/tika-app/src/main/java/org/apache/tika/batch/DigestingAutoDetectParserFactory.java
* /tika/trunk/tika-app/src/main/java/org/apache/tika/batch/builders
* /tika/trunk/tika-app/src/main/java/org/apache/tika/batch/builders/AppParserFactoryBuilder.java
* /tika/trunk/tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java
* /tika/trunk/tika-app/src/main/java/org/apache/tika/gui/TikaGUI.java
* /tika/trunk/tika-app/src/main/resources/tika-app-batch-config.xml
* /tika/trunk/tika-app/src/test/java/org/apache/tika/cli/TikaCLIBatchIntegrationTest.java
* /tika/trunk/tika-app/src/test/java/org/apache/tika/cli/TikaCLITest.java
* /tika/trunk/tika-app/src/test/resources/log4j_batch_process_test.properties
* /tika/trunk/tika-batch/src/main/java/org/apache/tika/batch/AutoDetectParserFactory.java
* /tika/trunk/tika-batch/src/main/java/org/apache/tika/batch/ParserFactory.java
* /tika/trunk/tika-batch/src/main/java/org/apache/tika/batch/builders/IParserFactoryBuilder.java
* /tika/trunk/tika-batch/src/main/java/org/apache/tika/batch/builders/ParserFactoryBuilder.java
* /tika/trunk/tika-batch/src/main/java/org/apache/tika/batch/fs/FSBatchProcessCLI.java
* /tika/trunk/tika-batch/src/main/java/org/apache/tika/batch/fs/builders/BasicTikaFSConsumersBuilder.java
* /tika/trunk/tika-batch/src/main/resources/org/apache/tika/batch/fs/default-tika-batch-config.xml
* /tika/trunk/tika-batch/src/test/java/org/apache/tika/parser/mock/MockParserFactory.java
* /tika/trunk/tika-batch/src/test/resources/tika-batch-config-MockConsumersBuilder.xml
* /tika/trunk/tika-batch/src/test/resources/tika-batch-config-broken.xml
* /tika/trunk/tika-batch/src/test/resources/tika-batch-config-test.xml
* /tika/trunk/tika-core/src/main/java/org/apache/tika/parser/DigestingParser.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/utils
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/utils/CommonsDigester.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/TikaTest.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/DigestingParserTest.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/RecursiveParserWrapperTest.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/TikaServerCli.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/DetectorResource.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/LanguageResource.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/MetadataResource.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/RecursiveMetadataResource.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/TikaDetectors.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/TikaMimeTypes.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/TikaParsers.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/TikaResource.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/TikaUtils.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/TikaVersion.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/TikaWelcome.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/TranslateResource.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/UnpackerResource.java
* /tika/trunk/tika-server/src/test/java/org/apache/tika/server/CXFTestBase.java
* /tika/trunk/tika-server/src/test/java/org/apache/tika/server/DetectorResourceTest.java
* /tika/trunk/tika-server/src/test/java/org/apache/tika/server/LanguageResourceTest.java
* /tika/trunk/tika-server/src/test/java/org/apache/tika/server/MetadataResourceTest.java
* /tika/trunk/tika-server/src/test/java/org/apache/tika/server/RecursiveMetadataResourceTest.java
* /tika/trunk/tika-server/src/test/java/org/apache/tika/server/StackTraceOffTest.java
* /tika/trunk/tika-server/src/test/java/org/apache/tika/server/StackTraceTest.java
* /tika/trunk/tika-server/src/test/java/org/apache/tika/server/TikaDetectorsTest.java
* /tika/trunk/tika-server/src/test/java/org/apache/tika/server/TikaMimeTypesTest.java
* /tika/trunk/tika-server/src/test/java/org/apache/tika/server/TikaParsersTest.java
* /tika/trunk/tika-server/src/test/java/org/apache/tika/server/TikaResourceTest.java

[jira] [Resolved] (TIKA-1663) Add a DigestingParser to add MD5/SHA-X hashes as fields in Metadata

2015-06-27 Thread Tim Allison (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-1663.
---
Resolution: Fixed

r1687981.

> Add a DigestingParser to add MD5/SHA-X hashes as fields in Metadata
> ---
>
> Key: TIKA-1663
> URL: https://issues.apache.org/jira/browse/TIKA-1663
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Minor
> Attachments: digesting_parser_v1.patch
>
>
> It might be useful to integrate Apache Commons Codec's DigestUtils and allow 
> users to easily add the MD5 or other supported hashes to the Metadata object.
> Would anyone else find this useful?
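
For readers unfamiliar with the idea, the following is a minimal sketch of what "adding hashes as metadata fields" could look like. It is not the committed DigestingParser code and does not use Tika or Commons Codec APIs: it relies only on the JDK's java.security.MessageDigest, stands in a plain Map for the Metadata object, and the class name DigestSketch and the "digest:" key prefix are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

public class DigestSketch {

    // Hypothetical key prefix for illustration; not a Tika constant.
    public static final String KEY_PREFIX = "digest:";

    // Compute the lowercase hex digest of the data with the named JDK algorithm.
    public static String hexDigest(String algorithm, byte[] data) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported algorithm: " + algorithm, e);
        }
        byte[] hash = md.digest(data);
        StringBuilder sb = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Store one digest per requested algorithm in the metadata map
    // (a plain Map stands in for Tika's Metadata object here).
    public static Map<String, String> addDigests(byte[] data,
                                                 Map<String, String> metadata,
                                                 String... algorithms) {
        for (String algorithm : algorithms) {
            metadata.put(KEY_PREFIX + algorithm, hexDigest(algorithm, data));
        }
        return metadata;
    }

    public static void main(String[] args) {
        byte[] content = "hello".getBytes(StandardCharsets.UTF_8);
        Map<String, String> metadata =
                addDigests(content, new LinkedHashMap<>(), "MD5", "SHA-256");
        // Print each metadata key with its hex digest.
        metadata.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

A real implementation would presumably digest the document's input stream rather than an in-memory byte array, so that large files need not be held in memory; the sketch ignores that concern.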



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TIKA-1601) Integrate Jackcess to handle MSAccess files

2015-06-27 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603946#comment-14603946
 ] 

Tim Allison edited comment on TIKA-1601 at 6/27/15 6:44 PM:


Not anywhere near committing, but this is a rough start.

Some TODOs:
* -Figure out how to get non-ascii text out correctly-
* Figure out how to grab attachments from the accdb file
* Figure out if there's a flag for HTML-marked-up text cells so that we can strip the markup [0]
* Figure out if there's a way to prevent Jackcess from trying to open linked files [0]
* Add unit tests :)

I used [~centic]'s code [1] to pull ~3k mdb files from CommonCrawl for testing.

[0]: https://sourceforge.net/p/jackcess/discussion/456474/thread/038878e6/
[1]: https://github.com/centic9/CommonCrawlDocumentDownload



was (Author: talli...@mitre.org):
Not anywhere near committing, but this is a rough start.

Some TODOs:
* Figure out how to get non-ascii text out correctly
* Figure out how to grab attachments from the accdb file
* Add unit tests :)


> Integrate Jackcess to handle MSAccess files
> ---
>
> Key: TIKA-1601
> URL: https://issues.apache.org/jira/browse/TIKA-1601
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
> Attachments: jackcess_nocommit_v1.patch, testAccess2.zip
>
>
> Recently, James Ahlborn, the current maintainer of 
> [Jackcess|http://jackcess.sourceforge.net/], kindly agreed to relicense 
> Jackcess to Apache 2.0.  [~boneill], the CTO at [Health Market Science, a 
> LexisNexis® Company|https://www.healthmarketscience.com/], also agreed with 
> this relicensing and led the charge to obtain all necessary corporate 
> approval to deliver a 
> [CCLA|https://www.apache.org/licenses/cla-corporate.txt] for Jackcess to 
> Apache.  As anyone who has tried to get corporate approval for anything 
> knows, this can take no small amount of effort.
> If I may speak on behalf of Tika and the larger Apache community, I offer a 
> sincere thanks to James, Brian and the other developers and contributors to 
> Jackcess!!!
> Once the licensing info has been changed in Jackcess and the new release is 
> available in maven, we can integrate Jackcess into Tika and add a capability 
> to process MSAccess.
> As a side note, I reached out to the developers and contributors to determine 
> if there were any objections.  I couldn't find addresses for everyone, and 
> not everyone replied, but those who did offered their support to this move. 


