[jira] [Commented] (TIKA-1860) Tika 2.0 - Create Module OSGi implementations to replace tika-bundle

2016-02-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170952#comment-15170952
 ] 

Hudson commented on TIKA-1860:
--

SUCCESS: Integrated in tika-2.x #39 (See 
[https://builds.apache.org/job/tika-2.x/39/])
TIKA-1860 - Enable osgi integration tests.  Added explicit jar path from (bob: 
rev 3962ceb7a11cabc2eb0c2364908ea430c208efeb)
* tika-parser-modules/tika-parser-crypto-module/test-bundles.xml
* tika-parser-modules/pom.xml
* tika-parser-modules/tika-parser-code-module/pom.xml
* tika-parser-modules/tika-parser-multimedia-module/src/test/java/org/apache/tika/module/BundleIT.java
* tika-parser-modules/tika-parser-cad-module/src/test/java/org/apache/tika/module/BundleIT.java
* tika-parser-modules/tika-parser-crypto-module/pom.xml
* tika-parser-modules/tika-parser-advanced-module/test-bundles.xml
* tika-parser-modules/tika-parser-code-module/test-bundles.xml
* tika-parser-modules/tika-parser-advanced-module/pom.xml
* tika-parser-modules/tika-parser-crypto-module/src/test/java/org/apache/tika/module/BundleIT.java
* tika-parser-modules/tika-parser-code-module/src/test/java/org/apache/tika/module/BundleIT.java
* tika-parser-modules/tika-parser-cad-module/test-bundles.xml
* tika-parser-modules/tika-parser-multimedia-module/test-bundles.xml
* tika-parser-modules/tika-parser-cad-module/pom.xml
* tika-parser-modules/tika-parser-advanced-module/src/test/java/org/apache/tika/module/BundleIT.java
* tika-parser-modules/tika-parser-multimedia-module/pom.xml


> Tika 2.0 - Create Module OSGi implementations to replace tika-bundle
> 
>
> Key: TIKA-1860
> URL: https://issues.apache.org/jira/browse/TIKA-1860
> Project: Tika
>  Issue Type: Sub-task
>Reporter: Bob Paulin
>Assignee: Bob Paulin
>
> Create a replacement for the OSGi tika-bundle project out of the new 
> tika-parser-* modules



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TIKA-1860) Tika 2.0 - Create Module OSGi implementations to replace tika-bundle

2016-02-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170892#comment-15170892
 ] 

Hudson commented on TIKA-1860:
--

UNSTABLE: Integrated in tika-2.x #38 (See 
[https://builds.apache.org/job/tika-2.x/38/])
TIKA-1860 - Enable osgi integration tests.  Added explicit version to (bob: rev 
4ed7c9f7e63f57d4c99569533d7cc08a7c324392)
* tika-parser-modules/tika-parser-multimedia-module/src/test/java/org/apache/tika/module/BundleIT.java
* tika-parser-modules/tika-parser-cad-module/src/test/java/org/apache/tika/module/BundleIT.java
* tika-parser-modules/tika-parser-code-module/src/test/java/org/apache/tika/module/BundleIT.java
* tika-parser-modules/pom.xml
* tika-parser-modules/tika-parser-advanced-module/src/test/java/org/apache/tika/module/BundleIT.java
* tika-parser-modules/tika-parser-crypto-module/src/test/java/org/apache/tika/module/BundleIT.java







[jira] [Created] (TIKA-1878) Upgrade Apache SIS 0.6

2016-02-27 Thread Hendy Irawan (JIRA)
Hendy Irawan created TIKA-1878:
--

 Summary: Upgrade Apache SIS 0.6
 Key: TIKA-1878
 URL: https://issues.apache.org/jira/browse/TIKA-1878
 Project: Tika
  Issue Type: Improvement
  Components: parser
Affects Versions: 1.12
Reporter: Hendy Irawan
Priority: Trivial


Pull request here: https://github.com/apache/tika/pull/79





[jira] [Updated] (TIKA-1878) Upgrade Apache SIS 0.6

2016-02-27 Thread Hendy Irawan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hendy Irawan updated TIKA-1878:
---
Flags: Patch






[GitHub] tika pull request: Upgrade to Apache SIS 0.6

2016-02-27 Thread ceefour
GitHub user ceefour opened a pull request:

https://github.com/apache/tika/pull/79

Upgrade to Apache SIS 0.6



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ceefour/tika patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/tika/pull/79.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #79


commit 97a8a30b21d071fa8352e89b4e3b214c3495661c
Author: Hendy Irawan 
Date:   2016-02-28T04:10:48Z

Upgrade to Apache SIS 0.6




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (TIKA-1824) Tika 2.0 - Create Initial Parser Modules

2016-02-27 Thread Luis Filipe Nassif (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170711#comment-15170711
 ] 

Luis Filipe Nassif commented on TIKA-1824:
--

Great job [~bobpaulin]! I suggest putting MboxParser, OutlookPSTParser and 
RFC822Parser in a separate tika-mail-parser module. OutlookPSTParser depends on 
java-libpst, not on POI, and MboxParser depends on RFC822Parser. Unfortunately, 
Outlook MSG parsing depends on POI and should stay in the tika-office-parser 
module.

> Tika 2.0 -  Create Initial Parser Modules
> -
>
> Key: TIKA-1824
> URL: https://issues.apache.org/jira/browse/TIKA-1824
> Project: Tika
>  Issue Type: Improvement
>Affects Versions: 2.0
>Reporter: Bob Paulin
>Assignee: Bob Paulin
>
> Create initial break down of parser modules.





[jira] [Commented] (TIKA-1860) Tika 2.0 - Create Module OSGi implementations to replace tika-bundle

2016-02-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170666#comment-15170666
 ] 

Hudson commented on TIKA-1860:
--

SUCCESS: Integrated in tika-2.x #37 (See 
[https://builds.apache.org/job/tika-2.x/37/])
TIKA-1860 - Disable osgi integration tests.  Still an issue with good (bob: rev 
125902c47d3c15d1c6021b20b2cf4affb99e5b7c)
* tika-parser-modules/pom.xml







[jira] [Commented] (TIKA-1860) Tika 2.0 - Create Module OSGi implementations to replace tika-bundle

2016-02-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170650#comment-15170650
 ] 

Hudson commented on TIKA-1860:
--

UNSTABLE: Integrated in tika-2.x #36 (See 
[https://builds.apache.org/job/tika-2.x/36/])
TIKA-1860 - Temp disable osgi integration tests so tika-core builds. (bob: rev 
be43266a14bb3adc29e3ab770f573d6da6b7b871)
* tika-parser-modules/pom.xml
* tika-parser-modules/tika-parser-cad-module/pom.xml
* tika-parser-modules/tika-parser-multimedia-module/pom.xml
* tika-parser-modules/tika-parser-advanced-module/pom.xml
* tika-parser-modules/tika-parser-code-module/pom.xml
* tika-parser-modules/tika-parser-crypto-module/pom.xml
TIKA-1860 - Enable osgi integration tests (bob: rev 
b95c9f70313b6adfdb478d2dbcceda3d70054c3b)
* tika-parser-modules/tika-parser-advanced-module/pom.xml
* tika-parser-modules/tika-parser-cad-module/pom.xml
* tika-parser-modules/tika-parser-multimedia-module/pom.xml
* tika-parser-modules/tika-parser-crypto-module/pom.xml
* tika-parser-modules/tika-parser-code-module/pom.xml







[jira] [Commented] (TIKA-1860) Tika 2.0 - Create Module OSGi implementations to replace tika-bundle

2016-02-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170630#comment-15170630
 ] 

Hudson commented on TIKA-1860:
--

SUCCESS: Integrated in tika-2.x #35 (See 
[https://builds.apache.org/job/tika-2.x/35/])
TIKA-1860 - Temp disable osgi integration tests so tika-core builds. (bob: rev 
e5d43a3a18f88ba2e1ea36f9d29fb164b190785e)
* tika-parser-modules/pom.xml







[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-27 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170601#comment-15170601
 ] 

Tim Allison commented on TIKA-1865:
---

http://download.microsoft.com/download/5/D/D/5DD33FDF-91F5-496D-9884-0A0B0EE698BB/[MS-OXMSG].pdf
 

If anyone has time and the inclination...

> Save sender email address in Outlook MSG metadata
> -
>
> Key: TIKA-1865
> URL: https://issues.apache.org/jira/browse/TIKA-1865
> Project: Tika
>  Issue Type: Improvement
>  Components: parser
>Affects Versions: 1.11
> Environment: Windows 7 x64, jre 1.8.0_60 x64
>Reporter: Luis Filipe Nassif
>
> The sender email address is lost when extracting metadata from Outlook MSG files. 
> Currently only the sender name is extracted. The address is important 
> information for search engines.





[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-27 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170599#comment-15170599
 ] 

Tim Allison commented on TIKA-1865:
---

Outlook shows part of a name, but no address, and I couldn't see an address with 
a hex editor either. POI has a really useful MSG dumper to display chunks; that's 
the next step.
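As a starting point for that chunk inspection, here is a minimal sketch of how MSG property streams could be named and decoded, assuming the convention described in [MS-OXMSG]: each property lives in a stream called `__substg1.0_` followed by a four-hex-digit property ID and a four-hex-digit property type. The property IDs in the candidate list are taken from [MS-OXPROPS] and should be double-checked against the spec; the class and method names are hypothetical and are not POI's API. Note that for senders inside an Exchange organization the sender email property may hold an Exchange-style address rather than an SMTP one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of [MS-OXMSG] property stream naming: "__substg1.0_" + 4-hex property ID
// + 4-hex property type. Names here are hypothetical (not POI's API); the property
// IDs are assumed from [MS-OXPROPS] and should be verified.
final class MsgSenderStreams {
    static final String PREFIX = "__substg1.0_";
    static final int PT_UNICODE = 0x001F; // PtypString, UTF-16LE
    static final int PT_STRING8 = 0x001E; // PtypString8, code-page string

    /** Builds the stream name for a property ID and property type. */
    static String streamName(int propertyId, int propertyType) {
        return String.format("%s%04X%04X", PREFIX, propertyId, propertyType);
    }

    /** Splits a stream name into {propertyId, propertyType}; returns null if it does not match. */
    static int[] parse(String name) {
        if (!name.matches("__substg1\\.0_[0-9A-Fa-f]{8}")) {
            return null;
        }
        String hex = name.substring(PREFIX.length());
        return new int[] {
            Integer.parseInt(hex.substring(0, 4), 16),
            Integer.parseInt(hex.substring(4), 16)
        };
    }

    /** Sender-related properties worth looking for in a chunk dump (assumed list). */
    static Map<String, Integer> senderProperties() {
        Map<String, Integer> props = new LinkedHashMap<>();
        props.put("PidTagSenderName", 0x0C1A);
        props.put("PidTagSenderEmailAddress", 0x0C1F);
        props.put("PidTagSenderSmtpAddress", 0x5D01);
        props.put("PidTagSentRepresentingEmailAddress", 0x0065);
        return props;
    }

    public static void main(String[] args) {
        // Print both string variants, since a file may use either one.
        for (Map.Entry<String, Integer> e : senderProperties().entrySet()) {
            System.out.println(e.getKey() + ": " + streamName(e.getValue(), PT_UNICODE)
                + " or " + streamName(e.getValue(), PT_STRING8));
        }
    }
}
```

Running it lists, for each candidate property, the Unicode and 8-bit stream names one would look for in a chunk dump of a sample file.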






[jira] [Commented] (TIKA-1877) On updating the tika-mimetypes.xml to detect .fts file format, tika detector does not return anything

2016-02-27 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170579#comment-15170579
 ] 

Nick Burch commented on TIKA-1877:
--

Posting the whole modified tika mimetypes file isn't ideal - it's hard for us 
to see what has changed and what hasn't, especially given the file's large 
size. Would you be able to post a patch/diff showing just your changes, to help 
us review and possibly spot the issue?

(I tried diffing it against trunk, but got so many changes that I couldn't tell 
which of them were supposed to be yours.)

Ideally, it would also help if you could write a short JUnit test showing the 
detection issue. That's generally much quicker and easier to test with, and has 
the bonus of providing a check that, once fixed, it stays fixed!

> On updating the tika-mimetypes.xml to detect .fts file format, tika detector 
> does not return anything
> -
>
> Key: TIKA-1877
> URL: https://issues.apache.org/jira/browse/TIKA-1877
> Project: Tika
>  Issue Type: Bug
>  Components: mime
>Reporter: Prasad Nagaraj Subramanya
>Priority: Minor
> Attachments: 
> 4E8D6B46E2366D7063DE3926AF0F976A0DCCD57A7E3B53B7D54768F16DD23984, 
> tika-mimetypes.xml
>
>
> The match value for .fts file format in tika-mimetypes.xml is "SIMPLE  =  
>   T".
> Tika detected a .fts file as application/octet-stream. On verifying the 
> header I found the value to be "SIMPLE  =T"(just 16 spaces 
> before = and T)
> I tried the following changes-
> Change 1) Updated the existing match value, but the build failed.
> Change 2) Added a new match value  type="string" offset="0"/> after the existing one.
> But now Tika returns an empty value: it identifies the file neither as .fts nor 
> as application/octet-stream.
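In the spirit of the unit test requested above, here is a library-free sketch of why an exact `type="string"` match at offset 0 is sensitive to padding. It assumes the fixed-format FITS layout (keyword padded to 8 columns, `=` in column 9, logical `T` in column 30); the header with 16 spaces before `T` is a hypothetical stand-in for a non-standard file, and the class and method names are illustrative only, not Tika's `MimeTypes` code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative model of an exact string magic match at a fixed offset.
// Not Tika's implementation; names are hypothetical.
final class FitsMagicCheck {

    /** Returns true if data contains magic's ISO-8859-1 bytes starting at offset. */
    static boolean matchesAt(byte[] data, String magic, int offset) {
        byte[] m = magic.getBytes(StandardCharsets.ISO_8859_1);
        if (offset < 0 || data.length - offset < m.length) {
            return false; // too short to contain the magic
        }
        return Arrays.equals(Arrays.copyOfRange(data, offset, offset + m.length), m);
    }

    /** "SIMPLE  =" followed by n spaces and "T"; n = 20 puts T in column 30. */
    static String card(int spacesBeforeT) {
        StringBuilder sb = new StringBuilder("SIMPLE  =");
        for (int i = 0; i < spacesBeforeT; i++) {
            sb.append(' ');
        }
        return sb.append('T').toString();
    }

    public static void main(String[] args) {
        String magic = card(20); // assumed fixed-format header prefix
        byte[] standard = card(20).getBytes(StandardCharsets.ISO_8859_1);
        byte[] shorterPadding = card(16).getBytes(StandardCharsets.ISO_8859_1); // hypothetical file
        System.out.println(matchesAt(standard, magic, 0));       // true
        System.out.println(matchesAt(shorterPadding, magic, 0)); // false
    }
}
```

An exact byte comparison is deliberately strict, so a file padded differently from the rule's string cannot match it; a real JUnit test would run Tika's own detector on the attached sample instead of this model.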





[jira] [Commented] (TIKA-1875) Updating tika-mimetypes.xml to detect .NC files

2016-02-27 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170572#comment-15170572
 ] 

Nick Burch commented on TIKA-1875:
--

As mentioned on the list, there is a GitHub pull request for this: 
https://github.com/apache/tika/pull/78 (it needs some more work before 
committing, though).

> Updating tika-mimetypes.xml to detect .NC files 
> 
>
> Key: TIKA-1875
> URL: https://issues.apache.org/jira/browse/TIKA-1875
> Project: Tika
>  Issue Type: Improvement
>  Components: mime
>Affects Versions: 1.12
>Reporter: Prasad Nagaraj Subramanya
>Priority: Minor
>  Labels: patch
> Fix For: 1.11
>
>
> Adding magic number to detect .NC files
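For reference, a minimal sketch of the kind of check a magic rule for .nc files would encode, assuming the classic NetCDF signature: the bytes `C`, `D`, `F` followed by a version byte (1 for the classic format, 2 for 64-bit offset, 5 for CDF-5). NetCDF-4 files are HDF5 containers with a different signature and are not covered here. The helper is hypothetical and is not the content of pull request #78.

```java
// Hypothetical helper: names the classic NetCDF variant from a file's first four bytes.
final class NetCdfMagic {

    /** Returns the format variant, or null if the bytes are not a classic NetCDF signature. */
    static String classify(byte[] head) {
        if (head.length < 4 || head[0] != 'C' || head[1] != 'D' || head[2] != 'F') {
            return null;
        }
        switch (head[3]) { // version byte
            case 1:
                return "classic";
            case 2:
                return "64-bit offset";
            case 5:
                return "CDF-5";
            default:
                return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(new byte[] {'C', 'D', 'F', 1}));           // classic
        System.out.println(classify(new byte[] {'C', 'D', 'F', 2}));           // 64-bit offset
        System.out.println(classify(new byte[] {(byte) 0x89, 'H', 'D', 'F'})); // null (HDF5 prefix)
    }
}
```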


