[jira] [Commented] (TIKA-1617) Change OSGi Detection test to use OSGi Service

2015-05-14 Thread Bob Paulin (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544515#comment-14544515
 ] 

Bob Paulin commented on TIKA-1617:
--

{quote}
 Does that mean that the other unit tests in there that use a Tika object are 
running outside of the bundle? 
{quote}
Generally I would say you want users to use the service rather than 
instantiating their own Tika objects.  When you use PAX-EXAM for unit tests in 
OSGi, it creates a bundle containing just the unit tests (see 
http://iocanel.blogspot.com/2012/01/advanced-integration-testing-with-pax.html).
I'm not certain what the use case would be for instantiating a new Tika object 
when I already have one in a service, which lets me keep the Tika object in 
its own classloader, isolated from the rest of the code in the test bundle. The 
less we expose, the fewer classes we need to export out of the tika-bundle.
{quote}
And if so, how can we get the Tika facade class to play nicely with OSGi?
{quote}
What's not playing nicely right now?  I believe the test also passed with the 
Tika object there.  I just think that if you have a service it's better to use 
it, since users will generally look to unit tests for implementation hints.


> Change OSGi Detection test to use OSGi Service
> --
>
> Key: TIKA-1617
> URL: https://issues.apache.org/jira/browse/TIKA-1617
> Project: Tika
> Issue Type: Test
> Reporter: Bob Paulin
> Priority: Minor
> Attachments: TIKA-1617.patch
>
>
> Currently the testDetection test does not actually use the OSGi service 
> created within the OSGi Framework.  I've changed the test to use the service 
> defined in the tika-bundle



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TIKA-1630) Mention APK support in List of Supported Formats

2015-05-14 Thread Lorenz Leutgeb (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544315#comment-14544315
 ] 

Lorenz Leutgeb commented on TIKA-1630:
--

The page states "document formats supported by Apache Tika" and continues "and 
how it is parsed by Tika". For me as a user not deeply involved in Tika, 
that's slightly misleading and translates to "everything that Tika supports in 
any form". A user like me may not be aware of the difference between "able to 
parse" and "able to recognise". This reading may not be accurate, but it is 
what you understand if you land on that page.

So it appears that this renders my request obsolete, but it shows that the 
wording on that page is easily misunderstood.

The list of formats for which Tika only offers detection might be less helpful, 
but it would have been exactly what I wanted. So I suggest adding a reference 
to it on the page.

I do not know in what detail Tika "supports" APK files, and would guess that 
contributors might be the right people to answer this.

> Mention APK support in List of Supported Formats
> 
>
> Key: TIKA-1630
> URL: https://issues.apache.org/jira/browse/TIKA-1630
> Project: Tika
> Issue Type: Improvement
> Components: documentation
> Affects Versions: 1.8
> Reporter: Lorenz Leutgeb
> Priority: Trivial
>
> http://tika.apache.org/1.8/formats.html claims to offer a "full list of 
> supported formats" but does not mention support for APK files at all.
> I trusted that source and only found that Tika supports APK files and their 
> respective MIME types from looking at Tika's codebase, which is suboptimal.
> Please add APK files to that list as appropriate (at least include the MIME 
> type Tika understands).
> Consider reevaluating the list to find out whether other formats are missing 
> (this is not covered by this ticket).





[jira] [Commented] (TIKA-1630) Mention APK support in List of Supported Formats

2015-05-14 Thread Lorenz Leutgeb (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544243#comment-14544243
 ] 

Lorenz Leutgeb commented on TIKA-1630:
--

Support for APK file detection and awareness is indicated in 
[tika-mimetypes.xml|https://github.com/apache/tika/blob/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml#L255]
 and in 
[ZipContainerDetector|https://github.com/apache/tika/blob/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pkg/ZipContainerDetector.java#L327].

Those URLs point to "trunk". I am not familiar with SVN, but AFAIK it 
corresponds to Git's "master" and should be a recent version.

Yes, I am referring to Android application packages.
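
To make the difference between "able to recognise" and "able to parse" 
concrete, here is a rough sketch of container-level detection using only 
java.util.zip. It is not taken from Tika: the class and method names are 
hypothetical, and the rule "a ZIP with a top-level AndroidManifest.xml entry 
is an APK" is an assumption made for illustration. Nothing inside the entries 
is read, so no metadata or text comes out of it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ApkDetectionSketch {
    static final String APK = "application/vnd.android.package-archive";
    static final String ZIP = "application/zip";
    static final String UNKNOWN = "application/octet-stream";

    // Detection only: inspect entry names, never interpret entry contents.
    static String detect(byte[] data) {
        boolean sawEntry = false;
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                sawEntry = true;
                // Assumption: a top-level AndroidManifest.xml marks an APK.
                if ("AndroidManifest.xml".equals(entry.getName())) {
                    return APK;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Any other readable ZIP is reported as a plain ZIP; everything else is unknown.
        return sawEntry ? ZIP : UNKNOWN;
    }

    // Hypothetical helper: builds a small in-memory ZIP with the given entry names.
    static byte[] zipWith(String... names) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buffer)) {
            for (String name : names) {
                out.putNextEntry(new ZipEntry(name));
                out.write(name.getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(detect(zipWith("AndroidManifest.xml", "classes.dex")));
        System.out.println(detect(zipWith("readme.txt")));
    }
}
```

A parser, by contrast, would have to open the entries and turn them into 
metadata or text, which is the part the supported formats page is about.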



[jira] [Closed] (TIKA-1624) Syntax error in DOAP file release section

2015-05-14 Thread Ken Krugler (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ken Krugler closed TIKA-1624.
-
Resolution: Done

With Tyler's change to the release procedure doc on the wiki 
(https://wiki.apache.org/tika/ReleaseProcess) I believe this is now complete.

> Syntax error in DOAP file release section
> -
>
> Key: TIKA-1624
> URL: https://issues.apache.org/jira/browse/TIKA-1624
> Project: Tika
> Issue Type: Bug
> Environment: 
> http://svn.apache.org/repos/asf/tika/site/src/site/resources/doap.rdf
> Reporter: Sebb
> Assignee: Ken Krugler
>
> DOAP files can contain details of multiple release Versions, however each 
> must be listed in a separate release section, for example:
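> <release>
>   <Version>
>     <name>Apache XYZ</name>
>     <created>2015-02-16</created>
>     <revision>1.6.2</revision>
>   </Version>
> </release>
> <release>
>   <Version>
>     <name>Apache XYZ</name>
>     <created>2014-09-24</created>
>     <revision>1.6.1</revision>
>   </Version>
> </release>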
> Please can the project DOAP be corrected accordingly?
> Thanks.
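
The rule in the description (each Version in its own release section) can be 
checked mechanically. The following is only a sketch, assuming the usual DOAP 
namespace http://usefulinc.com/ns/doap# and the element names release and 
Version; the class and method names are hypothetical, and it has not been run 
against the actual doap.rdf.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DoapReleaseCheck {
    // Assumption: the file uses the standard DOAP namespace.
    static final String DOAP_NS = "http://usefulinc.com/ns/doap#";

    // Counts release elements that do not contain exactly one Version child.
    static int countBadReleases(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList releases = doc.getElementsByTagNameNS(DOAP_NS, "release");
            int bad = 0;
            for (int i = 0; i < releases.getLength(); i++) {
                int versions = 0;
                NodeList children = releases.item(i).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    if (child.getNodeType() == Node.ELEMENT_NODE
                            && DOAP_NS.equals(child.getNamespaceURI())
                            && "Version".equals(child.getLocalName())) {
                        versions++;
                    }
                }
                if (versions != 1) {
                    bad++;
                }
            }
            return bad;
        } catch (Exception e) {
            // Unparseable input is reported as an unchecked error in this sketch.
            throw new IllegalArgumentException("could not parse DOAP XML", e);
        }
    }

    public static void main(String[] args) {
        String twoVersionsInOneRelease =
                "<Project xmlns=\"" + DOAP_NS + "\"><release>"
                + "<Version><revision>1.6.2</revision></Version>"
                + "<Version><revision>1.6.1</revision></Version>"
                + "</release></Project>";
        System.out.println(countBadReleases(twoVersionsInOneRelease));
    }
}
```

A result of zero means every release section holds a single Version, which is 
the layout the description asks for.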





[jira] [Commented] (TIKA-1624) Syntax error in DOAP file release section

2015-05-14 Thread Ken Krugler (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544193#comment-14544193
 ] 

Ken Krugler commented on TIKA-1624:
---

As per Chris Mattmann's email, "You should only have to update the doap in the 
src dir - when you run mvn install from the 
http://svn.apache.org/repos/asf/tika/site/ dir it will publish into the 
publish folder automatically."




Re: Published Site Changes

2015-05-14 Thread Nick Burch

On Thu, 14 May 2015, Tyler Palsulich wrote:

> I was about to update the site for TIKA-1619 (checksums wrong on the site),
> but found unpublished changes in the site. This is the status after
> checking out the repo and running `mvn install`:
>
> ➜  site  svn status
> M   publish/1.7/examples.html
> M   publish/1.8/examples.html
> M   publish/1.9/examples.html

At some point, we may need to pin the examples page to the state of the 
examples area of the repo at the release. Right now, when we add more 
examples code, the older examples pages will pull them in, which isn't 
always correct...

> M   publish/1.8/index.html
> M   publish/doap.rdf
> M   publish/plugin-management.html
> X   src/examples-src


No idea about the 1.8 index page, sorry.

Doap looks expected. The plugin page is auto-generated by Maven, so it updates 
fairly often. The X on src/examples-src just tells you it's an svn 
external: it pulls in the example Java source files so that snippets 
can be put in the examples page.


Nick

[jira] [Commented] (TIKA-1624) Syntax error in DOAP file release section

2015-05-14 Thread Tyler Palsulich (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544150#comment-14544150
 ] 

Tyler Palsulich commented on TIKA-1624:
---

[~kkrugler], yes. I just updated the release instructions.



Published Site Changes

2015-05-14 Thread Tyler Palsulich
Hi Everyone,

I was about to update the site for TIKA-1619 (checksums wrong on the site),
but found unpublished changes in the site. This is the status after
checking out the repo and running `mvn install`:

➜  site  svn status
M   publish/1.7/examples.html
M   publish/1.8/examples.html
M   publish/1.8/index.html
M   publish/1.9/examples.html
M   publish/doap.rdf
M   publish/plugin-management.html
X   src/examples-src

Not all of the changes are correct (e.g. one makes the list of contributors for
1.8 point to the list for 1.7), so I don't want to commit all of them. Maybe
someone (probably me) didn't add site/src when committing to site/publish?

I think the doap.rdf change was from r1678405, but I don't know about the
others.

Anyone have any ideas/clean solutions before I check each page by hand and
redo any necessary 1.7/8/9 changes?

Thanks,
Tyler


[jira] [Commented] (TIKA-1630) Mention APK support in List of Supported Formats

2015-05-14 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544129#comment-14544129
 ] 

Nick Burch commented on TIKA-1630:
--

The supported formats page covers those formats for which Tika is able to 
extract Metadata and/or Textual Content. It doesn't mention those for which 
Tika is only able to offer detection, as the latter list would be much longer 
and possibly less helpful. (The latter list is easily available from the Tika 
App or Tika Server)

Does Tika have APK support at the parser level, giving Metadata and/or Text 
Content?



[jira] [Commented] (TIKA-1630) Mention APK support in List of Supported Formats

2015-05-14 Thread Tyler Palsulich (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544104#comment-14544104
 ] 

Tyler Palsulich commented on TIKA-1630:
---

Hi. Thanks for reporting this! Can you be a little more specific about which 
file is supported? What in the Tika codebase indicates support for APK formats? 
Also, just to be clear, are you referring to Android application packages?



[jira] [Created] (TIKA-1630) Mention APK support in List of Supported Formats

2015-05-14 Thread Lorenz Leutgeb (JIRA)
Lorenz Leutgeb created TIKA-1630:


 Summary: Mention APK support in List of Supported Formats
 Key: TIKA-1630
 URL: https://issues.apache.org/jira/browse/TIKA-1630
 Project: Tika
  Issue Type: Improvement
  Components: documentation
Affects Versions: 1.8
Reporter: Lorenz Leutgeb
Priority: Trivial


http://tika.apache.org/1.8/formats.html claims to offer a "full list of 
supported formats" but does not mention support for APK files at all.

I trusted that source and only found that Tika supports APK files and their 
respective MIME types from looking at Tika's codebase, which is suboptimal.

Please add APK files to that list as appropriate (at least include the MIME 
type Tika understands).

Consider reevaluating the list to find out whether other formats are missing 
(this is not covered by this ticket).





[jira] [Updated] (TIKA-1622) Expose Tika LanguageIdentifier via Tika Server

2015-05-14 Thread Thomas Ledoux (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Ledoux updated TIKA-1622:

Attachment: TIKA-1622-cestcommeci.patch

Apologies for not testing my first patch.
I suppose the confusion comes from the Italian 'come ci'. Nevertheless, giving a 
little more context does work. 
So here is a second patch that does work:
{code}
---
 T E S T S
---
Running org.apache.tika.server.LanguageResourceTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.734 sec - in 
org.apache.tika.server.LanguageResourceTest

Results :

Tests run: 4, Failures: 0, Errors: 0, Skipped: 0

[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
{code}

> Expose Tika LanguageIdentifier via Tika Server
> --
>
> Key: TIKA-1622
> URL: https://issues.apache.org/jira/browse/TIKA-1622
> Project: Tika
> Issue Type: Bug
> Components: languageidentifier, server
> Reporter: Chris A. Mattmann
> Assignee: Chris A. Mattmann
> Fix For: 1.9
>
> Attachments: TIKA-1622-cestcommeci.patch, TIKA-1622-commeci.patch
>
>
> The LanguageIdentifier in Tika should be exposed via Tika JAX-RS.


