[jira] [Commented] (SLING-2924) Full text extraction issue with Tika v1.0 under OSGi environment
[ https://issues.apache.org/jira/browse/SLING-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13715013#comment-13715013 ]

Robert Munteanu commented on SLING-2924:

You're perfectly right; that's a much better solution. I've looked at the Tika bundles, and as of version 1.4 they don't provide the metadata we need. I suggest you file a bug against Tika for this and, if they accept it, file an issue here so that we can consume the fix.

Full text extraction issue with Tika v1.0 under OSGi environment

Key: SLING-2924
URL: https://issues.apache.org/jira/browse/SLING-2924
Project: Sling
Issue Type: Bug
Components: Launchpad
Reporter: Anjan
Assignee: Robert Munteanu
Labels: tika, text-extraction
Fix For: Launchpad Builder 7

The latest stable build of Sling (I checked out revision 1487628) uses Jackrabbit 2.4.2, which uses Tika 1.0 to extract metadata and text for indexing. When Jackrabbit 2.4.2 is deployed as a standalone web application, it extracts metadata and text from uploaded documents correctly. When it is deployed in Sling (an OSGi environment), full-text extraction does not work. Updating the Tika dependency in Sling to version 1.2 resolved this issue.

Second, if the indexes are deleted from the repository and the server is restarted, the indexes are not rebuilt for existing documents. This happens because the Tika bundles are not yet ready when Jackrabbit starts rebuilding the indexes during Sling startup. Changing the start level of the Tika bundles from 15 to 10 helps resolve this issue.

Both changes are in the file sling/launchpad/builder/src/main/bundles/list.xml. Currently the Tika bundles are at start level 15:

    <startLevel level="15">
        ..
        <bundle>
            <groupId>org.apache.tika</groupId>
            <artifactId>tika-core</artifactId>
            <version>1.0</version>
        </bundle>
        <bundle>
            <groupId>org.apache.tika</groupId>
            <artifactId>tika-bundle</artifactId>
            <version>1.0</version>
        </bundle>
        ..
    </startLevel>

I moved these bundles to start level 10 and changed their version to 1.2:

    <startLevel level="10">
        ..
        <bundle>
            <groupId>org.apache.tika</groupId>
            <artifactId>tika-core</artifactId>
            <version>1.2</version>
        </bundle>
        <bundle>
            <groupId>org.apache.tika</groupId>
            <artifactId>tika-bundle</artifactId>
            <version>1.2</version>
        </bundle>
        ..
    </startLevel>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
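Robert Munteanu's comment concludes that the Tika bundles, up to version 1.4, do not provide the metadata needed. One way to check this for a given release is to read the bundle's manifest headers. This assumes that the missing metadata means the Provide-Capability and Require-Capability headers discussed in this thread.

The following minimal sketch uses only java.util.jar. The class name CapabilityHeaders is hypothetical. The program only reports which of the two headers a JAR declares, so deciding whether a declared capability is the one Sling would need remains a manual step.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper: reports the OSGi generic capability headers of a bundle JAR.
final class CapabilityHeaders {

    // Manifest headers for generic requirements and capabilities (OSGi Core R4.3 and later).
    static final String[] HEADERS = {"Provide-Capability", "Require-Capability"};

    // Returns the capability headers found in the main manifest section; absent ones are skipped.
    static Map<String, String> capabilityHeaders(Manifest manifest) {
        Attributes main = manifest.getMainAttributes();
        Map<String, String> found = new LinkedHashMap<>();
        for (String header : HEADERS) {
            String value = main.getValue(header);
            if (value != null) {
                found.put(header, value);
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: CapabilityHeaders <bundle.jar>");
            System.exit(1);
        }
        try (JarFile jar = new JarFile(args[0])) {
            Manifest manifest = jar.getManifest();
            if (manifest == null) {
                System.out.println(args[0] + ": no manifest");
                return;
            }
            Map<String, String> headers = capabilityHeaders(manifest);
            if (headers.isEmpty()) {
                System.out.println(args[0] + ": no Provide-Capability or Require-Capability header");
            }
            headers.forEach((name, value) -> System.out.println(args[0] + ": " + name + ": " + value));
        }
    }
}
```

Pass the path of a bundle JAR, for example a tika-bundle release, as the only argument.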
[jira] [Commented] (SLING-2924) Full text extraction issue with Tika v1.0 under OSGi environment
[ https://issues.apache.org/jira/browse/SLING-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710999#comment-13710999 ]

Oliver Lietz commented on SLING-2924:

Please use Provide-Capability and Require-Capability instead of start levels.

http://wiki.osgi.org/wiki/Provide-Capability
http://wiki.osgi.org/wiki/Require-Capability
http://blog.osgi.org/2012/03/requirements-and-capabilities.html
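Oliver Lietz suggests replacing ordering by start level with declared dependencies. The sketch below is a rough illustration of the idea. It models how a requirement is matched against provided capabilities: the namespaces must be equal, and the requirement's filter must hold for the capability's attributes.

This is a deliberately simplified model, not the OSGi resolver:

- It handles one clause per header.
- It supports only a single (key=value) equality filter.
- It ignores typed attributes.
- The namespace org.example.textextraction and the attribute provider are hypothetical placeholders, not anything Tika or Jackrabbit declares.

The record syntax requires Java 16 or later.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of OSGi generic capability matching (not a real resolver).
final class CapabilityMatcher {

    // One clause of a Provide-Capability or Require-Capability header:
    // a namespace followed by attributes (key=value) and directives (key:=value).
    record Clause(String namespace, Map<String, String> params) {}

    // Parses a single clause such as ns;key=value;filter:="(key=value)".
    // Limitations: one clause only (no commas), no typed attributes,
    // no semicolons inside quoted values.
    static Clause parseClause(String clause) {
        String[] parts = clause.split(";");
        Map<String, String> params = new HashMap<>();
        for (int i = 1; i < parts.length; i++) {
            String part = parts[i].trim();
            int eq = part.indexOf('=');
            if (eq <= 0) {
                continue; // skip malformed parts in this sketch
            }
            String key = part.substring(0, eq).trim();
            if (key.endsWith(":")) {
                key = key.substring(0, key.length() - 1); // directive, e.g. filter:=
            }
            String value = part.substring(eq + 1).trim();
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            params.put(key, value);
        }
        return new Clause(parts[0].trim(), params);
    }

    // True if the namespaces are equal and the requirement's filter (if any)
    // is a single (key=value) test satisfied by the capability's attributes.
    static boolean matches(Clause requirement, Clause capability) {
        if (!requirement.namespace().equals(capability.namespace())) {
            return false;
        }
        String filter = requirement.params().get("filter");
        if (filter == null) {
            return true; // no filter: any capability in the namespace matches
        }
        String body = filter.startsWith("(") && filter.endsWith(")")
                ? filter.substring(1, filter.length() - 1) : "";
        int eq = body.indexOf('=');
        if (eq <= 0 || "<>~".indexOf(body.charAt(eq - 1)) >= 0
                || body.contains("(") || body.contains(")") || body.contains("*")) {
            throw new IllegalArgumentException("only (key=value) filters are supported: " + filter);
        }
        String key = body.substring(0, eq);
        String expected = body.substring(eq + 1);
        return expected.equals(capability.params().get(key));
    }

    // A requirement is satisfied if at least one provided capability matches it.
    static boolean satisfied(String requirement, List<String> provided) {
        Clause req = parseClause(requirement);
        return provided.stream()
                .map(CapabilityMatcher::parseClause)
                .anyMatch(cap -> matches(req, cap));
    }

    public static void main(String[] args) {
        // Hypothetical namespace and attribute, for illustration only.
        String requirement = "org.example.textextraction;filter:=\"(provider=tika)\"";
        List<String> provided = List.of("org.example.textextraction;provider=tika");
        System.out.println(satisfied(requirement, provided)); // prints true
    }
}
```

In a real framework, a bundle with an unmatched mandatory requirement is not resolved and therefore cannot be started.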
[jira] [Commented] (SLING-2924) Full text extraction issue with Tika v1.0 under OSGi environment
[ https://issues.apache.org/jira/browse/SLING-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704428#comment-13704428 ]

Robert Munteanu commented on SLING-2924:

Updated the test timeout to 10 seconds to address Jenkins failures in http://svn.apache.org/viewvc?view=revision&revision=1501716
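A test that checks full-text search results after uploading a document may need to retry rather than assert once. This assumes that extraction and indexing can complete some time after the content is saved.

The following minimal sketch shows a polling wait with a 10-second budget, matching the timeout mentioned in the comment. The helper WaitFor and its placeholder condition are hypothetical and do not reproduce the actual Sling integration test.

```java
import java.time.Duration;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical helper: retry a check until it passes or a deadline expires.
final class WaitFor {

    // Polls the condition every interval until it returns true; throws
    // TimeoutException if it is still false once the timeout has elapsed.
    static void waitFor(BooleanSupplier condition, Duration timeout, Duration interval)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("condition not met within " + timeout);
            }
            Thread.sleep(interval.toMillis());
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        // Placeholder: becomes true after about 300 ms, standing in for
        // "a full-text query finds the uploaded document".
        BooleanSupplier indexed = () -> System.nanoTime() - start > Duration.ofMillis(300).toNanos();
        waitFor(indexed, Duration.ofSeconds(10), Duration.ofMillis(100));
        System.out.println("condition met");
    }
}
```

The deadline is based on System.nanoTime, so wall-clock adjustments do not affect it. The condition is re-checked after every sleep, so a result that appears during the last interval is still seen before the timeout is reported.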
[jira] [Commented] (SLING-2924) Full text extraction issue with Tika v1.0 under OSGi environment
[ https://issues.apache.org/jira/browse/SLING-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13703314#comment-13703314 ]

Robert Munteanu commented on SLING-2924:

I've updated the bundles to tika-1.2 and verified the behaviour using an integration test in http://svn.apache.org/viewvc?view=revision&revision=1501285
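The list.xml change described in the issue moves the Tika bundles from start level 15 to 10 and from version 1.0 to 1.2. It can be checked with a few lines of DOM parsing.

This sketch uses only the JDK's javax.xml and org.w3c.dom APIs. It prints artifactId, version and start level for each bundle with a given groupId. It assumes the startLevel/bundle layout quoted in the issue, wrapped in a bundles root element. The class name TikaBundleLevels is hypothetical, and the XML in main is an illustrative fragment, not the actual Sling file. The text block requires Java 15 or later.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical checker for a list.xml-style bundle list (layout assumed, see lead-in).
final class TikaBundleLevels {

    // Returns "artifactId version level" for each bundle with the given groupId, in document order.
    static List<String> describe(String xml, String groupId) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> result = new ArrayList<>();
        NodeList levels = doc.getElementsByTagName("startLevel");
        for (int i = 0; i < levels.getLength(); i++) {
            Element level = (Element) levels.item(i);
            NodeList bundles = level.getElementsByTagName("bundle");
            for (int j = 0; j < bundles.getLength(); j++) {
                Element bundle = (Element) bundles.item(j);
                if (groupId.equals(text(bundle, "groupId"))) {
                    result.add(text(bundle, "artifactId") + " " + text(bundle, "version")
                            + " " + level.getAttribute("level"));
                }
            }
        }
        return result;
    }

    // Trimmed text of the first descendant element with the given name, or null if absent.
    private static String text(Element parent, String name) {
        NodeList nodes = parent.getElementsByTagName(name);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = """
                <bundles>
                  <startLevel level="10">
                    <bundle>
                      <groupId>org.apache.tika</groupId>
                      <artifactId>tika-core</artifactId>
                      <version>1.2</version>
                    </bundle>
                    <bundle>
                      <groupId>org.apache.tika</groupId>
                      <artifactId>tika-bundle</artifactId>
                      <version>1.2</version>
                    </bundle>
                  </startLevel>
                </bundles>
                """;
        describe(xml, "org.apache.tika").forEach(System.out::println);
    }
}
```

For the fragment in main, it prints "tika-core 1.2 10" and "tika-bundle 1.2 10".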