[jira] [Commented] (SLING-2924) Full text extraction issue with Tika v1.0 under OSGi environment

2013-07-22 Thread Robert Munteanu (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13715013#comment-13715013
 ] 

Robert Munteanu commented on SLING-2924:


You're perfectly right, that's a much better solution. I've looked at the Tika
bundles, and as of version 1.4 they don't provide the metadata we need.

I suggest you file a bug against Tika for this and, if they agree, file a bug
here so that we can consume the fix.

 Full text extraction issue with Tika v1.0 under OSGi environment
 

 Key: SLING-2924
 URL: https://issues.apache.org/jira/browse/SLING-2924
 Project: Sling
 Issue Type: Bug
 Components: Launchpad
 Reporter: Anjan
 Assignee: Robert Munteanu
 Labels: tika,text-extraction
 Fix For: Launchpad Builder 7


 The latest stable build of Sling (I checked out revision 1487628) uses
 Jackrabbit version 2.4.2, which uses Tika version 1.0 to extract metadata
 and text for indexing.  Jackrabbit v2.4.2 deployed as a separate web
 application extracts metadata and text from uploaded documents without
 problems, but when deployed in Sling (OSGi environment), full text
 extraction doesn't work.
 Updating the Tika dependency to version 1.2 in Sling resolved this issue.
 Secondly, if the indexes are deleted from the repository and the server is
 restarted, the indexes are not rebuilt for the existing documents.  The Tika
 bundles are not ready by the time Jackrabbit starts to rebuild the indexes
 during Sling server startup.  Moving the Tika bundles from start level 15
 to 10 helps to resolve the issue.
 The changes for both fixes are in the
 sling/launchpad/builder/src/main/bundles/list.xml file.
 Currently the Tika bundles are at start level 15, as shown below:
 <startLevel level="15">
 ..
   <bundle>
     <groupId>org.apache.tika</groupId>
     <artifactId>tika-core</artifactId>
     <version>1.0</version>
   </bundle>
   <bundle>
     <groupId>org.apache.tika</groupId>
     <artifactId>tika-bundle</artifactId>
     <version>1.0</version>
   </bundle>
 ..
 </startLevel>
 I moved the above bundles to start level 10 and changed the version to 1.2:
 <startLevel level="10">
 ..
   <bundle>
     <groupId>org.apache.tika</groupId>
     <artifactId>tika-core</artifactId>
     <version>1.2</version>
   </bundle>
   <bundle>
     <groupId>org.apache.tika</groupId>
     <artifactId>tika-bundle</artifactId>
     <version>1.2</version>
   </bundle>
 ..
 </startLevel>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (SLING-2924) Full text extraction issue with Tika v1.0 under OSGi environment

2013-07-17 Thread Oliver Lietz (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710999#comment-13710999
 ] 

Oliver Lietz commented on SLING-2924:

Please use Provide-Capability and Require-Capability instead of start levels.

http://wiki.osgi.org/wiki/Provide-Capability
http://wiki.osgi.org/wiki/Require-Capability
http://blog.osgi.org/2012/03/requirements-and-capabilities.html
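
As a rough illustration of the two headers named above (not taken from the Tika or Sling sources), they could be declared through maven-bundle-plugin instructions, which are passed to bnd and written into the bundle manifest. The capability namespace example.text.extraction and its type attribute are hypothetical, invented only for this sketch. Note that these headers constrain bundle resolution: a bundle with an unsatisfied mandatory Require-Capability does not resolve and therefore cannot start.

```xml
<!-- Sketch only: "example.text.extraction" and "type" are hypothetical names. -->

<!-- In the providing bundle's pom.xml (maven-bundle-plugin configuration): -->
<instructions>
  <Provide-Capability>
    example.text.extraction;type=tika
  </Provide-Capability>
</instructions>

<!-- In the consuming bundle's pom.xml: -->
<instructions>
  <Require-Capability>
    example.text.extraction;filter:="(type=tika)"
  </Require-Capability>
</instructions>
```

With a declaration like this, the dependency is stated in bundle metadata and checked by the framework, instead of relying on the relative start levels in list.xml.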





[jira] [Commented] (SLING-2924) Full text extraction issue with Tika v1.0 under OSGi environment

2013-07-10 Thread Robert Munteanu (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704428#comment-13704428
 ] 

Robert Munteanu commented on SLING-2924:


Updated the test timeout to 10 seconds to address Jenkins failures in 
http://svn.apache.org/viewvc?view=revision&revision=1501716



[jira] [Commented] (SLING-2924) Full text extraction issue with Tika v1.0 under OSGi environment

2013-07-09 Thread Robert Munteanu (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13703314#comment-13703314
 ] 

Robert Munteanu commented on SLING-2924:


I've updated the bundles to tika-1.2 and verified the behaviour using an
integration test in http://svn.apache.org/viewvc?view=revision&revision=1501285.

 Full text extraction issue with Tika v1.0 under OSGi environment
 

 Key: SLING-2924
 URL: https://issues.apache.org/jira/browse/SLING-2924
 Project: Sling
 Issue Type: Bug
 Components: JCR
 Reporter: Anjan
 Assignee: Robert Munteanu
 Labels: tika,text-extraction
