[jira] Created: (SOLR-2029) Support for Index Time Document Boost in SolrContentHandler
Support for Index Time Document Boost in SolrContentHandler
---

Key: SOLR-2029
URL: https://issues.apache.org/jira/browse/SOLR-2029
Project: Solr
Issue Type: Improvement
Components: contrib - Solr Cell (Tika extraction)
Affects Versions: 1.4.1
Reporter: Jayendra Patil

We are using the extract request handler to index rich content documents with other metadata. However, SolrContentHandler does seem to support the parameter for applying index time document boost. Basically, including document.setDocumentBoost(boost).

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
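For illustration, here is a minimal standalone sketch of the parsing side of such a boost parameter. The class and helper names are hypothetical, not the attached patch; the real change would live in SolrContentHandler and end with document.setDocumentBoost(boost).

```java
// Hypothetical sketch, not the attached patch: parse an optional "boost"
// request parameter the way SolrContentHandler could before calling
// document.setDocumentBoost(boost). Names here are illustrative.
public class BoostSketch {
    static final float DEFAULT_BOOST = 1.0f;

    // Returns the parsed boost, or 1.0 when the parameter is absent or malformed.
    static float parseBoost(String boostParam) {
        if (boostParam == null) {
            return DEFAULT_BOOST;
        }
        try {
            return Float.parseFloat(boostParam);
        } catch (NumberFormatException e) {
            return DEFAULT_BOOST;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseBoost("2.5")); // 2.5
        System.out.println(parseBoost(null));  // 1.0
    }
}
```

Falling back to 1.0 (Lucene's neutral boost) keeps documents without the parameter unaffected.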
[jira] Commented: (SOLR-2416) Solr Cell fails to index Zip file contents
[ https://issues.apache.org/jira/browse/SOLR-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008284#comment-13008284 ]

Jayendra Patil commented on SOLR-2416:
--

This issue existed in Solr 1.4 packaged with Tika 0.4, which prevented us from using the stable version. Thread - http://lucene.472066.n3.nabble.com/Issue-Indexing-zip-file-content-in-Solr-1-4-td504914.html

The issue was resolved with the Tika 0.5 upgrade at https://issues.apache.org/jira/browse/SOLR-1567

We are working on a snapshot of Solr trunk 4.x from around 15 August 2010, which uses the Tika 0.8 snapshot jars, and the extraction works fine for us. However, the latest trunk, upgraded to the stable Tika 0.8 release, does not show the same behaviour.

> Solr Cell fails to index Zip file contents
>
> Key: SOLR-2416
> URL: https://issues.apache.org/jira/browse/SOLR-2416
> Project: Solr
> Issue Type: Bug
> Components: contrib - DataImportHandler, contrib - Solr Cell (Tika extraction)
> Affects Versions: 1.4.1
> Reporter: Jayendra Patil
> Fix For: 3.2
> Attachments: SOLR-2416_ExtractingDocumentLoader.patch
>
> Working with the latest Solr Trunk code and seems the Tika handlers for Solr Cell (ExtractingDocumentLoader.java) and Data Import handler (TikaEntityProcessor.java) fails to index the zip file contents again. It just indexes the file names again.
> This issue was addressed some time back, late last year, but seems to have reappeared with the latest code.
> Jira for the Data Import handler part with the patch and the testcase - https://issues.apache.org/jira/browse/SOLR-2332.
[jira] Created: (SOLR-2156) Solr Replication - SnapPuller fails to clean Old Index Directories on Full Copy
Solr Replication - SnapPuller fails to clean Old Index Directories on Full Copy
---

Key: SOLR-2156
URL: https://issues.apache.org/jira/browse/SOLR-2156
Project: Solr
Issue Type: Improvement
Components: replication (java)
Affects Versions: 4.0
Reporter: Jayendra Patil

We are working on the Solr trunk and have a master and two slaves configuration. Our indexing consists of periodic full and incremental index builds on the master and replication on the slaves.

When a full indexing (clean and rebuild) is performed, we always end up with an extra index folder copy, which holds the complete index, so the size just keeps growing on the slaves. e.g.

drwxr-xr-x 2 tomcat tomcat 4096 2010-10-09 12:10 index
drwxr-xr-x 2 tomcat tomcat 4096 2010-10-11 09:43 index.20101009120649
drwxr-xr-x 2 tomcat tomcat 4096 2010-10-12 10:27 index.20101011094043
-rw-r--r-- 1 tomcat tomcat   75 2010-10-11 09:43 index.properties
-rw-r--r-- 1 tomcat tomcat  422 2010-10-12 10:26 replication.properties
drwxr-xr-x 2 tomcat tomcat   68 2010-10-12 10:27 spellchecker

Here index.20101011094043 is the active index and the other index.xxx directories are no longer used.

The SnapPuller deletes the temporary index directory, but does not delete the old one when the switch is performed for the full copy. The code below should do the trick:

boolean fetchLatestIndex(SolrCore core) throws IOException {
  ..
  } finally {
    if (deleteTmpIdxDir) {
      delTree(tmpIndexDir);
    } else {
      // Delete the old index directory, as the flag is set only after the full copy is performed
      delTree(indexDir);
    }
  }
  .
}
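The cleanup rule proposed in the issue can be sketched as a self-contained program. This is a hypothetical stand-in, not SnapPuller itself: delTree mimics SnapPuller's recursive-delete helper, and cleanup mirrors the proposed finally block.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical standalone sketch of the proposed cleanup rule; the real
// change belongs in SnapPuller.fetchLatestIndex.
public class CleanupSketch {

    // Recursively delete a directory tree, deepest entries first.
    static void delTree(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        }
    }

    // Mirrors the proposed finally block: the flag is cleared only after a
    // full copy completes, in which case the old index directory is the stale one.
    static void cleanup(boolean deleteTmpIdxDir, Path tmpIndexDir, Path indexDir) throws IOException {
        if (deleteTmpIdxDir) {
            delTree(tmpIndexDir);
        } else {
            delTree(indexDir);
        }
    }

    // Demo: after a full copy (flag false), the old index dir is gone and the
    // freshly switched-in directory survives. Returns true when that holds.
    static boolean demo() {
        try {
            Path tmp = Files.createTempDirectory("index.tmp");
            Path old = Files.createTempDirectory("index.old");
            cleanup(false, tmp, old);
            return Files.exists(tmp) && !Files.exists(old);
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true
    }
}
```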
[jira] Updated: (SOLR-2156) Solr Replication - SnapPuller fails to clean Old Index Directories on Full Copy
[ https://issues.apache.org/jira/browse/SOLR-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jayendra Patil updated SOLR-2156:
-
Attachment: Solr-2156_SnapPuller.patch

Attached the Fix.

> Solr Replication - SnapPuller fails to clean Old Index Directories on Full Copy
> Key: SOLR-2156
> URL: https://issues.apache.org/jira/browse/SOLR-2156
> Attachments: Solr-2156_SnapPuller.patch
[jira] Commented: (SOLR-2156) Solr Replication - SnapPuller fails to clean Old Index Directories on Full Copy
[ https://issues.apache.org/jira/browse/SOLR-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928928#action_12928928 ]

Jayendra Patil commented on SOLR-2156:
--

There are different conditions under which the flag gets set to false. A full index download is performed if the index is stale (common index files on the slave do not exist on the master, or differ in size or timestamp - the isIndexStale method) or if the slave is newer than the master (slave generation > master generation). In our case we do a clean build, so the files on the slave don't exist on the master and hence the flag is set to false.

> Solr Replication - SnapPuller fails to clean Old Index Directories on Full Copy
> Key: SOLR-2156
> URL: https://issues.apache.org/jira/browse/SOLR-2156
> Attachments: Solr-2156_SnapPuller.patch
[jira] Updated: (SOLR-2029) Support for Index Time Document Boost in SolrContentHandler
[ https://issues.apache.org/jira/browse/SOLR-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jayendra Patil updated SOLR-2029:
-
Attachment: SolrContentHandler.patch

Attached is the Fix Patch. The parameter name to be passed is boost.

> Support for Index Time Document Boost in SolrContentHandler
> Key: SOLR-2029
> URL: https://issues.apache.org/jira/browse/SOLR-2029
> Attachments: SolrContentHandler.patch
[jira] Updated: (SOLR-2029) Support for Index Time Document Boost in SolrContentHandler
[ https://issues.apache.org/jira/browse/SOLR-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jayendra Patil updated SOLR-2029:
-
Description:
We are using the extract request handler to index rich content documents with other metadata. However, SolrContentHandler does not seem to support the parameter for applying index time document boost. Basically, including document.setDocumentBoost(boost).

was:
We are using the extract request handler to index rich content documents with other metadata. However, SolrContentHandler does seem to support the parameter for applying index time document boost. Basically, including document.setDocumentBoost(boost).

> Support for Index Time Document Boost in SolrContentHandler
> Key: SOLR-2029
> URL: https://issues.apache.org/jira/browse/SOLR-2029
> Attachments: SolrContentHandler.patch
[jira] Created: (SOLR-2240) Basic authentication for stream.url
Basic authentication for stream.url
---

Key: SOLR-2240
URL: https://issues.apache.org/jira/browse/SOLR-2240
Project: Solr
Issue Type: Improvement
Components: update
Affects Versions: 4.0
Reporter: Jayendra Patil
Priority: Minor

We intend to use stream.url for indexing documents from remote locations exposed through http. However, the remote urls are secured and need basic authentication to be able to access the documents. The current implementation for stream.url in ContentStreamBase.URLStream does not support authentication.

Using stream.file instead would mean downloading the files to a local box, causing duplication, whereas stream.body would have indexing performance issues with the huge amount of data being transferred over the network.

An approach would be:
1. Passing an additional authentication parameter, e.g. stream.url.auth, with the encoded authentication value - SolrRequestParsers
2. Setting the Authorization request property for the connection - ContentStreamBase.URLStream
   this.conn.setRequestProperty("Authorization", "Basic " + encodedauthentication);

Any thoughts?
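A minimal sketch of step 2's header value, assuming standard HTTP Basic authentication. The class is hypothetical; only the "Basic " + base64(user:password) construction is standard, and the stream.url.auth parameter name is the issue's proposal, not an existing Solr parameter.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch of the proposed approach: build the Basic auth header
// value that ContentStreamBase.URLStream could pass to
// conn.setRequestProperty("Authorization", ...).
public class BasicAuthSketch {

    // RFC 7617 Basic auth: "Basic " followed by base64("user:password").
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        System.out.println(basicAuthHeader("solr", "secret"));
        // Basic c29scjpzZWNyZXQ=
    }
}
```

Whether the client sends the already-encoded value (as the proposal suggests) or a user:password pair that Solr encodes is a design choice; the encoded form avoids ':' parsing issues in request parameters.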
[jira] Updated: (SOLR-2240) Basic authentication for stream.url
[ https://issues.apache.org/jira/browse/SOLR-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jayendra Patil updated SOLR-2240:
-
Attachment: SOLR-2240.patch

Attached the Patch for the changes.

> Basic authentication for stream.url
> Key: SOLR-2240
> URL: https://issues.apache.org/jira/browse/SOLR-2240
> Attachments: SOLR-2240.patch
[jira] Created: (SOLR-2283) Expose QueryUtils methods
Expose QueryUtils methods
-

Key: SOLR-2283
URL: https://issues.apache.org/jira/browse/SOLR-2283
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 4.0
Reporter: Jayendra Patil
Priority: Minor

We have a custom implementation of ExtendedDismaxQParserPlugin, bundled into a jar in the multicore lib. The custom ExtendedDismaxQParserPlugin implementation still uses the org.apache.solr.search.QueryUtils makeQueryable method, same as the old implementation.

However, the method call throws a java.lang.IllegalAccessError, as it is being called from the inner ExtendedSolrQueryParser class and makeQueryable has no access modifier (package-private by default). Can we change the access modifier to public so the methods are accessible, since they are all static?
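The access problem can be illustrated with a hypothetical stand-in class: a package-private static method is invisible to callers outside its package, which is what triggers the IllegalAccessError when the custom plugin lives in a different jar/package; declaring the method public, as requested, resolves it.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical stand-in for QueryUtils, not the Solr class itself. With
// default (package-private) access, makeQueryable would be invisible outside
// org.apache.solr.search; here it carries the public modifier the issue asks for.
public class AccessSketch {

    // The access level the patch proposes; the body is a placeholder.
    public static String makeQueryable(String q) {
        return q;
    }

    // Verifies via reflection that the method carries the public modifier.
    static boolean isMakeQueryablePublic() {
        try {
            Method m = AccessSketch.class.getDeclaredMethod("makeQueryable", String.class);
            return Modifier.isPublic(m.getModifiers());
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isMakeQueryablePublic()); // true
    }
}
```

Since all the methods are static and stateless, widening the access modifier changes no behavior for existing in-package callers.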
[jira] Created: (SOLR-2332) TikaEntityProcessor retrieves only File Names from Zip extraction
TikaEntityProcessor retrieves only File Names from Zip extraction
-

Key: SOLR-2332
URL: https://issues.apache.org/jira/browse/SOLR-2332
Project: Solr
Issue Type: Bug
Components: contrib - DataImportHandler
Affects Versions: 4.0
Reporter: Jayendra Patil

Extraction of zip files using TikaEntityProcessor results in only the names of the files. It does not extract the contents of the files in the zip.
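The distinction at the heart of the bug can be shown with plain java.util.zip (this is an illustrative sketch, not the Tika code path): listing entries yields only names, while the intended behavior also reads each entry's bytes, which TikaEntityProcessor should hand to a nested parser.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Illustrative sketch using the JDK zip API, not TikaEntityProcessor itself.
public class ZipContentSketch {

    // Concatenates the contents (not just the names) of all zip entries.
    static String readEntries(byte[] zipBytes) throws Exception {
        StringBuilder content = new StringBuilder();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                // Reading only entry.getName() here would reproduce the bug;
                // instead, drain the entry's bytes.
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = zin.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                content.append(out.toString(StandardCharsets.UTF_8.name()));
            }
        }
        return content.toString();
    }

    // Demo: build a one-entry zip in memory and read its contents back.
    static String demo() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ZipOutputStream zout = new ZipOutputStream(bos)) {
                zout.putNextEntry(new ZipEntry("doc.txt"));
                zout.write("hello".getBytes(StandardCharsets.UTF_8));
                zout.closeEntry();
            }
            return readEntries(bos.toByteArray());
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // hello
    }
}
```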
[jira] Updated: (SOLR-2332) TikaEntityProcessor retrieves only File Names from Zip extraction
[ https://issues.apache.org/jira/browse/SOLR-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jayendra Patil updated SOLR-2332:
-
Attachment: solr-word.zip
            SOLR-2332.patch

Attached is the Patch for the fix and Testcase. Also attached is the Test zip file.

> TikaEntityProcessor retrieves only File Names from Zip extraction
> Key: SOLR-2332
> URL: https://issues.apache.org/jira/browse/SOLR-2332
> Attachments: SOLR-2332.patch, solr-word.zip
[jira] Updated: (SOLR-2283) Expose QueryUtils methods
[ https://issues.apache.org/jira/browse/SOLR-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jayendra Patil updated SOLR-2283:
-
Attachment: SOLR-2283.patch

Patch Attached - makeQueryable made public.

> Expose QueryUtils methods
> Key: SOLR-2283
> URL: https://issues.apache.org/jira/browse/SOLR-2283
> Attachments: SOLR-2283.patch
[jira] Commented: (SOLR-2317) Slaves have leftover index.xxxxx directories, and leftover files in index/ directory
[ https://issues.apache.org/jira/browse/SOLR-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986514#action_12986514 ]

Jayendra Patil commented on SOLR-2317:
--

For the extra index.xxxxx directories, you can try the patch at https://issues.apache.org/jira/browse/SOLR-2156

> Slaves have leftover index.xxxxx directories, and leftover files in index/ directory
>
> Key: SOLR-2317
> URL: https://issues.apache.org/jira/browse/SOLR-2317
> Project: Solr
> Issue Type: Bug
> Affects Versions: 3.1
> Reporter: Bill Bell
>
> When replicating, we are getting leftover files on slaves. Some slaves are getting index.xxxxx directories with files left over. And more concerning, the index/ directory has leftover files from previous replicated runs.
> This is a pain to keep cleaning up.
> Bill
[jira] Created: (SOLR-2416) Solr Cell & DataImport Tika handler broken - fails to index Zip file contents
Solr Cell & DataImport Tika handler broken - fails to index Zip file contents
-

Key: SOLR-2416
URL: https://issues.apache.org/jira/browse/SOLR-2416
Project: Solr
Issue Type: Bug
Components: contrib - DataImportHandler, contrib - Solr Cell (Tika extraction)
Affects Versions: 4.0
Reporter: Jayendra Patil

Working with the latest Solr trunk code, it seems the Tika handlers for Solr Cell (ExtractingDocumentLoader.java) and the Data Import handler (TikaEntityProcessor.java) fail to index zip file contents again; only the file names are indexed. This issue was addressed some time back, late last year, but seems to have reappeared with the latest code.

Jira for the Data Import handler part, with the patch and the testcase - https://issues.apache.org/jira/browse/SOLR-2332.
[jira] Updated: (SOLR-2416) Solr Cell & DataImport Tika handler broken - fails to index Zip file contents
[ https://issues.apache.org/jira/browse/SOLR-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jayendra Patil updated SOLR-2416:
-
Attachment: SOLR-2416_ExtractingDocumentLoader.patch

Fix attached.

> Solr Cell & DataImport Tika handler broken - fails to index Zip file contents
> Key: SOLR-2416
> URL: https://issues.apache.org/jira/browse/SOLR-2416
> Attachments: SOLR-2416_ExtractingDocumentLoader.patch