[jira] Created: (SOLR-2029) Support for Index Time Document Boost in SolrContentHandler
Support for Index Time Document Boost in SolrContentHandler
---

Key: SOLR-2029
URL: https://issues.apache.org/jira/browse/SOLR-2029
Project: Solr
Issue Type: Improvement
Components: contrib - Solr Cell (Tika extraction)
Affects Versions: 1.4.1
Reporter: Jayendra Patil

We are using the extract request handler to index rich content documents with other metadata. However, SolrContentHandler does seem to support the parameter for applying index time document boost. Basically, including document.setDocumentBoost(boost).

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
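For illustration, here is a minimal standalone sketch of the parsing side of such a boost parameter. The class and helper names are hypothetical, not the attached patch; the real change would live in SolrContentHandler and end with document.setDocumentBoost(boost).

```java
// Hypothetical sketch, not the attached patch: parse an optional "boost"
// request parameter the way SolrContentHandler could before calling
// document.setDocumentBoost(boost). Names here are illustrative.
public class BoostSketch {
    static final float DEFAULT_BOOST = 1.0f;

    // Returns the parsed boost, or 1.0 when the parameter is absent or malformed.
    static float parseBoost(String boostParam) {
        if (boostParam == null) {
            return DEFAULT_BOOST;
        }
        try {
            return Float.parseFloat(boostParam);
        } catch (NumberFormatException e) {
            return DEFAULT_BOOST;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseBoost("2.5")); // 2.5
        System.out.println(parseBoost(null));  // 1.0
    }
}
```

Falling back to 1.0 (Lucene's neutral boost) keeps documents without the parameter unaffected.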
[jira] Commented: (SOLR-2416) Solr Cell fails to index Zip file contents
[ https://issues.apache.org/jira/browse/SOLR-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008284#comment-13008284 ]

Jayendra Patil commented on SOLR-2416:
--

This issue existed in Solr 1.4 packaged with Tika 0.4, which prevented us from using the stable version. Thread - http://lucene.472066.n3.nabble.com/Issue-Indexing-zip-file-content-in-Solr-1-4-td504914.html

The issue was resolved with the Tika 0.5 upgrade at https://issues.apache.org/jira/browse/SOLR-1567

We are working on a snapshot of Solr trunk 4.x from around 15 August 2010, which uses the Tika 0.8 snapshot jars, and the extraction works fine for us. However, the latest trunk, upgraded to the stable Tika 0.8 release, does not show the same behaviour.

> Solr Cell fails to index Zip file contents
>
> Key: SOLR-2416
> URL: https://issues.apache.org/jira/browse/SOLR-2416
> Project: Solr
> Issue Type: Bug
> Components: contrib - DataImportHandler, contrib - Solr Cell (Tika extraction)
> Affects Versions: 1.4.1
> Reporter: Jayendra Patil
> Fix For: 3.2
> Attachments: SOLR-2416_ExtractingDocumentLoader.patch
>
> Working with the latest Solr Trunk code and seems the Tika handlers for Solr Cell (ExtractingDocumentLoader.java) and Data Import handler (TikaEntityProcessor.java) fails to index the zip file contents again. It just indexes the file names again.
> This issue was addressed some time back, late last year, but seems to have reappeared with the latest code.
> Jira for the Data Import handler part with the patch and the testcase - https://issues.apache.org/jira/browse/SOLR-2332.
[jira] Created: (SOLR-2156) Solr Replication - SnapPuller fails to clean Old Index Directories on Full Copy
Solr Replication - SnapPuller fails to clean Old Index Directories on Full Copy
---

Key: SOLR-2156
URL: https://issues.apache.org/jira/browse/SOLR-2156
Project: Solr
Issue Type: Improvement
Components: replication (java)
Affects Versions: 4.0
Reporter: Jayendra Patil

We are working on the Solr trunk and have a master and two slaves configuration. Our indexing consists of periodic full and incremental index builds on the master and replication on the slaves.

When a full indexing (clean and rebuild) is performed, we always end up with an extra index folder copy, which holds the complete index, so the size just keeps growing on the slaves. e.g.

drwxr-xr-x 2 tomcat tomcat 4096 2010-10-09 12:10 index
drwxr-xr-x 2 tomcat tomcat 4096 2010-10-11 09:43 index.20101009120649
drwxr-xr-x 2 tomcat tomcat 4096 2010-10-12 10:27 index.20101011094043
-rw-r--r-- 1 tomcat tomcat   75 2010-10-11 09:43 index.properties
-rw-r--r-- 1 tomcat tomcat  422 2010-10-12 10:26 replication.properties
drwxr-xr-x 2 tomcat tomcat   68 2010-10-12 10:27 spellchecker

Here index.20101011094043 is the active index and the other index.xxx directories are no longer used.

The SnapPuller deletes the temporary index directory, but does not delete the old one when the switch is performed for the full copy. The code below should do the trick:

boolean fetchLatestIndex(SolrCore core) throws IOException {
  ..
  } finally {
    if (deleteTmpIdxDir) {
      delTree(tmpIndexDir);
    } else {
      // Delete the old index directory, as the flag is set only after the full copy is performed
      delTree(indexDir);
    }
  }
  .
}
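The cleanup rule proposed in the issue can be sketched as a self-contained program. This is a hypothetical stand-in, not SnapPuller itself: delTree mimics SnapPuller's recursive-delete helper, and cleanup mirrors the proposed finally block.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical standalone sketch of the proposed cleanup rule; the real
// change belongs in SnapPuller.fetchLatestIndex.
public class CleanupSketch {

    // Recursively delete a directory tree, deepest entries first.
    static void delTree(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        }
    }

    // Mirrors the proposed finally block: the flag is cleared only after a
    // full copy completes, in which case the old index directory is the stale one.
    static void cleanup(boolean deleteTmpIdxDir, Path tmpIndexDir, Path indexDir) throws IOException {
        if (deleteTmpIdxDir) {
            delTree(tmpIndexDir);
        } else {
            delTree(indexDir);
        }
    }

    // Demo: after a full copy (flag false), the old index dir is gone and the
    // freshly switched-in directory survives. Returns true when that holds.
    static boolean demo() {
        try {
            Path tmp = Files.createTempDirectory("index.tmp");
            Path old = Files.createTempDirectory("index.old");
            cleanup(false, tmp, old);
            return Files.exists(tmp) && !Files.exists(old);
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true
    }
}
```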
[jira] Updated: (SOLR-2156) Solr Replication - SnapPuller fails to clean Old Index Directories on Full Copy
[ https://issues.apache.org/jira/browse/SOLR-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jayendra Patil updated SOLR-2156:
-
Attachment: Solr-2156_SnapPuller.patch

Attached the Fix.

> Solr Replication - SnapPuller fails to clean Old Index Directories on Full Copy
> Key: SOLR-2156
> URL: https://issues.apache.org/jira/browse/SOLR-2156
> Attachments: Solr-2156_SnapPuller.patch
[jira] Commented: (SOLR-2156) Solr Replication - SnapPuller fails to clean Old Index Directories on Full Copy
[ https://issues.apache.org/jira/browse/SOLR-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928928#action_12928928 ]

Jayendra Patil commented on SOLR-2156:
--

There are different conditions under which the flag gets set to false. A full index download is performed if the index is stale (common index files on the slave do not exist on the master, or differ in size or timestamp - the isIndexStale method) or if the slave is newer than the master (slave generation > master generation). In our case we do a clean build, so the files on the slave don't exist on the master and hence the flag is set to false.

> Solr Replication - SnapPuller fails to clean Old Index Directories on Full Copy
> Key: SOLR-2156
> URL: https://issues.apache.org/jira/browse/SOLR-2156
> Attachments: Solr-2156_SnapPuller.patch
[jira] Updated: (SOLR-2029) Support for Index Time Document Boost in SolrContentHandler
[ https://issues.apache.org/jira/browse/SOLR-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jayendra Patil updated SOLR-2029:
-
Attachment: SolrContentHandler.patch

Attached is the Fix Patch. The parameter name to be passed is boost.

> Support for Index Time Document Boost in SolrContentHandler
> Key: SOLR-2029
> URL: https://issues.apache.org/jira/browse/SOLR-2029
> Attachments: SolrContentHandler.patch
[jira] Updated: (SOLR-2029) Support for Index Time Document Boost in SolrContentHandler
[ https://issues.apache.org/jira/browse/SOLR-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jayendra Patil updated SOLR-2029:
-
Description:
We are using the extract request handler to index rich content documents with other metadata. However, SolrContentHandler does not seem to support the parameter for applying index time document boost. Basically, including document.setDocumentBoost(boost).

was:
We are using the extract request handler to index rich content documents with other metadata. However, SolrContentHandler does seem to support the parameter for applying index time document boost. Basically, including document.setDocumentBoost(boost).

> Support for Index Time Document Boost in SolrContentHandler
> Key: SOLR-2029
> URL: https://issues.apache.org/jira/browse/SOLR-2029
> Attachments: SolrContentHandler.patch
[jira] Created: (SOLR-2240) Basic authentication for stream.url
Basic authentication for stream.url
---

Key: SOLR-2240
URL: https://issues.apache.org/jira/browse/SOLR-2240
Project: Solr
Issue Type: Improvement
Components: update
Affects Versions: 4.0
Reporter: Jayendra Patil
Priority: Minor

We intend to use stream.url for indexing documents from remote locations exposed through http. However, the remote urls are secured and need basic authentication to be able to access the documents. The current implementation for stream.url in ContentStreamBase.URLStream does not support authentication.

Using stream.file instead would mean downloading the files to a local box, causing duplication, whereas stream.body would have indexing performance issues with the huge amount of data being transferred over the network.

An approach would be:
1. Passing an additional authentication parameter, e.g. stream.url.auth, with the encoded authentication value - SolrRequestParsers
2. Setting the Authorization request property for the connection - ContentStreamBase.URLStream
   this.conn.setRequestProperty("Authorization", "Basic " + encodedauthentication);

Any thoughts?
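A minimal sketch of step 2's header value, assuming standard HTTP Basic authentication. The class is hypothetical; only the "Basic " + base64(user:password) construction is standard, and the stream.url.auth parameter name is the issue's proposal, not an existing Solr parameter.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch of the proposed approach: build the Basic auth header
// value that ContentStreamBase.URLStream could pass to
// conn.setRequestProperty("Authorization", ...).
public class BasicAuthSketch {

    // RFC 7617 Basic auth: "Basic " followed by base64("user:password").
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        System.out.println(basicAuthHeader("solr", "secret"));
        // Basic c29scjpzZWNyZXQ=
    }
}
```

Whether the client sends the already-encoded value (as the proposal suggests) or a user:password pair that Solr encodes is a design choice; the encoded form avoids ':' parsing issues in request parameters.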
[jira] Updated: (SOLR-2240) Basic authentication for stream.url
[ https://issues.apache.org/jira/browse/SOLR-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jayendra Patil updated SOLR-2240:
-
Attachment: SOLR-2240.patch

Attached the Patch for the changes.

> Basic authentication for stream.url
> Key: SOLR-2240
> URL: https://issues.apache.org/jira/browse/SOLR-2240
> Attachments: SOLR-2240.patch
[jira] Created: (SOLR-2283) Expose QueryUtils methods
Expose QueryUtils methods
-

Key: SOLR-2283
URL: https://issues.apache.org/jira/browse/SOLR-2283
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 4.0
Reporter: Jayendra Patil
Priority: Minor

We have a custom implementation of ExtendedDismaxQParserPlugin, bundled into a jar in the multicore lib. The custom ExtendedDismaxQParserPlugin implementation still uses the org.apache.solr.search.QueryUtils makeQueryable method, same as the old implementation.

However, the method call throws a java.lang.IllegalAccessError, as it is being called from the inner ExtendedSolrQueryParser class and makeQueryable has no access modifier (package-private by default). Can we change the access modifier to public so the methods are accessible, since they are all static?
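The access problem can be illustrated with a hypothetical stand-in class: a package-private static method is invisible to callers outside its package, which is what triggers the IllegalAccessError when the custom plugin lives in a different jar/package; declaring the method public, as requested, resolves it.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical stand-in for QueryUtils, not the Solr class itself. With
// default (package-private) access, makeQueryable would be invisible outside
// org.apache.solr.search; here it carries the public modifier the issue asks for.
public class AccessSketch {

    // The access level the patch proposes; the body is a placeholder.
    public static String makeQueryable(String q) {
        return q;
    }

    // Verifies via reflection that the method carries the public modifier.
    static boolean isMakeQueryablePublic() {
        try {
            Method m = AccessSketch.class.getDeclaredMethod("makeQueryable", String.class);
            return Modifier.isPublic(m.getModifiers());
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isMakeQueryablePublic()); // true
    }
}
```

Since all the methods are static and stateless, widening the access modifier changes no behavior for existing in-package callers.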
[jira] Created: (SOLR-2332) TikaEntityProcessor retrieves only File Names from Zip extraction
TikaEntityProcessor retrieves only File Names from Zip extraction
-

Key: SOLR-2332
URL: https://issues.apache.org/jira/browse/SOLR-2332
Project: Solr
Issue Type: Bug
Components: contrib - DataImportHandler
Affects Versions: 4.0
Reporter: Jayendra Patil

Extraction of zip files using TikaEntityProcessor results in only the names of the files. It does not extract the contents of the files in the zip.
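The distinction at the heart of the bug can be shown with plain java.util.zip (this is an illustrative sketch, not the Tika code path): listing entries yields only names, while the intended behavior also reads each entry's bytes, which TikaEntityProcessor should hand to a nested parser.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Illustrative sketch using the JDK zip API, not TikaEntityProcessor itself.
public class ZipContentSketch {

    // Concatenates the contents (not just the names) of all zip entries.
    static String readEntries(byte[] zipBytes) throws Exception {
        StringBuilder content = new StringBuilder();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                // Reading only entry.getName() here would reproduce the bug;
                // instead, drain the entry's bytes.
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = zin.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                content.append(out.toString(StandardCharsets.UTF_8.name()));
            }
        }
        return content.toString();
    }

    // Demo: build a one-entry zip in memory and read its contents back.
    static String demo() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ZipOutputStream zout = new ZipOutputStream(bos)) {
                zout.putNextEntry(new ZipEntry("doc.txt"));
                zout.write("hello".getBytes(StandardCharsets.UTF_8));
                zout.closeEntry();
            }
            return readEntries(bos.toByteArray());
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // hello
    }
}
```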
[jira] Updated: (SOLR-2332) TikaEntityProcessor retrieves only File Names from Zip extraction
[ https://issues.apache.org/jira/browse/SOLR-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jayendra Patil updated SOLR-2332:
-
Attachment: solr-word.zip
            SOLR-2332.patch

Attached is the Patch for the fix and Testcase. Also attached is the Test zip file.

> TikaEntityProcessor retrieves only File Names from Zip extraction
> Key: SOLR-2332
> URL: https://issues.apache.org/jira/browse/SOLR-2332
> Attachments: SOLR-2332.patch, solr-word.zip
[jira] Updated: (SOLR-2283) Expose QueryUtils methods
[ https://issues.apache.org/jira/browse/SOLR-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jayendra Patil updated SOLR-2283:
-
Attachment: SOLR-2283.patch

Patch Attached - makeQueryable made public.

> Expose QueryUtils methods
> Key: SOLR-2283
> URL: https://issues.apache.org/jira/browse/SOLR-2283
> Attachments: SOLR-2283.patch
[jira] Commented: (SOLR-2317) Slaves have leftover index.xxxxx directories, and leftover files in index/ directory
[ https://issues.apache.org/jira/browse/SOLR-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986514#action_12986514 ]

Jayendra Patil commented on SOLR-2317:
--

For the extra index.xxxxx directories, you can try the patch at https://issues.apache.org/jira/browse/SOLR-2156

> Slaves have leftover index.xxxxx directories, and leftover files in index/ directory
>
> Key: SOLR-2317
> URL: https://issues.apache.org/jira/browse/SOLR-2317
> Project: Solr
> Issue Type: Bug
> Affects Versions: 3.1
> Reporter: Bill Bell
>
> When replicating, we are getting leftover files on slaves. Some slaves are getting index.xxxxx directories with files left over. And more concerning, the index/ directory has leftover files from previous replicated runs.
> This is a pain to keep cleaning up.
> Bill
[jira] Created: (SOLR-2416) Solr Cell & DataImport Tika handler broken - fails to index Zip file contents
Solr Cell & DataImport Tika handler broken - fails to index Zip file contents
-

Key: SOLR-2416
URL: https://issues.apache.org/jira/browse/SOLR-2416
Project: Solr
Issue Type: Bug
Components: contrib - DataImportHandler, contrib - Solr Cell (Tika extraction)
Affects Versions: 4.0
Reporter: Jayendra Patil

Working with the latest Solr trunk code, it seems the Tika handlers for Solr Cell (ExtractingDocumentLoader.java) and the Data Import handler (TikaEntityProcessor.java) fail to index zip file contents again; only the file names are indexed. This issue was addressed some time back, late last year, but seems to have reappeared with the latest code.

Jira for the Data Import handler part, with the patch and the testcase - https://issues.apache.org/jira/browse/SOLR-2332.
[jira] Updated: (SOLR-2416) Solr Cell & DataImport Tika handler broken - fails to index Zip file contents
[ https://issues.apache.org/jira/browse/SOLR-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jayendra Patil updated SOLR-2416:
-
Attachment: SOLR-2416_ExtractingDocumentLoader.patch

Fix attached.

> Solr Cell & DataImport Tika handler broken - fails to index Zip file contents
> Key: SOLR-2416
> URL: https://issues.apache.org/jira/browse/SOLR-2416
> Attachments: SOLR-2416_ExtractingDocumentLoader.patch