[jira] Created: (SOLR-1637) Deprecate ALIAS command

2009-12-08 Thread Noble Paul (JIRA)
Deprecate ALIAS command
---

 Key: SOLR-1637
 URL: https://issues.apache.org/jira/browse/SOLR-1637
 Project: Solr
  Issue Type: Sub-task
  Components: multicore
Affects Versions: 1.5
Reporter: Noble Paul
 Fix For: 1.5


The ALIAS command makes the CoreContainer code more complex. We should remove it for now 
and revisit it later with a simpler, cleaner implementation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1636) CoreAdminHandler should be thin wrapper over CoreContainer

2009-12-08 Thread Shalin Shekhar Mangar (JIRA)
CoreAdminHandler should be thin wrapper over CoreContainer
--

 Key: SOLR-1636
 URL: https://issues.apache.org/jira/browse/SOLR-1636
 Project: Solr
  Issue Type: Improvement
Reporter: Shalin Shekhar Mangar
 Fix For: 1.5


There's too much functionality in CoreAdminHandler which ideally belongs to 
CoreContainer. EmbeddedSolrServer clients have no way to easily load/unload 
cores or merge cores without duplicating all the code inside CoreAdminHandler.

The goal of this issue is to refactor CoreAdminHandler and move its features to 
CoreContainer so that CoreAdminHandler is just a thin wrapper over 
CoreContainer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1358) Integration of Tika and DataImportHandler

2009-12-08 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1358:
-

Attachment: (was: SOLR-1358.patch)

> Integration of Tika and DataImportHandler
> -
>
> Key: SOLR-1358
> URL: https://issues.apache.org/jira/browse/SOLR-1358
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - DataImportHandler
>Reporter: Sascha Szott
>Assignee: Noble Paul
> Attachments: SOLR-1358.patch, SOLR-1358.patch
>
>
> At the moment, it's impossible to configure Solr such that it builds up 
> documents by using data that comes from both PDF documents and database table 
> columns. Currently, to accomplish this task, it's up to the user to add some 
> preprocessing that converts PDF files into plain text files. Therefore, I 
> would like to see an integration of Solr Cell into DIH that makes such 
> preprocessing obsolete.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1358) Integration of Tika and DataImportHandler

2009-12-08 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1358:
-

Attachment: (was: SOLR-1358.patch)

> Integration of Tika and DataImportHandler
> -
>
> Key: SOLR-1358
> URL: https://issues.apache.org/jira/browse/SOLR-1358
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - DataImportHandler
>Reporter: Sascha Szott
>Assignee: Noble Paul
> Attachments: SOLR-1358.patch, SOLR-1358.patch
>
>
> At the moment, it's impossible to configure Solr such that it builds up 
> documents by using data that comes from both PDF documents and database table 
> columns. Currently, to accomplish this task, it's up to the user to add some 
> preprocessing that converts PDF files into plain text files. Therefore, I 
> would like to see an integration of Solr Cell into DIH that makes such 
> preprocessing obsolete.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1358) Integration of Tika and DataImportHandler

2009-12-08 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1358:
-

Attachment: SOLR-1358.patch

> Integration of Tika and DataImportHandler
> -
>
> Key: SOLR-1358
> URL: https://issues.apache.org/jira/browse/SOLR-1358
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - DataImportHandler
>Reporter: Sascha Szott
>Assignee: Noble Paul
> Attachments: SOLR-1358.patch, SOLR-1358.patch
>
>
> At the moment, it's impossible to configure Solr such that it builds up 
> documents by using data that comes from both PDF documents and database table 
> columns. Currently, to accomplish this task, it's up to the user to add some 
> preprocessing that converts PDF files into plain text files. Therefore, I 
> would like to see an integration of Solr Cell into DIH that makes such 
> preprocessing obsolete.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1625) Add regexp support for TermsComponent

2009-12-08 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787925#action_12787925
 ] 

Noble Paul commented on SOLR-1625:
--

A few comments:
Isn't 'regex' better than 'regexp'?

The regexp.hints parameter is not very clear. Users will not be able to understand it.

Have explicit strings like regex.flag=case_sensitive&regex.flag=multiline
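A hypothetical sketch of how that suggestion could look as request-handler defaults (the terms.regex and terms.regex.flag parameter names follow the suggestions in this thread and are assumptions, not an existing API; the handler path and field name are illustrative):

{code:xml}
<searchComponent name="terms" class="solr.TermsComponent"/>

<requestHandler name="/terms" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="terms">true</bool>
    <str name="terms.fl">name</str>
    <!-- proposed: filter the returned terms by a regular expression -->
    <str name="terms.regex">so.*</str>
    <!-- proposed: explicit flag strings instead of a numeric hints value -->
    <str name="terms.regex.flag">case_insensitive</str>
    <str name="terms.regex.flag">multiline</str>
  </lst>
  <arr name="components">
    <str>terms</str>
  </arr>
</requestHandler>
{code}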

> Add regexp support for TermsComponent
> -
>
> Key: SOLR-1625
> URL: https://issues.apache.org/jira/browse/SOLR-1625
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.4
>Reporter: Uri Boness
>Assignee: Noble Paul
>Priority: Minor
> Fix For: 1.5
>
> Attachments: SOLR-1625.patch, SOLR-1625.patch
>
>
> At the moment the only way to filter the returned terms is by a prefix. It 
> would be nice if the filter could also be done by regular expression.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (SOLR-1625) Add regexp support for TermsComponent

2009-12-08 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul reassigned SOLR-1625:


Assignee: Noble Paul

> Add regexp support for TermsComponent
> -
>
> Key: SOLR-1625
> URL: https://issues.apache.org/jira/browse/SOLR-1625
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.4
>Reporter: Uri Boness
>Assignee: Noble Paul
>Priority: Minor
> Fix For: 1.5
>
> Attachments: SOLR-1625.patch, SOLR-1625.patch
>
>
> At the moment the only way to filter the returned terms is by a prefix. It 
> would be nice if the filter could also be done by regular expression.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-1358) Integration of Tika and DataImportHandler

2009-12-08 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750855#action_12750855
 ] 

Noble Paul edited comment on SOLR-1358 at 12/9/09 4:48 AM:
---

Let us provide a new TikaEntityProcessor 

{code:xml}

  
  
  
   

  
  
  
  
  
 
  

{code}

With format=xml|html, XPathEntityProcessor can be nested. This may help users 
extract more nested data from a file. It is even possible to create multiple 
documents from a single file.
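A rough, hypothetical sketch of such a DIH configuration (the attribute names, file path, and field names here are illustrative assumptions, not the contents of the attached patch):

{code:xml}
<dataConfig>
  <dataSource type="BinFileDataSource"/>
  <document>
    <!-- hypothetical TikaEntityProcessor entity: extract body text and metadata from a PDF -->
    <entity name="tika" processor="TikaEntityProcessor"
            url="/data/docs/sample.pdf" format="text">
      <field column="text" name="content"/>
      <field column="Author" name="author" meta="true"/>
      <field column="title" name="title" meta="true"/>
    </entity>
  </document>
</dataConfig>
{code}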

  was (Author: noble.paul):
Let us provide a new TikaEntityProcessor 

{code:xml}

  
  
  
   

  
  
  
  
  
 
  

{code}

With format=xml|html, XPathEntityProcessor can be nested. This may help users 
extract more nested data from a file. It is even possible to create multiple 
documents from a single file.
  
> Integration of Tika and DataImportHandler
> -
>
> Key: SOLR-1358
> URL: https://issues.apache.org/jira/browse/SOLR-1358
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - DataImportHandler
>Reporter: Sascha Szott
>Assignee: Noble Paul
> Attachments: SOLR-1358.patch, SOLR-1358.patch, SOLR-1358.patch
>
>
> At the moment, it's impossible to configure Solr such that it builds up 
> documents by using data that comes from both PDF documents and database table 
> columns. Currently, to accomplish this task, it's up to the user to add some 
> preprocessing that converts PDF files into plain text files. Therefore, I 
> would like to see an integration of Solr Cell into DIH that makes such 
> preprocessing obsolete.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1358) Integration of Tika and DataImportHandler

2009-12-08 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1358:
-

Attachment: SOLR-1358.patch

onError implemented

> Integration of Tika and DataImportHandler
> -
>
> Key: SOLR-1358
> URL: https://issues.apache.org/jira/browse/SOLR-1358
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - DataImportHandler
>Reporter: Sascha Szott
>Assignee: Noble Paul
> Attachments: SOLR-1358.patch, SOLR-1358.patch, SOLR-1358.patch
>
>
> At the moment, it's impossible to configure Solr such that it builds up 
> documents by using data that comes from both PDF documents and database table 
> columns. Currently, to accomplish this task, it's up to the user to add some 
> preprocessing that converts PDF files into plain text files. Therefore, I 
> would like to see an integration of Solr Cell into DIH that makes such 
> preprocessing obsolete.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1621) Allow current single core deployments to be specified by solr.xml

2009-12-08 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787908#action_12787908
 ] 

Noble Paul commented on SOLR-1621:
--

bq. I'm good with yanking ALIAS for now.

I am happy to remove a lot of the complexity that ALIAS introduced into 
CoreContainer. When we implemented SOLR-1293 (internally) we disabled ALIAS so 
that the implementation stayed simple. 
Let us revisit ALIAS later and make it simpler. 


> Allow current single core deployments to be specified by solr.xml
> -
>
> Key: SOLR-1621
> URL: https://issues.apache.org/jira/browse/SOLR-1621
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 1.5
>Reporter: Noble Paul
> Fix For: 1.5
>
> Attachments: SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, 
> SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch
>
>
> Supporting two different modes of deployment is turning out to be hard. This 
> leads to duplication of code. Moreover, there is a lot of confusion about where 
> to put common configuration. See the mail thread 
> http://markmail.org/message/3m3rqvp2ckausjnf

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1606) Integrate Near Realtime

2009-12-08 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787800#action_12787800
 ] 

Jason Rutherglen commented on SOLR-1606:


The current NRT IndexWriter.getReader API cannot yet support 
IndexReaderFactory; I'll open a Lucene issue.

> Integrate Near Realtime 
> 
>
> Key: SOLR-1606
> URL: https://issues.apache.org/jira/browse/SOLR-1606
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: SOLR-1606.patch
>
>
> We'll integrate IndexWriter.getReader.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-1635) DOMUtils doesn't wrap NumberFormatExceptions with useful errors

2009-12-08 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-1635.


   Resolution: Fixed
Fix Version/s: 1.5

> DOMUtils doesn't wrap NumberFormatExceptions with useful errors
> ---
>
> Key: SOLR-1635
> URL: https://issues.apache.org/jira/browse/SOLR-1635
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
>Assignee: Hoss Man
> Fix For: 1.5
>
> Attachments: SOLR-1635.patch
>
>
> When parsing NamedList style XML, DOMUtils does a really crappy job of 
> reporting errors when it can't parse numeric types (ie: <int>, <long>, 
> etc...)
> http://old.nabble.com/java.lang.NumberFormatException%3A-For-input-string%3A-%22%22-to26631247.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1635) DOMUtils doesn't wrap NumberFormatExceptions with useful errors

2009-12-08 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787797#action_12787797
 ] 

Hoss Man commented on SOLR-1635:


Committed revision 888622.

> DOMUtils doesn't wrap NumberFormatExceptions with useful errors
> ---
>
> Key: SOLR-1635
> URL: https://issues.apache.org/jira/browse/SOLR-1635
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
>Assignee: Hoss Man
> Attachments: SOLR-1635.patch
>
>
> When parsing NamedList style XML, DOMUtils does a really crappy job of 
> reporting errors when it can't parse numeric types (ie: <int>, <long>, 
> etc...)
> http://old.nabble.com/java.lang.NumberFormatException%3A-For-input-string%3A-%22%22-to26631247.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1635) DOMUtils doesn't wrap NumberFormatExceptions with useful errors

2009-12-08 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-1635:
---

Attachment: SOLR-1635.patch

Patch with the fix, plus some much-needed javadocs. 

Works nicely in the example; just waiting for tests to finish before I commit.

> DOMUtils doesn't wrap NumberFormatExceptions with useful errors
> ---
>
> Key: SOLR-1635
> URL: https://issues.apache.org/jira/browse/SOLR-1635
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
>Assignee: Hoss Man
> Attachments: SOLR-1635.patch
>
>
> When parsing NamedList style XML, DOMUtils does a really crappy job of 
> reporting errors when it can't parse numeric types (ie: <int>, <long>, 
> etc...)
> http://old.nabble.com/java.lang.NumberFormatException%3A-For-input-string%3A-%22%22-to26631247.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1635) DOMUtils doesn't wrap NumberFormatExceptions with useful errors

2009-12-08 Thread Hoss Man (JIRA)
DOMUtils doesn't wrap NumberFormatExceptions with useful errors
---

 Key: SOLR-1635
 URL: https://issues.apache.org/jira/browse/SOLR-1635
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man


When parsing NamedList style XML, DOMUtils does a really crappy job of 
reporting errors when it can't parse numeric types (ie: <int>, <long>, etc...)

http://old.nabble.com/java.lang.NumberFormatException%3A-For-input-string%3A-%22%22-to26631247.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-236) Field collapsing

2009-12-08 Thread Martijn van Groningen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn van Groningen updated SOLR-236:
---

Attachment: field-collapse-5.patch

I have updated the patch and fixed the following issues:
* The issue that Marc described on the solr-dev list: the collapsed group 
identifiers disappeared when the id field was anything other than a plain field 
(int, long, etc...).
* Caching was not working properly when the collapse.field was changed 
between requests; queries that should not have been cached were.

> Field collapsing
> 
>
> Key: SOLR-236
> URL: https://issues.apache.org/jira/browse/SOLR-236
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.3
>Reporter: Emmanuel Keller
> Fix For: 1.5
>
> Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
> collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
> field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-solr-236-2.patch, 
> field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, 
> field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, 
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
> field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, 
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, 
> SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given 
> field to a single entry in the result set. Site collapsing is a special case 
> of this, where all results for a given web site are collapsed into one or two 
> entries in the result set, typically with an associated "more documents from 
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before 
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1625) Add regexp support for TermsComponent

2009-12-08 Thread Uri Boness (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uri Boness updated SOLR-1625:
-

Attachment: SOLR-1625.patch

Added support for regexp hints based on the flag constants in the Pattern 
class. The terms.regexp.hints parameter accepts an int value corresponding to 
the flags argument of the Pattern.compile(String regex, int flags) factory 
method. 

Using hints it is now possible to support case-insensitive patterns.

> Add regexp support for TermsComponent
> -
>
> Key: SOLR-1625
> URL: https://issues.apache.org/jira/browse/SOLR-1625
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.4
>Reporter: Uri Boness
>Priority: Minor
> Fix For: 1.5
>
> Attachments: SOLR-1625.patch, SOLR-1625.patch
>
>
> At the moment the only way to filter the returned terms is by a prefix. It 
> would be nice if the filter could also be done by regular expression.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1621) Allow current single core deployments to be specified by solr.xml

2009-12-08 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787712#action_12787712
 ] 

Yonik Seeley commented on SOLR-1621:


I'm good with yanking ALIAS for now.

But I think it would be nice to have a single hard-coded alias (perhaps just at 
the level of the dispatch filter) that can treat an existing core as the default 
core (w/o having to name it something specific like DEFAULT_CORE), as long as 
we can have a normal core with a normal name like "music", with the ability to 
use it with legacy URLs.


> Allow current single core deployments to be specified by solr.xml
> -
>
> Key: SOLR-1621
> URL: https://issues.apache.org/jira/browse/SOLR-1621
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 1.5
>Reporter: Noble Paul
> Fix For: 1.5
>
> Attachments: SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, 
> SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch
>
>
> Supporting two different modes of deployment is turning out to be hard. This 
> leads to duplication of code. Moreover, there is a lot of confusion about where 
> to put common configuration. See the mail thread 
> http://markmail.org/message/3m3rqvp2ckausjnf

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1621) Allow current single core deployments to be specified by solr.xml

2009-12-08 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787704#action_12787704
 ] 

Hoss Man commented on SOLR-1621:


The nice thing about the *idea* of ALIAS is that you can have cores with very 
explicit names (ie: catalog_v1, catalog_v2, etc...) and then you can have 
aliases like "LIVE" and "EXPERIMENTAL" that you can move around as needed.  
Similar things can be accomplished with the SWAP command, but it's more 
limiting.

That said: the last time i tried using ALIAS it was such a confusing pain in 
the ass (because of the way the original name was still tracked separately from 
the list of aliases, even if that name had been taken over by another core) i 
couldn't bring myself to use it -- using SWAP and keeping track of the logical 
names externally was less confusing.

If we can make ALIAS work well, it will kick ass -- but we can always yank it 
for now if it's in our way, and add it back again later if someone comes up 
with a good way to do it.  (it was probably a mistake to try and treat names and 
aliases as equals in the first place ... looking up a core by a "string" is 
probably the only use case where they should be treated equally; all of the 
other CoreAdmin commands should really differentiate.)

> Allow current single core deployments to be specified by solr.xml
> -
>
> Key: SOLR-1621
> URL: https://issues.apache.org/jira/browse/SOLR-1621
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 1.5
>Reporter: Noble Paul
> Fix For: 1.5
>
> Attachments: SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, 
> SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch
>
>
> Supporting two different modes of deployment is turning out to be hard. This 
> leads to duplication of code. Moreover, there is a lot of confusion about where 
> to put common configuration. See the mail thread 
> http://markmail.org/message/3m3rqvp2ckausjnf

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1606) Integrate Near Realtime

2009-12-08 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787686#action_12787686
 ] 

Jason Rutherglen commented on SOLR-1606:


I was going to start on the auto-warming using IndexWriter's
IndexReaderWarmer; however, because this is heavily cache
dependent, I think it'll have to wait for SOLR-1308, because we
need to regenerate the cache per reader. 

> Integrate Near Realtime 
> 
>
> Key: SOLR-1606
> URL: https://issues.apache.org/jira/browse/SOLR-1606
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: SOLR-1606.patch
>
>
> We'll integrate IndexWriter.getReader.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1634) change order of field operations in SolrCell

2009-12-08 Thread Hoss Man (JIRA)
change order of field operations in SolrCell


 Key: SOLR-1634
 URL: https://issues.apache.org/jira/browse/SOLR-1634
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Solr Cell (Tika extraction)
Reporter: Hoss Man


As noted on the mailing list, SolrCell evaluates fmap.* params AFTER literal.* 
params.  This makes it impossible for users to map Tika-produced fields to 
other names (possibly for the purpose of ignoring them completely) and then 
use literal.* to provide explicit values for those fields.  At first glance 
this seems like a bug, except that it is explicitly documented...

http://wiki.apache.org/solr/ExtractingRequestHandler#Order_of_field_operations

...so i'm opening this as an "Improvement".   We should either consider 
changing the order of operations, or find some other way to support what seems 
like a very common use case...

http://old.nabble.com/Re%3A-WELCOME-to-solr-user%40lucene.apache.org-to26650071.html#a26650071
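For context, a sketch of the kind of configuration the issue describes (the handler path, field names, and literal value are illustrative assumptions):

{code:xml}
<!-- solrconfig.xml: try to map the Tika-produced "title" away and supply an explicit one -->
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.title">ignored_title</str>
    <str name="literal.title">My explicit title</str>
  </lst>
</requestHandler>
<!-- because fmap.* is currently applied after literal.*, the explicit title is
     remapped to ignored_title as well, which is the behavior questioned here -->
{code}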

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1633) Solr Cell should be smarter about literal and multiValued="false"

2009-12-08 Thread Hoss Man (JIRA)
Solr Cell should be smarter about literal and multiValued="false"
-

 Key: SOLR-1633
 URL: https://issues.apache.org/jira/browse/SOLR-1633
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Solr Cell (Tika extraction)
Reporter: Hoss Man



As noted on solr-user, SolrCell has less than ideal behavior when "foo" is a 
single-valued field, literal.foo=bar is specified in the request, and Tika 
also produces a value for the "foo" field from the document.  It seems like a 
possible improvement here would be for SolrCell to ignore the value from Tika 
if it already has one that was explicitly provided (as opposed to the current 
behavior of letting the add fail because of multiple values in a single-valued 
field).

It seems pretty clear that in cases like this, the user's intention is to have 
their one literal field used as the value.

http://old.nabble.com/Re%3A-WELCOME-to-solr-user%40lucene.apache.org-to26650071.html#a26650071
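For context, a sketch of the situation described (the field name, type, and parameter values are illustrative assumptions):

{code:xml}
<!-- schema.xml: "foo" accepts only a single value -->
<field name="foo" type="string" indexed="true" stored="true" multiValued="false"/>

<!-- a request such as /update/extract?literal.foo=bar, where the document's Tika
     metadata also yields a "foo" value, currently ends up with two values for
     "foo" and the add fails; the proposal is to keep only the explicit
     literal.foo value in that case -->
{code}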

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-773) Incorporate Local Lucene/Solr

2009-12-08 Thread patrick o'leary (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787663#action_12787663
 ] 

patrick o'leary commented on SOLR-773:
--

You can certainly implement a fuzzy scoring method, but you really want to 
avoid having to calculate distances for all your results, so some sort of 
restriction is good.

If your data set is small (~100K docs), you might get away with using a value 
scorer and a boost on distances.
But if your data set is on the order of millions, that's not going to be a good 
idea. 

> Incorporate Local Lucene/Solr
> -
>
> Key: SOLR-773
> URL: https://issues.apache.org/jira/browse/SOLR-773
> Project: Solr
>  Issue Type: New Feature
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 1.5
>
> Attachments: exampleSpatial.zip, lucene-spatial-2.9-dev.jar, 
> lucene.tar.gz, screenshot-1.jpg, SOLR-773-local-lucene.patch, 
> SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
> SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
> SOLR-773-spatial_solr.patch, SOLR-773.patch, SOLR-773.patch, 
> solrGeoQuery.tar, spatial-solr.tar.gz
>
>
> Local Lucene has been donated to the Lucene project.  It has some Solr 
> components, but we should evaluate how best to incorporate it into Solr.
> See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-773) Incorporate Local Lucene/Solr

2009-12-08 Thread Eric Pugh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787655#action_12787655
 ] 

Eric Pugh edited comment on SOLR-773 at 12/8/09 7:00 PM:
-

Patrick, I tried out your "Batteries Included" example, and it worked great.  
One of the questions I have is that it seems like the scoring process doesn't 
take into account the distance from a central point.  In other words, if I 
specify a 10 mile radius, and there is a really high scoring match more than 10 
miles out, it doesn't get returned.  The radius functions as a strict filter on 
what gets returned.  However, I think what we are really trying to do is to 
find the best search results, and have distance factored in as well.  

I was thinking that I could sort of do this "fuzzy" boundary by making a query 
with a radius x, and then doing the same query with radius x * 2.  Then, if any 
of the documents in x * 2 are much better than those in radius x, include them.  
Obviously this would be somewhat clunky to do from the client side!

A use case I can think of is searching for gas stations within 5 miles of me, 
but if a gas station has really cheap gas and is 6 miles away, then include 
it.  But if it's just a penny cheaper, ignore it.   

I added, as a "screenshot", a drawing of what I was sort of thinking.



  was (Author: epugh):
Idea of fuzzy borders drawing.
  
> Incorporate Local Lucene/Solr
> -
>
> Key: SOLR-773
> URL: https://issues.apache.org/jira/browse/SOLR-773
> Project: Solr
>  Issue Type: New Feature
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 1.5
>
> Attachments: exampleSpatial.zip, lucene-spatial-2.9-dev.jar, 
> lucene.tar.gz, screenshot-1.jpg, SOLR-773-local-lucene.patch, 
> SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
> SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
> SOLR-773-spatial_solr.patch, SOLR-773.patch, SOLR-773.patch, 
> solrGeoQuery.tar, spatial-solr.tar.gz
>
>
> Local Lucene has been donated to the Lucene project.  It has some Solr 
> components, but we should evaluate how best to incorporate it into Solr.
> See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-773) Incorporate Local Lucene/Solr

2009-12-08 Thread Eric Pugh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Pugh updated SOLR-773:
---

Attachment: screenshot-1.jpg

Idea of fuzzy borders drawing.

> Incorporate Local Lucene/Solr
> -
>
> Key: SOLR-773
> URL: https://issues.apache.org/jira/browse/SOLR-773
> Project: Solr
>  Issue Type: New Feature
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 1.5
>
> Attachments: exampleSpatial.zip, lucene-spatial-2.9-dev.jar, 
> lucene.tar.gz, screenshot-1.jpg, SOLR-773-local-lucene.patch, 
> SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
> SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
> SOLR-773-spatial_solr.patch, SOLR-773.patch, SOLR-773.patch, 
> solrGeoQuery.tar, spatial-solr.tar.gz
>
>
> Local Lucene has been donated to the Lucene project.  It has some Solr 
> components, but we should evaluate how best to incorporate it into Solr.
> See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1606) Integrate Near Realtime

2009-12-08 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787621#action_12787621
 ] 

Jason Rutherglen commented on SOLR-1606:


{quote}For example, q=foo&freshness=1000 would cause a new realtime reader to 
be opened if the current one was more than 1000ms old.{quote}

Good idea.

> Integrate Near Realtime 
> 
>
> Key: SOLR-1606
> URL: https://issues.apache.org/jira/browse/SOLR-1606
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: SOLR-1606.patch
>
>
> We'll integrate IndexWriter.getReader.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1606) Integrate Near Realtime

2009-12-08 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787619#action_12787619
 ] 

Jason Rutherglen commented on SOLR-1606:


{quote}In any case, I assume it must not fsync the files, so you
don't get a commit where you know you're in a stable
condition?{quote}

OK, right; for the user, commit currently means that after the
call, the index is in a stable state and can be
replicated? I agree; for clarity, I'll create a refresh command
and remove the NRT option from the commit command.



> Integrate Near Realtime 
> 
>
> Key: SOLR-1606
> URL: https://issues.apache.org/jira/browse/SOLR-1606
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: SOLR-1606.patch
>
>
> We'll integrate IndexWriter.getReader.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2009-12-08 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787572#action_12787572
 ] 

Yonik Seeley commented on SOLR-1277:


I just created a branch: 
http://svn.apache.org/viewvc/lucene/solr/branches/cloud/

I like the direction you've been going, Mark. Do you think it's ready for a 
check-in on the branch?

> Implement a Solr specific naming service (using Zookeeper)
> --
>
> Key: SOLR-1277
> URL: https://issues.apache.org/jira/browse/SOLR-1277
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 1.5
>
> Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, 
> SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The goal is to give Solr server clusters self-healing attributes
> where if a server fails, indexing and searching don't stop and
> all of the partitions remain searchable. For configuration, the
> ability to centrally deploy a new configuration without servers
> going offline.
> We can start with basic failover and start from there?
> Features:
> * Automatic failover (i.e. when a server fails, clients stop
> trying to index to or search it)
> * Centralized configuration management (i.e. new solrconfig.xml
> or schema.xml propagates to a live Solr cluster)
> * Optionally allow shards of a partition to be moved to another
> server (i.e. if a server gets hot, move the hot segments out to
> cooler servers). Ideally we'd have a way to detect hot segments
> and move them seamlessly. With NRT this becomes somewhat more
> difficult but not impossible?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-1358) Integration of Tika and DataImportHandler

2009-12-08 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750855#action_12750855
 ] 

Noble Paul edited comment on SOLR-1358 at 12/8/09 3:29 PM:
---

Let us provide a new TikaEntityProcessor 

{code:xml}

  
  
  
   

  
  
  
  
  
 
  

{code}

With format=xml|html, XPathEntityProcessor can be nested. This may help users 
extract more nested data from a file. It is even possible to create multiple 
documents from a single file.

  was (Author: noble.paul):
Let us provide a new TikaEntityProcessor 

{code:xml}

  
  
  

  
  
  
  
  
 
  

{code}

This most likely would need a BinUrlDataSource/BinContentStreamDataSource 
because Tika uses binary inputs.

My suggestion is that TikaEntityProcessor live in the extraction contrib so 
that managing dependencies is easier. But we will have to make extraction have 
a compile-time dependency on DIH. 

Grant, what do you think?
  
> Integration of Tika and DataImportHandler
> -
>
> Key: SOLR-1358
> URL: https://issues.apache.org/jira/browse/SOLR-1358
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - DataImportHandler
>Reporter: Sascha Szott
>Assignee: Noble Paul
> Attachments: SOLR-1358.patch, SOLR-1358.patch
>
>
> At the moment, it's impossible to configure Solr such that it builds up 
> documents by using data that comes from both PDF documents and database table 
> columns. Currently, to accomplish this task, it's up to the user to add some 
> preprocessing that converts PDF files into plain text files. Therefore, I 
> would like to see an integration of Solr Cell into DIH that makes such 
> preprocessing obsolete.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1358) Integration of Tika and DataImportHandler

2009-12-08 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1358:
-

Comment: was deleted

(was: Configuration with attribute to select format of emitted content:

{code:xml} 

  
  
  
  

  
  
  
  
  
 
  

{code} 

With 'emitFormat' different EntityProcessors can be chained. E.g. using "xml" 
value will allow chaining XPathEntityProcessor with TikaEntityProcessor for 
further custom processing.)

> Integration of Tika and DataImportHandler
> -
>
> Key: SOLR-1358
> URL: https://issues.apache.org/jira/browse/SOLR-1358
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - DataImportHandler
>Reporter: Sascha Szott
>Assignee: Noble Paul
> Attachments: SOLR-1358.patch, SOLR-1358.patch
>
>
> At the moment, it's impossible to configure Solr such that it builds up 
> documents by using data that comes from both PDF documents and database table 
> columns. Currently, to accomplish this task, it's up to the user to add some 
> preprocessing that converts PDF files into plain text files. Therefore, I 
> would like to see an integration of Solr Cell into DIH that makes such 
> preprocessing obsolete.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1358) Integration of Tika and DataImportHandler

2009-12-08 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1358:
-

Attachment: SOLR-1358.patch

cleaned a bit

> Integration of Tika and DataImportHandler
> -
>
> Key: SOLR-1358
> URL: https://issues.apache.org/jira/browse/SOLR-1358
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - DataImportHandler
>Reporter: Sascha Szott
>Assignee: Noble Paul
> Attachments: SOLR-1358.patch, SOLR-1358.patch
>
>
> At the moment, it's impossible to configure Solr such that it builds up 
> documents by using data that comes from both PDF documents and database table 
> columns. Currently, to accomplish this task, it's up to the user to add some 
> preprocessing that converts PDF files into plain text files. Therefore, I 
> would like to see an integration of Solr Cell into DIH that makes such 
> preprocessing obsolete.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1606) Integrate Near Realtime

2009-12-08 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787487#action_12787487
 ] 

Mark Miller commented on SOLR-1606:
---

bq. We could however it'd work the same as commit? 

I've never actually used NRT, so I don't fully understand it. Doesn't it not 
commit the index? Are the changes persisted over a reboot then?

In any case, I assume it must not fsync the files, so you don't get a commit 
where you know you're in a stable condition?

There are differences, right? Seems like you should have the option of either ...

> Integrate Near Realtime 
> 
>
> Key: SOLR-1606
> URL: https://issues.apache.org/jira/browse/SOLR-1606
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: SOLR-1606.patch
>
>
> We'll integrate IndexWriter.getReader.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Solr Cell revamped as an UpdateProcessor?

2009-12-08 Thread Noble Paul നോബിള്‍ नोब्ळ्
I was referring to SOLR-1358. Anyway, SolrCell as an UpdateProcessor
is a good idea.

On Tue, Dec 8, 2009 at 4:47 PM, Grant Ingersoll  wrote:
>
> On Dec 8, 2009, at 12:22 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>
>> Integrating Extraction w/ DIH is a better option. DIH makes it easier
>> to do the mapping of fields etc.
>
> Which comment is this directed at?  I'm lacking context here.
>
>>
>>
>> On Tue, Dec 8, 2009 at 4:59 AM, Grant Ingersoll  wrote:
>>>
>>> On Dec 7, 2009, at 3:51 PM, Chris Hostetter wrote:
>>>

 As someone with very little knowledge of Solr Cell and/or Tika, I find 
 myself wondering if ExtractingRequestHandler would make more sense as an 
 extractingUpdateProcessor -- where it could be configured to take 
 either binary fields (or string fields containing URLs) out of the 
 Documents, parse them with tika, and add the various XPath matching hunks 
 of text back into the document as new fields.

 Then ExtractingRequestHandler just becomes a handler that slurps up its 
 ContentStreams and adds them as binary data fields and adds the other 
 literal params as fields.

 Wouldn't that make things like SOLR-1358, and using Tika with 
 URLs/filepaths in XML and CSV based updates fairly trivial?
>>>
>>> It probably could, but I am not sure how it works in a processor chain.  
>>> However, I'm not sure I understand how they work all that much either.  I 
>>> also plan on adding, BTW, a SolrJ client for Tika that does the extraction 
>>> on the client.  In many cases, the ExtrReqHandler is really only designed 
>>> for lighter weight extraction cases, as one would simply not want to send 
>>> that much rich content over the wire.
>>
>>
>>
>> --
>> -
>> Noble Paul | Systems Architect| AOL | http://aol.com
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using 
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>



-- 
-
Noble Paul | Systems Architect| AOL | http://aol.com


[jira] Commented: (SOLR-1606) Integrate Near Realtime

2009-12-08 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787482#action_12787482
 ] 

Yonik Seeley commented on SOLR-1606:


Another thing to consider: allow some things to be turned around for realtime 
by allowing clients to trigger the opening of a new reader.
For example, q=foo&freshness=1000 would cause a new realtime reader to be 
opened if the current one was more than 1000ms old.

> Integrate Near Realtime 
> 
>
> Key: SOLR-1606
> URL: https://issues.apache.org/jira/browse/SOLR-1606
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: SOLR-1606.patch
>
>
> We'll integrate IndexWriter.getReader.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Upgrading Lucene jars

2009-12-08 Thread Koji Sekiguchi

Shalin Shekhar Mangar wrote:

I need to upgrade contrib-spellcheck jar for SOLR-785. Should I go ahead and
upgrade all Lucene jars to the latest 2.9 branch code?

  

+1.

Koji

--
http://www.rondhuit.com/en/



[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2009-12-08 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787479#action_12787479
 ] 

Yonik Seeley commented on SOLR-1277:


bq. We should probably start a ZooKeeper branch since this issue is likely to 
get quite large and hopefully have many contributors

+1, that will help both direct developers and power users who want to try it 
out (and thus lower the bar for small contributions)

> Implement a Solr specific naming service (using Zookeeper)
> --
>
> Key: SOLR-1277
> URL: https://issues.apache.org/jira/browse/SOLR-1277
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 1.5
>
> Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, 
> SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The goal is to give Solr server clusters self-healing attributes
> where if a server fails, indexing and searching don't stop and
> all of the partitions remain searchable. For configuration, the
> ability to centrally deploy a new configuration without servers
> going offline.
> We can start with basic failover and start from there?
> Features:
> * Automatic failover (i.e. when a server fails, clients stop
> trying to index to or search it)
> * Centralized configuration management (i.e. new solrconfig.xml
> or schema.xml propagates to a live Solr cluster)
> * Optionally allow shards of a partition to be moved to another
> server (i.e. if a server gets hot, move the hot segments out to
> cooler servers). Ideally we'd have a way to detect hot segments
> and move them seamlessly. With NRT this becomes somewhat more
> difficult but not impossible?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1358) Integration of Tika and DataImportHandler

2009-12-08 Thread Akshay K. Ukey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshay K. Ukey updated SOLR-1358:
-

Attachment: SOLR-1358.patch

First cut patch. Not tested.

> Integration of Tika and DataImportHandler
> -
>
> Key: SOLR-1358
> URL: https://issues.apache.org/jira/browse/SOLR-1358
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - DataImportHandler
>Reporter: Sascha Szott
>Assignee: Noble Paul
> Attachments: SOLR-1358.patch
>
>
> At the moment, it's impossible to configure Solr such that it builds up 
> documents by using data that comes from both PDF documents and database table 
> columns. Currently, to accomplish this task, it's up to the user to add some 
> preprocessing that converts PDF files into plain text files. Therefore, I 
> would like to see an integration of Solr Cell into DIH that makes such 
> preprocessing obsolete.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1358) Integration of Tika and DataImportHandler

2009-12-08 Thread Akshay K. Ukey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787466#action_12787466
 ] 

Akshay K. Ukey commented on SOLR-1358:
--

Configuration with attribute to select format of emitted content:

{code:xml} 

  
  
  
  

  
  
  
  
  
 
  

{code} 

With 'emitFormat', different EntityProcessors can be chained. E.g. using the "xml" 
value will allow chaining XPathEntityProcessor with TikaEntityProcessor for 
further custom processing.
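A rough, hypothetical sketch of such a configuration (the 'emitFormat' attribute follows the wording of this comment; the remaining names and the file path are illustrative assumptions):

{code:xml}
<dataConfig>
  <dataSource type="BinFileDataSource"/>
  <document>
    <!-- hypothetical entity emitting XML so that a nested XPathEntityProcessor can consume it -->
    <entity name="tika" processor="TikaEntityProcessor"
            url="/data/docs/sample.pdf" emitFormat="xml">
      <field column="text" name="content"/>
    </entity>
  </document>
</dataConfig>
{code}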

> Integration of Tika and DataImportHandler
> -
>
> Key: SOLR-1358
> URL: https://issues.apache.org/jira/browse/SOLR-1358
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - DataImportHandler
>Reporter: Sascha Szott
>Assignee: Noble Paul
>
> At the moment, it's impossible to configure Solr such that it builds up 
> documents by using data that comes from both PDF documents and database table 
> columns. Currently, to accomplish this task, it's up to the user to add some 
> preprocessing that converts PDF files into plain text files. Therefore, I 
> would like to see an integration of Solr Cell into DIH that makes such 
> preprocessing obsolete.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Solr Cell revamped as an UpdateProcessor?

2009-12-08 Thread Grant Ingersoll

On Dec 8, 2009, at 12:22 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

> Integrating Extraction w/ DIH is a better option. DIH makes it easier
> to do the mapping of fields etc.

Which comment is this directed at?  I'm lacking context here.

> 
> 
> On Tue, Dec 8, 2009 at 4:59 AM, Grant Ingersoll  wrote:
>> 
>> On Dec 7, 2009, at 3:51 PM, Chris Hostetter wrote:
>> 
>>> 
>>> As someone with very little knowledge of Solr Cell and/or Tika, I find 
>>> myself wondering if ExtractingRequestHandler would make more sense as an 
>>> extractingUpdateProcessor -- where it could be configured to take 
>>> either binary fields (or string fields containing URLs) out of the 
>>> Documents, parse them with tika, and add the various XPath matching hunks 
>>> of text back into the document as new fields.
>>> 
>>> Then ExtractingRequestHandler just becomes a handler that slurps up its 
>>> ContentStreams and adds them as binary data fields and adds the other 
>>> literal params as fields.
>>> 
>>> Wouldn't that make things like SOLR-1358, and using Tika with 
>>> URLs/filepaths in XML and CSV based updates fairly trivial?
>> 
>> It probably could, but I am not sure how it works in a processor chain.  
>> However, I'm not sure I understand how they work all that much either.  I 
>> also plan on adding, BTW, a SolrJ client for Tika that does the extraction 
>> on the client.  In many cases, the ExtrReqHandler is really only designed 
>> for lighter weight extraction cases, as one would simply not want to send 
>> that much rich content over the wire.
> 
> 
> 
> -- 
> -
> Noble Paul | Systems Architect| AOL | http://aol.com

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using 
Solr/Lucene:
http://www.lucidimagination.com/search



Upgrading Lucene jars

2009-12-08 Thread Shalin Shekhar Mangar
I need to upgrade contrib-spellcheck jar for SOLR-785. Should I go ahead and
upgrade all Lucene jars to the latest 2.9 branch code?

-- 
Regards,
Shalin Shekhar Mangar.


Hudson build is back to normal: Solr-trunk #997

2009-12-08 Thread Apache Hudson Server
See 




[jira] Issue Comment Edited: (SOLR-1358) Integration of Tika and DataImportHandler

2009-12-08 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750855#action_12750855
 ] 

Noble Paul edited comment on SOLR-1358 at 12/8/09 8:50 AM:
---

Let us provide a new TikaEntityProcessor 

{code:xml}

  
  
  

  
  
  
  
  
 
  

{code}

This most likely would need a BinUrlDataSource/BinContentStreamDataSource 
because Tika uses binary inputs.

My suggestion is that TikaEntityProcessor live in the extraction contrib so 
that managing dependencies is easier. But we will have to make extraction have 
a compile-time dependency on DIH. 

Grant, what do you think?

  was (Author: noble.paul):
Let us provide a new TikaEntityProcessor 

{code:xml}

  
  
  

 
  

{code}

This most likely would need a BinUrlDataSource/BinContentStreamDataSource 
because Tika uses binary inputs.

My suggestion is that TikaEntityProcessor live in the extraction contrib so 
that managing dependencies is easier. But we will have to make extraction have 
a compile-time dependency on DIH. 

Grant, what do you think?
  
> Integration of Tika and DataImportHandler
> -
>
> Key: SOLR-1358
> URL: https://issues.apache.org/jira/browse/SOLR-1358
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - DataImportHandler
>Reporter: Sascha Szott
>Assignee: Noble Paul
>
> At the moment, it's impossible to configure Solr such that it builds up 
> documents by using data that comes from both PDF documents and database table 
> columns. Currently, to accomplish this task, it's up to the user to add some 
> preprocessing that converts PDF files into plain text files. Therefore, I 
> would like to see an integration of Solr Cell into DIH that makes such 
> preprocessing obsolete.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.