[jira] Commented: (SOLR-1106) Pluggable CoreAdminHandler (Action ) architecture that allows for custom handler access to CoreContainer / request-response

2009-04-17 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700424#action_12700424
 ] 

Noble Paul commented on SOLR-1106:
--

In reply to Hoss's comment on the list: http://markmail.org/message/s7mcbtaskngr74bd

Commands such as create/load/unload can only be performed by the 
CoreAdminHandler, so it is not really possible to achieve this with a 
RequestHandler. Take our use case, where we start with a blank slate (zero 
cores) and keep adding cores. In this case there is no core in the first 
place to attach a RequestHandler to.

bq.we might end up adding one more method to be overridden. Let me know what 
you feel about this. 

As I see it, there will be very few users overriding the CoreAdminHandler. We 
do it, and we have a custom build of Solr for that. With this issue fixed I may 
be able to plug in my custom CoreAdminHandler. Having 7-8 methods to be 
overridden is a good idea. If there are new commands we may have new methods.
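
A minimal sketch of the override-based approach described above, for 
illustration only. SketchCoreAdminHandler, MonitoringCoreAdminHandler and the 
handle* method names are hypothetical stand-ins, not the real CoreAdminHandler 
API. The sketch only shows that each command maps to one overridable method, so 
a subclass can change a default command or add a new one.

```java
// Hypothetical stand-in for CoreAdminHandler: one protected method per command.
class SketchCoreAdminHandler {
    // Dispatches an action name to the matching overridable method.
    final String handleRequest(String action, String coreName) {
        if ("create".equalsIgnoreCase(action)) {
            return handleCreate(coreName);
        }
        if ("unload".equalsIgnoreCase(action)) {
            return handleUnload(coreName);
        }
        return handleCustomAction(action, coreName);
    }

    protected String handleCreate(String coreName) {
        return "created " + coreName;
    }

    protected String handleUnload(String coreName) {
        return "unloaded " + coreName;
    }

    // Default behaviour for unknown actions; subclasses may add their own.
    protected String handleCustomAction(String action, String coreName) {
        throw new IllegalArgumentException("unsupported action: " + action);
    }
}

// A custom handler that overrides one default command and adds a new one.
class MonitoringCoreAdminHandler extends SketchCoreAdminHandler {
    @Override
    protected String handleCreate(String coreName) {
        return super.handleCreate(coreName) + " (audited)";
    }

    @Override
    protected String handleCustomAction(String action, String coreName) {
        if ("monitor".equalsIgnoreCase(action)) {
            return "monitoring " + coreName;
        }
        return super.handleCustomAction(action, coreName);
    }
}
```

Because the dispatch lives in the handler itself, this works even when no core 
exists yet, which is the blank-slate case mentioned above.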

> Pluggable CoreAdminHandler  (Action ) architecture that allows for custom 
> handler access to CoreContainer / request-response 
> -
>
> Key: SOLR-1106
> URL: https://issues.apache.org/jira/browse/SOLR-1106
> Project: Solr
>  Issue Type: New Feature
> Environment: Java 5, Tomcat 6 
>Reporter: Kay Kay
> Attachments: SOLR-1106.patch, SOLR-1106.patch, SOLR-1106.patch
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> Currently there are certain default actions implemented in CoreAdminHandler 
> (CREATE, SWAP, RELOAD, ALIAS, etc.). 
> For in-house monitoring tools that need to interact with multiple cores in a 
> given Solr instance, we need custom handlers that have access to the 
> CoreContainer and to the request and response. 
> The proposed way of injecting handlers is as follows. 
> In solr.xml - we add a new schema - 
>  
>
> 
>handlerType="com.mydomain.myclass" />
>
>
> A new abstract class, CoreAdminActionRequestHandler, is added, which 
> com.mydomain.myclass would need to inherit from. 
> The following action handlers are registered by default: 
> registerCustomAdminHandler("create", new AdminCreateActionRequestHandler());
> registerCustomAdminHandler("rename", new AdminRenameActionRequestHandler());
> registerCustomAdminHandler("alias", new AdminAliasActionRequestHandler());
> registerCustomAdminHandler("unload", new AdminUnloadActionRequestHandler());
> registerCustomAdminHandler("status", new AdminStatusActionRequestHandler());
> registerCustomAdminHandler("persist", new AdminPersistActionRequestHandler());
> registerCustomAdminHandler("reload", new AdminReloadActionRequestHandler());
> registerCustomAdminHandler("swap", new AdminSwapActionRequestHandler());
> Trying to register a handler under a name that already exists would result in 
> an error (hence the defaults listed above cannot be overridden). 
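
The registration rule in the description above can be sketched roughly as 
follows. This is not the code from the attached patch: AdminAction and 
AdminActionRegistry are hypothetical names, handlers are reduced to a function 
from a core name to a string, and lambdas are used for brevity even though the 
issue targets Java 5.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the proposed CoreAdminActionRequestHandler.
interface AdminAction {
    String handle(String coreName);
}

// Sketch of a registry that refuses to replace an already registered action.
class AdminActionRegistry {
    private final Map<String, AdminAction> actions = new HashMap<String, AdminAction>();

    // Registers an action under a case-insensitive name; duplicates are an error.
    void register(String name, AdminAction action) {
        String key = name.toLowerCase(Locale.ROOT);
        if (actions.containsKey(key)) {
            throw new IllegalStateException("action already registered: " + key);
        }
        actions.put(key, action);
    }

    // Looks up the action by name and runs it for the given core.
    String dispatch(String name, String coreName) {
        AdminAction action = actions.get(name.toLowerCase(Locale.ROOT));
        if (action == null) {
            throw new IllegalArgumentException("unknown action: " + name);
        }
        return action.handle(coreName);
    }

    public static void main(String[] args) {
        AdminActionRegistry registry = new AdminActionRegistry();
        registry.register("status", core -> "status of " + core);
        registry.register("monitor", core -> "monitoring " + core);
        System.out.println(registry.dispatch("STATUS", "core0"));
        try {
            registry.register("status", core -> "replacement");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Rejecting duplicates keeps the defaults safe but, as the comments on this issue 
point out, it also leaves no way to override a standard command.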

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: SOLR-1106 - Custom Admin Action handler

2009-04-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
Hi Hoss. I am adding the responses to the issue:

https://issues.apache.org/jira/browse/SOLR-1106

On Sat, Apr 18, 2009 at 3:25 AM, Chris Hostetter
 wrote:
>
> : For one of our projects - we need custom admin monitoring hooks that gets
> : access to multiple cores for a given solr web app (through the CoreContainer
> : interface).
>
> i've only skimmed the back and forth in this thread (and in the issue
> comments) but i'm wondering why/if this specifically needs to be done as
> extensions to the CoreAdminHandler?
>
> Any Core can get access to the CoreContainer (via
> core.getCoreDescriptor().getCoreContainer()) and all of the SolrCores it
> is managing, so couldn't these new hooks you need be implemented in a regular
> RequestHandler?
>
> I ask this from the "how to achieve a niche goal with the minimal number
> of invasive changes" standpoint -- mainly because i don't really
> understand what new types of "monitoring hooks" you're thinking of.  if
> they seem like something that would be generally useful to lots of people,
> why not add them to CoreAdminHandler?  if the *pattern* of adding them
> seems like something that will come up for lots of people *then* i would
> worry about making CoreAdminHandler more extensible.
>
>
>
> -Hoss
>
>



-- 
--Noble Paul


[jira] Commented: (SOLR-633) QParser for use with user-entered query which recognizes subphrases as well as allowing some other customizations on per field basis

2009-04-17 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700422#action_12700422
 ] 

Otis Gospodnetic commented on SOLR-633:
---

This description could sure use an example! :)  I read it 3 times and still 
don't have a good picture of what this is really about.


> QParser for use with user-entered query which recognizes subphrases as well 
> as allowing some other customizations on per field basis
> 
>
> Key: SOLR-633
> URL: https://issues.apache.org/jira/browse/SOLR-633
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.4
> Environment: All
>Reporter: Preetam Rao
>Priority: Minor
> Fix For: 1.5
>
>
> Create a request handler (actually a QParser) for use with user-entered 
> queries, with the following features:
> a) Take a user query string and try to match it against multiple fields, 
> while recognizing sub-phrase matches.
> b) For each field, accept the parameters below:
>1) phraseBoost - the factor which decides how good an n-token sub-phrase 
> match is compared to an (n-1)-token sub-phrase match.
>2) maxScoreOnly - If there are multiple sub-phrase matches, pick only the 
> highest.
>3) ignoreDuplicates - If the same sub-phrase query matches multiple times, 
> pick only one.
>4) disableOtherScoreFactors - Ignore tf, query norm, idf and any other 
> parameters which are not relevant.
> c) Try to provide all the parameters similar to dismax. Reuse or extend 
> dismax.  
> Other suggestions and feedback appreciated :-)
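
One possible reading of the sub-phrase idea above, sketched for illustration. 
The description does not say how phraseBoost is applied, so the weighting used 
here (phraseBoost raised to the power n-1 for an n-token sub-phrase) is an 
assumption, as are the class and method names. No Lucene or Solr classes are 
involved; the sketch only enumerates the contiguous sub-phrases of a query.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SubPhraseSketch {
    // Maps every contiguous sub-phrase of the query to an assumed weight of
    // phraseBoost^(n-1), where n is the number of tokens in the sub-phrase.
    static Map<String, Double> subPhrases(String query, double phraseBoost) {
        Map<String, Double> result = new LinkedHashMap<String, Double>();
        String trimmed = query.trim();
        if (trimmed.isEmpty()) {
            return result;
        }
        String[] tokens = trimmed.split("\\s+");
        // Longest sub-phrases first, so they come first in iteration order.
        for (int n = tokens.length; n >= 1; n--) {
            for (int start = 0; start + n <= tokens.length; start++) {
                StringBuilder phrase = new StringBuilder();
                for (int i = start; i < start + n; i++) {
                    if (i > start) {
                        phrase.append(' ');
                    }
                    phrase.append(tokens[i]);
                }
                String key = phrase.toString();
                // Keep only one entry per distinct sub-phrase (cf. ignoreDuplicates).
                if (!result.containsKey(key)) {
                    result.put(key, Math.pow(phraseBoost, n - 1));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Double> e : subPhrases("new york city", 2.0).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

For the query "new york city" this yields the full phrase, the two-token 
sub-phrases "new york" and "york city", and the three single tokens, with 
longer sub-phrases weighted higher.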




[jira] Updated: (SOLR-1099) FieldAnalysisRequestHandler

2009-04-17 Thread Uri Boness (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uri Boness updated SOLR-1099:
-

Attachment: AnalisysRequestHandler_refactored.patch

The latest patch (AnalisysRequestHandler_refactored.patch) does the following:

- deprecates the AnalysisRequestHandler
- adds the AnalysisRequestHandlerBase
- adds the DocumentAnalysisRequestHandler
- modifies the FieldAnalysisRequestHandler
- adds/updates the appropriate test classes

NOTE: the response format of the DocumentAnalysisRequestHandler differs from 
the AnalysisRequestHandler after all. This is mainly for two reasons: 1) to be 
consistent with the response format of the FieldAnalysisRequestHandler, 2) New 
features were added to this request handler which didn't exist in the old 
AnalysisRequestHandler

> FieldAnalysisRequestHandler
> ---
>
> Key: SOLR-1099
> URL: https://issues.apache.org/jira/browse/SOLR-1099
> Project: Solr
>  Issue Type: New Feature
>  Components: Analysis
>Affects Versions: 1.3
>Reporter: Uri Boness
>Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: AnalisysRequestHandler_refactored.patch, 
> FieldAnalysisRequestHandler_incl_test.patch
>
>
> The FieldAnalysisRequestHandler provides the analysis functionality of the 
> web admin page as a service. This handler accepts a fieldtype/fieldname 
> parameter and a value, and returns a breakdown of the analysis process as 
> its response. It is also possible to send a query value, which will use the 
> configured query analyzer, as well as a showmatch parameter, which will then 
> mark every matched token as a match.
> If this handler is added to the code base, I also recommend renaming the 
> current AnalysisRequestHandler to DocumentAnalysisRequestHandler and having 
> both inherit from one AnalysisRequestHandlerBase class which provides 
> the common functionality of the analysis breakdown and its translation to 
> named lists. This will also enhance the current AnalysisRequestHandler, which 
> right now is fairly simplistic.




Re: SOLR-1106 - Custom Admin Action handler

2009-04-17 Thread Chris Hostetter

: For one of our projects - we need custom admin monitoring hooks that gets
: access to multiple cores for a given solr web app (through the CoreContainer
: interface).

i've only skimmed the back and forth in this thread (and in the issue 
comments) but i'm wondering why/if this specifically needs to be done as 
extensions to the CoreAdminHandler?

Any Core can get access to the CoreContainer (via 
core.getCoreDescriptor().getCoreContainer()) and all of the SolrCores it 
is managing, so couldn't these new hooks you need be implemented in a regular 
RequestHandler?

I ask this from the "how to achieve a niche goal with the minimal number 
of invasive changes" standpoint -- mainly because i don't really 
understand what new types of "monitoring hooks" you're thinking of.  if 
they seem like something that would be generally useful to lots of people, 
why not add them to CoreAdminHandler?  if the *pattern* of adding them 
seems like something that will come up for lots of people *then* i would 
worry about making CoreAdminHandler more extensible.



-Hoss



Re: analyzer in QueryElevationComponent

2009-04-17 Thread Ryan Mckinley

Yup -- that would be better.

On Apr 17, 2009, at 4:41 AM, Koji Sekiguchi  wrote:


Hi,

I'm just looking at the source of QueryElevationComponent.java.
In the inform() method, the analyzer object is set like this:

// line 154
analyzer = ft.getAnalyzer();

Why is getAnalyzer() used? Isn't getQueryAnalyzer() better?

Koji




[jira] Updated: (SOLR-773) Incorporate Local Lucene/Solr

2009-04-17 Thread patrick o'leary (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

patrick o'leary updated SOLR-773:
-

Attachment: SOLR-773.patch

This fixes the query parsing issue. It defaults to using the default 
QParserPlugin
and allows you to specify an optional basedOn argument to use a different 
QParserPlugin.
 
{code}

{code}

There are a couple of things to note
* Latest distance facet code not included
* Faster distance filter using query intersect isn't working (spatial lucene 
fix) 
* fsv for shard sorting not present 

I feel fsv should be extracted to a separate component to reduce the 
duplication of effort across
other search components. But this will give the basics for the moment.

> Incorporate Local Lucene/Solr
> -
>
> Key: SOLR-773
> URL: https://issues.apache.org/jira/browse/SOLR-773
> Project: Solr
>  Issue Type: New Feature
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: lucene.tar.gz, SOLR-773-local-lucene.patch, 
> SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
> SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773.patch, 
> SOLR-773.patch, spatial-solr.tar.gz
>
>
> Local Lucene has been donated to the Lucene project.  It has some Solr 
> components, but we should evaluate how best to incorporate it into Solr.
> See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene




[jira] Commented: (SOLR-1106) Pluggable CoreAdminHandler (Action ) architecture that allows for custom handler access to CoreContainer / request-response

2009-04-17 Thread Kay Kay (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700210#action_12700210
 ] 

Kay Kay commented on SOLR-1106:
---

{quote}
There are places where we may need to override the default implementation. 
Actually we already do it internally. So if I cannot override the default 
commands it may not be as useful.
{quote}

If that is the case, then going with an abstract action and implementations of 
it (with the option to override the default implementations) would probably be 
cleaner. Otherwise we are looking at around 6-7 methods that could be 
overridden, and each time we add a default command we might end up adding one 
more method to be overridden. Let me know what you feel about this. 




[jira] Commented: (SOLR-1106) Pluggable CoreAdminHandler (Action ) architecture that allows for custom handler access to CoreContainer / request-response

2009-04-17 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700205#action_12700205
 ] 

Noble Paul commented on SOLR-1106:
--

bq.I do not see the motivation to override the default commands. 

There are places where we may need to override the default implementation. 
Actually we already do it internally. So if I cannot override the default 
commands it may not be as useful.




[jira] Updated: (SOLR-1106) Pluggable CoreAdminHandler (Action ) architecture that allows for custom handler access to CoreContainer / request-response

2009-04-17 Thread Kay Kay (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kay Kay updated SOLR-1106:
--

Attachment: SOLR-1106.patch

CoreAdminHandler ( default ) instantiated without using reflection 




[jira] Commented: (SOLR-1106) Pluggable CoreAdminHandler (Action ) architecture that allows for custom handler access to CoreContainer / request-response

2009-04-17 Thread Kay Kay (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700191#action_12700191
 ] 

Kay Kay commented on SOLR-1106:
---

{quote}
It leaves no option to override the standard commands.
{quote}

That seems to contradict your objection to the first design, which used 
abstract classes and implementations: the correct way is to have an abstract 
class and provide default implementations of it for the commands available by 
default. Also, I do not see the motivation to override the default commands. 

I will attach a revised patch that loads the default CoreAdminHandler without 
reflection. 




[jira] Commented: (SOLR-844) A SolrServer impl to front-end multiple urls

2009-04-17 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700109#action_12700109
 ] 

Shalin Shekhar Mangar commented on SOLR-844:


Committed revision 765912.

> A SolrServer impl to front-end multiple urls
> 
>
> Key: SOLR-844
> URL: https://issues.apache.org/jira/browse/SOLR-844
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java
>Affects Versions: 1.3
>Reporter: Noble Paul
>Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: SOLR-844.patch, SOLR-844.patch, SOLR-844.patch, 
> SOLR-844.patch, SOLR-844.patch, SOLR-844.patch, SOLR-844.patch, SOLR-844.patch
>
>
> Currently a {{CommonsHttpSolrServer}} can talk to only one server. This 
> demands that the user have a load balancer or do the round-robin on their own. 
> We need an {{LBHttpSolrServer}} which automatically load-balances 
> between multiple hosts. This can be backed by the 
> {{CommonsHttpSolrServer}}.
> It can also have the following features:
> * Automatic failover
> * Optionally take in a file/URL containing the URLs of servers so that 
> the server list can be automatically updated by periodically loading the 
> config
> * Support for adding/removing servers during runtime
> * Pluggable load-balancing mechanism (round-robin, weighted round-robin, 
> random, etc.)
> * Pluggable Failover mechanisms
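
The round-robin and failover behaviour listed above can be sketched as 
follows. This is not the committed LBHttpSolrServer: RoundRobinSketch is a 
hypothetical class, servers are plain URL strings, and the actual HTTP call is 
passed in as a function so that the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class RoundRobinSketch {
    private final List<String> urls;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSketch(List<String> urls) {
        if (urls.isEmpty()) {
            throw new IllegalArgumentException("need at least one url");
        }
        this.urls = new ArrayList<String>(urls);
    }

    // Tries each server at most once, starting from the next one in rotation.
    <T> T request(Function<String, T> call) {
        int start = Math.floorMod(next.getAndIncrement(), urls.size());
        RuntimeException last = null;
        for (int i = 0; i < urls.size(); i++) {
            String url = urls.get((start + i) % urls.size());
            try {
                return call.apply(url);
            } catch (RuntimeException e) {
                last = e; // fail over to the next server in the list
            }
        }
        throw new IllegalStateException("all servers failed", last);
    }

    public static void main(String[] args) {
        List<String> servers = new ArrayList<String>();
        servers.add("http://host1:8983/solr");
        servers.add("http://host2:8983/solr");
        RoundRobinSketch lb = new RoundRobinSketch(servers);
        // The "call" here just echoes the URL; a real client would send a request.
        System.out.println(lb.request(url -> "answered by " + url));
        System.out.println(lb.request(url -> "answered by " + url));
    }
}
```

A fuller implementation would also remember which servers are down and recheck 
them periodically instead of retrying a dead server on every rotation.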




[jira] Commented: (SOLR-1120) Simplify EntityProcessor API

2009-04-17 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700087#action_12700087
 ] 

Noble Paul commented on SOLR-1120:
--

bq.Is there any legitimate place where one would want to disallow replaceTokens?

Yes. The XPathEntityProcessor uses it directly, just to find out which 
variables are in the URL so that it can read and store them. We can probably 
add a method getEntityAttributeResolved() to get the resolved value.
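
A rough sketch of what such a getEntityAttributeResolved() helper might do. It 
does not use the real DIH Context or VariableResolver classes: attributes and 
variables are plain maps, and the ${name} placeholder syntax and the rule that 
unknown variables resolve to an empty string are assumptions made for this 
example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResolvedAttributeSketch {
    // Assumed placeholder syntax: ${some.variable}
    private static final Pattern TOKEN = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces each ${name} with its value; unknown names become empty strings.
    static String replaceTokens(String template, Map<String, String> vars) {
        Matcher m = TOKEN.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement(value == null ? "" : value));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Null-safe attribute lookup and token resolution in a single call.
    static String getEntityAttributeResolved(Map<String, String> attributes,
                                             String name,
                                             Map<String, String> vars) {
        String raw = attributes.get(name);
        return raw == null ? null : replaceTokens(raw, vars);
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new HashMap<String, String>();
        attributes.put("url", "http://example.com/feed?since=${importer.since}");
        Map<String, String> vars = new HashMap<String, String>();
        vars.put("importer.since", "2009-04-17");
        System.out.println(getEntityAttributeResolved(attributes, "url", vars));
    }
}
```

Callers that need the raw, unresolved value (as XPathEntityProcessor does) 
would keep using the plain attribute lookup.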

> Simplify EntityProcessor API
> 
>
> Key: SOLR-1120
> URL: https://issues.apache.org/jira/browse/SOLR-1120
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Affects Versions: 1.3
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
>
> Writing an EntityProcessor is deceptively complex. There are so many gotchas.
> I propose the following:
> # Extract out the Transformer application logic from EntityProcessor and add 
> it to DocBuilder. Then EntityProcessor does not need to call applyTransformer 
> or know about rowIterator and getFromRowCache() methods.
> # Change the meaning of EntityProcessor#destroy to be called on end of 
> parent's row -- Right now init is called once per parent row but destroy 
> actually means the end of import. In fact, there is no correct way for an 
> entity processor to do clean up right now. Most do clean up when returning 
> null (end of data) but with the introduction of $skipDoc, a transformer can 
> return $skipDoc and the entity processor will never get a chance to clean up 
> for the current init.
> # EntityProcessor will use the EventListener API to listen for import end. 
> This should be used by EntityProcessor to do a final cleanup.




Hudson build is back to normal: Solr-trunk #775

2009-04-17 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Solr-trunk/775/changes




[jira] Commented: (SOLR-1120) Simplify EntityProcessor API

2009-04-17 Thread Fergus McMenemie (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700085#action_12700085
 ] 

Fergus McMenemie commented on SOLR-1120:


Good idea. I don't know if you are interested in the following.

* Further to extracting out the Transformer application logic, I was wondering 
if every entity attribute read should automatically be processed by 
replaceTokens. Is there any legitimate place where one would want to disallow 
replaceTokens? The following snippet of code is repeated far too many times, 
but is important if DIH is to provide simple, predictable behaviour.  

{code}
s = context.getEntityAttribute(CHANGELIST_OMIT);
if (s != null) s = resolver.replaceTokens(s);

{code}

* The regexp transformer now has several combinations of mutually exclusive 
attributes. It would be nice to check the attributes for nonsensical 
combinations. However, given that the transformer is invoked for every row, such 
checking code could be a nasty overhead. I don't know how to solve this, but 
somehow we need to catch the first invocation of a field's transformer and allow 
far more detailed checking of the attributes.




analyzer in QueryElevationComponent

2009-04-17 Thread Koji Sekiguchi
Hi,

I'm just looking at the source of QueryElevationComponent.java.
In the inform() method, the analyzer object is set like this:

// line 154
analyzer = ft.getAnalyzer();

Why is getAnalyzer() used? Isn't getQueryAnalyzer() better?

Koji




[jira] Commented: (SOLR-1060) a new DIH EnityProcessor allowing text file lists of files to be indexed

2009-04-17 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700049#action_12700049
 ] 

Shalin Shekhar Mangar commented on SOLR-1060:
-

Thanks Fergus.

Even though LineEntityProcessor was originally conceived by you for reading 
files/URLs from a text file, that does not need to be mentioned in the javadocs. 
I think it can confuse users. The purpose of LineEntityProcessor is simple: just 
read line by line, accept/reject, and pass on. The documentation should not 
be more complicated than that.

Also look at SOLR-1120 that I just opened. There are just so many things in 
entity processor that even I cannot keep track of. It is a big change but very 
much needed so let me circle back to this issue after taking care of it.

Some of the gotchas are:
# The right way to clean up. Contrary to my previous comments, destroy is not 
the right place to do the cleanup.
# applyTransformer can return multiple rows, which are cached in the entity 
processor base class.
# The onError attribute needs to be handled correctly, e.g. abort, skip, continue.

I'll take this forward from here.

> a new DIH EnityProcessor allowing text file lists of files to be indexed
> 
>
> Key: SOLR-1060
> URL: https://issues.apache.org/jira/browse/SOLR-1060
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - DataImportHandler
>Affects Versions: 1.4
>Reporter: Fergus McMenemie
>Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: regex-fix.patch, SOLR-1060.patch, SOLR-1060.patch, 
> SOLR-1060.patch, SOLR-1060.patch, SOLR-1060.patch, SOLR-1060.patch, 
> SOLR-1060.patch, SOLR-1060.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> I have finished a new DIH EntityProcessor. It is designed around the idea 
> that whatever daemon is used to maintain your content store, it is likely to 
> drop a report or log file explaining what has changed within your content 
> store. I wish to use this report file to control the indexing of the new or 
> changed content and the removal of old content. The report files, perhaps 
> from un-tar or un-zip, are likely to reference jpegs and directory stubs 
> which need to be ignored. I assumed a file-based content repository, but this 
> should be expanded to handle URIs as well.
> I feel that the current FileListEntityProcessor is poorly named. It should be 
> called the dirWalkEntityProcessor or dirCrawlEntityProcessor or such, and 
> this new EntityProcessor should have the name FileListEntityProcessor. 
> However, what is done is done. I then came up with manifestEntityProcessor, 
> which I thought suited: manifest files are all over the content sets I deal 
> with, and the dictionary definition seemed close enough ("ship's manifest"). 
> However, how about ChangeListEntityProcessor?
> {code}
>processor="ManifestEntityProcessor"
>baseDir="/Volumes/Techmore/ts/aaa/schema/data"
>rootEntity="false"
>dataSource="null"
>allowRegex="^.*\.xml$"
>blockRegex="usc2009"
>manifestFileName="/Volumes/ts/man-find.txt"
>docAddRegex=".*"
>>
> {code}
> The new entity fields are as follows.
>  
>*manifestFileName* is the required location of the manifest file. If this 
> value is relative, it is assumed to be relative to baseDir.
>*allowRegex* is an optional attribute that, if present, discards any line 
> which does not match the regex.
>  
>*blockRegex* is an optional attribute that is applied after any allowRegex 
> and discards any line which matches the regex.
>*docAddRegex* is a required regex to identify lines which, when matched, 
> should cause docs to be added to the index. As well as matching the line, it 
> should also return the portion of the line which contains the filepath as 
> group(1).
>*docDeleteRegex* is an optional regex to identify documents 
> which, when matched, should be deleted from the index. As well as matching the 
> line, it should also return the portion of the line which contains the 
> filepath as group(1). **PLANNED**
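
The allowRegex / blockRegex / docAddRegex pipeline described above can be 
sketched as below. This is not the code from the attached patches: 
ManifestFilterSketch is a hypothetical class, the manifest is passed in as a 
list of lines, and it is an assumption that the regexes are applied with 
find() rather than requiring a full-line match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ManifestFilterSketch {
    // Returns the file paths (group 1 of docAddRegex) of the lines that survive
    // the optional allow and block filters. Null allow/block means "not set".
    static List<String> filesToAdd(List<String> lines, String allowRegex,
                                   String blockRegex, String docAddRegex) {
        Pattern allow = allowRegex == null ? null : Pattern.compile(allowRegex);
        Pattern block = blockRegex == null ? null : Pattern.compile(blockRegex);
        Pattern add = Pattern.compile(docAddRegex);
        List<String> result = new ArrayList<String>();
        for (String line : lines) {
            if (allow != null && !allow.matcher(line).find()) {
                continue; // discarded: does not match allowRegex
            }
            if (block != null && block.matcher(line).find()) {
                continue; // discarded: matches blockRegex
            }
            Matcher m = add.matcher(line);
            if (m.find()) {
                // Use group(1) when the regex defines one, else the whole match.
                String path = m.groupCount() >= 1 ? m.group(1) : m.group();
                if (path != null) {
                    result.add(path);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> manifest = Arrays.asList(
                "x data/a.xml",
                "x data/usc2009/b.xml",
                "x data/c.jpg");
        System.out.println(filesToAdd(manifest, "^.*\\.xml$", "usc2009", "^x (.*)$"));
    }
}
```

With the sample manifest in main, only the first line passes both filters, so 
only its path is returned.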




[jira] Created: (SOLR-1120) Simplify EntityProcessor API

2009-04-17 Thread Shalin Shekhar Mangar (JIRA)
Simplify EntityProcessor API


 Key: SOLR-1120
 URL: https://issues.apache.org/jira/browse/SOLR-1120
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 1.3
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
 Fix For: 1.4


Writing an EntityProcessor is deceptively complex. There are so many gotchas.

I propose the following:
# Extract out the Transformer application logic from EntityProcessor and add it 
to DocBuilder. Then EntityProcessor does not need to call applyTransformer or 
know about rowIterator and getFromRowCache() methods.
# Change the meaning of EntityProcessor#destroy to be called on end of parent's 
row -- Right now init is called once per parent row but destroy actually means 
the end of import. In fact, there is no correct way for an entity processor to 
do clean up right now. Most do clean up when returning null (end of data) but 
with the introduction of $skipDoc, a transformer can return $skipDoc and the 
entity processor will never get a chance to clean up for the current init.
# EntityProcessor will use the EventListener API to listen for import end. This 
should be used by EntityProcessor to do a final cleanup.
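
The proposed lifecycle can be sketched as follows. SketchEntityProcessor and 
LifecycleDriver are hypothetical names, not the real DIH interfaces, and rows 
are reduced to strings. The sketch only shows the intended call order: init 
and destroy once per parent row, and a single import-end callback for final 
cleanup.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical lifecycle, not the real EntityProcessor API.
interface SketchEntityProcessor {
    void init(String parentRow); // called once per parent row
    String nextRow();            // returns null at end of data
    void destroy();              // proposed: called at the end of each parent row
    void onImportEnd();          // proposed: final cleanup when the import ends
}

class LifecycleDriver {
    // Runs the processor over all parent rows and collects the rows it emits.
    static List<String> run(SketchEntityProcessor processor, List<String> parentRows) {
        List<String> rows = new ArrayList<String>();
        try {
            for (String parent : parentRows) {
                processor.init(parent);
                try {
                    String row;
                    while ((row = processor.nextRow()) != null) {
                        rows.add(row);
                    }
                } finally {
                    // Runs even if reading is cut short by an exception.
                    processor.destroy();
                }
            }
        } finally {
            processor.onImportEnd();
        }
        return rows;
    }
}
```

Putting destroy in a finally block is one way to give the processor a cleanup 
point for every init, which is the gap the proposal describes.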
