Re: SOLR-1106 - Custom Admin Action handler

2009-04-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Mon, Apr 13, 2009 at 10:03 PM, Kay Kay  wrote:
> These custom action handlers need not be residing in solr . Hence I needed a
> hook ( listener ) that they can register themselves with and be loaded by
> the SolrResourceLoader ( ./lib/*.jar ) .  Also I believe the default
> handlers are very useful , necessary and mandatory and hence ported them to
> the listener for consistency purposes.
>
> Also - if we have a protected method called invokeCommand() - how do we
> inject that type as the admin handler ( as opposed to CoreAdminHandler) .
> Right now - the type information seems hardcoded in CoreContainer though.

There is currently no means to inject that, but it can be made
possible by an extra attribute in the  tag . say 

We will have to refactor the code a bit so that you are able to
extend the default core admin handler.
>
>  //  Multicore self related methods ---
>  /**
>   * Creates a CoreAdminHandler for this MultiCore.
>   * @return a CoreAdminHandler
>   */
>  protected CoreAdminHandler createMultiCoreHandler() {
>    return new CoreAdminHandler() {
>      @Override
>      public CoreContainer getCoreContainer() {
>        return CoreContainer.this;
>      }
>    };
>  }
>
>
> 2009/4/13 Noble Paul നോബിള്‍ नोब्ळ् 
>
>> Hi Kay,
>>
>> The idea of one handler per command looks like overkill. How about
>> having protected methods for all the known commands and a
>> separate method invokeCommand() which can choose to implement any
>> extra commands if need be? This way the changes needed would be
>> minimal.
>>
>> On Mon, Apr 13, 2009 at 8:53 PM, Kay Kay  wrote:
>> > For one of our projects - we need custom admin monitoring hooks that get
>> > access to multiple cores for a given solr web app (through the
>> CoreContainer
>> > interface).
>> >
>> > There are common admin handler commands with the actions - register /
>> swap /
>> > load etc. that seem to be available by default.
>> >
>> > I have submitted a patch to add custom admin handlers , against custom
>> > actions  ( that also refactors the existing action handlers that are
>> > available by default as well ).
>> >
>> > This would be useful to extend the handlers that need access to multiple
>> > cores.  Just curious if this is something that could be looked into .
>> > Thanks.
>> >
>>
>>
>>
>> --
>> --Noble Paul
>>
>



-- 
--Noble Paul
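
The suggestion in this thread (protected methods for the known commands, plus an
overridable invokeCommand() for anything extra) might look roughly like the
sketch below. This is only an illustration under stated assumptions: the class
names BaseAdminHandler and MonitoringAdminHandler, the action strings, and the
boolean return convention are all hypothetical and are not Solr's actual
CoreAdminHandler API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a core admin handler (not Solr's real class).
class BaseAdminHandler {
    // Records which commands ran, so the behaviour is easy to observe.
    final List<String> log = new ArrayList<>();

    // Dispatches known actions to protected methods; unknown ones go to invokeCommand().
    public final boolean handle(String action) {
        switch (action) {
            case "SWAP": return handleSwap();
            case "LOAD": return handleLoad();
            default:     return invokeCommand(action);
        }
    }

    protected boolean handleSwap() { log.add("swap"); return true; }

    protected boolean handleLoad() { log.add("load"); return true; }

    // Extension point: the default implementation handles no extra commands.
    protected boolean invokeCommand(String action) { return false; }
}

// A custom handler only overrides the extension point for its extra action.
class MonitoringAdminHandler extends BaseAdminHandler {
    @Override
    protected boolean invokeCommand(String action) {
        if ("MONITOR".equals(action)) {
            log.add("monitor");
            return true;
        }
        return super.invokeCommand(action);
    }
}

class AdminHandlerDemo {
    public static void main(String[] args) {
        BaseAdminHandler handler = new MonitoringAdminHandler();
        System.out.println(handler.handle("SWAP"));
        System.out.println(handler.handle("MONITOR"));
        System.out.println(handler.handle("UNKNOWN"));
    }
}
```

With this shape the default commands stay in one class, and a subclass adds
commands without touching the dispatch. How the container would be told to
instantiate the subclass is the open question raised above.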


[jira] Updated: (SOLR-599) Lightweight SolrJ client

2009-04-14 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-599:


Attachment: (was: SOLR-599.patch)

> Lightweight SolrJ client
> 
>
> Key: SOLR-599
> URL: https://issues.apache.org/jira/browse/SOLR-599
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 1.3
>Reporter: Shalin Shekhar Mangar
>Priority: Minor
> Attachments: SOLR-599.patch
>
>
> SolrJ provides a SolrServer implementation backed by commons-httpclient which 
> introduces many dependency jars (commons-codec, commons-io and 
> commons-logging). Apart from that SolrJ also uses StAX API for XML parsing 
> which introduces dependencies like stax-api, stax and stax-utils.
> This enhancement will add a SolrServer implementation backed by 
> java.net.HttpURLConnection and will use BinaryResponseParser as the default 
> response parser. Using this basic implementation out of the box would require 
> no dependencies on either commons-httpclient or StAX. The only dependency 
> would be on solr-commons making this a very lightweight and distribution 
> friendly Java client for Solr.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
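
To illustrate the idea in the issue description, here is a minimal sketch of an
HTTP GET using only the JDK's java.net.HttpURLConnection, with no
commons-httpclient dependency. It is not the attached patch: the class name
LightweightHttpSketch, the "/select" path, the parameter names, and the timeout
values are assumptions made for the example, and the response is returned as
raw bytes rather than parsed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class LightweightHttpSketch {

    // Builds "<baseUrl>/select?k=v&..." with URL-encoded keys and values.
    static String buildQueryUrl(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl).append("/select");
        char sep = '?';
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                sb.append(sep)
                  .append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
                sep = '&';
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is required on every JVM, so this should not happen.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    // Performs a GET with HttpURLConnection and returns the raw response body.
    static byte[] get(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("wt", "javabin");
        // Only builds and prints the URL; no network call is made here.
        System.out.println(buildQueryUrl("http://localhost:8983/solr", params));
    }
}
```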



[jira] Updated: (SOLR-599) Lightweight SolrJ client

2009-04-14 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-599:


Attachment: SOLR-599.patch

> Lightweight SolrJ client
> 
>
> Key: SOLR-599
> URL: https://issues.apache.org/jira/browse/SOLR-599
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 1.3
>Reporter: Shalin Shekhar Mangar
>Priority: Minor
> Attachments: SOLR-599.patch, SOLR-599.patch
>
>




[jira] Updated: (SOLR-599) Lightweight SolrJ client

2009-04-14 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-599:


Attachment: SOLR-599.patch

untested patch . 





Hudson build is back to normal: Solr-trunk #772

2009-04-14 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Solr-trunk/772/changes




[jira] Updated: (SOLR-1115) on and yes should be acceptable in solrconfig.xml

2009-04-14 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-1115:
-

Attachment: SOLR-1115.patch

Patch attached. I'll commit shortly.

> on and yes should be acceptable in solrconfig.xml
> ---
>
> Key: SOLR-1115
> URL: https://issues.apache.org/jira/browse/SOLR-1115
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.2, 1.3
>Reporter: Koji Sekiguchi
>Priority: Trivial
> Fix For: 1.4
>
> Attachments: SOLR-1115.patch
>
>
> snipoff from here:
> http://www.nabble.com/parsing-bool-type-in-solrconfig.xml-td23025954.html




Re: parsing bool type in solrconfig.xml

2009-04-14 Thread Koji Sekiguchi

Erik Hatcher wrote:
+1 also.  on/yes/true should all work for boolean parameters (just 
like Ant ;)


Erik


On Apr 14, 2009, at 2:01 AM, Shalin Shekhar Mangar wrote:


2009/4/13 Koji Sekiguchi 


Should we accept not only true, but also on
and yes?
I think it is easy by using parseBool() instead of Boolean.valueOf() in
DOMUtil.



+1

I know it is inconsistent but so are the request parameters like hl,
debugQuery etc which I doubt will be changed.

--
Regards,
Shalin Shekhar Mangar.





Thanks guys. I opened https://issues.apache.org/jira/browse/SOLR-1115 .

Koji
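
For reference, the difference discussed in this thread can be sketched as
follows. Boolean.valueOf() only recognises "true" (ignoring case), so "on" and
"yes" evaluate to false. The helper below is a hypothetical stand-in written
for this example; it is not the parseBool() mentioned above, whose exact rules
may differ.

```java
import java.util.Locale;

class LenientBoolSketch {

    // Returns true for "true", "on" or "yes" (trimmed, case-insensitive); false otherwise.
    static boolean parse(String s) {
        if (s == null) {
            return false;
        }
        String v = s.trim().toLowerCase(Locale.ROOT);
        return v.equals("true") || v.equals("on") || v.equals("yes");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"true", "on", "yes", "off"}) {
            System.out.println(s + " strict=" + Boolean.valueOf(s) + " lenient=" + parse(s));
        }
    }
}
```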



[jira] Created: (SOLR-1115) on and yes should be acceptable in solrconfig.xml

2009-04-14 Thread Koji Sekiguchi (JIRA)
on and yes should be acceptable in solrconfig.xml
---

 Key: SOLR-1115
 URL: https://issues.apache.org/jira/browse/SOLR-1115
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.3, 1.2
Reporter: Koji Sekiguchi
Priority: Trivial
 Fix For: 1.4


snipoff from here:
http://www.nabble.com/parsing-bool-type-in-solrconfig.xml-td23025954.html





[jira] Created: (SOLR-1114) Re-organize examples directory keeping core and contribs in mind

2009-04-14 Thread Shalin Shekhar Mangar (JIRA)
Re-organize examples directory keeping core and contribs in mind


 Key: SOLR-1114
 URL: https://issues.apache.org/jira/browse/SOLR-1114
 Project: Solr
  Issue Type: Improvement
Reporter: Shalin Shekhar Mangar


Re-organize examples directory keeping core and contribs in mind.

From Grant on solr-dev:
{quote}
The templates directory would contain the configurations (i.e. schema.xml and 
solrconfig.xml) and any sample docs (but not the libraries) for:
   tutorial - The current tutorial example
   dih - The DIH example
   extraction - Solr Cell example
   geo - geo spatial example (once 773 is committed)
   clustering - once SOLR-769 is committed
   simple - A barebones schema and config (mainly used for bootstrapping a 
new project for experienced users)
   exploratory - Basically, the same as simple, but the schema defines a 
single dynamic field -  Think of Hoss's Solr Out of the Box talk from ApacheCon 
whereby you want to quickly explore a new data set without having to define a 
schema.
   [other] -

Note, the templates directory could also live under each contrib, but it isn't 
necessarily a 1-1 thing (e.g. simple and exploratory templates are not 
contrib-specific).

Then, typing "ant example" would copy the necessary tutorial stuff to the 
example directory (which still contains the Jetty stuff) but would not have to 
recurse into any of the contribs.

Typing "ant example -Dtype=clustering"  would copy the clustering requirements, 
plus go to contrib/clustering (or whatever) and get the appropriate material 
such that the example directory.  Similarly for any of the other "templates"

Additionally, you could also define -DoutputDir such that it would take and 
copy the whole example directory (including the appropriate type) to some 
output dir.  This would allow one to quickly bootstrap a Solr project without 
having to do a lot of schema editing.
{quote}

http://markmail.org/thread/w6da7pwhcsdn43n3




[jira] Commented: (SOLR-1060) a new DIH EnityProcessor allowing text file lists of files to be indexed

2009-04-14 Thread Fergus McMenemie (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698767#action_12698767
 ] 

Fergus McMenemie commented on SOLR-1060:


Hmmm,

Are you referring to the fragment of code inside ChangeListEntityProcessor that 
opens the changelist, and its similarity to the functionality in URIDataSource?

I had not thought about arranging some kind of nested use of URIDataSource... 
is that what you are thinking about? 

> a new DIH EnityProcessor allowing text file lists of files to be indexed
> 
>
> Key: SOLR-1060
> URL: https://issues.apache.org/jira/browse/SOLR-1060
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - DataImportHandler
>Affects Versions: 1.4
>Reporter: Fergus McMenemie
>Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: regex-fix.patch, SOLR-1060.patch, SOLR-1060.patch, 
> SOLR-1060.patch, SOLR-1060.patch, SOLR-1060.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> I have finished a new DIH EntityProcessor. It is designed around the idea 
> that whatever daemon is used to maintain your content store, it is likely to 
> drop a report or log file explaining what has changed within your content 
> store. I wish to use this report file to control the indexing of the new or 
> changed content and the removal of old content. The report files, perhaps 
> from un-tar or un-zip, are likely to reference jpegs and directory stubs 
> which need to be ignored. I assumed a file-based content repository but this 
> should be expanded to handle URIs as well.
> I feel that the current FileListEntityProcessor is poorly named. It should be 
> called the dirWalkEntityProcessor or dirCrawlEntityProcessor or such. And 
> this new EntityProcessor should have the name FileListEntityProcessor. 
> However what is done is done. I then came up with ManifestEntityProcessor, 
> which I thought suited; manifest files are all over the content sets I deal 
> with and the dictionary definition seemed close enough ("ship's manifest"). 
> However, how about ChangeListEntityProcessor?
> {code}
>processor="ManifestEntityProcessor"
>baseDir="/Volumes/Techmore/ts/aaa/schema/data"
>rootEntity="false"
>dataSource="null"
>allowRegex="^.*\.xml$"
>blockRegex="usc2009"
>manifestFileName="/Volumes/ts/man-find.txt"
>docAddRegex=".*"
>>
> {code}
> The new entity fields are as follows.
>  
>*manifestFileName* is the required location of the manifest file. If this 
> value is relative, it is assumed to be relative to baseDir.
>*allowRegex* is an optional attribute that if present discards any line 
> which does not match the regExp
>  
>*blockRegex* is an optional attribute that is applied after any allowRegex 
> and discards any line which matches the regExp
>*docAddRegex* is a required regex to identify lines which when matched 
> should cause docs to be added to the index. As well as matching the line it 
> should also return the portion of the line which contains the filepath as 
> group(1)
>*docDeleteRegex* is an optional value of a regex to identify documents 
> which when matched should be deleted from the index. As well as matching the 
> line it should also return the portion of the line which contains the 
> filepath as group(1) **PLANNED**
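
The order of the regex attributes described above (allowRegex first, then
blockRegex, then docAddRegex with the file path in group(1)) can be sketched as
a small line filter. This is an illustration only, not the code in the attached
patches: the class name ManifestFilterSketch, the use of find() rather than a
full-line match, and the "A <path>" manifest line format in the demo are all
assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ManifestFilterSketch {

    // Returns the file paths (group 1 of docAddRegex) for lines that survive the filters.
    // allowRegex and blockRegex are optional and may be null.
    static List<String> pathsToAdd(List<String> lines, String allowRegex,
                                   String blockRegex, String docAddRegex) {
        Pattern allow = allowRegex == null ? null : Pattern.compile(allowRegex);
        Pattern block = blockRegex == null ? null : Pattern.compile(blockRegex);
        Pattern add = Pattern.compile(docAddRegex);
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            // allowRegex: discard any line that does not match.
            if (allow != null && !allow.matcher(line).find()) {
                continue;
            }
            // blockRegex: applied after allowRegex, discard any line that matches.
            if (block != null && block.matcher(line).find()) {
                continue;
            }
            // docAddRegex: a match means "add this document"; group(1) holds the path.
            Matcher m = add.matcher(line);
            if (m.find() && m.groupCount() >= 1 && m.group(1) != null) {
                result.add(m.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> manifest = Arrays.asList(
                "A data/one.xml",
                "A data/pic.jpg",
                "A usc2009/two.xml",
                "D data/old.xml");
        System.out.println(pathsToAdd(manifest, "^.*\\.xml$", "usc2009", "^A (.*)$"));
    }
}
```

Note that a docAddRegex without a capturing group, such as ".*", yields no
paths in this sketch, since there is no group(1) to return.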




Re: parsing bool type in solrconfig.xml

2009-04-14 Thread Erik Hatcher
+1 also.  on/yes/true should all work for boolean parameters (just  
like Ant ;)


Erik


On Apr 14, 2009, at 2:01 AM, Shalin Shekhar Mangar wrote:


2009/4/13 Koji Sekiguchi 


Should we accept not only true, but also on
and yes?
I think it is easy by using parseBool() instead of  
Boolean.valueOf() in

DOMUtil.



+1

I know it is inconsistent but so are the request parameters like hl,
debugQuery etc which I doubt will be changed.

--
Regards,
Shalin Shekhar Mangar.




[jira] Commented: (SOLR-1099) FieldAnalysisRequestHandler

2009-04-14 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698726#action_12698726
 ] 

Shalin Shekhar Mangar commented on SOLR-1099:
-

{quote}I hope this makes things a bit clearer {quote}

Crystal clear! Thanks Uri!

> FieldAnalysisRequestHandler
> ---
>
> Key: SOLR-1099
> URL: https://issues.apache.org/jira/browse/SOLR-1099
> Project: Solr
>  Issue Type: New Feature
>  Components: Analysis
>Affects Versions: 1.3
>Reporter: Uri Boness
>Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: FieldAnalysisRequestHandler_incl_test.patch
>
>
> The FieldAnalysisRequestHandler provides the analysis functionality of the 
> web admin page as a service. This handler accepts a fieldtype/fieldname 
> parameter and a value and as a response returns a breakdown of the analysis 
> process. It is also possible to send a query value which will use the 
> configured query analyzer as well as a showmatch parameter which will then 
> mark every matched token as a match.
> If this handler is added to the code base, I also recommend to rename the 
> current AnalysisRequestHandler to DocumentAnalysisRequestHandler and have 
> them both inherit from one AnalysisRequestHandlerBase class which provides 
> the common functionality of the analysis breakdown and its translation to 
> named lists. This will also enhance the current AnalysisRequestHandler which 
> right now is fairly simplistic.




[jira] Commented: (SOLR-1099) FieldAnalysisRequestHandler

2009-04-14 Thread Uri Boness (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698723#action_12698723
 ] 

Uri Boness commented on SOLR-1099:
--

{quote}
We copy AnalysisRequestHandler (ARH) to DocumentAnalysisRequestHandler and 
deprecate ARH.
{quote}
true, but it will be enhanced with functionality and support more extensive 
analysis breakdown (e.g. adding a query analysis and showmatch support)

{quote}
We extract common code (if any) of ARH and FieldARH into a base class 
AnalysisRequestHandlerBase, as you suggested
{quote}
true

{quote}
We modify analysis.jsp to use FieldARH (maybe as a separate issue)
{quote}
probably a separate issue is more appropriate.

{quote}
You do not need to support AnalysisRequestHandler's format because it will also 
exist by the name of DocumentAnalysisRequestHandler. Since FieldARH is a new 
handler, it does not need to be back-compatible with ARH. Supporting the old 
format is a nice-to-have feature but not necessary.
{quote}
True. The old AnalysisRequestHandler will be deprecated and its (enhanced) 
functionality will be available via the DocumentAnalysisRequestHandler. That 
said, it would be nice to be backward compatible as much as possible for those 
who are using the old ARH already (I suspect not many are using it anyway as 
it's mostly used for tooling and debugging). I do believe that both the new 
DocumentARH and the FieldARH are useful for different purposes due to the nature 
of their differences as I mentioned above.

I hope this makes things a bit clearer :-)





[jira] Commented: (SOLR-1099) FieldAnalysisRequestHandler

2009-04-14 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698720#action_12698720
 ] 

Shalin Shekhar Mangar commented on SOLR-1099:
-

{quote}
The only change to the original structure will happen when more parameters will 
be sent, for example, when a query analysis takes place and a "showmatch=true" 
is sent then each matched token will be marked as a "match". I'll have to have 
a closer look at the current response of the AnalysisRequestHandler and see if 
I can support the exact same structure; my gut feeling is that it's possible.
{quote}

This is the part that I do not understand.

Let me outline what I understood:
# We copy AnalysisRequestHandler (ARH) to DocumentAnalysisRequestHandler and 
deprecate ARH.
# We extract common code (if any) of ARH and FieldARH into a base class 
AnalysisRequestHandlerBase, as you suggested
# We modify analysis.jsp to use FieldARH (maybe as a separate issue)

You do not need to support AnalysisRequestHandler's format because it will also 
exist by the name of DocumentAnalysisRequestHandler. Since FieldARH is a new 
handler, it does not need to be back-compatible with ARH. Supporting the old 
format is a nice-to-have feature but not necessary.
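
The class layout outlined above could be sketched like this. It is a rough
illustration under stated assumptions, not the attached patch: the class names
carry a "Sketch" suffix to mark them as hypothetical, the shared "analysis
breakdown" is reduced to whitespace tokenization, and having the deprecated
handler extend the document handler is just one possible way to keep the old
name working.

```java
import java.util.ArrayList;
import java.util.List;

// Base class holding the logic shared by both handlers.
abstract class AnalysisHandlerBaseSketch {

    // Stand-in for the common analysis breakdown: split the value on whitespace.
    protected List<String> breakdown(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    abstract String describe(String input);
}

// New handler: analyses a single field value.
class FieldAnalysisSketch extends AnalysisHandlerBaseSketch {
    @Override
    String describe(String fieldValue) {
        return "field:" + breakdown(fieldValue);
    }
}

// Renamed handler: analyses document content.
class DocumentAnalysisSketch extends AnalysisHandlerBaseSketch {
    @Override
    String describe(String document) {
        return "document:" + breakdown(document);
    }
}

/** @deprecated old name kept for existing users; behaves like DocumentAnalysisSketch. */
@Deprecated
class LegacyAnalysisSketch extends DocumentAnalysisSketch {
}

class AnalysisHierarchyDemo {
    public static void main(String[] args) {
        System.out.println(new FieldAnalysisSketch().describe("quick brown fox"));
        System.out.println(new DocumentAnalysisSketch().describe("quick brown fox"));
    }
}
```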





[jira] Commented: (SOLR-1099) FieldAnalysisRequestHandler

2009-04-14 Thread Uri Boness (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698713#action_12698713
 ] 

Uri Boness commented on SOLR-1099:
--

{quote}
I was assuming that the output format of AnalysisRequestHandler and 
FieldAnalysisRequestHandler remains exactly as they are today and the 
refactoring is just to abstract common code into a base class.
{quote}
{quote}
Agreed. But the output of DocumentAnalysisRequestHandler will look exactly like 
what AnalysisRequestHandler returns today, right?
{quote}
You know what... you're right... I think it is possible to keep the same 
output by default. The only change to the original structure will happen when 
more parameters will be sent, for example, when a query analysis takes place 
and a "showmatch=true" is sent then each matched token will be marked as a 
"match". I'll have to have a closer look at the current response of the 
AnalysisRequestHandler and see if I can support the exact same structure; my 
gut feeling is that it's possible.

I'll start working on it and see where I get





[jira] Commented: (SOLR-1099) FieldAnalysisRequestHandler

2009-04-14 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698711#action_12698711
 ] 

Shalin Shekhar Mangar commented on SOLR-1099:
-

{quote}The public API for the AnalysisRequestHandler will change in the context 
of the response. {quote}

I was assuming that the output format of AnalysisRequestHandler and 
FieldAnalysisRequestHandler remains exactly as they are today and the 
refactoring is just to abstract common code into a base class.

{quote}
Furthermore, it's probably wise to rename the AnalysisRequestHandler to 
DocumentAnalysisRequestHandler (more expressive name and also consistent with 
the FieldAnalysisRequestHandler). Another option is to do this refactoring 
anyway, and leave the AnalysisRequestHandler as is and only deprecate it. So 
basically we'll have 4 classes:

AnalysisRequestHandlerBase
FieldAnalysisRequestHandler
DocumentAnalysisRequestHandler
AnalysisRequestHandler (deprecated)
{quote}

Agreed. But the output of DocumentAnalysisRequestHandler will look exactly like 
what AnalysisRequestHandler returns today, right?

{quote}it would also be wise to reimplement the analysis.jsp to use this new 
handler and clean it up from all the analysis (now duplicate) logic code.{quote}

Agreed.





[jira] Commented: (SOLR-1060) a new DIH EnityProcessor allowing text file lists of files to be indexed

2009-04-14 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698710#action_12698710
 ] 

Shalin Shekhar Mangar commented on SOLR-1060:
-

Fergus, ChangeListEntityProcessor seems to duplicate URIDataSource's 
functionality instead of using it. Why is that?





[jira] Commented: (SOLR-1099) FieldAnalysisRequestHandler

2009-04-14 Thread Uri Boness (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698708#action_12698708
 ] 

Uri Boness commented on SOLR-1099:
--

The public API for the AnalysisRequestHandler will change in the context of the 
response. Since the analysis breakdown is more detailed, the response format 
will have to change a bit. Furthermore, it's probably wise to rename the 
AnalysisRequestHandler to DocumentAnalysisRequestHandler (more expressive name 
and also consistent with the FieldAnalysisRequestHandler). Another option is to 
do this refactoring anyway, and leave the AnalysisRequestHandler as is and only 
deprecate it. So basically we'll have 4 classes:

AnalysisRequestHandlerBase
FieldAnalysisRequestHandler
DocumentAnalysisRequestHandler
AnalysisRequestHandler (deprecated)

What do you think?

BTW, once committed, it would also be wise to reimplement analysis.jsp to 
use this new handler and remove the (now duplicate) analysis logic from it.
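
For discussion, the four-class layout could look roughly like the sketch below. It is self-contained and does not use any Solr types: the handle() methods and the whitespace "analysis" are placeholders I made up, and only the class names come from the proposal.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Common analysis-breakdown logic shared by both new handlers.
abstract class AnalysisRequestHandlerBase {
    // Placeholder for the real analyzer chain: split on whitespace and lowercase.
    protected List<String> analyze(String value) {
        List<String> tokens = new ArrayList<>();
        for (String token : value.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }
}

// Analyses a single value (hypothetical method signature).
class FieldAnalysisRequestHandler extends AnalysisRequestHandlerBase {
    List<String> handle(String fieldValue) {
        return analyze(fieldValue);
    }
}

// Analyses every field of a document (hypothetical method signature).
class DocumentAnalysisRequestHandler extends AnalysisRequestHandlerBase {
    Map<String, List<String>> handle(Map<String, String> document) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : document.entrySet()) {
            result.put(field.getKey(), analyze(field.getValue()));
        }
        return result;
    }
}

// Left as is and only marked deprecated, per the second option above.
@Deprecated
class AnalysisRequestHandler {
}
```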


> FieldAnalysisRequestHandler
> ---
>
> Key: SOLR-1099
> URL: https://issues.apache.org/jira/browse/SOLR-1099
> Project: Solr
>  Issue Type: New Feature
>  Components: Analysis
>Affects Versions: 1.3
>Reporter: Uri Boness
>Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: FieldAnalysisRequestHandler_incl_test.patch
>
>
> The FieldAnalysisRequestHandler provides the analysis functionality of the 
> web admin page as a service. This handler accepts a filetype/fieldname 
> parameter and a value and as a response returns a breakdown of the analysis 
> process. It is also possible to send a query value which will use the 
> configured query analyzer as well as a showmatch parameter which will then 
> mark every matched token as a match.
> If this handler is added to the code base, I also recommend to rename the 
> current AnalysisRequestHandler to DocumentAnalysisRequestHandler and have 
> them both inherit from one AnalysisRequestHandlerBase class which provides 
> the common functionality of the analysis breakdown and its translation to 
> named lists. This will also enhance the current AnalysisRequestHandler which 
> right now is fairly simplistic.




[jira] Created: (SOLR-1113) Error reports from ExtractingRequestHandler and Co do not indicate name of rejected documents

2009-04-14 Thread Fergus McMenemie (JIRA)
Error reports from ExtractingRequestHandler and Co do not indicate name of 
rejected documents
-

 Key: SOLR-1113
 URL: https://issues.apache.org/jira/browse/SOLR-1113
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Fergus McMenemie


The ExtractingRequestHandler rejects documents that are larger than the 
configured multipartUploadLimitInKB in solrconfig.xml. None of the generated 
error messages indicate the name of the rejected document or provide any way of 
identifying the rejected document. The failure to identify the rejected 
document complicates the middleware used to look after indexes.

Here is the trace produced by a recent version of trunk.

{code}
SEVERE: org.apache.commons.fileupload.FileUploadBase$SizeLimitExceededException: the request was rejected because its size (4585774) exceeds the configured maximum (2097152)
    at org.apache.commons.fileupload.FileUploadBase$FileItemIteratorImpl.<init>(FileUploadBase.java:914)
    at org.apache.commons.fileupload.FileUploadBase.getItemIterator(FileUploadBase.java:331)
    at org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:349)
    at org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126)
    at org.apache.solr.servlet.MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:343)
    at org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:396)
    at org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:114)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
{code} 
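
One possible shape for the improvement, sketched outside of Solr: wrap the size-limit failure in an exception that carries an identifier for the rejected document. The class name DocumentRejectedException and the idea that the caller can supply a document name are assumptions for illustration; neither exists in Solr or commons-fileupload.

```java
// Hypothetical exception type; not part of Solr or commons-fileupload.
class DocumentRejectedException extends RuntimeException {
    private final String documentName;

    DocumentRejectedException(String documentName, long actualBytes, long limitBytes, Throwable cause) {
        // Put the document name first so it is visible in the first line of the log entry.
        super("Rejected document '" + documentName + "': size (" + actualBytes
                + ") exceeds the configured maximum (" + limitBytes + ")", cause);
        this.documentName = documentName;
    }

    /** Name supplied by the caller so that middleware can identify the rejected document. */
    String getDocumentName() {
        return documentName;
    }
}
```

Middleware could then catch this type and read getDocumentName() instead of parsing the message text.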




[jira] Commented: (SOLR-1099) FieldAnalysisRequestHandler

2009-04-14 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698704#action_12698704
 ] 

Shalin Shekhar Mangar commented on SOLR-1099:
-

{quote}
Well, the AnalysisRequestHandler's goal is to handle documents, so basically, you 
send an XML document (same document as you would send for indexing) and the 
handler analyses the fields of the document. So the main difference between the 
two handlers is that the AnalysisRequestHandler enables you to provide a set of 
field names/types and their values to be analysed, while in the 
FieldAnalysisRequestHandler you're mainly targeting just a couple of fields and 
you can only specify one value to be analysed. The other main difference is 
that the AnalysisRequestHandler handles a POST request with an XML request body 
while the FieldAnalysisRequestHandler handles a GET request where all the 
parameters are specified as URL params. 
{quote}

Thanks for clarifying Uri.

{quote}
As I mentioned, the analysis breakdown of the FieldAnalysisRequestHandler is 
more detailed than the AnalysisRequestHandler and this is why I think that some 
refactoring can take place by extracting all the common functionality to a 
parent class for these two classes.
{quote}

I agree. With this coming in, we will have three places which help with 
analysis (analysis.jsp, AnalysisRequestHandler and 
FieldAnalysisRequestHandler). Would you like to take a stab at this before we 
commit?

I doubt our refactoring will change the public API (the request/response 
format) for any of the three. Therefore, I'm fine with refactoring later and 
committing this as-is.

> FieldAnalysisRequestHandler
> ---
>
> Key: SOLR-1099
> URL: https://issues.apache.org/jira/browse/SOLR-1099
> Project: Solr
>  Issue Type: New Feature
>  Components: Analysis
>Affects Versions: 1.3
>Reporter: Uri Boness
>Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: FieldAnalysisRequestHandler_incl_test.patch
>
>
> The FieldAnalysisRequestHandler provides the analysis functionality of the 
> web admin page as a service. This handler accepts a filetype/fieldname 
> parameter and a value and as a response returns a breakdown of the analysis 
> process. It is also possible to send a query value which will use the 
> configured query analyzer as well as a showmatch parameter which will then 
> mark every matched token as a match.
> If this handler is added to the code base, I also recommend to rename the 
> current AnalysisRequestHandler to DocumentAnalysisRequestHandler and have 
> them both inherit from one AnalysisRequestHandlerBase class which provides 
> the common functionality of the analysis breakdown and its translation to 
> named lists. This will also enhance the current AnalysisRequestHandler which 
> right now is fairly simplistic.




Build failed in Hudson: Solr-trunk #771

2009-04-14 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Solr-trunk/771/changes

Changes:

[shalin] SOLR-934 followup -- Make CustomFilter static, remove extra logging 
code, check log level before logging

[shalin] SOLR-934 -- A MailEntityProcessor to enable indexing mails from 
POP/IMAP sources into a solr index

[gsingers] SOLR-804: added lucene misc jar rev 764281

[shalin] SOLR-1059 -- Fixing bug where skipping a row containing nested 
entities did not skip the nested entities. Handling special flag variables is 
in one method now.

[shalin] SOLR-940 followup -- Fix for the trie date test case

[shalin] SOLR-1096 -- Introduced httpConnTimeout and httpReadTimeout in 
replication slave configuration to avoid stalled replication

--
[...truncated 2323 lines...]
init-forrest-entities:

compile-solrj:

compile:

make-manifest:

compile:

compileTests:
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Solr-trunk/ws/trunk/contrib/dataimporthandler/target/test-classes
 
[javac] Compiling 24 source files to 
http://hudson.zones.apache.org/hudson/job/Solr-trunk/ws/trunk/contrib/dataimporthandler/target/test-classes
 
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

testCore:
[junit] Running 
org.apache.solr.handler.dataimport.TestCachedSqlEntityProcessor
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.378 sec
[junit] Running org.apache.solr.handler.dataimport.TestClobTransformer
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.345 sec
[junit] Running 
org.apache.solr.handler.dataimport.TestContentStreamDataSource
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.467 sec
[junit] Running org.apache.solr.handler.dataimport.TestDataConfig
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.831 sec
[junit] Running org.apache.solr.handler.dataimport.TestDateFormatTransformer
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.35 sec
[junit] Running org.apache.solr.handler.dataimport.TestDocBuilder
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.481 sec
[junit] Running org.apache.solr.handler.dataimport.TestDocBuilder2
[junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 4.675 sec
[junit] Running org.apache.solr.handler.dataimport.TestEntityProcessorBase
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.358 sec
[junit] Running org.apache.solr.handler.dataimport.TestErrorHandling
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 3.061 sec
[junit] Running org.apache.solr.handler.dataimport.TestEvaluatorBag
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.361 sec
[junit] Running org.apache.solr.handler.dataimport.TestFieldReader
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.54 sec
[junit] Running 
org.apache.solr.handler.dataimport.TestFileListEntityProcessor
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.378 sec
[junit] Running org.apache.solr.handler.dataimport.TestJdbcDataSource
[junit] Tests run: 0, Failures: 0, Errors: 0, Time elapsed: 0.307 sec
[junit] Running 
org.apache.solr.handler.dataimport.TestNumberFormatTransformer
[junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 0.408 sec
[junit] Running 
org.apache.solr.handler.dataimport.TestPlainTextEntityProcessor
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.429 sec
[junit] Running org.apache.solr.handler.dataimport.TestRegexTransformer
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.358 sec
[junit] Running org.apache.solr.handler.dataimport.TestScriptTransformer
[junit] Tests run: 0, Failures: 0, Errors: 0, Time elapsed: 0.303 sec
[junit] Running org.apache.solr.handler.dataimport.TestSqlEntityProcessor
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.387 sec
[junit] Running org.apache.solr.handler.dataimport.TestSqlEntityProcessor2
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 2.941 sec
[junit] Running org.apache.solr.handler.dataimport.TestTemplateString
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.324 sec
[junit] Running org.apache.solr.handler.dataimport.TestTemplateTransformer
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.355 sec
[junit] Running org.apache.solr.handler.dataimport.TestVariableResolver
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.38 sec
[junit] Running org.apache.solr.handler.dataimport.TestXPathEntityProcessor
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.677 sec
[junit] Running org.apache.solr.handler.dataimport.TestXPathRecordReader
[junit] Tests run: 13, Failures: 0, Errors: 0, Time elapsed: 0.476 sec

compileExtras:

compileExtrasTests:
[mkd

Solr nightly build failure

2009-04-14 Thread solr-dev

init-forrest-entities:
[mkdir] Created dir: /tmp/apache-solr-nightly/build
[mkdir] Created dir: /tmp/apache-solr-nightly/build/web

compile-solrj:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/solrj
[javac] Compiling 76 source files to /tmp/apache-solr-nightly/build/solrj
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compile:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/solr
[javac] Compiling 365 source files to /tmp/apache-solr-nightly/build/solr
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compileTests:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/tests
[javac] Compiling 150 source files to /tmp/apache-solr-nightly/build/tests
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

junit:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/test-results
[junit] Running org.apache.solr.BasicFunctionalityTest
[junit] Tests run: 19, Failures: 0, Errors: 0, Time elapsed: 17.464 sec
[junit] Running org.apache.solr.ConvertedLegacyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 11.11 sec
[junit] Running org.apache.solr.DisMaxRequestHandlerTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 4.847 sec
[junit] Running org.apache.solr.EchoParamsTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 2.219 sec
[junit] Running org.apache.solr.OutputWriterTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 2.699 sec
[junit] Running org.apache.solr.SampleTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 2.315 sec
[junit] Running org.apache.solr.SolrInfoMBeanTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.861 sec
[junit] Running org.apache.solr.TestDistributedSearch
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 25.714 sec
[junit] Running org.apache.solr.TestTrie
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 6.842 sec
[junit] Running org.apache.solr.analysis.DoubleMetaphoneFilterFactoryTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.326 sec
[junit] Running org.apache.solr.analysis.DoubleMetaphoneFilterTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.321 sec
[junit] Running org.apache.solr.analysis.EnglishPorterFilterFactoryTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.281 sec
[junit] Running org.apache.solr.analysis.HTMLStripReaderTest
[junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 0.667 sec
[junit] Running org.apache.solr.analysis.LengthFilterTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.882 sec
[junit] Running org.apache.solr.analysis.SnowballPorterFilterFactoryTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.354 sec
[junit] Running org.apache.solr.analysis.TestBufferedTokenStream
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.963 sec
[junit] Running org.apache.solr.analysis.TestCapitalizationFilter
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.585 sec
[junit] Running org.apache.solr.analysis.TestCharFilter
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.315 sec
[junit] Running org.apache.solr.analysis.TestHyphenatedWordsFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.112 sec
[junit] Running org.apache.solr.analysis.TestKeepFilterFactory
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.877 sec
[junit] Running org.apache.solr.analysis.TestKeepWordFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.892 sec
[junit] Running org.apache.solr.analysis.TestMappingCharFilter
[junit] Tests run: 11, Failures: 0, Errors: 0, Time elapsed: 0.34 sec
[junit] Running org.apache.solr.analysis.TestMappingCharFilterFactory
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.334 sec
[junit] Running org.apache.solr.analysis.TestPatternReplaceFilter
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 1.213 sec
[junit] Running org.apache.solr.analysis.TestPatternTokenizerFactory
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.723 sec
[junit] Running org.apache.solr.analysis.TestPhoneticFilter
  

[jira] Commented: (SOLR-1099) FieldAnalysisRequestHandler

2009-04-14 Thread Uri Boness (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698696#action_12698696
 ] 

Uri Boness commented on SOLR-1099:
--

Well, the AnalysisRequestHandler's goal is to handle documents, so basically, you 
send an XML document (same document as you would send for indexing) and the 
handler analyses the fields of the document. So the main difference between the 
two handlers is that the AnalysisRequestHandler enables you to provide a set of 
field names/types and their values to be analysed, while in the 
FieldAnalysisRequestHandler you're mainly targeting just a couple of fields and 
you can only specify one value to be analysed. The other main difference is 
that the AnalysisRequestHandler handles a POST request with an XML request body 
while the FieldAnalysisRequestHandler handles a GET request where all the 
parameters are specified as URL params. 
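
To illustrate the GET style, here is a small sketch that assembles such a request URL. The handler path and the parameter names (fieldname, value, showmatch) are assumptions made for the example and may not match the names used in the patch.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a GET URL with all parameters in the query string.
class FieldAnalysisUrl {
    static String build(String baseUrl, Map<String, String> params) {
        StringBuilder url = new StringBuilder(baseUrl);
        char separator = '?';
        for (Map.Entry<String, String> param : params.entrySet()) {
            url.append(separator)
               .append(encode(param.getKey()))
               .append('=')
               .append(encode(param.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    private static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("fieldname", "title");    // assumed parameter name
        params.put("value", "quick brown");  // assumed parameter name
        params.put("showmatch", "true");     // assumed parameter name
        // The host and handler path are placeholders.
        System.out.println(build("http://localhost:8983/solr/analysis/field", params));
    }
}
```

The document-oriented handler would instead take the same kind of information as an XML body in a POST request.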

As I mentioned, the analysis breakdown of the FieldAnalysisRequestHandler is 
more detailed than the AnalysisRequestHandler and this is why I think that some 
refactoring can take place by extracting all the common functionality to a 
parent class for these two classes.

> FieldAnalysisRequestHandler
> ---
>
> Key: SOLR-1099
> URL: https://issues.apache.org/jira/browse/SOLR-1099
> Project: Solr
>  Issue Type: New Feature
>  Components: Analysis
>Affects Versions: 1.3
>Reporter: Uri Boness
>Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: FieldAnalysisRequestHandler_incl_test.patch
>
>
> The FieldAnalysisRequestHandler provides the analysis functionality of the 
> web admin page as a service. This handler accepts a filetype/fieldname 
> parameter and a value and as a response returns a breakdown of the analysis 
> process. It is also possible to send a query value which will use the 
> configured query analyzer as well as a showmatch parameter which will then 
> mark every matched token as a match.
> If this handler is added to the code base, I also recommend to rename the 
> current AnalysisRequestHandler to DocumentAnalysisRequestHandler and have 
> them both inherit from one AnalysisRequestHandlerBase class which provides 
> the common functionality of the analysis breakdown and its translation to 
> named lists. This will also enhance the current AnalysisRequestHandler which 
> right now is fairly simplistic.




[jira] Commented: (SOLR-1099) FieldAnalysisRequestHandler

2009-04-14 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698681#action_12698681
 ] 

Shalin Shekhar Mangar commented on SOLR-1099:
-

This looks great Uri.

I'm yet to look completely into the patch. But is there anything in the 
AnalysisRequestHandler which is not there in this patch? If not, does it make 
sense to just deprecate AnalysisRequestHandler and use this instead?

> FieldAnalysisRequestHandler
> ---
>
> Key: SOLR-1099
> URL: https://issues.apache.org/jira/browse/SOLR-1099
> Project: Solr
>  Issue Type: New Feature
>  Components: Analysis
>Affects Versions: 1.3
>Reporter: Uri Boness
>Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: FieldAnalysisRequestHandler_incl_test.patch
>
>
> The FieldAnalysisRequestHandler provides the analysis functionality of the 
> web admin page as a service. This handler accepts a filetype/fieldname 
> parameter and a value and as a response returns a breakdown of the analysis 
> process. It is also possible to send a query value which will use the 
> configured query analyzer as well as a showmatch parameter which will then 
> mark every matched token as a match.
> If this handler is added to the code base, I also recommend to rename the 
> current AnalysisRequestHandler to DocumentAnalysisRequestHandler and have 
> them both inherit from one AnalysisRequestHandlerBase class which provides 
> the common functionality of the analysis breakdown and its translation to 
> named lists. This will also enhance the current AnalysisRequestHandler which 
> right now is fairly simplistic.




[jira] Updated: (SOLR-1099) FieldAnalysisRequestHandler

2009-04-14 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-1099:


  Component/s: (was: search)
   Analysis
Fix Version/s: (was: 1.3.1)
   1.4
 Assignee: Shalin Shekhar Mangar

> FieldAnalysisRequestHandler
> ---
>
> Key: SOLR-1099
> URL: https://issues.apache.org/jira/browse/SOLR-1099
> Project: Solr
>  Issue Type: New Feature
>  Components: Analysis
>Affects Versions: 1.3
>Reporter: Uri Boness
>Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: FieldAnalysisRequestHandler_incl_test.patch
>
>
> The FieldAnalysisRequestHandler provides the analysis functionality of the 
> web admin page as a service. This handler accepts a filetype/fieldname 
> parameter and a value and as a response returns a breakdown of the analysis 
> process. It is also possible to send a query value which will use the 
> configured query analyzer as well as a showmatch parameter which will then 
> mark every matched token as a match.
> If this handler is added to the code base, I also recommend to rename the 
> current AnalysisRequestHandler to DocumentAnalysisRequestHandler and have 
> them both inherit from one AnalysisRequestHandlerBase class which provides 
> the common functionality of the analysis breakdown and its translation to 
> named lists. This will also enhance the current AnalysisRequestHandler which 
> right now is fairly simplistic.




[jira] Commented: (SOLR-934) Enable importing of mails into a solr index through DIH.

2009-04-14 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698676#action_12698676
 ] 

Shalin Shekhar Mangar commented on SOLR-934:


Committed revision 764691.

> Enable importing of mails into a solr index through DIH.
> 
>
> Key: SOLR-934
> URL: https://issues.apache.org/jira/browse/SOLR-934
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - DataImportHandler
>Affects Versions: 1.4
>Reporter: Preetam Rao
>Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: SOLR-934.patch, SOLR-934.patch, SOLR-934.patch, 
> SOLR-934.patch, SOLR-934.patch, SOLR-934.patch, SOLR-934.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Enable importing of mails into solr through DIH. Take one or more mailbox 
> credentials, download and index their content along with the content from 
> attachments. The folders to fetch can be made configurable based on various 
> criteria. Apache Tika is used for extracting content from different kinds of 
> attachments. JavaMail is used for mail box related operations like fetching 
> mails, filtering them etc.
> The basic configuration for one mail box is as below:
> {code:xml}
> 
> password="something" host="imap.gmail.com" protocol="imaps"/>
> 
> {code}
> Below is the list of all available configuration options:
> {color:green}Required{color}
> -
> *user* 
> *pwd* 
> *protocol*  (only "imaps" supported now)
> *host* 
> {color:green}Optional{color}
> -
> *folders* - comma separated list of folders. 
> If not specified, default folder is used. Nested folders can be specified 
> like a/b/c
> *recurse* - index subfolders. Defaults to true.
> *exclude* - comma separated list of patterns. 
> *include* - comma separated list of patterns.
> *batchSize* - mails to fetch at once in a given folder. 
> Only headers can be prefetched in Javamail IMAP.
> *readTimeout* - defaults to 6ms
> *connectTimeout* - defaults to 3ms
> *fetchSize* - IMAP config. 32KB default
> *fetchMailsSince* -
> date/time in "yyyy-MM-dd HH:mm:ss" format; mails received after this time will be 
> fetched. Useful for delta import.
> *customFilter* - class name.  
> {code}
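> import javax.mail.Flags;
> import javax.mail.Folder;
> import javax.mail.search.FlagTerm;
> import javax.mail.search.SearchTerm;
> import org.apache.solr.handler.dataimport.MailEntityProcessor;
> public class MyCustomFilter implements MailEntityProcessor.CustomFilter {
>   public SearchTerm getCustomSearch(Folder folder) {
>     return new FlagTerm(new Flags(Flags.Flag.SEEN), false); // sample only: unread mails; class name is a placeholder
>   }
> }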
> {code}
> *processAttachement* - defaults to true
> The indexed fields are listed below.
> {code}
>   // Fields To Index
>   // single valued
>   private static final String SUBJECT = "subject";
>   private static final String FROM = "from";
>   private static final String SENT_DATE = "sentDate";
>   private static final String XMAILER = "xMailer";
>   // multi valued
>   private static final String TO_CC_BCC = "allTo";
>   private static final String FLAGS = "flags";
>   private static final String CONTENT = "content";
>   private static final String ATTACHMENT = "attachement";
>   private static final String ATTACHMENT_NAMES = "attachementNames";
>   // flag values
>   private static final String FLAG_ANSWERED = "answered";
>   private static final String FLAG_DELETED = "deleted";
>   private static final String FLAG_DRAFT = "draft";
>   private static final String FLAG_FLAGGED = "flagged";
>   private static final String FLAG_RECENT = "recent";
>   private static final String FLAG_SEEN = "seen";
> {code}




[jira] Updated: (SOLR-934) Enable importing of mails into a solr index through DIH.

2009-04-14 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-934:
---

Attachment: SOLR-934.patch

A few changes in this patch

# Made the CustomFilter interface static
# Removed logRow method. LogTransformer can be used if needed
# logConfig first checks if info level is enabled or not

I'll commit shortly.
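
The third change follows the usual guard pattern, sketched below with java.util.logging as a stand-in for whatever logging API the patch actually uses. The class name ConfigLogSketch, the method signature and the logged fields are illustrative assumptions, not the code in the patch.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative stand-in; the real logConfig lives in the patch and may differ.
class ConfigLogSketch {
    /** Logs the configuration at INFO level; returns false if logging was skipped. */
    static boolean logConfig(Logger log, String user, String host) {
        // Check the level first so the message is only built when it will be emitted.
        if (!log.isLoggable(Level.INFO)) {
            return false;
        }
        StringBuilder config = new StringBuilder("user : ").append(user)
                .append(System.lineSeparator())
                .append("host : ").append(host);
        log.info(config.toString());
        return true;
    }
}
```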

> Enable importing of mails into a solr index through DIH.
> 
>
> Key: SOLR-934
> URL: https://issues.apache.org/jira/browse/SOLR-934
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - DataImportHandler
>Affects Versions: 1.4
>Reporter: Preetam Rao
>Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: SOLR-934.patch, SOLR-934.patch, SOLR-934.patch, 
> SOLR-934.patch, SOLR-934.patch, SOLR-934.patch, SOLR-934.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Enable importing of mails into solr through DIH. Take one or more mailbox 
> credentials, download and index their content along with the content from 
> attachments. The folders to fetch can be made configurable based on various 
> criteria. Apache Tika is used for extracting content from different kinds of 
> attachments. JavaMail is used for mail box related operations like fetching 
> mails, filtering them etc.
> The basic configuration for one mail box is as below:
> {code:xml}
> 
> password="something" host="imap.gmail.com" protocol="imaps"/>
> 
> {code}
> Below is the list of all available configuration options:
> {color:green}Required{color}
> -
> *user* 
> *pwd* 
> *protocol*  (only "imaps" supported now)
> *host* 
> {color:green}Optional{color}
> -
> *folders* - comma separated list of folders. 
> If not specified, default folder is used. Nested folders can be specified 
> like a/b/c
> *recurse* - index subfolders. Defaults to true.
> *exclude* - comma separated list of patterns. 
> *include* - comma separated list of patterns.
> *batchSize* - mails to fetch at once in a given folder. 
> Only headers can be prefetched in Javamail IMAP.
> *readTimeout* - defaults to 6ms
> *connectTimeout* - defaults to 3ms
> *fetchSize* - IMAP config. 32KB default
> *fetchMailsSince* -
> date/time in "yyyy-MM-dd HH:mm:ss" format; mails received after this time will be 
> fetched. Useful for delta import.
> *customFilter* - class name.  
> {code}
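> import javax.mail.Flags;
> import javax.mail.Folder;
> import javax.mail.search.FlagTerm;
> import javax.mail.search.SearchTerm;
> import org.apache.solr.handler.dataimport.MailEntityProcessor;
> public class MyCustomFilter implements MailEntityProcessor.CustomFilter {
>   public SearchTerm getCustomSearch(Folder folder) {
>     return new FlagTerm(new Flags(Flags.Flag.SEEN), false); // sample only: unread mails; class name is a placeholder
>   }
> }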
> {code}
> *processAttachement* - defaults to true
> The indexed fields are listed below.
> {code}
>   // Fields To Index
>   // single valued
>   private static final String SUBJECT = "subject";
>   private static final String FROM = "from";
>   private static final String SENT_DATE = "sentDate";
>   private static final String XMAILER = "xMailer";
>   // multi valued
>   private static final String TO_CC_BCC = "allTo";
>   private static final String FLAGS = "flags";
>   private static final String CONTENT = "content";
>   private static final String ATTACHMENT = "attachement";
>   private static final String ATTACHMENT_NAMES = "attachementNames";
>   // flag values
>   private static final String FLAG_ANSWERED = "answered";
>   private static final String FLAG_DELETED = "deleted";
>   private static final String FLAG_DRAFT = "draft";
>   private static final String FLAG_FLAGGED = "flagged";
>   private static final String FLAG_RECENT = "recent";
>   private static final String FLAG_SEEN = "seen";
> {code}




[jira] Commented: (SOLR-1105) Using external field content for highlighting

2009-04-14 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698672#action_12698672
 ] 

Shalin Shekhar Mangar commented on SOLR-1105:
-

Instead of baking this into the schema, should this be turned on/off through a 
request parameter?

> Using external field content for highlighting
> -
>
> Key: SOLR-1105
> URL: https://issues.apache.org/jira/browse/SOLR-1105
> Project: Solr
>  Issue Type: Improvement
>  Components: highlighter
>Affects Versions: 1.3
>Reporter: Dmitry Lihachev
> Fix For: 1.3.1
>
> Attachments: SOLR-1105_shared_content_field_1.3.0.patch
>
>
> DefaultSolrHighlighter uses stored field content to highlight. This has a 
> disadvantage: with multilingual indexing the index grows quickly, because 
> several fields have to be stored with the same content. This patch allows 
> DefaultSolrHighlighter to use a "contentField" attribute to look up content in 
> an external field.
> Excerpt from old schema:
> {code:xml}
> 
> 
> 
> 
> {code}
> The same after patching; the highlighter will now get the content stored in the 
> "title" field:
> {code:xml}
> 
>  contentField="title"/>
>  contentField="title"/>
>  contentField="title"/>
> {code}
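
The lookup described in the issue can be sketched as a simple indirection. This is only an illustration, not the patch: the class ContentFieldResolver and the field names title_en and title_ru are made up, and only "title" and the contentField attribute come from the description.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of resolving highlight content through contentField.
class ContentFieldResolver {
    // Maps a field name to the field whose stored content should be highlighted.
    private final Map<String, String> contentFieldByField = new HashMap<>();

    /** Records a contentField="..." attribute declared on a schema field. */
    void declare(String field, String contentField) {
        contentFieldByField.put(field, contentField);
    }

    /** Returns the stored content to highlight for the given field, or null if none is stored. */
    String contentFor(String field, Map<String, String> storedValues) {
        // Without a contentField attribute, the field's own stored value is used.
        String source = contentFieldByField.getOrDefault(field, field);
        return storedValues.get(source);
    }
}
```

With this indirection only "title" needs to be stored, and the language-specific fields (title_en, title_ru in this example) can stay indexed but unstored.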
