Re: SOLR-1106 - Custom Admin Action handler
On Mon, Apr 13, 2009 at 10:03 PM, Kay Kay wrote:
> These custom action handlers need not reside in Solr. Hence I needed a
> hook (listener) that they can register themselves with and be loaded by
> the SolrResourceLoader (./lib/*.jar). Also, I believe the default
> handlers are very useful, necessary and mandatory, and hence ported them to
> the listener for consistency.
>
> Also, if we have a protected method called invokeCommand(), how do we
> inject that type as the admin handler (as opposed to CoreAdminHandler)?
> Right now the type information seems hardcoded in CoreContainer.

There is no means to inject that currently, but that can be made possible by
an extra attribute in the tag, say

We will have to refactor the code a bit so that you can extend the
default core admin handler.

>
> // Multicore self related methods ---
> /**
>  * Creates a CoreAdminHandler for this MultiCore.
>  * @return a CoreAdminHandler
>  */
> protected CoreAdminHandler createMultiCoreHandler() {
>   return new CoreAdminHandler() {
>     @Override
>     public CoreContainer getCoreContainer() {
>       return CoreContainer.this;
>     }
>   };
> }
>
>
> 2009/4/13 Noble Paul നോബിള് नोब्ळ्
>
>> Hi Kay,
>>
>> The idea of one handler per command looks like overkill. How about
>> having protected methods for all the known commands and a
>> separate method invokeCommand() which can choose to implement any
>> extra commands if need be? This way the changes needed would be
>> minimal.
>>
>> On Mon, Apr 13, 2009 at 8:53 PM, Kay Kay wrote:
>> > For one of our projects we need custom admin monitoring hooks that get
>> > access to multiple cores for a given Solr web app (through the
>> > CoreContainer interface).
>> >
>> > There are common admin handler commands with the actions register /
>> > swap / load etc. that seem to be available by default.
>> >
>> > I have submitted a patch to add custom admin handlers against custom
>> > actions (it also refactors the existing action handlers that are
>> > available by default).
>> >
>> > This would be useful to extend the handlers that need access to multiple
>> > cores. Just curious if this is something that could be looked into.
>> > Thanks.
>>
>> --
>> --Noble Paul

--
--Noble Paul
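The design suggested in this thread (protected methods for the known commands, plus an overridable invokeCommand() for anything extra) might look roughly like the sketch below. It is a standalone illustration only: AdminHandlerSketch, MonitoringAdminHandler, the string-based dispatch and the "monitor" action are all hypothetical names made up for this example, and none of it is Solr's actual CoreAdminHandler code.

```java
import java.util.Locale;

// Hypothetical stand-in for a core admin handler; NOT Solr's CoreAdminHandler.
class AdminHandlerSketch {

    // Dispatches known actions to protected methods and hands
    // anything unknown to the invokeCommand() extension hook.
    public final String handle(String action) {
        switch (action.toUpperCase(Locale.ROOT)) {
            case "SWAP":
                return handleSwap();
            case "LOAD":
                return handleLoad();
            default:
                return invokeCommand(action);
        }
    }

    // Default commands; subclasses may override them individually.
    protected String handleSwap() { return "swapped"; }
    protected String handleLoad() { return "loaded"; }

    // Extension hook: the base class knows no extra commands.
    protected String invokeCommand(String action) {
        throw new IllegalArgumentException("Unsupported action: " + action);
    }
}

// A custom handler adds one (made-up) action without touching the defaults.
class MonitoringAdminHandler extends AdminHandlerSketch {
    @Override
    protected String invokeCommand(String action) {
        if ("monitor".equalsIgnoreCase(action)) {
            return "monitoring all cores";
        }
        return super.invokeCommand(action);
    }

    public static void main(String[] args) {
        AdminHandlerSketch handler = new MonitoringAdminHandler();
        System.out.println(handler.handle("swap"));
        System.out.println(handler.handle("monitor"));
    }
}
```

With this shape only the subclass changes when a new action is needed. How the container would be told to instantiate the subclass is the open question raised above, and the sketch does not address it.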
[jira] Updated: (SOLR-599) Lightweight SolrJ client
[ https://issues.apache.org/jira/browse/SOLR-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-599: Attachment: (was: SOLR-599.patch) > Lightweight SolrJ client > > > Key: SOLR-599 > URL: https://issues.apache.org/jira/browse/SOLR-599 > Project: Solr > Issue Type: Improvement > Components: clients - java >Affects Versions: 1.3 >Reporter: Shalin Shekhar Mangar >Priority: Minor > Attachments: SOLR-599.patch > > > SolrJ provides a SolrServer implementation backed by commons-httpclient which > introduces many dependency jars (commons-codec, commons-io and > commons-logging). Apart from that SolrJ also uses StAX API for XML parsing > which introduces dependencies like stax-api, stax and stax-utils. > This enhancement will add a SolrServer implementation backed by > java.net.HttpUrlConnection and will use BinaryResponseParser as the default > response parser. Using this basic implementation out of the box would require > no dependencies on either commons-httpclient or StAX. The only dependency > would be on solr-commons making this a very lightweight and distribution > friendly Java client for Solr. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-599) Lightweight SolrJ client
[ https://issues.apache.org/jira/browse/SOLR-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-599: Attachment: SOLR-599.patch > Lightweight SolrJ client > > > Key: SOLR-599 > URL: https://issues.apache.org/jira/browse/SOLR-599 > Project: Solr > Issue Type: Improvement > Components: clients - java >Affects Versions: 1.3 >Reporter: Shalin Shekhar Mangar >Priority: Minor > Attachments: SOLR-599.patch, SOLR-599.patch > > > SolrJ provides a SolrServer implementation backed by commons-httpclient which > introduces many dependency jars (commons-codec, commons-io and > commons-logging). Apart from that SolrJ also uses StAX API for XML parsing > which introduces dependencies like stax-api, stax and stax-utils. > This enhancement will add a SolrServer implementation backed by > java.net.HttpUrlConnection and will use BinaryResponseParser as the default > response parser. Using this basic implementation out of the box would require > no dependencies on either commons-httpclient or StAX. The only dependency > would be on solr-commons making this a very lightweight and distribution > friendly Java client for Solr. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-599) Lightweight SolrJ client
[ https://issues.apache.org/jira/browse/SOLR-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-599: Attachment: SOLR-599.patch untested patch . > Lightweight SolrJ client > > > Key: SOLR-599 > URL: https://issues.apache.org/jira/browse/SOLR-599 > Project: Solr > Issue Type: Improvement > Components: clients - java >Affects Versions: 1.3 >Reporter: Shalin Shekhar Mangar >Priority: Minor > Attachments: SOLR-599.patch > > > SolrJ provides a SolrServer implementation backed by commons-httpclient which > introduces many dependency jars (commons-codec, commons-io and > commons-logging). Apart from that SolrJ also uses StAX API for XML parsing > which introduces dependencies like stax-api, stax and stax-utils. > This enhancement will add a SolrServer implementation backed by > java.net.HttpUrlConnection and will use BinaryResponseParser as the default > response parser. Using this basic implementation out of the box would require > no dependencies on either commons-httpclient or StAX. The only dependency > would be on solr-commons making this a very lightweight and distribution > friendly Java client for Solr. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
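To illustrate the dependency-free approach described in SOLR-599, here is a minimal sketch of a query over java.net.HttpURLConnection using only the JDK. It is not the attached patch: TinySolrClientSketch and its methods are hypothetical, the base URL and the /select path assume a default local setup, and the response is returned as raw bytes rather than being decoded by a response parser.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Hypothetical JDK-only client sketch; not the SolrServer implementation from the patch.
final class TinySolrClientSketch {
    private final String baseUrl;

    TinySolrClientSketch(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    // Builds the query URL; the "/select" path is an assumption about the server setup.
    String selectUrl(String query) {
        try {
            return baseUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    // Sends a GET request and returns the undecoded response body.
    byte[] query(String query) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(selectUrl(query)).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        TinySolrClientSketch client =
                new TinySolrClientSketch("http://localhost:8983/solr");
        System.out.println(client.selectUrl("title:solr rocks"));
    }
}
```

The point of the sketch is only that no third-party HTTP or XML library appears in the imports; connection pooling, retries and response decoding are left out.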
Hudson build is back to normal: Solr-trunk #772
See http://hudson.zones.apache.org/hudson/job/Solr-trunk/772/changes
[jira] Updated: (SOLR-1115) on and yes should be acceptable in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1115: - Attachment: SOLR-1115.patch The patch attached. I'll commit shortly. > on and yes should be acceptable in solrconfig.xml > --- > > Key: SOLR-1115 > URL: https://issues.apache.org/jira/browse/SOLR-1115 > Project: Solr > Issue Type: Improvement >Affects Versions: 1.2, 1.3 >Reporter: Koji Sekiguchi >Priority: Trivial > Fix For: 1.4 > > Attachments: SOLR-1115.patch > > > snipoff from here: > http://www.nabble.com/parsing-bool-type-in-solrconfig.xml-td23025954.html -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: parsing bool type in solrconfig.xml
Erik Hatcher wrote:
> +1 also. on/yes/true should all work for boolean parameters (just like Ant ;)
>
> Erik
>
> On Apr 14, 2009, at 2:01 AM, Shalin Shekhar Mangar wrote:
>
>> 2009/4/13 Koji Sekiguchi
>>
>>> Should we accept not only true, but also on and yes? I think it is easy
>>> by using parseBool() instead of Boolean.valueOf() in DOMUtil.
>>
>> +1
>>
>> I know it is inconsistent but so are the request parameters like hl,
>> debugQuery etc which I doubt will be changed.
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.

Thanks guys. I opened https://issues.apache.org/jira/browse/SOLR-1115 .

Koji
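For readers who want to see the difference being discussed, here is a minimal sketch of lenient boolean parsing next to Boolean.valueOf(). LenientBooleans and its parseBool method are illustrative stand-ins written for this example; they are not the Solr helper mentioned in the thread, whose exact behaviour may differ.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not Solr's own parseBool().
final class LenientBooleans {
    private LenientBooleans() {}

    // Returns true for "true", "on" or "yes" (trimmed, case-insensitive);
    // everything else, including null, is treated as false.
    static boolean parseBool(String s) {
        if (s == null) {
            return false;
        }
        switch (s.trim().toLowerCase(Locale.ROOT)) {
            case "true":
            case "on":
            case "yes":
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseBool("on"));        // prints "true"
        System.out.println(Boolean.valueOf("on"));  // prints "false"
    }
}
```

Boolean.valueOf() only recognises the string "true" (ignoring case), which is why "on" and "yes" in solrconfig.xml are read as false without a more lenient parser.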
[jira] Created: (SOLR-1115) on and yes should be acceptable in solrconfig.xml
on and yes should be acceptable in solrconfig.xml --- Key: SOLR-1115 URL: https://issues.apache.org/jira/browse/SOLR-1115 Project: Solr Issue Type: Improvement Affects Versions: 1.3, 1.2 Reporter: Koji Sekiguchi Priority: Trivial Fix For: 1.4 snipoff from here: http://www.nabble.com/parsing-bool-type-in-solrconfig.xml-td23025954.html -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1114) Re-organize examples directory keeping core and contribs in mind
Re-organize examples directory keeping core and contribs in mind
----------------------------------------------------------------

                 Key: SOLR-1114
                 URL: https://issues.apache.org/jira/browse/SOLR-1114
             Project: Solr
          Issue Type: Improvement
            Reporter: Shalin Shekhar Mangar

Re-organize examples directory keeping core and contribs in mind.

From Grant on solr-dev:
{quote}
The templates directory would contain the configurations (i.e. schema.xml and solrconfig.xml) and any sample docs (but not the libraries) for:

tutorial - The current tutorial example
dih - The DIH example
extraction - Solr Cell example
geo - geo spatial example (once 773 is committed)
clustering - once SOLR-769 is committed
simple - A barebones schema and config (mainly used for bootstrapping a new project for experienced users)
exploratory - Basically, the same as simple, but the schema defines a single dynamic field - Think of Hoss's Solr Out of the Box talk from ApacheCon whereby you want to quickly explore a new data set without having to define a schema.
[other] -

Note, the templates directory could also live under each contrib, but it isn't necessarily a 1-1 thing (e.g. simple and exploratory templates are not contrib-specific).

Then, typing "ant example" would copy the necessary tutorial stuff to the example directory (which still contains the Jetty stuff) but would not have to recurse into any of the contribs. Typing "ant example -Dtype=clustering" would copy the clustering requirements, plus go to contrib/clustering (or whatever) and get the appropriate material such that the example directory. Similarly for any of the other "templates"

Additionally, you could also define -DoutputDir such that it would take and copy the whole example directory (including the appropriate type) to some output dir. This would allow one to quickly bootstrap a Solr project without having to do a lot of schema editing.
{quote}

http://markmail.org/thread/w6da7pwhcsdn43n3

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1060) a new DIH EnityProcessor allowing text file lists of files to be indexed
[ https://issues.apache.org/jira/browse/SOLR-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698767#action_12698767 ]

Fergus McMenemie commented on SOLR-1060:
----------------------------------------

Hmmm, are you referring to the fragment of code inside ChangeListEntityProcessor that opens the changelist, and its similarity to the functionality in URIDataSource? I had not thought about arranging some kind of nested use of URIDataSource... is that what you are thinking about?

> a new DIH EnityProcessor allowing text file lists of files to be indexed
> ------------------------------------------------------------------------
>
>                 Key: SOLR-1060
>                 URL: https://issues.apache.org/jira/browse/SOLR-1060
>             Project: Solr
>          Issue Type: New Feature
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Fergus McMenemie
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.4
>
>         Attachments: regex-fix.patch, SOLR-1060.patch, SOLR-1060.patch, SOLR-1060.patch, SOLR-1060.patch, SOLR-1060.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> I have finished a new DIH EntityProcessor. It is designed around the idea
> that whatever daemon is used to maintain your content store, it is likely to
> drop a report or log file explaining what has changed within your content
> store. I wish to use this report file to control the indexing of the new or
> changed content and the removal of old content. The report files, perhaps
> from un-tar or un-zip, are likely to reference jpegs and directory stubs
> which need to be ignored. I assumed a file based content repository but this
> should be expanded to handle URIs as well.
> I feel that the current FileListEntityProcessor is poorly named. It should be
> called the dirWalkEntityProcessor or dirCrawlEntityProcessor or such. And
> this new EntityProcessor should have the name FileListEntityProcessor.
> However, what is done is done. I then came up with ManifestEntityProcessor,
> which I thought suited: manifest files are all over the content sets I deal
> with and the dictionary definition seemed close enough ("ship's manifest").
> However, how about ChangeListEntityProcessor?
> {code}
>    processor="ManifestEntityProcessor"
>    baseDir="/Volumes/Techmore/ts/aaa/schema/data"
>    rootEntity="false"
>    dataSource="null"
>    allowRegex="^.*\.xml$"
>    blockRegex="usc2009"
>    manifestFileName="/Volumes/ts/man-find.txt"
>    docAddRegex=".*"
>    >
> {code}
> The new entity fields are as follows.
>
>    *manifestFileName* is the required location of the manifest file. If this
> value is relative, it is assumed to be relative to baseDir.
>    *allowRegex* is an optional attribute that, if present, discards any line
> which does not match the regExp.
>
>    *blockRegex* is an optional attribute that is applied after any allowRegex
> and discards any line which matches the regExp.
>    *docAddRegex* is a required regex to identify lines which, when matched,
> should cause docs to be added to the index. As well as matching the line it
> should also return the portion of the line which contains the filepath as
> group(1).
>    *docDeleteRegex* is an optional value of a regex to identify documents
> which, when matched, should be deleted from the index. As well as matching the
> line it should also return the portion of the line which contains the
> filepath as group(1). **PLANNED**

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
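The line filtering described in the issue (allowRegex first, then blockRegex, then docAddRegex yielding the file path as group(1)) can be sketched as follows. This is one reading of the description, not the code from the attached patches: ManifestFilterSketch and pathsToAdd are hypothetical names, the use of find() rather than a full-line match is an assumption based on the unanchored blockRegex in the example, and falling back to the whole match when docAddRegex has no capturing group is also an assumption. The "A "/"D " line prefixes in the demo are an invented manifest format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the manifest filtering rules; not the patch itself.
final class ManifestFilterSketch {
    private final Pattern allow;   // optional, may be null
    private final Pattern block;   // optional, may be null
    private final Pattern docAdd;  // required

    ManifestFilterSketch(String allowRegex, String blockRegex, String docAddRegex) {
        this.allow = allowRegex == null ? null : Pattern.compile(allowRegex);
        this.block = blockRegex == null ? null : Pattern.compile(blockRegex);
        this.docAdd = Pattern.compile(docAddRegex);
    }

    // Returns the file paths of manifest lines that should be added to the index.
    List<String> pathsToAdd(List<String> manifestLines) {
        List<String> paths = new ArrayList<>();
        for (String line : manifestLines) {
            // allowRegex: discard lines that do not match.
            if (allow != null && !allow.matcher(line).find()) {
                continue;
            }
            // blockRegex: applied after allowRegex, discard lines that do match.
            if (block != null && block.matcher(line).find()) {
                continue;
            }
            // docAddRegex: group(1) carries the file path when a group is defined.
            Matcher m = docAdd.matcher(line);
            if (m.find()) {
                paths.add(m.groupCount() >= 1 ? m.group(1) : m.group());
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        ManifestFilterSketch filter =
                new ManifestFilterSketch("^.*\\.xml$", "usc2009", "^A (.*)$");
        List<String> lines = Arrays.asList(
                "A data/a.xml", "A img/b.jpg", "A usc2009/c.xml", "D data/d.xml");
        System.out.println(filter.pathsToAdd(lines)); // prints "[data/a.xml]"
    }
}
```

A docDeleteRegex would follow the same pattern with a second result list; it is left out here because the issue marks it as planned.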
Re: parsing bool type in solrconfig.xml
+1 also. on/yes/true should all work for boolean parameters (just like Ant ;)

Erik

On Apr 14, 2009, at 2:01 AM, Shalin Shekhar Mangar wrote:

> 2009/4/13 Koji Sekiguchi
>
>> Should we accept not only true, but also on and yes? I think it is easy
>> by using parseBool() instead of Boolean.valueOf() in DOMUtil.
>
> +1
>
> I know it is inconsistent but so are the request parameters like hl,
> debugQuery etc which I doubt will be changed.
>
> --
> Regards,
> Shalin Shekhar Mangar.
[jira] Commented: (SOLR-1099) FieldAnalysisRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698726#action_12698726 ] Shalin Shekhar Mangar commented on SOLR-1099: - {quote}I hope this makes things a bit clearer {quote} Crystal clear! Thanks Uri! > FieldAnalysisRequestHandler > --- > > Key: SOLR-1099 > URL: https://issues.apache.org/jira/browse/SOLR-1099 > Project: Solr > Issue Type: New Feature > Components: Analysis >Affects Versions: 1.3 >Reporter: Uri Boness >Assignee: Shalin Shekhar Mangar > Fix For: 1.4 > > Attachments: FieldAnalysisRequestHandler_incl_test.patch > > > The FieldAnalysisRequestHandler provides the analysis functionality of the > web admin page as a service. This handler accepts a filetype/fieldname > parameter and a value and as a response returns a breakdown of the analysis > process. It is also possible to send a query value which will use the > configured query analyzer as well as a showmatch parameter which will then > mark every matched token as a match. > If this handler is added to the code base, I also recommend to rename the > current AnalysisRequestHandler to DocumentAnalysisRequestHandler and have > them both inherit from one AnalysisRequestHandlerBase class which provides > the common functionality of the analysis breakdown and its translation to > named lists. This will also enhance the current AnalysisRequestHandler which > right now is fairly simplistic. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1099) FieldAnalysisRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698723#action_12698723 ] Uri Boness commented on SOLR-1099: -- {quote} We copy AnalysisRequestHandler (ARH) to DocumentAnalysisRequestHandler and deprecate ARH. {quote} true, but it will be enhanced with functionality and support more extensive analysis breakdown (e.g. adding a query analysis and showmatch support) {quote} We extract common code (if any) of ARH and FieldARH in to a base class AnalysisRequestHandlerBase, as you suggested {quote} true {quote} We modify analysis.jsp to use FieldARH (maybe as a separate issue) {quote} probably a separate issue is more appropriate. {quote} You do not need to support AnalysisRequestHandler's format because it will also exist by the name of DocumentAnalysisRequestHandler. Since FieldARH is a new handler, it does not need to be back-compatible with ARH. Supporting the old format is a nice-to-have feature but not necessary. {quote} True. The old AnalysisRequestHandler will be deprecated and it's (enhanced) functionality will be available via the DocumentAnalysisRequestHandler. That said, it would be nice to be backward compatible as much as possible for those who are using the old ARH already (I suspect not many are using it anyway as it's mostly used for tooling and debugging). I do believe that both the new DocumentARH and the FieldARH are useful for different purposes due the nature of their differences as I mentioned above. 
I hope this makes things a bit clearer :-) > FieldAnalysisRequestHandler > --- > > Key: SOLR-1099 > URL: https://issues.apache.org/jira/browse/SOLR-1099 > Project: Solr > Issue Type: New Feature > Components: Analysis >Affects Versions: 1.3 >Reporter: Uri Boness >Assignee: Shalin Shekhar Mangar > Fix For: 1.4 > > Attachments: FieldAnalysisRequestHandler_incl_test.patch > > > The FieldAnalysisRequestHandler provides the analysis functionality of the > web admin page as a service. This handler accepts a filetype/fieldname > parameter and a value and as a response returns a breakdown of the analysis > process. It is also possible to send a query value which will use the > configured query analyzer as well as a showmatch parameter which will then > mark every matched token as a match. > If this handler is added to the code base, I also recommend to rename the > current AnalysisRequestHandler to DocumentAnalysisRequestHandler and have > them both inherit from one AnalysisRequestHandlerBase class which provides > the common functionality of the analysis breakdown and its translation to > named lists. This will also enhance the current AnalysisRequestHandler which > right now is fairly simplistic. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1099) FieldAnalysisRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698720#action_12698720 ] Shalin Shekhar Mangar commented on SOLR-1099: - {quote} The only change to the original structure will happen when more parameters will be sent, for example, when a query analysis takes place and a "showmatch=true" is sent then each matched token will be marked as a "match". I'll have to have a closer look at the current response of the AnalysisRequestHandler and see if I can support the exact same structure my gut feeling is that it's possible. {quote} This is the part that I do not understand. Let me outline what I understood: # We copy AnalysisRequestHandler (ARH) to DocumentAnalysisRequestHandler and deprecate ARH. # We extract common code (if any) of ARH and FieldARH in to a base class AnalysisRequestHandlerBase, as you suggested # We modify analysis.jsp to use FieldARH (maybe as a separate issue) You do not need to support AnalysisRequestHandler's format because it will also exist by the name of DocumentAnalysisRequestHandler. Since FieldARH is a new handler, it does not need to be back-compatible with ARH. Supporting the old format is a nice-to-have feature but not necessary. > FieldAnalysisRequestHandler > --- > > Key: SOLR-1099 > URL: https://issues.apache.org/jira/browse/SOLR-1099 > Project: Solr > Issue Type: New Feature > Components: Analysis >Affects Versions: 1.3 >Reporter: Uri Boness >Assignee: Shalin Shekhar Mangar > Fix For: 1.4 > > Attachments: FieldAnalysisRequestHandler_incl_test.patch > > > The FieldAnalysisRequestHandler provides the analysis functionality of the > web admin page as a service. This handler accepts a filetype/fieldname > parameter and a value and as a response returns a breakdown of the analysis > process. 
It is also possible to send a query value which will use the > configured query analyzer as well as a showmatch parameter which will then > mark every matched token as a match. > If this handler is added to the code base, I also recommend to rename the > current AnalysisRequestHandler to DocumentAnalysisRequestHandler and have > them both inherit from one AnalysisRequestHandlerBase class which provides > the common functionality of the analysis breakdown and its translation to > named lists. This will also enhance the current AnalysisRequestHandler which > right now is fairly simplistic. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1099) FieldAnalysisRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698713#action_12698713 ]

Uri Boness commented on SOLR-1099:
----------------------------------

{quote}
I was assuming that the output format of AnalysisRequestHandler and FieldAnalysisRequestHandler remains exactly as they are today and the refactoring is just to abstract common code into a base class.
{quote}

{quote}
Agreed. But the output of DocumentAnalysisRequestHandler will look exactly like what AnalysisRequestHandler returns today, right?
{quote}

You know what... you're right... I think it is possible to keep the same output by default. The only change to the original structure will happen when more parameters are sent; for example, when a query analysis takes place and "showmatch=true" is sent, each matched token will be marked as a "match". I'll have to take a closer look at the current response of the AnalysisRequestHandler and see if I can support the exact same structure; my gut feeling is that it's possible. I'll start working on it and see where I get.

> FieldAnalysisRequestHandler
> ---------------------------
>
>                 Key: SOLR-1099
>                 URL: https://issues.apache.org/jira/browse/SOLR-1099
>             Project: Solr
>          Issue Type: New Feature
>          Components: Analysis
>    Affects Versions: 1.3
>            Reporter: Uri Boness
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.4
>
>         Attachments: FieldAnalysisRequestHandler_incl_test.patch
>
>
> The FieldAnalysisRequestHandler provides the analysis functionality of the
> web admin page as a service. This handler accepts a filetype/fieldname
> parameter and a value and as a response returns a breakdown of the analysis
> process. It is also possible to send a query value which will use the
> configured query analyzer as well as a showmatch parameter which will then
> mark every matched token as a match.
> If this handler is added to the code base, I also recommend to rename the
> current AnalysisRequestHandler to DocumentAnalysisRequestHandler and have
> them both inherit from one AnalysisRequestHandlerBase class which provides
> the common functionality of the analysis breakdown and its translation to
> named lists. This will also enhance the current AnalysisRequestHandler which
> right now is fairly simplistic.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1099) FieldAnalysisRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698711#action_12698711 ] Shalin Shekhar Mangar commented on SOLR-1099: - {quote}The public API for the AnalysisRequestHandler will change in the context of the response. {quote} I was assuming that the output format of AnalysisRequestHandler and FieldAnalysisRequestHandler remains exactly as they are today and the refactoring is just to abstract common code into a base class. {quote} Furthermore, it's probably wise to rename the AnalysisRequestHandler to DocumentAnalysisRequestHandler (more expressive name and also consistent with the FieldAnalysisRequestHandler). Another option is to do this refactoring anyway, and leave the AnalysisRequestHandler as is and only deprecate it. So basically we'll have 4 classes: AnalysisRequestHanlderBase FieldAnalysisRequestHanlder DocumentAnalysisRequestHandler AnalysisRequestHandler (deprecated) {quote} Agreed. But the output of DocumentAnalysisRequestHandler will look exactly like what AnalysisRequestHandler returns today, right? {quote}it would be also be wise to reimplement the anaysis.jsp to use this new handler and clean it up from all the analysis (now duplicate) logic code.{quote} Agreed. > FieldAnalysisRequestHandler > --- > > Key: SOLR-1099 > URL: https://issues.apache.org/jira/browse/SOLR-1099 > Project: Solr > Issue Type: New Feature > Components: Analysis >Affects Versions: 1.3 >Reporter: Uri Boness >Assignee: Shalin Shekhar Mangar > Fix For: 1.4 > > Attachments: FieldAnalysisRequestHandler_incl_test.patch > > > The FieldAnalysisRequestHandler provides the analysis functionality of the > web admin page as a service. This handler accepts a filetype/fieldname > parameter and a value and as a response returns a breakdown of the analysis > process. 
It is also possible to send a query value which will use the > configured query analyzer as well as a showmatch parameter which will then > mark every matched token as a match. > If this handler is added to the code base, I also recommend to rename the > current AnalysisRequestHandler to DocumentAnalysisRequestHandler and have > them both inherit from one AnalysisRequestHandlerBase class which provides > the common functionality of the analysis breakdown and its translation to > named lists. This will also enhance the current AnalysisRequestHandler which > right now is fairly simplistic. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1060) a new DIH EnityProcessor allowing text file lists of files to be indexed
[ https://issues.apache.org/jira/browse/SOLR-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698710#action_12698710 ] Shalin Shekhar Mangar commented on SOLR-1060: - Fergus, ChangeListEntityProcessor seems to duplicate URIDataSource's functionality instead of using it. Why is that? > a new DIH EnityProcessor allowing text file lists of files to be indexed > > > Key: SOLR-1060 > URL: https://issues.apache.org/jira/browse/SOLR-1060 > Project: Solr > Issue Type: New Feature > Components: contrib - DataImportHandler >Affects Versions: 1.4 >Reporter: Fergus McMenemie >Assignee: Shalin Shekhar Mangar > Fix For: 1.4 > > Attachments: regex-fix.patch, SOLR-1060.patch, SOLR-1060.patch, > SOLR-1060.patch, SOLR-1060.patch, SOLR-1060.patch > > Original Estimate: 120h > Remaining Estimate: 120h > > I have finished a new DIH EntityProcessor. It is designed around the idea > that whatever demon is used to maintain your content store it is likely to > drop a report or log file explaining what has changed within your content > store. I wish to use this report file to control the indexing of the new or > changed content and the removal of old content. The report files, perhaps > from un-tar or un-zip, are likely to reference jpegs and directory stubs > which need to be ignored. I assumed a file based content repository but this > should be expanded to handle URI's as well > I feel that the current FileListEntityProcessor is poorly named. It should be > called the dirWalkEntityProcessor or dirCrawlEntityProcessor or such. And > this new EntityProcessor should have the name FileListEntityProcessor. > However what is done is done. I then came up with manifestEnityProcessor > which I thought suited, manifest files are all over the content sets I deal > with and the dictionary definition seemed close enough ("ships manifest"). 
> However how about ChangeListEntityProcessor > {code} >processor="ManifestEntityProcessor" >baseDir="/Volumes/Techmore/ts/aaa/schema/data" >rootEntity="false" >dataSource="null" >allowRegex="^.*\.xml$" >blockRegex="usc2009" >manifestFileName="/Volumes/ts/man-find.txt" >docAddRegex=".*" >> > {code} > The new entity fields are as follows. > >*manifestFileName* is the required location of the manifest file. If this > value is relative, it assumed to be relative to baseDir. >*allowRegex* is an optional attribute that if present discards any line > which does not match the regExp > >*blockRegex* is an optional attribute that is applied after any allowRegex > and discards any line which matches the regExp >*docAddRegex* is a required regex to identify lines which when matched > should cause docs to be added to the index. As well as matching the line it > should also return the portion of the line which contains the filepath as > group(1) >*docDeleteRegex* is an optional value of a regex to identify documents > which when matched should be deleted from the index. As well as matching the > line it should also return the portion of the line which contains the > filepath as group(1) **PLANNED** -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1099) FieldAnalysisRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698708#action_12698708 ]

Uri Boness commented on SOLR-1099:
----------------------------------

The public API for the AnalysisRequestHandler will change in the context of the response. Since the analysis breakdown is more detailed, the response format will have to change a bit. Furthermore, it's probably wise to rename the AnalysisRequestHandler to DocumentAnalysisRequestHandler (more expressive name and also consistent with the FieldAnalysisRequestHandler).

Another option is to do this refactoring anyway, and leave the AnalysisRequestHandler as is and only deprecate it. So basically we'll have 4 classes:

AnalysisRequestHandlerBase
FieldAnalysisRequestHandler
DocumentAnalysisRequestHandler
AnalysisRequestHandler (deprecated)

What do you think?

BTW, once committed, it would also be wise to reimplement analysis.jsp to use this new handler and clean it up from all the analysis (now duplicate) logic code.

> FieldAnalysisRequestHandler
> ---------------------------
>
>                 Key: SOLR-1099
>                 URL: https://issues.apache.org/jira/browse/SOLR-1099
>             Project: Solr
>          Issue Type: New Feature
>          Components: Analysis
>    Affects Versions: 1.3
>            Reporter: Uri Boness
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.4
>
>         Attachments: FieldAnalysisRequestHandler_incl_test.patch
>
>
> The FieldAnalysisRequestHandler provides the analysis functionality of the
> web admin page as a service. This handler accepts a filetype/fieldname
> parameter and a value and as a response returns a breakdown of the analysis
> process. It is also possible to send a query value which will use the
> configured query analyzer as well as a showmatch parameter which will then
> mark every matched token as a match.
> If this handler is added to the code base, I also recommend to rename the
> current AnalysisRequestHandler to DocumentAnalysisRequestHandler and have
> them both inherit from one AnalysisRequestHandlerBase class which provides
> the common functionality of the analysis breakdown and its translation to
> named lists. This will also enhance the current AnalysisRequestHandler which
> right now is fairly simplistic.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1113) Error reports from ExtractingRequestHandler and Co do not indicate name of rejected documents
Error reports from ExtractingRequestHandler and Co do not indicate name of rejected documents
---------------------------------------------------------------------------------------------

                 Key: SOLR-1113
                 URL: https://issues.apache.org/jira/browse/SOLR-1113
             Project: Solr
          Issue Type: Improvement
          Components: update
            Reporter: Fergus McMenemie


The ExtractingRequestHandler rejects documents that are larger than the configured multipartUploadLimitInKB in solrconfig.xml. None of the generated error messages indicate the name of the rejected document or provide any way of identifying the rejected document. The failure to identify the rejected document complicates the middleware used to look after indexes.

Here is the trace produced by a recent version of trunk.
{code}
SEVERE: org.apache.commons.fileupload.FileUploadBase$SizeLimitExceededException: the request was rejected because its size (4585774) exceeds the configured maximum (2097152)
    at org.apache.commons.fileupload.FileUploadBase$FileItemIteratorImpl.<init>(FileUploadBase.java:914)
    at org.apache.commons.fileupload.FileUploadBase.getItemIterator(FileUploadBase.java:331)
    at org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:349)
    at org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126)
    at org.apache.solr.servlet.MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:343)
    at org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:396)
    at org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:114)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
{code}
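One way to make such a rejection traceable is to carry an identifier for the upload in the error message. The following is a minimal sketch of that idea only; `UploadLimitChecker`, `checkSize` and `sourceName` are hypothetical names and are not part of Solr or commons-fileupload. Whether a document name is actually available at the point where the limit is enforced is a separate question that this sketch does not answer.

```java
// Hypothetical helper; none of these names exist in Solr or commons-fileupload.
class UploadLimitChecker {
    private final long limitInBytes;

    // The limit is given in KB, mirroring multipartUploadLimitInKB.
    UploadLimitChecker(long limitInKB) {
        this.limitInBytes = limitInKB * 1024L;
    }

    // Throws an exception whose message names the rejected document.
    void checkSize(String sourceName, long sizeInBytes) {
        if (sizeInBytes > limitInBytes) {
            throw new IllegalArgumentException(
                "document '" + sourceName + "' was rejected because its size ("
                + sizeInBytes + ") exceeds the configured maximum ("
                + limitInBytes + ")");
        }
    }
}
```

With a limit of 2048 KB, the maximum works out to 2097152 bytes, the figure shown in the trace above.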
[jira] Commented: (SOLR-1099) FieldAnalysisRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698704#action_12698704 ]

Shalin Shekhar Mangar commented on SOLR-1099:
---------------------------------------------

{quote}
Well, the AnalysisRequestHandler's goal is to handle documents: you send an XML document (the same document you would send for indexing) and the handler analyses the fields of the document. So the main difference between the two handlers is that the AnalysisRequestHandler enables you to provide a set of field names/types and their values to be analysed, while in the FieldAnalysisRequestHandler you're mainly targeting just a couple of fields and you can only specify one value to be analysed. The other main difference is that the AnalysisRequestHandler handles a POST request with an XML request body, while the FieldAnalysisRequestHandler handles a GET request where all the parameters are specified as URL params.
{quote}

Thanks for clarifying, Uri.

{quote}
As I mentioned, the analysis breakdown of the FieldAnalysisRequestHandler is more detailed than that of the AnalysisRequestHandler, and this is why I think that some refactoring can take place by extracting all the common functionality to a parent class for these two classes.
{quote}

I agree. With this coming in, we will have three places which help with analysis (analysis.jsp, AnalysisRequestHandler and FieldAnalysisRequestHandler). Would you like to take a stab at this before we commit? I doubt our refactoring will change the public API (the request/response format) for any of the three. Therefore, I'm fine with refactoring later and committing this as-is.
Build failed in Hudson: Solr-trunk #771
See http://hudson.zones.apache.org/hudson/job/Solr-trunk/771/changes

Changes:

[shalin] SOLR-934 followup -- Make CustomFilter static, remove extra logging code, check log level before logging

[shalin] SOLR-934 -- A MailEntityProcessor to enable indexing mails from POP/IMAP sources into a solr index

[gsingers] SOLR-804: added lucene misc jar rev 764281

[shalin] SOLR-1059 -- Fixing bug where skipping a row containing nested entities did not skip the nested entities. Handling special flag variables is in one method now.

[shalin] SOLR-940 followup -- Fix for the trie date test case

[shalin] SOLR-1096 -- Introduced httpConnTimeout and httpReadTimeout in replication slave configuration to avoid stalled replication

------------------------------------------
[...truncated 2323 lines...]

init-forrest-entities:

compile-solrj:

compile:

make-manifest:

compile:

compileTests:
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Solr-trunk/ws/trunk/contrib/dataimporthandler/target/test-classes
    [javac] Compiling 24 source files to http://hudson.zones.apache.org/hudson/job/Solr-trunk/ws/trunk/contrib/dataimporthandler/target/test-classes
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

testCore:
    [junit] Running org.apache.solr.handler.dataimport.TestCachedSqlEntityProcessor
    [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.378 sec
    [junit] Running org.apache.solr.handler.dataimport.TestClobTransformer
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.345 sec
    [junit] Running org.apache.solr.handler.dataimport.TestContentStreamDataSource
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.467 sec
    [junit] Running org.apache.solr.handler.dataimport.TestDataConfig
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.831 sec
    [junit] Running org.apache.solr.handler.dataimport.TestDateFormatTransformer
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.35 sec
    [junit] Running org.apache.solr.handler.dataimport.TestDocBuilder
    [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.481 sec
    [junit] Running org.apache.solr.handler.dataimport.TestDocBuilder2
    [junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 4.675 sec
    [junit] Running org.apache.solr.handler.dataimport.TestEntityProcessorBase
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.358 sec
    [junit] Running org.apache.solr.handler.dataimport.TestErrorHandling
    [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 3.061 sec
    [junit] Running org.apache.solr.handler.dataimport.TestEvaluatorBag
    [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.361 sec
    [junit] Running org.apache.solr.handler.dataimport.TestFieldReader
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.54 sec
    [junit] Running org.apache.solr.handler.dataimport.TestFileListEntityProcessor
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.378 sec
    [junit] Running org.apache.solr.handler.dataimport.TestJdbcDataSource
    [junit] Tests run: 0, Failures: 0, Errors: 0, Time elapsed: 0.307 sec
    [junit] Running org.apache.solr.handler.dataimport.TestNumberFormatTransformer
    [junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 0.408 sec
    [junit] Running org.apache.solr.handler.dataimport.TestPlainTextEntityProcessor
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.429 sec
    [junit] Running org.apache.solr.handler.dataimport.TestRegexTransformer
    [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.358 sec
    [junit] Running org.apache.solr.handler.dataimport.TestScriptTransformer
    [junit] Tests run: 0, Failures: 0, Errors: 0, Time elapsed: 0.303 sec
    [junit] Running org.apache.solr.handler.dataimport.TestSqlEntityProcessor
    [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.387 sec
    [junit] Running org.apache.solr.handler.dataimport.TestSqlEntityProcessor2
    [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 2.941 sec
    [junit] Running org.apache.solr.handler.dataimport.TestTemplateString
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.324 sec
    [junit] Running org.apache.solr.handler.dataimport.TestTemplateTransformer
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.355 sec
    [junit] Running org.apache.solr.handler.dataimport.TestVariableResolver
    [junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.38 sec
    [junit] Running org.apache.solr.handler.dataimport.TestXPathEntityProcessor
    [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.677 sec
    [junit] Running org.apache.solr.handler.dataimport.TestXPathRecordReader
    [junit] Tests run: 13, Failures: 0, Errors: 0, Time elapsed: 0.476 sec

compileExtras:

compileExtrasTests:
    [mkd
Solr nightly build failure
init-forrest-entities:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/web

compile-solrj:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/solrj
    [javac] Compiling 76 source files to /tmp/apache-solr-nightly/build/solrj
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

compile:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/solr
    [javac] Compiling 365 source files to /tmp/apache-solr-nightly/build/solr
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

compileTests:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/tests
    [javac] Compiling 150 source files to /tmp/apache-solr-nightly/build/tests
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

junit:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/test-results
    [junit] Running org.apache.solr.BasicFunctionalityTest
    [junit] Tests run: 19, Failures: 0, Errors: 0, Time elapsed: 17.464 sec
    [junit] Running org.apache.solr.ConvertedLegacyTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 11.11 sec
    [junit] Running org.apache.solr.DisMaxRequestHandlerTest
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 4.847 sec
    [junit] Running org.apache.solr.EchoParamsTest
    [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 2.219 sec
    [junit] Running org.apache.solr.OutputWriterTest
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 2.699 sec
    [junit] Running org.apache.solr.SampleTest
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 2.315 sec
    [junit] Running org.apache.solr.SolrInfoMBeanTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.861 sec
    [junit] Running org.apache.solr.TestDistributedSearch
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 25.714 sec
    [junit] Running org.apache.solr.TestTrie
    [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 6.842 sec
    [junit] Running org.apache.solr.analysis.DoubleMetaphoneFilterFactoryTest
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.326 sec
    [junit] Running org.apache.solr.analysis.DoubleMetaphoneFilterTest
    [junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.321 sec
    [junit] Running org.apache.solr.analysis.EnglishPorterFilterFactoryTest
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.281 sec
    [junit] Running org.apache.solr.analysis.HTMLStripReaderTest
    [junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 0.667 sec
    [junit] Running org.apache.solr.analysis.LengthFilterTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.882 sec
    [junit] Running org.apache.solr.analysis.SnowballPorterFilterFactoryTest
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.354 sec
    [junit] Running org.apache.solr.analysis.TestBufferedTokenStream
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.963 sec
    [junit] Running org.apache.solr.analysis.TestCapitalizationFilter
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.585 sec
    [junit] Running org.apache.solr.analysis.TestCharFilter
    [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.315 sec
    [junit] Running org.apache.solr.analysis.TestHyphenatedWordsFilter
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.112 sec
    [junit] Running org.apache.solr.analysis.TestKeepFilterFactory
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.877 sec
    [junit] Running org.apache.solr.analysis.TestKeepWordFilter
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.892 sec
    [junit] Running org.apache.solr.analysis.TestMappingCharFilter
    [junit] Tests run: 11, Failures: 0, Errors: 0, Time elapsed: 0.34 sec
    [junit] Running org.apache.solr.analysis.TestMappingCharFilterFactory
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.334 sec
    [junit] Running org.apache.solr.analysis.TestPatternReplaceFilter
    [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 1.213 sec
    [junit] Running org.apache.solr.analysis.TestPatternTokenizerFactory
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.723 sec
    [junit] Running org.apache.solr.analysis.TestPhoneticFilter
[jira] Commented: (SOLR-1099) FieldAnalysisRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698696#action_12698696 ]

Uri Boness commented on SOLR-1099:
----------------------------------

Well, the AnalysisRequestHandler's goal is to handle documents: you send an XML document (the same document you would send for indexing) and the handler analyses the fields of the document. So the main difference between the two handlers is that the AnalysisRequestHandler enables you to provide a set of field names/types and their values to be analysed, while in the FieldAnalysisRequestHandler you're mainly targeting just a couple of fields and you can only specify one value to be analysed. The other main difference is that the AnalysisRequestHandler handles a POST request with an XML request body, while the FieldAnalysisRequestHandler handles a GET request where all the parameters are specified as URL params.

As I mentioned, the analysis breakdown of the FieldAnalysisRequestHandler is more detailed than that of the AnalysisRequestHandler, and this is why I think that some refactoring can take place by extracting all the common functionality to a parent class for these two classes.
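To illustrate the GET style described above, here is a small sketch that assembles such a request URL with every parameter in the query string. The handler path (`/analysis/field`) and the parameter names (`fieldname`, `value`, `showmatch`) are assumptions made for this example; the names actually used by the patch may differ.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

class FieldAnalysisUrl {
    // Builds a GET URL with all parameters URL-encoded in the query string.
    static String build(String handlerUrl, Map<String, String> params) {
        StringBuilder url = new StringBuilder(handlerUrl);
        char separator = '?';
        try {
            for (Map.Entry<String, String> p : params.entrySet()) {
                url.append(separator)
                   .append(URLEncoder.encode(p.getKey(), "UTF-8"))
                   .append('=')
                   .append(URLEncoder.encode(p.getValue(), "UTF-8"));
                separator = '&';
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always supported by the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
        return url.toString();
    }
}
```

The document-oriented handler, by contrast, would receive its input as an XML request body in a POST, so no such query string is needed there.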
[jira] Commented: (SOLR-1099) FieldAnalysisRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698681#action_12698681 ]

Shalin Shekhar Mangar commented on SOLR-1099:
---------------------------------------------

This looks great Uri. I'm yet to look completely into the patch. But is there anything in the AnalysisRequestHandler which is not there in this patch? If not, does it make sense to just deprecate AnalysisRequestHandler and use this instead?
[jira] Updated: (SOLR-1099) FieldAnalysisRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-1099:
----------------------------------------

      Component/s:     (was: search)
                   Analysis
    Fix Version/s:     (was: 1.3.1)
                   1.4
         Assignee: Shalin Shekhar Mangar
[jira] Commented: (SOLR-934) Enable importing of mails into a solr index through DIH.
[ https://issues.apache.org/jira/browse/SOLR-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698676#action_12698676 ]

Shalin Shekhar Mangar commented on SOLR-934:
--------------------------------------------

Committed revision 764691.

> Enable importing of mails into a solr index through DIH.
> ---------------------------------------------------------
>
>                 Key: SOLR-934
>                 URL: https://issues.apache.org/jira/browse/SOLR-934
>             Project: Solr
>          Issue Type: New Feature
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Preetam Rao
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.4
>
>         Attachments: SOLR-934.patch, SOLR-934.patch, SOLR-934.patch,
> SOLR-934.patch, SOLR-934.patch, SOLR-934.patch, SOLR-934.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Enable importing of mails into solr through DIH. Take one or more mailbox
> credentials, download and index their content along with the content from
> attachments. The folders to fetch can be made configurable based on various
> criteria. Apache Tika is used for extracting content from different kinds of
> attachments. JavaMail is used for mailbox related operations like fetching
> mails, filtering them etc.
> The basic configuration for one mailbox is as below:
> {code:xml}
>
>  password="something" host="imap.gmail.com" protocol="imaps"/>
>
> {code}
> Below is the list of all available configuration options:
> {color:green}Required{color}
> ---------
> *user*
> *pwd*
> *protocol* (only "imaps" supported now)
> *host*
> {color:green}Optional{color}
> ---------
> *folders* - comma separated list of folders.
> If not specified, the default folder is used. Nested folders can be specified
> like a/b/c
> *recurse* - index subfolders. Defaults to true.
> *exclude* - comma separated list of patterns.
> *include* - comma separated list of patterns.
> *batchSize* - mails to fetch at once in a given folder.
> Only headers can be prefetched in JavaMail IMAP.
> *readTimeout* - defaults to 60000ms
> *connectTimeout* - defaults to 30000ms
> *fetchSize* - IMAP config. 32KB default
> *fetchMailsSince* -
> date/time in "yyyy-MM-dd HH:mm:ss" format; mails received after it will be
> fetched. Useful for delta import.
> *customFilter* - class name.
> {code}
> import javax.mail.Flags;
> import javax.mail.Folder;
> import javax.mail.search.FlagTerm;
> import javax.mail.search.SearchTerm;
> // "clz" stands for the class named by the customFilter attribute
> public class clz implements MailEntityProcessor.CustomFilter {
>   public SearchTerm getCustomSearch(Folder folder) {
>     // example only: match mails that have not been read yet
>     return new FlagTerm(new Flags(Flags.Flag.SEEN), false);
>   }
> }
> {code}
> *processAttachement* - defaults to true
> Below are the indexed fields.
> {code}
>   // Fields To Index
>   // single valued
>   private static final String SUBJECT = "subject";
>   private static final String FROM = "from";
>   private static final String SENT_DATE = "sentDate";
>   private static final String XMAILER = "xMailer";
>   // multi valued
>   private static final String TO_CC_BCC = "allTo";
>   private static final String FLAGS = "flags";
>   private static final String CONTENT = "content";
>   private static final String ATTACHMENT = "attachement";
>   private static final String ATTACHMENT_NAMES = "attachementNames";
>   // flag values
>   private static final String FLAG_ANSWERED = "answered";
>   private static final String FLAG_DELETED = "deleted";
>   private static final String FLAG_DRAFT = "draft";
>   private static final String FLAG_FLAGGED = "flagged";
>   private static final String FLAG_RECENT = "recent";
>   private static final String FLAG_SEEN = "seen";
> {code}
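As an illustration of the *fetchMailsSince* option, the sketch below parses such a value and decides whether a mail falls after the cut-off. It assumes the format is "yyyy-MM-dd HH:mm:ss"; the class and method names (`FetchMailsSince`, `parse`, `isAfter`) are made up for this example and are not taken from the patch.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper, not part of MailEntityProcessor.
class FetchMailsSince {
    static final String PATTERN = "yyyy-MM-dd HH:mm:ss";

    // Parses the attribute value; returns null when the attribute is absent.
    static Date parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return null;
        }
        SimpleDateFormat format = new SimpleDateFormat(PATTERN);
        format.setLenient(false); // reject values such as month 13
        try {
            return format.parse(value.trim());
        } catch (ParseException e) {
            throw new IllegalArgumentException(
                "Invalid fetchMailsSince '" + value + "', expected " + PATTERN, e);
        }
    }

    // True if the mail should be fetched: no cut-off, or received after it.
    static boolean isAfter(Date received, Date since) {
        return since == null || received.after(since);
    }
}
```

In a JavaMail-based implementation the parsed date would more likely be handed to a search term (for example `javax.mail.search.ReceivedDateTerm`) so that the server can do the filtering, rather than comparing dates on the client.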
[jira] Updated: (SOLR-934) Enable importing of mails into a solr index through DIH.
[ https://issues.apache.org/jira/browse/SOLR-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-934:
---------------------------------------

    Attachment: SOLR-934.patch

A few changes in this patch
# Made the CustomFilter interface static
# Removed logRow method. LogTransformer can be used if needed
# logConfig first checks if info level is enabled or not

I'll commit shortly.
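The third change, checking the log level before logging, follows a common pattern: skip building the message when it would be discarded anyway. The sketch below shows the pattern with `java.util.logging` purely as a stand-in; the logging API actually used by the patch may be different, and `describeConfig` and the `messagesBuilt` counter are invented for the illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class LogConfigExample {
    static final Logger LOG = Logger.getLogger(LogConfigExample.class.getName());

    // Counts how often the (potentially expensive) message is built.
    static int messagesBuilt = 0;

    static String describeConfig(String user, String host) {
        messagesBuilt++;
        return "user : " + user + ", host : " + host;
    }

    static void logConfig(String user, String host) {
        // Return early so the message is never assembled when INFO is off.
        if (!LOG.isLoggable(Level.INFO)) {
            return;
        }
        LOG.info(describeConfig(user, host));
    }
}
```

The guard matters most when assembling the message is costly, for instance when it concatenates many configuration values.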
[jira] Commented: (SOLR-1105) Using external field content for highlighting
[ https://issues.apache.org/jira/browse/SOLR-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698672#action_12698672 ]

Shalin Shekhar Mangar commented on SOLR-1105:
---------------------------------------------

Instead of baking this into the schema, should this be turned on/off through a request parameter?

> Using external field content for highlighting
> ---------------------------------------------
>
>                 Key: SOLR-1105
>                 URL: https://issues.apache.org/jira/browse/SOLR-1105
>             Project: Solr
>          Issue Type: Improvement
>          Components: highlighter
>    Affects Versions: 1.3
>            Reporter: Dmitry Lihachev
>             Fix For: 1.3.1
>
>         Attachments: SOLR-1105_shared_content_field_1.3.0.patch
>
>
> DefaultSolrHighlighter uses stored field content for highlighting. This has a
> disadvantage: with multilingual indexing the index grows quickly, because
> several fields have to be stored with the same content. This patch allows
> DefaultSolrHighlighter to use a "contentField" attribute to look up the
> content in an external field.
> Excerpt from old schema:
> {code:xml}
>
>
>
>
> {code}
> The same after patching; the highlighter will now get the content stored in
> the "title" field
> {code:xml}
>
>  contentField="title"/>
>  contentField="title"/>
>  contentField="title"/>
> {code}
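The lookup described in the issue can be pictured as a simple indirection: a field either names another field that holds its stored content, or falls back to itself. The sketch below is an assumption-laden illustration of that idea only. `ContentFieldResolver` and the per-language field names (`title_en`, `title_ru`) are hypothetical and do not come from the patch or from DefaultSolrHighlighter.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the contentField indirection.
class ContentFieldResolver {
    // field name -> name of the field whose stored content is highlighted
    private final Map<String, String> contentFields = new HashMap<>();

    void declare(String fieldName, String contentField) {
        contentFields.put(fieldName, contentField);
    }

    // Falls back to the field itself when no contentField is declared.
    String resolve(String fieldName) {
        String contentField = contentFields.get(fieldName);
        return contentField != null ? contentField : fieldName;
    }

    // Returns the stored text to highlight for the given field, or null.
    String contentFor(String fieldName, Map<String, String> storedFields) {
        return storedFields.get(resolve(fieldName));
    }
}
```

With this arrangement only the shared field (here "title") has to be stored, while the language-specific fields can be indexed without storing a copy of the same text.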