Re: Welcome Uwe Schindler to the Lucene PMC
Congratulations Uwe! On Thu, Apr 1, 2010 at 4:35 PM, Grant Ingersoll gsing...@apache.org wrote: I'm pleased to announce that the Lucene PMC has voted to add Uwe Schindler to the PMC. Uwe has been doing a lot of work in Lucene and Solr, including several of the last releases in Lucene. Please join me in extending congratulations to Uwe! -Grant Ingersoll PMC Chair -- Regards, Shalin Shekhar Mangar.
[jira] Commented: (SOLR-469) Data Import RequestHandler
[ https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849121#action_12849121 ] Shalin Shekhar Mangar commented on SOLR-469: Thanks! Scheduling is not implemented inside Solr. You can use a cron job to schedule automatic imports. For example, you can call wget http://solr.host:port/solr/dataimport?command=full-import.
Data Import RequestHandler -- Key: SOLR-469 URL: https://issues.apache.org/jira/browse/SOLR-469 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.3 Reporter: Noble Paul Assignee: Shalin Shekhar Mangar Fix For: 1.3 Attachments: SOLR-469-contrib.patch, SOLR-469-contrib.patch, SOLR-469-contrib.patch, SOLR-469-contrib.patch, SOLR-469-contrib.patch, SOLR-469-contrib.patch, SOLR-469-contrib.patch, SOLR-469-contrib.patch, SOLR-469-contrib.patch, SOLR-469-contrib.patch, SOLR-469-contrib.patch, SOLR-469-contrib.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, xpath-stream.patch
We need a RequestHandler which can import data from a DB or other data sources into the Solr index. Think of it as an advanced form of the SqlUpload Plugin (SOLR-103). The way it works is as follows.
* Provide a configuration file (xml) to the Handler which takes in the necessary SQL queries and mappings to a solr schema
- It also takes in a properties file for the data source configuration
* Given the configuration it can also generate the solr schema.xml
* It is registered as a RequestHandler which can take two commands: do-full-import, do-delta-import
- do-full-import - dumps all the data from the Database into the index (based on the SQL query in configuration)
- do-delta-import - dumps all the data that has changed since last import. (We assume a modified-timestamp column in tables)
* It provides an admin page
- where we can schedule it to be run automatically at regular intervals
- It shows the status of the Handler (idle, full-import, delta-import)
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
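Since scheduling lives outside Solr, a cron job calling wget is the suggestion above. For anyone who would rather trigger the import from a JVM process, here is a rough Java equivalent. It is only a sketch: the class and method names are made up for illustration, and the only thing taken from the comment is the /dataimport?command=full-import URL shape.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Solr or DataImportHandler.
class DataImportTrigger {

    /** Builds the handler URL in the same shape as the wget example. */
    static String buildImportUrl(String solrBaseUrl, String command) {
        return solrBaseUrl + "/dataimport?command=" + command;
    }

    /** Issues a plain HTTP GET and returns the status code. */
    static int trigger(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    /** Calls full-import at a fixed interval; the caller shuts the scheduler down. */
    static ScheduledExecutorService scheduleFullImport(String solrBaseUrl, long periodMinutes) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        String url = buildImportUrl(solrBaseUrl, "full-import");
        scheduler.scheduleAtFixedRate(() -> {
            try {
                trigger(url);
            } catch (IOException e) {
                // Log and keep the schedule alive; the next run will retry.
                System.err.println("Import request failed: " + e.getMessage());
            }
        }, 0, periodMinutes, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

A cron entry remains the simpler option when no long-running JVM process is available.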
[jira] Updated: (SOLR-1799) enable matching of CamelCase with camelcase in WordDelimiterFilter
[ https://issues.apache.org/jira/browse/SOLR-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-1799: Fix Version/s: (was: 1.3) 1.5 enable matching of CamelCase with camelcase in WordDelimiterFilter -- Key: SOLR-1799 URL: https://issues.apache.org/jira/browse/SOLR-1799 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.3, 1.4 Reporter: Chris Darroch Priority: Minor Fix For: 1.5 Attachments: SOLR-1799.patch At the bottom of the WordDelimiterFilter.java code there's the following comment: // downsides: if source text is powershot then a query of PowerShot won't match! Another serious example for us might be something like an indexed document containing the word Tribeca or Soho, and then a user trying to search for TriBeCa or SoHo. This issue has turned up in a couple of recent mailing list threads: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200908.mbox/%3cfe4f94830908201429j3ffbcdd3s3cb7d80542b31...@mail.gmail.com%3e http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200905.mbox/%3c72d9e9500905121619p68c27099ibc7079e52cb0e...@mail.gmail.com%3e In the first thread I found the best explication of what my own misunderstanding was, and it's something I'm sure must trip up other people as well: {quote} I've misunderstood WordDelimiterFilter. You might think that catenateAll=1 would append the full phrase (sans delimiters) as an OR against the query. So jOkersWild would produce: j (okers wild) OR jokerswild But you thought wrong. Its actually: j (okers wild jokerswild) Which is confusing and won't match... {quote} In the second thread, Yonik Seeley gives a good explanation of why this occurs, and provides a suggested workaround where you duplicate your data fields and then query on one using generateWordParts=1 and on the other using catenateWords=1. That works, but obviously requires data duplication. 
In our case, we are also following what I believe is recommended practice and duplicating our data already into stemmed and unstemmed indexes. To my mind, to further duplicate both of these fields a second time, with no difference in the indexed data of the additional copy, seems needlessly wasteful when the problem lies entirely in the query side of things. At any rate, I'm attaching a patch against Solr 1.3 which is rather hacky, but seems to work for us. In WordDelimiterFilter, if generateWordParts=1 and catenateWords=2, then we move the concatenated word to overlap its position with the first generated token instead of the last (which is the behaviour with catenateWords=1). We further insert a preceding dummy flag token with the special type CATENATE_FIRST. In SolrPluginUtils in the DisjunctionMaxQueryParser class we just copy in the entirety of the getFieldQuery() code from Lucene's QueryParser. This is ugly, I know. This code is then tweaked so that in the case where the dummy flag token is seen, it creates a BooleanQuery with the following token (the concatenated word) as a conditional TermQuery clause, and then adds the generated terms in their usual MultiPhraseQuery as a second conditional clause. Now I realize this patch is (a) not likely acceptable on style and elegance grounds, and (b) only against Solr 1.3, not trunk. My apologies for both; after I'd spent most of what time I had available tracking down the source of the problem, I just needed to get something working quickly. Perhaps this patch will inspire others to greatness, though, or at a minimum provide a starting point for those who stumble over this same issue. Thanks for a great application! Cheers. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
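To make the mismatch concrete, here is a small stand-alone Java sketch. It is not WordDelimiterFilter and not the attached patch; it only mimics splitting on lower-to-upper case changes and then builds the two query-side alternatives the description argues for (the catenated word OR the generated parts as a phrase). All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified stand-in for case-change splitting; illustration only.
class CaseSplitSketch {

    /** Splits at each lower-to-upper transition, e.g. PowerShot -> [Power, Shot]. */
    static List<String> splitOnCaseChange(String word) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (i > 0 && Character.isUpperCase(c) && Character.isLowerCase(word.charAt(i - 1))) {
                parts.add(current.toString());
                current.setLength(0);
            }
            current.append(c);
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    /** Alternatives to OR together: the catenated term, then the parts as a phrase. */
    static List<String> queryAlternatives(String word) {
        List<String> alternatives = new ArrayList<>();
        alternatives.add(word.toLowerCase(Locale.ROOT));
        List<String> parts = splitOnCaseChange(word);
        if (parts.size() > 1) {
            alternatives.add(String.join(" ", parts).toLowerCase(Locale.ROOT));
        }
        return alternatives;
    }
}
```

With an indexed powershot, the query PowerShot only matches if the catenated alternative is a separate OR clause rather than one more term inside the phrase, which is the point made in the quoted mailing list thread.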
[jira] Updated: (SOLR-1814) select count(distinct fieldname) in SOLR
[ https://issues.apache.org/jira/browse/SOLR-1814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-1814: Fix Version/s: (was: 1.4) select count(distinct fieldname) in SOLR Key: SOLR-1814 URL: https://issues.apache.org/jira/browse/SOLR-1814 Project: Solr Issue Type: New Feature Components: SearchComponents - other Affects Versions: 1.5 Reporter: Marcus Herou Fix For: 1.5 Attachments: CountComponent.java I have seen questions on the mailing list about counting the distinct values of a field. We at Tailsweep want that as well, for example in our blog search. Example: You had 1345 hits on 244 blogs. The 244 part is not possible in SOLR today (correct me if I am wrong). So I've written a component which does this. Attaching it.
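The "1345 hits on 244 blogs" figure is a distinct count over one field of the matching documents. The attached CountComponent.java is not reproduced in this mail, so the following is just a plain Java illustration of the idea, with hits modelled as simple field maps (an assumption made for the example):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustration only; not the attached CountComponent.
class DistinctCountSketch {

    /** Counts the distinct non-null values of one field across all hits. */
    static int countDistinct(List<Map<String, String>> hits, String field) {
        Set<String> seen = new HashSet<>();
        for (Map<String, String> hit : hits) {
            String value = hit.get(field);
            if (value != null) {
                seen.add(value);
            }
        }
        return seen.size();
    }
}
```

Holding every distinct value in a set is fine for a sketch, but memory grows with the number of distinct values, so a real component would likely need something index-aware.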
[jira] Updated: (SOLR-1814) select count(distinct fieldname) in SOLR
[ https://issues.apache.org/jira/browse/SOLR-1814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-1814: Affects Version/s: (was: 2.0) (was: 1.6) (was: 1.4) Fix Version/s: (was: 2.0) (was: 1.6) select count(distinct fieldname) in SOLR Key: SOLR-1814 URL: https://issues.apache.org/jira/browse/SOLR-1814 Project: Solr Issue Type: New Feature Components: SearchComponents - other Affects Versions: 1.5 Reporter: Marcus Herou Fix For: 1.5 Attachments: CountComponent.java I have seen questions on the mailing list about counting the distinct values of a field. We at Tailsweep want that as well, for example in our blog search. Example: You had 1345 hits on 244 blogs. The 244 part is not possible in SOLR today (correct me if I am wrong). So I've written a component which does this. Attaching it.
Re: removal of deprecated HtmlStrip*Tokenizer factories
On Tue, Mar 16, 2010 at 2:09 AM, Robert Muir rcm...@gmail.com wrote: Hello, Is there any concern with removing the deprecated HtmlStrip*Tokenizer factories? These can be done with CharFilter instead and they have some problems with lucene's trunk. If no one objects, I'd like to remove these in the branch. Otherwise, Uwe tells me there is some way to make them work if need be. Is there a way we can fix LUCENE-2098 too? -- Regards, Shalin Shekhar Mangar.
[jira] Commented: (SOLR-1812) StreamingUpdateSolrServer creates an OutputStreamWriter that it never closes
[ https://issues.apache.org/jira/browse/SOLR-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12842717#action_12842717 ] Shalin Shekhar Mangar commented on SOLR-1812: - Closing the OutputStreamWriter will close the underlying OutputStream. The HttpClient will automatically do that once the request has been sent so there is no leak here. StreamingUpdateSolrServer creates an OutputStreamWriter that it never closes Key: SOLR-1812 URL: https://issues.apache.org/jira/browse/SOLR-1812 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 1.4 Reporter: Mark Miller Priority: Minor Fix For: 1.5 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
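The first half of that comment (closing the writer closes the stream underneath) is standard java.io behaviour and easy to check in isolation. The sketch below uses a made-up tracking stream; it says nothing about what HttpClient does with the request stream, which is the other half of the argument.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Stand-alone demo; TrackingStream is a hypothetical helper.
class WriterCloseSketch {

    /** Records whether close() was called on the underlying stream. */
    static final class TrackingStream extends ByteArrayOutputStream {
        boolean closed;

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Writes text, then either closes the writer or only flushes it. */
    static TrackingStream write(String text, boolean closeWriter) {
        TrackingStream out = new TrackingStream();
        try {
            Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            writer.write(text);
            if (closeWriter) {
                writer.close();   // also closes 'out'
            } else {
                writer.flush();   // pushes the bytes, leaves 'out' open for its owner
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

Flushing without closing is the usual pattern when some other party owns the stream's lifecycle.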
[jira] Commented: (SOLR-1807) UpdateHandler plugin is not fully supported
[ https://issues.apache.org/jira/browse/SOLR-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12842369#action_12842369 ] Shalin Shekhar Mangar commented on SOLR-1807: - UpdateHandler is an interface so instead of adding a method to it and breaking compatibility, we added it to the DirectUpdateHandler2 class. I guess the only way is to change the UpdateHandler interface. UpdateHandler plugin is not fully supported --- Key: SOLR-1807 URL: https://issues.apache.org/jira/browse/SOLR-1807 Project: Solr Issue Type: Bug Components: update Affects Versions: 1.4 Reporter: John Wang UpdateHandler is published as a supported Plugin, but code such as the following: if (core.getUpdateHandler() instanceof DirectUpdateHandler2) { ((DirectUpdateHandler2) core.getUpdateHandler()).forceOpenWriter(); } else { LOG.warn("The update handler being used is not an instance or sub-class of DirectUpdateHandler2. " + "Replicate on Startup cannot work."); } suggests that it is really not fully supported. Must all implementations of UpdateHandler be subclasses of DirectUpdateHandler2 for it to work with replication?
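One middle ground between changing the UpdateHandler contract and checking for the concrete DirectUpdateHandler2 class would be a small optional capability interface. This is only a design sketch with invented names (WriterOpener, the *Sketch types); nothing like it is proposed in the issue itself.

```java
// All types here are hypothetical stand-ins, not Solr classes.
interface UpdateHandlerSketch {
    void commit();
}

/** Optional capability a handler may implement to support replicate-on-startup. */
interface WriterOpener {
    void forceOpenWriter();
}

class DirectHandlerSketch implements UpdateHandlerSketch, WriterOpener {
    boolean writerOpened;

    @Override
    public void commit() { }

    @Override
    public void forceOpenWriter() {
        writerOpened = true;
    }
}

class CustomHandlerSketch implements UpdateHandlerSketch {
    @Override
    public void commit() { }
}

class ReplicationStartupSketch {
    /** Checks for the capability rather than for one concrete class. */
    static boolean openWriterIfSupported(UpdateHandlerSketch handler) {
        if (handler instanceof WriterOpener) {
            ((WriterOpener) handler).forceOpenWriter();
            return true;
        }
        return false; // caller can log that replicate-on-startup is unavailable
    }
}
```

A custom handler would then opt in by implementing the extra interface instead of having to subclass DirectUpdateHandler2.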
Upgrading Lucene jars to 2.9.2 artifacts
Any objections? -- Regards, Shalin Shekhar Mangar.
Re: Upgrading Lucene jars to 2.9.2 artifacts
On Sat, Feb 27, 2010 at 8:04 PM, Mark Miller markrmil...@gmail.com wrote: On 02/27/2010 05:53 AM, Shalin Shekhar Mangar wrote: Any objections? Didn't rc2 (that we are on) end up being the final release? Hmm, I didn't know that. But the lucene contrib jars checked in trunk are different from the ones on Maven. The revision number is same but the date/time of the build is different. For example, the lucene-analyzers-2.9.2.jar: Maven - Implementation-Version: 2.9.2 912433 - 2010-02-22 00:00:06 Trunk - Implementation-Version: 2.9.2 912433 - 2010-02-21 23:52:03 -- Regards, Shalin Shekhar Mangar.
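The Implementation-Version values above come from each jar's META-INF/MANIFEST.MF. For comparing two copies of a jar, something along these lines should work with the standard java.util.jar classes (class and method names are just for the example):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper for comparing jar build stamps.
class JarVersionSketch {

    /** Reads Implementation-Version from raw manifest content. */
    static String implementationVersion(InputStream manifestStream) {
        try {
            Manifest manifest = new Manifest(manifestStream);
            return manifest.getMainAttributes().getValue("Implementation-Version");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads Implementation-Version from a jar on disk, or null if it has no manifest. */
    static String implementationVersionOfJar(String jarPath) {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest();
            return manifest == null
                    ? null
                    : manifest.getMainAttributes().getValue("Implementation-Version");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling implementationVersionOfJar on the trunk copy and on the Maven copy of lucene-analyzers-2.9.2.jar would show whether only the build timestamp differs.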
[jira] Commented: (SOLR-1752) SolrJ fails with exception when passing document ADD and DELETEs in the same request using XML request writer (but not binary request writer)
[ https://issues.apache.org/jira/browse/SOLR-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838486#action_12838486 ] Shalin Shekhar Mangar commented on SOLR-1752: - Jayson, Solr's update XML does not define a container tag so we are constrained to only one of add/delete/commit/optimize at a time. Binary format of course does not have this problem. So unless we decide to add a root tag to the update XML, this exception will happen. So I guess we have the following options:
# Disallow more than one type of operation for any request writer
# Document this behavior in the UpdateRequest javadocs.
I'd prefer #2 even though it is inconsistent.
SolrJ fails with exception when passing document ADD and DELETEs in the same request using XML request writer (but not binary request writer) - Key: SOLR-1752 URL: https://issues.apache.org/jira/browse/SOLR-1752 Project: Solr Issue Type: Bug Components: clients - java, update Affects Versions: 1.4 Reporter: Jayson Minard Assignee: Shalin Shekhar Mangar Priority: Blocker
Add this test to SolrExampleTests.java and it will fail when using the XML Request Writer (now default), but not if you change the SolrExampleJettyTest to use the BinaryRequestWriter.
{code}
public void testAddDeleteInSameRequest() throws Exception {
  SolrServer server = getSolrServer();
  SolrInputDocument doc3 = new SolrInputDocument();
  doc3.addField( "id", "id3", 1.0f );
  doc3.addField( "name", "doc3", 1.0f );
  doc3.addField( "price", 10 );
  UpdateRequest up = new UpdateRequest();
  up.add( doc3 );
  up.deleteById("id001");
  up.setWaitFlush(false);
  up.setWaitSearcher(false);
  up.process( server );
}
{code}
terminates with exception:
{code}
Feb 3, 2010 8:55:34 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Illegal to have multiple roots (start tag in epilog?).
at [row,col {unknown-source}]: [1,125] at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:72) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:723) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) Caused by: com.ctc.wstx.exc.WstxParsingException: Illegal to have multiple roots (start tag in epilog?). 
at [row,col {unknown-source}]: [1,125] at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:630) at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:461) at com.ctc.wstx.sr.BasicStreamReader.handleExtraRoot(BasicStreamReader.java:2155) at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2070) at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2647) at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019) at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:90) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69) ... 18 more {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
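The "multiple roots" failure is plain XML well-formedness and can be reproduced without Solr. The sketch below uses the JDK's SAX parser instead of Woodstox, and the XML strings are hand-written examples of the update format, so treat the details as assumptions. An add element followed by a delete element in one body has two roots and is rejected, while each on its own parses fine.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Stand-alone illustration; not Solr's XMLLoader.
class UpdateXmlRootsSketch {

    /** Returns true if the text parses as a single well-formed XML document. */
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // e.g. a second root element after the first one closes
        } catch (Exception e) {
            throw new IllegalStateException("Parser setup or I/O failure", e);
        }
    }
}
```

That is why, short of adding a container tag to the update XML, adds and deletes would have to travel in separate XML bodies.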
[jira] Commented: (SOLR-1302) Fun with Distances - Add Distance functions for a variety of things
[ https://issues.apache.org/jira/browse/SOLR-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834933#action_12834933 ] Shalin Shekhar Mangar commented on SOLR-1302: - Looking over the source of DistanceUtils#vectorDistance, it seems like there is a bug with calculating infinite norm:
Existing code:
{code}
for (int i = 0; i < vec1.length; i++) {
  result = Math.max(vec1[i], vec2[i]);
}
{code}
Shouldn't that be:
{code}
for (int i = 0; i < vec1.length; i++) {
  result = Math.max(result, Math.max(vec1[i], vec2[i]));
}
{code}
Fun with Distances - Add Distance functions for a variety of things --- Key: SOLR-1302 URL: https://issues.apache.org/jira/browse/SOLR-1302 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: SOLR-1302.patch, SOLR-1302.patch, SOLR-1302.patch There are many distance functions that are useful to have: 1. Great Circle (lat/lon) and other geo distances 2. Euclidean (Vector) 3. Manhattan (Vector) 4. Cosine (Vector) For the vector ones, the idea is that the fields on a document can be used to determine the vector.
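The suggested change makes the loop keep a running maximum instead of overwriting result on every iteration. For reference, the infinity-norm (Chebyshev) distance as it is usually defined takes the largest absolute difference per coordinate. The sketch below follows that textbook definition; it is not the code in DistanceUtils, and this mail does not say which form was eventually committed.

```java
// Textbook Chebyshev distance; class and method names are illustrative.
class InfinityNormSketch {

    /** Returns max over i of |a[i] - b[i]|. */
    static double chebyshevDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double result = 0.0;
        for (int i = 0; i < a.length; i++) {
            // Running maximum, so earlier coordinates are not overwritten.
            result = Math.max(result, Math.abs(a[i] - b[i]));
        }
        return result;
    }
}
```

For the vectors (1, 5, 2) and (4, 1, 2) the per-coordinate differences are 3, 4 and 0, so the distance is 4.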
[jira] Commented: (SOLR-1302) Fun with Distances - Add Distance functions for a variety of things
[ https://issues.apache.org/jira/browse/SOLR-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834950#action_12834950 ] Shalin Shekhar Mangar commented on SOLR-1302: - Done. Committed revision 911153. Fun with Distances - Add Distance functions for a variety of things --- Key: SOLR-1302 URL: https://issues.apache.org/jira/browse/SOLR-1302 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: SOLR-1302.patch, SOLR-1302.patch, SOLR-1302.patch There are many distance functions that are useful to have: 1. Great Circle (lat/lon) and other geo distances 2. Euclidean (Vector) 3. Manhattan (Vector) 4. Cosine (Vector) For the vector ones, the idea is that the fields on a document can be used to determine the vector. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Field value tags
On Sat, Feb 13, 2010 at 11:18 PM, Peter S pete...@hotmail.com wrote: Hello Solr-dev, I've now implemented a QParserPlugin/QParser for tagging functionality in my internal Solr environment, and this is working very nicely. The type of functionality offered by tagging isn't currently in Solr, so I was thinking this might be a good plugin to contribute to the project. Before preparing the plugin for ASF-readiness, it would be great to get feedback, comments etc. on what the Solr dev experts think of including this sort of thing. If it's deemed useful for inclusion, I'll go ahead and create a JIRA issue and prepare the code for ASF. Here is a quick precis of what tagging offers: First off, for your typical user-based searching of 'shopping cart' or google-type doc-scored searching, tagging is probably not what you want. Dismax provides a much better fit for this type of searching. Tagging provides a means of entering a tag into a query, which, on the server (in the plugin) translates to some configured subquery that is actually executed by Solr. There are a number of cool use-cases for this - the 2 most salient of which are these: 1. To provide a known 'key' at query time, that translates into subqueries that the user couldn't/wouldn't/shouldn't know at query time. For example, I use this to supply a tag called: 'admins', which, when entered into a query, will actually query for all documents that have some reference to all administrators/root users in the searched index(es). The [securely logged-in] person searching won't know who all the root users are (and the list will change over time), only that he/she wishes to find out information pertaining to their activity. 2. To provide subquery 'shortcuts' for often used, usually lengthy and/or complicated queries. For example, if every morning, as part of your job, you need to search for: ((this AND that) OR (theother AND NOT somethingelse)) AND timestamp:[then TO now] . . . 
A tag can be made, say, 'mysearchtag' which equates to the above query. This tag can then be used as a query, and/or embedded in other queries. This is quite handy for automated searching and/or saved searches etc. This allows server administrators to control the content that gets returned by these queries, thus reducing client-side maintenance. Additionally, for distributed searches, evaluated tags can, if desired, produce different queries for different shards (e.g. the list of root users is different on different machines). Any comments, concerns, opinions etc. on a contribution of this type would be greatly appreciated. Thanks Peter. It definitely sounds useful for some use-cases. Can you open a Jira issue and give a patch? -- Regards, Shalin Shekhar Mangar.
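The plugin itself has not been posted, so the following is only a guess at the core idea: a server-side table maps tag names to subqueries, and any query token that names a tag is replaced by its parenthesised subquery. The whitespace tokenisation and the example tag contents are assumptions made for the sketch.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch of tag expansion; not the QParserPlugin described above.
class TagExpansionSketch {

    /** Replaces each whitespace-separated token that names a tag with "(subquery)". */
    static String expand(String query, Map<String, String> tags) {
        StringJoiner expanded = new StringJoiner(" ");
        for (String token : query.trim().split("\\s+")) {
            String subquery = tags.get(token);
            expanded.add(subquery == null ? token : "(" + subquery + ")");
        }
        return expanded.toString();
    }
}
```

Because the table lives on the server, the list behind a tag such as 'admins' can change, or differ per shard, without the client query changing.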
[jira] Commented: (SOLR-1773) Field Collapsing (lightweight version)
[ https://issues.apache.org/jira/browse/SOLR-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833524#action_12833524 ] Shalin Shekhar Mangar commented on SOLR-1773: - Koji, have you looked at SOLR-1682? I gave an implementation of the same approach but that too is only a PoC.
Field Collapsing (lightweight version) -- Key: SOLR-1773 URL: https://issues.apache.org/jira/browse/SOLR-1773 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.4 Reporter: Koji Sekiguchi Priority: Minor Attachments: LOADTEST.patch, SOLR-1773.patch
I'd like to start another approach for field collapsing suggested by Yonik on 19/Dec/09 at SOLR-236. Re-posting the idea:
{code}
=== two pass collapsing algorithm for collapse.aggregate=max
First pass: pretend that collapseCount=1
- Use a TreeSet as a priority queue since one can remove and insert entries.
- A HashMap<Key,TreeSetEntry> will be used to map from collapse group to top entry in the TreeSet
- compare new doc with smallest element in treeset. If smaller discard and go to the next doc.
- If new doc is bigger, look up its group. Use the Map to find if the group has been added to the TreeSet and add it if not.
- If the new bigger doc is already in the TreeSet, compare with the document in that group. If bigger, update the node, remove and re-add to the TreeSet to re-sort.
efficiency: the treeset and hashmap are both only the size of the top number of docs we are looking at (10 for instance)
We will now have the top 10 documents collapsed by the right field with a collapseCount of 1. Put another way, we have the top 10 groups.
Second pass (if collapseCount > 1):
- create a priority queue for each group (10) of size collapseCount
- re-execute the query (or if the sort within the collapse groups does not involve score, we could just use the docids gathered during phase 1)
- for each document, find its appropriate priority queue and insert
- optimization: we can use the previous info from phase1 to even avoid creating a priority queue if no other items matched.
So instead of creating collapse groups for every group in the set (as is done now?), we create it for only 10 groups. Instead of collecting the score for every document in the set (40MB per request for a 10M doc index is *big*) we re-execute the query if needed. We could optionally store the score as is done now... but I bet aggregate throughput on large indexes would be better by just re-executing.
Other thought: we could also cache the first phase in the query cache which would allow one to quickly move to the 2nd phase for any collapseCount.
{code}
The restriction is: {quote} one would not be able to tell the total number of collapsed docs, or the total number of hits (or the DocSet) after collapsing. So only collapse.facet=before would be supported. {quote}
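The first pass described above can be sketched in plain Java roughly as follows. This is an illustration of the TreeSet-plus-HashMap idea only, not the code from SOLR-1773 or SOLR-1682; the Entry class, the score-then-docId ordering and the method name are all assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical first-pass collapse: best document per group, top N groups.
class FirstPassCollapseSketch {

    static final class Entry {
        final String group;
        final int docId;
        final float score;

        Entry(String group, int docId, float score) {
            this.group = group;
            this.docId = docId;
            this.score = score;
        }
    }

    /** Returns the best entry of each of the top N groups, highest score first. */
    static List<Entry> topGroups(List<Entry> docs, int topN) {
        // Ascending by score, ties broken by docId, so first() is the smallest.
        Comparator<Entry> byScore =
                Comparator.<Entry>comparingDouble(e -> e.score).thenComparingInt(e -> e.docId);
        TreeSet<Entry> queue = new TreeSet<>(byScore);
        Map<String, Entry> bestPerGroup = new HashMap<>();

        for (Entry doc : docs) {
            // Queue full and doc not better than the smallest element: discard.
            if (queue.size() >= topN && byScore.compare(doc, queue.first()) <= 0) {
                continue;
            }
            Entry current = bestPerGroup.get(doc.group);
            if (current != null) {
                // Group already present: keep only the better document.
                if (byScore.compare(doc, current) > 0) {
                    queue.remove(current);
                    queue.add(doc);
                    bestPerGroup.put(doc.group, doc);
                }
                continue;
            }
            // New group: evict the smallest group if the queue is full.
            if (queue.size() >= topN) {
                Entry evicted = queue.pollFirst();
                bestPerGroup.remove(evicted.group);
            }
            queue.add(doc);
            bestPerGroup.put(doc.group, doc);
        }
        return new ArrayList<>(queue.descendingSet());
    }
}
```

Both structures stay bounded by topN, which matches the efficiency note in the description. The second pass and the caching idea are not covered here.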
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831544#action_12831544 ] Shalin Shekhar Mangar commented on SOLR-1316: - {quote}Where are we on this - do people feel it's ready to commit?{quote} It has been some time since I looked at it but I don't feel it is ready. Using it through spellcheck works but specifying spell check params feels odd. Also, I don't know how well it compares to regular TermsComponent or facet.prefix searches in terms of memory and cpu cost. Create autosuggest component Key: SOLR-1316 URL: https://issues.apache.org/jira/browse/SOLR-1316 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.4 Reporter: Jason Rutherglen Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 1.5 Attachments: suggest.patch, suggest.patch, suggest.patch, TST.zip Original Estimate: 96h Remaining Estimate: 96h Autosuggest is a common search function that can be integrated into Solr as a SearchComponent. Our first implementation will use the TernaryTree found in Lucene contrib. * Enable creation of the dictionary from the index or via Solr's RPC mechanism * What types of parameters and settings are desirable? * Hopefully in the future we can include user click through rates to boost those terms/phrases higher -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
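As a point of comparison for the memory and CPU question, prefix lookup over a sorted in-memory term set is about the simplest baseline one could measure against. The sketch below deliberately swaps in a java.util TreeSet for the ternary tree named in the issue; it is not the patch, and the names are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Baseline prefix suggester over a sorted set; not the TernaryTree-based patch.
class PrefixSuggestSketch {

    /** Returns up to 'limit' terms starting with 'prefix', in sorted order. */
    static List<String> suggest(NavigableSet<String> terms, String prefix, int limit) {
        List<String> suggestions = new ArrayList<>();
        // Terms sharing the prefix are contiguous, starting at the first term >= prefix.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix) || suggestions.size() >= limit) {
                break;
            }
            suggestions.add(term);
        }
        return suggestions;
    }
}
```

Results come back alphabetically rather than by popularity, so boosting by click-through rate, as the issue hopes for, would need an extra weight per term.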
[jira] Created: (SOLR-1768) Text Categorization Transformer
Text Categorization Transformer --- Key: SOLR-1768 URL: https://issues.apache.org/jira/browse/SOLR-1768 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Reporter: Shalin Shekhar Mangar Priority: Minor A Transformer which uses TCatNG - http://tcatng.sourceforge.net/ (BSD license) to categorize text. See original discussion at - http://www.lucidimagination.com/search/document/37c1f48fb8224171/is_it_posible_to_exclude_results_from_other_languages -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: DB Connection
On Thu, Feb 4, 2010 at 8:08 PM, cjkadakia cjkada...@sonicbids.com wrote: Thanks. Next, can the 10 seconds be re-configured? We may likely want to keep the connection alive for a few minutes in case another commit is triggered. Is there any reason we may not want to consider this option? Commit on Solr or DB? In any case, creating another connection after a few minutes is not costly, so why complicate the code. -- Regards, Shalin Shekhar Mangar.
[jira] Assigned: (SOLR-1752) SolrJ fails with exception when passing document ADD and DELETEs in the same request using XML request writer (but not binary request writer)
[ https://issues.apache.org/jira/browse/SOLR-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar reassigned SOLR-1752: --- Assignee: Shalin Shekhar Mangar SolrJ fails with exception when passing document ADD and DELETEs in the same request using XML request writer (but not binary request writer) - Key: SOLR-1752 URL: https://issues.apache.org/jira/browse/SOLR-1752 Project: Solr Issue Type: Bug Components: clients - java, update Affects Versions: 1.4 Reporter: Jayson Minard Assignee: Shalin Shekhar Mangar Priority: Blocker
[jira] Assigned: (SOLR-1741) NPE when deletionPolicy sets maxOptimizedCommitsTokeep=0
[ https://issues.apache.org/jira/browse/SOLR-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar reassigned SOLR-1741: --- Assignee: Shalin Shekhar Mangar NPE when deletionPolicy sets maxOptimizedCommitsTokeep=0 Key: SOLR-1741 URL: https://issues.apache.org/jira/browse/SOLR-1741 Project: Solr Issue Type: Bug Components: replication (java) Affects Versions: 1.4 Reporter: Noble Paul Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 1.5 This is a user reported issue http://markmail.org/thread/bjcwiw3s66b5x76h -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: DB Connection
On Thu, Feb 4, 2010 at 12:54 AM, cjkadakia cjkada...@sonicbids.com wrote:

I looked at some of the references to see if this has been explained or not, but I didn't see anything regarding it. I was wondering, quite simply, if the SQL Server connection from Solr during indexing is kept alive for all subsequent delta-import requests, or does it reopen the connection each time and close it after it's finished?

DataImportHandler re-opens the connection if it has not been used for the last 10 seconds. Connections are created at the start of an import and closed once the import finishes or is aborted.

-- Regards, Shalin Shekhar Mangar.
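The idle-timeout behaviour described in the reply can be sketched roughly as follows. This is only an illustration of the idea, not DataImportHandler's actual code: the class `IdleReopenSketch`, its fields, and the injected clock are all hypothetical names chosen for the example, and the 10 second limit is passed in by the caller.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch: hands out a cached connection, but opens a fresh one
// when the cached one has been idle longer than maxIdleMillis.
final class IdleReopenSketch<C> {
    private final Supplier<C> opener;       // how to open a new connection (assumption)
    private final LongSupplier clockMillis; // injected clock so the logic is testable
    private final long maxIdleMillis;       // e.g. 10_000 for the 10 seconds mentioned above
    private C current;
    private long lastUsed;
    int opens = 0;                          // number of connections opened so far

    IdleReopenSketch(Supplier<C> opener, LongSupplier clockMillis, long maxIdleMillis) {
        this.opener = opener;
        this.clockMillis = clockMillis;
        this.maxIdleMillis = maxIdleMillis;
    }

    // Returns the cached connection, re-opening it first if it was idle too long.
    // A real implementation would also close the stale connection here.
    C get() {
        long now = clockMillis.getAsLong();
        if (current == null || now - lastUsed > maxIdleMillis) {
            current = opener.get();
            opens++;
        }
        lastUsed = now;
        return current;
    }

    // Called when the import finishes or is aborted.
    void close() {
        current = null;
    }
}
```

With a 10 second limit, two calls 5 seconds apart would share one connection, while a call after a 15 second pause would trigger a new one.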
[jira] Resolved: (SOLR-1701) Off-by-one error in calculating numFound in Distributed Search
[ https://issues.apache.org/jira/browse/SOLR-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-1701. - Resolution: Invalid Fix Version/s: (was: 1.5) Stupid mistake. Used delQ instead of del :( Off-by-one error in calculating numFound in Distributed Search -- Key: SOLR-1701 URL: https://issues.apache.org/jira/browse/SOLR-1701 Project: Solr Issue Type: Bug Components: search Reporter: Shalin Shekhar Mangar Attachments: SOLR-1701.patch
{code}
// This passes
query("q", "*:*", "sort", "id asc", "fl", "id,text");
// This also passes (notice the rows param)
query("q", "*:*", "sort", "id desc", "rows", 12, "fl", "id,text");
// But this fails
query("q", "*:*", "sort", "id desc", "fl", "id,text");
{code}
[jira] Commented: (SOLR-1682) Implement CollapseComponent
[ https://issues.apache.org/jira/browse/SOLR-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12799180#action_12799180 ] Shalin Shekhar Mangar commented on SOLR-1682: - bq. Shalin, I tried your patch out and I ran into a few problems with sorting and the collapse counts which turned out to be bugs. Thanks Martijn. {quote} Though I have a question about the response format. When collapse.threshold is 1 and more than one documents is collapsed then the collapse.count is named group.size. The field group.numFound is then added as well. Why did you gave it a different name? {quote} Actually I intended to rename collapse.value to group.value and collapse.count to group.numFound but I forgot to do it in both the places. * group.numFound = the total number of documents belonging to this group (i.e. have the same group.value) * group.size = the number of documents in this result set belonging to the same group which is equal to min(group.numFound, collapse.threshold) So when collapse.threshold = 1, group.size=1 and group.numFound will be equal to the number of documents in the same group. Suppose collapse.threshold = 5, but group.numFound=4 then group.size=4. The group.size is required to read all docs belonging to the same group without having to maintain a set. Let me know if you have suggestions for a better name than these. {quote} When collapse.threshold is larger than one two collectors are used. I understand that in both situations a different algorithm is used. But now also a search is done twice. Shouldn't it be better to have two complete distinct collectors that don't depend on one another? {quote} We can have distinct collectors. The CollapsedDocCollector uses some of the data that TopGroupCollector gathers and that is why it uses it directly. We could keep references to the individual objects that are needed too. As I said, this is just a PoC and not the final design. 
I'll give a new patch with the names fixed for both the cases. Implement CollapseComponent --- Key: SOLR-1682 URL: https://issues.apache.org/jira/browse/SOLR-1682 Project: Solr Issue Type: Sub-task Components: search Reporter: Martijn van Groningen Assignee: Shalin Shekhar Mangar Fix For: 1.5 Attachments: field-collapsing.patch, SOLR-1682.patch, SOLR-236.patch Child issue of SOLR-236. This issue is dedicated to field collapsing in general and all its code (CollapseComponent, DocumentCollapsers and CollapseCollectors). The main goal is the finalize the request parameters and response format. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
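As a small illustration of the two meta fields defined in the comment above, here is a sketch of the arithmetic and of why group.size lets a client walk a flat result list without keeping a set of seen group values. The class and method names are made up for this example and are not part of the patch; passing the sizes as a separate list is a simplification of reading them from each group's first document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating group.size and group.numFound; not Solr API.
final class GroupMetaSketch {

    // group.size = min(group.numFound, collapse.threshold)
    static int groupSize(int groupNumFound, int collapseThreshold) {
        return Math.min(groupNumFound, collapseThreshold);
    }

    // Splits a flat, group-ordered result list into groups. sizes.get(i) stands in
    // for the group.size value carried by the first document of the i-th group.
    static List<List<String>> splitByGroupSize(List<String> flatDocs, List<Integer> sizes) {
        List<List<String>> groups = new ArrayList<>();
        int pos = 0;
        for (int size : sizes) {
            groups.add(new ArrayList<>(flatDocs.subList(pos, pos + size)));
            pos += size;
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(groupSize(12, 1)); // threshold 1: one doc shown per group
        System.out.println(groupSize(4, 5));  // fewer docs in the group than the threshold
    }
}
```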
[jira] Updated: (SOLR-1682) Implement CollapseComponent
[ https://issues.apache.org/jira/browse/SOLR-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-1682: Attachment: SOLR-1682.patch Patch which fixes the inconsistent names for the meta fields. Implement CollapseComponent --- Key: SOLR-1682 URL: https://issues.apache.org/jira/browse/SOLR-1682 Project: Solr Issue Type: Sub-task Components: search Reporter: Martijn van Groningen Assignee: Shalin Shekhar Mangar Fix For: 1.5 Attachments: field-collapsing.patch, SOLR-1682.patch, SOLR-1682.patch, SOLR-236.patch Child issue of SOLR-236. This issue is dedicated to field collapsing in general and all its code (CollapseComponent, DocumentCollapsers and CollapseCollectors). The main goal is the finalize the request parameters and response format. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1720) replication configuration bug with multiple replicateAfter values
[ https://issues.apache.org/jira/browse/SOLR-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799600#action_12799600 ] Shalin Shekhar Mangar commented on SOLR-1720: - Yonik, replicateAfter is supposed to be specified multiple times, once per value. A single replicateAfter with comma-separated values is invalid. So it is by design, not a bug. We could change that if you want. replication configuration bug with multiple replicateAfter values - Key: SOLR-1720 URL: https://issues.apache.org/jira/browse/SOLR-1720 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Yonik Seeley Fix For: 1.5 Jason reported problems with multiple replicateAfter values - it worked after changing to just commit http://www.lucidimagination.com/search/document/e4c9ba46dc03b031/replication_problem
[jira] Commented: (SOLR-1680) Provide an API to specify custom Collectors
[ https://issues.apache.org/jira/browse/SOLR-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797771#action_12797771 ] Shalin Shekhar Mangar commented on SOLR-1680: - bq. Why not broaden this and allow people to pass in their own collectors? Yes, that is the general idea, though it would be API driven than configuration. Any component should be able to pass a Collector to the various SolrIndexSearcher methods. bq. Also, can you explain a bit more the use case specifically for Field Collapse? Field Collapsing needs to use a custom collector. Right now the collector is hard coded inside SolrIndexSearcher. bq. Alternatively, given something like LUCENE-2127, we may want Solr to be able to make query time decisions about what Collector to use. I guess that decision should be made by QueryComponent? If so, then the ability to pass a custom Collector to SolrIndexSearcher methods should be enough. Provide an API to specify custom Collectors --- Key: SOLR-1680 URL: https://issues.apache.org/jira/browse/SOLR-1680 Project: Solr Issue Type: Sub-task Components: search Affects Versions: 1.3 Reporter: Martijn van Groningen Fix For: 1.5 Attachments: field-collapse-core.patch, SOLR-1680.patch The issue is dedicated to incorporate fieldcollapse's changes to the Solr's core code. We want to make it possible for components to specify custom Collectors in SolrIndexSearcher methods. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1705) Move QueryConvertor into SpellCheckComponent configuration
Move QueryConvertor into SpellCheckComponent configuration -- Key: SOLR-1705 URL: https://issues.apache.org/jira/browse/SOLR-1705 Project: Solr Issue Type: Improvement Components: spellchecker Reporter: Shalin Shekhar Mangar Priority: Minor Fix For: 1.5 QueryConvertor is a top level XML tag in solrconfig.xml but it is used by SpellCheckComponent only. Deprecate the current queryConvertor configuration and move it inside the SpellCheckComponent configuration.
[jira] Created: (SOLR-1701) Off-by-one error in calculating numFound in Distributed Search
Off-by-one error in calculating numFound in Distributed Search -- Key: SOLR-1701 URL: https://issues.apache.org/jira/browse/SOLR-1701 Project: Solr Issue Type: Bug Components: search Reporter: Shalin Shekhar Mangar Fix For: 1.5
{code}
// This passes
query("q", "*:*", "sort", "id asc", "fl", "id,text");
// This also passes (notice the rows param)
query("q", "*:*", "sort", "id desc", "rows", 12, "fl", "id,text");
// But this fails
query("q", "*:*", "sort", "id desc", "fl", "id,text");
{code}
[jira] Updated: (SOLR-1701) Off-by-one error in calculating numFound in Distributed Search
[ https://issues.apache.org/jira/browse/SOLR-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-1701: Attachment: SOLR-1701.patch Test to demonstrate the bug Off-by-one error in calculating numFound in Distributed Search -- Key: SOLR-1701 URL: https://issues.apache.org/jira/browse/SOLR-1701 Project: Solr Issue Type: Bug Components: search Reporter: Shalin Shekhar Mangar Fix For: 1.5 Attachments: SOLR-1701.patch
{code}
// This passes
query("q", "*:*", "sort", "id asc", "fl", "id,text");
// This also passes (notice the rows param)
query("q", "*:*", "sort", "id desc", "rows", 12, "fl", "id,text");
// But this fails
query("q", "*:*", "sort", "id desc", "fl", "id,text");
{code}
[jira] Commented: (SOLR-1212) TestNG Test Case
[ https://issues.apache.org/jira/browse/SOLR-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796123#action_12796123 ] Shalin Shekhar Mangar commented on SOLR-1212: - bq. Keeping this out of the codebase would result in the patch being out of sync with the tree. If there were no licensing restrictions - what is the harm in having this in the tree. You wrote this because you needed it at work and I appreciate that you thought about contributing it to Solr. But from Solr's perspective it is not needed and therefore I don't see why we should ship it at all. It is a class that is not used by Solr but would need to be maintained by us if we ship it. TestNG Test Case - Key: SOLR-1212 URL: https://issues.apache.org/jira/browse/SOLR-1212 Project: Solr Issue Type: New Feature Components: clients - java Affects Versions: 1.4 Environment: Java 6 Reporter: Kay Kay Fix For: 1.5 Attachments: SOLR-1212.patch, testng-5.9-jdk15.jar Original Estimate: 1h Remaining Estimate: 1h TestNG equivalent of AbstractSolrTestCase , without using JUnit altogether . New Class created: AbstractSolrNGTest LICENSE.txt , NOTICE.txt modified as appropriate. ( TestNG under Apache License 2.0 ) TestNG 5.9-jdk15 added to lib. Justification: In some workplaces - people are moving towards TestNG and take out JUnit altogether from the classpath. Hence useful in those cases. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1602) Refactor SOLR package structure to include o.a.solr.response and move QueryResponseWriters in there
[ https://issues.apache.org/jira/browse/SOLR-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796108#action_12796108 ] Shalin Shekhar Mangar commented on SOLR-1602: - {quote} One that springs to mind is updateRequestProcessor going to updateRequestProcessorChain. {quote} Patrick, I think that change was made in trunk before update processors were ever released. {quote} I'm both a software developer and a user of SOLR, and the consistent resistance to any proposed refactoring is quite troubling. {quote} The resistance is not towards refactoring. We are arguing about compatibility, not refactoring. {quote} And as I noted from a code organization standpoint, placing classes named response in a package named request is not subjectively anything - it's poor design and it needs to be addressed. {quote} I bet 99% of the users do not care about a wrongly named package when everything works. But they care when things stop working. Code organization is secondary to usability. Let us not cause discomfort to our users for such a trivial issue. {quote} As for no apparent reason as I mentioned to Noble, end-users of a system don't dictate its code-level organization/design. {quote} End users do not dictate code level organization but they do have an influence when compatibility is involved. In this case, it is an inconvenience for many of them which can be avoided easily, so why not? I agree with Hoss. This is too much discussion over too small an issue. I think things are quite clear. Hoss, Erik, Noble and I all feel that breaking compatibility is not worth it. So lets do what needs to be done and get on with more important things. 
Refactor SOLR package structure to include o.a.solr.response and move QueryResponseWriters in there --- Key: SOLR-1602 URL: https://issues.apache.org/jira/browse/SOLR-1602 Project: Solr Issue Type: Improvement Components: Response Writers Affects Versions: 1.2, 1.3, 1.4 Environment: independent of environment (code structure) Reporter: Chris A. Mattmann Assignee: Noble Paul Fix For: 1.5 Attachments: SOLR-1602.Mattmann.112509.patch.txt, SOLR-1602.Mattmann.112509_02.patch.txt, upgrade_solr_config Currently all o.a.solr.request.QueryResponseWriter implementations are curiously located in the o.a.solr.request package. Not only is this package getting big (30+ classes), a lot of them are misplaced. There should be a first-class o.a.solr.response package, and the response related classes should be given a home there. Patch forthcoming. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (SOLR-1682) Implement CollapseComponent
[ https://issues.apache.org/jira/browse/SOLR-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar reassigned SOLR-1682: --- Assignee: Shalin Shekhar Mangar Implement CollapseComponent --- Key: SOLR-1682 URL: https://issues.apache.org/jira/browse/SOLR-1682 Project: Solr Issue Type: Sub-task Components: search Reporter: Martijn van Groningen Assignee: Shalin Shekhar Mangar Fix For: 1.5 Attachments: field-collapsing.patch Child issue of SOLR-236. This issue is dedicated to field collapsing in general and all its code (CollapseComponent, DocumentCollapsers and CollapseCollectors). The main goal is the finalize the request parameters and response format. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1682) Implement CollapseComponent
[ https://issues.apache.org/jira/browse/SOLR-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-1682: Attachment: SOLR-236.patch

Here's an implementation based on [Yonik's suggestion|https://issues.apache.org/jira/browse/SOLR-236?focusedCommentId=12792916&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12792916]. This is just a PoC and not fit to be committed. This implementation uses one pass for collapse.threshold=1 and two passes for collapse.threshold>1, so it should be a lot faster than the previous method, though I haven't benchmarked it yet. Memory consumption should be proportional to start+count instead of index size.

What is covered:
# Non-adjacent collapsing
# collapse.threshold
# [New response format|https://issues.apache.org/jira/browse/SOLR-236?focusedCommentId=12793101&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12793101]
# Includes DocSetAwareCollector interface from SOLR-1680

What is not covered:
# Adjacent collapsing
# Aggregate functions (should be easy to add)
# Faceting (it doesn't keep/return the docsets needed for FacetComponent)
# Caching
# This implementation does not return the correct numFound

The response adds special fields to only the first document in a group. Here's a sample of the first document in a group:
{code:xml}
<doc>
  <int name="id">1</int>
  <str name="name_s1">author1</str>
  <str name="title_s1">a tree</str>
  <date name="timestamp">2009-12-30T10:16:51.944Z</date>
  <arr name="multiDefault">
    <str>muLti-Default</str>
  </arr>
  <int name="intDefault">42</int>
  <str name="collapse.value">author1</str>
  <int name="collapse.count">1</int>
  <float name="score">0.67107505</float>
</doc>
{code}
See TestCollapseComponent.java for example usage.
Implement CollapseComponent --- Key: SOLR-1682 URL: https://issues.apache.org/jira/browse/SOLR-1682 Project: Solr Issue Type: Sub-task Components: search Reporter: Martijn van Groningen Assignee: Shalin Shekhar Mangar Fix For: 1.5 Attachments: field-collapsing.patch, SOLR-236.patch Child issue of SOLR-236. This issue is dedicated to field collapsing in general and all its code (CollapseComponent, DocumentCollapsers and CollapseCollectors). The main goal is the finalize the request parameters and response format. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1688) Inner class FieldCacheSources should be refactored into their own classes
[ https://issues.apache.org/jira/browse/SOLR-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794782#action_12794782 ] Shalin Shekhar Mangar commented on SOLR-1688: - {quote} IMO, most of these should remain implementation details (i.e. not public)... they weren't thought out in sufficient detail to support as public classes (and there has been little reason to do so). If we need StrValueSource to be public for another issue, then we should limit the change to that. {quote} +1 As they say, lets not fix what ain't broken. {quote} If they are defined as a core data structure part of the JDK, then I would say yes. It's not as black and white of a line as you make it out to be. You can have SOLR be entirely a plugin-based system, with nothing but configuration inside of SVN, or you can have every piece of code that interacts with SOLR be inside the SOLR SVN. Neither solution will work, you have to strike a balance. The same applies for code organization and using absolutes or extremes doesn't really illustrate much. {quote} Chris, we are striving for balance and we are OK with the change to StrFieldSource. In this particular case, you seem to be pushing towards extremes in the name of consistency. {quote} Can you tell me the reason that e.g., StrFieldSource exists inside of StrField while DoubleFieldSource exists outside of DoubleField? Or why the other 4 or 5 FieldSources that are defined inside of their own java file exist there, while the other 4 or 5 defined inside of the FieldType's java file exist there? What's the litmus test? {quote} It is not a public API and I guess that at the time it was written, there was no reason to make it one. It was convenient or a matter of personal style or most likely a random choice. There is no litmus test and there does not have to be one. {quote} Because it's more consistent, and thus, more maintainable. {quote} Actually it is the other way round. 
Once you make it public, it is harder to maintain. All changes should then be backward compatible as far as possible. The bottom line is that making all of them public is not needed. Your opinion is that it is broken because it is not consistent. My opinion is that it is OK and it does not matter. We shouldn't lean towards making something a public API in the name of consistency. {quote} Because when you tell someone to modify one of the core FieldSources or ValueSources, they know where to look instead of, oh is this one inside of a class inside of o.a.solr.schema, or is this one in the o.a.solr.search.function package? {quote} Most IDEs have a way to goto the source of a particular class, otherwise there is grep. The point is that many (most?) of these classes don't need to be modified unless in very rare cases. If it becomes a common practice to modify them, then there is probably something wrong with our APIs and we need to re-think them. Inner class FieldCacheSources should be refactored into their own classes - Key: SOLR-1688 URL: https://issues.apache.org/jira/browse/SOLR-1688 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Environment: indep. of env. Reporter: Chris A. Mattmann Fix For: 1.5 Attachments: SOLR-1688.Mattmann.122609.patch.txt While working on SOLR-1586 I noticed that outside of class level access (or package level), you can't really reference FieldCacheSources that are defined inside of their FieldType constituents (e.g., in the case of StrFieldSource as defined in StrField). What's more troubling is that the FieldType/FieldCacheSources are defined in an inconsistent fashion: some are done as inner classes, e.g., StrFieldSource and SortableFloatFieldSource, while others are defined as individual classes (e.g., FloatFIeldSource). This patch will make them all consistent and define each FieldCacheSource as an outside class, present in o.a.solr.search.function. -- This message is automatically generated by JIRA. 
[jira] Commented: (SOLR-1688) Inner class FieldCacheSources should be refactored into their own classes
[ https://issues.apache.org/jira/browse/SOLR-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794664#action_12794664 ] Shalin Shekhar Mangar commented on SOLR-1688: - Chris, isn't referring to it as a ValueSource instance enough for SOLR-1586? {quote} What's more troubling is that the FieldType/FieldCacheSources are defined in an inconsistent fashion: some are done as inner classes, e.g., StrFieldSource and SortableFloatFieldSource, while others are defined as individual classes (e.g., FloatFIeldSource). {quote} That is not really a problem. The field types are always loaded by Solr so whether they are an inner class or independent does not matter too much. Inner class FieldCacheSources should be refactored into their own classes - Key: SOLR-1688 URL: https://issues.apache.org/jira/browse/SOLR-1688 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Environment: indep. of env. Reporter: Chris A. Mattmann Fix For: 1.5 Attachments: SOLR-1688.Mattmann.122609.patch.txt While working on SOLR-1586 I noticed that outside of class level access (or package level), you can't really reference FieldCacheSources that are defined inside of their FieldType constituents (e.g., in the case of StrFieldSource as defined in StrField). What's more troubling is that the FieldType/FieldCacheSources are defined in an inconsistent fashion: some are done as inner classes, e.g., StrFieldSource and SortableFloatFieldSource, while others are defined as individual classes (e.g., FloatFIeldSource). This patch will make them all consistent and define each FieldCacheSource as an outside class, present in o.a.solr.search.function. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1685) Refactor QueryComponent for easy extensibility
Refactor QueryComponent for easy extensibility -- Key: SOLR-1685 URL: https://issues.apache.org/jira/browse/SOLR-1685 Project: Solr Issue Type: Sub-task Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1685) Refactor QueryComponent for easy extensibility
[ https://issues.apache.org/jira/browse/SOLR-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-1685: Attachment: SOLR-1685.patch Extracted field sort value and prefetch processing into two new methods out of QueryComponent#process Refactor QueryComponent for easy extensibility -- Key: SOLR-1685 URL: https://issues.apache.org/jira/browse/SOLR-1685 Project: Solr Issue Type: Sub-task Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Attachments: SOLR-1685.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-1685) Refactor QueryComponent for easy extensibility
[ https://issues.apache.org/jira/browse/SOLR-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-1685. - Resolution: Fixed Fix Version/s: 1.5 Committed revision 893723. Refactor QueryComponent for easy extensibility -- Key: SOLR-1685 URL: https://issues.apache.org/jira/browse/SOLR-1685 Project: Solr Issue Type: Sub-task Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 1.5 Attachments: SOLR-1685.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1686) Support fixing the number of shards in BaseDistributedTestCase
Support fixing the number of shards in BaseDistributedTestCase -- Key: SOLR-1686 URL: https://issues.apache.org/jira/browse/SOLR-1686 Project: Solr Issue Type: Sub-task Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1686) Support fixing the number of shards in BaseDistributedTestCase
[ https://issues.apache.org/jira/browse/SOLR-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-1686: Attachment: SOLR-1686.patch A new protected flag named fixShardCount is added which can be set to true by sub-classes to fix the number of shards being used for testing. Support fixing the number of shards in BaseDistributedTestCase -- Key: SOLR-1686 URL: https://issues.apache.org/jira/browse/SOLR-1686 Project: Solr Issue Type: Sub-task Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 1.5 Attachments: SOLR-1686.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
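A rough sketch of how a sub-class might use such a flag is shown below. Only the name fixShardCount and the fact that it is a protected flag set by sub-classes come from the comment above; the base class `DistributedTestSketch`, the `shardCount` field, and the random choice made when the flag is false are assumptions for illustration, not the actual test base class.

```java
import java.util.Random;

// Hypothetical stand-in for a distributed test base class.
abstract class DistributedTestSketch {
    protected boolean fixShardCount = false; // flag named in the patch description
    protected int shardCount = 4;            // assumed upper bound on shards

    // Assumed behaviour: a fixed count when the flag is set, otherwise a
    // random count between 1 and shardCount for each run.
    int shardsForRun(Random random) {
        return fixShardCount ? shardCount : 1 + random.nextInt(shardCount);
    }
}

// A sub-class that needs a predictable topology pins the shard count.
final class FixedShardTest extends DistributedTestSketch {
    FixedShardTest() {
        fixShardCount = true;
        shardCount = 3;
    }
}
```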
[jira] Resolved: (SOLR-1686) Support fixing the number of shards in BaseDistributedTestCase
[ https://issues.apache.org/jira/browse/SOLR-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-1686. - Resolution: Fixed Fix Version/s: 1.5 Committed revision 893725. Support fixing the number of shards in BaseDistributedTestCase -- Key: SOLR-1686 URL: https://issues.apache.org/jira/browse/SOLR-1686 Project: Solr Issue Type: Sub-task Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 1.5 Attachments: SOLR-1686.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-236: --- Attachment: SOLR-236.patch # Patch updated for SOLR-1685 and SOLR-1686 # The last patch had reverted changes to CollapseComponent configuration in solrconfig.xml and solrconfig-fieldcollapse.xml. Synced it back Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Assignee: Shalin Shekhar Mangar Fix For: 1.5 Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch This patch include a new feature called Field collapsing. Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. 
Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection. http://www.fastsearch.com/glossary.aspx?m=48amid=299

The implementation adds 3 new query parameters (SolrParams):
- collapse.field: the field used to group results
- collapse.type: normal (default value) or adjacent
- collapse.max: how many continuous results are allowed before collapsing

TODO (in progress):
- More documentation (on source code)
- Test cases

Two patches:
- field_collapsing.patch for the current development version
- field_collapsing_1.1.0.patch for Solr-1.1.0

P.S.: Feedback and misspelling corrections are welcome ;-)
[jira] Resolved: (SOLR-1676) spellcheck.count has confusing default and documentation
[ https://issues.apache.org/jira/browse/SOLR-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-1676. - Resolution: Fixed Fix Version/s: 1.5 Committed revision 893700. I've added a note in the example solrconfig.xml to refer to the wiki for details on the request parameters. Thanks Daniel! spellcheck.count has confusing default and documentation Key: SOLR-1676 URL: https://issues.apache.org/jira/browse/SOLR-1676 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 1.4 Reporter: Daniel Naber Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 1.5 Attachments: solr-spellcheck.diff It seems spellcheck.count does not just limit the number of results returned, as the documentation claims. Instead, this value is given to the Lucene SpellChecker class which multiplies it by 10 and then only fetches the first spellcheck.count*10 candidates, ignoring all others. The effect is that with a low value for spellcheck.count you might miss good hits. In other words, the first item with spellcheck.count==1 is not always the same item as with e.g. spellcheck.count==10. The fix could be to fix the documentation (the comments in the sample solrconfig.xml) to mention this and use a better default. The Lucene SpellChecker class says about the numSug parameter: Thus, you should set this value to *at least* 5 for a good suggestion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
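The effect described in the issue can be reproduced with a small simulation. This is not Lucene's SpellChecker code: the class `SpellcheckCountSketch`, the `Candidate` type, and the demo scores are invented for illustration. It only models the reported behaviour, where the first count*10 candidates are fetched and everything beyond that pool is ignored before ranking.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical simulation of the candidate-pool truncation described above.
final class SpellcheckCountSketch {

    static final class Candidate {
        final String word;
        final double score;
        Candidate(String word, double score) {
            this.word = word;
            this.score = score;
        }
    }

    // Takes only the first count*10 candidates in retrieval order,
    // ranks that pool by score, and returns the top `count` words.
    static List<String> suggest(List<Candidate> retrievalOrder, int count) {
        int poolSize = Math.min(retrievalOrder.size(), count * 10);
        List<Candidate> pool = new ArrayList<>(retrievalOrder.subList(0, poolSize));
        pool.sort(Comparator.comparingDouble((Candidate c) -> c.score).reversed());
        List<String> out = new ArrayList<>();
        for (Candidate c : pool.subList(0, Math.min(count, pool.size()))) {
            out.add(c.word);
        }
        return out;
    }

    // 15 made-up candidates; the best one sits at position 11, outside a pool of 10.
    static List<Candidate> demoCandidates() {
        List<Candidate> list = new ArrayList<>();
        for (int i = 0; i < 15; i++) {
            list.add(new Candidate("cand" + i, i == 11 ? 0.9 : 0.5 - i * 0.01));
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(suggest(demoCandidates(), 1).get(0));
        System.out.println(suggest(demoCandidates(), 10).get(0));
    }
}
```

In this made-up data set the top suggestion with a count of 1 differs from the top suggestion with a count of 10, which is the mismatch the reporter describes.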
[jira] Updated: (SOLR-1682) Implement CollapseComponent
[ https://issues.apache.org/jira/browse/SOLR-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-1682:
Affects Version/s: (was: 1.3)
Summary: Implement CollapseComponent (was: The field collapse)

Implement CollapseComponent
Key: SOLR-1682
URL: https://issues.apache.org/jira/browse/SOLR-1682
Project: Solr
Issue Type: Sub-task
Components: search
Reporter: Martijn van Groningen
Fix For: 1.5
Attachments: field-collapsing.patch

Child issue of SOLR-236. This issue is dedicated to field collapsing in general and all its code (CollapseComponent, DocumentCollapsers and CollapseCollectors). The main goal is to finalize the request parameters and response format.
[jira] Updated: (SOLR-1680) Provide an API to specify custom Collectors
[ https://issues.apache.org/jira/browse/SOLR-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-1680: Description: The issue is dedicated to incorporate fieldcollapse's changes to the Solr's core code. We want to make it possible for components to specify custom Collectors in SolrIndexSearcher methods. was:Child issue of SOLR-236. The issue is dedicated to incorporate fieldcollapse's changes to the Solr's core code. Summary: Provide an API to specify custom Collectors (was: Fieldcollapse related changes to the core) Provide an API to specify custom Collectors --- Key: SOLR-1680 URL: https://issues.apache.org/jira/browse/SOLR-1680 Project: Solr Issue Type: Sub-task Components: search Affects Versions: 1.3 Reporter: Martijn van Groningen Fix For: 1.5 Attachments: field-collapse-core.patch The issue is dedicated to incorporate fieldcollapse's changes to the Solr's core code. We want to make it possible for components to specify custom Collectors in SolrIndexSearcher methods. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12793607#action_12793607 ] Shalin Shekhar Mangar commented on SOLR-236: {quote} This is exactly the point, it's not really meta-data over the document, but on the group the document belongs to. And you also need a more obvious way to mark this document as a group representation (to distinguish it from other normal documents). {quote} We show the highest scoring document of a group, so does the fact that the metadata belongs to the group and not the document matter at all? {quote} But extending the current doc element, doesn't mean we break BWC. Adding a collapse-info (or collapse-meta-data) sub element to it, will certainly not break anything, specially when we still don't have a formal xsd for the responses (I know we're working on it, but it's still not out there so it's safe). {quote} We are not extending anything. We're just adding a couple of fields which may not exist in the index and this is a capability we plan to introduce anyway (however this issue does not need to depend on SOLR-1566). The response format remains exactly the same. There is no break in compatibility. 
Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Assignee: Shalin Shekhar Mangar Fix For: 1.5 Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch This patch include a new feature called Field collapsing. Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated more documents from this site link. See also Duplicate detection. 
http://www.fastsearch.com/glossary.aspx?m=48amid=299 The implementation add 3 new query parameters (SolrParams): collapse.field to choose the field used to group results collapse.type normal (default value) or adjacent collapse.max to select how many continuous results are allowed before collapsing TODO (in progress): - More documentation (on source code) - Test cases Two patches: - field_collapsing.patch for current development version - field_collapsing_1.1.0.patch for Solr-1.1.0 P.S.: Feedback and misspelling correction are welcome ;-) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
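The difference between the two collapse.type values, and the role of collapse.max, can be illustrated with a small sketch. This is not the code from any of the attached patches: the class and method names are hypothetical, and the semantics are one reading of the summary above, namely that "adjacent" only counts consecutive results sharing a field value, "normal" counts every result sharing a field value, and a result is kept while its count does not exceed collapse.max.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not taken from the SOLR-236 patches.
final class CollapseSketch {

    /**
     * Returns the positions of the results that survive collapsing.
     *
     * @param values   value of the collapse field for each result, in rank order
     * @param adjacent true for "adjacent" collapsing, false for "normal"
     * @param max      how many results per group (or per run) are kept
     */
    static List<Integer> keptPositions(List<String> values, boolean adjacent, int max) {
        List<Integer> kept = new ArrayList<>();
        Map<String, Integer> seen = new HashMap<>(); // per-value counts, used in normal mode
        String previous = null;                      // value of the preceding result
        int run = 0;                                 // length of the current run, used in adjacent mode
        for (int i = 0; i < values.size(); i++) {
            String v = values.get(i);
            int count;
            if (adjacent) {
                run = v.equals(previous) ? run + 1 : 1;
                previous = v;
                count = run;
            } else {
                count = seen.merge(v, 1, Integer::sum);
            }
            if (count <= max) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> sites = Arrays.asList("a", "a", "b", "a", "a", "a");
        System.out.println("adjacent: " + keptPositions(sites, true, 1));
        System.out.println("normal:   " + keptPositions(sites, false, 1));
    }
}
```

With the sample input, adjacent mode keeps a later result from site "a" because the run was interrupted by "b", while normal mode keeps only the first result per site.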
Re: Distributed search test using only one shard?
On Tue, Dec 22, 2009 at 8:23 PM, Yonik Seeley yo...@lucidimagination.com wrote:
> Looks like the recently committed SOLR-1608 accidentally changed this... it was nservers4 before that.

Yes, I changed it for debugging and then forgot to change it back. Sorry about that.

--
Regards, Shalin Shekhar Mangar.
[jira] Commented: (SOLR-1682) The field collapse
[ https://issues.apache.org/jira/browse/SOLR-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793957#action_12793957 ]

Shalin Shekhar Mangar commented on SOLR-1682:

Isn't this issue the same as SOLR-236? It is better to have patches in one place than two. Let's close this one.

The field collapse
Key: SOLR-1682
URL: https://issues.apache.org/jira/browse/SOLR-1682
Project: Solr
Issue Type: Sub-task
Components: search
Affects Versions: 1.3
Reporter: Martijn van Groningen
Fix For: 1.5
Attachments: field-collapsing.patch

Child issue of SOLR-236. This issue is dedicated to field collapsing in general and all its code (CollapseComponent, DocumentCollapsers and CollapseCollectors). The main goal is to finalize the request parameters and response format.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12793958#action_12793958 ] Shalin Shekhar Mangar commented on SOLR-236: @ttdi - Please post your questions to solr-user mailing list. This issue is strictly for Solr related development (not usage).
[jira] Assigned: (SOLR-1676) spellcheck.count has confusing default and documentation
[ https://issues.apache.org/jira/browse/SOLR-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar reassigned SOLR-1676: --- Assignee: Shalin Shekhar Mangar spellcheck.count has confusing default and documentation Key: SOLR-1676 URL: https://issues.apache.org/jira/browse/SOLR-1676 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 1.4 Reporter: Daniel Naber Assignee: Shalin Shekhar Mangar Priority: Minor It seems spellcheck.count does not just limit the number of results returned, as the documentation claims. Instead, this value is given to the Lucene SpellChecker class which multiplies it by 10 and then only fetches the first spellcheck.count*10 candidates, ignoring all others. The effect is that with a low value for spellcheck.count you might miss good hits. In other words, the first item with spellcheck.count==1 is not always the same item as with e.g. spellcheck.count==10. The fix could be to fix the documentation (the comments in the sample solrconfig.xml) to mention this and use a better default. The Lucene SpellChecker class says about the numSug parameter: Thus, you should set this value to *at least* 5 for a good suggestion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1676) spellcheck.count has confusing default and documentation
[ https://issues.apache.org/jira/browse/SOLR-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12793158#action_12793158 ] Shalin Shekhar Mangar commented on SOLR-1676: - Although it is not documented anywhere, SpellCheckComponent passes max(spellcheck.count, 5) to the Lucene spellchecker, see AbstractLuceneSpellChecker line 141 in trunk. bq. The effect is that with a low value for spellcheck.count you might miss good hits. In other words, the first item with spellcheck.count==1 is not always the same item as with e.g. spellcheck.count==10. That is true. It is a trade-off between accuracy and performance. We cannot avoid this without fetching all results (or a large number of them) internally and score all of them with a distance metric and that can make it very slow. Do you have any suggestion on how we could improve the documentation? spellcheck.count has confusing default and documentation Key: SOLR-1676 URL: https://issues.apache.org/jira/browse/SOLR-1676 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 1.4 Reporter: Daniel Naber Priority: Minor It seems spellcheck.count does not just limit the number of results returned, as the documentation claims. Instead, this value is given to the Lucene SpellChecker class which multiplies it by 10 and then only fetches the first spellcheck.count*10 candidates, ignoring all others. The effect is that with a low value for spellcheck.count you might miss good hits. In other words, the first item with spellcheck.count==1 is not always the same item as with e.g. spellcheck.count==10. The fix could be to fix the documentation (the comments in the sample solrconfig.xml) to mention this and use a better default. The Lucene SpellChecker class says about the numSug parameter: Thus, you should set this value to *at least* 5 for a good suggestion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
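The interaction described in this comment and in the issue summary can be worked through with a small sketch. It assumes only what the thread states: the component passes max(spellcheck.count, 5) as numSug, and the Lucene SpellChecker then considers numSug * 10 candidates. The class, method, and constant names are hypothetical and do not come from Solr or Lucene.

```java
// Hypothetical arithmetic sketch; not Solr or Lucene code.
final class SpellcheckCountSketch {

    // Lower bound applied before the value reaches the Lucene spellchecker (per the comment above).
    static final int MIN_SUGGESTIONS = 5;

    // Multiplier applied inside the Lucene SpellChecker (per the issue summary).
    static final int CANDIDATE_FACTOR = 10;

    /** Value that ends up as the numSug argument for a given spellcheck.count. */
    static int numSugPassedToLucene(int spellcheckCount) {
        return Math.max(spellcheckCount, MIN_SUGGESTIONS);
    }

    /** Number of candidates considered before ranking by the distance metric. */
    static int candidatesFetched(int spellcheckCount) {
        return numSugPassedToLucene(spellcheckCount) * CANDIDATE_FACTOR;
    }

    public static void main(String[] args) {
        for (int count : new int[] {1, 5, 10}) {
            System.out.println("spellcheck.count=" + count
                    + " numSug=" + numSugPassedToLucene(count)
                    + " candidates=" + candidatesFetched(count));
        }
    }
}
```

Under these assumptions, spellcheck.count=1 and spellcheck.count=5 look at the same 50 candidates, while spellcheck.count=10 looks at 100, which is why the top suggestion can differ between small and large values.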
[jira] Commented: (SOLR-1674) improve analysis tests, cut over to new API
[ https://issues.apache.org/jira/browse/SOLR-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12793174#action_12793174 ] Shalin Shekhar Mangar commented on SOLR-1674: - All tests pass after renaming protWords.txt to protwords.txt. Unfortunately, this is too big to review in detail right now but I trust Robert to do the right thing :) bq. If there are no objections I will commit this beautiful addition to our analysis tests soon. +1 improve analysis tests, cut over to new API --- Key: SOLR-1674 URL: https://issues.apache.org/jira/browse/SOLR-1674 Project: Solr Issue Type: Test Components: Schema and Analysis Reporter: Robert Muir Attachments: SOLR-1674.patch, SOLR-1674.patch This patch * converts all analysis tests to use the new tokenstream api * converts most tests to use the more stringent assertion mechanisms from lucene * adds new tests to improve coverage Most bugs found by more stringent testing have been fixed, with the exception of SynonymFilter. The problems with this filter are more serious, the previous tests were essentially a no-op. The new tests for SynonymFilter test the current behavior, but have FIXMEs with what I think the old test wanted to expect in the comments. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1676) spellcheck.count has confusing default and documentation
[ https://issues.apache.org/jira/browse/SOLR-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12793277#action_12793277 ] Shalin Shekhar Mangar commented on SOLR-1676: - I guess it is better to add this information to the SpellCheckComponent wiki page and reference that in the example solrconfig.xml. Anybody using SpellCheckComponent would anyway need to refer to the wiki to figure out the other parameters. http://wiki.apache.org/solr/SpellCheckComponent spellcheck.count has confusing default and documentation Key: SOLR-1676 URL: https://issues.apache.org/jira/browse/SOLR-1676 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 1.4 Reporter: Daniel Naber Assignee: Shalin Shekhar Mangar Priority: Minor Attachments: solr-spellcheck.diff It seems spellcheck.count does not just limit the number of results returned, as the documentation claims. Instead, this value is given to the Lucene SpellChecker class which multiplies it by 10 and then only fetches the first spellcheck.count*10 candidates, ignoring all others. The effect is that with a low value for spellcheck.count you might miss good hits. In other words, the first item with spellcheck.count==1 is not always the same item as with e.g. spellcheck.count==10. The fix could be to fix the documentation (the comments in the sample solrconfig.xml) to mention this and use a better default. The Lucene SpellChecker class says about the numSug parameter: Thus, you should set this value to *at least* 5 for a good suggestion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: SOLR 1.4 debian packaging
On Tue, Dec 22, 2009 at 4:15 AM, Chris Hostetter hossman_luc...@fucit.org wrote:
> : @solr-dev: Could sbd. from upstream help us out with a working tomcat.policy
> : for solr? For now I just granted all permissions to solr.
>
> * solr needs read for the conf directory.

With the new Java based replication in Solr 1.4, people who need configuration replication need read/write access for the conf directory.

--
Regards, Shalin Shekhar Mangar.
Re: $Id$
On Sun, Dec 20, 2009 at 10:42 PM, Mark Miller markrmil...@gmail.com wrote:
> Robert Muir wrote:
>> Hello, I am wondering why we are using $Id$ in solr? To me it only seems this causes problems with applying patches (it is causing Mark a problem right now). I am trying to see how it is helpful? there are other ways to see the svn history that do not cause problems with patches
>
> +1 on giving them the boot - we decided the same thing in Lucene - who needs them when they cause these problems and offer little to nothing in return. And I can't count how many patches I've had to hand fix ...

I agree. It causes problems with patches and I don't see the benefit of using them in class javadocs. Though they are sometimes useful in the statistics section (for SolrInfoMBean)

--
Regards, Shalin Shekhar Mangar.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793101#action_12793101 ]

Shalin Shekhar Mangar commented on SOLR-236:

How about we change the current field collapsing response format to the following? We add new well-known fields to the document itself, say
# collapse.value - contains the group field's value for this document
# collapse.count - the number of results collapsed under this document
# collapse.aggregate.function(field-name) - the aggregate value for the given function applied to the given field for this document's group

Example:
{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
    <lst name="params">
      <str name="collapse.field">manu_exact</str>
      <str name="collapse.aggregate">max(field1)</str>
      <str name="collapse.aggregate">avg(field1)</str>
      <str name="q">title:test</str>
      <str name="field.collapse">title</str>
      <str name="qt">collapse</str>
    </lst>
  </lst>
  <result name="response" numFound="30" start="0">
    <doc>
      <str name="id">F8V7067-APL-KIT</str>
      <str name="collapse.value">Belkin</str>
      <int name="collapse.count">1</int>
      <int name="collapse.aggregate.max(field1)">100</int>
      <float name="collapse.aggregate.avg(field1)">50.0</float>
    </doc>
    <doc>
      <str name="id">TWINX2048-3200PRO</str>
      <str name="collapse.value">Corsair Microsystems Inc.</str>
      <int name="collapse.count">3</int>
      <int name="collapse.aggregate.max(field1)">100</int>
      <float name="collapse.aggregate.avg(field1)">50.0</float>
    </doc>
  </result>
</response>
{code}

No need to have another section and correlate based on uniqueKeys. For this to work, CollapseComponent must generate a custom SolrDocumentList and set it as results in the response.

For request parameters:
# collapse.aggregate - Can we make this a multi-valued parameter instead of comma separated?
[jira] Updated: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-236:
Attachment: SOLR-236.patch

Changes:
# Modified configuration as Noble suggested. The AggregateCollapseCollectorFactory is now PluginInfoInitialized instead of NamedListInitialized and functions are plugins. The name attribute is removed from collapseCollectorFactory since it is no longer necessary:
{code:xml}
<searchComponent name="collapse" class="org.apache.solr.handler.component.CollapseComponent">
  <collapseCollectorFactory class="solr.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory" />
  <collapseCollectorFactory class="solr.fieldcollapse.collector.FieldValueCountCollapseCollectorFactory" />
  <collapseCollectorFactory class="solr.fieldcollapse.collector.DocumentFieldsCollapseCollectorFactory" />
  <collapseCollectorFactory class="org.apache.solr.search.fieldcollapse.collector.AggregateCollapseCollectorFactory">
    <function name="sum" class="org.apache.solr.search.fieldcollapse.collector.aggregate.SumFunction"/>
    <function name="avg" class="org.apache.solr.search.fieldcollapse.collector.aggregate.AverageFunction"/>
    <function name="min" class="org.apache.solr.search.fieldcollapse.collector.aggregate.MinFunction"/>
    <function name="max" class="org.apache.solr.search.fieldcollapse.collector.aggregate.MaxFunction"/>
  </collapseCollectorFactory>
  <fieldCollapseCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
</searchComponent>
{code}
# Changed DistributedFieldCollapsingIntegrationTest to use BaseDistributedSearchTestCase. This fails right now. I believe there is a bug with the distributed implementation. The distributed version returns one extra group when compared to the non-distributed version. I've put an @Ignore annotation on that test.

We can consider creating the functions through a factory so that they can accept initialization parameters.
The schema-fieldcollapse.xml and solrconfig-fieldcollapse.xml are no longer necessary and can be removed. Next steps: # Let us open issues for all the modifications needed in Solr to support this feature. That will help us break down this patch into more manageable (and easily reviewable) pieces. I guess we need one for providing custom Collectors for SolrIndexSearcher methods. Any others? # The response format is not very clear in the wiki. We should add more examples and explain the format.
[jira] Resolved: (SOLR-1667) PatternTokenizer does not clearAttributes()
[ https://issues.apache.org/jira/browse/SOLR-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-1667. - Resolution: Fixed Fix Version/s: 1.5 Assignee: Shalin Shekhar Mangar Committed revision 892217. Thanks Robert! PatternTokenizer does not clearAttributes() --- Key: SOLR-1667 URL: https://issues.apache.org/jira/browse/SOLR-1667 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 1.4 Reporter: Robert Muir Assignee: Shalin Shekhar Mangar Fix For: 1.5 Attachments: SOLR-1667.patch PatternTokenizer creates tokens, but never calls clearAttributes() because of this things like positionIncrementGap are never reset to their default value. trivial patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1673) Getting the list of terms from more than one field
[ https://issues.apache.org/jira/browse/SOLR-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792471#action_12792471 ]

Shalin Shekhar Mangar commented on SOLR-1673:

Why would that be better? The current way is how HTTP params are supposed to be passed if multiple values are not ordered.

Getting the list of terms from more than one field
Key: SOLR-1673
URL: https://issues.apache.org/jira/browse/SOLR-1673
Project: Solr
Issue Type: Improvement
Components: SearchComponents - other
Affects Versions: 1.4
Environment: Operating system - Linux (Archlinux), Servlet container - Jetty
Reporter: Siddhant Goel
Priority: Minor
Fix For: 1.5

To get the list of terms from more than one field, it's currently required to specify the fields as terms.fl=field1&terms.fl=field2&terms.fl=field3, and so on. It would be better if the syntax can be modified to something like terms.fl=field1,field2,field3.
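The repeated-parameter convention discussed in this issue can be illustrated with a minimal, self-contained sketch that collects every value given for the same parameter name. It is not how Solr parses requests: the class and method names are hypothetical, and URL decoding is deliberately left out to keep the example short.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Solr's request parsing.
final class MultiValuedParamsSketch {

    /** Groups the values of a raw query string by parameter name, keeping repeats. */
    static Map<String, List<String>> parse(String query) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            // A repeated name appends to the existing list instead of overwriting it.
            params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(parse("terms.fl=field1&terms.fl=field2&terms.fl=field3"));
    }
}
```

With repeated names, a field list needs no extra splitting rule or escaping for commas inside a value, which is one practical argument for keeping the existing syntax.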
[jira] Assigned: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar reassigned SOLR-236: -- Assignee: Shalin Shekhar Mangar Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Assignee: Shalin Shekhar Mangar Fix For: 1.5 Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch This patch include a new feature called Field collapsing. Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated more documents from this site link. See also Duplicate detection. 
http://www.fastsearch.com/glossary.aspx?m=48amid=299 The implementation add 3 new query parameters (SolrParams): collapse.field to choose the field used to group results collapse.type normal (default value) or adjacent collapse.max to select how many continuous results are allowed before collapsing TODO (in progress): - More documentation (on source code) - Test cases Two patches: - field_collapsing.patch for current development version - field_collapsing_1.1.0.patch for Solr-1.1.0 P.S.: Feedback and misspelling correction are welcome ;-) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (SOLR-1660) capitalizationfilter crashes if you use the maxWordCountOption
[ https://issues.apache.org/jira/browse/SOLR-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar reassigned SOLR-1660: --- Assignee: Shalin Shekhar Mangar capitalizationfilter crashes if you use the maxWordCountOption -- Key: SOLR-1660 URL: https://issues.apache.org/jira/browse/SOLR-1660 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 1.4 Reporter: Robert Muir Assignee: Shalin Shekhar Mangar Attachments: SOLR-1660.patch because arrayCopys into null. if you want a testcase i can yank it out of in-progress patch from SOLR-1657, but i think its obvious. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-1660) capitalizationfilter crashes if you use the maxWordCountOption
[ https://issues.apache.org/jira/browse/SOLR-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-1660. - Resolution: Fixed Fix Version/s: 1.5 Committed revision 891596. Thanks Robert! capitalizationfilter crashes if you use the maxWordCountOption -- Key: SOLR-1660 URL: https://issues.apache.org/jira/browse/SOLR-1660 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 1.4 Reporter: Robert Muir Assignee: Shalin Shekhar Mangar Fix For: 1.5 Attachments: SOLR-1660.patch because arrayCopys into null. if you want a testcase i can yank it out of in-progress patch from SOLR-1657, but i think its obvious. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1662) BufferedTokenStream incorrect cloning
[ https://issues.apache.org/jira/browse/SOLR-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791866#action_12791866 ] Shalin Shekhar Mangar commented on SOLR-1662: -

{quote} So if we decide it's the responsibility of the subclass, these implementations need thorough tests to see if they are ok or not. If we add the cloning to BufferedTokenStream itself, then we know they are ok... {quote}

I think cloning should be done by sub-classes before writing. If BufferedTokenStream clones the token then every sub-class pays the price even though the use-case may just be to throw the token away.

BufferedTokenStream incorrect cloning - Key: SOLR-1662 URL: https://issues.apache.org/jira/browse/SOLR-1662 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 1.4 Reporter: Robert Muir

As part of writing tests for SOLR-1657, I rewrote one of the base classes (BaseTokenTestCase) to use the new TokenStream API, but also with some additional safety.

{code}
public static String tsToString(TokenStream in) throws IOException {
  StringBuilder out = new StringBuilder();
  TermAttribute termAtt = (TermAttribute) in.addAttribute(TermAttribute.class);
  // extra safety to enforce, that the state is not preserved and also
  // assign bogus values
  in.clearAttributes();
  termAtt.setTermBuffer("bogusTerm");
  while (in.incrementToken()) {
    if (out.length() > 0)
      out.append(' ');
    out.append(termAtt.term());
    in.clearAttributes();
    termAtt.setTermBuffer("bogusTerm");
  }
  in.close();
  return out.toString();
}
{code}

Setting the term text to bogus values helps find bugs in tokenstreams that do not clear or clone properly. In this case there is a problem with a tokenstream AB_AAB_Stream in TestBufferedTokenStream: it converts A B -> A A B but does not clone, so the values get overwritten. This can be fixed in two ways:
* BufferedTokenStream does the cloning
* subclasses are responsible for the cloning

The question is which one should it be?
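To illustrate why the missing clone matters, here is a small self-contained sketch. It does not use the Lucene or Solr classes: Token, copy() and run() are hypothetical stand-ins, and the only thing borrowed from the issue is the idea of overwriting the reused state with a bogus value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the problem, not the BufferedTokenStream code.
final class TokenCloneSketch {

    // Stand-in for a token whose buffer the producer reuses on every step.
    static final class Token {
        final StringBuilder term = new StringBuilder();

        void set(String s) {
            term.setLength(0);
            term.append(s);
        }

        Token copy() {
            Token t = new Token();
            t.term.append(term);
            return t;
        }
    }

    // Buffers every token while the producer keeps reusing one instance.
    static String run(String[] input, boolean cloneBeforeBuffering) {
        Token shared = new Token();
        List<Token> buffered = new ArrayList<>();
        for (String s : input) {
            shared.set(s);
            buffered.add(cloneBeforeBuffering ? shared.copy() : shared);
            shared.set("bogusTerm"); // poison the reused state, as tsToString does
        }
        StringBuilder out = new StringBuilder();
        for (Token t : buffered) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(t.term);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] input = {"A", "B"};
        System.out.println(run(input, true));  // buffered copies keep their values
        System.out.println(run(input, false)); // buffered references see the overwrite
    }
}
```

In this model the copy is made by the code that buffers the token, which corresponds to the "subclasses are responsible" option: a consumer that only discards tokens never pays for a copy.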
[jira] Assigned: (SOLR-1662) BufferedTokenStream incorrect cloning
[ https://issues.apache.org/jira/browse/SOLR-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar reassigned SOLR-1662: --- Assignee: Shalin Shekhar Mangar

BufferedTokenStream incorrect cloning - Key: SOLR-1662 URL: https://issues.apache.org/jira/browse/SOLR-1662 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 1.4 Reporter: Robert Muir Assignee: Shalin Shekhar Mangar

As part of writing tests for SOLR-1657, I rewrote one of the base classes (BaseTokenTestCase) to use the new TokenStream API, but also with some additional safety.

{code}
public static String tsToString(TokenStream in) throws IOException {
  StringBuilder out = new StringBuilder();
  TermAttribute termAtt = (TermAttribute) in.addAttribute(TermAttribute.class);
  // extra safety to enforce, that the state is not preserved and also
  // assign bogus values
  in.clearAttributes();
  termAtt.setTermBuffer("bogusTerm");
  while (in.incrementToken()) {
    if (out.length() > 0)
      out.append(' ');
    out.append(termAtt.term());
    in.clearAttributes();
    termAtt.setTermBuffer("bogusTerm");
  }
  in.close();
  return out.toString();
}
{code}

Setting the term text to bogus values helps find bugs in tokenstreams that do not clear or clone properly. In this case there is a problem with a tokenstream AB_AAB_Stream in TestBufferedTokenStream: it converts A B -> A A B but does not clone, so the values get overwritten. This can be fixed in two ways:
* BufferedTokenStream does the cloning
* subclasses are responsible for the cloning

The question is which one should it be?
[jira] Updated: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-236: --- Attachment: SOLR-236.patch

Patch in sync with trunk.

# CollapseComponent is PluginInfoInitialized. Removed changes to SolrConfig. Note, the collapseCollectorFactories array and the separate fieldCollapsing element have been removed from configuration. This patch has the following configuration:
{code:xml}
<searchComponent name="collapse" class="org.apache.solr.handler.component.CollapseComponent">
  <collapseCollectorFactory name="groupDocumentsCounts" class="solr.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory" />
  <collapseCollectorFactory name="groupFieldValue" class="solr.fieldcollapse.collector.FieldValueCountCollapseCollectorFactory" />
  <collapseCollectorFactory name="groupDocumentsFields" class="solr.fieldcollapse.collector.DocumentFieldsCollapseCollectorFactory" />
  <collapseCollectorFactory name="groupAggregatedData" class="org.apache.solr.search.fieldcollapse.collector.AggregateCollapseCollectorFactory">
    <lst name="aggregateFunctions">
      <str name="sum">org.apache.solr.search.fieldcollapse.collector.aggregate.SumFunction</str>
      <str name="avg">org.apache.solr.search.fieldcollapse.collector.aggregate.AverageFunction</str>
      <str name="min">org.apache.solr.search.fieldcollapse.collector.aggregate.MinFunction</str>
      <str name="max">org.apache.solr.search.fieldcollapse.collector.aggregate.MaxFunction</str>
    </lst>
  </collapseCollectorFactory>
  <fieldCollapseCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
</searchComponent>
{code}
# I couldn't find where the fieldCollapseCache was being regenerated. It seems it is not being thrown away after commits? I have changed it to be re-created on newSearcher event.
# Removed changes to JettySolrRunner, CoreContainer and SolrDispatchFilter for the distributed test case.
We will refactor it to use BaseDistributedSearchTestCase (not implemented yet) Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Assignee: Shalin Shekhar Mangar Fix For: 1.5 Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch This patch include a new feature called Field collapsing. Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated more documents from this site link. See also Duplicate detection. 
http://www.fastsearch.com/glossary.aspx?m=48amid=299

The implementation adds 3 new query parameters (SolrParams):
- collapse.field: chooses the field used to group results
- collapse.type: normal (default value) or adjacent
- collapse.max: selects how many continuous results are allowed before collapsing

TODO (in progress):
- More documentation (on source code)
- Test cases

Two patches:
- field_collapsing.patch for current development version
- field_collapsing_1.1.0.patch for Solr-1.1.0

P.S.: Feedback and misspelling corrections are welcome ;-)
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792115#action_12792115 ] Shalin Shekhar Mangar commented on SOLR-236: {quote} I'd define large scale for this in a couple of ways: 1. Lots of docs in the result set (10K+) 2. Lots of overall docs (100M+) 3. Lots of queries ( 10 QPS) {quote} Grant, this patch may not be perfect but I think we all agree that it is a great start. This is stable, used by many and has been well supported by the community. This is also a large patch and as I have known from my DataImportHandler experience, maintaining a large patch is quite a pain (and DataImportHandler didn't even touch the core). How about we commit this (after some review, of course), mark this as experimental (no guarantees of any sort) and then start improving it one issue at a time? Alternately, if you are not comfortable adding it to trunk, we can commit this on a branch and merge into trunk later. What do you think? 
Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Assignee: Shalin Shekhar Mangar Fix For: 1.5 Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch This patch include a new feature called Field collapsing. Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated more documents from this site link. See also Duplicate detection. 
http://www.fastsearch.com/glossary.aspx?m=48amid=299

The implementation adds 3 new query parameters (SolrParams):
- collapse.field: chooses the field used to group results
- collapse.type: normal (default value) or adjacent
- collapse.max: selects how many continuous results are allowed before collapsing

TODO (in progress):
- More documentation (on source code)
- Test cases

Two patches:
- field_collapsing.patch for current development version
- field_collapsing_1.1.0.patch for Solr-1.1.0

P.S.: Feedback and misspelling corrections are welcome ;-)
[jira] Resolved: (SOLR-1662) BufferedTokenStream incorrect cloning
[ https://issues.apache.org/jira/browse/SOLR-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-1662. - Resolution: Fixed

Committed revision 891889. Thanks Robert and Uwe!

BufferedTokenStream incorrect cloning - Key: SOLR-1662 URL: https://issues.apache.org/jira/browse/SOLR-1662 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 1.4 Reporter: Robert Muir Assignee: Shalin Shekhar Mangar Attachments: SOLR-1662.patch

As part of writing tests for SOLR-1657, I rewrote one of the base classes (BaseTokenTestCase) to use the new TokenStream API, but also with some additional safety.

{code}
public static String tsToString(TokenStream in) throws IOException {
  StringBuilder out = new StringBuilder();
  TermAttribute termAtt = (TermAttribute) in.addAttribute(TermAttribute.class);
  // extra safety to enforce, that the state is not preserved and also
  // assign bogus values
  in.clearAttributes();
  termAtt.setTermBuffer("bogusTerm");
  while (in.incrementToken()) {
    if (out.length() > 0)
      out.append(' ');
    out.append(termAtt.term());
    in.clearAttributes();
    termAtt.setTermBuffer("bogusTerm");
  }
  in.close();
  return out.toString();
}
{code}

Setting the term text to bogus values helps find bugs in tokenstreams that do not clear or clone properly. In this case there is a problem with a tokenstream AB_AAB_Stream in TestBufferedTokenStream: it converts A B -> A A B but does not clone, so the values get overwritten. This can be fixed in two ways:
* BufferedTokenStream does the cloning
* subclasses are responsible for the cloning

The question is which one should it be?
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792350#action_12792350 ] Shalin Shekhar Mangar commented on SOLR-236: For Martijn: {quote} The reason I added fieldCollapsing ... /fieldCollapsing was to be able support sharing of collapseCollectorFactory instances between different collapse components in the near future. You think that is a valid reason for that? Or do you think that collapseCollectorFactories shouldn't be shared? {quote} I just don't think that we should introduce new tags and new kinds of components in solrconfig.xml, particularly those that are useful to only a single component. That introduces changes in SolrConfig.java so that it knows how to load such things. That is why I moved that configuration inside CollapseComponent. Ideally, all components will use PluginInfo and load whatever they need from their own PluginInfo object and SolrConfig would not need to be changed unless we introduce new kinds of Solr plugins. Just curious, what would be a use-case for sharing factories (other than reducing duplication of configuration) and having multiple CollapseComponent? {quote} The CollapseComponentTest was failing. The field collapseCollectorFactories in CollapseComponent was null when not specifying any collapse collector factories in the solrconfig.xml which resulted in a NPE. {quote} Oops, sorry about that. I only ran the tests inside org.apache.solr.search.fieldcollapse. I didn't notice there are other tests too. Thanks! bq. The DistributedFieldCollapsingIntegrationTest is still failing, because you left out changes in JettySolrRunner, CoreContainer and SolrDispatchFilter from my original patch. I don't think we need to add that functionality to CoreContainer and SolrDispatchFilter. It is still possible to specify a different solrconfig and schema for a test. 
Let me see if I can make this work with BaseDistributedSearchTestCase Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Assignee: Shalin Shekhar Mangar Fix For: 1.5 Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch This patch include a new feature called Field collapsing. Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated more documents from this site link. See also Duplicate detection. 
http://www.fastsearch.com/glossary.aspx?m=48amid=299

The implementation adds 3 new query parameters (SolrParams):
- collapse.field: chooses the field used to group results
- collapse.type: normal (default value) or adjacent
- collapse.max: selects how many continuous results are allowed before collapsing

TODO (in progress):
- More documentation (on source code)
- Test cases

Two patches:
- field_collapsing.patch for current development version
- field_collapsing_1.1.0.patch for Solr-1.1.0

P.S.: Feedback and misspelling corrections are welcome ;-)
[jira] Updated: (SOLR-1630) StringIndexOutOfBoundsException in SpellCheckComponent
[ https://issues.apache.org/jira/browse/SOLR-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-1630: Attachment: SOLR-1630.patch

I'm not able to reproduce this issue. I used Robin's document, schema and solrconfig.xml in the form of a unit test and it gives an empty spell check response but no exceptions.

StringIndexOutOfBoundsException in SpellCheckComponent -- Key: SOLR-1630 URL: https://issues.apache.org/jira/browse/SOLR-1630 Project: Solr Issue Type: Bug Components: Schema and Analysis, spellchecker Affects Versions: 1.4 Environment: Solr 1.4 Lucene 2.9.1 Win XP java version 1.6.0_14 Reporter: Robin Wojciki Assignee: Shalin Shekhar Mangar Attachments: bug.xml, schema.xml, SOLR-1630.patch, solrconfig.xml

For some documents/search strings, the SpellCheckComponent throws StringIndexOutOfBoundsException. See: http://www.lucidimagination.com/search/document/3be6555227e031fc/

h2. Replication
* Save attached schema.xml and solrconfig.xml in apache-solr-1.4.0/example/solr/conf
* Start Solr
* Index attached bug.xml
* Query [http://localhost:8983/solr/select/?q=awehjse-wjkekw]

It throws a StringIndexOutOfBoundsException

{noformat}
String index out of range: -7
java.lang.StringIndexOutOfBoundsException: String index out of range: -7
  at java.lang.AbstractStringBuilder.replace(Unknown Source)
  at java.lang.StringBuilder.replace(Unknown Source)
  at org.apache.solr.handler.component.SpellCheckComponent.toNamedList(SpellCheckComponent.java:248)
  at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:143)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
{noformat}
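The trace shows the exception being raised inside StringBuilder.replace, which rejects a negative start index. As a sketch of that failure mode only (not a diagnosis of SpellCheckComponent, whose offset handling is not shown in this thread), the following reproduces the exception type and shows one way calling code could guard against out-of-range offsets. The class and method names are hypothetical.

```java
// Hypothetical illustration of the JDK behaviour seen in the stack trace.
final class ReplaceBoundsSketch {

    // Applies the replacement only when the offsets are ones that
    // StringBuilder.replace accepts; returns false otherwise.
    static boolean safeReplace(StringBuilder sb, int start, int end, String replacement) {
        if (start < 0 || start > sb.length() || start > end) {
            return false; // replace() would throw StringIndexOutOfBoundsException
        }
        sb.replace(start, end, replacement);
        return true;
    }

    // Demonstrates that a negative start offset is rejected by the JDK.
    static boolean throwsOnNegativeStart() {
        try {
            new StringBuilder("awehjse-wjkekw").replace(-7, 0, "x");
            return false;
        } catch (StringIndexOutOfBoundsException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("hello world");
        System.out.println(safeReplace(sb, 0, 5, "howdy") + " " + sb);
        System.out.println(safeReplace(sb, -7, 0, "x") + " " + sb);
        System.out.println(throwsOnNegativeStart());
    }
}
```

A guard like this only hides the symptom; a negative offset usually means the offsets were computed against a different string than the one being edited, so that is the place to look once the problem can be reproduced.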
[jira] Commented: (SOLR-1630) StringIndexOutOfBoundsException in SpellCheckComponent
[ https://issues.apache.org/jira/browse/SOLR-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791342#action_12791342 ] Shalin Shekhar Mangar commented on SOLR-1630: -

Thanks Guillaume, can you give me an example document too?

StringIndexOutOfBoundsException in SpellCheckComponent -- Key: SOLR-1630 URL: https://issues.apache.org/jira/browse/SOLR-1630 Project: Solr Issue Type: Bug Components: Schema and Analysis, spellchecker Affects Versions: 1.4 Environment: Solr 1.4 Lucene 2.9.1 Win XP java version 1.6.0_14 Reporter: Robin Wojciki Assignee: Shalin Shekhar Mangar Attachments: bug.xml, schema.xml, SOLR-1630.patch, solrconfig.xml, spellcheckconfig.xml

For some documents/search strings, the SpellCheckComponent throws StringIndexOutOfBoundsException. See: http://www.lucidimagination.com/search/document/3be6555227e031fc/

h2. Replication
* Save attached schema.xml and solrconfig.xml in apache-solr-1.4.0/example/solr/conf
* Start Solr
* Index attached bug.xml
* Query [http://localhost:8983/solr/select/?q=awehjse-wjkekw]

It throws a StringIndexOutOfBoundsException

{noformat}
String index out of range: -7
java.lang.StringIndexOutOfBoundsException: String index out of range: -7
  at java.lang.AbstractStringBuilder.replace(Unknown Source)
  at java.lang.StringBuilder.replace(Unknown Source)
  at org.apache.solr.handler.component.SpellCheckComponent.toNamedList(SpellCheckComponent.java:248)
  at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:143)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
{noformat}
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791836#action_12791836 ] Shalin Shekhar Mangar commented on SOLR-236: Does anybody have a reason for why this should not be committed to trunk as it stands right now? Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Fix For: 1.5 Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch This patch include a new feature called Field collapsing. Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated more documents from this site link. See also Duplicate detection. 
http://www.fastsearch.com/glossary.aspx?m=48amid=299

The implementation adds 3 new query parameters (SolrParams):
- collapse.field: chooses the field used to group results
- collapse.type: normal (default value) or adjacent
- collapse.max: selects how many continuous results are allowed before collapsing

TODO (in progress):
- More documentation (on source code)
- Test cases

Two patches:
- field_collapsing.patch for current development version
- field_collapsing_1.1.0.patch for Solr-1.1.0

P.S.: Feedback and misspelling corrections are welcome ;-)
[jira] Commented: (SOLR-17) XSD for solr requests/responses
[ https://issues.apache.org/jira/browse/SOLR-17?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12790621#action_12790621 ] Shalin Shekhar Mangar commented on SOLR-17: --- This is like a solution looking for a problem. XSD for solr requests/responses --- Key: SOLR-17 URL: https://issues.apache.org/jira/browse/SOLR-17 Project: Solr Issue Type: Improvement Reporter: Mike Baranczak Priority: Minor Attachments: solr-complex.xml, solr-rev2.xsd, solr.xsd, UselessRequestHandler.java Attaching an XML schema definition for the responses and the update requests. I needed to do this for myself anyway, so I might as well contribute it to the project. At the moment, I have no plans to write an XSD for the config documents, but it wouldn't be a bad idea. TODO: change the schema URL. I'm guessing that Apache already has some sort of naming convention for these? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (SOLR-1006) ConcurrentLRUCache API improvements
[ https://issues.apache.org/jira/browse/SOLR-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12790644#action_12790644 ] Shalin Shekhar Mangar edited comment on SOLR-1006 at 12/15/09 10:18 AM: I don't have a use-case for this anymore. Let us close this issue. was (Author: shalinmangar): I don't have a a use-case for this anymore. Let us close this issue. ConcurrentLRUCache API improvements --- Key: SOLR-1006 URL: https://issues.apache.org/jira/browse/SOLR-1006 Project: Solr Issue Type: Improvement Reporter: Noble Paul Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 1.4 Attachments: SOLR-1006.patch, SOLR-1006.patch This is to make ConcurrentLRUCache more consistent with LinkedHashMap behavior # remove must not call evictionListener.evictedEntry() # -EvictionListener must be able prevent eviction of an element by returning a false.- # Add a new method Map getOldestItems(long n) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1006) ConcurrentLRUCache API improvements
[ https://issues.apache.org/jira/browse/SOLR-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-1006: Description: This is to make ConcurrentLRUCache more consistent with LinkedHashMap behavior # remove must not call evictionListener.evictedEntry() # -EvictionListener must be able prevent eviction of an element by returning a false.- # Add a new method Map getOldestItems(long n) was: This is to make ConcurrentLRUCache more consistent with LinkedHashMap behavior # remove must not call evictionListener.evictedEntry() # EvictionListener must be able prevent eviction of an element by returning a false. # Add a new method Map getOldestItems(long n) I don't have a a use-case for this anymore. Let us close this issue. ConcurrentLRUCache API improvements --- Key: SOLR-1006 URL: https://issues.apache.org/jira/browse/SOLR-1006 Project: Solr Issue Type: Improvement Reporter: Noble Paul Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 1.4 Attachments: SOLR-1006.patch, SOLR-1006.patch This is to make ConcurrentLRUCache more consistent with LinkedHashMap behavior # remove must not call evictionListener.evictedEntry() # -EvictionListener must be able prevent eviction of an element by returning a false.- # Add a new method Map getOldestItems(long n) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (SOLR-1006) ConcurrentLRUCache API improvements
[ https://issues.apache.org/jira/browse/SOLR-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar closed SOLR-1006. --- Resolution: Fixed Fix Version/s: (was: 1.5) 1.4 ConcurrentLRUCache API improvements --- Key: SOLR-1006 URL: https://issues.apache.org/jira/browse/SOLR-1006 Project: Solr Issue Type: Improvement Reporter: Noble Paul Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 1.4 Attachments: SOLR-1006.patch, SOLR-1006.patch This is to make ConcurrentLRUCache more consistent with LinkedHashMap behavior # remove must not call evictionListener.evictedEntry() # -EvictionListener must be able prevent eviction of an element by returning a false.- # Add a new method Map getOldestItems(long n) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
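For reference, this is the LinkedHashMap behaviour the issue wants ConcurrentLRUCache to be consistent with, shown as a minimal sketch using only the JDK. The Lru class and its evicted list are hypothetical and stand in for an eviction listener; nothing here is ConcurrentLRUCache code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reference model built on java.util.LinkedHashMap.
final class LruReferenceSketch {

    static final class Lru<K, V> extends LinkedHashMap<K, V> {
        private static final long serialVersionUID = 1L;
        private final int capacity;
        final List<K> evicted = new ArrayList<>(); // plays the role of the listener

        Lru(int capacity) {
            super(16, 0.75f, true); // true = access order, i.e. LRU iteration order
            this.capacity = capacity;
        }

        // Consulted by put/putAll after an insert, never by remove().
        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            boolean evict = size() > capacity;
            if (evict) {
                evicted.add(eldest.getKey());
            }
            return evict; // returning false would keep the eldest entry
        }
    }

    static List<String> demo() {
        Lru<String, Integer> cache = new Lru<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.remove("a"); // explicit removal: the eviction hook is not called
        cache.put("c", 3); // size is 2, still within capacity
        cache.put("d", 4); // size would be 3, so the eldest entry ("b") is evicted
        return cache.evicted;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Item 1 in the description corresponds to the remove() call above not reaching the hook, and the struck-out item 2 corresponds to the hook's boolean return value.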
[jira] Updated: (SOLR-1645) Add human content-type
[ https://issues.apache.org/jira/browse/SOLR-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-1645: Fix Version/s: (was: 1.4) 1.5

1.4 has been released. Marking for 1.5 instead.

Add human content-type -- Key: SOLR-1645 URL: https://issues.apache.org/jira/browse/SOLR-1645 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction) Affects Versions: 1.4 Reporter: Khalid Yagoubi Fix For: 1.5

The idea is to allow Solr-Cell to calculate the human content-type from the extracted content-type and map it to a field in the schema, so that the user can search on media:image or media:video.

Ideas:
1) Hardcode a hashmap somewhere in the extraction classes and get the human content-type from the extracted content-type. I am thinking of SolrContentHandler.java
2) Write an xml file where we can put a mapping, like in tika-config.xml for parsers
3) Use tika-config.xml to get all supported mime-types
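A minimal sketch of option 1, the hardcoded map, might look like the following. This is not Solr Cell or Tika code: the class name, the category names ("document", "other", "unknown") and the handful of MIME types are assumptions chosen for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical mapping from an extracted content-type to a coarse,
// human-friendly media category that could be indexed in a "media" field.
final class HumanContentTypeSketch {

    // Exact matches for types whose top-level part says little to a user.
    private static final Map<String, String> EXACT = new HashMap<>();
    static {
        EXACT.put("application/pdf", "document");
        EXACT.put("application/msword", "document");
    }

    static String humanType(String contentType) {
        if (contentType == null) {
            return "unknown";
        }
        // Drop parameters such as "; charset=UTF-8" and normalise case.
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        String exact = EXACT.get(mime);
        if (exact != null) {
            return exact;
        }
        // Fall back to the top-level type for the common media families.
        int slash = mime.indexOf('/');
        String top = slash < 0 ? mime : mime.substring(0, slash);
        if (top.equals("image") || top.equals("video")
                || top.equals("audio") || top.equals("text")) {
            return top;
        }
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(humanType("image/png"));
        System.out.println(humanType("video/mp4; codecs=avc1"));
        System.out.println(humanType("application/pdf"));
    }
}
```

Options 2 and 3 would replace the static initialiser with entries loaded from an XML file, leaving the lookup itself unchanged.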
[jira] Commented: (SOLR-1212) TestNG Test Case
[ https://issues.apache.org/jira/browse/SOLR-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12790648#action_12790648 ] Shalin Shekhar Mangar commented on SOLR-1212: - I'm not sure what to do with this. We don't need to ship this with our releases. Perhaps it is best to mark this as Won't Fix and link this issue to http://wiki.apache.org/solr/TestingSolr so that people who use TestNG can use this code if necessary. TestNG Test Case - Key: SOLR-1212 URL: https://issues.apache.org/jira/browse/SOLR-1212 Project: Solr Issue Type: New Feature Components: clients - java Affects Versions: 1.4 Environment: Java 6 Reporter: Kay Kay Fix For: 1.5 Attachments: SOLR-1212.patch, testng-5.9-jdk15.jar Original Estimate: 1h Remaining Estimate: 1h TestNG equivalent of AbstractSolrTestCase , without using JUnit altogether . New Class created: AbstractSolrNGTest LICENSE.txt , NOTICE.txt modified as appropriate. ( TestNG under Apache License 2.0 ) TestNG 5.9-jdk15 added to lib. Justification: In some workplaces - people are moving towards TestNG and take out JUnit altogether from the classpath. Hence useful in those cases. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-630) Spellchecker should not be case-sensitive and should be stopwords-aware
[ https://issues.apache.org/jira/browse/SOLR-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar resolved SOLR-630.
Resolution: Invalid

I don't think this is a problem. As Alex noted, it is all a matter of configuring your analyzers and spell checker correctly.

Spellchecker should not be case-sensitive and should be stopwords-aware
Key: SOLR-630
URL: https://issues.apache.org/jira/browse/SOLR-630
Project: Solr
Issue Type: Bug
Components: spellchecker
Reporter: Otis Gospodnetic
Priority: Minor
Fix For: 1.5

Here are 2 more bugs:

1) Search for: united states of America
Suggests: united states oft America
It looks like the SC doesn't check stopwords, and "of" is a stopword. Thus, it does not exist in the index, but "oft" does, so SC suggests "oft" and thinks "of" is misspelled. I think the SC component should check the list of stopwords, too, no?

2) Search for: united states of America
Suggests: united states oft Americaa
The of-oft is described above. But note how SC suggested America-Americaa, but it didn't do that for america. This looks like a case-sensitivity problem. Shouldn't the SC be case-insensitive?

I can't produce a patch now (no src handy), so I'm hoping Grant or somebody else can do it based on this report.
[jira] Updated: (SOLR-1532) allow StreamingUpdateSolrServer to use a provided HttpClient
[ https://issues.apache.org/jira/browse/SOLR-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-1532: Attachment: SOLR-1532.patch Synced to trunk. I'll commit this shortly. allow StreamingUpdateSolrServer to use a provided HttpClient Key: SOLR-1532 URL: https://issues.apache.org/jira/browse/SOLR-1532 Project: Solr Issue Type: Improvement Components: clients - java Affects Versions: 1.4 Reporter: gabriele renzi Priority: Minor Fix For: 1.5 Attachments: SOLR-1532.patch, SOLR-1532.patch As of r830319 StreamingUpdateSolrServer does not allow calling code to provide an HttpClient, and this implies client code cannot reuse an existing connection manager, the patch adds a new constructor and refactors the old one to use this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-1532) allow StreamingUpdateSolrServer to use a provided HttpClient
[ https://issues.apache.org/jira/browse/SOLR-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-1532. - Resolution: Fixed Assignee: Shalin Shekhar Mangar Committed revision 890769. Thanks Gabriele! allow StreamingUpdateSolrServer to use a provided HttpClient Key: SOLR-1532 URL: https://issues.apache.org/jira/browse/SOLR-1532 Project: Solr Issue Type: Improvement Components: clients - java Affects Versions: 1.4 Reporter: gabriele renzi Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 1.5 Attachments: SOLR-1532.patch, SOLR-1532.patch As of r830319 StreamingUpdateSolrServer does not allow calling code to provide an HttpClient, and this implies client code cannot reuse an existing connection manager, the patch adds a new constructor and refactors the old one to use this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-1131:
Attachment: SOLR-1131.patch

I guess Noble was referring to something like what is done in this patch.

# DelegatingFieldType has a new method:
{code}
public SchemaField[] getSubFields(SchemaField mainField);
{code}
# PointType and PlusMinusField implement this new method. It is not the prettiest way but this is one way to do it.
# With this approach, we can get the names from the subFields wherever the name is used (not implemented in this patch).

The PlusMinusField is actually a field type and not a field so we should probably rename it to PlusMinusFieldType.

Allow a single field type to index multiple fields
Key: SOLR-1131
URL: https://issues.apache.org/jira/browse/SOLR-1131
Project: Solr
Issue Type: New Feature
Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
Fix For: 1.5
Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.Mattmann.121109.patch.txt, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch

In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. The concept "point" may be best indexed in a variety of ways:
* geohash (single lucene field)
* lat field, lon field (two double fields)
* cartesian tiers (a series of fields with tokens to say if it exists within that region)
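To make the getSubFields idea concrete, here is a small sketch using simplified stand-in classes. Everything ending in "Sketch" is hypothetical, and the _lat/_lon naming scheme is an assumption; only the method shape (one main field in, an array of sub fields out) comes from the comment above. The real patch works against Solr's own SchemaField and FieldType classes.

```java
// Hypothetical, simplified stand-in for Solr's SchemaField.
final class SchemaFieldSketch {
    final String name;
    final String typeName;

    SchemaFieldSketch(String name, String typeName) {
        this.name = name;
        this.typeName = typeName;
    }

    @Override
    public String toString() {
        return name + ":" + typeName;
    }
}

// Stand-in for the delegating field type described in the comment.
abstract class DelegatingFieldTypeSketch {
    // One logical field fans out to the concrete fields that get indexed.
    abstract SchemaFieldSketch[] getSubFields(SchemaFieldSketch mainField);
}

// Stand-in for a point type indexed as two double fields.
final class PointTypeSketch extends DelegatingFieldTypeSketch {
    @Override
    SchemaFieldSketch[] getSubFields(SchemaFieldSketch mainField) {
        // Assumed naming scheme: <main>_lat and <main>_lon, both doubles.
        return new SchemaFieldSketch[] {
            new SchemaFieldSketch(mainField.name + "_lat", "double"),
            new SchemaFieldSketch(mainField.name + "_lon", "double")
        };
    }
}
```

Code that currently uses the main field's name could instead iterate over the returned array, which is the third point in the comment.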
Re: NPE in MoreLikeThis referenced doc not found and debugQuery=True
On Thu, Dec 10, 2009 at 6:34 PM, david.stu...@progressivealliance.co.uk david.stu...@progressivealliance.co.uk wrote:

Hi All,

When I do a specific MLT search on a document with debugQuery=True I am getting a NullPointerException both on screen and in my catalina logs. The query is as follows:

http://localhost:8080/solr2/select/?mlt.minwl=3&mlt.fl=body&mlt.mintf=1&mlt.maxwl=15&mlt.maxqt=20&version=1.2&rows=5&mlt.mindf=1&fl=nid,title,path,url,digest,teaser&start=0&q=nid:16036&qt=mlt&debugQuery=true

Is this desired behavior?

java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.solr.search.QueryParsing.toString(QueryParsing.java:470)
    at org.apache.solr.util.SolrPluginUtils.doStandardDebug(SolrPluginUtils.java:399)
    at org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLikeThisHandler.java:189)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:637)
Caused by: java.lang.NullPointerException
    at org.apache.solr.search.QueryParsing.toString(QueryParsing.java:439)
    at org.apache.solr.search.QueryParsing.toString(QueryParsing.java:467)
    ... 18 more

Apologies if this has been discussed or deemed desired, but thought I would mention this and offer a patch as an entry into helping with the project.

Thanks for reporting this Dave. It'd be great if you can open a Jira issue and attach a unit test reproducing this issue. A fix would be even better :)

http://wiki.apache.org/solr/HowToContribute

--
Regards,
Shalin Shekhar Mangar.
[jira] Commented: (SOLR-17) XSD for solr requests/responses
[ https://issues.apache.org/jira/browse/SOLR-17?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790910#action_12790910 ]

Shalin Shekhar Mangar commented on SOLR-17:

Chris, it seems that you are taking my comment personally. Please don't; it is not my intention to ridicule anyone's efforts. As you can see, this issue has been open for some time now and a major reason is that we have never found a good use for an XSD. I'm merely trying to say that it seems like we're trying to _find_ use-cases for a solution instead of starting with an actual need.

My point is that Solr can use it if we _want_ to but Solr certainly does not _need_ to use it. I don't think we gain much by an XSD.

XSD for solr requests/responses
Key: SOLR-17
URL: https://issues.apache.org/jira/browse/SOLR-17
Project: Solr
Issue Type: Improvement
Reporter: Mike Baranczak
Priority: Minor
Attachments: solr-complex.xml, solr-rev2.xsd, solr.xsd, UselessRequestHandler.java

Attaching an XML schema definition for the responses and the update requests. I needed to do this for myself anyway, so I might as well contribute it to the project. At the moment, I have no plans to write an XSD for the config documents, but it wouldn't be a bad idea.

TODO: change the schema URL. I'm guessing that Apache already has some sort of naming convention for these?
Re: ValueSourceParser problem
On Wed, Dec 16, 2009 at 11:01 AM, patrick o'leary pj...@pjaol.com wrote:

#2 There's an AbstractMethodError when you extend ValueSourceParser and don't override the init(NamedList args) method, because SolrCore:~439 createInitInstance casts the plugin class as a NamedListInitializedPlugin and calls ((NamedListInitializedPlugin) o).init(info.initArgs); If your extended ValueSourceParser class doesn't provide an override, then there's nothing that implements the base interface from NamedListInitializedPlugin.

ValueSourceParser in trunk has an empty init method so you should never get an AbstractMethodError. Can you check again?

--
Regards,
Shalin Shekhar Mangar.
Re: ValueSourceParser problem
On Wed, Dec 16, 2009 at 11:32 AM, patrick o'leary pj...@pjaol.com wrote:

Check SolrCore.createInitInstance. It casts your CustomValueSourceParser as a NamedListInitializedPlugin, which is an interface, thus the AbstractMethodError, as there isn't a concrete implementation of init. If it were cast as a ValueSourceParser in SolrCore then it would be fine.

That is not possible. Even though the object is cast to an interface NamedListInitializedPlugin, it is still an instance of ValueSourceParser and therefore it does have an implementation of the init method. Am I missing something?

--
Regards,
Shalin Shekhar Mangar.
Re: ValueSourceParser problem
On Wed, Dec 16, 2009 at 11:58 AM, patrick o'leary pj...@pjaol.com wrote:

SEVERE: java.lang.AbstractMethodError
    at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:439)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525)
    at org.apache.solr.core.SolrCore.initValueSourceParsers(SolrCore.java:1469)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:549)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
    at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)

And svn info

Path: .
URL: http://svn.apache.org/repos/asf/lucene/solr/trunk
Repository Root: http://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 891117
Node Kind: directory
Schedule: normal
Last Changed Author: koji
Last Changed Rev: 890798
Last Changed Date: 2009-12-15 06:13:59 -0800 (Tue, 15 Dec 2009)

I just wrote a custom ValueSourceParser which does not override the init method and it loads fine on current trunk. Can you share your code?

--
Regards,
Shalin Shekhar Mangar.
Re: ValueSourceParser problem
On Wed, Dec 16, 2009 at 12:39 PM, patrick o'leary pj...@pjaol.com wrote:

Yeah.. can't release that part mate, all you need is

    package com.pjaol;

    import org.apache.lucene.queryParser.ParseException;
    import org.apache.solr.search.FunctionQParser;
    import org.apache.solr.search.ValueSourceParser;
    import org.apache.solr.search.function.ValueSource;

    public class CustomValueSourceParser extends ValueSourceParser {
      @Override
      public ValueSource parse(FunctionQParser fp) throws ParseException {
        System.out.println("*** Called");
        return null;
      }
    }

And

    <valueSourceParser name="social_a" class="com.pjaol.CustomValueSourceParser" />

in your solrconfig.xml. The parse method only gets called at query time.

Patrick, this works for me. The string is printed in the console. Your runtime classpath must have Solr 1.3 jars somewhere, because ValueSourceParser#init was abstract in 1.3:

http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.3/src/java/org/apache/solr/search/ValueSourceParser.java

--
Regards,
Shalin Shekhar Mangar.
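The point argued in this thread, that casting to an interface does not change which init implementation runs, can be checked with a tiny stand-alone sketch. All the type names below are hypothetical stand-ins for NamedListInitializedPlugin, ValueSourceParser, and a custom subclass; nothing here is Solr code.

```java
// Hypothetical stand-in for the NamedListInitializedPlugin interface.
interface InitializedPluginSketch {
    void init(String args);
}

// Stand-in for ValueSourceParser on trunk: init has a concrete body.
abstract class ParserBaseSketch implements InitializedPluginSketch {
    boolean initialized = false;

    @Override
    public void init(String args) {
        initialized = true;
    }

    abstract String parse(String input);
}

// Stand-in for the custom parser: it does not override init.
final class CustomParserSketch extends ParserBaseSketch {
    @Override
    String parse(String input) {
        return "parsed:" + input;
    }
}

final class CastDemo {
    static boolean initThroughInterface(Object plugin) {
        // Same shape as the cast in createInitInstance: the static type is the
        // interface, but the call dispatches to the inherited concrete method.
        ((InitializedPluginSketch) plugin).init("args");
        return ((ParserBaseSketch) plugin).initialized;
    }
}
```

When everything is compiled and run against the same class versions, the call through the interface reaches the inherited method. An AbstractMethodError at this spot usually means the class found at runtime is not the one the subclass was compiled against, which fits the explanation of stray Solr 1.3 jars on the classpath.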
[jira] Resolved: (SOLR-1651) Incorrect dataimport handler package name in SolrResourceLoader
[ https://issues.apache.org/jira/browse/SOLR-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-1651. - Resolution: Fixed Committed revision 890243. Thanks for the catch Akshay! Incorrect dataimport handler package name in SolrResourceLoader --- Key: SOLR-1651 URL: https://issues.apache.org/jira/browse/SOLR-1651 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 1.4 Reporter: Akshay K. Ukey Assignee: Shalin Shekhar Mangar Priority: Trivial Fix For: 1.5 Attachments: SOLR-1651.patch packages String array used by findClass method in SolrResourceLoader has value for dataimport handler package as handler.dataimport, must be handler.dataimport. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-1610) Add generics to SolrCache
[ https://issues.apache.org/jira/browse/SOLR-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-1610. - Resolution: Fixed Committed revision 890250. Thanks Jason! Add generics to SolrCache - Key: SOLR-1610 URL: https://issues.apache.org/jira/browse/SOLR-1610 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Jason Rutherglen Assignee: Shalin Shekhar Mangar Priority: Trivial Fix For: 1.5 Attachments: SOLR-1610.patch Seems fairly simple for SolrCache to have generics. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790577#action_12790577 ]

Shalin Shekhar Mangar commented on SOLR-1653:

bq. If there is no objections, I'll commit later today.

+1

Thanks Koji!

add PatternReplaceCharFilter
Key: SOLR-1653
URL: https://issues.apache.org/jira/browse/SOLR-1653
Project: Solr
Issue Type: New Feature
Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
Fix For: 1.5
Attachments: SOLR-1653.patch, SOLR-1653.patch

Add a new CharFilter that uses a regular expression for the target of replace string in char stream.

Usage:
{code:title=schema.xml}
<fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" groupedPattern="([nN][oO]\.)\s*(\d+)" replaceGroups="1,2" blockDelimiters=":;"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
{code}
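As a rough illustration of what the groupedPattern in the usage snippet matches, the sketch below applies the same regular expression with plain java.util.regex. It assumes that replaceGroups="1,2" means "emit group 1 followed by group 2"; that reading is an assumption, and the class name NoNumberNormalizer is hypothetical. It is not the CharFilter itself, which works on a character stream and also has to correct offsets.

```java
import java.util.regex.Pattern;

// Hypothetical helper that mimics the text transformation only.
final class NoNumberNormalizer {
    // Same regular expression as groupedPattern in the schema.xml snippet.
    private static final Pattern NO_NUMBER =
            Pattern.compile("([nN][oO]\\.)\\s*(\\d+)");

    private NoNumberNormalizer() {}

    static String normalize(String text) {
        // Assumption: keep group 1 and group 2, dropping the whitespace
        // that the pattern matches between them.
        return NO_NUMBER.matcher(text).replaceAll("$1$2");
    }
}
```

Under that assumption, normalize("see No.  11") would yield "see No.11", so a whitespace tokenizer further down the chain would see a single token for the number reference.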
[jira] Resolved: (SOLR-1643) remove DIH-extras package
[ https://issues.apache.org/jira/browse/SOLR-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar resolved SOLR-1643.
Resolution: Won't Fix

Reverted the previous commit and moved TikaEntityProcessor and tests to extras. Committed revision 890679.

remove DIH-extras package
Key: SOLR-1643
URL: https://issues.apache.org/jira/browse/SOLR-1643
Project: Solr
Issue Type: Sub-task
Components: contrib - DataImportHandler
Reporter: Noble Paul
Assignee: Shalin Shekhar Mangar
Fix For: 1.5
Attachments: SOLR-1643.patch, SOLR-1643.patch

Now that jars can be added directly using solrconfig.xml, we may not really need this extra package. We can compile and add this to the main dataimporthandler.jar and specify in the instructions how to include the jars for those components w/ external requirements such as MailEntityProcessor/TikaEntityProcessor.
[jira] Updated: (SOLR-1139) SolrJ TermsComponent Query and Response Support
[ https://issues.apache.org/jira/browse/SOLR-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-1139: Attachment: SOLR-1139.patch Updated patch for two params added by SOLR-1625. I'll commit this shortly. SolrJ TermsComponent Query and Response Support --- Key: SOLR-1139 URL: https://issues.apache.org/jira/browse/SOLR-1139 Project: Solr Issue Type: New Feature Components: clients - java Affects Versions: 1.4 Reporter: Matt Weber Assignee: Shalin Shekhar Mangar Priority: Minor Attachments: SOLR-1139-WITH_SORT_SUPPORT.patch, SOLR-1139.patch, SOLR-1139.patch, SOLR-1139.patch, SOLR-1139.patch, SOLR-1139.patch, SOLR-1139.patch, SOLR-1139.patch SolrJ should support the new TermsComponent that was introduced in Solr 1.4. It should be able to: - set TermsComponent query parameters via SolrQuery - parse the TermsComponent response -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-1139) SolrJ TermsComponent Query and Response Support
[ https://issues.apache.org/jira/browse/SOLR-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-1139. - Resolution: Fixed Fix Version/s: 1.5 Committed revision 890053. Thanks Matt! SolrJ TermsComponent Query and Response Support --- Key: SOLR-1139 URL: https://issues.apache.org/jira/browse/SOLR-1139 Project: Solr Issue Type: New Feature Components: clients - java Affects Versions: 1.4 Reporter: Matt Weber Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 1.5 Attachments: SOLR-1139-WITH_SORT_SUPPORT.patch, SOLR-1139.patch, SOLR-1139.patch, SOLR-1139.patch, SOLR-1139.patch, SOLR-1139.patch, SOLR-1139.patch, SOLR-1139.patch SolrJ should support the new TermsComponent that was introduced in Solr 1.4. It should be able to: - set TermsComponent query parameters via SolrQuery - parse the TermsComponent response -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1177) Distributed TermsComponent
[ https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-1177:
Attachment: SOLR-1177.patch

{code}
if (tc.getFrequency() >= freqmin && tc.getFrequency() <= freqmax) {
  fieldterms.add(tc.getTerm(), ((Number)tc.getFrequency()).intValue());
  cnt++;
}
{code}

I changed freqmin and freqmax to long and used Yonik's method to write int if possible or else switch to longs in the response. I'll commit this shortly.

Distributed TermsComponent
Key: SOLR-1177
URL: https://issues.apache.org/jira/browse/SOLR-1177
Project: Solr
Issue Type: Improvement
Affects Versions: 1.4
Reporter: Matt Weber
Assignee: Shalin Shekhar Mangar
Priority: Minor
Fix For: 1.5
Attachments: SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, TermsComponent.java, TermsComponent.patch

TermsComponent should be distributed
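The change described in this comment (long bounds, and an int in the response when the count fits) might look something like the following stand-alone sketch. The class TermFreqFilter and its methods are hypothetical, and this is not the committed code; it only illustrates the range check and the int-or-long choice using plain collections.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not the TermsComponent implementation.
final class TermFreqFilter {
    private TermFreqFilter() {}

    // Returns an Integer when the value fits in an int, otherwise a Long.
    static Number intIfPossible(long value) {
        if (value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE) {
            return Integer.valueOf((int) value);
        }
        return Long.valueOf(value);
    }

    // Keeps only terms whose summed frequency lies in [freqmin, freqmax].
    static Map<String, Number> filter(Map<String, Long> counts,
                                      long freqmin, long freqmax) {
        Map<String, Number> out = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            long freq = e.getValue();
            if (freq >= freqmin && freq <= freqmax) {
                out.put(e.getKey(), intIfPossible(freq));
            }
        }
        return out;
    }
}
```

Long bounds matter in the distributed case because frequencies summed across shards can exceed the int range even when each shard's own count does not.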
[jira] Resolved: (SOLR-1177) Distributed TermsComponent
[ https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-1177. - Resolution: Fixed Committed revision 890199. Thanks Matt! Distributed TermsComponent -- Key: SOLR-1177 URL: https://issues.apache.org/jira/browse/SOLR-1177 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Matt Weber Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 1.5 Attachments: SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, TermsComponent.java, TermsComponent.patch TermsComponent should be distributed -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790026#action_12790026 ]

Shalin Shekhar Mangar commented on SOLR-1653:

Koji, even after reading through the test, I do not understand how to use it. Are the characters in curly braces written down for non-groups only? What if I want to remove one particular group? It is always good to write a use-case and an example in the issue description itself.

add PatternReplaceCharFilter
Key: SOLR-1653
URL: https://issues.apache.org/jira/browse/SOLR-1653
Project: Solr
Issue Type: New Feature
Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
Fix For: 1.5
Attachments: SOLR-1653.patch

Add a new CharFilter that uses a regular expression for the target of replace string in char stream.

Usage:
{code:title=schema.xml}
<fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" groupedPattern="([nN][oO]\.)\s*(\d+)" replaceGroups="1,2" blockDelimiters=":;"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
{code}
[jira] Commented: (SOLR-1177) Distributed TermsComponent
[ https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789790#action_12789790 ] Shalin Shekhar Mangar commented on SOLR-1177: - Thanks Matt. Can you please attach the relevant portions to SOLR-1139. We can commit SOLR-1139 first and then resolve this one. Distributed TermsComponent -- Key: SOLR-1177 URL: https://issues.apache.org/jira/browse/SOLR-1177 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Matt Weber Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 1.5 Attachments: SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, TermsComponent.java, TermsComponent.patch TermsComponent should be distributed -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (SOLR-1651) Incorrect dataimport handler package name in SolrResourceLoader
[ https://issues.apache.org/jira/browse/SOLR-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar reassigned SOLR-1651: --- Assignee: Shalin Shekhar Mangar Incorrect dataimport handler package name in SolrResourceLoader --- Key: SOLR-1651 URL: https://issues.apache.org/jira/browse/SOLR-1651 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 1.4 Reporter: Akshay K. Ukey Assignee: Shalin Shekhar Mangar Priority: Trivial Fix For: 1.5 Attachments: SOLR-1651.patch packages String array used by findClass method in SolrResourceLoader has value for dataimport handler package as handler.dataimport, must be handler.dataimport. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1177) Distributed TermsComponent
[ https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789795#action_12789795 ] Shalin Shekhar Mangar commented on SOLR-1177: - bq. The latest SOLR-1139 patch is included inside the latest patch I attached to this ticket. Should I separate them? Yes. I'll commit SOLR-1139 first so remove those classes from the current patch. PS: I'm sorry if I am confusing you. It is 3AM here and I'm a little confused myself :) Distributed TermsComponent -- Key: SOLR-1177 URL: https://issues.apache.org/jira/browse/SOLR-1177 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Matt Weber Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 1.5 Attachments: SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, TermsComponent.java, TermsComponent.patch TermsComponent should be distributed -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1652) Allow single unit test to be executed from SOLR build.xml
[ https://issues.apache.org/jira/browse/SOLR-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789803#action_12789803 ]

Shalin Shekhar Mangar commented on SOLR-1652:

This capability already exists.

Run a single test using:
    ant -Dtestcase=TestDistributedSearch clean test

Run tests inside a package (recursively):
    ant -Dtestpackage=org.apache.solr.handler clean test

Run tests in package root:
    ant -Dtestpackageroot=org.apache.solr.handler clean test

The above will exclude packages inside handler such as admin and component.

Allow single unit test to be executed from SOLR build.xml
Key: SOLR-1652
URL: https://issues.apache.org/jira/browse/SOLR-1652
Project: Solr
Issue Type: New Feature
Components: Build
Affects Versions: 1.2, 1.3, 1.4
Environment: My local MacBook
Reporter: Chris A. Mattmann
Fix For: 1.5

While playing around and running someone's example code in the form of a test, I realized it might be nice to run a single test from the ant command line when testing SOLR. To my knowledge, there is no way to do this. So, I googled around and found a nice way of doing it. I'll contribute a patch that allows you to do:

    ant runtest -Dtest=<fully qualified class name or just class name no package> [-Dargs=<jvm args for junit>]

which will run one of SOLR's unit tests at a time. You can also use *'s in the -Dtest= to run many test cases that match the * expression too.