[jira] Commented: (SOLR-1204) Enhance SpellingQueryConverter to handle UTF-8 instead of ASCII only
[ https://issues.apache.org/jira/browse/SOLR-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751337#action_12751337 ] Shalin Shekhar Mangar commented on SOLR-1204: - bq. Since this ticket is marked resolved, I filed SOLR-1407 to point out some closely related problems. Yes, that is how I remembered this one :) Enhance SpellingQueryConverter to handle UTF-8 instead of ASCII only Key: SOLR-1204 URL: https://issues.apache.org/jira/browse/SOLR-1204 Project: Solr Issue Type: Improvement Components: spellchecker Affects Versions: 1.3 Reporter: Michael Ludwig Assignee: Shalin Shekhar Mangar Priority: Trivial Fix For: 1.4 Attachments: SpellingQueryConverter.java.diff, SpellingQueryConverter.java.diff Solr - User - SpellCheckComponent: queryAnalyzerFieldType http://www.nabble.com/SpellCheckComponent%3A-queryAnalyzerFieldType-td23870668.html In the above thread, it was suggested to extend the SpellingQueryConverter to cover the full UTF-8 range instead of handling US-ASCII only. This might be as simple as changing the regular expression used to tokenize the input string to accept a sequence of one or more Unicode letters ( \p{L}+ ) instead of a sequence of one or more word characters ( \w+ ). See http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html for Java regular expression reference. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
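The regex change proposed above can be seen directly with Java's Pattern class: by default, \w in Java matches only ASCII word characters, while \p{L} matches any Unicode letter. A minimal sketch (class and method names are illustrative, not Solr code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenRegexDemo {
    // Collect every match of the given regex in the input string.
    static List<String> tokens(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) out.add(m.group());
        return out;
    }

    public static void main(String[] args) {
        // \w+ is ASCII-only by default in Java, so the "ü" splits the token.
        System.out.println(tokens("\\w+", "münchen"));    // [m, nchen]
        // \p{L}+ accepts any Unicode letter, keeping the token whole.
        System.out.println(tokens("\\p{L}+", "münchen")); // [münchen]
    }
}
```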
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751358#action_12751358 ] Martijn van Groningen commented on SOLR-236: Hi Abdul, nice improvements. It makes absolute sense to keep the field values around during the collapsing as a StringIndex. From what I understand, the StringIndex does not have duplicate string values, whereas the plain string array does. This will lower the memory footprint. I will add these improvements to the next patch. Thanks for pointing this out! Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Fix For: 1.5 Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch This patch includes a new feature called Field collapsing. It is used to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection.
http://www.fastsearch.com/glossary.aspx?m=48&amid=299 The implementation adds 3 new query parameters (SolrParams): collapse.field to choose the field used to group results, collapse.type normal (default value) or adjacent, collapse.max to select how many continuous results are allowed before collapsing. TODO (in progress): - More documentation (on source code) - Test cases Two patches: - field_collapsing.patch for current development version - field_collapsing_1.1.0.patch for Solr-1.1.0 P.S.: Feedback and misspelling correction are welcome ;-) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1410) remove deprecated custom encoding support in russian/greek analysis
[ https://issues.apache.org/jira/browse/SOLR-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751374#action_12751374 ] Shalin Shekhar Mangar commented on SOLR-1410: - bq. I don't think we've ever really had a situation like this ...logging a warning seems like the right course of action for now ... We actually have done this in DataImportHandler in relation to the syntax for evaluators. Logging a warning is the right way to go. remove deprecated custom encoding support in russian/greek analysis --- Key: SOLR-1410 URL: https://issues.apache.org/jira/browse/SOLR-1410 Project: Solr Issue Type: Task Components: Analysis Reporter: Robert Muir Priority: Minor Attachments: SOLR-1410.patch In this case, analyzers have strange encoding support and it has been deprecated in Lucene. For example, someone using CP1251 in the Russian analyzer is simply storing Ж as 0xC6; it's being represented as Æ. LUCENE-1793: Deprecate the custom encoding support in the Greek and Russian Analyzers. If you need to index text in these encodings, please use Java's character set conversion facilities (InputStreamReader, etc.) during I/O, so that Lucene can analyze this text as Unicode instead. I noticed in Solr, the factories for these tokenstreams allow these configuration options, which are deprecated in 2.9 to be removed in 3.0. Let me know the policy (how do you deprecate a config option in Solr exactly, log a warning, etc.?) and I'd be happy to create a patch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
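The recommended replacement, converting at I/O time with InputStreamReader, can be sketched as follows. The 0xC6 byte is the example from the issue: it is Ж in CP1251 but Æ in Latin-1. The demo class is illustrative, not part of Lucene or Solr:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class Cp1251Demo {
    // Decode CP1251 bytes to Unicode at I/O time, as the Lucene deprecation
    // note recommends, instead of asking the analyzer to interpret a legacy
    // encoding.
    static String decode(byte[] cp1251Bytes) throws IOException {
        InputStreamReader reader = new InputStreamReader(
                new ByteArrayInputStream(cp1251Bytes), Charset.forName("windows-1251"));
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) sb.append((char) c);
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // 0xC6 is Cyrillic capital Zhe (Ж) in CP1251, but Æ in Latin-1 --
        // exactly the confusion described in the issue.
        System.out.println(decode(new byte[]{(byte) 0xC6})); // Ж
    }
}
```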
Hudson build is back to normal: Solr-trunk #914
See http://hudson.zones.apache.org/hudson/job/Solr-trunk/914/changes
[jira] Updated: (SOLR-1407) SpellingQueryConverter now disallows underscores and digits in field names (but allows all UTF-8 letters)
[ https://issues.apache.org/jira/browse/SOLR-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Ludwig updated SOLR-1407: - Attachment: SpellingQueryConverter.java As announced in SOLR-1204, I'm posting the version I had prepared back in June. Maybe it is useful, maybe not. The question of why there is this extra sequence of digits in the regular expression is still entirely unclear to me. Caveat emptor! SpellingQueryConverter now disallows underscores and digits in field names (but allows all UTF-8 letters) - Key: SOLR-1407 URL: https://issues.apache.org/jira/browse/SOLR-1407 Project: Solr Issue Type: Improvement Components: spellchecker Affects Versions: 1.3 Reporter: David Bowen Assignee: Shalin Shekhar Mangar Priority: Trivial Fix For: 1.4 Attachments: SpellingQueryConverter.java, SpellingQueryConverter.java SpellingQueryConverter was extended to cover the full UTF-8 range instead of handling US-ASCII only, but in the process it was broken for field names that contain underscores or digits. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
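The regression described here is easy to reproduce with plain java.util.regex: \p{L}+ alone rejects field names such as my_field2, while a character class that also admits Unicode digits and the underscore accepts them. The patterns below are illustrative only, not the exact expression shipped in SpellingQueryConverter:

```java
import java.util.regex.Pattern;

public class FieldNameRegexDemo {
    // \p{L}+ matches Unicode letters only, so underscores and digits in a
    // field name break the match.
    static final Pattern LETTERS_ONLY = Pattern.compile("\\p{L}+");
    // Adding Unicode digits (\p{N}) and the underscore restores support for
    // names like my_field2 while keeping full Unicode letter coverage.
    static final Pattern LETTERS_DIGITS_UNDERSCORE = Pattern.compile("[\\p{L}\\p{N}_]+");

    public static void main(String[] args) {
        System.out.println(LETTERS_ONLY.matcher("my_field2").matches());              // false
        System.out.println(LETTERS_DIGITS_UNDERSCORE.matcher("my_field2").matches()); // true
    }
}
```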
[jira] Commented: (SOLR-1408) Allow classes from ${solr.home}/lib to be loaded by the same classloader as solr war to prevent ClassCastException
[ https://issues.apache.org/jira/browse/SOLR-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751487#action_12751487 ] Luke Forehand commented on SOLR-1408: - This is also happening when I try to extend EventListener; I get the mysterious ClassCastException from within Solr. I am running Solr from a Jetty server, specifying solr.home using JNDI, and I am starting the Jetty server from within a unit test for integration testing purposes. Allow classes from ${solr.home}/lib to be loaded by the same classloader as solr war to prevent ClassCastException -- Key: SOLR-1408 URL: https://issues.apache.org/jira/browse/SOLR-1408 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler Affects Versions: 1.3 Reporter: Luke Forehand When extending org.apache.solr.handler.dataimport.DataSource, I would like to package my extended class in ${solr.home}/lib so that I can keep the vanilla copy of my solr.war intact. The problem is I encounter a ClassCastException when Solr tries to create a newInstance of my extended class, which I suspect has to do with the DataSource and my extended class being loaded from different classloaders. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
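The ClassCastException described here is the classic two-classloaders problem: the same .class file defined by two different loaders yields two distinct runtime types. A self-contained sketch of that effect (names are illustrative; this is not Solr's loader):

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

public class ClassLoaderDemo {
    public static class Plugin {}

    // A loader that defines Plugin from its own bytes instead of delegating
    // to the parent loader -- the situation a ${solr.home}/lib loader can end
    // up in relative to classes inside the war.
    static class IsolatingLoader extends ClassLoader {
        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.contains("Plugin")) return super.loadClass(name, resolve);
            try (InputStream in = ClassLoaderDemo.class.getResourceAsStream(
                    "/" + name.replace('.', '/') + ".class")) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                int b;
                while ((b = in.read()) != -1) buf.write(b);
                byte[] bytes = buf.toByteArray();
                return defineClass(name, bytes, 0, bytes.length);
            } catch (Exception e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    // Returns true only if the re-loaded class is the same runtime type.
    static boolean sameType() throws Exception {
        Class<?> reloaded = new IsolatingLoader().loadClass(Plugin.class.getName(), true);
        Object instance = reloaded.getDeclaredConstructor().newInstance();
        // Same .class file, different defining loader: instanceof is false,
        // and a cast to Plugin would throw ClassCastException.
        return instance instanceof Plugin;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sameType()); // false
    }
}
```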
[jira] Updated: (SOLR-1411) SolrJ SolrCell Request
[ https://issues.apache.org/jira/browse/SOLR-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated SOLR-1411: -- Attachment: SOLR-1411.patch Adds SolrCellRequest to the SolrJ common. Will commit in a day or two. SolrJ SolrCell Request -- Key: SOLR-1411 URL: https://issues.apache.org/jira/browse/SOLR-1411 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Trivial Fix For: 1.4 Attachments: SOLR-1411.patch Create a SolrRequest for SolrJ that can add Solr Cell documents (PDF, Word, etc.) to Solr for indexing. Patch shortly -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (SOLR-1400) Document with empty or white-space only string causes exception with TrimFilter
[ https://issues.apache.org/jira/browse/SOLR-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reassigned SOLR-1400: - Assignee: Grant Ingersoll Document with empty or white-space only string causes exception with TrimFilter --- Key: SOLR-1400 URL: https://issues.apache.org/jira/browse/SOLR-1400 Project: Solr Issue Type: Bug Components: update Affects Versions: 1.4 Reporter: Peter Wolanin Assignee: Grant Ingersoll Fix For: 1.4 Attachments: trim-example.xml Observed with Solr trunk. Posting any empty or whitespace-only string to a field using {code}<filter class="solr.TrimFilterFactory"/>{code} causes a Java exception:
{code}
Sep 1, 2009 4:58:09 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.ArrayIndexOutOfBoundsException: -1
at org.apache.solr.analysis.TrimFilter.incrementToken(TrimFilter.java:63)
at org.apache.solr.analysis.PatternReplaceFilter.incrementToken(PatternReplaceFilter.java:74)
at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:138)
at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:244)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:755)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2611)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2583)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
{code}
Trim of an empty or WS-only string should not fail. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
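The ArrayIndexOutOfBoundsException(-1) is consistent with a trim routine indexing backward from length - 1 on a zero-length token. A plain-Java sketch of trimming over a char buffer with the zero-length guard in place (illustrative only, not the actual TrimFilter code):

```java
public class TrimDemo {
    // Minimal sketch of in-buffer trimming over a char[] + length pair, the
    // shape Lucene token filters work with. The `end > start` bound means an
    // empty or all-whitespace token yields "" and never indexes at -1.
    static String trim(char[] buf, int len) {
        int start = 0;
        while (start < len && Character.isWhitespace(buf[start])) start++;
        int end = len;
        while (end > start && Character.isWhitespace(buf[end - 1])) end--;
        return new String(buf, start, end - start);
    }

    public static void main(String[] args) {
        System.out.println("[" + trim("  hi  ".toCharArray(), 6) + "]"); // [hi]
        System.out.println("[" + trim("   ".toCharArray(), 3) + "]");    // []
        System.out.println("[" + trim(new char[0], 0) + "]");            // []
    }
}
```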
Re: Lucene RC2
On Aug 29, 2009, at 3:38 PM, Yonik Seeley wrote: On Sat, Aug 29, 2009 at 5:44 PM, Bill Au <bill.w...@gmail.com> wrote: Yonik, Are you in the process of trying it out or upgrading Solr, or both? Bill It's done: http://svn.apache.org/viewvc?view=rev&revision=809010 You should add a note to CHANGES.txt.
[jira] Commented: (SOLR-1400) Document with empty or white-space only string causes exception with TrimFilter
[ https://issues.apache.org/jira/browse/SOLR-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12751511#action_12751511 ] Grant Ingersoll commented on SOLR-1400: --- Hmm, trimFilter has a test for all whitespace Document with empty or white-space only string causes exception with TrimFilter --- Key: SOLR-1400 URL: https://issues.apache.org/jira/browse/SOLR-1400 Project: Solr Issue Type: Bug Components: update Affects Versions: 1.4 Reporter: Peter Wolanin Assignee: Grant Ingersoll Fix For: 1.4 Attachments: trim-example.xml Observed with Solr trunk. Posting any empty or whitespace-only string to a field using the {code}filter class=solr.TrimFilterFactory /{code} Causes a java exception: {code} Sep 1, 2009 4:58:09 PM org.apache.solr.common.SolrException log SEVERE: java.lang.ArrayIndexOutOfBoundsException: -1 at org.apache.solr.analysis.TrimFilter.incrementToken(TrimFilter.java:63) at org.apache.solr.analysis.PatternReplaceFilter.incrementToken(PatternReplaceFilter.java:74) at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:138) at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:244) at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772) at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:755) at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2611) at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2583) at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61) at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) {code} Trim of an empty or WS-only string should not fail. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Solr development with IntelliJ IDEA - looking for advice
Grant, Are you able to run a single unit test from IDEA? How do you set up resource folders for tests in this case? Or do you run it manually from the command line via ant? Regards, Lukas On Thu, Sep 3, 2009 at 4:05 PM, Grant Ingersoll <gsing...@apache.org> wrote: I usually skip through the Wizard stuff as fast as possible and then just add the modules by hand, as IntelliJ thinks it is smart at this stuff when it really isn't. For the core Solr, I create a Project Library dependency that has 3 JAR Directories as dependencies: ./lib example/lib example/lib/jsp-2.1 YMMV. This is one place where Maven is _so much better_ than Ant. Point IntelliJ at the pom.xml, and you have it all set up, including all the submodules, etc. On Sep 3, 2009, at 6:42 AM, Lukáš Vlček wrote: Hello, I noticed that several developers (Yonik, Grant, ... ?) are using IntelliJ IDEA for Solr development. Is anybody willing to share his/her experience about how to set up and open the Solr project in IntelliJ IDEA? I am quite new to IntelliJ IDEA and I would greatly appreciate any *how-to* or *for dummies* step-by-step tutorial. I tried to create a new project in IDEA from existing sources (fresh solr-trunk) and simply followed the wizard, but this does not seem to be the best option (I am getting some circular dependencies and missing classpath issues). Note: I am using IntelliJ IDEA 8.1.3 Regards, Lukas -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: capturing field length into a stored document field
Sorry wrong list mike.schultz wrote: For various statistics I collect from an index it's important for me to know the length (measured in tokens) of a document field. I can get that information to some degree from the norms for the field but a) the resolution isn't that great, and b) more importantly, if boosts are used it's almost impossible to get lengths from this. Here's two ideas I was thinking about that maybe some can comment on. 1) Use copyto to copy the field in question, fieldA to an addition field, fieldALength, which has an extra filter that just counts the tokens and only outputs a token representing the length of the field. This has the disadvantage of retokenizing basically the whole document (because the field in question is basically the body). Plus I would think littering the term space with these tokens might be bad for performance, I'm not sure. 2) Add a filter to the field in question which again counts the tokens. This filter allows the regular tokens to be indexed as usual but somehow manages to get the token-count into a stored field of the document. This has the advantage of not having to retokenize the field and instead of littering the token space, the count becomes docdata for each doc. Can this be done? Maybe using threadLocal to temporarily store the count? Thanks. -- View this message in context: http://www.nabble.com/capturing-field-length-into-a-stored-document-field-tp25297597p25297661.html Sent from the Solr - Dev mailing list archive at Nabble.com.
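Idea (2) above, a pass-through stage that counts tokens while letting them through unchanged, can be sketched in plain Java with an Iterator standing in for a Lucene TokenStream. All names here are illustrative, not an actual Lucene/Solr API:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Forwards tokens unchanged while keeping a running count, which could then
// be stored per document as the field length. Plain-Java stand-in for a
// Lucene TokenFilter; this is a sketch, not the proposed implementation.
public class CountingTokenFilter implements Iterator<String> {
    private final Iterator<String> input;
    private int count = 0;

    public CountingTokenFilter(Iterator<String> input) { this.input = input; }

    public boolean hasNext() { return input.hasNext(); }

    public String next() { count++; return input.next(); }

    public int getCount() { return count; }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "quick", "brown", "fox");
        CountingTokenFilter f = new CountingTokenFilter(tokens.iterator());
        while (f.hasNext()) f.next(); // consume (index) the tokens as usual
        System.out.println(f.getCount()); // 4 -- the field-length value to store
    }
}
```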
[jira] Updated: (SOLR-1400) Document with empty or white-space only string causes exception with TrimFilter
[ https://issues.apache.org/jira/browse/SOLR-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated SOLR-1400: -- Attachment: SOLR-1400.patch Try this out. Document with empty or white-space only string causes exception with TrimFilter --- Key: SOLR-1400 URL: https://issues.apache.org/jira/browse/SOLR-1400 Project: Solr Issue Type: Bug Components: update Affects Versions: 1.4 Reporter: Peter Wolanin Assignee: Grant Ingersoll Fix For: 1.4 Attachments: SOLR-1400.patch, trim-example.xml Observed with Solr trunk. Posting any empty or whitespace-only string to a field using the {code}filter class=solr.TrimFilterFactory /{code} Causes a java exception: {code} Sep 1, 2009 4:58:09 PM org.apache.solr.common.SolrException log SEVERE: java.lang.ArrayIndexOutOfBoundsException: -1 at org.apache.solr.analysis.TrimFilter.incrementToken(TrimFilter.java:63) at org.apache.solr.analysis.PatternReplaceFilter.incrementToken(PatternReplaceFilter.java:74) at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:138) at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:244) at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772) at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:755) at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2611) at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2583) at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61) at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) {code} Trim of an empty or WS-only string should not fail. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1406) Add ability to retrieve DataConfig from dataimport Context
[ https://issues.apache.org/jira/browse/SOLR-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751587#action_12751587 ] Luke Forehand commented on SOLR-1406: - I could extend FileListEntityProcessor if it was written in a more extensible way, for example, exposing its baseUrl and fileName private members with accessor methods, and refactoring some of the private methods that do fileName filtering so that they are reusable and protected. Add ability to retrieve DataConfig from dataimport Context -- Key: SOLR-1406 URL: https://issues.apache.org/jira/browse/SOLR-1406 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler Affects Versions: 1.4 Reporter: Luke Forehand Assignee: Noble Paul Attachments: SOLR-1406.patch The ability to retrieve the DataConfig is very useful for inspecting configuration attributes within an EventListener! -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Lucene RC2
I keep sending emails from the wrong account: attempt 2: I think it's kind of weird how we add an entry every update - IMO it should be one entry - "upgraded to Lucene 2.9". That's going to be the only change. - Mark http://www.lucidimagination.com (mobile) On Sep 4, 2009, at 12:03 PM, Grant Ingersoll <gsing...@apache.org> wrote: On Aug 29, 2009, at 3:38 PM, Yonik Seeley wrote: On Sat, Aug 29, 2009 at 5:44 PM, Bill Au <bill.w...@gmail.com> wrote: Yonik, Are you in the process of trying it out or upgrading Solr, or both? Bill It's done: http://svn.apache.org/viewvc?view=rev&revision=809010 You should add a note to CHANGES.txt.
[jira] Created: (SOLR-1412) Add solr-lucene-memory and solr-lucene-misc jars to maven repository
Add solr-lucene-memory and solr-lucene-misc jars to maven repository Key: SOLR-1412 URL: https://issues.apache.org/jira/browse/SOLR-1412 Project: Solr Issue Type: Wish Affects Versions: 1.4 Reporter: Igor Motov Priority: Minor Since solr-lucene-memory and solr-lucene-misc jars were added to the distribution (see [SOLR-804|https://issues.apache.org/jira/browse/SOLR-804]) it would make sense to add them to maven repository as well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1412) Add solr-lucene-memory and solr-lucene-misc jars to maven repository
[ https://issues.apache.org/jira/browse/SOLR-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Motov updated SOLR-1412: - Attachment: SOLR-1412.patch The patch that adds solr-lucene-misc and solr-lucene-memory to maven repository. Add solr-lucene-memory and solr-lucene-misc jars to maven repository Key: SOLR-1412 URL: https://issues.apache.org/jira/browse/SOLR-1412 Project: Solr Issue Type: Wish Affects Versions: 1.4 Reporter: Igor Motov Priority: Minor Attachments: SOLR-1412.patch Since solr-lucene-memory and solr-lucene-misc jars were added to the distribution (see [SOLR-804|https://issues.apache.org/jira/browse/SOLR-804]) it would make sense to add them to maven repository as well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751615#action_12751615 ] Abdul Chaudhry commented on SOLR-236: - If this helps you fix your unit tests: I fixed the unit tests by changing the CollapseFilter constructor that's used for testing to take a StringIndex, like so:

-  CollapseFilter(int collapseMaxDocs, int collapseTreshold) {
+  CollapseFilter(int collapseMaxDocs, int collapseTreshold, FieldCache.StringIndex index) {
+    this.collapseIndex = index;

and then I changed the unit test cases to move values into a StringIndex in CollapseFilterTest, like so:

 public void testNormalCollapse_collapseThresholdOne() {
-    collapseFilter = new CollapseFilter(Integer.MAX_VALUE, 1);
+    String[] values = new String[]{"a", "b", "c"};
+    int[] order = new int[]{0, 1, 0, 2, 1, 0, 1};
+    FieldCache.StringIndex index = new FieldCache.StringIndex(order, values);
+    int[] docIds = new int[]{1, 2, 0, 3, 4, 5, 6};
+
+    collapseFilter = new CollapseFilter(Integer.MAX_VALUE, 1, index);
-    String[] values = new String[]{"a", "b", "a", "c", "b", "a", "b"};

Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Fix For: 1.5 Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch This
patch includes a new feature called Field collapsing. It is used to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection. http://www.fastsearch.com/glossary.aspx?m=48&amid=299 The implementation adds 3 new query parameters (SolrParams): collapse.field to choose the field used to group results, collapse.type normal (default value) or adjacent, collapse.max to select how many continuous results are allowed before collapsing. TODO (in progress): - More documentation (on source code) - Test cases Two patches: - field_collapsing.patch for current development version - field_collapsing_1.1.0.patch for Solr-1.1.0 P.S.: Feedback and misspelling correction are welcome ;-) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
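For readers unfamiliar with the FieldCache.StringIndex mentioned in the comments: it stores each distinct field value once in a lookup array, plus a per-document order array of indexes into it, which is why it is cheaper than one String per document. A hypothetical sketch that builds the same two arrays used in the test snippet above (first-seen order here; the real FieldCache keeps its lookup sorted):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StringIndexDemo {
    // order[doc] is an index into lookup; lookup holds each distinct value once.
    static int[] order;
    static String[] lookup;

    static void build(String[] perDocValues) {
        Map<String, Integer> ords = new LinkedHashMap<>();
        order = new int[perDocValues.length];
        for (int doc = 0; doc < perDocValues.length; doc++) {
            // Assign the next ordinal the first time a value is seen.
            order[doc] = ords.computeIfAbsent(perDocValues[doc], v -> ords.size());
        }
        lookup = ords.keySet().toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Per-document values from the unit-test snippet above.
        build(new String[]{"a", "b", "a", "c", "b", "a", "b"});
        System.out.println(java.util.Arrays.toString(lookup)); // [a, b, c]
        System.out.println(java.util.Arrays.toString(order));  // [0, 1, 0, 2, 1, 0, 1]
    }
}
```

Seven documents collapse to three stored strings, which is the memory saving Martijn and Abdul discuss.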
Re: Solr development with IntelliJ IDEA - looking for advice
On Fri, Sep 4, 2009 at 9:43 PM, Lukáš Vlček lukas.vl...@gmail.com wrote: Grant, Are you able to run single unit test from IDEA? How do you setup resource folders for tests in this case? Or do you run it manually from command line via ant? To run a test from IDEA, set the start path (I don't remember the exact name) to src/test/test-files. To run only one single test from ant use -Dtestcase=class-name -- Regards, Shalin Shekhar Mangar.
[jira] Commented: (SOLR-1406) Add ability to retrieve DataConfig from dataimport Context
[ https://issues.apache.org/jira/browse/SOLR-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751638#action_12751638 ] Shalin Shekhar Mangar commented on SOLR-1406: - bq. I could extend FileListEntityProcessor if it was written in a more extensible way, for example, exposing its baseUrl and fileName private members with accessor methods, and refactoring some of the private methods that do fileName filtering so that they are reusable and protected. Ah, I see. Well, that is easier than exposing DataConfig. DataConfig was never really meant to be exposed. We need to have another look at DataConfig before making it a public API. How about you create an issue (or rename this one) to make FileListEntityProcessor more extensible rather than exposing DataConfig? We can get that in for 1.4. Add ability to retrieve DataConfig from dataimport Context -- Key: SOLR-1406 URL: https://issues.apache.org/jira/browse/SOLR-1406 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler Affects Versions: 1.4 Reporter: Luke Forehand Assignee: Noble Paul Attachments: SOLR-1406.patch The ability to retrieve the DataConfig is very useful for inspecting configuration attributes within an EventListener! -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Lucene RC2
It's very useful to know the rev # in a place that doesn't require: 1) starting up Solr, 2) unpacking the Lucene jar. But yeah, we could just have one entry at the top or something that just lists what the current version and rev # are. On Sep 4, 2009, at 2:41 PM, Mark Miller wrote: I keep sending emails from the wrong account: attempt 2: I think it's kind of weird how we add an entry every update - IMO it should be one entry - "upgraded to Lucene 2.9". That's going to be the only change. - Mark http://www.lucidimagination.com (mobile) On Sep 4, 2009, at 12:03 PM, Grant Ingersoll <gsing...@apache.org> wrote: On Aug 29, 2009, at 3:38 PM, Yonik Seeley wrote: On Sat, Aug 29, 2009 at 5:44 PM, Bill Au <bill.w...@gmail.com> wrote: Yonik, Are you in the process of trying it out or upgrading Solr, or both? Bill It's done: http://svn.apache.org/viewvc?view=rev&revision=809010 You should add a note to CHANGES.txt.
Re: Lucene RC2
+1 - I'm not against knowing what the last rev upgraded to was - I also think that's important. It just seems the Changes log should read what changed from 1.3, or else it's a little confusing. You could make another argument with so many on trunk - but in my mind, the only thing those going from 1.3 to 1.4 should need to worry about is "upgraded to 2.9" - not follow the whole dev path as changes invalidate changes. Not a big deal if I am the only one that thinks that, just a thought. If we didn't do it in general, it wouldn't matter if we didn't do it with the Lucene upgrade though. - Mark Grant Ingersoll wrote: It's very useful to know the rev # in a place that doesn't require: 1) starting up Solr, 2) unpacking the Lucene jar. But yeah, we could just have one entry at the top or something that just lists what the current version and rev # are. On Sep 4, 2009, at 2:41 PM, Mark Miller wrote: I keep sending emails from the wrong account: attempt 2: I think it's kind of weird how we add an entry every update - IMO it should be one entry - "upgraded to Lucene 2.9". That's going to be the only change. - Mark http://www.lucidimagination.com (mobile) On Sep 4, 2009, at 12:03 PM, Grant Ingersoll <gsing...@apache.org> wrote: On Aug 29, 2009, at 3:38 PM, Yonik Seeley wrote: On Sat, Aug 29, 2009 at 5:44 PM, Bill Au <bill.w...@gmail.com> wrote: Yonik, Are you in the process of trying it out or upgrading Solr, or both? Bill It's done: http://svn.apache.org/viewvc?view=rev&revision=809010 You should add a note to CHANGES.txt. -- - Mark http://www.lucidimagination.com
[jira] Updated: (SOLR-1406) Refactor FileDataSource and FileListEntityProcessor to be more extendable
[ https://issues.apache.org/jira/browse/SOLR-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Forehand updated SOLR-1406: Description: FileDataSource should make openStream method protected so we can extend FileDataSource for other File types such as GZip, by controlling the underlying InputStreamReader implementation being returned. FileListEntityProcessor needs to aggregate a list of files that were processed and expose that list in an accessible way so that further processing on that file list can be done in the close method. For example, deletion or archiving. Another improvement would be that in the event of an indexing rollback event, processing of the close method either does not occur, or the close method is allowed access to that event, to prevent processing within the close method if necessary. was:The ability to retrieve the DataConfig is very useful for inspecting configuration attributes within an EventListener! Summary: Refactor FileDataSource and FileListEntityProcessor to be more extendable (was: Add ability to retrieve DataConfig from dataimport Context) Refactor FileDataSource and FileListEntityProcessor to be more extendable - Key: SOLR-1406 URL: https://issues.apache.org/jira/browse/SOLR-1406 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler Affects Versions: 1.4 Reporter: Luke Forehand Assignee: Noble Paul Attachments: SOLR-1406.patch FileDataSource should make openStream method protected so we can extend FileDataSource for other File types such as GZip, by controlling the underlying InputStreamReader implementation being returned. FileListEntityProcessor needs to aggregate a list of files that were processed and expose that list in an accessible way so that further processing on that file list can be done in the close method. For example, deletion or archiving. 
Another improvement: in the event of an indexing rollback, the close method either should not be invoked, or should be given access to the rollback event, so that it can skip its processing if necessary. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
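The refactoring proposed above can be illustrated with a self-contained sketch. Note this is not Solr's actual API: SimpleFileDataSource, readAll, and the other names below are stand-ins invented for illustration. The point is the hook being requested: once openStream is protected, a subclass can swap in a GZIP-aware Reader while the rest of the read path stays untouched.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Stand-in for FileDataSource, with the proposed protected openStream hook.
class SimpleFileDataSource {
    // protected hook: subclasses control which Reader implementation is returned
    protected Reader openStream(File file) throws IOException {
        return new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8);
    }

    // shared read path, unaware of how the stream was opened
    public String readAll(File file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader r = new BufferedReader(openStream(file))) {
            int c;
            while ((c = r.read()) != -1) sb.append((char) c);
        }
        return sb.toString();
    }
}

// GZip variant: only the stream-opening hook changes.
class GzipFileDataSource extends SimpleFileDataSource {
    @Override
    protected Reader openStream(File file) throws IOException {
        return new InputStreamReader(
                new GZIPInputStream(new FileInputStream(file)), StandardCharsets.UTF_8);
    }
}

public class GzipDataSourceDemo {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("demo", ".gz");
        f.deleteOnExit();
        try (Writer w = new OutputStreamWriter(
                new GZIPOutputStream(new FileOutputStream(f)), StandardCharsets.UTF_8)) {
            w.write("hello from a gzipped file");
        }
        System.out.println(new GzipFileDataSource().readAll(f));
        // prints: hello from a gzipped file
    }
}
```

The same pattern would serve other compressed or encoded file types: each variant overrides only the one protected method.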
[jira] Updated: (SOLR-1408) Classes in ${solr.home}/lib are not able to extend classes loaded by solr war - ClassCastException
[ https://issues.apache.org/jira/browse/SOLR-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Forehand updated SOLR-1408: Description: When extending org.apache.solr.handler.dataimport.DataSource, I would like to package my extended class in ${solr.home}/lib so that I can keep the vanilla copy of my solr.war intact. The problem is I encounter a ClassCastException when Solr tries to create a new instance of my extended class. Although the parent of the ${solr.home}/lib classloader loads DataSource, I am still getting a ClassCastException when a class in ${solr.home}/lib extends DataSource. The Solr instance is being deployed to a Jetty Plus server that is running inside a unit test. was:When extending org.apache.solr.handler.dataimport.DataSource, I would like to package my extended class in ${solr.home}/lib so that I can keep the vanilla copy of my solr.war intact. The problem is I encounter a ClassCastException when Solr tries to create a new instance of my extended class, which I suspect has to do with the DataSource and my extended class being loaded from different classloaders. Issue Type: Bug (was: Improvement) Summary: Classes in ${solr.home}/lib are not able to extend classes loaded by solr war - ClassCastException (was: Allow classes from ${solr.home}/lib to be loaded by the same classloader as solr war to prevent ClassCastException) Classes in ${solr.home}/lib are not able to extend classes loaded by solr war - ClassCastException -- Key: SOLR-1408 URL: https://issues.apache.org/jira/browse/SOLR-1408 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 1.3 Reporter: Luke Forehand When extending org.apache.solr.handler.dataimport.DataSource, I would like to package my extended class in ${solr.home}/lib so that I can keep the vanilla copy of my solr.war intact. The problem is I encounter a ClassCastException when Solr tries to create a new instance of my extended class. 
Although the parent of the ${solr.home}/lib classloader loads DataSource, I am still getting a ClassCastException when a class in ${solr.home}/lib extends DataSource. The Solr instance is being deployed to a Jetty Plus server that is running inside a unit test. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1408) Classes in ${solr.home}/lib are not able to extend classes loaded by solr war - ClassCastException
[ https://issues.apache.org/jira/browse/SOLR-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12751684#action_12751684 ] Avlesh Singh commented on SOLR-1408: bq. I am starting the jetty server from within a unit test for integration testing purposes. Does it fail only in unit testing? I suspect the problem lies there. I too have similar extensions of DIH and UpdateProcessors, which live in the lib directory, and I have never faced any such issue on any of the platforms. Classes in ${solr.home}/lib are not able to extend classes loaded by solr war - ClassCastException -- Key: SOLR-1408 URL: https://issues.apache.org/jira/browse/SOLR-1408 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 1.3 Reporter: Luke Forehand When extending org.apache.solr.handler.dataimport.DataSource, I would like to package my extended class in ${solr.home}/lib so that I can keep the vanilla copy of my solr.war intact. The problem is I encounter a ClassCastException when Solr tries to create a new instance of my extended class. Although the parent of the ${solr.home}/lib classloader loads DataSource, I am still getting a ClassCastException when a class in ${solr.home}/lib extends DataSource. The Solr instance is being deployed to a Jetty Plus server that is running inside a unit test. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1401) solr should error on document add/update if uniqueKey field has multiple tokens.
[ https://issues.apache.org/jira/browse/SOLR-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12751692#action_12751692 ] Igor Motov commented on SOLR-1401: -- It might be helpful to expand this to other non-trivial analyzers as well. Even if an analyzer produces a single token, duplicate removal and distributed search don't function properly for any ids that were modified by the analyzer. To see this, just change the type of the id field to textTight and add a record with id ID twice. The textTight analyzer produces a single token for this value, and yet the record appears twice in the result list. At the same time, in distributed search (even with a single shard), these records completely disappear from the result list. This problem, combined with the recommendation to use textTight for SKUs in the example schema.xml, causes problems for some novice users. Frequently, the SKU is a natural id, and changing the type of the id field from string to textTight is one of the first schema modifications some users make; it then takes them days to figure out the problem: http://www.nabble.com/uniqueKey-gives-duplicate-values-td15341288.html http://www.nabble.com/Adding-new-docs%2C-but-duplicating-instead-of-updating-td25241444.html http://www.nabble.com/Solr-Shard---Strange-results-td23561201.html http://www.nabble.com/Shard-Query-Problem-td22110121.html solr should error on document add/update if uniqueKey field has multiple tokens. Key: SOLR-1401 URL: https://issues.apache.org/jira/browse/SOLR-1401 Project: Solr Issue Type: Improvement Reporter: Hoss Man Over the years, I have seen more than a few solr-user posts noticing odd behavior when using a uniqueKey field configured to use TextField with a non-trivial analyzer ... we shouldn't error on TextField (KeywordTokenizer is perfectly legitimate) but we should error if that analyzer produces multiple tokens. 
Likewise, we should verify that a good error message is produced if the uniqueKey field is configured with multiValued=true. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
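The add-time check proposed here could look roughly like the following sketch. It is hypothetical: a whitespace split stands in for running the uniqueKey field type's actual analyzer chain, and UniqueKeyCheck and validateUniqueKey are invented names, but the shape of the rejection is the same: analyze the id value and refuse the document unless exactly one token comes out.

```java
import java.util.ArrayList;
import java.util.List;

public class UniqueKeyCheck {
    // Stand-in for the uniqueKey field type's analyzer: splits on whitespace.
    static List<String> analyze(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Reject the document at add/update time unless the id yields one token.
    static void validateUniqueKey(String id) {
        List<String> tokens = analyze(id);
        if (tokens.size() != 1) {
            throw new IllegalArgumentException(
                    "uniqueKey value \"" + id + "\" analyzed to " + tokens.size()
                    + " tokens; exactly one is required");
        }
    }

    public static void main(String[] args) {
        validateUniqueKey("SKU-1234");      // ok: exactly one token
        try {
            validateUniqueKey("two words"); // rejected: two tokens
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing loudly at add time, rather than silently indexing a multi-token key, is exactly what would have surfaced the textTight/SKU misconfigurations described above before they reached production.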
[jira] Closed: (SOLR-1408) Classes in ${solr.home}/lib are not able to extend classes loaded by solr war - ClassCastException
[ https://issues.apache.org/jira/browse/SOLR-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Forehand closed SOLR-1408. --- Resolution: Invalid This is not a bug. The problem was that my extending classes were being compiled onto the testing classpath, and they were also packaged into the jar within ${solr.home}/lib. They were being loaded by JUnit before being loaded by Solr, and that was causing the ClassCastException. When I removed the extending classes from the test classpath, everything worked. Classes in ${solr.home}/lib are not able to extend classes loaded by solr war - ClassCastException -- Key: SOLR-1408 URL: https://issues.apache.org/jira/browse/SOLR-1408 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 1.3 Reporter: Luke Forehand When extending org.apache.solr.handler.dataimport.DataSource, I would like to package my extended class in ${solr.home}/lib so that I can keep the vanilla copy of my solr.war intact. The problem is I encounter a ClassCastException when Solr tries to create a new instance of my extended class. Although the parent of the ${solr.home}/lib classloader loads DataSource, I am still getting a ClassCastException when a class in ${solr.home}/lib extends DataSource. The Solr instance is being deployed to a Jetty Plus server that is running inside a unit test. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
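The failure mode behind this resolution is the classic "same class, two classloaders" trap: the JVM identifies a type by the pair (class name, defining loader), so the copy of a class loaded from the test classpath and the copy loaded from ${solr.home}/lib are two incompatible types, and a cast between them throws ClassCastException. A minimal self-contained sketch of that mechanism (illustrative names only; no Solr or JUnit code involved):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Demonstrates that the same .class bytes defined by two different
// classloaders yield two distinct, cast-incompatible Class objects.
public class ClassLoaderDemo {
    public static class Payload {}

    // A loader with no parent on the application classpath: it defines
    // Payload itself instead of delegating, mimicking a second classpath copy.
    static class IsolatingLoader extends ClassLoader {
        IsolatingLoader() { super(null); } // bootstrap only; no app-loader delegation

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            String res = name.replace('.', '/') + ".class";
            try (InputStream in =
                     ClassLoaderDemo.class.getClassLoader().getResourceAsStream(res)) {
                if (in == null) throw new ClassNotFoundException(name);
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int b;
                while ((b = in.read()) != -1) out.write(b);
                byte[] bytes = out.toByteArray();
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Class<?> again = new IsolatingLoader().loadClass(Payload.class.getName());
        System.out.println(again == Payload.class);  // false: two distinct Class objects
        Object o = again.getDeclaredConstructor().newInstance();
        System.out.println(o instanceof Payload);    // false: a (Payload) cast would throw
    }
}
```

Removing the duplicate copy from one side, as Luke did with the test classpath, leaves a single defining loader and makes the cast succeed again.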
Re: [jira] Closed: (SOLR-1408) Classes in ${solr.home}/lib are not able to extend classes loaded by solr war - ClassCastException
It is generally a good idea to cross-check a bug on the user/dev mailing list before creating the issue, Luke. Cheers Avlesh On Sat, Sep 5, 2009 at 9:37 AM, Luke Forehand (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/SOLR-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel] Luke Forehand closed SOLR-1408. --- Resolution: Invalid This is not a bug. The problem was that my extending classes were being compiled onto the testing classpath, and they were also packaged into the jar within ${solr.home}/lib. They were being loaded by JUnit before being loaded by Solr, and that was causing the ClassCastException. When I removed the extending classes from the test classpath, everything worked. Classes in ${solr.home}/lib are not able to extend classes loaded by solr war - ClassCastException -- Key: SOLR-1408 URL: https://issues.apache.org/jira/browse/SOLR-1408 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 1.3 Reporter: Luke Forehand When extending org.apache.solr.handler.dataimport.DataSource, I would like to package my extended class in ${solr.home}/lib so that I can keep the vanilla copy of my solr.war intact. The problem is I encounter a ClassCastException when Solr tries to create a new instance of my extended class. Although the parent of the ${solr.home}/lib classloader loads DataSource, I am still getting a ClassCastException when a class in ${solr.home}/lib extends DataSource. The Solr instance is being deployed to a Jetty Plus server that is running inside a unit test. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.