[jira] Commented: (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789174#action_12789174 ] Andrzej Bialecki commented on SOLR-1632: - I'm not sure what approach you are referring to. Following the terminology in that thread, this implementation follows the approach where there is a single merged big idf map at the master, and it's sent out to slaves on each query. However, when exactly this merging and sending happens is implementation-specific - in the ExactDFSource it happens on every query, but I hope the API can support other scenarios as well. Distributed IDF --- Key: SOLR-1632 URL: https://issues.apache.org/jira/browse/SOLR-1632 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.5 Reporter: Andrzej Bialecki Attachments: distrib.patch Distributed IDF is a valuable enhancement for distributed search across non-uniform shards. This issue tracks the proposed implementation of an API to support this functionality in Solr. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
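The "single merged big idf map at the master" amounts to summing each term's document frequency across shards before scoring. A minimal sketch of that merge step (illustrative only; the class and method names are assumptions, not the attached patch's API):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the master-side merge: each shard reports a
// term -> document-frequency map, and the master sums them into one
// global df map that is then shipped to the slaves for scoring.
class GlobalDfSketch {
    static Map<String, Long> merge(Iterable<Map<String, Long>> shardDfs) {
        Map<String, Long> merged = new HashMap<String, Long>();
        for (Map<String, Long> shard : shardDfs) {
            for (Map.Entry<String, Long> e : shard.entrySet()) {
                Long current = merged.get(e.getKey());
                merged.put(e.getKey(), (current == null ? 0L : current) + e.getValue());
            }
        }
        return merged;
    }
}
```

In an ExactDFSource-style setup this merge would run on every query; other df sources could cache the merged map and refresh it less often.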
[jira] Resolved: (SOLR-1625) Add regexp support for TermsComponent
[ https://issues.apache.org/jira/browse/SOLR-1625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul resolved SOLR-1625. -- Resolution: Fixed committed r889537 Thanks Uri Boness Add regexp support for TermsComponent - Key: SOLR-1625 URL: https://issues.apache.org/jira/browse/SOLR-1625 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Uri Boness Assignee: Noble Paul Priority: Minor Fix For: 1.5 Attachments: SOLR-1625.patch, SOLR-1625.patch, SOLR-1625.patch At the moment the only way to filter the returned terms is by a prefix. It would be nice if the filtering could also be done by regular expression.
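The improvement replaces prefix-only filtering of returned terms with regular-expression matching. A rough standalone sketch of the idea (an illustrative helper, not the actual TermsComponent code from the patch):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: keep the terms whose whole text matches a regex,
// instead of only those starting with a fixed prefix.
class TermFilterSketch {
    static List<String> byRegex(List<String> terms, String regex) {
        Pattern p = Pattern.compile(regex); // compile once, reuse per term
        List<String> out = new ArrayList<String>();
        for (String t : terms) {
            if (p.matcher(t).matches()) {
                out.add(t);
            }
        }
        return out;
    }
}
```

Note that `matches()` anchors the whole term, so a prefix filter is just the special case `prefix.*`.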
[jira] Resolved: (SOLR-1598) Reader used in PlainTextEntityProcessor.nextRow() is not explicitly closed
[ https://issues.apache.org/jira/browse/SOLR-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul resolved SOLR-1598. -- Resolution: Fixed committed r889541 Reader used in PlainTextEntityProcessor.nextRow() is not explicitly closed -- Key: SOLR-1598 URL: https://issues.apache.org/jira/browse/SOLR-1598 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 1.4 Reporter: Sascha Szott Assignee: Noble Paul Fix For: 1.5 Attachments: SOLR-1598.patch, solr-1598.patch Due to a missing r.close() statement at the end of method PlainTextEntityProcessor.nextRow(), IO exceptions (too many open files) can occur when large numbers of files are processed.
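The fix pattern for this class of bug is to close the Reader in a finally block, so that a failed read cannot leak the file handle. A minimal sketch (names are illustrative, not the actual DIH code):

```java
import java.io.IOException;
import java.io.Reader;

// Sketch of the leak-proof pattern: drain the Reader into a String and
// guarantee close() runs even if read() throws. Without the finally,
// repeated failures exhaust file descriptors ("too many open files").
class ReaderCloseSketch {
    static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        try {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } finally {
            r.close(); // the call that was missing at the end of nextRow()
        }
        return sb.toString();
    }
}
```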
[jira] Issue Comment Edited: (SOLR-1598) Reader used in PlainTextEntityProcessor.nextRow() is not explicitly closed
[ https://issues.apache.org/jira/browse/SOLR-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789191#action_12789191 ] Noble Paul edited comment on SOLR-1598 at 12/11/09 9:25 AM: committed r889541 Thanks Sascha Szott was (Author: noble.paul): committed r889541
Re: svn commit: r886127 - in /lucene/solr/trunk/src: solrj/org/apache/solr/client/solrj/beans/DocumentObjectBinder.java test/org/apache/solr/client/solrj/beans/TestDocumentObjectBinder.java
fixed. Thanks Hoss

On Fri, Dec 11, 2009 at 7:01 AM, Chris Hostetter hossman_luc...@fucit.org wrote: Noble: 1) you *have* to include a CHANGES.txt entry for every non-trivial commit ... if it has a Jira issue, there better be a CHANGES.txt entry, and the CHANGES.txt entry really needs to be in the same atomic commit as the rest of the changes, not a follow-up commit, so code changes can be correlated to why the change was made. 2) CHANGES.txt entries must cite the person who contributed the patch. 3) you have to be careful to cite the correct Jira issue when making commits -- this commit doesn't seem to have anything to do with SOLR-1516, i'm pretty sure it was for SOLR-1357 ... without all three of these things, it's nearly impossible to audit changes later and understand what they were, and who they came from.

: Date: Wed, 02 Dec 2009 11:57:17 -
: From: no...@apache.org
: Reply-To: solr-dev@lucene.apache.org
: To: solr-comm...@lucene.apache.org
: Subject: svn commit: r886127 - in /lucene/solr/trunk/src:
: solrj/org/apache/solr/client/solrj/beans/DocumentObjectBinder.java
: test/org/apache/solr/client/solrj/beans/TestDocumentObjectBinder.java
:
: Author: noble
: Date: Wed Dec 2 11:57:15 2009
: New Revision: 886127
:
: URL: http://svn.apache.org/viewvc?rev=886127&view=rev
: Log:
: SOLR-1516 SolrInputDocument cannot process dynamic fields
:
: Modified:
: lucene/solr/trunk/src/solrj/org/apache/solr/client/solrj/beans/DocumentObjectBinder.java
: lucene/solr/trunk/src/test/org/apache/solr/client/solrj/beans/TestDocumentObjectBinder.java
:
: Modified: lucene/solr/trunk/src/solrj/org/apache/solr/client/solrj/beans/DocumentObjectBinder.java
: URL: http://svn.apache.org/viewvc/lucene/solr/trunk/src/solrj/org/apache/solr/client/solrj/beans/DocumentObjectBinder.java?rev=886127&r1=886126&r2=886127&view=diff
: ==============================================================================
: --- lucene/solr/trunk/src/solrj/org/apache/solr/client/solrj/beans/DocumentObjectBinder.java (original)
: +++ lucene/solr/trunk/src/solrj/org/apache/solr/client/solrj/beans/DocumentObjectBinder.java Wed Dec 2 11:57:15 2009
: @@ -76,9 +76,19 @@
:      }
:
:      SolrInputDocument doc = new SolrInputDocument();
: -    for( DocField field : fields ) {
: -      doc.setField( field.name, field.get( obj ), 1.0f );
: -    }
: +    for (DocField field : fields) {
: +      if (field.dynamicFieldNamePatternMatcher != null &&
: +          field.get(obj) != null && field.isContainedInMap) {
: +        Map<String, Object> mapValue = (HashMap<String, Object>) field
: +            .get(obj);
: +
: +        for (Map.Entry<String, Object> e : mapValue.entrySet()) {
: +          doc.setField(e.getKey(), e.getValue(), 1.0f);
: +        }
: +      } else {
: +        doc.setField(field.name, field.get(obj), 1.0f);
: +      }
: +    }
:      return doc;
:    }
:
: Modified: lucene/solr/trunk/src/test/org/apache/solr/client/solrj/beans/TestDocumentObjectBinder.java
: URL: http://svn.apache.org/viewvc/lucene/solr/trunk/src/test/org/apache/solr/client/solrj/beans/TestDocumentObjectBinder.java?rev=886127&r1=886126&r2=886127&view=diff
: ==============================================================================
: --- lucene/solr/trunk/src/test/org/apache/solr/client/solrj/beans/TestDocumentObjectBinder.java (original)
: +++ lucene/solr/trunk/src/test/org/apache/solr/client/solrj/beans/TestDocumentObjectBinder.java Wed Dec 2 11:57:15 2009
: @@ -25,12 +25,14 @@
:  import org.apache.solr.common.SolrInputDocument;
:  import org.apache.solr.common.SolrInputField;
:  import org.apache.solr.common.SolrDocument;
: +import org.apache.solr.common.util.Hash;
:  import org.apache.solr.common.util.NamedList;
:  import org.junit.Assert;
:
:  import java.io.StringReader;
:  import java.util.Arrays;
:  import java.util.Date;
: +import java.util.HashMap;
:  import java.util.List;
:  import java.util.Map;
:
: @@ -100,6 +102,15 @@
:    item.inStock = false;
:    item.categories = new String[] { "aaa", "bbb", "ccc" };
:    item.features = Arrays.asList( item.categories );
: +  List<String> supA = Arrays.asList( new String[] { "supA1", "supA2", "supA3" } );
: +  List<String> supB = Arrays.asList( new String[] { "supB1", "supB2", "supB3" } );
: +  item.supplier = new HashMap<String, List<String>>();
: +  item.supplier.put("supplier_supA", supA);
: +  item.supplier.put("supplier_supB", supB);
: +
: +  item.supplier_simple = new HashMap<String, String>();
: +  item.supplier_simple.put("sup_simple_supA", "supA_val");
: +  item.supplier_simple.put("sup_simple_supB", "supB_val");
:
:    DocumentObjectBinder binder = new DocumentObjectBinder();
:    SolrInputDocument doc = binder.toSolrInputDocument( item
[jira] Created: (SOLR-1644) Provide a clean way to keep flags and helper objects in ResponseBuilder
Provide a clean way to keep flags and helper objects in ResponseBuilder --- Key: SOLR-1644 URL: https://issues.apache.org/jira/browse/SOLR-1644 Project: Solr Issue Type: Improvement Components: search Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 1.5 Many components such as StatsComponent, FacetComponent etc keep flags and helper objects in ResponseBuilder. Having to modify the ResponseBuilder for such things is a very kludgy solution. Let us provide a clean way for components to keep arbitrary objects for the duration of a (distributed) search request.
[jira] Commented: (SOLR-1644) Provide a clean way to keep flags and helper objects in ResponseBuilder
[ https://issues.apache.org/jira/browse/SOLR-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789226#action_12789226 ] Noble Paul commented on SOLR-1644: -- Let us keep a Map in the ResponseBuilder so that each component can store/retrieve any values it wants. It can be considered a session object.
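The session-object idea can be sketched as follows (hypothetical names; the real proposal is in the patch attached to the issue): the builder holds a plain Map, and each component guards its entries with a key object that no other component can reference.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the "session map" proposal, not Solr's actual API.
class ResponseBuilderSketch {
    final Map<Object, Object> store = new HashMap<Object, Object>();
}

class FacetComponentSketch {
    // Private key object: no other component can read or clobber this entry.
    private final Object KEY = new Object();

    void markDoFacets(ResponseBuilderSketch rb) {
        rb.store.put(KEY, Boolean.TRUE); // replaces a dedicated flag field on ResponseBuilder
    }

    boolean doFacets(ResponseBuilderSketch rb) {
        return Boolean.TRUE.equals(rb.store.get(KEY));
    }
}
```

Whether the key stays private or is published so other components can share the value is exactly the trade-off debated in the follow-up comments.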
Re: SOLR-1131: disconnect between fields created by poly fields
On Dec 10, 2009, at 11:30 PM, Mattmann, Chris A (388J) wrote: Hi Yonik, While thinking about SOLR-1131, something important just came to mind. If we allow poly fields to add fields to the schema (be it via dynamic fields, or explicit field decls, either way), then we introduce a disconnect between the existing XML schema and the runtime schema instance. To my knowledge there is no write-back/flush-back for changes made to the schema at run-time versus what's loaded on startup (correct me if I'm wrong). There are other pros/cons to dynamically inserting field definitions, but this isn't one of them I think (provided that created fields are deterministically created at the same time as the poly field.) There's no need to write-back. I disagree. Without defining the DynamicFields offline by editing the schema.xml file before running SOLR, if fields are created on-the-fly by a PolyField during SOLR runtime, the schema.xml file is out of sync with the runtime IndexSchema instance. By declaring the poly field, you are declaring the dynamic field. I don't see why this leads to drift. Sure, it is an abstraction and there are Lucene fields that will be created under the hood, but that is one of the primary features of Solr: it hides all that mess. Moreover, at startup, there is a check to make sure there are no name clashes. Also, I'm (ack!) now leaning towards dynamic fields as a flexible method to do this (so long as they are pre-declared in the schema.xml file explicitly) I think Grant and I are now on the same page about using dynamic fields. The remaining issue / option is whether one must declare the dynamic fields in the schema, or whether another field type (like a poly field, but it doesn't have to be) can insert the dynamic field definition if it doesn't exist. +1. My point is: one must declare the dynamic fields in the schema that the poly fields will use, ahead of time, or you risk the drift I'm talking about.
-- that way you don't have to create (n+10) * m fields for a 10-dimensional point that's stored as well as indexed. AFAIK, there are no options currently on the table that would create (or require) field definitions for every poly-type field used. So perhaps we are actually all on the same page now (with the open question of whether we should explicitly define the dynamic field definition for the point type that will end up being added to the example schema). My +1 is for being explicit rather than implicit -- let's require the schema.xml editor to define, ahead of time, the dynamicFields that a polyField will use. This will keep schema.xml and the runtime IndexSchema in sync. Cheers, Chris ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.mattm...@jpl.nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++ -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: Merge changes with cloud branch?
I think it is the job of the cloud branch to merge to trunk when ready. I don't feel strongly about it, but I suspect that applying patches to both all the time would just complicate things for the people working on the cloud branch, but I don't feel all that strong about it. On Dec 11, 2009, at 2:24 AM, Shalin Shekhar Mangar wrote: Hey Guys, Do we need to merge changes to trunk to the cloud branch? -- Regards, Shalin Shekhar Mangar.
Re: Merge changes with cloud branch?
On Fri, Dec 11, 2009 at 4:29 PM, Grant Ingersoll gsing...@apache.orgwrote: I think it is the job of the cloud branch to merge to trunk when ready. I don't feel strongly about it, but I suspect that applying patches to both all the time would just complicate things for the people working on the cloud branch, but I don't feel all that strong about it. Actually I was talking about us (working on trunk) merging our changes with cloud. Yes, it will make working with trunk more complex and I'm happy not to do it :) I'd like to wish good luck to the person who will merge cloud back to the ground :) -- Regards, Shalin Shekhar Mangar.
[jira] Commented: (SOLR-1644) Provide a clean way to keep flags and helper objects in ResponseBuilder
[ https://issues.apache.org/jira/browse/SOLR-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789236#action_12789236 ] Uri Boness commented on SOLR-1644: -- Don't you have it already with SolrQueryRequest.getContext()?
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789240#action_12789240 ] Grant Ingersoll commented on SOLR-1131: --- {quote} Quick comment based on a spot check of the changes to IndexSchema: rather than make polyField special somehow w.r.t. IndexSchema, and add a FieldType.getPolyFieldNames, etc., I had been thinking more along the lines of having an IndexSchema.registerDynamicFieldDefinition - just like the existing registerDynamicCopyField. This would (optionally) allow any field type to add other definitions to the IndexSchema. I continue to think it would be good to stay away from special logic for polyfields in the IndexSchema. {quote} So, then the FieldType would register its dynamic fields in its own init() method by calling this method? That can work. bq. Why use Dynamic Field array The array is sorted, array access is much faster, and we often have to loop over it to look it up. {quote} CoordinateFieldType: why process 1 sub field types and then throw an exception at the end? I cleaned this up to throw the exception when it occurs. {quote} OK. Actually, this should just be in the derived class, as it may be the case that some other CoordinateFieldType has multiple sub types. {quote} # parsePoint in DistanceUtils, why use ',' as the separator - use ' ' (at least it conforms to the georss point format then). I guess because you are supporting N-dimensional points, right? # parsePoint - instead of complicated isolation loops, why not just use trim()? I've taken that approach in the patch I've attached. {quote} I think comma makes sense. As for the optimization stuff, I agree w/ Yonik, this is code that will be called a lot. * when checking for isDuplicateDynField, if it is, nothing is done. Shouldn't this be where an exception is thrown or a message is logged? In the patch I'm attaching I took the log approach. {quote} It is logged, but for the poly fields, if the dyn field is already defined, that's just fine. Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. The concept point may be best indexed in a variety of ways: * geohash (single lucene field) * lat field, lon field (two double fields) * cartesian tiers (a series of fields with tokens to say if it exists within that region)
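The parsePoint exchange above (comma separator, trim() instead of hand-rolled isolation loops) can be sketched in a few lines. This is an illustrative reconstruction under those assumptions, not the actual DistanceUtils code:

```java
// Hypothetical sketch: parse an N-dimensional point such as "10.5, 20.5"
// by splitting on ',' and trimming each coordinate, rather than scanning
// the string with manual isolation loops.
class ParsePointSketch {
    static double[] parsePoint(String external, int dimension) {
        String[] parts = external.split(",");
        if (parts.length != dimension) {
            throw new IllegalArgumentException(
                "expected " + dimension + " coordinates but got: " + external);
        }
        double[] point = new double[dimension];
        for (int i = 0; i < dimension; i++) {
            // trim() tolerates "10.5, 20.5" as well as "10.5,20.5"
            point[i] = Double.parseDouble(parts[i].trim());
        }
        return point;
    }
}
```

The comma separator is what makes the format unambiguous for N-dimensional points, which is the reason given in the comment for not switching to a space separator.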
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789241#action_12789241 ] Grant Ingersoll commented on SOLR-1131: --- I've got a patch almost ready that brings in the ValueSource stuff.
[jira] Updated: (SOLR-1644) Provide a clean way to keep flags and helper objects in ResponseBuilder
[ https://issues.apache.org/jira/browse/SOLR-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-1644: - Attachment: SOLR-1644.patch A new Map is kept in ResponseBuilder. It is also possible to use SolrQueryRequest.getContext(). Let us make that call. Every component keeps a private Object as the key and stores values by that key in rb.store. No other component can access that value because the key is a private Object (so no conflict). The patch illustrates how FacetComponent has eliminated the variables from ResponseBuilder. If it is fine, we can remove the rest of them too, such as doStats, needDocList, needDocSet, etc.
[jira] Commented: (SOLR-1644) Provide a clean way to keep flags and helper objects in ResponseBuilder
[ https://issues.apache.org/jira/browse/SOLR-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789259#action_12789259 ] Uri Boness commented on SOLR-1644: -- It should also be possible to share objects between components to eliminate duplicate computations (if one component can re-use a computation that was already done in another component). I guess this can be supported by publishing the key as a public static field.
[jira] Updated: (SOLR-1644) Provide a clean way to keep flags and helper objects in ResponseBuilder
[ https://issues.apache.org/jira/browse/SOLR-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-1644: - Comment: was deleted (was: bq.)
[jira] Commented: (SOLR-1644) Provide a clean way to keep flags and helper objects in ResponseBuilder
[ https://issues.apache.org/jira/browse/SOLR-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789269#action_12789269 ] Noble Paul commented on SOLR-1644: -- bq.
[jira] Commented: (SOLR-1644) Provide a clean way to keep flags and helper objects in ResponseBuilder
[ https://issues.apache.org/jira/browse/SOLR-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789271#action_12789271 ] Noble Paul commented on SOLR-1644: -- bq. I guess this can be supported by publishing the key as a public static field. We should not keep it static. public final should be good enough.
Re: Merge changes with cloud branch?
I would say you *certainly* don't have to. But anyone that's willing to, +1 :) Someone will have to do it. The merge back to trunk will be easy, because we will be periodically merging trunk to cloud. - Mark http://www.lucidimagination.com (mobile) On Dec 11, 2009, at 2:24 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: Hey Guys, Do we need to merge changes to trunk to the cloud branch? -- Regards, Shalin Shekhar Mangar.
[jira] Commented: (SOLR-1644) Provide a clean way to keep flags and helper objects in ResponseBuilder
[ https://issues.apache.org/jira/browse/SOLR-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789290#action_12789290 ] Uri Boness commented on SOLR-1644: -- bq. We should not keep it static. public final should be good enough Is there a special reason for this? Is the plan to have a KEY per component instance? If so, how would it be possible to refer to the key from other components? This is what I had in mind - assuming Component1 computed something and registered it in the store using KEY, then Component2 can reuse this computation by accessing it as follows: {code} Object someValue = rb.store.get(Component1.KEY); // do something with someValue {code}
[jira] Updated: (SOLR-1358) Integration of Tika and DataImportHandler
[ https://issues.apache.org/jira/browse/SOLR-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshay K. Ukey updated SOLR-1358: - Attachment: SOLR-1358.patch Patch with test case; tika parser configurable via the parser attribute of the entity tag. Integration of Tika and DataImportHandler - Key: SOLR-1358 URL: https://issues.apache.org/jira/browse/SOLR-1358 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Reporter: Sascha Szott Assignee: Noble Paul Attachments: SOLR-1358.patch, SOLR-1358.patch, SOLR-1358.patch, SOLR-1358.patch At the moment, it's impossible to configure Solr such that it builds up documents using data that comes from both pdf documents and database table columns. Currently, to accomplish this task, it's up to the user to add some preprocessing that converts pdf files into plain text files. Therefore, I would like to see an integration of Solr Cell into DIH that makes such preprocessing obsolete.
[jira] Issue Comment Edited: (SOLR-1358) Integration of Tika and DataImportHandler
[ https://issues.apache.org/jira/browse/SOLR-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789300#action_12789300 ] Akshay K. Ukey edited comment on SOLR-1358 at 12/11/09 1:23 PM: Patch with test case and with tika parser configurable via parser attribute for entity tag. was (Author: akshay): Patch with test case, tika parser configurable via parser attribute for entity tag.
[jira] Commented: (SOLR-1621) Allow current single core deployments to be specified by solr.xml
[ https://issues.apache.org/jira/browse/SOLR-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789302#action_12789302 ] Mark Miller commented on SOLR-1621: --- I'd like to take a quick look - will report back. Allow current single core deployments to be specified by solr.xml - Key: SOLR-1621 URL: https://issues.apache.org/jira/browse/SOLR-1621 Project: Solr Issue Type: New Feature Affects Versions: 1.5 Reporter: Noble Paul Fix For: 1.5 Attachments: SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch Supporting two different modes of deployment is turning out to be hard. This leads to duplication of code. Moreover, there is a lot of confusion about where we put common configuration. See the mail thread http://markmail.org/message/3m3rqvp2ckausjnf
[jira] Resolved: (SOLR-1358) Integration of Tika and DataImportHandler
[ https://issues.apache.org/jira/browse/SOLR-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul resolved SOLR-1358. -- Resolution: Fixed Fix Version/s: 1.5 committed r889613 Thanks Akshay Integration of Tika and DataImportHandler - Key: SOLR-1358 URL: https://issues.apache.org/jira/browse/SOLR-1358 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Reporter: Sascha Szott Assignee: Noble Paul Fix For: 1.5 Attachments: SOLR-1358.patch, SOLR-1358.patch, SOLR-1358.patch, SOLR-1358.patch At the moment, it's impossible to configure Solr such that it build up documents by using data that comes from both pdf documents and database table columns. Currently, to accomplish this task, it's up to the user to add some preprocessing that converts pdf files into plain text files. Therefore, I would like to see an integration of Solr Cell into DIH that makes those preprocessing obsolete. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1177) Distributed TermsComponent
[ https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-1177: Attachment: SOLR-1177.patch # Based on Matt's patch # Synced to trunk # Uses BaseDistributedTestCase All tests pass. I had to change TermData#frequency to an int to match the output of distributed and non-distributed cases. It is theoretically possible for the sum of frequencies from all shards to exceed the size of an int, but I don't think it is practical right now. The problem is that we represent frequency as int everywhere for non-distributed responses. If we want longs in distributed search responses then we must start using longs in non-distributed responses as well to maintain compatibility. Matt -- There is an issue open for adding SolrJ support for TermsComponent - SOLR-1139. Is it possible to replace the TermsHelper and TermData classes with classes in SOLR-1139? I'd like to have the same classes parsing responses in SolrJ and distributed search. Distributed TermsComponent -- Key: SOLR-1177 URL: https://issues.apache.org/jira/browse/SOLR-1177 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Matt Weber Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 1.5 Attachments: SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, TermsComponent.java, TermsComponent.patch TermsComponent should be distributed -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
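The overflow concern above can be sketched in isolation (a hypothetical class and method, not the patch's actual code): per-shard frequencies arrive as ints, but summing them into a long accumulator cannot overflow; the only open question is how the merged value is represented on the wire.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TermFreqMerge {
    // Sum per-shard document frequencies per term. Shards report int counts,
    // but the accumulator is long, so the merged total cannot overflow.
    static Map<String, Long> merge(List<Map<String, Integer>> shardCounts) {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Integer> shard : shardCounts) {
            for (Map.Entry<String, Integer> e : shard.entrySet()) {
                merged.merge(e.getKey(), e.getValue().longValue(), Long::sum);
            }
        }
        return merged;
    }
}
```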
[jira] Assigned: (SOLR-1139) SolrJ TermsComponent Query and Response Support
[ https://issues.apache.org/jira/browse/SOLR-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar reassigned SOLR-1139: --- Assignee: Shalin Shekhar Mangar SolrJ TermsComponent Query and Response Support --- Key: SOLR-1139 URL: https://issues.apache.org/jira/browse/SOLR-1139 Project: Solr Issue Type: New Feature Components: clients - java Affects Versions: 1.4 Reporter: Matt Weber Assignee: Shalin Shekhar Mangar Priority: Minor Attachments: SOLR-1139-WITH_SORT_SUPPORT.patch, SOLR-1139.patch, SOLR-1139.patch, SOLR-1139.patch, SOLR-1139.patch, SOLR-1139.patch SolrJ should support the new TermsComponent that was introduced in Solr 1.4. It should be able to: - set TermsComponent query parameters via SolrQuery - parse the TermsComponent response -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789330#action_12789330 ] Chris A. Mattmann commented on SOLR-1131: - Hi Grant: bq.The array is sorted and array access is much faster and we often have to loop over it to look it up. OK, fair enough. bq. I think comma makes sense. OK, so you think it makes sense -- why? Because it's an N-dimensional array and spaces are less user friendly as Yonik put it? bq. As for the optimization stuff, I agree w/ Yonik, this is code that will be called a lot. I'm wondering what there is to agree with since optimization was never defined. Are you talking speed? Are you talking memory efficiency? Code readability? Maintainability? Some combination of all of those? There are tradeoffs in everything. You could rewrite some of the provided java runtime methods to squeeze out extra performance, but what's the point of libraries or reusable functions then? The prior code that was in there basically rewrote exactly what split() and trim() do, so why not reuse what's there? If you throw up the performance flag, I would push back on readability and maintainability. bq. It is logged, but for the poly fields, if the dyn field is already defined, that's just fine. Where is it logged? It wasn't in the most up-to-date patch, provided on 2009-12-10 04:34 PM. 
Here was the code snippet that was there: {code}
+    //For each poly field, go through and add the appropriate Dynamic field
+    for (FieldType fieldType : polyFieldTypes.values()) {
+      if (fieldType instanceof DelegatingFieldType) {
+        List<FieldType> subs = ((DelegatingFieldType) fieldType).getSubTypes();
+        if (subs.isEmpty() == false) {
+          //add a new dynamic field for each sub field type
+          for (FieldType type : subs) {
+            log.debug("dynamic field creation for sub type: " + type.typeName);
+            SchemaField df = SchemaField.create("*" + FieldType.POLY_FIELD_SEPARATOR + type.typeName,
+                type, type.args); //TODO: is type.args right?
+            if (isDuplicateDynField(dFields, df) == false) {
+              addDynamicFieldNoDupCheck(dFields, df);
+            } // NOTE: there is no else here, so I added an else and a log message
+          }
+        } // NOTE: there is no else here, so I added an else and a log message
+      }
+    }
{code} Here's what I added: {code}
+    //For each poly field, go through and add the appropriate Dynamic field
+    for (FieldType fieldType : polyFieldTypes.values()) {
+      if (fieldType instanceof DelegatingFieldType) {
+        List<FieldType> subs = ((DelegatingFieldType) fieldType).getSubTypes();
+        if (!subs.isEmpty()) {
+          //add a new dynamic field for each sub field type
+          for (FieldType type : subs) {
+            log.debug("dynamic field creation for sub type: " + type.typeName);
+            SchemaField df = SchemaField.create("*" + FieldType.POLY_FIELD_SEPARATOR + type.typeName,
+                type, type.args); //TODO: is type.args right?
+            if (!isDuplicateDynField(dFields, df)) {
+              addDynamicFieldNoDupCheck(dFields, df);
+            }
+            else {
+              log.debug("dynamic field creation avoided: dynamic field: [" + df.getName() + "] "
+                  + "already defined in the schema!");
+            }
+          }
+        }
+        else {
+          log.debug("field type: [" + fieldType.getTypeName() + "]: no sub fields defined");
+        }
+      }
+    }
{code} Also, I get that it's fine for the poly fields if the dyn field is already defined (in fact, based on my mailing list comments http://old.nabble.com/SOLR-1131:-disconnect-between-fields-created-by-poly-fields-td26736431.html I think this should _always_ be the case), but whether it's fine or not, it's still worthwhile to log it to give someone more information. Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. The concept point may be best indexed in a variety of ways: * geohash (single lucene field) * lat field, lon field (two double fields) * cartesian tiers (a series of fields with tokens to say if it exists within that region) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: SOLR-1131: disconnect between fields created by poly fields
Hi Grant: By declaring the poly field, you are declaring the dynamic field. I don't see why this leads to drift. Sure, it is an abstraction and there are Lucene fields that will be created under the hood, but that is one of the primary features of Solr, it hides all that mess. Actually, if it were the case that poly field mapped to a single dynamic field, then I would agree with you, but as is the discussion, poly field can map to _many_ dynamic fields, which is where the drift occurs. Also, the drift does not occur in the sense that there are more Lucene fields created (fields of course, are Lucene concepts, which true, SOLR should serve as an abstraction to) -- it's in the sense that there are more dynamic fields (which are of course, a SOLR concept, and which I'm not sure that SOLR should serve as an abstraction to, and if it does, then schema.xml can be out of sync). So, in other words, with the current thinking, you can have dynamic fields that are created by a poly field that could have no corresponding entry in schema.xml. That's my point. I think we should enforce that there be an entry in schema.xml to prevent this drift. Cheers, Chris ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.mattm...@jpl.nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++
[jira] Commented: (SOLR-1177) Distributed TermsComponent
[ https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789334#action_12789334 ] Yonik Seeley commented on SOLR-1177: The facet component internally uses long to add up distributed facet counts, and then uses this code: {code}
// use <int> tags for smaller facet counts (better back compatibility)
private Number num(long val) {
  if (val < Integer.MAX_VALUE) return (int)val;
  else return val;
}
{code} Yes, it's not ideal to switch from int to long in a running application, but I think it's better than failing or overflowing the int. Client code in SolrJ should be written to handle either via ((Number)x).longValue() Distributed TermsComponent -- Key: SOLR-1177 URL: https://issues.apache.org/jira/browse/SOLR-1177 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Matt Weber Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 1.5 Attachments: SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, TermsComponent.java, TermsComponent.patch TermsComponent should be distributed -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
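A minimal client-side counterpart to the num() trick above (a sketch with hypothetical names, not actual SolrJ code): small counts come back as Integer, large ones as Long, and reading through Number keeps the client agnostic to which one it received.

```java
public class NumWiden {
    // Server-side narrowing: return an Integer when the value fits,
    // a Long otherwise (mirrors the facet component's num() method).
    static Number num(long val) {
        if (val < Integer.MAX_VALUE) return (int) val;
        return val;
    }

    // Client-side widening: never cast to Integer or Long directly;
    // go through Number so either representation works.
    static long freq(Object fromResponse) {
        return ((Number) fromResponse).longValue();
    }
}
```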
Re: SOLR-1131: disconnect between fields created by poly fields
On Fri, Dec 11, 2009 at 9:53 AM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Actually if it was the case that poly field mapped to a single dynamic field, then I would agree with you, but as is the discussion, poly field can map to _many_ dynamic fields, which is where the drift occurs. I'm not sure if we're using the exact same terminology, but it's well defined how many dynamic fields would be created by the basic point class (exactly one) *if* we decide to go that route and use that option. Can you give an example of what you mean? Is your objection to this point class registering a single dynamic field, or are you talking about a hypothetical case? -Yonik http://www.lucidimagination.com
[jira] Commented: (SOLR-1177) Distributed TermsComponent
[ https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789336#action_12789336 ] Matt Weber commented on SOLR-1177: -- Thanks for the update Yonik! I will see if I can get this and SOLR-1139 using the same classes. Distributed TermsComponent -- Key: SOLR-1177 URL: https://issues.apache.org/jira/browse/SOLR-1177 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Matt Weber Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 1.5 Attachments: SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, TermsComponent.java, TermsComponent.patch TermsComponent should be distributed -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789339#action_12789339 ] Grant Ingersoll commented on SOLR-1131: --- {quote} I'm wondering what there is to agree with since optimization was never defined. Are you talking speed? Are you talking memory efficiency? Code readability? Maintainability? Some combination of all of those? {quote} Speed and memory. As for logging, that code is all going away in the next patch, I think Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. The concept point may be best indexed in a variety of ways: * geohash (sincle lucene field) * lat field, lon field (two double fields) * cartesian tiers (a series of fields with tokens to say if it exists within that region) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789342#action_12789342 ] Chris A. Mattmann commented on SOLR-1131: - bq. Speed and memory. Unfortunately, it's at the cost of readability and maintainability. bq. As for logging, that code is all going away in the next patch, I think I'll take a look when you throw it up, thanks. Cheers, Chris Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. The concept point may be best indexed in a variety of ways: * geohash (sincle lucene field) * lat field, lon field (two double fields) * cartesian tiers (a series of fields with tokens to say if it exists within that region) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789351#action_12789351 ] Grant Ingersoll commented on SOLR-1131: --- bq. Unfortunately, it's at the cost of readability and maintainability. Maybe. It took me all of 30 seconds to figure out what it was doing. I'll put some comments on it. While readability is important, Solr's goal is not to make a product that a CS101 grad can read, it's to build a blazing fast search server. That call could hit millions of times when indexing points. Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. The concept point may be best indexed in a variety of ways: * geohash (single lucene field) * lat field, lon field (two double fields) * cartesian tiers (a series of fields with tokens to say if it exists within that region) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789354#action_12789354 ] Chris A. Mattmann commented on SOLR-1131: - bq. Maybe. It took me all of 30 seconds to figure out what it was doing. I'll put some comments on it. While readability is important, Solr's goal is not to make a product that a CS101 grad can read, it's to build a blazing fast search server. That call could hit millions of times when indexing points. Sure, maybe the goal isn't for a CS101 grad to be able to easily understand the code, but SOLR's goal should include being open to ideas from the community that involve reusing standard Java library functions and not rewriting based on perception of speed and memory without empirical proof. Where's the evidence that using things like split and trim is so much more costly than rewriting those basic capabilities that they warrant not using them? Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. The concept point may be best indexed in a variety of ways: * geohash (single lucene field) * lat field, lon field (two double fields) * cartesian tiers (a series of fields with tokens to say if it exists within that region) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
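For concreteness, the two parsing styles under debate might look like this (a sketch with hypothetical helper names; neither is the patch's exact code). The hand-rolled variant avoids the regex machinery behind split(), which is the kind of saving argued for on hot indexing paths, at the cost of the readability Chris is defending.

```java
public class PointParse {
    // Library route: short and obvious, reuses java.lang.String.
    static String[] splitPoint(String external) {
        String[] parts = external.split(",");
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return parts;
    }

    // Hand-rolled route for a known 2-D "x,y" value: no regex, no extra
    // array allocation beyond the result, but more code to maintain.
    static String[] splitPoint2d(String external) {
        int comma = external.indexOf(',');
        return new String[] {
            external.substring(0, comma).trim(),
            external.substring(comma + 1).trim()
        };
    }
}
```

Whether the difference matters enough to justify the second form is exactly the empirical question raised above.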
[jira] Commented: (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789379#action_12789379 ] Otis Gospodnetic commented on SOLR-1632: I didn't look at the patch, but from your comments it looks like you already have that one merged big idf map, which is really what I was aiming at, so that's good! I was just thinking that this map (file) would be periodically updated and pushed to slaves, so that slaves can compute the global IDF *locally* instead of making any kind of extra requests. Distributed IDF --- Key: SOLR-1632 URL: https://issues.apache.org/jira/browse/SOLR-1632 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.5 Reporter: Andrzej Bialecki Attachments: distrib.patch Distributed IDF is a valuable enhancement for distributed search across non-uniform shards. This issue tracks the proposed implementation of an API to support this functionality in Solr. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
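The scenario described above — a merged document-frequency map pushed to slaves so global IDF can be computed locally — could be sketched like this (hypothetical names, not the patch's ExactDFSource API; the classic 1 + ln(N/(df+1)) formula is used only for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalIdf {
    final long totalDocs;                // total docs across all shards
    final Map<String, Long> globalDf;    // merged per-term doc frequencies

    GlobalIdf(long totalDocs, Map<String, Long> globalDf) {
        this.totalDocs = totalDocs;
        this.globalDf = globalDf;
    }

    // A slave holding this map can score with a local lookup:
    // no per-query round trip to the master is needed.
    double idf(String term) {
        long df = globalDf.getOrDefault(term, 0L);
        return 1.0 + Math.log((double) totalDocs / (df + 1));
    }
}
```

The trade-off is freshness: the map is only as current as the last push, which is why the API leaves the merge/send schedule implementation-specific.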
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789381#action_12789381 ] Marc Menghin commented on SOLR-236: --- Hi, new to Solr, so sorry for my likely still incomplete setup. I got everything from Solr SVN and applied the patch (field-collapse-5.patch, 2009-12-08 09:43 PM). When I search I get an NPE because I seem to not have a cache for the collapsing. It wants to add an entry to the cache but can't. There is none at that time, which it checks before in AbstractDocumentCollapser.collapse but still wants to use it later in AbstractDocumentCollapser.createDocumentCollapseResult. I suppose that's a bug? Or is something wrong on my side? The exception I get is:
java.lang.NullPointerException
at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.createDocumentCollapseResult(AbstractDocumentCollapser.java:278)
at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.executeCollapse(AbstractDocumentCollapser.java:249)
at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:172)
at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:173)
at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
I fixed it locally by only adding something to the cache if there is one (fieldCollapseCache != null). But I'm not very into the code, so I'm not sure if that's a good/right way to fix it.
Thanks, Marc Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Fix For: 1.5 Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch This patch include a new feature called Field collapsing. Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated more documents from this site link. See also Duplicate detection. 
http://www.fastsearch.com/glossary.aspx?m=48amid=299 The implementation adds 3 new query parameters (SolrParams): collapse.field to choose the field used to group results, collapse.type normal (default value) or adjacent, collapse.max to select how many continuous results are allowed before collapsing TODO (in progress): - More documentation (on source code) - Test cases Two patches: - field_collapsing.patch for current development version - field_collapsing_1.1.0.patch for Solr-1.1.0 P.S.: Feedback and misspelling correction are welcome ;-) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
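Marc's null-guard workaround above can be illustrated with a stripped-down sketch (a hypothetical class; the real code lives in AbstractDocumentCollapser, where the cache is optional and may be null when solrconfig.xml defines none):

```java
import java.util.HashMap;
import java.util.Map;

public class CollapseCacheGuard {
    // May stay null when no fieldCollapseCache is configured.
    private Map<String, Object> fieldCollapseCache;

    // The reported NPE came from writing to the cache unconditionally.
    // Guarding both the write and the read keeps collapsing working
    // with no cache configured.
    void cacheResult(String key, Object result) {
        if (fieldCollapseCache != null) {
            fieldCollapseCache.put(key, result);
        }
    }

    Object cachedResult(String key) {
        return fieldCollapseCache == null ? null : fieldCollapseCache.get(key);
    }

    void enableCache() {
        fieldCollapseCache = new HashMap<>();
    }
}
```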
[jira] Updated: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated SOLR-1131: -- Attachment: SOLR-1131.patch OK, this is getting a lot closer to ready to commit. Changes: # Introduced a MultiValueSource - ValueSource that abstractly represents ValueSources for poly fields, and other things. # Introduced PointValueSource - point(x,y,z) - a MultiValueSource that wraps other value sources (could be called something else, I suppose) # Implemented PointTypeValueSource to represent ValueSource for the PointType class. # Hooked in multivalue callbacks to DocValues. In addition to making functions work with Points (et. al) it should be possible to write functions that work on multivalued fields, but I did not undertake this work. # Add in SchemaAware callback mechanism so that Field Types and other schema stuff can register dynamic fields, etc. after the schema has been created # Updated the example to have spatial information in the docs, etc. See http://wiki.apache.org/solr/SpatialSearch # Modified the distance functions to work with MultiValueSources # cleaned up the tests # Incorporated various comments from Chris and Yonik. Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. 
The concept point may be best indexed in a variety of ways: * geohash (single lucene field) * lat field, lon field (two double fields) * cartesian tiers (a series of fields with tokens to say if it exists within that region) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
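The "one concept, several fields" idea behind MultiValueSource can be illustrated with a toy sketch (names hypothetical, not the patch's actual API): a logical point is backed by one source per dimension, and a distance function consumes all of them together.

```java
public class PointSketch {
    // Toy analogue of a per-dimension ValueSource: yields one double per doc.
    interface DimSource {
        double value(int doc);
    }

    // Euclidean distance from a document's point to a target point,
    // combining the per-dimension sources the way a MultiValueSource would.
    static double dist(int doc, DimSource[] dims, double[] target) {
        double sum = 0;
        for (int i = 0; i < dims.length; i++) {
            double d = dims[i].value(doc) - target[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}
```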
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789418#action_12789418 ] Ankul Garg commented on SOLR-1316: -- Shouldn't we be creating a separate AutoSuggestComponent like the SpellCheckComponent, having its own prepare, process and inform functions? Create autosuggest component Key: SOLR-1316 URL: https://issues.apache.org/jira/browse/SOLR-1316 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.4 Reporter: Jason Rutherglen Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 1.5 Attachments: suggest.patch, suggest.patch, TST.zip Original Estimate: 96h Remaining Estimate: 96h Autosuggest is a common search function that can be integrated into Solr as a SearchComponent. Our first implementation will use the TernaryTree found in Lucene contrib. * Enable creation of the dictionary from the index or via Solr's RPC mechanism * What types of parameters and settings are desirable? * Hopefully in the future we can include user click through rates to boost those terms/phrases higher -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
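The lookup contract behind such a component is plain prefix retrieval; here is a minimal sketch using a sorted map rather than the Lucene contrib TernaryTree the patch actually uses (the class and weight handling are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class PrefixSuggest {
    // Terms sorted lexicographically; a TernaryTree offers the same
    // prefix-range traversal with a more compact representation.
    private final TreeMap<String, Long> terms = new TreeMap<>();

    void add(String term, long weight) {
        terms.put(term, weight);
    }

    List<String> suggest(String prefix, int max) {
        List<String> out = new ArrayList<>();
        // [prefix, prefix + '\uffff') bounds every key starting with prefix
        SortedMap<String, Long> range = terms.subMap(prefix, prefix + Character.MAX_VALUE);
        for (String t : range.keySet()) {
            if (out.size() >= max) break;
            out.add(t);
        }
        return out;
    }
}
```

Click-through boosting, mentioned in the issue, would replace the lexicographic iteration with a sort on the stored weights.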
Fwd: [ANNOUNCEMENT] HttpComponents HttpClient 4.0.1 (GA) Released
FYI... Begin forwarded message: From: Oleg Kalnichevski ol...@apache.org Date: December 11, 2009 2:20:50 PM GMT+01:00 To: annou...@apache.org, priv...@hc.apache.org, d...@hc.apache.org, httpclient-us...@hc.apache.org Subject: [ANNOUNCEMENT] HttpComponents HttpClient 4.0.1 (GA) Released Reply-To: HttpComponents Project d...@hc.apache.org HttpClient 4.0.1 is a bug fix release that addresses a number of issues discovered since the previous stable release. None of the fixed bugs is considered critical. Most notably this release eliminates the dependency on JCIP annotations. This release is also expected to improve performance by 5 to 10% due to elimination of unnecessary Log object lookups by short-lived components. --- Download - http://hc.apache.org/downloads.cgi Release notes - http://www.apache.org/dist/httpcomponents/httpclient/RELEASE_NOTES.txt HttpComponents site - http://hc.apache.org/ Please note HttpClient 4.0 currently provides only limited support for NTLM authentication. For details please refer to http://hc.apache.org/httpcomponents-client/ntlm.html --- About Apache HttpClient Although the java.net package provides basic functionality for accessing resources via HTTP, it doesn't provide the full flexibility or functionality needed by many applications. HttpClient seeks to fill this void by providing an efficient, up-to-date, and feature-rich package implementing the client side of the most recent HTTP standards and recommendations. Designed for extension while providing robust support for the base HTTP protocol, HttpClient may be of interest to anyone building HTTP-aware client applications such as web browsers, web service clients, or systems that leverage or extend the HTTP protocol for distributed communication.
Re: [ANNOUNCEMENT] HttpComponents HttpClient 4.0.1 (GA) Released
There are, in fact, updates to many of Solr's dependencies that we should consider. I like the sound of 5-10% perf. improvement in HttpClient... -Grant On Dec 11, 2009, at 12:53 PM, Erik Hatcher wrote: FYI... Begin forwarded message: From: Oleg Kalnichevski ol...@apache.org Subject: [ANNOUNCEMENT] HttpComponents HttpClient 4.0.1 (GA) Released [...]
-- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
[jira] Commented: (SOLR-1621) Allow current single core deployments to be specified by solr.xml
[ https://issues.apache.org/jira/browse/SOLR-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789439#action_12789439 ] Mark Miller commented on SOLR-1621: --- You've deprecated handleAliasAction, but you have also made it so it's not called with the default handleRequestBody. I think it's safer to just remove handleAliasAction - someone who overrode handleAliasAction but kept the default handleRequestBody would be in for a nasty silent surprise anyway. Better to get a compile-time error. Allow current single core deployments to be specified by solr.xml - Key: SOLR-1621 URL: https://issues.apache.org/jira/browse/SOLR-1621 Project: Solr Issue Type: New Feature Affects Versions: 1.5 Reporter: Noble Paul Fix For: 1.5 Attachments: SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch supporting two different modes of deployments is turning out to be hard. This leads to duplication of code. Moreover there is a lot of confusion on where do we put common configuration. See the mail thread http://markmail.org/message/3m3rqvp2ckausjnf -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1621) Allow current single core deployments to be specified by solr.xml
[ https://issues.apache.org/jira/browse/SOLR-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789443#action_12789443 ] Mark Miller commented on SOLR-1621: --- Also, it's confusing the way the Alias test is commented out - shouldn't we just remove it instead? It will still be in Subversion if we decide we want to bring it back. Allow current single core deployments to be specified by solr.xml - Key: SOLR-1621 URL: https://issues.apache.org/jira/browse/SOLR-1621 Project: Solr Issue Type: New Feature Affects Versions: 1.5 Reporter: Noble Paul Fix For: 1.5 Attachments: SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch supporting two different modes of deployments is turning out to be hard. This leads to duplication of code. Moreover there is a lot of confusion on where do we put common configuration. See the mail thread http://markmail.org/message/3m3rqvp2ckausjnf -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1621) Allow current single core deployments to be specified by solr.xml
[ https://issues.apache.org/jira/browse/SOLR-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789447#action_12789447 ] Mark Miller commented on SOLR-1621: --- Lastly : there are a couple little unrelated changes in this patch. 1. The change to the import statement in CoreDescriptor, but no other changes to the file. 2. The change in index.jsp from self closing tags to closing link tags - this has no apparent purpose, so I think it shouldn't be part of this commit. Allow current single core deployments to be specified by solr.xml - Key: SOLR-1621 URL: https://issues.apache.org/jira/browse/SOLR-1621 Project: Solr Issue Type: New Feature Affects Versions: 1.5 Reporter: Noble Paul Fix For: 1.5 Attachments: SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch supporting two different modes of deployments is turning out to be hard. This leads to duplication of code. Moreover there is a lot of confusion on where do we put common configuration. See the mail thread http://markmail.org/message/3m3rqvp2ckausjnf -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1621) Allow current single core deployments to be specified by solr.xml
[ https://issues.apache.org/jira/browse/SOLR-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-1621: -- Attachment: SOLR-1621.patch You mark setSolrConfigFilename as deprecated, but it simply doesn't work anymore either - that's not really deprecation. I feel that deprecating solrconfig-filename should be its own issue so that's a clearer entry in CHANGES. And it should probably be deprecated rather than turned off out of the blue. I don't suppose it's a big deal either way though. You have also changed CoreContainer#getCores - but that change is unrelated to this patch. I really feel an issue should just stick to the issue - it makes commits clearer. That change (which isn't back compat anyway) should be its own commit. This latest update removes a bunch of the unrelated changes. The patch is still unfinished though - you removed my fix to use the solrconfig override without replacing it with an alternative fix. As a result, some tests don't use the right config now, and in particular, NoCacheHeaderTest is back to failing. Allow current single core deployments to be specified by solr.xml - Key: SOLR-1621 URL: https://issues.apache.org/jira/browse/SOLR-1621 Project: Solr Issue Type: New Feature Affects Versions: 1.5 Reporter: Noble Paul Fix For: 1.5 Attachments: SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch supporting two different modes of deployments is turning out to be hard. This leads to duplication of code. Moreover there is a lot of confusion on where do we put common configuration. See the mail thread http://markmail.org/message/3m3rqvp2ckausjnf -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1621) Allow current single core deployments to be specified by solr.xml
[ https://issues.apache.org/jira/browse/SOLR-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-1621: -- Attachment: SOLR-1621.patch This patch reinstates a fix for the SolrDispatcher config name override so that all tests pass again. It also changes DEFAULT_CORE to use a final static constant rather than be repeated. Personally, I also almost think that removing the alias feature should be its own issue. Allow current single core deployments to be specified by solr.xml - Key: SOLR-1621 URL: https://issues.apache.org/jira/browse/SOLR-1621 Project: Solr Issue Type: New Feature Affects Versions: 1.5 Reporter: Noble Paul Fix For: 1.5 Attachments: SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch supporting two different modes of deployments is turning out to be hard. This leads to duplication of code. Moreover there is a lot of confusion on where do we put common configuration. See the mail thread http://markmail.org/message/3m3rqvp2ckausjnf -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (SOLR-1621) Allow current single core deployments to be specified by solr.xml
[ https://issues.apache.org/jira/browse/SOLR-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789467#action_12789467 ] Mark Miller edited comment on SOLR-1621 at 12/11/09 7:13 PM: - This patch reinstates a fix for the SolrDispatcher config name override so that all tests pass again. It also changes DEFAULT_CORE to use a final static constant rather than be repeated. Personally, I also almost think that removing the alias feature should be its own issue. (*EDIT* okay, I see you made that a sub-issue - nice) was (Author: markrmil...@gmail.com): This patch reinstates a fix for the SolrDispatcher config name override so that all tests pass again. It also changes DEFAULT_CORE to use a final static constant rather than be repeated. Personally, I also almost think that removing the alias feature should be its own issue. Allow current single core deployments to be specified by solr.xml - Key: SOLR-1621 URL: https://issues.apache.org/jira/browse/SOLR-1621 Project: Solr Issue Type: New Feature Affects Versions: 1.5 Reporter: Noble Paul Fix For: 1.5 Attachments: SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch supporting two different modes of deployments is turning out to be hard. This leads to duplication of code. Moreover there is a lot of confusion on where do we put common configuration. See the mail thread http://markmail.org/message/3m3rqvp2ckausjnf -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated SOLR-1131: Attachment: SOLR-1131.Mattmann.121109.patch.txt OK, this is getting a lot closer to ready to commit. Changes: Hi All, Updated patch: {quote} # Introduced a MultiValueSource - ValueSource that abstractly represents ValueSources for poly fields, and other things. {quote} I added javadoc to this and the ASF license header. {quote} # Introduced PointValueSource - point(x,y,z) - a MultiValueSource that wraps other value sources (could be called something else, I suppose) {quote} I put the ASF header before the package decl, to be consistent with the other SOLR java files. {quote} # Add in SchemaAware callback mechanism so that Field Types and other schema stuff can register dynamic fields, etc. after the schema has been created {quote} Added more javadoc here, and ASF license. {quote} # Incorporated various comments from Chris and Yonik. {quote} Thanks, I appreciate it. I'm still -1 on the way this patch deals with the optimization issue. I'd like to see evidence that it makes sense to not use split and trim. Cheers, Chris Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.Mattmann.121109.patch.txt, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. 
The concept point may be best indexed in a variety of ways: * geohash (single lucene field) * lat field, lon field (two double fields) * cartesian tiers (a series of fields with tokens to say if it exists within that region) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
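The three encodings in that issue description can be sketched roughly as follows. This is an illustrative Python sketch only: the helper names and the field-name suffix `_0_d`/`_1_d` are hypothetical, not Solr's actual FieldType code, and the cartesian-tiers case is omitted.

```python
# Illustrative sketch (not Solr code) of two of the point encodings above.

def to_latlon_fields(name, lat, lon):
    # option 2: two double fields per point (hypothetical field names)
    return {name + "_0_d": lat, name + "_1_d": lon}

def to_geohash(lat, lon, precision=12):
    # option 1: a single field holding a base-32 geohash
    base32 = "0123456789bcdefghjkmnpqrstuvwxyz"
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    out, bits, nbits, even = [], 0, 0, True  # even step -> longitude bit
    while len(out) < precision:
        rng, val = (lon_rng, lon) if even else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        bits = (bits << 1) | (1 if val >= mid else 0)
        if val >= mid:
            rng[0] = mid   # narrow the range toward the value
        else:
            rng[1] = mid
        even = not even
        nbits += 1
        if nbits == 5:     # 5 bits per base-32 character
            out.append(base32[bits])
            bits, nbits = 0, 0
    return "".join(out)

print(to_geohash(57.64911, 10.40744, 2))  # "u4"
```

The geohash variant packs both coordinates into one indexed token, trading precision for a single-field prefix-searchable representation; the two-double-field variant keeps full precision but needs a field per dimension.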
[jira] Issue Comment Edited: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789484#action_12789484 ] Chris A. Mattmann edited comment on SOLR-1131 at 12/11/09 7:39 PM: --- Hi All, Updated patch: {quote} # Introduced a MultiValueSource - ValueSource that abstractly represents ValueSources for poly fields, and other things. {quote} I added javadoc to this and the ASF license header. {quote} # Introduced PointValueSource - point(x,y,z) - a MultiValueSource that wraps other value sources (could be called something else, I suppose) {quote} I put the ASF header before the package decl, to be consistent with the other SOLR java files. {quote} # Add in SchemaAware callback mechanism so that Field Types and other schema stuff can register dynamic fields, etc. after the schema has been created {quote} Added more javadoc here, and ASF license. {quote} # Incorporated various comments from Chris and Yonik. {quote} Thanks, I appreciate it. I'm still -1 on the way this patch deals with the optimization issue. I'd like to see evidence that it makes sense to not use split and trim. Cheers, Chris was (Author: chrismattmann): OK, this is getting a lot closer to ready to commit. Changes: Hi All, Updated patch: {quote} # Introduced a MultiValueSource - ValueSource that abstractly represents ValueSources for poly fields, and other things. {quote} I added javadoc to this and the ASF license header. {quote} # Introduced PointValueSource - point(x,y,z) - a MultiValueSource that wraps other value sources (could be called something else, I suppose) {quote} I put the ASF header before the package decl, to be consistent with the other SOLR java files. {quote} # Add in SchemaAware callback mechanism so that Field Types and other schema stuff can register dynamic fields, etc. after the schema has been created {quote} Added more javadoc here, and ASF license. {quote} # Incorporated various comments from Chris and Yonik. 
{quote} Thanks, I appreciate it. I'm still -1 on the way this patch deals with the optimization issue. I'd like to see evidence that it makes sense to not use split and trim. Cheers, Chris Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.Mattmann.121109.patch.txt, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. The concept point may be best indexed in a variety of ways: * geohash (sincle lucene field) * lat field, lon field (two double fields) * cartesian tiers (a series of fields with tokens to say if it exists within that region) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Namespaces in response (SOLR-1586)
: : I think the initial geosearch feature can start off with : : <str>10,20</str> for a point. : : +1. : : Fundamentally, how is a string a point? Fundamentally a string is not a point, and a point is not a string -- but if you want to express the concept of a point in a manner that only uses very simple primitive types, then a string containing comma separated numbers is a pretty decent way to do it. If you'd prefer, a pair of numbers would work just as well... <arr><float>10</float><float>20</float></arr> : The current XML format Solr uses was designed to be extremely simple, very : JSON-esque, and easily parsable by *anyone* in any language, without : needing special knowledge of types. : : Whoah. I'm totally confused now. Why have FieldTypes then? Why not just use : Lucene? The use case for FieldTypes is _not_ just for indexing, or querying. : It's also for representation? No, actually the use case for FieldTypes is entirely about the internal logic of how Solr should deal with those fields, and how various operations should work on them. FieldTypes can dictate the internal representation within the confines of a Lucene index, but they should not circumvent the contracts of the response writers in dictating what is/isn't a legal response. XMLWriter.writePrim may be public, which means there is a loophole that plugin writers can exploit to add new tag names to the Solr XML response that violate the contract (and no, we don't have a formal XSD or DTD for our XML response format, but we still have a very well advertised contract) -- but that doesn't mean that code which ships with Solr should exploit those loopholes to violate that contract. People should expect that if they use Solr as is, without any custom code, the XMLResponseWriter won't all of a sudden start including new, non-primitive-ish, XML tags/attributes that weren't there before. 
That's the entire point of the format as it was designed: break down whatever complex data might be involved in a response into easily digestible maps/lists of maps/lists of very primitive types that can easily be used in any programming language. : allowed for a while I think), why prevent it? Allowing namespaces does _not_ : break anything. ... : introducing a new 'point' concept, whether as <point> or as : <georss:point/>, is going to break things for people. : : Show me an example, I fundamentally disagree with this. Ok. Let's start with SolrJ then: take a look at the KnownType enum (line 151) in XMLResponseParser... http://svn.apache.org/viewvc/lucene/solr/trunk/src/solrj/org/apache/solr/client/solrj/impl/XMLResponseParser.java?revision=819403view=markup ...or let's do a random google code search for solr xml lst -- check out ResponseContentHandler in solrpy... http://code.google.com/p/solrpy/source/browse/trunk/solr/core.py#841 ...I can't write python code to save my life, but I have a pretty good idea what that code will do if it sees an unexpected tag. This is how a *LOT* of Solr client libraries are implemented ... it's not an issue of broken XML parsers freaking out about namespaces, it's an issue of having a long standing, heavily advertised schema for the XML response that promises to only ever use a handful of types. Adding any new tags to this format (regardless of how easy it may be because of that unfortunate public modifier on XMLWriter.writePrim) will absolutely break things for people. : And why is that? Isn't the point of SOLR to expand to use cases brought up : by users of the system? As long as those use cases can be principally : supported, without breaking backwards compatibility (or in that case, if : they do, with large blinking red text that says it), then you're shutting : people out for 0 benefit? It's aesthetics we're talking about here. 
I don't know if I'd say that's the point of Solr, but yes, we should absolutely try to grow the capabilities of the system as new use cases come along. I am 100% in agreement that the existing simple XMLResponseWriter is not for everyone -- Historically we've tried to maintain a sense of equality between all of the response writers, so that they all contained the same data just with different markup -- but there are clearly cases where it would be nice to have a response writer that is allowed to know more about the real structure of the data and represent it in a manner that more closely matches its purpose. This was the entire point behind adding FieldType.toObject, and UUIDField w/the BinaryResponseWriter is a good example of the model we should follow in the future. There is a clear push for Solr to natively be able to generate responses that incorporate more industry standard XML schemas, and I would love to see us start adding functionality to do that, but bastardizing the existing XMLResponseWriter format is not the way to do it. Bottom Line: I am a big fat -1 on any patch to Solr that adds new xml tags to the output
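The argument about clients hard-coding the advertised tag set can be illustrated with a toy parser. This is a hypothetical sketch: the KNOWN set simply mirrors the primitive tags the format advertises, and the parse function is illustrative, not SolrJ's actual KnownType enum or solrpy's ResponseContentHandler.

```python
# Toy illustration of why a new XML tag breaks typical Solr clients:
# parsers are written against the fixed, advertised tag set and fail
# (or silently misbehave) on anything else.
import xml.etree.ElementTree as ET

KNOWN = {"response", "lst", "arr", "str", "int", "long", "float",
         "double", "bool", "date", "null", "result", "doc"}

def parse(node):
    if node.tag not in KNOWN:
        raise ValueError("unexpected tag: <%s>" % node.tag)
    if node.tag in ("lst", "response", "doc"):
        return {c.get("name"): parse(c) for c in node}   # named map
    if node.tag in ("arr", "result"):
        return [parse(c) for c in node]                  # list
    if node.tag in ("int", "long"):
        return int(node.text)
    if node.tag in ("float", "double"):
        return float(node.text)
    if node.tag == "bool":
        return node.text == "true"
    if node.tag == "null":
        return None
    return node.text  # str, date: leave as a string

ok = ET.fromstring('<lst name="p"><str name="pt">10,20</str></lst>')
print(parse(ok))   # parses fine: {'pt': '10,20'}
bad = ET.fromstring('<lst name="p"><point>10 20</point></lst>')
# parse(bad) raises ValueError: unexpected tag: <point>
```

A client built this way never needs a schema: the handful of container and primitive tags is the whole contract, which is exactly why adding a tag is a breaking change.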
Re: Namespaces in response (SOLR-1586)
: themselves ... because of the back-ass-wards way we have FieldTypes write : their values directly to an XMLWriter or a TextWriter the idea of using an : object that stringifies itself as needed doesn't really apply very well : : I think it's rather powerful. You insulate the following variations into 1 : single place to change them (FieldType): : : * output representation : * indexing : * validation : : To remove this from FieldType would be to strew the same functionality : across multiple classes, which doesn't make sense IMHO. It's a damned-if-you-do/damned-if-you-don't situation though ... you look at it as insulating the response writers because all of the logic about serializing data is in the FieldType, but I look at it as polluting the FieldType with knowledge about the output formats -- there's a reason we didn't add writeBinary to the FieldType when the BinaryResponseWriter was added ... the toObject abstraction lets the FieldType do whatever it wants internally, and provide its best face to the world when asked. The ResponseWriters can then apply heuristics to decide the most compatible type they know of to use when representing it: is it something complex I have a codec for? no; oh well, then is it something that implements Collection? no; oh well, then is it something that is an instanceof Number? no; oh well, as a last resort we can stringify. : In the long run, this might be nice, and +1 on getting there in the long : run. In the short, a compromise is to allow namespacing on fields in the : existing XmlWriter, which is allowed anyways, whether by oversight or not. I'm sure if we look hard enough at the existing internal APIs, we can find a way to generate completely broken XML that no DOM, SAX or pull parser could possibly deal with cleanly -- but that doesn't mean we should do that just because it would allow us to start outputting a bunch of metadata that we think is useful. 
breaking the (implicit) XML Schema is just as bad as breaking the XML itself. -Hoss
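The heuristic cascade described above (codec, then Collection, then Number, then stringify as a last resort) might look roughly like this. All names here are illustrative, not Solr's actual BinaryResponseWriter code.

```python
# Illustrative sketch of a response writer's type-dispatch heuristics:
# try a registered codec first, then generic container and numeric
# handling, and only stringify when nothing else applies.
from numbers import Number

CODECS = {}  # type -> encoder; e.g. a UUID type could map to str

def write_value(v):
    for typ, enc in CODECS.items():
        if isinstance(v, typ):
            return enc(v)                    # complex type with a codec
    if isinstance(v, (list, tuple, set)):
        return [write_value(x) for x in v]   # Collection -> array
    if isinstance(v, bool):
        return v                             # bool before Number (bool IS a Number)
    if isinstance(v, Number):
        return v                             # keep the native numeric type
    return str(v)                            # last resort: stringify

print(write_value([10.0, 20.0]))  # [10.0, 20.0]
print(write_value(42))            # 42
```

The point of the cascade is that the value object never knows about the output format; the writer degrades gracefully from the richest representation it understands down to a plain string.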
[jira] Updated: (SOLR-1387) Add more search options for filtering field facets.
[ https://issues.apache.org/jira/browse/SOLR-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-1387: --- Summary: Add more search options for filtering field facets. (was: Add more search options for filtering facets.) Add more search options for filtering field facets. --- Key: SOLR-1387 URL: https://issues.apache.org/jira/browse/SOLR-1387 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.4 Reporter: Anil Khadka Currently, for filtering the facets, we have to use a prefix (which uses String.startsWith() in Java). We could add some parameters like * facet.iPrefix : this would act like a case-insensitive search (or --- facet.prefix=a&facet.caseinsense=on) * facet.regex : this is a pure regular expression search (which would obviously be expensive if issued). Moreover, allowing multiple filters for the same field would be great, like facet.prefix=a OR facet.prefix=A ... something like this. All of the above concepts could be equally applicable to TermsComponent. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1387) Add more search options for filtering field facets.
[ https://issues.apache.org/jira/browse/SOLR-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anil Khadka updated SOLR-1387: -- Fix Version/s: 1.5 Affects Version/s: (was: 1.4) Add more search options for filtering field facets. --- Key: SOLR-1387 URL: https://issues.apache.org/jira/browse/SOLR-1387 Project: Solr Issue Type: New Feature Components: search Reporter: Anil Khadka Fix For: 1.5 Currently, for filtering the facets, we have to use a prefix (which uses String.startsWith() in Java). We could add some parameters like * facet.iPrefix : this would act like a case-insensitive search (or --- facet.prefix=a&facet.caseinsense=on) * facet.regex : this is a pure regular expression search (which would obviously be expensive if issued). Moreover, allowing multiple filters for the same field would be great, like facet.prefix=a OR facet.prefix=A ... something like this. All of the above concepts could be equally applicable to TermsComponent. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
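A rough sketch of what the three filters in the proposal would do to a list of facet terms. facet.prefix matches Solr's existing startsWith behavior; the iPrefix and regex variants are only proposals here, so these helpers are purely illustrative.

```python
# Illustrative sketch of the proposed facet term filters.
import re

terms = ["Apple", "apricot", "Banana", "avocado"]

def filter_prefix(terms, prefix):
    # today's facet.prefix: case-sensitive startsWith
    return [t for t in terms if t.startswith(prefix)]

def filter_iprefix(terms, prefix):
    # proposed facet.iPrefix: case-insensitive prefix match
    p = prefix.lower()
    return [t for t in terms if t.lower().startswith(p)]

def filter_regex(terms, pattern):
    # proposed facet.regex: anchored regular expression match
    rx = re.compile(pattern)
    return [t for t in terms if rx.match(t)]

print(filter_prefix(terms, "a"))       # ['apricot', 'avocado']
print(filter_iprefix(terms, "a"))      # ['Apple', 'apricot', 'avocado']
print(filter_regex(terms, "[Aa].*o$")) # ['avocado']
```

As the issue notes, the regex variant would be the expensive one: it has to test every term, whereas a prefix filter can seek directly to the matching range of a sorted term index.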
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789526#action_12789526 ] Grant Ingersoll commented on SOLR-1131: --- I think this is ready to commit. I'd like to do so on Monday or Tuesday of next week, so that should give plenty of time for further review Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.Mattmann.121109.patch.txt, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. The concept point may be best indexed in a variety of ways: * geohash (sincle lucene field) * lat field, lon field (two double fields) * cartesian tiers (a series of fields with tokens to say if it exists within that region) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (SOLR-1297) Enable sorting by Function Query
[ https://issues.apache.org/jira/browse/SOLR-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reassigned SOLR-1297: - Assignee: Grant Ingersoll Enable sorting by Function Query Key: SOLR-1297 URL: https://issues.apache.org/jira/browse/SOLR-1297 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 It would be nice if one could sort by FunctionQuery. See also SOLR-773, where this was first mentioned by Yonik as part of the generic solution to geo-search -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Work started: (SOLR-1297) Enable sorting by Function Query
[ https://issues.apache.org/jira/browse/SOLR-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on SOLR-1297 started by Grant Ingersoll. Enable sorting by Function Query Key: SOLR-1297 URL: https://issues.apache.org/jira/browse/SOLR-1297 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 It would be nice if one could sort by FunctionQuery. See also SOLR-773, where this was first mentioned by Yonik as part of the generic solution to geo-search -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789550#action_12789550 ] Chad Kouse edited comment on SOLR-236 at 12/11/09 9:37 PM: --- Just wanted to comment that I am experiencing the same behavior as Marc Menghin above (NPE) -- the patch did NOT install cleanly (1 hunk failed) -- but I couldn't really tell why since it looked like it should have worked -- I just manually copied the hunk into the correct class Sorry I didn't note what failed was (Author: chadkouse): Just wanted to comment that I am experiencing the same behavior as Marc Menghin above (NPE) -- the patch did NOT install cleanly (1 hunk failed) -- but I couldn't really tell why since it looked like it should have worked -- I just manually copied the hunk into the write class Sorry I didn't note what failed Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Fix For: 1.5 Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch This patch includes a new feature called field collapsing, used in order to collapse a group of results with a similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated 'more documents from this site' link. See also Duplicate detection. http://www.fastsearch.com/glossary.aspx?m=48amid=299 The implementation adds 3 new query parameters (SolrParams): collapse.field to choose the field used to group results, collapse.type normal (default value) or adjacent, and collapse.max to select how many continuous results are allowed before collapsing. TODO (in progress): - More documentation (on source code) - Test cases Two patches: - field_collapsing.patch for current development version - field_collapsing_1.1.0.patch for Solr-1.1.0 P.S.: Feedback and misspelling corrections are welcome ;-) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789550#action_12789550 ] Chad Kouse commented on SOLR-236: - Just wanted to comment that I am experiencing the same behavior as Marc Menghin above (NPE) -- the patch did NOT install cleanly (1 hunk failed) -- but I couldn't really tell why, since it looked like it should have worked -- I just manually copied the hunk into the correct class. Sorry I didn't note what failed. Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Fix For: 1.5 Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch This patch includes a new feature called field collapsing, used in order to collapse a group of results with a similar value for a given field to a single entry in the result set. 
Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated 'more documents from this site' link. See also Duplicate detection. http://www.fastsearch.com/glossary.aspx?m=48amid=299 The implementation adds 3 new query parameters (SolrParams): collapse.field to choose the field used to group results, collapse.type normal (default value) or adjacent, and collapse.max to select how many continuous results are allowed before collapsing. TODO (in progress): - More documentation (on source code) - Test cases Two patches: - field_collapsing.patch for current development version - field_collapsing_1.1.0.patch for Solr-1.1.0 P.S.: Feedback and misspelling corrections are welcome ;-) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
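The collapse.type=adjacent mode with collapse.max can be sketched as follows. This is an illustrative toy, not the patch's actual implementation: only consecutive results sharing the collapse field value are merged, and up to collapse.max results per run survive.

```python
# Toy sketch of adjacent field collapsing (collapse.type=adjacent):
# walk the ranked results in order and drop documents once a run of
# identical field values exceeds collapse.max.
def collapse_adjacent(docs, field, collapse_max=1):
    out, run_val, run_len = [], object(), 0  # sentinel: matches nothing
    for doc in docs:
        v = doc[field]
        if v == run_val:
            run_len += 1
            if run_len <= collapse_max:
                out.append(doc)        # still within the allowed run
        else:
            run_val, run_len = v, 1    # a new run starts
            out.append(doc)
    return out

docs = [{"site": "a"}, {"site": "a"}, {"site": "b"}, {"site": "a"}]
print(collapse_adjacent(docs, "site", collapse_max=1))
# [{'site': 'a'}, {'site': 'b'}, {'site': 'a'}]
```

Note how the last `{"site": "a"}` survives: in adjacent mode it starts a fresh run because a `b` document separates it from the earlier `a` run, whereas normal mode would have collapsed it into the first group regardless of position.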
[jira] Commented: (SOLR-1297) Enable sorting by Function Query
[ https://issues.apache.org/jira/browse/SOLR-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789551#action_12789551 ] Grant Ingersoll commented on SOLR-1297: --- For this, I think we want to be able to do things like: Just functions {code} sort=dist(2,x,y, point(0,0)) desc {code} Multiple sort params, some functions, some fields {code} sort=weight asc,dist(2,x,y, point(0,0)) asc {code} If and when a function result cache exists, we should be able to take advantage of that too, but that is an implementation detail. Enable sorting by Function Query Key: SOLR-1297 URL: https://issues.apache.org/jira/browse/SOLR-1297 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 It would be nice if one could sort by FunctionQuery. See also SOLR-773, where this was first mentioned by Yonik as part of the generic solution to geo-search -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
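What a sort like `sort=dist(2,x,y,point(0,0)) desc` would compute can be sketched as below, assuming dist(power, ...) is a p-norm distance between the document's (x, y) fields and the given point, as the example suggests. This is an illustrative sketch, not Solr's implementation.

```python
# Illustrative sketch of sorting documents by a p-norm distance function,
# mirroring sort=dist(2,x,y,point(0,0)) desc from the comment above.
def dist(power, x, y, px, py):
    # p-norm distance; power=2 gives the Euclidean distance
    return (abs(x - px) ** power + abs(y - py) ** power) ** (1.0 / power)

docs = [{"id": 1, "x": 3.0, "y": 4.0}, {"id": 2, "x": 1.0, "y": 1.0}]
ranked = sorted(docs,
                key=lambda d: dist(2, d["x"], d["y"], 0.0, 0.0),
                reverse=True)  # "desc": farthest from (0,0) first
print([d["id"] for d in ranked])  # [1, 2]: (3,4) is 5.0 away, (1,1) ~1.414
```

A mixed sort like `sort=weight asc, dist(...) asc` would simply use a composite key (weight first, distance as tie-breaker), and a function result cache, if one existed, would only avoid recomputing the per-document key.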
Re: Namespaces in response (SOLR-1586)
Hi Hoss, : : I think the initial geosearch feature can start off with : : <str>10,20</str> for a point. : : +1. : : Fundamentally, how is a string a point? Fundamentally a string is not a point, and a point is not a string -- but if you want to express the concept of a point in a manner that only uses very simple primitive types, then a string containing comma-separated numbers is a pretty decent way to do it. If you'd prefer, a pair of numbers would work just as well... <arr><float>10</float><float>20</float></arr> I'm conflicted here. In simple semantics, sure, it's just an array of float/double numbers. And, if a string must be used, a comma is probably OK, so long as it maps to some existing known approach to representing points. I've asked several times if there are examples. I can point to one that uses spaces to separate the coordinates in the point (GeoRSS). What others use a comma? : The current XML format Solr uses was designed to be extremely simple, very : JSON-esque, and easily parsable by *anyone* in any language, without : needing special knowledge of types. : : Whoa. I'm totally confused now. Why have FieldTypes then? Why not just use : Lucene? The use case for FieldTypes is _not_ just for indexing, or querying. : It's also for representation? No, actually the use case for FieldTypes is entirely about the internal logic of how Solr should deal with those fields, and how various operations should work on them. FieldTypes can dictate the internal representation within the confines of a Lucene index, but they should not circumvent the contracts of the response writers in dictating what is/isn't a legal response. Well, I actually would disagree. What's the point of #toInternal and #toExternal then, other than to convert from the external representation to an internal Lucene index representation, and then to do the opposite coming out of the index? : allowed for a while I think), why prevent it? Allowing namespaces does _not_ : break anything. ... 
: introducing a new 'point' concept, whether as <point/> or as : <georss:point/>, is going to break things for people. : : Show me an example, I fundamentally disagree with this. Ok. Let's start with SolrJ then: take a look at the KnownType enum (line 151) in XMLResponseParser... http://svn.apache.org/viewvc/lucene/solr/trunk/src/solrj/org/apache/solr/client/solrj/impl/XMLResponseParser.java?revision=819403&view=markup Got it. OK, sure, well thanks for actually being able to identify somewhere it would break and for taking the time to provide a link. So what you are saying is that this breaks the SolrJ and Python clients and people who develop clients to parse and read the (undocumented) SOLR response schema. ...or let's do a random Google Code search for solr xml lst -- check out ResponseContentHandler in solrpy... http://code.google.com/p/solrpy/source/browse/trunk/solr/core.py#841 ...I can't write Python code to save my life, but I have a pretty good idea what that code will do if it sees an unexpected tag. Gotcha. : And why is that? Isn't the point of SOLR to expand to use cases brought up : by users of the system? As long as those use cases can be principally : supported, without breaking backwards compatibility (or in that case, if : they do, with large blinking red text that says it), then you're shutting : people out for 0 benefit? It's aesthetics we're talking about here. I don't know if I'd say that's the point of Solr, but yes, we should absolutely try to grow the capabilities of the system as new use cases come along. Well, that's what I was trying to do, but all I was hearing was a lot of hollering without any help to understand why. Thanks for being the one to finally provide that information. 
I am 100% in agreement that the existing simple XMLResponseWriter is not for everyone -- historically we've tried to maintain a sense of equality between all of the response writers, so that they all contained the same data just with different markup -- but there are clearly cases where it would be nice to have a response writer that is allowed to know more about the real structure of the data and represent it in a manner that more closely matches its purpose. I'd like to refactor the whole thing to be a bit less brittle, and also to close off people that shouldn't be dealing with SOLR's XML in/out (by taking away your favorite writePrim method and its public modifier and making the class final, which it once was). We should rename that to SolrXmlResponseWriter; it's not really generic XML (as the name suggests), it's SOLR's custom (undocumented) XML schema, right? Also, since it's undocumented, I'd be happy to throw together documentation for its XML format. Would that also be welcomed? Then, we should develop an easy extension point mechanism for people who want to develop their own XML response writers and write their own clients (or leverage existing clients that
[jira] Created: (SOLR-1645) Add human content-type
Add human content-type -- Key: SOLR-1645 URL: https://issues.apache.org/jira/browse/SOLR-1645 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction) Affects Versions: 1.4 Reporter: Khalid Yagoubi Fix For: 1.4 The idea is to allow Solr Cell to calculate a human content-type from the extracted content-type and map it to a field in the schema, so the user can search on media:image or media:video. Ideas: 1) Hardcode a hashmap somewhere in the extraction classes and derive the human content-type from the extracted content-type. I'm thinking of SolrContentHandler.java. 2) Write an XML file where we can put a mapping, like tika-config.xml does for parsers. 3) Use tika-config.xml to get all supported MIME types. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
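A minimal sketch of idea #1 from the issue above -- a hardcoded mapping from the extracted MIME type to a human-friendly media value. The class and method names are illustrative assumptions, not existing Solr Cell code:

```java
// Sketch of idea #1: map an extracted MIME type to a coarse "media" facet
// value that could be stored in a schema field. Illustrative names only.
public class HumanContentType {

    /** Map a MIME type like "image/png" to a human media category. */
    public static String mediaType(String mimeType) {
        if (mimeType == null) {
            return "unknown";
        }
        // The primary type ("image" in "image/png") covers most cases.
        String primary = mimeType.split("/", 2)[0].toLowerCase();
        if (primary.equals("image") || primary.equals("video")
                || primary.equals("audio")) {
            return primary;
        }
        // A few application/* subtypes that users think of as documents.
        if (primary.equals("text") || mimeType.equals("application/pdf")) {
            return "document";
        }
        return "other";
    }
}
```

Options 2 and 3 would replace the hardcoded branches with a table loaded from an XML mapping file.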
Re: Namespaces in response (SOLR-1586)
Hi Hoss, : I think it's rather powerful. You insulate the following variations into 1 : single place to change them (FieldType): : : * output representation : * indexing : * validation : : To remove this from FieldType would be to strew the same functionality : across multiple classes, which doesn't make sense IMHO. it's a damned-if-you-do/damned-if-you-don't situation though ... you look at it as insulating the response writers because all of the logic about serializing data is in the FieldType, but i look at it as polluting the FieldType with knowledge about the output formats -- there's a reason we didn't add writeBinary to the FieldType when the BinaryResponseWriter was added ... the toObject abstraction lets the FieldType do whatever it wants internally, and provide its best face to the world when asked. the ResponseWriters can then apply heuristics to decide the most compatible type they know of to use when representing it: is it something complex i have a codec for? no; oh well, then is it something that implements Collection? no; oh well, then is it something that is an instanceof Number? no; oh well, as a last resort we can stringify. Sure, it's just that it's half-way on both sides right now, like you said. There's probably a middle ground. I like the insulation but I also understand the clutter (i.e., what you're saying). : In the long run, this might be nice, and +1 on getting there in the long : run. In the short, a compromise is to allow namespacing on fields in the : existing XmlWriter, which is allowed anyways, whether by oversight or not. I'm sure if we look hard enough at the existing internal APIs, we can find a way to generate completely broken XML that no DOM, SAX or pull parser could possibly deal with cleanly -- but that doesn't mean we should do that just because it would allow us to start outputting a bunch of metadata that we think is useful. breaking the (implicit) XML Schema is just as bad as breaking the XML itself. Agreed. 
Let's document that (implicit) schema so loud people like me don't keep bugging you guys when it's so obvious to you. I'm just trying to help. I'll take an action. Cheers, Chris ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.mattm...@jpl.nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++
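The toObject fallback chain Hoss describes above (codec, then Collection, then Number, then stringify) could be sketched like this. All names are illustrative assumptions, not Solr's actual writer code:

```java
import java.util.Collection;

// Sketch of the heuristic a response writer might apply to a value returned
// by FieldType#toObject: pick the most specific representation it knows,
// falling back to a string as a last resort. Illustrative names only.
public class WriterDispatch {

    /** Choose an element tag for a value, mimicking Solr's XML vocabulary. */
    public static String tagFor(Object val) {
        if (val instanceof Collection) {
            return "arr";   // write each element recursively
        }
        if (val instanceof Integer || val instanceof Long) {
            return "int";
        }
        if (val instanceof Number) {
            return "float";
        }
        if (val instanceof Boolean) {
            return "bool";
        }
        return "str";       // last resort: stringify
    }
}
```

The point of the chain is that the FieldType never needs to know which output format is in play; each writer applies its own dispatch.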
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789561#action_12789561 ] Chris A. Mattmann commented on SOLR-1131: - Hi Grant: {quote} My tests show it to be at least 7 times faster. But this should be obvious from static analysis, too. First of all, String.split() uses a regex which then makes a pass through the underlying character array. Then, trim has to go back through and analyze the char array too, not to mention the extra String creations. The optimized version here makes one pass and deals solely at the char array level and only has to do the substring, which I think can be optimized by the JVM to be a copy on write. {quote} Got it. A couple of points: 1. 7x faster is great, but could end up being noise if x = 2 ms. It matters if x is, say, 2 minutes, agreed. If it's on the ms end then the expense of more lines of (uncommented) code isn't worth it. 2. This code is likely to get called heavily on the indexing side, so performance, though still an issue, is not as hugely important as, say, on the searching side. 3. If you feel strongly about an optimized version of this magic splitAndTrim function, how about a utility function and a refactor then? I would guess this code could be used elsewhere, and that would help to satisfy my hunger for reusability. I'll even javadoc the function and do the refactor if you'd like. 
Cheers, Chris Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.Mattmann.121109.patch.txt, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. The concept point may be best indexed in a variety of ways: * geohash (sincle lucene field) * lat field, lon field (two double fields) * cartesian tiers (a series of fields with tokens to say if it exists within that region) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
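For reference, a one-pass split-and-trim along the lines discussed in the comment above might look like the following. This is a sketch under stated assumptions, not the actual patch code; it avoids String.split()'s regex pass and the separate trim() pass by scanning the char positions directly:

```java
import java.util.ArrayList;
import java.util.List;

// One-pass split-and-trim sketch: a single scan finds each separator, then
// whitespace is stripped by moving indices before the one substring() call.
public class SplitUtil {

    public static List<String> splitAndTrim(String s, char sep) {
        List<String> out = new ArrayList<String>();
        int i = 0, n = s.length();
        while (i <= n) {
            // Find the end of the current token.
            int end = i;
            while (end < n && s.charAt(end) != sep) {
                end++;
            }
            // Trim both ends without creating intermediate Strings.
            int a = i, b = end;
            while (a < b && Character.isWhitespace(s.charAt(a))) a++;
            while (b > a && Character.isWhitespace(s.charAt(b - 1))) b--;
            out.add(s.substring(a, b));
            i = end + 1;
        }
        return out;
    }
}
```

Each token costs one substring() allocation and nothing else, which is the property the comment argues for.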
[jira] Commented: (SOLR-1506) Search multiple cores using MultiReader
[ https://issues.apache.org/jira/browse/SOLR-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789600#action_12789600 ] Jason Rutherglen commented on SOLR-1506: There's a different bug here: because CoreContainer loads the cores sequentially, and MultiCoreReaderFactory looks for all the cores, when the proxy core isn't last, not all the cores are searchable; if the proxy is first, an exception is thrown. The workaround is to place the proxy core last; however, that's not possible when using the core admin HTTP API. Hmm... Not sure what the best workaround is. Search multiple cores using MultiReader --- Key: SOLR-1506 URL: https://issues.apache.org/jira/browse/SOLR-1506 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Trivial Fix For: 1.5 Attachments: SOLR-1506.patch, SOLR-1506.patch, SOLR-1506.patch I need to search over multiple cores, and SOLR-1477 is more complicated than expected, so here we'll create a MultiReader over the cores to allow searching on them. Maybe in the future we can add parallel searching; however SOLR-1477, if it gets completed, provides that out of the box. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789607#action_12789607 ] Andrzej Bialecki commented on SOLR-1632: - I believe the API that I propose would support such an implementation as well. Please note that it's usually not feasible to compute and distribute the complete IDF table for all terms - you would have to replicate a union of all term dictionaries across the cluster. In practice, you limit the amount of information by various means, e.g. only distributing data related to the current request (this implementation), or reducing the frequency of updates (e.g. LRU caching), or approximating global DF with a constant for frequent terms (where the contribution of their IDF to the score would be negligible anyway). Distributed IDF --- Key: SOLR-1632 URL: https://issues.apache.org/jira/browse/SOLR-1632 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.5 Reporter: Andrzej Bialecki Attachments: distrib.patch Distributed IDF is a valuable enhancement for distributed search across non-uniform shards. This issue tracks the proposed implementation of an API to support this functionality in Solr. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
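One of the strategies Andrzej mentions -- bounding the global-DF table with LRU caching plus a constant approximation for misses -- could be sketched as follows. All class and method names here are hypothetical, not part of the attached patch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: cap the replicated global-DF table with an LRU map instead of
// shipping a union of all term dictionaries; evicted or never-seen terms
// fall back to a constant DF approximation. Illustrative names only.
public class GlobalDfCache {

    private final Map<String, Long> cache;
    private final long defaultDf; // fallback for frequent/evicted terms

    public GlobalDfCache(final int maxEntries, long defaultDf) {
        this.defaultDf = defaultDf;
        // LinkedHashMap in access-order mode gives a simple LRU policy.
        this.cache = new LinkedHashMap<String, Long>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Long> e) {
                return size() > maxEntries;
            }
        };
    }

    public void put(String term, long df) {
        cache.put(term, df);
    }

    /** Cached global DF, or the constant approximation on a miss. */
    public long df(String term) {
        Long df = cache.get(term);
        return df != null ? df : defaultDf;
    }
}
```

The constant fallback is defensible precisely because very frequent terms contribute negligible IDF to the score, as the comment notes.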
[jira] Updated: (SOLR-1177) Distributed TermsComponent
[ https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Weber updated SOLR-1177: - Attachment: SOLR-1177.patch Here is an updated patch that includes Shalin's suggestions: - replace TermData with TermsResponse.Term - update TermsHelper to use the parsing code from TermsResponse I also changed TermsResponse.Term#frequency to a long so that we don't overflow when calculating the frequency. Then, to keep back-compatibility with existing code, I do the following when writing it to the NamedList: if (tc.getFrequency() >= freqmin && tc.getFrequency() <= freqmax) { fieldterms.add(tc.getTerm(), ((Number)tc.getFrequency()).intValue()); cnt++; } Is this a good approach? This new patch includes SOLR-1139. Distributed TermsComponent -- Key: SOLR-1177 URL: https://issues.apache.org/jira/browse/SOLR-1177 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Matt Weber Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 1.5 Attachments: SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, TermsComponent.java, TermsComponent.patch TermsComponent should be distributed -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1621) Allow current single core deployments to be specified by solr.xml
[ https://issues.apache.org/jira/browse/SOLR-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789675#action_12789675 ] Noble Paul commented on SOLR-1621: -- bq. You mark setSolrConfigFilename as deprecated, but it simply doesn't work anymore either - that's not really deprecation. I feel that deprecating solrconfig-filename should be its own issue so that's a clearer entry in CHANGES I shall open another issue for this. setSolrConfigFilename() did not work for multicore. So it does not make sense for us to bend over backwards to make it work in this and add too many overloaded methods. Anyway, users won't need Allow current single core deployments to be specified by solr.xml - Key: SOLR-1621 URL: https://issues.apache.org/jira/browse/SOLR-1621 Project: Solr Issue Type: New Feature Affects Versions: 1.5 Reporter: Noble Paul Fix For: 1.5 Attachments: SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch Supporting two different modes of deployments is turning out to be hard. This leads to duplication of code. Moreover there is a lot of confusion on where we put common configuration. See the mail thread http://markmail.org/message/3m3rqvp2ckausjnf -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1592) Refactor XMLWriter startTag to allow arbitrary attributes to be written
[ https://issues.apache.org/jira/browse/SOLR-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789678#action_12789678 ] Lance Norskog commented on SOLR-1592: - Ok, never mind. Refactor XMLWriter startTag to allow arbitrary attributes to be written --- Key: SOLR-1592 URL: https://issues.apache.org/jira/browse/SOLR-1592 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Environment: My MacBook laptop. Reporter: Chris A. Mattmann Assignee: Noble Paul Fix For: 1.5 Attachments: SOLR-1592.Mattmann.112209.patch.txt, SOLR-1592.Mattmann.112209_02.patch.txt, SOLR-1592.patch, SOLR-1592.patch There are certain cases in which a user would like to write arbitrary attributes as part of the XML output for a field tag. Case in point: I'd like to declare tags in the SOLR output that are e.g., georss namespace, like georss:point. Other users may want to declare myns:mytag tags, which should be perfectly legal as SOLR goes. This isn't currently possible with the XMLWriter implementation, which curiously only allows the attribute name to be included in the XML tags. Coincidentally, users of XMLWriter aren't allowed to modify the response outer XML tag to include those arbitrary namespaces (which was my original thought as a workaround for this). This wouldn't matter anyways, because by the time the user got to the FieldType#writeXML method, the header for the XML would have been written anyways. I've developed a workaround, and in doing so, allowed something that should have probably been allowed in the first place: allow a user to write arbitrary attributes (including xmlns:myns=myuri) as part of the XMLWriter#startTag function. I've kept the existing #startTag, but replaced its innards with versions of startTag that include startTagWithNamespaces, and startTagNoAttrs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1646) Document the SOLR XML response schema in XMLSchema and DTD
Document the SOLR XML response schema in XMLSchema and DTD -- Key: SOLR-1646 URL: https://issues.apache.org/jira/browse/SOLR-1646 Project: Solr Issue Type: Improvement Components: clients - C#, clients - java, clients - php, clients - python, clients - ruby - flare, documentation, Response Writers, Schema and Analysis, search Affects Versions: 1.4, 1.3, 1.2, 1.1.0 Environment: My local Mac Book pro over Christmas Break. Reporter: Chris A. Mattmann Priority: Critical Fix For: 1.5 See this thread: http://www.lucidimagination.com/search/document/e8bb6cac84c1f520/namespaces_in_response_solr_1586#cc50ba9e9d8fe2dc For the background. The idea is to finally document the often alluded to (but never reified) SOLR XML response schema/DTD. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1648) Rename XMLWriter (and XMLResponseWriter) to SolrXmlResponseWriter
Rename XMLWriter (and XMLResponseWriter) to SolrXmlResponseWriter - Key: SOLR-1648 URL: https://issues.apache.org/jira/browse/SOLR-1648 Project: Solr Issue Type: Improvement Components: Response Writers Affects Versions: 1.4, 1.3, 1.2 Environment: My local MacBook pro over the Christmas Break. Reporter: Chris A. Mattmann Fix For: 1.5 The current XMLWriter class is kind of a misnomer. It's not a generic XMLWriter by any means, and that's not its intention. Its intention is to write instances (as responses) of a particular XML schema that SOLR clients (written in Java, Python, pick-your-favorite-programming-language) that speak its XML protocol can understand. So, we should rename it to SolrXmlResponseWriter to indicate its unique XML speak. Moreover, as part of the next issue I'm going to report (refactoring all ResponseWriters to be a bit more friendly), we should probably do away with the current XmlResponseWriter (which simply delegates to XmlWriter -- eeep) and just stick with SolrXmlResponseWriter. Patch forthcoming. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1648) Rename XMLWriter (and XMLResponseWriter) to SolrXmlResponseWriter
[ https://issues.apache.org/jira/browse/SOLR-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789682#action_12789682 ] Chris A. Mattmann commented on SOLR-1648: - See this thread for more detail: http://www.lucidimagination.com/search/document/e8bb6cac84c1f520/namespaces_in_response_solr_1586#cc50ba9e9d8fe2dc Rename XMLWriter (and XMLResponseWriter) to SolrXmlResponseWriter - Key: SOLR-1648 URL: https://issues.apache.org/jira/browse/SOLR-1648 Project: Solr Issue Type: Improvement Components: Response Writers Affects Versions: 1.2, 1.3, 1.4 Environment: My local MacBook pro over the Christmas Break. Reporter: Chris A. Mattmann Fix For: 1.5 The current XMLWriter class is kind of a misnomer. It's not a generic XMLWriter by any means, and that's not its intention. Its intention is to write instances (as responses) of a particular XML schema that SOLR clients (written in Java, Python, pick-your-favorite-programming-language) that speak its XML protocol can understand. So, we should rename it to SolrXmlResponseWriter to indicate its unique XML speak. Moreover, as part of the next issue I'm going to report (refactoring all ResponseWriters to be a bit more friendly), we should probably do away with the current XmlResponseWriter (which simply delegates to XmlWriter -- eeep) and just stick with SolrXmlResponseWriter. Patch forthcoming. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1649) Refactor all ResponseWriters to be more extension-friendly
Refactor all ResponseWriters to be more extension-friendly -- Key: SOLR-1649 URL: https://issues.apache.org/jira/browse/SOLR-1649 Project: Solr Issue Type: Improvement Components: Response Writers Affects Versions: 1.4 Environment: My local MacBook pro over the Christmas break. Reporter: Chris A. Mattmann Fix For: 1.5 I'd like to refactor all the ResponseWriters to be a bit less brittle. ResponseWriters should follow a standard interface with more existing methods than is currently present in the interface, and with lots of refactored utility code and more concrete control/data flow. I'll take a hard look at the existing response writers and try to generalize. See this thread for background: http://www.lucidimagination.com/search/document/e8bb6cac84c1f520/namespaces_in_response_solr_1586#cc50ba9e9d8fe2dc -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: SOLR-1131: disconnect between fields created by poly fields
There are already components (ExtractingRequestHandler, Deduplication) that secretly add fields which violate the schema. Personally I would nuke this ability; I've had major problems with junk in the indexed data, and discovering secret fields would have made my head explode that much louder. On Fri, Dec 11, 2009 at 7:01 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Fri, Dec 11, 2009 at 9:53 AM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Actually if it was the case that poly field mapped to a single dynamic field, then I would agree with you, but as is the discussion, poly field can map to _many_ dynamic fields, which is where the drift occurs. I'm not sure if we're using the exact same terminology, but it's well defined how many dynamic fields would be created by the basic point class (exactly one) *if* we decide to go that route and use that option. Can you give an example of what you mean? Is your objection to this point class registering a single dynamic field, or are you talking about a hypothetical case? -Yonik http://www.lucidimagination.com -- Lance Norskog goks...@gmail.com
Re: SOLR-1131: disconnect between fields created by poly fields
Hi Lance, On 12/11/09 9:08 PM, Lance Norskog goks...@gmail.com wrote: There are already components (ExtractingRequestHandler, Deduplication) that secretly add fields which violate the schema. Personally I would nuke this ability; I've had major problems with junk in the indexed data and discovering secret fields would have made my head explode that much louder. Yep, I agree. I originally felt that dynamic fields weren't the way to go, but I came around and ended up thinking they are the way to go for this. However, even though I think dynamic fields should be used for poly fields, I _also_ think they should be pre-declared in the schema -- and not secretly (as you put it) added at runtime (hence my mentioning of drift). It seems that Hoss was for this as well: http://www.lucidimagination.com/search/document/7f8ed6212481aca6/solr_1131_multiple_fields_per_field_type and I think he's right, though I remember going back and forth with him (and others) about it in the beginning. The thing people have to realize here is that there are 2 notions: 1. schema.xml -- edited by a user, offline before SOLR has started up, loaded at runtime and turned into... 2. an IndexSchema instance, a Java object instance used to access the schema.xml file loaded into a JVM. IndexSchema acts as a runtime model to the underlying SOLR field typing system. If #1 and #2 fall out of sync (which creating _any_ type of field in #2 does, without informing #1), then you have drift. Cheers, Chris ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.mattm...@jpl.nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++
[jira] Assigned: (SOLR-1637) Deprecate ALIAS command
[ https://issues.apache.org/jira/browse/SOLR-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul reassigned SOLR-1637: Assignee: Noble Paul Deprecate ALIAS command --- Key: SOLR-1637 URL: https://issues.apache.org/jira/browse/SOLR-1637 Project: Solr Issue Type: Sub-task Components: multicore Affects Versions: 1.5 Reporter: Noble Paul Assignee: Noble Paul Fix For: 1.5 Mulicore makes the CoreContainer code more complex. We should remove it for now and revisit it for a simpler cleaner implementation -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: SOLR-1131: disconnect between fields created by poly fields
They don't violate the schema, do they? The fields added from both of those (and DIH too) all must be either fields or match dynamic field patterns. Right? Erik On Dec 12, 2009, at 6:08 AM, Lance Norskog wrote: There are already components (ExtractingRequestHandler, Deduplication) that secretly add fields which violate the schema. Personally I would nuke this ability; I've had major problems with junk in the indexed data and discovering secret fields would have made my head explode that much louder. On Fri, Dec 11, 2009 at 7:01 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Fri, Dec 11, 2009 at 9:53 AM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Actually if it was the case that poly field mapped to a single dynamic field, then I would agree with you, but as is the discussion, poly field can map to _many_ dynamic fields, which is where the drift occurs. I'm not sure if we're using the exact same terminology, but it's well defined how many dynamic fields would be created by the basic point class (exactly one) *if* we decide to go that route and use that option. Can you give an example of what you mean? Is your objection to this point class registering a single dynamic field, or are you talking about a hypothetical case? -Yonik http://www.lucidimagination.com -- Lance Norskog goks...@gmail.com
[jira] Commented: (SOLR-1621) Allow current single core deployments to be specified by solr.xml
[ https://issues.apache.org/jira/browse/SOLR-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789689#action_12789689 ] Noble Paul commented on SOLR-1621: -- I guess deprecating ALIAS can be committed first so that the changes for this issue will be smaller and more manageable Allow current single core deployments to be specified by solr.xml - Key: SOLR-1621 URL: https://issues.apache.org/jira/browse/SOLR-1621 Project: Solr Issue Type: New Feature Affects Versions: 1.5 Reporter: Noble Paul Fix For: 1.5 Attachments: SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch supporting two different modes of deployments is turning out to be hard. This leads to duplication of code. Moreover there is a lot of confusion on where do we put common configuration. See the mail thread http://markmail.org/message/3m3rqvp2ckausjnf -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Synchronizing Webapp on Tomcat with Solr instance
Hello, I have a webapp on Tomcat that displays search results to end users. I have a Solr instance on the same Linux box that has data indexed. The webapp wraps an instance of CommonsHttpSolrServer, presumably to send HTTP requests to the Solr instance. Now, if Solr has indexed documents, how does it return the document(s) that satisfied the search query? I am a little confused about how this communication between Tomcat and Solr takes place -- View this message in context: http://old.nabble.com/Synchronizing-Webapp-on-Tomcat-with-Solr-instance-tp26754143p26754143.html Sent from the Solr - Dev mailing list archive at Nabble.com.
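To illustrate the mechanism being asked about: CommonsHttpSolrServer issues an ordinary HTTP request to Solr's /select handler and parses the response body, which carries the stored fields of each matching document, back into Java objects (a SolrDocumentList). A stdlib-only sketch of what that request looks like on the wire (the base URL, query, and field names here are assumptions for the example, not from the poster's setup):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;

// What a SolrJ client effectively does under the hood: build a /select URL,
// fetch it over HTTP, and read the response listing the matching documents.
public class SolrHttpSketch {

    /** Build the kind of /select URL that a Solr client sends for a query. */
    public static String selectUrl(String baseUrl, String q, int rows)
            throws Exception {
        return baseUrl + "/select?q=" + URLEncoder.encode(q, "UTF-8")
                + "&rows=" + rows + "&wt=xml";
    }

    public static void main(String[] args) throws Exception {
        // Assumes a Solr instance is running locally on the default port.
        String url = selectUrl("http://localhost:8983/solr", "title:lucene", 10);
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(url).openStream(), "UTF-8"));
        for (String line; (line = in.readLine()) != null; ) {
            // The <result> element of the response lists each matching
            // document's stored fields.
            System.out.println(line);
        }
        in.close();
    }
}
```

In practice the SolrJ client does this for you: you pass it a SolrQuery, it sends the HTTP request, and it hands back the parsed documents, so the webapp never touches the raw XML.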