[jira] Commented: (SOLR-1632) Distributed IDF

2009-12-11 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789174#action_12789174
 ] 

Andrzej Bialecki  commented on SOLR-1632:
-

I'm not sure what approach you are referring to. Following the terminology in 
that thread, this implementation follows the approach where there is a single 
merged big idf map at the master, and it's sent out to slaves on each query. 
However, when exactly this merging and sending happens is 
implementation-specific - in the ExactDFSource it happens on every query, but I 
hope the API can support other scenarios as well.
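
To make the "single merged big idf map at the master" idea concrete, here is a minimal Java sketch. It is not the API from distrib.patch: the names ShardStats, GlobalDfMerger, merge and idf are hypothetical, and the idf formula is the classic Lucene-style 1 + ln(numDocs / (df + 1)), used here only for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-shard statistics: document count plus per-term document frequency. */
final class ShardStats {
    final long numDocs;
    final Map<String, Long> docFreq;

    ShardStats(long numDocs, Map<String, Long> docFreq) {
        this.numDocs = numDocs;
        this.docFreq = docFreq;
    }
}

/** Hypothetical master-side merger; not the API from distrib.patch. */
final class GlobalDfMerger {
    /** Sums numDocs and per-term df over all shards into one global view. */
    static ShardStats merge(List<ShardStats> shards) {
        long totalDocs = 0;
        Map<String, Long> merged = new HashMap<String, Long>();
        for (ShardStats shard : shards) {
            totalDocs += shard.numDocs;
            for (Map.Entry<String, Long> e : shard.docFreq.entrySet()) {
                Long current = merged.get(e.getKey());
                merged.put(e.getKey(), (current == null ? 0L : current) + e.getValue());
            }
        }
        return new ShardStats(totalDocs, merged);
    }

    /** Classic Lucene-style idf, computed from the merged (global) statistics. */
    static double idf(ShardStats global, String term) {
        Long df = global.docFreq.get(term);
        long docFreq = (df == null) ? 0L : df;
        return 1.0 + Math.log((double) global.numDocs / (double) (docFreq + 1));
    }
}
```

In this sketch the master would call merge() once per query (the ExactDFSource-style timing described above) and ship the merged statistics to the shards; a cached variant would simply call merge() less often.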

 Distributed IDF
 ---

 Key: SOLR-1632
 URL: https://issues.apache.org/jira/browse/SOLR-1632
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.5
Reporter: Andrzej Bialecki 
 Attachments: distrib.patch


 Distributed IDF is a valuable enhancement for distributed search across 
 non-uniform shards. This issue tracks the proposed implementation of an API 
 to support this functionality in Solr.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-1625) Add regexp support for TermsComponent

2009-12-11 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul resolved SOLR-1625.
--

Resolution: Fixed

committed r889537
Thanks Uri Boness

 Add regexp support for TermsComponent
 -

 Key: SOLR-1625
 URL: https://issues.apache.org/jira/browse/SOLR-1625
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
Reporter: Uri Boness
Assignee: Noble Paul
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1625.patch, SOLR-1625.patch, SOLR-1625.patch


 At the moment the only way to filter the returned terms is by a prefix. It 
 would be nice if the filter could also be done by regular expression.
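
 As an illustration of the requested behaviour, the following Java sketch filters a list of terms by an optional prefix and an optional regular expression. TermFilter and its filter method are hypothetical names, not code from SOLR-1625.patch, and requiring a full match (rather than a partial find) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper; TermsComponent's real parameters and internals are not shown here. */
final class TermFilter {
    /** Keeps terms that start with prefix (if non-null) and fully match regex (if non-null). */
    static List<String> filter(List<String> terms, String prefix, String regex) {
        // Compile once, outside the loop.
        Pattern pattern = (regex == null) ? null : Pattern.compile(regex);
        List<String> kept = new ArrayList<String>();
        for (String term : terms) {
            if (prefix != null && !term.startsWith(prefix)) continue;
            if (pattern != null && !pattern.matcher(term).matches()) continue;
            kept.add(term);
        }
        return kept;
    }
}
```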




[jira] Resolved: (SOLR-1598) Reader used in PlainTextEntityProcessor.nextRow() is not explicitly closed

2009-12-11 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul resolved SOLR-1598.
--

Resolution: Fixed

committed r889541

 Reader used in PlainTextEntityProcessor.nextRow() is not explicitly closed
 --

 Key: SOLR-1598
 URL: https://issues.apache.org/jira/browse/SOLR-1598
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.4
Reporter: Sascha Szott
Assignee: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1598.patch, solr-1598.patch


 Due to a missing r.close() statement at the end of method 
 PlainTextEntityProcessor.nextRow(), IO exceptions (too many open files) can 
 occur when large numbers of files are processed.
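
 The usual remedy for this kind of leak is to close the reader in a finally block. The sketch below shows that pattern in isolation; ReaderUtil and readFullyAndClose are hypothetical names and this is not the code from SOLR-1598.patch.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringWriter;

/** Hypothetical helper illustrating the fix pattern; not the code from SOLR-1598.patch. */
final class ReaderUtil {
    /** Reads the reader fully and always closes it, even if read() throws. */
    static String readFullyAndClose(Reader r) throws IOException {
        StringWriter sw = new StringWriter();
        char[] buf = new char[1024];
        try {
            int len;
            while ((len = r.read(buf)) != -1) {
                sw.write(buf, 0, len);
            }
        } finally {
            r.close(); // releases the underlying file handle
        }
        return sw.toString();
    }
}
```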




[jira] Issue Comment Edited: (SOLR-1598) Reader used in PlainTextEntityProcessor.nextRow() is not explicitly closed

2009-12-11 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789191#action_12789191
 ] 

Noble Paul edited comment on SOLR-1598 at 12/11/09 9:25 AM:


committed r889541

Thanks Sascha Szott

  was (Author: noble.paul):
committed r889541
  
 Reader used in PlainTextEntityProcessor.nextRow() is not explicitly closed
 --

 Key: SOLR-1598
 URL: https://issues.apache.org/jira/browse/SOLR-1598
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.4
Reporter: Sascha Szott
Assignee: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1598.patch, solr-1598.patch


 Due to a missing r.close() statement at the end of method 
 PlainTextEntityProcessor.nextRow(), IO exceptions (too many open files) can 
 occur when large numbers of files are processed.




Re: svn commit: r886127 - in /lucene/solr/trunk/src: solrj/org/apache/solr/client/solrj/beans/DocumentObjectBinder.java test/org/apache/solr/client/solrj/beans/TestDocumentObjectBinder.java

2009-12-11 Thread Noble Paul നോബിള്‍ नोब्ळ्
fixed. Thanks Hoss

On Fri, Dec 11, 2009 at 7:01 AM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 Noble:
 1) you *have* to include a CHANGES.txt entry for every non-trivial commit
 ... if it has a Jira issue, there better be a CHANGES.txt entry, and the
 CHANGES.txt entry really needs to be in the same atomic commit as the rest
 of the changes, not a follow up commit, so code changes can be correlated
 to why the change was made.

 2) CHANGES.txt entries must cite the person who contributed the patch.

 3) you have to be careful to cite the correct Jira issue when making
 commits -- this commit doesn't seem to have anything to do with SOLR-1516,
 i'm pretty sure it was for SOLR-1357

 ...without all three of these things, it's nearly impossible to audit
 changes later and understand what they were, and who they came from.


 : Date: Wed, 02 Dec 2009 11:57:17 -
 : From: no...@apache.org
 : Reply-To: solr-dev@lucene.apache.org
 : To: solr-comm...@lucene.apache.org
 : Subject: svn commit: r886127 - in /lucene/solr/trunk/src:
 :     solrj/org/apache/solr/client/solrj/beans/DocumentObjectBinder.java
 :     test/org/apache/solr/client/solrj/beans/TestDocumentObjectBinder.java
 :
 : Author: noble
 : Date: Wed Dec  2 11:57:15 2009
 : New Revision: 886127
 :
 : URL: http://svn.apache.org/viewvc?rev=886127&view=rev
 : Log:
 : SOLR-1516 SolrInputDocument cannot process dynamic fields
 :
 : Modified:
 :     
 lucene/solr/trunk/src/solrj/org/apache/solr/client/solrj/beans/DocumentObjectBinder.java
 :     
 lucene/solr/trunk/src/test/org/apache/solr/client/solrj/beans/TestDocumentObjectBinder.java
 :
 : Modified: 
 lucene/solr/trunk/src/solrj/org/apache/solr/client/solrj/beans/DocumentObjectBinder.java
 : URL: 
 http://svn.apache.org/viewvc/lucene/solr/trunk/src/solrj/org/apache/solr/client/solrj/beans/DocumentObjectBinder.java?rev=886127&r1=886126&r2=886127&view=diff
 : 
 ==
 : --- 
 lucene/solr/trunk/src/solrj/org/apache/solr/client/solrj/beans/DocumentObjectBinder.java
  (original)
 : +++ 
 lucene/solr/trunk/src/solrj/org/apache/solr/client/solrj/beans/DocumentObjectBinder.java
  Wed Dec  2 11:57:15 2009
 : @@ -76,9 +76,19 @@
 :      }
 :
 :      SolrInputDocument doc = new SolrInputDocument();
 : -    for( DocField field : fields ) {
 : -      doc.setField( field.name, field.get( obj ), 1.0f );
 : -    }
 : +     for (DocField field : fields) {
 : +             if (field.dynamicFieldNamePatternMatcher != null
 : +                             && field.get(obj) != null && field.isContainedInMap) {
 : +                     Map<String, Object> mapValue = (HashMap<String, Object>) field
 : +                                     .get(obj);
 : +
 : +                     for (Map.Entry<String, Object> e : mapValue.entrySet()) {
 : +                             doc.setField( e.getKey(), e.getValue(), 1.0f);
 : +                     }
 : +             } else {
 : +                     doc.setField(field.name, field.get(obj), 1.0f);
 : +             }
 : +     }
 :      return doc;
 :    }
 :
 :
 : Modified: 
 lucene/solr/trunk/src/test/org/apache/solr/client/solrj/beans/TestDocumentObjectBinder.java
 : URL: 
 http://svn.apache.org/viewvc/lucene/solr/trunk/src/test/org/apache/solr/client/solrj/beans/TestDocumentObjectBinder.java?rev=886127&r1=886126&r2=886127&view=diff
 : 
 ==
 : --- 
 lucene/solr/trunk/src/test/org/apache/solr/client/solrj/beans/TestDocumentObjectBinder.java
  (original)
 : +++ 
 lucene/solr/trunk/src/test/org/apache/solr/client/solrj/beans/TestDocumentObjectBinder.java
  Wed Dec  2 11:57:15 2009
 : @@ -25,12 +25,14 @@
 :  import org.apache.solr.common.SolrInputDocument;
 :  import org.apache.solr.common.SolrInputField;
 :  import org.apache.solr.common.SolrDocument;
 : +import org.apache.solr.common.util.Hash;
 :  import org.apache.solr.common.util.NamedList;
 :  import org.junit.Assert;
 :
 :  import java.io.StringReader;
 :  import java.util.Arrays;
 :  import java.util.Date;
 : +import java.util.HashMap;
 :  import java.util.List;
 :  import java.util.Map;
 :
 : @@ -100,6 +102,15 @@
 :      item.inStock = false;
 :      item.categories =  new String[] { "aaa", "bbb", "ccc" };
 :      item.features = Arrays.asList( item.categories );
 : +    List<String> supA =  Arrays.asList(  new String[] { "supA1", "supA2", "supA3" } );
 : +    List<String> supB =  Arrays.asList(  new String[] { "supB1", "supB2", "supB3"});
 : +    item.supplier = new HashMap<String, List<String>>();
 : +    item.supplier.put("supplier_supA", supA);
 : +    item.supplier.put("supplier_supB", supB);
 : +
 : +    item.supplier_simple = new HashMap<String, String>();
 : +    item.supplier_simple.put("sup_simple_supA", "supA_val");
 : +    item.supplier_simple.put("sup_simple_supB", "supB_val");
 :
 :      DocumentObjectBinder binder = new DocumentObjectBinder();
 :      SolrInputDocument doc = binder.toSolrInputDocument( item 

[jira] Created: (SOLR-1644) Provide a clean way to keep flags and helper objects in ResponseBuilder

2009-12-11 Thread Shalin Shekhar Mangar (JIRA)
Provide a clean way to keep flags and helper objects in ResponseBuilder
---

 Key: SOLR-1644
 URL: https://issues.apache.org/jira/browse/SOLR-1644
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
 Fix For: 1.5


Many components such as StatsComponent, FacetComponent etc keep flags and 
helper objects in ResponseBuilder. Having to modify the ResponseBuilder for 
such things is a very kludgy solution.

Let us provide a clean way for components to keep arbitrary objects for the 
duration of a (distributed) search request.




[jira] Commented: (SOLR-1644) Provide a clean way to keep flags and helper objects in ResponseBuilder

2009-12-11 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789226#action_12789226
 ] 

Noble Paul commented on SOLR-1644:
--

Let us keep a Map in RB so that each component can store/retrieve any values 
it wants. It can be considered a session object.

 Provide a clean way to keep flags and helper objects in ResponseBuilder
 ---

 Key: SOLR-1644
 URL: https://issues.apache.org/jira/browse/SOLR-1644
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
 Fix For: 1.5


 Many components such as StatsComponent, FacetComponent etc keep flags and 
 helper objects in ResponseBuilder. Having to modify the ResponseBuilder for 
 such things is a very kludgy solution.
 Let us provide a clean way for components to keep arbitrary objects for the 
 duration of a (distributed) search request.




Re: SOLR-1131: disconnect between fields created by poly fields

2009-12-11 Thread Grant Ingersoll

On Dec 10, 2009, at 11:30 PM, Mattmann, Chris A (388J) wrote:

 Hi Yonik,
 
 While thinking about SOLR-1131, something important just came to mind. If we
 allow poly fields to add fields to the schema (be it via dynamic fields, or
 explicit field decls, either way), then we introduce a disconnect between
 the existing XML schema, and the runtime schema instance. To my knowledge
 there is no write-back/flush back for changes made to the schema at
 run-time, and what's loaded on startup (correct me if I'm wrong).
 
 There are other pros/cons to dynamically inserting field definitions,
 but this isn't one of them I think (provided that created fields are
 deterministically created at the same time as the poly field.)
 There's no need to write-back.
 
 I disagree. Without defining the DynamicField's offline by editing the
 schema.xml file before running SOLR, if fields are created on-the-fly by a
 PolyField during SOLR runtime, the schema.xml file is out of sync with the
 runtime IndexSchema instance.

By declaring the poly field, you are declaring the dynamic field.  I don't see 
why this leads to drift.  Sure, it is an abstraction and there are Lucene 
fields that will be created under the hood, but that is one of the primary 
features of Solr, it hides all that mess.  

Moreover, at startup, there is a check to make sure there are no name clashes.


 
 
 Also, I'm (ack!) now leaning towards
 dynamic fields as a flexible method to do this (so long as they are
 pre-declared in the schema.xml file explicitly)
 
 I think Grant & I are now on the same page about using dynamic fields.
 The remaining issue / option is if one must declare the dynamic
 fields in the schema, or if another field type (like a poly field, but
 it doesn't have to be) can insert the dynamic field definition if it
 doesn't exist.
 
 +1. My point is: one must declare the dynamic fields in the schema that the
 poly fields will use, ahead of time or you risk the drift I'm talking about.
 
 
 -- that way you don't have
 to create (n+10) * m fields for a 10-dimensional point that's stored as well
 as indexed.
 
 AFAIK, there are no options currently on the table that would create
 (or require) field definitions for every poly-type field used.
 
 So perhaps we are actually all on the same page now (with the open
 question of whether we should explicitly define the dynamic field definition
 for the point type that will end up being added to the example
 schema).
 
 My +1 for being explicit rather than implicit -- let's require the
 schema.xml editor person to define, ahead-of-time, the dynamicFields that a
 polyField will use. This will keep schema.xml and the runtime IndexSchema in
 sync.
 
 Cheers,
 Chris
 
 ++
 Chris Mattmann, Ph.D.
 Senior Computer Scientist
 NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
 Office: 171-266B, Mailstop: 171-246
 Email: chris.mattm...@jpl.nasa.gov
 WWW:   http://sunset.usc.edu/~mattmann/
 ++
 Adjunct Assistant Professor, Computer Science Department University of
 Southern California, Los Angeles, CA 90089 USA
 ++
 
 

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using 
Solr/Lucene:
http://www.lucidimagination.com/search



Re: Merge changes with cloud branch?

2009-12-11 Thread Grant Ingersoll
I think it is the job of the cloud branch to merge to trunk when ready.  I 
suspect that applying patches to both all the time would just complicate 
things for the people working on the cloud branch, but I don't feel strongly 
about it.

On Dec 11, 2009, at 2:24 AM, Shalin Shekhar Mangar wrote:

 Hey Guys,
 
 Do we need to merge changes to trunk to the cloud branch?
 
 -- 
 Regards,
 Shalin Shekhar Mangar.



Re: Merge changes with cloud branch?

2009-12-11 Thread Shalin Shekhar Mangar
On Fri, Dec 11, 2009 at 4:29 PM, Grant Ingersoll gsing...@apache.org wrote:

 I think it is the job of the cloud branch to merge to trunk when ready.  I
 suspect that applying patches to both all the time would just complicate
 things for the people working on the cloud branch, but I don't feel strongly
 about it.


Actually I was talking about us (working on trunk) merging our changes with
cloud. Yes, it will make working with trunk more complex and I'm happy not
to do it :)

I'd like to wish good luck to the person who will merge cloud back to the
ground :)

-- 
Regards,
Shalin Shekhar Mangar.


[jira] Commented: (SOLR-1644) Provide a clean way to keep flags and helper objects in ResponseBuilder

2009-12-11 Thread Uri Boness (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789236#action_12789236
 ] 

Uri Boness commented on SOLR-1644:
--

Don't you have it already with SolrQueryRequest.getContext()?

 Provide a clean way to keep flags and helper objects in ResponseBuilder
 ---

 Key: SOLR-1644
 URL: https://issues.apache.org/jira/browse/SOLR-1644
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
 Fix For: 1.5


 Many components such as StatsComponent, FacetComponent etc keep flags and 
 helper objects in ResponseBuilder. Having to modify the ResponseBuilder for 
 such things is a very kludgy solution.
 Let us provide a clean way for components to keep arbitrary objects for the 
 duration of a (distributed) search request.




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-11 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789240#action_12789240
 ] 

Grant Ingersoll commented on SOLR-1131:
---

{quote}
Quick comment based on a spot check of the changes to IndexSchema: rather than 
make polyField special somehow w.r.t IndexSchema, and add a 
FieldType.getPolyFieldNames, etc, I had been thinking more along the lines of 
having an IndexSchema.registerDynamicFieldDefinition - just like the existing 
registerDynamicCopyField. This would (optionally) allow any field type to add 
other definitions to the IndexSchema. I continue to think it would be good to 
stay away from special logic for polyfields in the IndexSchema.
{quote}

So, then the FieldType would register its Dynamic Fields in its own init() 
method by calling this method?  That can work.
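
A rough Java sketch of that flow, under stated assumptions: MiniSchema and MiniPointType are hypothetical miniature stand-ins, not Solr's IndexSchema or any real FieldType, and the "*_d" suffix pattern is invented for the example. It shows a field type registering the dynamic field it needs from its own init(), with an already-defined pattern treated as harmless.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical miniature schema; Solr's IndexSchema is far richer than this. */
final class MiniSchema {
    private final List<String> dynamicPatterns = new ArrayList<String>();

    /** Registers a pattern such as "*_d"; returns false if it was already defined. */
    boolean registerDynamicFieldDefinition(String pattern) {
        if (dynamicPatterns.contains(pattern)) {
            return false; // already declared, e.g. explicitly in schema.xml
        }
        dynamicPatterns.add(pattern);
        return true;
    }

    /** True if some registered "*suffix" pattern matches the field name. */
    boolean matchesDynamicField(String fieldName) {
        for (String pattern : dynamicPatterns) {
            if (pattern.startsWith("*") && fieldName.endsWith(pattern.substring(1))) {
                return true;
            }
        }
        return false;
    }
}

/** Hypothetical poly field type that needs one double sub-field per dimension. */
final class MiniPointType {
    static final String SUB_FIELD_SUFFIX = "_d";

    /** Called once at schema load time; declares the dynamic field this type relies on. */
    void init(MiniSchema schema) {
        schema.registerDynamicFieldDefinition("*" + SUB_FIELD_SUFFIX);
    }

    /** Name of the underlying field for one dimension, e.g. "home_0_d". */
    String subFieldName(String fieldName, int dimensionIndex) {
        return fieldName + "_" + dimensionIndex + SUB_FIELD_SUFFIX;
    }
}
```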

bq. Why use Dynamic Field array

The array is sorted and array access is much faster and we often have to loop 
over it to look it up.

{quote}
CoordinateFieldType: why process > 1 sub field types and then throw an 
exception at the end? I cleaned this up to throw the Exception when it occurs.
{quote}

OK.  Actually, this should just be in the derived class, as it may be the case 
some other CoordinateFieldType has multiple sub types.

{quote}
# parsePoint in DistanceUtils, why use ',' as the separator - use ' ' (at least 
conforms to georss point then). I guess because you are supporting 
N-dimensional points, right?
# parsePoint - instead of complicated isolation loops, why not just use trim()? 
I've taken that approach in the patch I've attached.
{quote}

I think comma makes sense.  As for the optimization stuff, I agree w/ Yonik, 
this is code that will be called a lot.
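
For reference, a comma-separated N-dimensional point can be parsed by scanning indices directly, without split() or trim(). The sketch below is only an illustration of that style: PointParser is a hypothetical name, this is not the DistanceUtils.parsePoint from either patch, and no performance claim is made for it.

```java
/** Hypothetical sketch; not the DistanceUtils.parsePoint from the patch. */
final class PointParser {
    /** Parses "x,y,..." into exactly `dimension` doubles, skipping spaces around each value. */
    static double[] parsePoint(String externalVal, int dimension) {
        double[] out = new double[dimension];
        int start = 0;
        int i = 0;
        int end = externalVal.indexOf(',');
        while (i < dimension) {
            if (end == -1) end = externalVal.length();
            int s = start, e = end;
            while (s < e && externalVal.charAt(s) == ' ') s++;      // skip leading spaces
            while (e > s && externalVal.charAt(e - 1) == ' ') e--;  // skip trailing spaces
            if (s == e) throw new IllegalArgumentException("empty coordinate in: " + externalVal);
            out[i++] = Double.parseDouble(externalVal.substring(s, e));
            if (end == externalVal.length()) break;                 // consumed the last value
            start = end + 1;
            end = externalVal.indexOf(',', start);
        }
        // Reject both too few and too many coordinates.
        if (i != dimension || end != externalVal.length()) {
            throw new IllegalArgumentException(
                "expected " + dimension + " coordinates in: " + externalVal);
        }
        return out;
    }
}
```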

{quote}
* when checking for isDuplicateDynField, if it is, nothing is done. 
Shouldn't this be where an exception is thrown or a message is logged? In the 
patch I'm attaching I took the log approach.

{quote}

It is logged, but for the poly fields, if the dyn field is already defined, 
that's just fine.


 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, 
 SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single Lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-11 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789241#action_12789241
 ] 

Grant Ingersoll commented on SOLR-1131:
---

I've got a patch almost ready that brings in the ValueSource stuff.

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, 
 SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single Lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Updated: (SOLR-1644) Provide a clean way to keep flags and helper objects in ResponseBuilder

2009-12-11 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1644:
-

Attachment: SOLR-1644.patch

A new Map is kept in ResponseBuilder. It is also possible to use 
SolrQueryRequest.getContext(). Let us make that call.

Every component keeps a private Object as the key and stores values by that key 
in rb.store. No other component can access that value because the key is a 
private Object (so no conflict). The patch illustrates how FacetComponent 
has eliminated the variables from ResponseBuilder. If it is fine, we can 
remove the rest of them too, such as doStats, needDocList, needDocSet, etc.
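
The private-key idea can be sketched in a few lines of Java. This is an illustration only, not the contents of SOLR-1644.patch: RequestStore stands in for ResponseBuilder (only the store map is modelled) and FacetLikeComponent is a hypothetical component.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for ResponseBuilder; only the store is modelled. */
final class RequestStore {
    // A plain Object key uses identity equals/hashCode, so distinct keys never collide.
    final Map<Object, Object> store = new HashMap<Object, Object>();
}

/** Hypothetical component that keeps its per-request flag under a private key. */
final class FacetLikeComponent {
    private final Object key = new Object(); // no other class can obtain this reference

    /** Records, for this request only, whether the component should run. */
    void prepare(RequestStore rb, boolean doFacets) {
        rb.store.put(key, Boolean.valueOf(doFacets));
    }

    /** Reads the flag back; absent means disabled. */
    boolean isEnabled(RequestStore rb) {
        Object value = rb.store.get(key);
        return Boolean.TRUE.equals(value);
    }
}
```

Because the key is an instance field, two instances of the same component class also get separate entries in the store.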

 Provide a clean way to keep flags and helper objects in ResponseBuilder
 ---

 Key: SOLR-1644
 URL: https://issues.apache.org/jira/browse/SOLR-1644
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
 Fix For: 1.5

 Attachments: SOLR-1644.patch


 Many components such as StatsComponent, FacetComponent etc keep flags and 
 helper objects in ResponseBuilder. Having to modify the ResponseBuilder for 
 such things is a very kludgy solution.
 Let us provide a clean way for components to keep arbitrary objects for the 
 duration of a (distributed) search request.




[jira] Commented: (SOLR-1644) Provide a clean way to keep flags and helper objects in ResponseBuilder

2009-12-11 Thread Uri Boness (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789259#action_12789259
 ] 

Uri Boness commented on SOLR-1644:
--

It should also be possible to share objects between components to eliminate 
duplicate computations (if one component can re-use a computation that was 
already done in another component). I guess this can be supported by publishing 
the key as a public static field.

 Provide a clean way to keep flags and helper objects in ResponseBuilder
 ---

 Key: SOLR-1644
 URL: https://issues.apache.org/jira/browse/SOLR-1644
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
 Fix For: 1.5

 Attachments: SOLR-1644.patch


 Many components such as StatsComponent, FacetComponent etc keep flags and 
 helper objects in ResponseBuilder. Having to modify the ResponseBuilder for 
 such things is a very kludgy solution.
 Let us provide a clean way for components to keep arbitrary objects for the 
 duration of a (distributed) search request.




[jira] Updated: (SOLR-1644) Provide a clean way to keep flags and helper objects in ResponseBuilder

2009-12-11 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1644:
-

Comment: was deleted

(was: 
bq.)

 Provide a clean way to keep flags and helper objects in ResponseBuilder
 ---

 Key: SOLR-1644
 URL: https://issues.apache.org/jira/browse/SOLR-1644
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
 Fix For: 1.5

 Attachments: SOLR-1644.patch


 Many components such as StatsComponent, FacetComponent etc keep flags and 
 helper objects in ResponseBuilder. Having to modify the ResponseBuilder for 
 such things is a very kludgy solution.
 Let us provide a clean way for components to keep arbitrary objects for the 
 duration of a (distributed) search request.




[jira] Commented: (SOLR-1644) Provide a clean way to keep flags and helper objects in ResponseBuilder

2009-12-11 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789269#action_12789269
 ] 

Noble Paul commented on SOLR-1644:
--


bq.

 Provide a clean way to keep flags and helper objects in ResponseBuilder
 ---

 Key: SOLR-1644
 URL: https://issues.apache.org/jira/browse/SOLR-1644
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
 Fix For: 1.5

 Attachments: SOLR-1644.patch


 Many components such as StatsComponent, FacetComponent etc keep flags and 
 helper objects in ResponseBuilder. Having to modify the ResponseBuilder for 
 such things is a very kludgy solution.
 Let us provide a clean way for components to keep arbitrary objects for the 
 duration of a (distributed) search request.




[jira] Commented: (SOLR-1644) Provide a clean way to keep flags and helper objects in ResponseBuilder

2009-12-11 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789271#action_12789271
 ] 

Noble Paul commented on SOLR-1644:
--

bq.I guess this can be supported by publishing the key as a public static field.

We should not keep it static. public final should be good enough

 Provide a clean way to keep flags and helper objects in ResponseBuilder
 ---

 Key: SOLR-1644
 URL: https://issues.apache.org/jira/browse/SOLR-1644
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
 Fix For: 1.5

 Attachments: SOLR-1644.patch


 Many components such as StatsComponent, FacetComponent etc keep flags and 
 helper objects in ResponseBuilder. Having to modify the ResponseBuilder for 
 such things is a very kludgy solution.
 Let us provide a clean way for components to keep arbitrary objects for the 
 duration of a (distributed) search request.




Re: Merge changes with cloud branch?

2009-12-11 Thread Mark Miller
I would say you *certainly* don't have to. But anyone that's willing  
to, +1 :)


Someone will have to do it. The merge back to trunk will be easy,  
because we will be periodically merging trunk to cloud.


- Mark

http://www.lucidimagination.com (mobile)

On Dec 11, 2009, at 2:24 AM, Shalin Shekhar Mangar shalinman...@gmail.com 
 wrote:



Hey Guys,

Do we need to merge changes to trunk to the cloud branch?

--
Regards,
Shalin Shekhar Mangar.


[jira] Commented: (SOLR-1644) Provide a clean way to keep flags and helper objects in ResponseBuilder

2009-12-11 Thread Uri Boness (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789290#action_12789290
 ] 

Uri Boness commented on SOLR-1644:
--

bq. We should not keep it static. public final should be good enough

Is there a special reason for this? Is the plan to have a KEY per component 
instance? If so, how would it be possible to refer to the key from other 
components?

This is what I had in mind - Assuming Component1 computed something and 
registered it in the store using KEY. Then Component2 can reuse this 
computation by accessing it as follows: 

{code}
Object someValue = rb.store.get(Component1.KEY); // the key Component1 registered its value under
// do something with someValue
{code}

 Provide a clean way to keep flags and helper objects in ResponseBuilder
 ---

 Key: SOLR-1644
 URL: https://issues.apache.org/jira/browse/SOLR-1644
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
 Fix For: 1.5

 Attachments: SOLR-1644.patch


 Many components such as StatsComponent, FacetComponent etc keep flags and 
 helper objects in ResponseBuilder. Having to modify the ResponseBuilder for 
 such things is a very kludgy solution.
 Let us provide a clean way for components to keep arbitrary objects for the 
 duration of a (distributed) search request.




[jira] Updated: (SOLR-1358) Integration of Tika and DataImportHandler

2009-12-11 Thread Akshay K. Ukey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshay K. Ukey updated SOLR-1358:
-

Attachment: SOLR-1358.patch

Patch with test case, tika parser configurable via parser attribute for entity 
tag.

 Integration of Tika and DataImportHandler
 -

 Key: SOLR-1358
 URL: https://issues.apache.org/jira/browse/SOLR-1358
 Project: Solr
  Issue Type: New Feature
  Components: contrib - DataImportHandler
Reporter: Sascha Szott
Assignee: Noble Paul
 Attachments: SOLR-1358.patch, SOLR-1358.patch, SOLR-1358.patch, 
 SOLR-1358.patch


 At the moment, it's impossible to configure Solr such that it builds up 
 documents by using data that comes from both PDF documents and database table 
 columns. Currently, to accomplish this task, it's up to the user to add some 
 preprocessing that converts PDF files into plain text files. Therefore, I 
 would like to see an integration of Solr Cell into DIH that makes this 
 preprocessing obsolete.




[jira] Issue Comment Edited: (SOLR-1358) Integration of Tika and DataImportHandler

2009-12-11 Thread Akshay K. Ukey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789300#action_12789300
 ] 

Akshay K. Ukey edited comment on SOLR-1358 at 12/11/09 1:23 PM:


Patch with test case and with tika parser configurable via parser attribute for 
entity tag.

  was (Author: akshay):
Patch with test case, tika parser configurable via parser attribute for 
entity tag.
  
 Integration of Tika and DataImportHandler
 -

 Key: SOLR-1358
 URL: https://issues.apache.org/jira/browse/SOLR-1358
 Project: Solr
  Issue Type: New Feature
  Components: contrib - DataImportHandler
Reporter: Sascha Szott
Assignee: Noble Paul
 Attachments: SOLR-1358.patch, SOLR-1358.patch, SOLR-1358.patch, 
 SOLR-1358.patch


 At the moment, it's impossible to configure Solr so that it builds up
 documents by using data that comes from both PDF documents and database table
 columns. Currently, to accomplish this task, it's up to the user to add some
 preprocessing that converts PDF files into plain text files. Therefore, I
 would like to see an integration of Solr Cell into DIH that makes this
 preprocessing obsolete.




[jira] Commented: (SOLR-1621) Allow current single core deployments to be specified by solr.xml

2009-12-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789302#action_12789302
 ] 

Mark Miller commented on SOLR-1621:
---

I'd like to take a quick look - will report back.

 Allow current single core deployments to be specified by solr.xml
 -

 Key: SOLR-1621
 URL: https://issues.apache.org/jira/browse/SOLR-1621
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.5
Reporter: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, 
 SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch


 Supporting two different modes of deployment is turning out to be hard. This
 leads to duplication of code. Moreover, there is a lot of confusion about
 where to put common configuration. See the mail thread
 http://markmail.org/message/3m3rqvp2ckausjnf




[jira] Resolved: (SOLR-1358) Integration of Tika and DataImportHandler

2009-12-11 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul resolved SOLR-1358.
--

   Resolution: Fixed
Fix Version/s: 1.5

committed r889613

Thanks Akshay

 Integration of Tika and DataImportHandler
 -

 Key: SOLR-1358
 URL: https://issues.apache.org/jira/browse/SOLR-1358
 Project: Solr
  Issue Type: New Feature
  Components: contrib - DataImportHandler
Reporter: Sascha Szott
Assignee: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1358.patch, SOLR-1358.patch, SOLR-1358.patch, 
 SOLR-1358.patch


 At the moment, it's impossible to configure Solr so that it builds up
 documents by using data that comes from both PDF documents and database table
 columns. Currently, to accomplish this task, it's up to the user to add some
 preprocessing that converts PDF files into plain text files. Therefore, I
 would like to see an integration of Solr Cell into DIH that makes this
 preprocessing obsolete.




[jira] Updated: (SOLR-1177) Distributed TermsComponent

2009-12-11 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-1177:


Attachment: SOLR-1177.patch

# Based on Matt's patch
# Synced to trunk
# Uses BaseDistributedTestCase

All tests pass.

I had to change TermData#frequency to an int to match the output of distributed
and non-distributed cases. It is theoretically possible for the sum of
frequencies from all shards to exceed the range of an int, but I don't think
that is a practical concern right now. The problem is that we represent
frequency as int everywhere for non-distributed responses. If we want longs in
distributed search responses then we must start using longs in non-distributed
responses as well to maintain compatibility.

Matt -- There is an issue open for adding SolrJ support for TermsComponent -
SOLR-1139. Is it possible to replace the TermsHelper and TermData classes by
classes in SOLR-1139? I'd like to have the same classes parsing responses in
SolrJ and distributed search.

 Distributed TermsComponent
 --

 Key: SOLR-1177
 URL: https://issues.apache.org/jira/browse/SOLR-1177
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Matt Weber
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, 
 TermsComponent.java, TermsComponent.patch


 TermsComponent should be distributed




[jira] Assigned: (SOLR-1139) SolrJ TermsComponent Query and Response Support

2009-12-11 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar reassigned SOLR-1139:
---

Assignee: Shalin Shekhar Mangar

 SolrJ TermsComponent Query and Response Support
 ---

 Key: SOLR-1139
 URL: https://issues.apache.org/jira/browse/SOLR-1139
 Project: Solr
  Issue Type: New Feature
  Components: clients - java
Affects Versions: 1.4
Reporter: Matt Weber
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Attachments: SOLR-1139-WITH_SORT_SUPPORT.patch, SOLR-1139.patch, 
 SOLR-1139.patch, SOLR-1139.patch, SOLR-1139.patch, SOLR-1139.patch


 SolrJ should support the new TermsComponent that was introduced in Solr 1.4.  
 It should be able to:
 - set TermsComponent query parameters via SolrQuery
 - parse the TermsComponent response




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-11 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789330#action_12789330
 ] 

Chris A. Mattmann commented on SOLR-1131:
-

Hi Grant:

bq.The array is sorted and array access is much faster and we often have to 
loop over it to look it up.

OK, fair enough.

bq. I think comma makes sense. 

OK, so you think it makes sense -- why? Because it's an N-dimensional array and 
spaces are less user friendly as Yonik put it? 

bq. As for the optimization stuff, I agree w/ Yonik, this is code that will be 
called a lot.

I'm wondering what there is to agree with since optimization was never 
defined. Are you talking speed? Are you talking memory efficiency? Code 
readability? Maintainability? Some combination of all of those? There are 
tradeoffs in everything. You could rewrite some of the provided java runtime 
methods to squeeze out extra performance, but what's the point of libraries or 
reusable functions then? The prior code that was in there basically rewrote 
exactly what split() and trim() do, so why not reuse what's there? If you throw 
up the performance flag, I would push back on readability and maintainability. 

bq. It is logged, but for the poly fields, if the dyn field is already defined, 
that's just fine.

Where is it logged? It wasn't in the most up-to-date patch, provided on 
2009-12-10 04:34 PM. Here is the code snippet that was there:

{code}
+    //For each poly field, go through and add the appropriate Dynamic field
+      for (FieldType fieldType : polyFieldTypes.values()) {
+        if (fieldType instanceof DelegatingFieldType){
+          List<FieldType> subs = ((DelegatingFieldType) fieldType).getSubTypes();
+          if (subs.isEmpty() == false){
+            //add a new dynamic field for each sub field type
+            for (FieldType type : subs) {
+              log.debug("dynamic field creation for sub type: " + type.typeName);
+              SchemaField df = SchemaField.create("*" + FieldType.POLY_FIELD_SEPARATOR + type.typeName,
+                      type, type.args);//TODO: is type.args right?
+              if (isDuplicateDynField(dFields, df) == false){
+                addDynamicFieldNoDupCheck(dFields, df);
+              }
               // NOTE: there is no else here, so I added an else and a log message
+            }
+          }
           // NOTE: there is no else here, so I added an else and a log message
+        }
+      }
+
{code}

Here's what I added:

{code}
+    //For each poly field, go through and add the appropriate Dynamic field
+      for (FieldType fieldType : polyFieldTypes.values()) {
+        if (fieldType instanceof DelegatingFieldType){
+          List<FieldType> subs = ((DelegatingFieldType) fieldType).getSubTypes();
+          if (!subs.isEmpty()){
+            //add a new dynamic field for each sub field type
+            for (FieldType type : subs) {
+              log.debug("dynamic field creation for sub type: " + type.typeName);
+              SchemaField df = SchemaField.create("*" + FieldType.POLY_FIELD_SEPARATOR + type.typeName,
+                      type, type.args);//TODO: is type.args right?
+              if (!isDuplicateDynField(dFields, df)){
+                addDynamicFieldNoDupCheck(dFields, df);
+              }
+              else{
+                log.debug("dynamic field creation avoided: dynamic field: [" + df.getName() + "] " +
+                          "already defined in the schema!");
+              }
+
+            }
+          }
+          else{
+            log.debug("field type: [" + fieldType.getTypeName() + "]: no sub fields defined");
+          }
+        }
+      }
+
{code}

Also, I get that it's fine for the poly fields if the dyn field is already 
defined (in fact, based on my mailing list comments 
http://old.nabble.com/SOLR-1131:-disconnect-between-fields-created-by-poly-fields-td26736431.html
 I think this should _always_ be the case), but whether it's fine or not, it's 
still worth logging to give someone more information. 



 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, 
 SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single lucene field)
  * lat 

Re: SOLR-1131: disconnect between fields created by poly fields

2009-12-11 Thread Mattmann, Chris A (388J)
Hi Grant:

 By declaring the poly field, you are declaring the dynamic field.  I don't see
 why this leads to drift.  Sure, it is an abstraction and there are Lucene
 fields that will be created under the hood, but that is one of the primary
 features of Solr, it hides all that mess.

Actually, if it were the case that a poly field mapped to a single dynamic
field, then I would agree with you, but as discussed, a poly field can
map to _many_ dynamic fields, which is where the drift occurs.

Also, the drift does not occur in the sense that there are more Lucene
fields created (fields of course, are Lucene concepts, which true, SOLR
should serve as an abstraction to) -- it's in the sense that there are more
dynamic fields (which are of course, a SOLR concept, and which I'm not
sure that SOLR should serve as an abstraction to, and if it does, then
schema.xml can be out of sync).

So, in other words, with the current thinking, you can have dynamic fields
that are created by a poly field that could have no corresponding entry
in schema.xml. That's my point. I think we should enforce that there be an
entry in schema.xml to prevent this drift.

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department University of
Southern California, Los Angeles, CA 90089 USA
++




[jira] Commented: (SOLR-1177) Distributed TermsComponent

2009-12-11 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789334#action_12789334
 ] 

Yonik Seeley commented on SOLR-1177:


The facet component internally uses long to add up distributed facet counts, 
and then uses this code:

{code}
  // use <int> tags for smaller facet counts (better back compatibility)
  private Number num(long val) {
   if (val < Integer.MAX_VALUE) return (int)val;
   else return val;
  }
{code}

Yes, it's not ideal to switch from int to long in a running application, 
but I think it's better than failing or overflowing the int.
Client code in SolrJ should be written to handle either via 
((Number)x).longValue()
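
As a rough illustration of the point above, here is a minimal sketch of summing per-shard counts in a long, narrowing the result the way the quoted helper does, and reading it back on the client through Number. The class name CountMerging and the methods mergeShardCounts and readCount are hypothetical and are not taken from any patch on this issue; only num mirrors the quoted code.

```java
import java.util.List;

// Hypothetical helper class for illustration; not part of Solr or SolrJ.
final class CountMerging {

    // Mirrors the helper quoted above: small totals stay Integer, large ones become Long.
    static Number num(long val) {
        if (val < Integer.MAX_VALUE) return (int) val;
        return val;
    }

    // Adds per-shard int counts into a long so the running total cannot overflow an int.
    static Number mergeShardCounts(List<Integer> shardCounts) {
        long total = 0L;
        for (int count : shardCounts) {
            total += count;
        }
        return num(total);
    }

    // Client-side read that works whether the response carried an Integer or a Long.
    static long readCount(Object value) {
        return ((Number) value).longValue();
    }
}
```

A client that always goes through readCount does not need to care which boxed type a particular response happened to contain.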

 Distributed TermsComponent
 --

 Key: SOLR-1177
 URL: https://issues.apache.org/jira/browse/SOLR-1177
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Matt Weber
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, 
 TermsComponent.java, TermsComponent.patch


 TermsComponent should be distributed




Re: SOLR-1131: disconnect between fields created by poly fields

2009-12-11 Thread Yonik Seeley
On Fri, Dec 11, 2009 at 9:53 AM, Mattmann, Chris A (388J)
chris.a.mattm...@jpl.nasa.gov wrote:
 Actually if it was the case that poly field mapped to a single dynamic
 field, then I would agree with you, but as is the discussion, poly field can
 map to _many_ dynamic fields, which is where the drift occurs.

I'm not sure if we're using the exact same terminology, but it's well
defined how many dynamic fields would be created by the basic point
class (exactly one) *if* we decide to go that route and use that
option.  Can you give an example of what you mean?  Is your objection
to this point class registering a single dynamic field, or are you
talking about a hypothetical case?

-Yonik
http://www.lucidimagination.com


[jira] Commented: (SOLR-1177) Distributed TermsComponent

2009-12-11 Thread Matt Weber (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789336#action_12789336
 ] 

Matt Weber commented on SOLR-1177:
--

Thanks for the update Yonik!  I will see if I can get this and SOLR-1139 using 
the same classes.

 Distributed TermsComponent
 --

 Key: SOLR-1177
 URL: https://issues.apache.org/jira/browse/SOLR-1177
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Matt Weber
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, 
 TermsComponent.java, TermsComponent.patch


 TermsComponent should be distributed




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-11 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789339#action_12789339
 ] 

Grant Ingersoll commented on SOLR-1131:
---

{quote}
I'm wondering what there is to agree with since optimization was never 
defined. Are you talking speed? Are you talking memory efficiency? Code 
readability? Maintainability? Some combination of all of those?
{quote}

Speed and memory.  

As for logging, that code is all going away in the next patch, I think

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, 
 SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-11 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789342#action_12789342
 ] 

Chris A. Mattmann commented on SOLR-1131:
-

bq. Speed and memory. 

Unfortunately, it's at the cost of readability and maintainability. 

bq. As for logging, that code is all going away in the next patch, I think

I'll take a look when you throw it up, thanks.

Cheers,
Chris



 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, 
 SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-11 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789351#action_12789351
 ] 

Grant Ingersoll commented on SOLR-1131:
---

bq. Unfortunately, it's at the cost of readability and maintainability. 

Maybe.  It took me all of 30 seconds to figure out what it was doing.  I'll put 
some comments on it.  While readability is important, Solr's goal is not to 
make a product that a CS101 grad can read, it's to build a blazing fast search 
server.  That call could be hit millions of times when indexing points.

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, 
 SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-11 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789354#action_12789354
 ] 

Chris A. Mattmann commented on SOLR-1131:
-

bq. Maybe. It took me all of 30 seconds to figure out what it was doing. I'll 
put some comments on it. While readability is important, Solr's goal is not to 
make a product that a CS101 grad can read, it's to build a blazing fast search 
server. That call could be hit millions of times when indexing points.

Sure, maybe the goal isn't for a CS101 grad to be able to easily understand the 
code, but SOLR's goal should include being open to ideas from the community 
that involve reusing standard Java library functions and not rewriting based on 
perception of speed and memory without empirical proof. Where's the evidence 
that using things like split and trim are so much more costly than rewriting 
those basic capabilities that they warrant not using them?
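
To make the trade-off under discussion concrete, here is a minimal sketch of the two styles side by side. It is not the code from either patch: the class PointParsing and both method names are hypothetical, and the input is assumed to be a comma-separated point such as "1.5, -2.25, 3". Any speed claim would still need to be measured; the sketch only shows what each style looks like.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical comparison of library-based and hand-rolled parsing of "x, y, z".
final class PointParsing {

    // Library-based version: relies on String.split and String.trim.
    static String[] parseWithSplit(String externalVal) {
        String[] parts = externalVal.split(",");
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return parts;
    }

    // Hand-rolled version: one pass over the characters, trimming by index.
    static String[] parseByHand(String externalVal) {
        List<String> out = new ArrayList<>();
        int start = 0;
        int length = externalVal.length();
        for (int i = 0; i <= length; i++) {
            if (i == length || externalVal.charAt(i) == ',') {
                int s = start;
                int e = i;
                // Skip leading and trailing characters <= ' ', as String.trim does.
                while (s < e && externalVal.charAt(s) <= ' ') s++;
                while (e > s && externalVal.charAt(e - 1) <= ' ') e--;
                out.add(externalVal.substring(s, e));
                start = i + 1;
            }
        }
        return out.toArray(new String[0]);
    }
}
```

The two agree on well-formed input. They differ on edge cases such as a trailing comma, because String.split drops trailing empty strings while the hand-rolled loop keeps them, which is the kind of subtle divergence that a rewrite of library behavior can introduce.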

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, 
 SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Commented: (SOLR-1632) Distributed IDF

2009-12-11 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789379#action_12789379
 ] 

Otis Gospodnetic commented on SOLR-1632:


I didn't look at the patch, but from your comments it looks like you already 
have that one big merged idf map, which is really what I was aiming at, so 
that's good!

I was just thinking that this map (file) would be periodically updated and 
pushed to slaves, so that slaves can compute the global IDF *locally* instead 
of making any kind of extra requests.
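
A rough sketch of the idea, not code from distrib.patch: per-shard document frequencies are merged into one map, and a slave holding that map can then compute an IDF value without asking anyone else. The class GlobalIdfSketch and its method names are hypothetical, and the formula (log(numDocs / (docFreq + 1)) + 1) is assumed here as the classic Lucene-style IDF shape rather than taken from the patch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of merging shard statistics and computing IDF locally.
final class GlobalIdfSketch {

    // Sums the document frequency of each term across all shards.
    static Map<String, Long> mergeDocFreqs(List<Map<String, Long>> shardDocFreqs) {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> shard : shardDocFreqs) {
            for (Map.Entry<String, Long> entry : shard.entrySet()) {
                merged.merge(entry.getKey(), entry.getValue(), Long::sum);
            }
        }
        return merged;
    }

    // Assumed IDF formula: log(numDocs / (docFreq + 1)) + 1.
    static double idf(long docFreq, long numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Looks up a term in the merged map; unseen terms count as docFreq 0.
    static double globalIdf(Map<String, Long> merged, String term, long globalNumDocs) {
        long docFreq = merged.getOrDefault(term, 0L);
        return idf(docFreq, globalNumDocs);
    }
}
```

How often the merged map is rebuilt and pushed is the open design question in this thread; the sketch is the same whether that happens per query or periodically.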


 Distributed IDF
 ---

 Key: SOLR-1632
 URL: https://issues.apache.org/jira/browse/SOLR-1632
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.5
Reporter: Andrzej Bialecki 
 Attachments: distrib.patch


 Distributed IDF is a valuable enhancement for distributed search across 
 non-uniform shards. This issue tracks the proposed implementation of an API 
 to support this functionality in Solr.




[jira] Commented: (SOLR-236) Field collapsing

2009-12-11 Thread Marc Menghin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789381#action_12789381
 ] 

Marc Menghin commented on SOLR-236:
---

Hi,

I'm new to Solr, so sorry for my likely still incomplete setup. I got everything 
from Solr SVN and applied the patch (field-collapse-5.patch, 2009-12-08 
09:43 PM). When I search I get an NPE because I seem not to have a cache for the 
collapsing. It wants to add an entry to the cache but can't. There is none at 
that time, which it checks beforehand in AbstractDocumentCollapser.collapse, but 
it still wants to use it later in 
AbstractDocumentCollapser.createDocumentCollapseResult. I suppose that's a bug? 
Or is something wrong on my side?

Exception I get is:

java.lang.NullPointerException
    at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.createDocumentCollapseResult(AbstractDocumentCollapser.java:278)
    at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.executeCollapse(AbstractDocumentCollapser.java:249)
    at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:172)
    at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:173)
    at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)

I fixed it locally by only adding something to the cache if there is one 
(fieldCollapseCache != null). But I'm not very familiar with the code, so I'm 
not sure if that's a good/right way to fix it.
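
The local fix described above amounts to guarding every use of an optional cache, not only the lookup. A minimal sketch of that pattern follows, assuming the cache is simply null when none is configured. The class NullSafeCollapser, its String-to-String cache, and the placeholder computation are hypothetical and do not reflect the real AbstractDocumentCollapser; only the field name fieldCollapseCache comes from the report.

```java
import java.util.Map;

// Hypothetical stand-in showing a null-guarded optional cache.
final class NullSafeCollapser {

    // May be null when no collapse cache is configured.
    private final Map<String, String> fieldCollapseCache;

    NullSafeCollapser(Map<String, String> fieldCollapseCache) {
        this.fieldCollapseCache = fieldCollapseCache;
    }

    String collapse(String key) {
        // Guard the read: only consult the cache if one exists.
        if (fieldCollapseCache != null) {
            String cached = fieldCollapseCache.get(key);
            if (cached != null) {
                return cached;
            }
        }
        String result = "collapsed:" + key; // placeholder for the real collapse work
        // Guard the write as well; the reported NPE came from an unguarded write.
        if (fieldCollapseCache != null) {
            fieldCollapseCache.put(key, result);
        }
        return result;
    }
}
```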

Thanks,
Marc

 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-solr-236-2.patch, 
 field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, 
 field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, 
 SOLR-236_collapsing.patch


 This patch includes a new feature called Field collapsing.
 It is used to collapse a group of results with a similar value for a given 
 field into a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated "more documents from 
 this site" link. See also Duplicate detection.
 http://www.fastsearch.com/glossary.aspx?m=48amid=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling correction are welcome ;-)




[jira] Updated: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-11 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-1131:
--

Attachment: SOLR-1131.patch

OK, this is getting a lot closer to ready to commit.

Changes:
# Introduced a MultiValueSource - ValueSource that abstractly represents 
ValueSources for poly fields, and other things.  
# Introduced PointValueSource - point(x,y,z) - a MultiValueSource that wraps 
other value sources (could be called something else, I suppose)
# Implemented PointTypeValueSource to represent ValueSource for the PointType 
class. 
# Hooked in multivalue callbacks to DocValues.  In addition to making functions 
work with Points (et al.), it should be possible to write functions that work on 
multivalued fields, but I did not undertake this work.
# Added a SchemaAware callback mechanism so that Field Types and other schema 
stuff can register dynamic fields, etc. after the schema has been created
# Updated the example to have spatial information in the docs, etc.  See 
http://wiki.apache.org/solr/SpatialSearch
# Modified the distance functions to work with MultiValueSources
# cleaned up the tests
# Incorporated various comments from Chris and Yonik.


 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, 
 SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-12-11 Thread Ankul Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789418#action_12789418
 ] 

Ankul Garg commented on SOLR-1316:
--

Shouldn't we be creating a separate AutoSuggestComponent, like the 
SpellCheckComponent, having its own prepare, process and inform functions?

 Create autosuggest component
 

 Key: SOLR-1316
 URL: https://issues.apache.org/jira/browse/SOLR-1316
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 1.5

 Attachments: suggest.patch, suggest.patch, TST.zip

   Original Estimate: 96h
  Remaining Estimate: 96h

 Autosuggest is a common search function that can be integrated
 into Solr as a SearchComponent. Our first implementation will
 use the TernaryTree found in Lucene contrib. 
 * Enable creation of the dictionary from the index or via Solr's
 RPC mechanism
 * What types of parameters and settings are desirable?
 * Hopefully in the future we can include user click through
 rates to boost those terms/phrases higher




Fwd: [ANNOUNCEMENT] HttpComponents HttpClient 4.0.1 (GA) Released

2009-12-11 Thread Erik Hatcher

FYI...

Begin forwarded message:


From: Oleg Kalnichevski ol...@apache.org
Date: December 11, 2009 2:20:50 PM GMT+01:00
To: annou...@apache.org, priv...@hc.apache.org, d...@hc.apache.org,  
httpclient-us...@hc.apache.org
Subject: [ANNOUNCEMENT] HttpComponents HttpClient 4.0.1 (GA) Released
Reply-To: HttpComponents Project d...@hc.apache.org

HttpClient 4.0.1 is a bug fix release that addresses a number of  
issues discovered since the previous stable release. None of the  
fixed bugs is considered critical. Most notably this release  
eliminates the dependency on JCIP annotations.


This release is also expected to improve performance by 5 to 10% due  
to elimination of unnecessary Log object lookups by short-lived  
components.


---
Download -
http://hc.apache.org/downloads.cgi

Release notes -
http://www.apache.org/dist/httpcomponents/httpclient/RELEASE_NOTES.txt 



HttpComponents site -
http://hc.apache.org/

Please note HttpClient 4.0 currently provides only limited support for
NTLM authentication. For details please refer to
http://hc.apache.org/httpcomponents-client/ntlm.html
---

About Apache HttpClient

Although the java.net package provides basic functionality for accessing
resources via HTTP, it doesn't provide the full flexibility or
functionality needed by many applications. HttpClient seeks to fill this
void by providing an efficient, up-to-date, and feature-rich package
implementing the client side of the most recent HTTP standards and
recommendations.

Designed for extension while providing robust support for the base HTTP
protocol, HttpClient may be of interest to anyone building HTTP-aware
client applications such as web browsers, web service clients, or
systems that leverage or extend the HTTP protocol for distributed
communication.





Re: [ANNOUNCEMENT] HttpComponents HttpClient 4.0.1 (GA) Released

2009-12-11 Thread Grant Ingersoll
There are, in fact, updates to many of Solr's dependencies that we should 
consider.  I like the sound of 5-10% perf. improvement in HttpClient...

-Grant

On Dec 11, 2009, at 12:53 PM, Erik Hatcher wrote:

 FYI...
 
 Begin forwarded message:
 
 From: Oleg Kalnichevski ol...@apache.org
 Date: December 11, 2009 2:20:50 PM GMT+01:00
 To: annou...@apache.org, priv...@hc.apache.org, d...@hc.apache.org,  
 httpclient-us...@hc.apache.org
 Subject: [ANNOUNCEMENT] HttpComponents HttpClient 4.0.1 (GA) Released
 Reply-To: HttpComponents Project d...@hc.apache.org
 
 HttpClient 4.0.1 is a bug fix release that addresses a number of issues 
 discovered since the previous stable release. None of the fixed bugs is 
 considered critical. Most notably this release eliminates the 
 dependency on JCIP annotations.
 
 This release is also expected to improve performance by 5 to 10% due to 
 elimination of unnecessary Log object lookups by short-lived components.
 
 ---
 Download -
 http://hc.apache.org/downloads.cgi
 
 Release notes -
 http://www.apache.org/dist/httpcomponents/httpclient/RELEASE_NOTES.txt
 
 HttpComponents site -
 http://hc.apache.org/
 
 Please note HttpClient 4.0 currently provides only limited support for
 NTLM authentication. For details please refer to
 http://hc.apache.org/httpcomponents-client/ntlm.html
 ---
 
 About Apache HttpClient
 
 Although the java.net package provides basic functionality for accessing
 resources via HTTP, it doesn't provide the full flexibility or
 functionality needed by many applications. HttpClient seeks to fill this
 void by providing an efficient, up-to-date, and feature-rich package
 implementing the client side of the most recent HTTP standards and
 recommendations.
 
 Designed for extension while providing robust support for the base HTTP
 protocol, HttpClient may be of interest to anyone building HTTP-aware
 client applications such as web browsers, web service clients, or
 systems that leverage or extend the HTTP protocol for distributed
 communication.
 
 

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using 
Solr/Lucene:
http://www.lucidimagination.com/search



[jira] Commented: (SOLR-1621) Allow current single core deployments to be specified by solr.xml

2009-12-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789439#action_12789439
 ] 

Mark Miller commented on SOLR-1621:
---

You've deprecated handleAliasAction, but you have also made it so it's no 
longer called by the default handleRequestBody.

I think it's safer to just remove handleAliasAction - someone who overrode 
handleAliasAction but kept the default handleRequestBody would be in for a 
nasty silent surprise anyway. Better to get a compile-time error.

 Allow current single core deployments to be specified by solr.xml
 -

 Key: SOLR-1621
 URL: https://issues.apache.org/jira/browse/SOLR-1621
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.5
Reporter: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, 
 SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch


 supporting two different modes of deployments is turning out to be hard. This 
 leads to duplication of code. Moreover there is a lot of confusion on where 
 do we put common configuration. See the mail thread 
 http://markmail.org/message/3m3rqvp2ckausjnf




[jira] Commented: (SOLR-1621) Allow current single core deployments to be specified by solr.xml

2009-12-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789443#action_12789443
 ] 

Mark Miller commented on SOLR-1621:
---

Also, it's confusing the way the Alias test is commented out - shouldn't we just 
remove it to be less confusing? It will still be in subversion if we decide we 
want to bring it back.

 Allow current single core deployments to be specified by solr.xml
 -

 Key: SOLR-1621
 URL: https://issues.apache.org/jira/browse/SOLR-1621
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.5
Reporter: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, 
 SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch


 supporting two different modes of deployments is turning out to be hard. This 
 leads to duplication of code. Moreover there is a lot of confusion on where 
 do we put common configuration. See the mail thread 
 http://markmail.org/message/3m3rqvp2ckausjnf




[jira] Commented: (SOLR-1621) Allow current single core deployments to be specified by solr.xml

2009-12-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789447#action_12789447
 ] 

Mark Miller commented on SOLR-1621:
---

Lastly: there are a couple of small unrelated changes in this patch.

1. The change to the import statement in CoreDescriptor, with no other changes 
to the file.
2. The change in index.jsp from self-closing tags to closing link tags - this 
has no apparent purpose, so I think it shouldn't be part of this commit.

 Allow current single core deployments to be specified by solr.xml
 -

 Key: SOLR-1621
 URL: https://issues.apache.org/jira/browse/SOLR-1621
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.5
Reporter: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, 
 SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch


 supporting two different modes of deployments is turning out to be hard. This 
 leads to duplication of code. Moreover there is a lot of confusion on where 
 do we put common configuration. See the mail thread 
 http://markmail.org/message/3m3rqvp2ckausjnf




[jira] Updated: (SOLR-1621) Allow current single core deployments to be specified by solr.xml

2009-12-11 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-1621:
--

Attachment: SOLR-1621.patch

You mark setSolrConfigFilename as deprecated, but it simply doesn't work 
anymore either - that's not really deprecation. I feel that deprecating 
solrconfig-filename should be its own issue so that there's a clearer entry in 
CHANGES. And it should probably be deprecated rather than turned off out of the 
blue. I don't suppose it's a big deal either way though.

You have also changed CoreContainer#getCores - but that change is unrelated to 
this patch. I really feel an issue should just stick to the issue - it makes 
commits clearer. That change (which isn't back compat anyway) should be its 
own commit.

This latest update removes a bunch of the unrelated changes.

The patch is still unfinished though - you removed my fix to use the solrconfig 
override without replacing it with an alternative fix. As a result, some tests 
don't use the right config now, and in particular, NoCacheHeaderTest is back to 
failing.

 Allow current single core deployments to be specified by solr.xml
 -

 Key: SOLR-1621
 URL: https://issues.apache.org/jira/browse/SOLR-1621
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.5
Reporter: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, 
 SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, 
 SOLR-1621.patch


 supporting two different modes of deployments is turning out to be hard. This 
 leads to duplication of code. Moreover there is a lot of confusion on where 
 do we put common configuration. See the mail thread 
 http://markmail.org/message/3m3rqvp2ckausjnf




[jira] Updated: (SOLR-1621) Allow current single core deployments to be specified by solr.xml

2009-12-11 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-1621:
--

Attachment: SOLR-1621.patch

This patch reinstates a fix for the SolrDispatcher config name override so that 
all tests pass again.
It also changes DEFAULT_CORE to use a final static constant rather than be 
repeated.


Personally, I also almost think that removing the alias feature should be its 
own issue.

 Allow current single core deployments to be specified by solr.xml
 -

 Key: SOLR-1621
 URL: https://issues.apache.org/jira/browse/SOLR-1621
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.5
Reporter: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, 
 SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, 
 SOLR-1621.patch, SOLR-1621.patch


 supporting two different modes of deployments is turning out to be hard. This 
 leads to duplication of code. Moreover there is a lot of confusion on where 
 do we put common configuration. See the mail thread 
 http://markmail.org/message/3m3rqvp2ckausjnf




[jira] Issue Comment Edited: (SOLR-1621) Allow current single core deployments to be specified by solr.xml

2009-12-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789467#action_12789467
 ] 

Mark Miller edited comment on SOLR-1621 at 12/11/09 7:13 PM:
-

This patch reinstates a fix for the SolrDispatcher config name override so that 
all tests pass again.
It also changes DEFAULT_CORE to use a final static constant rather than be 
repeated.


Personally, I also almost think that removing the alias feature should be its 
own issue.

(*EDIT* okay, I see you made that a sub-issue - nice)

  was (Author: markrmil...@gmail.com):
This patch reinstates a fix for the SolrDispatcher config name override so 
that all tests pass again.
It also changes DEFAULT_CORE to use a final static constant rather than be 
repeated.


Personally, I also almost think that removing the alias feature should be its 
own issue.
  
 Allow current single core deployments to be specified by solr.xml
 -

 Key: SOLR-1621
 URL: https://issues.apache.org/jira/browse/SOLR-1621
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.5
Reporter: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, 
 SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, 
 SOLR-1621.patch, SOLR-1621.patch


 supporting two different modes of deployments is turning out to be hard. This 
 leads to duplication of code. Moreover there is a lot of confusion on where 
 do we put common configuration. See the mail thread 
 http://markmail.org/message/3m3rqvp2ckausjnf




[jira] Updated: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-11 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated SOLR-1131:


Attachment: SOLR-1131.Mattmann.121109.patch.txt

OK, this is getting a lot closer to ready to commit.

Changes:
Hi All,

Updated patch:

{quote}
# Introduced a MultiValueSource - ValueSource that abstractly represents 
ValueSources for poly fields, and other things.
{quote}

I added javadoc to this and the ASF license header.

{quote}
# Introduced PointValueSource - point(x,y,z) - a MultiValueSource that wraps 
other value sources (could be called something else, I suppose)
{quote}

I put the ASF header before the package decl, to be consistent with the other 
SOLR java files.

{quote}
# Add in SchemaAware callback mechanism so that Field Types and other schema 
stuff can register dynamic fields, etc. after the schema has been created
{quote}

Added more javadoc here, and ASF license.

{quote}
# Incorporated various comments from Chris and Yonik.
{quote}

Thanks, I appreciate it. I'm still -1 on the way this patch deals with the 
optimization issue. I'd like to see evidence that it makes sense to not use 
split and trim.

Cheers,
Chris



 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, 
 SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.Mattmann.121109.patch.txt, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)
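
To make the "one concept, several Fields" idea concrete, here is a rough Python sketch of the lat/lon variant: a single external point value is split into one numeric sub-field per dimension. The function name index_point and the "name_0", "name_1" sub-field naming are assumptions for illustration, not what the attached patches do.

```python
def index_point(field_name, value):
    """Split an external "x,y[,z...]" value into one numeric sub-field per dimension.

    Hypothetical helper: the "<field>_<i>" naming scheme is an assumption.
    """
    # Split on commas and trim whitespace around each coordinate.
    parts = [p.strip() for p in value.split(",")]
    if len(parts) < 2 or any(p == "" for p in parts):
        raise ValueError(f"not a point: {value!r}")
    # float() raises ValueError for non-numeric coordinates.
    return {f"{field_name}_{i}": float(p) for i, p in enumerate(parts)}


# One logical field ("store") becomes two indexed fields.
print(index_point("store", "10, 20"))
```

A geohash or cartesian-tier variant would return a different set of sub-fields from the same input, which is the flexibility the issue is asking for.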




[jira] Issue Comment Edited: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-11 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789484#action_12789484
 ] 

Chris A. Mattmann edited comment on SOLR-1131 at 12/11/09 7:39 PM:
---

Hi All,

Updated patch:

{quote}
# Introduced a MultiValueSource - ValueSource that abstractly represents 
ValueSources for poly fields, and other things.
{quote}

I added javadoc to this and the ASF license header.

{quote}
# Introduced PointValueSource - point(x,y,z) - a MultiValueSource that wraps 
other value sources (could be called something else, I suppose)
{quote}

I put the ASF header before the package decl, to be consistent with the other 
SOLR java files.

{quote}
# Add in SchemaAware callback mechanism so that Field Types and other schema 
stuff can register dynamic fields, etc. after the schema has been created
{quote}

Added more javadoc here, and ASF license.

{quote}
# Incorporated various comments from Chris and Yonik.
{quote}

Thanks, I appreciate it. I'm still -1 on the way this patch deals with the 
optimization issue. I'd like to see evidence that it makes sense to not use 
split and trim.

Cheers,
Chris



  was (Author: chrismattmann):
OK, this is getting a lot closer to ready to commit.

Changes:
Hi All,

Updated patch:

{quote}
# Introduced a MultiValueSource - ValueSource that abstractly represents 
ValueSources for poly fields, and other things.
{quote}

I added javadoc to this and the ASF license header.

{quote}
# Introduced PointValueSource - point(x,y,z) - a MultiValueSource that wraps 
other value sources (could be called something else, I suppose)
{quote}

I put the ASF header before the package decl, to be consistent with the other 
SOLR java files.

{quote}
# Add in SchemaAware callback mechanism so that Field Types and other schema 
stuff can register dynamic fields, etc. after the schema has been created
{quote}

Added more javadoc here, and ASF license.

{quote}
# Incorporated various comments from Chris and Yonik.
{quote}

Thanks, I appreciate it. I'm still -1 on the way this patch deals with the 
optimization issue. I'd like to see evidence that it makes sense to not use 
split and trim.

Cheers,
Chris


  
 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, 
 SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.Mattmann.121109.patch.txt, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




Re: Namespaces in response (SOLR-1586)

2009-12-11 Thread Chris Hostetter
:  : I think the initial geosearch feature can start off with
:  : <str>10,20</str> for a point.
:  
:  +1.
: 
: Fundamentally, how is a string a point?

Fundamentally a string is not a point, and a point is not a string -- but 
if you want to express the concept of a point in a manner that only uses very 
simple primitive types, then a string containing comma-separated numbers 
is a pretty decent way to do it.  If you'd prefer, a pair of numbers would 
work just as well...

   <arr><float>10</float><float>20</float></arr>
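
As a small illustration of why either primitive form is easy on clients, the sketch below reads a point from both a <str>10,20</str> element and an <arr> of two <float> elements using only Python's standard library. The helper name point_from_element is hypothetical and not part of any Solr client.

```python
import xml.etree.ElementTree as ET


def point_from_element(elem):
    """Read a point from either primitive representation (hypothetical helper)."""
    if elem.tag == "str":
        # "10,20" -> (10.0, 20.0)
        return tuple(float(p) for p in elem.text.split(","))
    if elem.tag == "arr":
        # <arr><float>10</float><float>20</float></arr> -> (10.0, 20.0)
        return tuple(float(child.text) for child in elem.findall("float"))
    raise ValueError(f"cannot read a point from <{elem.tag}>")


as_string = ET.fromstring("<str>10,20</str>")
as_array = ET.fromstring("<arr><float>10</float><float>20</float></arr>")
print(point_from_element(as_string), point_from_element(as_array))
```

Both calls yield the same pair of floats, and neither requires the client to know about any tag beyond the existing primitive ones.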

:  The current XML format Solr uses was designed to be extremely simple, very
:  JSON-esque, and easily parsable by *anyone* in any language, without
:  needing special knowledge of types.
: 
: Whoah. I'm totally confused now. Why have FieldTypes then? Why not just use
: Lucene? The use case for FieldTypes is _not_ just for indexing, or querying.
: It's also for representation?

No, actually the use case for FieldTypes is entirely about the internal 
logic of how Solr should deal with those fields, and how various 
operations should work on them.  FieldTypes can dictate the internal 
representation within the confines of a Lucene index, but they should not 
circumvent the contracts of the response writers in dictating what 
is/isn't a legal response.

XMLWriter.writePrim may be public, which means there is a loophole that 
plugin writers can exploit to add new tag names to the Solr XML response that
violate the contract (and no we don't have a formal XSD or DTD for our 
XML response format, but we still have a very well advertised contract) -- 
but that doesn't mean that code which ships with Solr should exploit those 
loopholes to violate that contract.  People should expect that if they use 
Solr as is without any custom code that the XMLResponseWriter won't all of 
a sudden start including new, non-primitive-ish, XML tags/attributes 
that weren't there before.

That's the entire point of the format as it was designed: break down 
whatever complex data might be involved in a response into easily 
digestible maps/lists of maps/lists of very primitive types that can 
easily be used in any programming language.

: allowed for a while I think), why prevent it? Allowing namespaces does _not_
: break anything. 
...
:  introducing a new 'point' concept, whether as <point> or as
:  <georss:point/>, is going to break things for people.
: 
: Show me an example, I fundamentally disagree with this.

Ok. Let's start with SolrJ then: take a look at the KnownType enum (line 
151) in XMLResponseParser...

http://svn.apache.org/viewvc/lucene/solr/trunk/src/solrj/org/apache/solr/client/solrj/impl/XMLResponseParser.java?revision=819403&view=markup

...or let's do a random google code search for solr xml lst -- check out 
ResponseContentHandler in solrpy...

http://code.google.com/p/solrpy/source/browse/trunk/solr/core.py#841

...I can't write python code to save my life, but I have pretty good idea 
what that code will do if it sees an unexpected tag.

This is how a *LOT* of Solr client libraries are implemented ... it's not 
an issue of broken XML parsers freaking out about namespaces, it's an 
issue of having a long standing, heavily advertised schema for the XML 
response that promises to only ever use a handful of types.  Adding any 
new tags to this format (regardless of how easy it may be because of that 
stupid fucking public modifier on XMLWriter.writePrim) will absolutely 
break things for people.
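
To illustrate the failure mode, here is a deliberately small Python sketch of a client-side parser built around a fixed table of known tags. It is not the SolrJ or solrpy code linked above; KNOWN_TYPES and parse_value are hypothetical names, and the tag table is only a subset of the real response format.

```python
import xml.etree.ElementTree as ET

# Assumed subset of the primitive tags; the real format has more (long, double, date, ...).
KNOWN_TYPES = {
    "str": str,
    "int": int,
    "float": float,
    "bool": lambda text: text == "true",
}


def parse_value(elem):
    """Convert one response element into plain maps, lists and primitives."""
    if elem.tag == "lst":
        return {child.get("name"): parse_value(child) for child in elem}
    if elem.tag == "arr":
        return [parse_value(child) for child in elem]
    if elem.tag in KNOWN_TYPES:
        return KNOWN_TYPES[elem.tag](elem.text or "")
    # A client written against the advertised contract has no branch for new tags.
    raise ValueError(f"unexpected tag in response: <{elem.tag}>")


ok = ET.fromstring(
    '<lst name="doc"><str name="id">1</str>'
    '<arr name="loc"><float>10</float><float>20</float></arr></lst>'
)
print(parse_value(ok))
```

Feeding the same function a document containing a <point> element raises ValueError, even though the XML is perfectly well-formed. The breakage comes from the closed set of tags, not from namespaces or the XML parser.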

: And why is that? Isn't the point of SOLR to expand to use cases brought up
: by users of the system? As long as those use cases can be principally
: supported, without breaking backwards compatibility (or in that case,  if
: they do, with large blinking red text that says it), then you're shutting
: people out for 0 benefit? It's aesthetics we're talking about here.

I don't know if i'd say that's the point of Solr, but yes we should 
absolutely try to grow the capabilities of the system as new use cases 
come along.

I am 100% in agreement that the existing simple XMLResponseWriter is 
not for everyone -- Historically we've tried to maintain a sense of equality 
between all of the response writers, so that they all contained the same 
data just with different markup -- but there are clearly cases where it 
would be nice to have a response writer that is allowed to know more 
about the real structure of the data and represent it in a manner that 
more closely represents its purpose.  This was the entire point behind 
adding FieldType.toObject, and UUIDField w/the BinaryResponseWriter is a 
good example of the model we should follow in the future.

There is a clear push for Solr to natively be able to generate responses that 
incorporate more industry standard XML schemas, and i would love to see 
us start adding functionality to do that, but bastardizing the existing 
XMLResponseWriter format is not the way to do it.

Bottom Line: I am a big fat -1 on any patch to Solr that adds new xml tags 
to the output 

Re: Namespaces in response (SOLR-1586)

2009-12-11 Thread Chris Hostetter

:  themselves ... because of the back-ass-wards way we have FieldTypes write
:  their values directly to an XMLWriter or a TextWriter the idea of using an
:  object that stringifies itself as needed doesn't really apply very well
: 
: I think it's rather powerful. You insulate the following variations into 1
: single place to change them (FieldType):
: 
: * output representation
: * indexing
: * validation
: 
: To remove this from FieldType would be to strew the same functionality
: across multiple classes, which doesn't make sense IMHO.

it's a damned-if-you-do/damned-if-you-don't situation though ... you look 
at it as insulating the response writers because all of the logic about 
serializing data is in the FieldType, but i look at it as polluting the 
FieldType with knowledge about the output formats -- there's a reason we 
didn't add writeBinary to the FieldType when the BinaryResponseWriter 
was added ... the toObject abstraction lets the FieldType do whatever it 
wants internally, and provide its best face to the world when asked.  
the ResponseWriters can then apply heuristics to decide the most 
compatible type they know of to use when representing it: is it something 
complex i have a codec for? no; oh well, then is it something that 
implements Collection? no; oh well, then is it something that is an 
instanceof Number? no; oh well, as a last resort we can stringify
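
The fallback chain described above can be sketched roughly as follows. This is a Python illustration of the heuristic only, not the BinaryResponseWriter code; to_wire_value and the codecs mapping are hypothetical names.

```python
import uuid
from collections.abc import Collection
from numbers import Number


def to_wire_value(obj, codecs=None):
    """Pick the most specific representation a writer knows about (illustrative)."""
    codecs = codecs or {}
    codec = codecs.get(type(obj))
    if codec is not None:            # 1. something complex we have a codec for?
        return codec(obj)
    if isinstance(obj, str):         # Python strings are Collections; keep them whole
        return obj
    if isinstance(obj, Collection):  # 2. a collection? convert each element
        return [to_wire_value(item, codecs) for item in obj]
    if isinstance(obj, Number):      # 3. a number? pass it through
        return obj
    return str(obj)                  # 4. last resort: stringify


u = uuid.UUID("12345678-1234-5678-1234-567812345678")
print(to_wire_value(u))                                  # no codec: falls back to a string
print(to_wire_value(u, {uuid.UUID: lambda x: x.bytes}))  # a writer with a codec keeps it binary
```

The design point is that the value object never needs to know which writer is asking. A writer that understands richer types gets them, and every other writer still receives something it can emit using the existing primitives.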

: In the long run, this might be nice, and +1 on getting there in the long
: run. In the short, a compromise is to allow namespacing on fields in the
: existing XmlWriter, which is allowed anyways, whether by oversight or not.

I'm sure if we look hard enough at the existing internal APIs, we can find 
a way to generate completely broken XML that no DOM, SAX or pull parser 
could possibly deal with cleanly -- but that doesn't mean we should do 
that just because it would allow us to start outputting a bunch of metadata 
that we think is useful.  breaking the (implicit) XML Schema is just as 
bad as breaking the XML itself.



-Hoss



[jira] Updated: (SOLR-1387) Add more search options for filtering field facets.

2009-12-11 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-1387:
---

Summary: Add more search options for filtering field facets.  (was: Add 
more search options for filtering facets.)

 Add more search options for filtering field facets.
 ---

 Key: SOLR-1387
 URL: https://issues.apache.org/jira/browse/SOLR-1387
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.4
Reporter: Anil Khadka

 Currently for filtering the facets, we have to use prefix (which uses 
 String.startsWith() in java). 
 We can add some parameters like
 * facet.iPrefix : this would act like case-insensitive search. (or ---  
 facet.prefix=a&facet.caseinsense=on)
 * facet.regex : this is pure regular expression search (which obviously would 
 be expensive if issued).
 Moreover, allowing multiple filters for the same field would be great, like
 facet.prefix=a OR facet.prefix=A ... something like this.
 All of the above concepts could be equally applicable to TermsComponent.
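
The three filtering modes proposed above could behave roughly like the Python sketch below. The function filter_terms and its arguments are hypothetical, and treating facet.regex as a "find anywhere in the term" match rather than a whole-term match is an assumption, since the issue does not say which is intended.

```python
import re


def filter_terms(terms, prefix=None, ignore_case=False, regex=None):
    """Keep the facet terms that pass the requested filters (illustrative only)."""
    pattern = re.compile(regex) if regex is not None else None
    kept = []
    for term in terms:
        if prefix is not None:
            if ignore_case:
                # facet.iPrefix-style: compare lowercased forms
                matches = term.lower().startswith(prefix.lower())
            else:
                # today's facet.prefix: plain startsWith
                matches = term.startswith(prefix)
            if not matches:
                continue
        # facet.regex-style: assumed to match anywhere in the term
        if pattern is not None and pattern.search(term) is None:
            continue
        kept.append(term)
    return kept


terms = ["Apple", "apricot", "banana", "Avocado"]
print(filter_terms(terms, prefix="a"))
print(filter_terms(terms, prefix="a", ignore_case=True))
print(filter_terms(terms, regex=r"an"))
```

The regex branch has to test every candidate term, which is why the description expects it to be expensive compared with a prefix check.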




[jira] Updated: (SOLR-1387) Add more search options for filtering field facets.

2009-12-11 Thread Anil Khadka (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anil Khadka updated SOLR-1387:
--

Fix Version/s: 1.5
Affects Version/s: (was: 1.4)

 Add more search options for filtering field facets.
 ---

 Key: SOLR-1387
 URL: https://issues.apache.org/jira/browse/SOLR-1387
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Anil Khadka
 Fix For: 1.5


 Currently for filtering the facets, we have to use prefix (which uses 
 String.startsWith() in java). 
 We can add some parameters like
 * facet.iPrefix : this would act like case-insensitive search. (or ---  
 facet.prefix=a&facet.caseinsense=on)
 * facet.regex : this is pure regular expression search (which obviously would 
 be expensive if issued).
 Moreover, allowing multiple filters for the same field would be great, like
 facet.prefix=a OR facet.prefix=A ... something like this.
 All of the above concepts could be equally applicable to TermsComponent.




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-11 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789526#action_12789526
 ] 

Grant Ingersoll commented on SOLR-1131:
---

I think this is ready to commit.  I'd like to do so on Monday or Tuesday of 
next week, which should give plenty of time for further review.

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, 
 SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.Mattmann.121109.patch.txt, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Assigned: (SOLR-1297) Enable sorting by Function Query

2009-12-11 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reassigned SOLR-1297:
-

Assignee: Grant Ingersoll

 Enable sorting by Function Query
 

 Key: SOLR-1297
 URL: https://issues.apache.org/jira/browse/SOLR-1297
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5


 It would be nice if one could sort by FunctionQuery.  See also SOLR-773, 
 where this was first mentioned by Yonik as part of the generic solution to 
 geo-search




[jira] Work started: (SOLR-1297) Enable sorting by Function Query

2009-12-11 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on SOLR-1297 started by Grant Ingersoll.

 Enable sorting by Function Query
 

 Key: SOLR-1297
 URL: https://issues.apache.org/jira/browse/SOLR-1297
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5


 It would be nice if one could sort by FunctionQuery.  See also SOLR-773, 
 where this was first mentioned by Yonik as part of the generic solution to 
 geo-search




[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-12-11 Thread Chad Kouse (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789550#action_12789550
 ] 

Chad Kouse edited comment on SOLR-236 at 12/11/09 9:37 PM:
---

Just wanted to comment that I am experiencing the same behavior as Marc Menghin 
above (NPE) -- the patch did NOT install cleanly (1 hunk failed) -- but I 
couldn't really tell why since it looked like it should have worked -- I just 
manually copied the hunk into the correct class. Sorry, I didn't note what 
failed.

  was (Author: chadkouse):
Just wanted to comment that I am experiencing the same behavior as Marc 
Menghin above (NPE) -- the patch did NOT install cleanly (1 hunk failed) -- but 
I couldn't really tell why since it looked like it should have worked -- I just 
manually copied the hunk into the write class Sorry I didn't note what 
failed
  
 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-solr-236-2.patch, 
 field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, 
 field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, 
 SOLR-236_collapsing.patch


 This patch includes a new feature called field collapsing.
 It is used to collapse a group of results with a similar value for a given 
 field into a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated "more documents from 
 this site" link. See also duplicate detection.
 http://www.fastsearch.com/glossary.aspx?m=48amid=299
 The implementation adds 3 new query parameters (SolrParams):
 - collapse.field: the field used to group results
 - collapse.type: normal (default value) or adjacent
 - collapse.max: how many continuous results are allowed before collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling corrections are welcome ;-)
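
As a rough illustration of the difference between the normal and adjacent collapse types described above, here is a minimal Java sketch. It is not taken from any of the attached patches: the class and method names are hypothetical, documents are reduced to their collapse-field value, and the `max` argument is only assumed to behave like collapse.max.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the SOLR-236 implementation.
final class CollapseSketch {

    /**
     * Keeps at most `max` results per field value.
     * adjacent == true:  only runs of consecutive equal values are collapsed.
     * adjacent == false: ("normal") the count is tracked across the whole list.
     */
    static List<String> collapse(List<String> fieldValues, int max, boolean adjacent) {
        List<String> kept = new ArrayList<>();
        Map<String, Integer> seen = new HashMap<>();
        String previous = null;
        int run = 0;
        for (String value : fieldValues) {
            int count;
            if (adjacent) {
                // Restart the counter whenever the value changes.
                run = value.equals(previous) ? run + 1 : 1;
                previous = value;
                count = run;
            } else {
                // Count every occurrence seen so far, adjacent or not.
                count = seen.merge(value, 1, Integer::sum);
            }
            if (count <= max) {
                kept.add(value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> sites = Arrays.asList("a.com", "a.com", "b.com", "a.com");
        System.out.println(collapse(sites, 1, false));
        System.out.println(collapse(sites, 1, true));
    }
}
```

With adjacent collapsing the last a.com result survives because it does not directly follow another a.com result; with normal collapsing it is dropped.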

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-236) Field collapsing

2009-12-11 Thread Chad Kouse (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789550#action_12789550
 ] 

Chad Kouse commented on SOLR-236:
-

Just wanted to comment that I am experiencing the same behavior as Marc Menghin 
above (NPE) -- the patch did NOT install cleanly (1 hunk failed) -- but I 
couldn't really tell why since it looked like it should have worked -- I just 
manually copied the hunk into the correct class. Sorry I didn't note what 
failed.

 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-solr-236-2.patch, 
 field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, 
 field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, 
 SOLR-236_collapsing.patch


 This patch includes a new feature called field collapsing.
 It is used to collapse a group of results with a similar value for a given 
 field into a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated "more documents from 
 this site" link. See also duplicate detection.
 http://www.fastsearch.com/glossary.aspx?m=48amid=299
 The implementation adds 3 new query parameters (SolrParams):
 - collapse.field: the field used to group results
 - collapse.type: normal (default value) or adjacent
 - collapse.max: how many continuous results are allowed before collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling corrections are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1297) Enable sorting by Function Query

2009-12-11 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789551#action_12789551
 ] 

Grant Ingersoll commented on SOLR-1297:
---

For this, I think we want to be able to do things like:

Just functions
{code}
sort=dist(2,x,y, point(0,0)) desc
{code}

Multiple sort params, some functions, some fields
{code}
sort=weight asc,dist(2,x,y, point(0,0)) asc
{code}

If and when a function result cache exists, we should be able to take advantage 
of that too, but that is an implementation detail.
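
One practical consequence of the examples above is that a sort parameter can no longer be split on every comma, since function arguments contain commas too. A minimal sketch of splitting only on top-level commas follows; the class and method names are hypothetical and this is not the parsing code Solr uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Solr's sort parser.
final class SortSpecSketch {

    /** Splits a sort parameter on commas that are not inside parentheses. */
    static List<String> splitClauses(String sort) {
        List<String> clauses = new ArrayList<>();
        int depth = 0;   // current parenthesis nesting level
        int start = 0;   // start index of the clause being scanned
        for (int i = 0; i < sort.length(); i++) {
            char c = sort.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ',' && depth == 0) {
                clauses.add(sort.substring(start, i).trim());
                start = i + 1;
            }
        }
        clauses.add(sort.substring(start).trim());
        return clauses;
    }

    public static void main(String[] args) {
        for (String clause : splitClauses("weight asc,dist(2,x,y, point(0,0)) asc")) {
            System.out.println(clause);
        }
    }
}
```

Each resulting clause can then be inspected separately to decide whether it names a field or a function.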

 Enable sorting by Function Query
 

 Key: SOLR-1297
 URL: https://issues.apache.org/jira/browse/SOLR-1297
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5


 It would be nice if one could sort by FunctionQuery.  See also SOLR-773, 
 where this was first mentioned by Yonik as part of the generic solution to 
 geo-search

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Namespaces in response (SOLR-1586)

2009-12-11 Thread Mattmann, Chris A (388J)
Hi Hoss,

 :  : I think the initial geosearch feature can start off with
 :  : <str>10,20</str> for a point.
 : 
 :  +1.
 :
 : Fundamentally, how is a string a point?
 
 Fundamentally a string is not a point, and a point is not a string -- but
 if you want to express the concept of a point in a manner that only uses very
 simple primitive types, then a string containing comma separated numbers
 is a pretty decent way to do it.  If you'd prefer, a pair of numbers would
 work just as well...
 
 <arr><float>10</float><float>20</float></arr>

I'm conflicted here. In simple semantics, sure it's just an array of
float/double numbers. And if a string must be used, a comma is probably OK, so
long as it maps to some existing known approach to represent points. I've
asked several times if there are examples. I can point to one that uses
spaces to separate the coordinates in the point (georss). What others use
a comma? 

 
 :  The current XML format Solr uses was designed to be extremely simple, very
 :  JSON-esque, and easily parsable by *anyone* in any language, without
 :  needing special knowledge of types.
 :
 : Whoa. I'm totally confused now. Why have FieldTypes then? Why not just use
 : Lucene? The use case for FieldTypes is _not_ just for indexing, or querying.
 : It's also for representation?
 
 No, actually the use case for FieldTypes is entirely about the internal
 logic of how Solr should deal with those fields, and how various
 operations should work on them.  FieldTypes can dictate the internal
 representation within the confines of a Lucene index, but they should not
 circumvent the contracts of the response writers in dictating what
 is/isn't a legal response.

Well, I actually would disagree. What's the point of #toInternal and
#toExternal then, other than to convert from the external representation to
an internal Lucene index representation, and then to do the opposite coming
out of the index? 

 
 : allowed for a while I think), why prevent it? Allowing namespaces does _not_
 : break anything.
 ...
 :  introducing a new "point" concept, whether as <point> or as
 :  <georss:point/>, is going to break things for people.
 :
 : Show me an example, I fundamentally disagree with this.
 
 Ok. Let's start with SolrJ then: take a look at the KnownType enum (line
 151) in XMLResponseParser...
 
 http://svn.apache.org/viewvc/lucene/solr/trunk/src/solrj/org/apache/solr/client/solrj/impl/XMLResponseParser.java?revision=819403&view=markup

Got it. OK, sure, well thanks for actually being able to identify somewhere
where it would be and for taking the time to provide a link. So what you are
saying is that this breaks the SolrJ and python clients and people who
develop clients to parse and read the (undocumented) SOLR response schema.

 
 ...or let's do a random google code search for solr xml lst -- check out
 ResponseContentHandler in solrpy...
 
 http://code.google.com/p/solrpy/source/browse/trunk/solr/core.py#841
 
 ...I can't write python code to save my life, but I have a pretty good idea
 what that code will do if it sees an unexpected tag.

Gotcha.


 : And why is that? Isn't the point of SOLR to expand to use cases brought up
 : by users of the system? As long as those use cases can be principally
 : supported, without breaking backwards compatibility (or in that case,  if
 : they do, with large blinking red text that says it), then you're shutting
 : people out for 0 benefit? It's aesthetics we're talking about here.
 
 I don't know if i'd say that's the point of Solr, but yes we should
 absolutely try to grow the capabilities of the system as new use cases
 come along.

Well that's what I was trying to do, but all I was hearing was a lot of
hollering without any help to understand why. Thanks for being the one to
finally provide that information.

 
 I am 100% in agreement that the existing simple XMLResponseWriter is
 not for everyone -- Historically we've tried to maintain a sense of equality
 between all of the Response writers, so that they all contained the same
 data just with different markup -- but there are clearly cases where it
 would be nice to have a response writer that is allowed to know more
 about the real structure of the data and represent it in a manner that
 more closely represents its purpose.

I'd like to refactor the whole thing to be a bit less brittle, and also to
close off people that shouldn't be dealing with SOLR's XML in/out (by taking
away your favorite writePrim method and its public modifier and making the
class final which it once was). We should rename that to
SolrXmlResponseWriter, since it's not really generic XML (as the current name
suggests); it's SOLR's custom (undocumented) XML schema, right? Also, since
it's undocumented, I'd be happy to throw together documentation for its XML format.
Would that also be welcomed? Then, we should develop an easy extension point
mechanism for people who want to develop their own XML response writers and
write their own clients (or leverage existing clients that 

[jira] Created: (SOLR-1645) Add human content-type

2009-12-11 Thread Khalid Yagoubi (JIRA)
Add human content-type
--

 Key: SOLR-1645
 URL: https://issues.apache.org/jira/browse/SOLR-1645
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Solr Cell (Tika extraction)
Affects Versions: 1.4
Reporter: Khalid Yagoubi
 Fix For: 1.4


The idea is to allow Solr Cell to calculate the human content-type from the 
extracted content-type and map it to a field in the schema. 

That way the user can search on media:image or media:video.

Ideas:
1) Hardcode a hashmap somewhere in the extraction classes and get the human 
content-type from the extracted content-type. I am thinking of 
SolrContentHandler.java.
2) Write an XML file where we can put a mapping, like the one in 
tika-config.xml for parsers.
3) Use tika-config.xml to get all supported mime-types.
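
To make option 1 a bit more concrete, here is a minimal sketch of such a lookup. Everything in it is an assumption for illustration: the class name, the category names ("image", "video", "audio", "document", "unknown") and the two exact-match entries are made up, and nothing here is existing Solr Cell or Tika API.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of option 1 (hardcoded map); not part of Solr Cell.
final class HumanContentTypeSketch {

    // Exact MIME types whose major type alone is not informative enough.
    private static final Map<String, String> EXACT = new HashMap<>();
    static {
        EXACT.put("application/pdf", "document");
        EXACT.put("application/msword", "document");
    }

    /** Maps an extracted content-type such as "image/png" to a human category. */
    static String humanType(String mimeType) {
        if (mimeType == null) {
            return "unknown";
        }
        String normalized = mimeType.toLowerCase(Locale.ROOT);
        int semicolon = normalized.indexOf(';');
        if (semicolon >= 0) {
            // Drop parameters such as "; charset=UTF-8".
            normalized = normalized.substring(0, semicolon);
        }
        normalized = normalized.trim();
        String exact = EXACT.get(normalized);
        if (exact != null) {
            return exact;
        }
        int slash = normalized.indexOf('/');
        String major = slash >= 0 ? normalized.substring(0, slash) : normalized;
        switch (major) {
            case "image": return "image";
            case "video": return "video";
            case "audio": return "audio";
            case "text":  return "document";
            default:      return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(humanType("image/png"));
        System.out.println(humanType("text/html; charset=UTF-8"));
    }
}
```

The returned value could then be written to a schema field (for example one named media) so that queries like media:image work; options 2 and 3 would only change where the mapping table comes from.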



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Namespaces in response (SOLR-1586)

2009-12-11 Thread Mattmann, Chris A (388J)
Hi Hoss,

 : I think it's rather powerful. You insulate the following variations into 1
 : single place to change them (FieldType):
 :
 : * output representation
 : * indexing
 : * validation
 :
 : To remove this from FieldType would be to strew the same functionality
 : across multiple classes, which doesn't make sense IMHO.
 
 it's a damned-if-you-do/damned-if-you-don't situation though ... you look
 at it as insulating the response writers because all of the logic about
 serializing data is in the FieldType, but i look at it as polluting the
 FieldType with knowledge about the output formats -- there's a reason we
 didn't add writeBinary to the FieldType when the BinaryResponseWriter
 was added ... the toObject abstraction lets the FieldType do whatever it
 wants internally, and provide its best face to the world when asked.
 the ResponseWriters can then apply heuristics to decide the most
 compatible type they know of to use when representing it: is it something
 complex i have a codec for? no; oh well, then is it something that
 implements Collection? no; oh well, then is it something that is an
 instanceof Number? no; oh well, as a last resort we can stringify

Sure, it's just that it's half-way on both sides right now like you said.
There's probably a middle ground. I like the insulation but I also
understand the clutter (i.e., what you're saying).
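
For readers following along, the fallback chain Hoss describes (Collection, then Number, then stringify) can be sketched roughly as below. This is a hypothetical illustration, not the actual response writer code: the class name is made up, the codec step is left out, and the int/long/double tag names are my assumption alongside the str, arr and float tags mentioned earlier in the thread.

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical sketch of the type-fallback heuristic; not Solr's XMLWriter.
final class WriterFallbackSketch {

    /** Renders a value using the most specific simple type the writer knows. */
    static String render(Object value) {
        if (value instanceof Collection) {
            StringBuilder sb = new StringBuilder("<arr>");
            for (Object item : (Collection<?>) value) {
                sb.append(render(item));
            }
            return sb.append("</arr>").toString();
        }
        if (value instanceof Number) {
            String tag = numberTag((Number) value);
            if (tag != null) {
                return "<" + tag + ">" + value + "</" + tag + ">";
            }
        }
        // Last resort: stringify.
        return "<str>" + escape(String.valueOf(value)) + "</str>";
    }

    /** Returns a tag for the number types this sketch knows, or null otherwise. */
    private static String numberTag(Number n) {
        if (n instanceof Integer) return "int";
        if (n instanceof Long) return "long";
        if (n instanceof Float) return "float";
        if (n instanceof Double) return "double";
        return null;
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(render(Arrays.asList(10.0f, 20.0f)));
        System.out.println(render("10,20"));
    }
}
```

The point of the sketch is that the writer, not the FieldType, picks the tag, so a FieldType whose toObject returns a list of floats and one that returns the string 10,20 both stay within the existing tag vocabulary.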

 
 : In the long run, this might be nice, and +1 on getting there in the long
 : run. In the short, a compromise is to allow namespacing on fields in the
 : existing XmlWriter, which is allowed anyways, whether by oversight or not.
 
 I'm sure if we look hard enough at the existing internal APIs, we can find
 a way to generate completely broken XML that no DOM, SAX or pull parser
 could possibly deal with cleanly -- but that doesn't mean we should do
 that just because it would allow us to start outputting a bunch of metadata
 that we think is useful.  breaking the (implicit) XML Schema is just as
 bad as breaking the XML itself.

Agreed. Let's document that (implicit) schema so loud people like me don't
keep bugging you guys when it's so obvious to you. I'm just trying to help.
I'll take an action.

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department University of
Southern California, Los Angeles, CA 90089 USA
++




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-11 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789561#action_12789561
 ] 

Chris A. Mattmann commented on SOLR-1131:
-

Hi Grant:

{quote}
My tests show it to be at least 7 times faster. But this should be obvious from 
static analysis, too. First of all, String.split() uses a regex which then 
makes a pass through the underlying character array. Then, trim has to go back 
through and analyze the char array too, not to mention the extra String 
creations. The optimized version here makes one pass and deals solely at the 
char array level and only has to do the substring, which I think can be 
optimized by the JVM to be a copy on write.
{quote}

Got it. A couple of points:

1. 7x faster is great, but could end up being noise if x = 2 ms. It matters if 
x is say 2 minutes, agreed. If it's on the ms end then the expense of more 
lines of (uncommented) code isn't worth it.
2. This code is likely to get called heavily on the indexing side, so 
performance, though still an issue, is not as hugely important as say on the 
searching side.
3. If you feel strongly about an optimized version of this magic splitAndTrim 
function, how about a utility function and a refactor then? I would guess this 
code could be used elsewhere, and that would help to satisfy my hunger for 
reusability. I'll even javadoc the function and do the refactor if you'd like.
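
For reference, a single-pass utility along the lines discussed above might look like the following. This is only a sketch under my own assumptions, not the code in the patch: the class name and signature are hypothetical, whitespace is detected with Character.isWhitespace (which differs slightly from String.trim()), and unlike String.split() it keeps trailing empty tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a single-pass split-and-trim utility.
final class SplitAndTrimSketch {

    /**
     * Splits input on delimiter and trims surrounding whitespace from each
     * token in one pass over the characters, without regexes.
     */
    static List<String> splitAndTrim(String input, char delimiter) {
        List<String> parts = new ArrayList<>();
        int start = -1; // index of the first non-whitespace char of the token, -1 if none yet
        int end = -1;   // index one past the last non-whitespace char of the token
        for (int i = 0; i <= input.length(); i++) {
            boolean atEnd = i == input.length();
            // Treat the end of input as a final delimiter to flush the last token.
            char c = atEnd ? delimiter : input.charAt(i);
            if (c == delimiter) {
                parts.add(start < 0 ? "" : input.substring(start, end));
                start = -1;
                end = -1;
            } else if (!Character.isWhitespace(c)) {
                if (start < 0) {
                    start = i;
                }
                end = i + 1;
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitAndTrim(" 10 , 20,30 ", ','));
    }
}
```

Kept in one documented utility like this, the extra lines are paid for once, whichever way the performance question is settled.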

Cheers,
Chris



 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, 
 SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.Mattmann.121109.patch.txt, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1506) Search multiple cores using MultiReader

2009-12-11 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789600#action_12789600
 ] 

Jason Rutherglen commented on SOLR-1506:


There's a different bug here. Because CoreContainer loads the
cores sequentially, and MultiCoreReaderFactory looks for all
the cores, not all the cores are searchable when the proxy core
isn't last, and an exception is thrown if the proxy is first.

The workaround is to place the proxy core last; however, that's
not possible when using the core admin HTTP API. Hmm... Not sure
what the best workaround is.

 Search multiple cores using MultiReader
 ---

 Key: SOLR-1506
 URL: https://issues.apache.org/jira/browse/SOLR-1506
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Trivial
 Fix For: 1.5

 Attachments: SOLR-1506.patch, SOLR-1506.patch, SOLR-1506.patch


 I need to search over multiple cores, and SOLR-1477 is more
 complicated than expected, so here we'll create a MultiReader
 over the cores to allow searching on them.
 Maybe in the future we can add parallel searching however
 SOLR-1477, if it gets completed, provides that out of the box.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1632) Distributed IDF

2009-12-11 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789607#action_12789607
 ] 

Andrzej Bialecki  commented on SOLR-1632:
-

I believe the API that I propose would support such implementation as well. 
Please note that it's usually not feasible to compute and distribute the 
complete IDF table for all terms - you would have to replicate a union of all 
term dictionaries across the cluster. In practice, you limit the amount of 
information by various means, e.g. only distributing data related to the 
current request (this implementation) or reducing the frequency of updates 
(e.g. LRU caching), or approximating global DF with a constant for frequent 
terms (where the contribution of their IDF to the score would be negligible 
anyway).
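
To illustrate the per-request variant mentioned above, here is a small sketch of merging per-shard document frequencies for the query terms and deriving a global IDF. It is not the API from distrib.patch: the class and method names are hypothetical, and the idf formula is assumed to have the shape of Lucene's classic default similarity, 1 + ln(numDocs / (docFreq + 1)).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the ExactDFSource implementation.
final class GlobalIdfSketch {

    /** Sums the document frequencies reported by each shard, term by term. */
    static Map<String, Long> mergeDocFreqs(List<Map<String, Long>> shardDocFreqs) {
        Map<String, Long> global = new HashMap<>();
        for (Map<String, Long> shard : shardDocFreqs) {
            for (Map.Entry<String, Long> entry : shard.entrySet()) {
                global.merge(entry.getKey(), entry.getValue(), Long::sum);
            }
        }
        return global;
    }

    /** Assumed formula: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(long docFreq, long numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        Map<String, Long> shard1 = new HashMap<>();
        shard1.put("solr", 3L);
        Map<String, Long> shard2 = new HashMap<>();
        shard2.put("solr", 7L);
        Map<String, Long> global = mergeDocFreqs(Arrays.asList(shard1, shard2));
        // The total document count would be summed across shards the same way.
        System.out.println(idf(global.get("solr"), 1000L));
    }
}
```

Only the terms of the current request appear in the maps, which is what keeps the exchanged data small compared to shipping whole term dictionaries.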

 Distributed IDF
 ---

 Key: SOLR-1632
 URL: https://issues.apache.org/jira/browse/SOLR-1632
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.5
Reporter: Andrzej Bialecki 
 Attachments: distrib.patch


 Distributed IDF is a valuable enhancement for distributed search across 
 non-uniform shards. This issue tracks the proposed implementation of an API 
 to support this functionality in Solr.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1177) Distributed TermsComponent

2009-12-11 Thread Matt Weber (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Weber updated SOLR-1177:
-

Attachment: SOLR-1177.patch

Here is an updated patch that includes Shalin's suggestions:

- replace TermData with TermsResponse.Term
- updates TermsHelper to use the parsing code from TermsResponse

I also changed TermsResponse.Term#frequency to a long so that we don't overflow 
when calculating the frequency.  Then to keep back-compatibility with existing 
code I do the following when writing it to the NamedList:

if (tc.getFrequency() >= freqmin && tc.getFrequency() <= freqmax) {
  fieldterms.add(tc.getTerm(), ((Number)tc.getFrequency()).intValue());
  cnt++;
}

Is this a good approach?
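
One thing that may be worth weighing here: intValue() on a long-valued Number narrows silently, so a summed frequency above Integer.MAX_VALUE would come out wrapped rather than large. The sketch below shows that behavior next to a clamping alternative; the class and method names are hypothetical and whether clamping is acceptable for back-compat is an open question, not something the patch does.

```java
// Hypothetical sketch; illustrates long-to-int narrowing, not the patch itself.
final class FrequencyNarrowingSketch {

    /** Narrows a long to an int, saturating at the int range instead of wrapping. */
    static int clampToInt(long frequency) {
        if (frequency > Integer.MAX_VALUE) {
            return Integer.MAX_VALUE;
        }
        if (frequency < Integer.MIN_VALUE) {
            return Integer.MIN_VALUE;
        }
        return (int) frequency;
    }

    public static void main(String[] args) {
        long big = Integer.MAX_VALUE + 1L;
        // Plain narrowing wraps around to a negative value.
        System.out.println(Long.valueOf(big).intValue());
        // Clamping keeps the value at the top of the int range.
        System.out.println(clampToInt(big));
    }
}
```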


This new patch includes SOLR-1139.



 Distributed TermsComponent
 --

 Key: SOLR-1177
 URL: https://issues.apache.org/jira/browse/SOLR-1177
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Matt Weber
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, 
 SOLR-1177.patch, TermsComponent.java, TermsComponent.patch


 TermsComponent should be distributed

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1621) Allow current single core deployments to be specified by solr.xml

2009-12-11 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789675#action_12789675
 ] 

Noble Paul commented on SOLR-1621:
--

bq. You mark setSolrConfigFilename as deprecated, but it simply doesn't work 
anymore either - that's not really deprecation. I feel that deprecating 
solrconfig-filename should be its own issue so that's a clearer entry in 
CHANGES

I shall open another issue for this.  The setSolrConfigFilename() did not work 
for multicore. So it does not make sense for us to bend over backwards to make 
it work in this and add too many overloaded methods. Anyway, users won't need 



 Allow current single core deployments to be specified by solr.xml
 -

 Key: SOLR-1621
 URL: https://issues.apache.org/jira/browse/SOLR-1621
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.5
Reporter: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, 
 SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, 
 SOLR-1621.patch, SOLR-1621.patch


 supporting two different modes of deployments is turning out to be hard. This 
 leads to duplication of code. Moreover there is a lot of confusion on where 
 do we put common configuration. See the mail thread 
 http://markmail.org/message/3m3rqvp2ckausjnf

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1592) Refactor XMLWriter startTag to allow arbitrary attributes to be written

2009-12-11 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789678#action_12789678
 ] 

Lance Norskog commented on SOLR-1592:
-

Ok, never mind. 

 Refactor XMLWriter startTag to allow arbitrary attributes to be written
 ---

 Key: SOLR-1592
 URL: https://issues.apache.org/jira/browse/SOLR-1592
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
 Environment: My MacBook laptop.
Reporter: Chris A. Mattmann
Assignee: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1592.Mattmann.112209.patch.txt, 
 SOLR-1592.Mattmann.112209_02.patch.txt, SOLR-1592.patch, SOLR-1592.patch


 There are certain cases in which a user would like to write arbitrary 
 attributes as part of the XML output for a field tag. Case in point: I'd like 
 to declare tags in the SOLR output that are e.g., georss namespace, like 
 georss:point. Other users may want to declare myns:mytag tags, which should 
 be perfectly legal as SOLR goes. This isn't currently possible with the 
 XMLWriter implementation, which curiously only allows the attribute name to 
 be included in the XML tags. 
 Coincidentally, users of XMLWriter aren't allowed to modify the response 
 outer XML tag to include those arbitrary namespaces (which was my original 
 thought as a workaround for this). This wouldn't matter anyways, because by 
 the time the user got to the FieldType#writeXML method, the header for the 
 XML would have been written anyways.
 I've developed a workaround, and in doing so, allowed something that should 
 have probably been allowed in the first place: allow a user to write 
 arbitrary attributes (including xmlns:myns=myuri) as part of the 
 XMLWriter#startTag function. I've kept the existing #startTag, but replaced 
 its innards with versions of startTag that include startTagWithNamespaces, 
 and startTagNoAttrs.
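
As a rough illustration of what a startTag that accepts arbitrary attributes involves, here is a minimal standalone sketch. It is not the code in the attached patches: the class and method names are hypothetical, and the georss namespace URI and attribute values in main are examples only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the XMLWriter#startTag from the SOLR-1592 patches.
final class StartTagSketch {

    /** Builds an opening tag with the given attributes, in insertion order. */
    static String startTag(String name, Map<String, String> attrs) {
        StringBuilder sb = new StringBuilder("<").append(name);
        for (Map.Entry<String, String> attr : attrs.entrySet()) {
            sb.append(' ')
              .append(attr.getKey())
              .append("=\"")
              .append(escapeAttr(attr.getValue()))
              .append('"');
        }
        return sb.append('>').toString();
    }

    /** Escapes the characters that are unsafe inside a double-quoted attribute. */
    static String escapeAttr(String value) {
        return value.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("name", "store");
        attrs.put("xmlns:georss", "http://www.georss.org/georss");
        System.out.println(startTag("georss:point", attrs));
    }
}
```

Declaring the namespace on the element itself, as in this example, is one way around the fact that the outer response tag has already been written by the time a FieldType gets to write its XML.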

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1646) Document the SOLR XML response schema in XMLSchema and DTD

2009-12-11 Thread Chris A. Mattmann (JIRA)
Document the SOLR XML response schema in XMLSchema and DTD
--

 Key: SOLR-1646
 URL: https://issues.apache.org/jira/browse/SOLR-1646
 Project: Solr
  Issue Type: Improvement
  Components: clients - C#, clients - java, clients - php, clients - 
python, clients - ruby - flare, documentation, Response Writers, Schema and 
Analysis, search
Affects Versions: 1.4, 1.3, 1.2, 1.1.0
 Environment: My local Mac Book pro over Christmas Break.
Reporter: Chris A. Mattmann
Priority: Critical
 Fix For: 1.5


See this thread for the background:

http://www.lucidimagination.com/search/document/e8bb6cac84c1f520/namespaces_in_response_solr_1586#cc50ba9e9d8fe2dc

The idea is to finally document the often alluded to (but never reified) SOLR 
XML response schema/DTD.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1648) Rename XMLWriter (and XMLResponseWriter) to SolrXmlResponseWriter

2009-12-11 Thread Chris A. Mattmann (JIRA)
Rename XMLWriter (and XMLResponseWriter) to SolrXmlResponseWriter
-

 Key: SOLR-1648
 URL: https://issues.apache.org/jira/browse/SOLR-1648
 Project: Solr
  Issue Type: Improvement
  Components: Response Writers
Affects Versions: 1.4, 1.3, 1.2
 Environment: My local MacBook pro over the Christmas Break.
Reporter: Chris A. Mattmann
 Fix For: 1.5


The current XMLWriter class is kind of a misnomer. It's not a generic XMLWriter 
by any means, and that's not its intention. Its intention is to write instances 
(as responses) of a particular XML schema that SOLR clients (written in Java, 
Python, pick-your-favorite-programming-language) that speak its XML protocol 
can understand. So, we should rename it to SolrXmlResponseWriter to indicate 
its unique XML speak.

Moreover, as part of the next issue I'm going to report (refactoring all 
ResponseWriters to be a bit more friendly), we should probably do away with the 
current XmlResponseWriter (which simply delegates to XmlWriter -- eeep) and 
just stick with SolrXmlResponseWriter. Patch forthcoming.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1648) Rename XMLWriter (and XMLResponseWriter) to SolrXmlResponseWriter

2009-12-11 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789682#action_12789682
 ] 

Chris A. Mattmann commented on SOLR-1648:
-

See this thread for more detail:

http://www.lucidimagination.com/search/document/e8bb6cac84c1f520/namespaces_in_response_solr_1586#cc50ba9e9d8fe2dc

 Rename XMLWriter (and XMLResponseWriter) to SolrXmlResponseWriter
 -

 Key: SOLR-1648
 URL: https://issues.apache.org/jira/browse/SOLR-1648
 Project: Solr
  Issue Type: Improvement
  Components: Response Writers
Affects Versions: 1.2, 1.3, 1.4
 Environment: My local MacBook pro over the Christmas Break.
Reporter: Chris A. Mattmann
 Fix For: 1.5


 The current XMLWriter class is kind of a misnomer. It's not a generic 
 XMLWriter by any means, and that's not its intention. Its intention is to 
 write instances (as responses) of a particular XML schema that SOLR clients 
 (written in Java, Python, pick-your-favorite-programming-language) that speak 
 its XML protocol can understand. So, we should rename it to 
 SolrXmlResponseWriter to indicate its unique XML speak.
 Moreover, as part of the next issue I'm going to report (refactoring all 
 ResponseWriters to be a bit more friendly), we should probably do away with 
 the current XmlResponseWriter (which simply delegates to XmlWriter -- eeep) 
 and just stick with SolrXmlResponseWriter. Patch forthcoming.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1649) Refactor all ResponseWriters to be more extension-friendly

2009-12-11 Thread Chris A. Mattmann (JIRA)
Refactor all ResponseWriters to be more extension-friendly
--

 Key: SOLR-1649
 URL: https://issues.apache.org/jira/browse/SOLR-1649
 Project: Solr
  Issue Type: Improvement
  Components: Response Writers
Affects Versions: 1.4
 Environment: My local MacBook pro over the Christmas break.
Reporter: Chris A. Mattmann
 Fix For: 1.5


I'd like to refactor all the ResponseWriters to be a bit less brittle. 
ResponseWriters should follow a standard interface with more methods than are 
currently present in the interface, with lots of refactored utility code and 
more concrete control/data flow. I'll take a hard look at the existing 
response writers and try to generalize.

See this thread for background:

http://www.lucidimagination.com/search/document/e8bb6cac84c1f520/namespaces_in_response_solr_1586#cc50ba9e9d8fe2dc

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: SOLR-1131: disconnect between fields created by poly fields

2009-12-11 Thread Lance Norskog
There are already components (ExtractingRequestHandler, Deduplication)
that secretly add fields which violate the schema. Personally I would
nuke this ability; I've had major problems with junk in the indexed
data and discovering secret fields would have made my head explode
that much louder.

On Fri, Dec 11, 2009 at 7:01 AM, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Fri, Dec 11, 2009 at 9:53 AM, Mattmann, Chris A (388J)
 chris.a.mattm...@jpl.nasa.gov wrote:
 Actually if it was the case that poly field mapped to a single dynamic
 field, then I would agree with you, but as is the discussion, poly field can
 map to _many_ dynamic fields, which is where the drift occurs.

 I'm not sure if we're using the exact same terminology, but it's well
 defined how many dynamic fields would be created by the basic point
 class (exactly one) *if* we decide to go that route and use that
 option.  Can you give an example of what you mean?  Is your objection
 to this point class registering a single dynamic field, or are you
 talking about a hypothetical case?

 -Yonik
 http://www.lucidimagination.com




-- 
Lance Norskog
goks...@gmail.com


Re: SOLR-1131: disconnect between fields created by poly fields

2009-12-11 Thread Mattmann, Chris A (388J)
Hi Lance,

On 12/11/09 9:08 PM, Lance Norskog goks...@gmail.com wrote:

 There are already components (ExtractingRequestHandler, Deduplication)
 that secretly add fields which violate the schema. Personally I would
 nuke this ability; I've had major problems with junk in the indexed
 data and discovering secret fields would have made my head explode
 that much louder.

Yep, I agree. I originally felt that dynamic fields weren't the way to go,
but I came around and ended up thinking they are the way to go for this.
However, even though I think dynamic fields should be used for poly fields,
I _also_ think they should be pre-declared in the schema -- and not secretly
(as you put it) added at runtime (hence my mentioning of drift). It seems
that Hoss was for this as well:

http://www.lucidimagination.com/search/document/7f8ed6212481aca6/solr_1131_multiple_fields_per_field_type

and I think he's right, though I remember going back and forth with him (and
others) about it in the beginning. The thing people have to realize here is
that there are 2 notions:

1. schema.xml -- edited by a user, offline before SOLR has started up,
loaded at runtime and turned into...
2. an IndexSchema instance, a Java object instance used to access the
schema.xml file loaded into a JVM. IndexSchema acts as a runtime model to
the underlying SOLR field typing system.

If #1 and #2 fall out of sync (which is what happens when _any_ type of
field is created in #2 without informing #1), then you have drift.
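
To make the two notions concrete, here is a minimal sketch in plain Java. It
is only an illustration under assumed names: SchemaDrift and its drift()
method are hypothetical and are not part of Solr's IndexSchema API. The
"declared" set stands in for the fields written in schema.xml (#1), the
"runtime" set stands in for the fields the in-JVM model knows about (#2), and
drift is whatever exists in the second but not in the first.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration only; this is NOT Solr's IndexSchema API.
public class SchemaDrift {

    /** Returns the field names known at runtime that were never declared on disk. */
    public static Set<String> drift(Set<String> declared, Set<String> runtime) {
        Set<String> extra = new LinkedHashSet<>(runtime);
        extra.removeAll(declared);
        return extra;
    }

    public static void main(String[] args) {
        // Stand-in for the fields a user wrote into schema.xml (notion #1).
        Set<String> declared = new LinkedHashSet<>();
        declared.add("id");
        declared.add("location");

        // Stand-in for the runtime model (notion #2), which starts as a copy...
        Set<String> runtime = new LinkedHashSet<>(declared);
        // ...and then has a field registered at runtime only (made-up name).
        runtime.add("location_0_d");

        System.out.println("drift: " + drift(declared, runtime));
    }
}
```

If the extra field had been pre-declared in the first set, drift() would
return an empty set, which is the situation I am arguing for.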

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department University of
Southern California, Los Angeles, CA 90089 USA
++




[jira] Assigned: (SOLR-1637) Deprecate ALIAS command

2009-12-11 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul reassigned SOLR-1637:


Assignee: Noble Paul

 Deprecate ALIAS command
 ---

 Key: SOLR-1637
 URL: https://issues.apache.org/jira/browse/SOLR-1637
 Project: Solr
  Issue Type: Sub-task
  Components: multicore
Affects Versions: 1.5
Reporter: Noble Paul
Assignee: Noble Paul
 Fix For: 1.5


 Multicore makes the CoreContainer code more complex. We should remove it for 
 now and revisit it for a simpler, cleaner implementation.




Re: SOLR-1131: disconnect between fields created by poly fields

2009-12-11 Thread Erik Hatcher
They don't violate the schema, do they? The fields added from both
of those (and DIH too) all must be either fields or match dynamic
field patterns. Right?
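
As a rough sketch of the check being described (plain Java, hypothetical
names, not Solr's actual implementation): a field name is accepted if it is
an explicitly declared field, or if it matches a dynamic field pattern. The
sketch assumes a pattern carries a single '*' at its start or its end; the
class FieldNameCheck and the sample field names are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not Solr's own matching code.
public class FieldNameCheck {

    /** Assumed pattern form: one '*' at the start ("*_d") or at the end ("attr_*"). */
    public static boolean matchesPattern(String pattern, String name) {
        if (pattern.startsWith("*")) {
            return name.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return name.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(name);
    }

    /** True if the name is a declared field or matches any dynamic field pattern. */
    public static boolean isAllowed(String name, Set<String> declared, List<String> patterns) {
        if (declared.contains(name)) {
            return true;
        }
        for (String pattern : patterns) {
            if (matchesPattern(pattern, name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> declared = new HashSet<>(Arrays.asList("id", "title"));
        List<String> patterns = Arrays.asList("*_d", "attr_*");

        System.out.println(isAllowed("title", declared, patterns));        // declared field
        System.out.println(isAllowed("location_0_d", declared, patterns)); // matches *_d
        System.out.println(isAllowed("signature", declared, patterns));    // matches nothing
    }
}
```

Under that reading, a component can only add a field that passes this kind of
check, so the name is always covered by something written in the schema.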


Erik


On Dec 12, 2009, at 6:08 AM, Lance Norskog wrote:


There are already components (ExtractingRequestHandler, Deduplication)
that secretly add fields which violate the schema. Personally I would
nuke this ability; I've had major problems with junk in the indexed
data and discovering secret fields would have made my head explode
that much louder.

On Fri, Dec 11, 2009 at 7:01 AM, Yonik Seeley
yo...@lucidimagination.com wrote:

On Fri, Dec 11, 2009 at 9:53 AM, Mattmann, Chris A (388J)
chris.a.mattm...@jpl.nasa.gov wrote:
Actually if it was the case that poly field mapped to a single dynamic
field, then I would agree with you, but as is the discussion, poly field can
map to _many_ dynamic fields, which is where the drift occurs.


I'm not sure if we're using the exact same terminology, but it's well
defined how many dynamic fields would be created by the basic point
class (exactly one) *if* we decide to go that route and use that
option.  Can you give an example of what you mean?  Is your objection
to this point class registering a single dynamic field, or are you
talking about a hypothetical case?

-Yonik
http://www.lucidimagination.com





--
Lance Norskog
goks...@gmail.com




[jira] Commented: (SOLR-1621) Allow current single core deployments to be specified by solr.xml

2009-12-11 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789689#action_12789689
 ] 

Noble Paul commented on SOLR-1621:
--

I guess deprecating ALIAS can be committed first so that the changes for this 
issue will be smaller and more manageable.

 Allow current single core deployments to be specified by solr.xml
 -

 Key: SOLR-1621
 URL: https://issues.apache.org/jira/browse/SOLR-1621
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.5
Reporter: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, 
 SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, SOLR-1621.patch, 
 SOLR-1621.patch, SOLR-1621.patch


 Supporting two different modes of deployment is turning out to be hard. This 
 leads to duplication of code. Moreover, there is a lot of confusion about where 
 to put common configuration. See the mail thread 
 http://markmail.org/message/3m3rqvp2ckausjnf




Synchronizing Webapp on Tomcat with Solr instance

2009-12-11 Thread insaneyogi3008

Hello,

I have a webapp on Tomcat that displays search results to the end users. I
have a Solr instance on the same Linux box that has data indexed.

The webapp wraps an instance of CommonsHttpSolrServer, presumably to send
HTTP requests to the Solr instance. Now, if Solr has indexed documents, how
does it return the document(s) that satisfied the search query?

I am a little confused about how this communication between Tomcat and Solr
takes place.




-- 
View this message in context: 
http://old.nabble.com/Synchronizing-Webapp-on-Tomcat-with-Solr-instance-tp26754143p26754143.html
Sent from the Solr - Dev mailing list archive at Nabble.com.