[jira] Created: (SOLR-1768) Text Categorization Transformer

2010-02-09 Thread Shalin Shekhar Mangar (JIRA)
Text Categorization Transformer
---

 Key: SOLR-1768
 URL: https://issues.apache.org/jira/browse/SOLR-1768
 Project: Solr
  Issue Type: New Feature
  Components: contrib - DataImportHandler
Reporter: Shalin Shekhar Mangar
Priority: Minor


A Transformer which uses TCatNG - http://tcatng.sourceforge.net/ (BSD license) 
to categorize text.

See original discussion at - 
http://www.lucidimagination.com/search/document/37c1f48fb8224171/is_it_posible_to_exclude_results_from_other_languages

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-1767) DataImportHandler: dataimporter.functions.escapeSql() does not escape backslash character

2010-02-09 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul resolved SOLR-1767.
--

Resolution: Fixed

committed r908357
Thanks Sean Timm
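
The attached patch isn't reproduced in this digest, but the escaping it needs to add can be sketched as below. This is an illustrative stand-in, not the actual DIH code; the method name merely mirrors dataimporter.functions.escapeSql. Note that the backslash must be escaped before the quotes, or the backslashes introduced by the quote escapes would themselves get doubled.

```java
public class EscapeSqlSketch {
    // Escape the characters MySQL requires inside quoted string literals.
    // Backslash is handled first: doing it last would corrupt the
    // backslashes introduced while escaping the quotes.
    public static String escapeSql(String s) {
        if (s == null) return null;
        return s.replace("\\", "\\\\")
                .replace("'", "\\'")
                .replace("\"", "\\\"");
    }
}
```

For example, the value `O'Brien\x` becomes `O\'Brien\\x`, which MySQL parses back to the original string.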

> DataImportHandler: dataimporter.functions.escapeSql() does not escape 
> backslash character
> -
>
> Key: SOLR-1767
> URL: https://issues.apache.org/jira/browse/SOLR-1767
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 1.4
>Reporter: Sean Timm
>Assignee: Noble Paul
> Fix For: 1.5
>
> Attachments: SOLR-1767.patch
>
>
> MySQL requires that the backslash and the quote character used to quote the 
> string in the query be escaped.  Currently only single and double quotes are 
> escaped.
> See: http://dev.mysql.com/doc/refman/4.1/en/mysql-real-escape-string.html




[jira] Assigned: (SOLR-1767) DataImportHandler: dataimporter.functions.escapeSql() does not escape backslash character

2010-02-09 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul reassigned SOLR-1767:


Assignee: Noble Paul

> DataImportHandler: dataimporter.functions.escapeSql() does not escape 
> backslash character
> -
>
> Key: SOLR-1767
> URL: https://issues.apache.org/jira/browse/SOLR-1767
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 1.4
>Reporter: Sean Timm
>Assignee: Noble Paul
> Fix For: 1.5
>
> Attachments: SOLR-1767.patch
>
>
> MySQL requires that the backslash and the quote character used to quote the 
> string in the query be escaped.  Currently only single and double quotes are 
> escaped.
> See: http://dev.mysql.com/doc/refman/4.1/en/mysql-real-escape-string.html




[jira] Resolved: (SOLR-1089) do write to Solr in a separate thread

2010-02-09 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul resolved SOLR-1089.
--

Resolution: Duplicate

> do write to Solr in a separate thread
> -
>
> Key: SOLR-1089
> URL: https://issues.apache.org/jira/browse/SOLR-1089
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Reporter: Noble Paul
>Assignee: Noble Paul
> Attachments: SOLR-1089.patch, SOLR-1089.patch, SOLR-1089.patch
>
>
> import can be made faster if the write is done in a different thread




[jira] Resolved: (SOLR-1766) DIH with threads enabled doesn't respond to the abort command

2010-02-09 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul resolved SOLR-1766.
--

Resolution: Fixed

committed r908355

Thanks Michael Henson

> DIH with threads enabled doesn't respond to the abort command
> -
>
> Key: SOLR-1766
> URL: https://issues.apache.org/jira/browse/SOLR-1766
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 1.5
> Environment: tomcat 6.x,  jdk 1.6.x, windows and linux
>Reporter: Michael Henson
>Assignee: Noble Paul
> Fix For: 1.5
>
> Attachments: solr-1766.patch
>
>
> When the multithreaded entity processor is enabled by adding the threads="x" 
> attribute to an entity, the thread runner code doesn't check the status of 
> the abort flag. The process continues to run after the abort command is given.




SolrCloud - Using collections, slices and shards in the wild

2010-02-09 Thread Jon Gifford
I've been following the progress of the SolrCloud branch closely, and
wanted to explain how I intend to use it, and what that means for how
the collections, slices and shards could work.

I should say up front that Mark, Yonik and I have exchanged a few
emails on this already, and Mark suggested that I switch to this list
to "drum up more interest in others playing with the branch and
chiming in with thoughts." I realize the code is still at a very early
stage, so this is really intended to be more grist for the mill, not a
criticism of the current implementation, which seems to me to be a very
nice base for what I need to do. Also, I'll apologize up front for the
length of this email, but I wanted to paint as clear a picture as I
could of how I intend to use this stuff.

The system I'll be building needs to be able to:

1) Support one index per customer, and many customers (thus, many
independent indices)

2) Share the same schema across all indices

3) Allow for time-based shards within a single customer's index.

4) As an added twist, some customers will be sending data faster than
can be indexed in a single core, so we'll also need to split the input
stream to multiple cores. Thus, for a given time-based shard, we're
likely to have multiple parallel indexers building independent shards.

Mapping these requirements to the current state of SolrCloud, I could
use a single collection (i.e. a single schema) that all customer
indices are part of, then create slices of that collection to
represent an individual customer's index, each made up of a set of
time-based shards, which may themselves be built in parallel on
independent cores.

Alternatively, I could create a collection per customer, which removes
the need for slices, but means duplicating the schema many times. From
an operational standpoint, a single collection makes more sense to me.

The current state of the branch allows me to do some, but not all, of
what I need to do, and I wanted to walk through how I could see myself
using it.

Firstly, I'd like to be able to use the REST interface to create new
cores/shards - I'm not going to bet that this is what the final system
will do, but for the stage I'm at now, it's the simplest, quickest way
to get going. The current code uses the core name as the collection
name, which won't work for me if I use a single collection. For
example, if I want to create a new core for customer_1 for today's
index, I'd do the following:


http://localhost:8983/solr/admin/cores?action=CREATE&instanceDir=.&name=collection_1&dataDir=data/customer_1_20100209

This approach is going to lead to a lot of Solr instances ;-)

Revising the code to use the core name as a slice, I'd get:


http://localhost:8983/solr/admin/cores?action=CREATE&instanceDir=.&name=customer_1&dataDir=data/customer_1_20100209

but would need to explicitly add a collection=collection_1 parameter
to the call to make sure it uses the correct collection. The problem
with this approach is that I'm now limited to only being able to deliver
one shard per customer from each Solr instance.

Revising again, to use the core name as the shard name, I'd get:


http://localhost:8983/solr/admin/cores?action=CREATE&instanceDir=.&name=customer_1_20100209&dataDir=data/customer_1_20100209

and would need explicit collection= and slice= parameters. This is the
ideal situation, because I can run as many shards from the same
customer as I like on a single Solr instance.

So, essentially what I'm saying is that cores and shards really are
identical, and when a core is created, we should be able to specify
the collection and slice that it belongs to via the REST interface.

Here are Mark's comments on this...

> I think we simply haven't thought much about creating cores dynamically
> with http requests yet. You can set a custom shard id initially in the
> solr.xml, or using the CloudDescriptor on the CoreDescriptor when doing
> it programmatically.
>
> It's a good issue to bring up - I think we will want support to handle
> this stuff with the core admin handler. I can add the basics pretty soon
> I think.
>
> The way things default now (core name is collection) is really only for
> simple bootstrap situations.

and

> Yeah, I think you can do quite a bit with it now, but there is def still
> a lot planned. We are actually working on polishing off what we have as
> a first plateau now. We have mostly been working from either static
> configuration and/or java code in building it up though, so personally
> it hadn't even yet hit me to take care of the HTTP CoreAdmin side of
> things. From a dev side, I just haven't had to use it much, so when I
> think dynamic cores I'm usually thinking java code style.

The second part of what I need is to be able to search a single
customer's index, which I'm assuming will be a slice. Something like:

http://localhost:8983/solr/collection1/select?distrib=true&slice=customer_1

would do the trick, assuming we have the slice => shards map.
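
To make that concrete, here is a hypothetical sketch of such a map and its use. Every name in it is invented for illustration; in the real branch this state would presumably come from ZooKeeper rather than be registered by hand.

```java
import java.util.*;

public class SliceResolverSketch {
    // collection -> slice -> shard URLs
    private final Map<String, Map<String, List<String>>> cloudState = new HashMap<>();

    public void register(String collection, String slice, String shardUrl) {
        cloudState.computeIfAbsent(collection, c -> new HashMap<>())
                  .computeIfAbsent(slice, s -> new ArrayList<>())
                  .add(shardUrl);
    }

    // Expand slice=customer_1 into a shards= parameter value listing
    // every shard registered under that slice.
    public String shardsParam(String collection, String slice) {
        List<String> shards = cloudState
                .getOrDefault(collection, Collections.emptyMap())
                .getOrDefault(slice, Collections.emptyList());
        return String.join(",", shards);
    }
}
```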

[jira] Updated: (SOLR-1767) DataImportHandler: dataimporter.functions.escapeSql() does not escape backslash character

2010-02-09 Thread Sean Timm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Timm updated SOLR-1767:


Attachment: SOLR-1767.patch

adds escaping of backslash with a test case.

> DataImportHandler: dataimporter.functions.escapeSql() does not escape 
> backslash character
> -
>
> Key: SOLR-1767
> URL: https://issues.apache.org/jira/browse/SOLR-1767
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 1.4
>Reporter: Sean Timm
> Fix For: 1.5
>
> Attachments: SOLR-1767.patch
>
>
> MySQL requires that the backslash and the quote character used to quote the 
> string in the query be escaped.  Currently only single and double quotes are 
> escaped.
> See: http://dev.mysql.com/doc/refman/4.1/en/mysql-real-escape-string.html




[jira] Updated: (SOLR-1767) DataImportHandler: dataimporter.functions.escapeSql() does not escape backslash character

2010-02-09 Thread Sean Timm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Timm updated SOLR-1767:


   Labels: dih  (was: )
  Component/s: contrib - DataImportHandler
Fix Version/s: 1.5
  Description: 
MySQL requires that the backslash and the quote character used to quote the 
string in the query be escaped.  Currently only single and double quotes are 
escaped.

See: http://dev.mysql.com/doc/refman/4.1/en/mysql-real-escape-string.html
Affects Version/s: 1.4
  Summary: DataImportHandler: dataimporter.functions.escapeSql() 
does not escape backslash character  (was: DataImportHandler: 
dataimporter.functions.escapeSql)

> DataImportHandler: dataimporter.functions.escapeSql() does not escape 
> backslash character
> -
>
> Key: SOLR-1767
> URL: https://issues.apache.org/jira/browse/SOLR-1767
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 1.4
>Reporter: Sean Timm
> Fix For: 1.5
>
>
> MySQL requires that the backslash and the quote character used to quote the 
> string in the query be escaped.  Currently only single and double quotes are 
> escaped.
> See: http://dev.mysql.com/doc/refman/4.1/en/mysql-real-escape-string.html




[jira] Created: (SOLR-1767) DataImportHandler: dataimporter.functions.escapeSql

2010-02-09 Thread Sean Timm (JIRA)
DataImportHandler: dataimporter.functions.escapeSql
---

 Key: SOLR-1767
 URL: https://issues.apache.org/jira/browse/SOLR-1767
 Project: Solr
  Issue Type: Bug
Reporter: Sean Timm







[jira] Assigned: (SOLR-1766) DIH with threads enabled doesn't respond to the abort command

2010-02-09 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul reassigned SOLR-1766:


Assignee: Noble Paul

> DIH with threads enabled doesn't respond to the abort command
> -
>
> Key: SOLR-1766
> URL: https://issues.apache.org/jira/browse/SOLR-1766
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 1.5
> Environment: tomcat 6.x,  jdk 1.6.x, windows and linux
>Reporter: Michael Henson
>Assignee: Noble Paul
> Fix For: 1.5
>
> Attachments: solr-1766.patch
>
>
> When the multithreaded entity processor is enabled by adding the threads="x" 
> attribute to an entity, the thread runner code doesn't check the status of 
> the abort flag. The process continues to run after the abort command is given.




[jira] Updated: (SOLR-1766) DIH with threads enabled doesn't respond to the abort command

2010-02-09 Thread Michael Henson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Henson updated SOLR-1766:
-

Attachment: solr-1766.patch

Simply adding a test for the status of the abort flag here seems to work.

> DIH with threads enabled doesn't respond to the abort command
> -
>
> Key: SOLR-1766
> URL: https://issues.apache.org/jira/browse/SOLR-1766
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 1.5
> Environment: tomcat 6.x,  jdk 1.6.x, windows and linux
>Reporter: Michael Henson
> Fix For: 1.5
>
> Attachments: solr-1766.patch
>
>
> When the multithreaded entity processor is enabled by adding the threads="x" 
> attribute to an entity, the thread runner code doesn't check the status of 
> the abort flag. The process continues to run after the abort command is given.




[jira] Created: (SOLR-1766) DIH with threads enabled doesn't respond to the abort command

2010-02-09 Thread Michael Henson (JIRA)
DIH with threads enabled doesn't respond to the abort command
-

 Key: SOLR-1766
 URL: https://issues.apache.org/jira/browse/SOLR-1766
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.5
 Environment: tomcat 6.x,  jdk 1.6.x, windows and linux
Reporter: Michael Henson
 Fix For: 1.5


When the multithreaded entity processor is enabled by adding the threads="x" 
attribute to an entity, the thread runner code doesn't check the status of the 
abort flag. The process continues to run after the abort command is given.
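
The attached patch is not reproduced here, but the shape of the fix described above can be sketched as follows. All names are invented for illustration; this is not the actual DIH code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class EntityRunnerSketch {
    // Shared flag that the abort command handler flips.
    private final AtomicBoolean abortRequested = new AtomicBoolean(false);
    private int processed = 0;

    public void abort() { abortRequested.set(true); }

    // The runner checks the flag between rows so an abort takes effect
    // promptly; the absence of such a check is what this issue reports.
    public int run(int totalRows) {
        for (int i = 0; i < totalRows; i++) {
            if (abortRequested.get()) break; // the check the patch adds
            processed++;                     // stand-in for indexing one row
        }
        return processed;
    }
}
```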




[jira] Updated: (SOLR-1579) CLONE -stats.jsp XML escaping

2010-02-09 Thread David Bowen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Bowen updated SOLR-1579:
--

Attachment: SOLR-1579.patch

This is a trivial fix, but I'm supplying a patch in the hope of raising the 
priority.
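
The distinction the fix relies on can be sketched with a minimal stand-in (not Solr's actual XML utility class):

```java
public class XmlEscapeSketch {
    // Character-data escaping: inside element text, quote characters may
    // legally remain unescaped.
    public static String escapeCharData(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Attribute values must additionally escape the quote delimiter;
    // otherwise a value containing '"' terminates the attribute early
    // and produces invalid XML.
    public static String escapeAttributeValue(String s) {
        return escapeCharData(s).replace("\"", "&quot;");
    }
}
```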

> CLONE -stats.jsp XML escaping
> -
>
> Key: SOLR-1579
> URL: https://issues.apache.org/jira/browse/SOLR-1579
> Project: Solr
>  Issue Type: Bug
>  Components: web gui
>Affects Versions: 1.4
>Reporter: David Bowen
>Assignee: Erik Hatcher
> Fix For: 1.5
>
> Attachments: SOLR-1579.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The fix to SOLR-1008 was wrong.  It used chardata escaping for a value that 
> is an attribute value.
> I.e. instead of XML.escapeCharData it should call XML.escapeAttributeValue.
> Otherwise, any query used as a key in the filter cache whose printed 
> representation contains a double-quote character causes invalid XML to be 
> generated.




Re: svn commit: r908194 - in /lucene/solr/branches/cloud/src: solrj/org/apache/solr/client/solrj/impl/CloudSolrServer.java test/org/apache/solr/BaseDistributedSearchTestCase.java test/org/apache/sol

2010-02-09 Thread Yonik Seeley
On Tue, Feb 9, 2010 at 3:10 PM,   wrote:
> Author: yonik
> Date: Tue Feb  9 20:10:12 2010
> New Revision: 908194
>
> URL: http://svn.apache.org/viewvc?rev=908194&view=rev
> Log:
> solrj distrib test code - test currently fails when enabled


Scratch that - it was only failing because I had another solr server
up at port 8983, and the test framework incorrectly registers a node
at that same port.

-Yonik
http://www.lucidimagination.com


> Modified:
>    
> lucene/solr/branches/cloud/src/solrj/org/apache/solr/client/solrj/impl/CloudSolrServer.java
>    
> lucene/solr/branches/cloud/src/test/org/apache/solr/BaseDistributedSearchTestCase.java
>    
> lucene/solr/branches/cloud/src/test/org/apache/solr/cloud/BasicDistributedZkTest.java
>
> Modified: 
> lucene/solr/branches/cloud/src/solrj/org/apache/solr/client/solrj/impl/CloudSolrServer.java
> URL: 
> http://svn.apache.org/viewvc/lucene/solr/branches/cloud/src/solrj/org/apache/solr/client/solrj/impl/CloudSolrServer.java?rev=908194&r1=908193&r2=908194&view=diff
> ==
> --- 
> lucene/solr/branches/cloud/src/solrj/org/apache/solr/client/solrj/impl/CloudSolrServer.java
>  (original)
> +++ 
> lucene/solr/branches/cloud/src/solrj/org/apache/solr/client/solrj/impl/CloudSolrServer.java
>  Tue Feb  9 20:10:12 2010
> @@ -118,7 +118,7 @@
>     }
>
>     Collections.shuffle(urlList, rand);
> -
> +    // System.out.println("## MAKING REQUEST TO " + 
> urlList);
>     // TODO: set distrib=true if we detected more than one shard?
>     LBHttpSolrServer.Req req = new LBHttpSolrServer.Req(request, urlList);
>     LBHttpSolrServer.Rsp rsp = lbServer.request(req);
>
> Modified: 
> lucene/solr/branches/cloud/src/test/org/apache/solr/BaseDistributedSearchTestCase.java
> URL: 
> http://svn.apache.org/viewvc/lucene/solr/branches/cloud/src/test/org/apache/solr/BaseDistributedSearchTestCase.java?rev=908194&r1=908193&r2=908194&view=diff
> ==
> --- 
> lucene/solr/branches/cloud/src/test/org/apache/solr/BaseDistributedSearchTestCase.java
>  (original)
> +++ 
> lucene/solr/branches/cloud/src/test/org/apache/solr/BaseDistributedSearchTestCase.java
>  Tue Feb  9 20:10:12 2010
> @@ -290,6 +290,14 @@
>     for (SolrServer client : clients) client.commit();
>   }
>
> +  protected QueryResponse queryServer(ModifiableSolrParams params) throws 
> SolrServerException {
> +    // query a random server
> +    int which = r.nextInt(clients.size());
> +    SolrServer client = clients.get(which);
> +    QueryResponse rsp = client.query(params);
> +    return rsp;
> +  }
> +
>   protected void query(Object... q) throws Exception {
>     final ModifiableSolrParams params = new ModifiableSolrParams();
>
> @@ -300,10 +308,8 @@
>     final QueryResponse controlRsp = controlClient.query(params);
>
>     setDistributedParams(params);
> -    // query a random server
> -    int which = r.nextInt(clients.size());
> -    SolrServer client = clients.get(which);
> -    QueryResponse rsp = client.query(params);
> +
> +    QueryResponse rsp = queryServer(params);
>
>     //compareResponses(rsp, controlRsp);
>
>
> Modified: 
> lucene/solr/branches/cloud/src/test/org/apache/solr/cloud/BasicDistributedZkTest.java
> URL: 
> http://svn.apache.org/viewvc/lucene/solr/branches/cloud/src/test/org/apache/solr/cloud/BasicDistributedZkTest.java?rev=908194&r1=908193&r2=908194&view=diff
> ==
> --- 
> lucene/solr/branches/cloud/src/test/org/apache/solr/cloud/BasicDistributedZkTest.java
>  (original)
> +++ 
> lucene/solr/branches/cloud/src/test/org/apache/solr/cloud/BasicDistributedZkTest.java
>  Tue Feb  9 20:10:12 2010
> @@ -17,10 +17,14 @@
>  * limitations under the License.
>  */
>
> +import java.net.MalformedURLException;
>  import java.util.HashSet;
>
> +import org.apache.solr.client.solrj.SolrServer;
>  import org.apache.solr.client.solrj.SolrServerException;
>  import org.apache.solr.client.solrj.embedded.JettySolrRunner;
> +import org.apache.solr.client.solrj.impl.CloudSolrServer;
> +import org.apache.solr.client.solrj.response.QueryResponse;
>  import org.apache.solr.common.params.ModifiableSolrParams;
>  import org.apache.solr.core.CoreDescriptor;
>  import org.apache.solr.core.SolrCore;
> @@ -259,4 +263,29 @@
>     super.printLayout();
>
>   }
> +
> +
> +  volatile CloudSolrServer solrj;
> +
> +  @Override
> +  protected QueryResponse queryServer(ModifiableSolrParams params) throws 
> SolrServerException {
> +    if (true || r.nextBoolean())
> +      return super.queryServer(params);
> +
> +    // use the distributed solrj client
> +    if (solrj == null) {
> +      synchronized(this) {
> +        try {
> +          CloudSolrServer server = new 
> CloudSolrServer(AbstractZkTestCase.ZOO_KEEPER_ADDRESS);
> +          server.setDefaultCollection("collection1");
> +         

[jira] Resolved: (SOLR-1722) Allowing changing the "special" default core name, and as a default default core name, switch to using collection1 rather than DEFAULT_CORE

2010-02-09 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved SOLR-1722.
---

Resolution: Fixed

> Allowing changing the "special" default core name, and as a default default 
> core name, switch to using collection1 rather than DEFAULT_CORE
> ---
>
> Key: SOLR-1722
> URL: https://issues.apache.org/jira/browse/SOLR-1722
> Project: Solr
>  Issue Type: Improvement
>Reporter: Mark Miller
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 1.5
>
> Attachments: SOLR-1722.patch
>
>
> see 
> http://search.lucidimagination.com/search/document/f5f2af7c5041a79e/default_core




[jira] Commented: (SOLR-236) Field collapsing

2010-02-09 Thread Kevin Cunningham (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831617#action_12831617
 ] 

Kevin Cunningham commented on SOLR-236:
---

No, just field collapsing.  We went back to the field-collapse-5.patch for the 
time being.  So far it's been good and we updated just to get closer to the 
latest, not because we were seeing issues.  Thanks.

> Field collapsing
> 
>
> Key: SOLR-236
> URL: https://issues.apache.org/jira/browse/SOLR-236
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.3
>Reporter: Emmanuel Keller
>Assignee: Shalin Shekhar Mangar
> Fix For: 1.5
>
> Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
> collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
> field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
> quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
> SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, 
> SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch include a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given 
> field to a single entry in the result set. Site collapsing is a special case 
> of this, where all results for a given web site is collapsed into one or two 
> entries in the result set, typically with an associated "more documents from 
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation add 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before 
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)




[jira] Commented: (SOLR-1568) Implement Spatial Filter

2010-02-09 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831611#action_12831611
 ] 

Grant Ingersoll commented on SOLR-1568:
---

Actually, in thinking some more about this, it seems like it is just as easy to 
extend the FieldType to override a method as it is to invent new syntax to 
support configuring these things.  I'm going to go that route, which is, of 
course, what Yonik suggested all along :-)

> Implement Spatial Filter
> 
>
> Key: SOLR-1568
> URL: https://issues.apache.org/jira/browse/SOLR-1568
> Project: Solr
>  Issue Type: New Feature
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 1.5
>
> Attachments: CartesianTierQParserPlugin.java, SOLR-1568.patch
>
>
> Given an index with spatial information (either as a geohash, 
> SpatialTileField (see SOLR-1586) or just two lat/lon pairs), we should be 
> able to pass in a filter query that takes in the field name, lat, lon and 
> distance and produces an appropriate Filter (i.e. one that is aware of the 
> underlying field type) for use by Solr. 
> The interface _could_ look like:
> {code}
> &fq={!sfilt dist=20}location:49.32,-79.0
> {code}
> or it could be:
> {code}
> &fq={!sfilt lat=49.32 lon=-79.0 f=location dist=20}
> {code}
> or:
> {code}
> &fq={!sfilt p=49.32,-79.0 f=location dist=20}
> {code}
> or:
> {code}
> &fq={!sfilt lat=49.32,-79.0 fl=lat,lon dist=20}
> {code}




Re: priority queue in query component

2010-02-09 Thread Ted Dunning
Katta has a very flexible and usable option for this even in the absence of
replicas.

The idea is that shards may report results, may report failure, may report
late, may never report, or may have a transport-layer issue.  All kinds of
behavior should be handled.

What is done with katta is that each search has a deadline and a partial
results policy.  At any time, if all results have been received, a complete
set of results is returned.  If a deadline is reached, then the policy is
interrogated with the results so far.  The policy has the option to return a
failure, partial results (with timeouts reported on missing shards) or to
set a new deadline and possibly a new policy (so that the number of missing
results gets more relaxed as time passes).  The policy is also called each
time a new result is received or failure is noted.

Transport layer issues and explicit error returns are handled by the
framework.  Any time one of these is encountered, the search is immediately
dispatched to a replica of the shard if one exists.  In that case, that
query may have a late start and may not return by the deadline, depending on
policy.  If no replica is available that has not been queried, an error
result is recorded for that shard.

Note that Katta even supports fail-fast in this scenario since the partial
result policy can return a new deadline for all partial results that have no
hard failures and can return a failure if it notes any shard failures.
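
A minimal sketch of the deadline-plus-policy idea described above (all names here are invented; Katta's actual API differs):

```java
public class PartialResultsSketch {
    public enum Decision { FAIL, PARTIAL, EXTEND }

    // Interrogated when a deadline passes, and again as each result or
    // failure arrives.
    public interface Policy {
        Decision onDeadline(int received, int expected, int hardFailures);
    }

    // Example policy: fail fast on any hard shard failure; accept partial
    // results once at least half the shards have answered; otherwise
    // extend the deadline and keep waiting.
    public static final Policy LENIENT = (received, expected, failures) -> {
        if (failures > 0) return Decision.FAIL;
        if (received * 2 >= expected) return Decision.PARTIAL;
        return Decision.EXTEND;
    };
}
```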

On Tue, Feb 9, 2010 at 5:25 AM, Yonik Seeley wrote:

> The SolrCloud branch now has load balancing and fail-over amongst
> shard replicas.
> Partial results aren't available yet (if there are no up replicas for
> a shard), but that is planned.
>
> -Yonik
> http://www.lucidimagination.com
>
>
> On Tue, Feb 9, 2010 at 8:21 AM, Jan Høydahl / Cominvent
>  wrote:
> > Isn't that OK as long as there is the option of allowing partial results
> if you really want?
> > Keeping the logic simple has its benefits. Let client be responsible for
> query resubmit strategy, and let load balancer (or shard manager) be
> responsible for marking a node/shard as dead/unresponsive and choosing
> another for the next query.
> >
> > --
> > Jan Høydahl  - search architect
> > Cominvent AS - www.cominvent.com
> >
> > On 9. feb. 2010, at 04.36, Lance Norskog wrote:
> >
> >> At this point, Distributed Search does not support any recovery if
> >> when one or more shards fail. If any fail or time out, the whole query
> >> fails.
> >>
> >> On Sat, Feb 6, 2010 at 9:34 AM, mike anderson 
> wrote:
> >>> "so if we received the response from shard2 before shard1, we would
> just
> >>> queue it up and wait for the response to shard1."
> >>>
> >>> This crossed my mind, but my concern was how to handle the case when
> shard1
> >>> never responds. Is this something I need to worry about?
> >>>
> >>> -mike
> >>>
> >>> On Sat, Feb 6, 2010 at 11:33 AM, Yonik Seeley <
> yo...@lucidimagination.com>wrote:
> >>>
>  It seems like changing an element in a priority queue breaks the
>  invariants, and hence it's not doable with a priority queue and with
>  the current strategy of adding sub-responses as they are received.
> 
>  One way to continue using a priority queue would be to add
>  sub-responses to the queue in the preferred order... so if we received
>  the response from shard2 before shard1, we would just queue it up and
>  wait for the response to shard1.
> 
>  -Yonik
>  http://www.lucidimagination.com
> 
> 
>  On Sat, Feb 6, 2010 at 10:35 AM, mike anderson <
> saidthero...@gmail.com>
>  wrote:
> > I have a need to favor documents from one shard over another when duplicates
> > occur. I found this code in the query component:
> >
> >          String prevShard = uniqueDoc.put(id, srsp.getShard());
> >          if (prevShard != null) {
> >            // duplicate detected
> >            numFound--;
> >
> >            // For now, just always use the first encountered since we can't
> >            // currently remove the previous one added to the priority queue.
> >            // If we switched to the Java5 PriorityQueue, this would be easier.
> >            continue;
> >            // make which duplicate is used deterministic based on shard
> >            // if (prevShard.compareTo(srsp.shard) >= 0) {
> >            //   TODO: remove previous from priority queue
> >            //   continue;
> >            // }
> >          }
> >
> >
> > Is there a ticket open for this issue? What would it take to fix?
> >
> > Thanks,
> > Mike
> >
> 
> >>>
> >>
> >>
> >>
> >> --
> >> Lance Norskog
> >> goks...@gmail.com
> >
> >
>



-- 
Ted Dunning, CTO
DeepDyve


[jira] Created: (SOLR-1765) ETag calculation is incorrect for distributed searches

2010-02-09 Thread Charlie Jackson (JIRA)
ETag calculation is incorrect for distributed searches
--

 Key: SOLR-1765
 URL: https://issues.apache.org/jira/browse/SOLR-1765
 Project: Solr
  Issue Type: Bug
  Components: multicore, search
Affects Versions: 1.4
Reporter: Charlie Jackson
Priority: Minor


When searching across multiple shards with HTTP caching enabled, the ETag value 
in the response is computed using only the searcher of the original request, not 
the shards. For example, take the query

http://localhost:8983/solr/core1/select/?q=google&shards=localhost:8983/solr/core2,localhost:8983/solr/core3

The ETag should be calculated from core2 and core3; instead, it is being 
calculated from core1.
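A minimal sketch of the intended behavior: a distributed ETag should be derived from the index state of every shard that contributed to the response, so that a change in any shard invalidates cached responses. This is illustrative only (hypothetical names, not Solr's actual HTTP-caching code):

```java
// Hypothetical sketch: combine one index-version token per shard into an
// ETag; changing any shard's version must change the resulting tag.
public class DistributedETag {
    // shardIndexVersions: e.g. one value per shard response (illustrative).
    public static String computeETag(long[] shardIndexVersions) {
        long h = 1125899906842597L; // arbitrary seed for a simple hash
        for (long v : shardIndexVersions) {
            h = 31 * h + v;
        }
        return "\"" + Long.toHexString(h) + "\"";
    }

    public static void main(String[] args) {
        String tag = computeETag(new long[]{42L, 99L});
        String changed = computeETag(new long[]{42L, 100L});
        // If the tag ignored core2/core3 (as in the bug), these would be equal
        // and clients would get stale 304 Not Modified responses.
        System.out.println(tag.equals(changed)); // false
    }
}
```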

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1316) Create autosuggest component

2010-02-09 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831544#action_12831544
 ] 

Shalin Shekhar Mangar commented on SOLR-1316:
-

{quote}Where are we on this - do people feel it's ready to commit?{quote}

It has been some time since I looked at it but I don't feel it is ready. Using 
it through spellcheck works but specifying spell check params feels odd. Also, 
I don't know how well it compares to regular TermsComponent or facet.prefix 
searches in terms of memory and cpu cost.

> Create autosuggest component
> 
>
> Key: SOLR-1316
> URL: https://issues.apache.org/jira/browse/SOLR-1316
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Fix For: 1.5
>
> Attachments: suggest.patch, suggest.patch, suggest.patch, TST.zip
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Autosuggest is a common search function that can be integrated
> into Solr as a SearchComponent. Our first implementation will
> use the TernaryTree found in Lucene contrib. 
> * Enable creation of the dictionary from the index or via Solr's
> RPC mechanism
> * What types of parameters and settings are desirable?
> * Hopefully in the future we can include user click through
> rates to boost those terms/phrases higher
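The TernaryTree approach mentioned above can be illustrated with a minimal ternary-search-tree sketch: insert terms from the index, then walk to the prefix node and collect completions. This is a from-scratch illustration, not the Lucene contrib TernaryTree API:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative ternary search tree for prefix suggestions.
public class TstSketch {
    private static class Node {
        char c;
        Node lo, eq, hi;
        boolean word;
    }

    private Node root;

    public void insert(String s) { root = insert(root, s, 0); }

    private Node insert(Node n, String s, int i) {
        char c = s.charAt(i);
        if (n == null) { n = new Node(); n.c = c; }
        if (c < n.c) n.lo = insert(n.lo, s, i);
        else if (c > n.c) n.hi = insert(n.hi, s, i);
        else if (i < s.length() - 1) n.eq = insert(n.eq, s, i + 1);
        else n.word = true;
        return n;
    }

    // Return all inserted terms starting with the given prefix.
    public List<String> suggest(String prefix) {
        List<String> out = new ArrayList<>();
        Node n = find(root, prefix, 0);
        if (n == null) return out;
        if (n.word) out.add(prefix);
        collect(n.eq, new StringBuilder(prefix), out);
        return out;
    }

    private Node find(Node n, String s, int i) {
        if (n == null) return null;
        char c = s.charAt(i);
        if (c < n.c) return find(n.lo, s, i);
        if (c > n.c) return find(n.hi, s, i);
        return i == s.length() - 1 ? n : find(n.eq, s, i + 1);
    }

    private void collect(Node n, StringBuilder prefix, List<String> out) {
        if (n == null) return;
        collect(n.lo, prefix, out);
        prefix.append(n.c);
        if (n.word) out.add(prefix.toString());
        collect(n.eq, prefix, out);
        prefix.deleteCharAt(prefix.length() - 1);
        collect(n.hi, prefix, out);
    }

    public static void main(String[] args) {
        TstSketch t = new TstSketch();
        t.insert("solr"); t.insert("solar"); t.insert("sort");
        System.out.println(t.suggest("so")); // all three terms share the prefix
    }
}
```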

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1316) Create autosuggest component

2010-02-09 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831490#action_12831490
 ] 

Yonik Seeley commented on SOLR-1316:


Where are we on this - do people feel it's ready to commit?
We probably want to add some unit tests too, and some documentation on the wiki 
at some point.

AFAIK, we're limited to one spellcheck component per request handler - that 
should be OK though, since presumably this is meant to be used on its own, 
right?  What is the recommended/default configuration?  We should probably add 
it as a /autocomplete handler in the example server.

Does this currently work with phrases?

> Create autosuggest component
> 
>
> Key: SOLR-1316
> URL: https://issues.apache.org/jira/browse/SOLR-1316
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Fix For: 1.5
>
> Attachments: suggest.patch, suggest.patch, suggest.patch, TST.zip
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Autosuggest is a common search function that can be integrated
> into Solr as a SearchComponent. Our first implementation will
> use the TernaryTree found in Lucene contrib. 
> * Enable creation of the dictionary from the index or via Solr's
> RPC mechanism
> * What types of parameters and settings are desirable?
> * Hopefully in the future we can include user click through
> rates to boost those terms/phrases higher

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [jira] Created: (SOLR-1713) Carrot2 Clustering should have an option to cluster on a different number of rows than the DocList size

2010-02-09 Thread Grant Ingersoll

On Feb 9, 2010, at 9:21 AM, Zacarias wrote:

> Hi,
> 
> I want to work on the
> https://issues.apache.org/jira/browse/SOLR-1713 improvement, but I have
> some questions. It would be great if somebody could give me a little
> orientation.
> 
> Does the issue mean "query rows=10 but cluster on more"?
> If that is what it means, the idea is to solve it in the results part rather
> than the collection part of the ClusteringComponent (because the collection
> part uses DocumentEngine, which is still experimental).
> If the user wants to cluster on more rows, should I query twice, or just
> query for the larger number of rows and then reduce the count at the end?

I think we want to avoid querying twice.  I would query by the max of rows and 
a new parameter (cluster_rows? internal_rows?  Other?) and then reduce the 
number at the end.  It's a little tricky, b/c we likely don't want to couple 
the QueryComponent to the ClusterComponent, so we may want to make this just a 
wee bit more generic.

-Grant
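Grant's fetch-the-max-then-truncate suggestion can be sketched as follows. The `clusterRows` parameter name is hypothetical, as his email itself notes; this is not Solr's actual ClusteringComponent code:

```java
import java.util.List;

// Sketch: fetch max(rows, clusterRows) docs once, cluster on the full
// list, then truncate the returned DocList to the requested rows.
public class ClusterRowsSketch {
    public static <T> List<T> returnedDocs(List<T> fetched, int rows) {
        // fetched holds max(rows, clusterRows) docs; only rows go back.
        return fetched.subList(0, Math.min(rows, fetched.size()));
    }

    public static void main(String[] args) {
        // rows=2, hypothetical clusterRows=5: cluster on 5, return 2.
        List<Integer> fetched = List.of(1, 2, 3, 4, 5);
        System.out.println(returnedDocs(fetched, 2)); // [1, 2]
    }
}
```

This avoids the second query while keeping the QueryComponent decoupled from the ClusteringComponent: only the fetch size is widened, and the truncation happens after clustering.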

Re: [jira] Created: (SOLR-1713) Carrot2 Clustering should have an option to cluster on a different number of rows than the DocList size

2010-02-09 Thread Zacarias
Hi,

I want to work on the
https://issues.apache.org/jira/browse/SOLR-1713 improvement, but I have
some questions. It would be great if somebody could give me a little
orientation.

Does the issue mean "query rows=10 but cluster on more"?
If that is what it means, the idea is to solve it in the results part rather
than the collection part of the ClusteringComponent (because the collection
part uses DocumentEngine, which is still experimental).
If the user wants to cluster on more rows, should I query twice, or just
query for the larger number of rows and then reduce the count at the end?

Regards,
Zacarias.





On Sat, Jan 9, 2010 at 6:25 PM, Grant Ingersoll (JIRA) wrote:

> Carrot2 Clustering should have an option to cluster on a different number
> of rows than the DocList size
>
> ---
>
> Key: SOLR-1713
> URL: https://issues.apache.org/jira/browse/SOLR-1713
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - Clustering
>Reporter: Grant Ingersoll
>
>
> It would be nice if, in the Carrot2 clustering, we could only return 10
> rows as part of the query, but cluster on more.  Alternatively, it may even
> make sense to be able to cluster on the DocSet, too.
>
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>


[jira] Commented: (SOLR-1764) While indexing a "java.lang.IllegalStateException: Can't overwrite cause" exception is thrown

2010-02-09 Thread Fuad Efendi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831458#action_12831458
 ] 

Fuad Efendi commented on SOLR-1764:
---

Funny, it might happen that this is not a problem with JDK 1.6.0_9, or maybe 
with the latest JDK. As a quick workaround... Also, you may try to use SolrJ 
with the binary format...
I'll try to check that word&word doesn't cause a 
problem...

> While indexing a "java.lang.IllegalStateException: Can't overwrite cause" 
> exception is thrown
> -
>
> Key: SOLR-1764
> URL: https://issues.apache.org/jira/browse/SOLR-1764
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java
>Affects Versions: 1.4
> Environment: Windows XP, JBoss 4.2.3 GA
>Reporter: Michael McGowan
>Priority: Blocker
>
> I get an exception while indexing. It seems that I'm unable to see the root 
> cause of the exception because it is masked by another 
> "java.lang.IllegalStateException: Can't overwrite cause" exception.
> Here is the stacktrace :
> 16:59:04,292 ERROR [STDERR] Feb 8, 2010 4:59:04 PM 
> org.apache.solr.update.processor.LogUpdateProcessor finish
> INFO: {} 0 15
> 16:59:04,292 ERROR [STDERR] Feb 8, 2010 4:59:04 PM 
> org.apache.solr.common.SolrException log
> SEVERE: java.lang.IllegalStateException: Can't overwrite cause
> at java.lang.Throwable.initCause(Throwable.java:320)
> at com.ctc.wstx.compat.Jdk14Impl.setInitCause(Jdk14Impl.java:70)
> at com.ctc.wstx.exc.WstxException.<init>(WstxException.java:46)
> at com.ctc.wstx.exc.WstxIOException.<init>(WstxIOException.java:16)
> at 
> com.ctc.wstx.stax.WstxInputFactory.doCreateSR(WstxInputFactory.java:536)
> at 
> com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:592)
> at 
> com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:648)
> at 
> com.ctc.wstx.stax.WstxInputFactory.createXMLStreamReader(WstxInputFactory.java:319)
> at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:68)
> at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at 
> org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
> at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:230)
> at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
> at 
> org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:182)
> at 
> org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:84)
> at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at 
> org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:157)
> at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:262)
> at 
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
> at 
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
> at 
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:446)
> at java.lang.Thread.run(Thread.java:619)
> 16:59:04,292 ERROR [STDERR] Feb 8, 2010 4:59:04 PM 
> org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/update params={wt=xml&version=2.2} status=500 
> QTime=15
> 16:59:04,292 ERROR [STDERR] Feb 8, 2010 4:59:04 PM 
> org.apache.solr.common.SolrException log
> SEVERE: java.lang.IllegalStateException: Can't overwrite cause
> at java.lang.Throwable.initCause(Throwable.java:320)
> at com.ctc.wstx.compat.Jdk14Impl.setInitCause(Jdk14Impl

[jira] Issue Comment Edited: (SOLR-1764) While indexing a "java.lang.IllegalStateException: Can't overwrite cause" exception is thrown

2010-02-09 Thread Fuad Efendi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831458#action_12831458
 ] 

Fuad Efendi edited comment on SOLR-1764 at 2/9/10 2:00 PM:
---

Funny, it might happen that this is not a problem with JDK 1.6.0_9, or maybe 
with the latest JDK. As a quick workaround... Also, you may try to use SolrJ 
with the binary format...
I'll try to check that word&word doesn't cause a 
problem...



  was (Author: funtick):
Funny, it might happen that this is not a problem with JDK 1.6.0_9; or may 
be with latest JDK. As a quick workaround... Also, you may try to use SolrJ 
with binary format...
I'll try to check that word&word doesn't cause a 
problem...
  
> While indexing a "java.lang.IllegalStateException: Can't overwrite cause" 
> exception is thrown
> -
>
> Key: SOLR-1764
> URL: https://issues.apache.org/jira/browse/SOLR-1764
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java
>Affects Versions: 1.4
> Environment: Windows XP, JBoss 4.2.3 GA
>Reporter: Michael McGowan
>Priority: Blocker
>
> I get an exception while indexing. It seems that I'm unable to see the root 
> cause of the exception because it is masked by another 
> "java.lang.IllegalStateException: Can't overwrite cause" exception.
> Here is the stacktrace :
> 16:59:04,292 ERROR [STDERR] Feb 8, 2010 4:59:04 PM 
> org.apache.solr.update.processor.LogUpdateProcessor finish
> INFO: {} 0 15
> 16:59:04,292 ERROR [STDERR] Feb 8, 2010 4:59:04 PM 
> org.apache.solr.common.SolrException log
> SEVERE: java.lang.IllegalStateException: Can't overwrite cause
> at java.lang.Throwable.initCause(Throwable.java:320)
> at com.ctc.wstx.compat.Jdk14Impl.setInitCause(Jdk14Impl.java:70)
> at com.ctc.wstx.exc.WstxException.<init>(WstxException.java:46)
> at com.ctc.wstx.exc.WstxIOException.<init>(WstxIOException.java:16)
> at 
> com.ctc.wstx.stax.WstxInputFactory.doCreateSR(WstxInputFactory.java:536)
> at 
> com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:592)
> at 
> com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:648)
> at 
> com.ctc.wstx.stax.WstxInputFactory.createXMLStreamReader(WstxInputFactory.java:319)
> at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:68)
> at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at 
> org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
> at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:230)
> at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
> at 
> org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:182)
> at 
> org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:84)
> at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at 
> org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:157)
> at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:262)
> at 
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
> at 
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
> at 
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:446)
> at java.lang.Thread.run(Thread.java:619)
> 16:59:04,292 ERROR [STDERR] Feb 8, 2010 4:59:04 PM 
> org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/update params={wt=xml&v

Re: priority queue in query component

2010-02-09 Thread Yonik Seeley
The SolrCloud branch now has load balancing and fail-over amongst
shard replicas.
Partial results aren't available yet (if there are no up replicas for
a shard), but that is planned.

-Yonik
http://www.lucidimagination.com


On Tue, Feb 9, 2010 at 8:21 AM, Jan Høydahl / Cominvent
 wrote:
> Isn't that OK as long as there is the option of allowing partial results if 
> you really want?
> Keeping the logic simple has its benefits. Let client be responsible for 
> query resubmit strategy, and let load balancer (or shard manager) be 
> responsible for marking a node/shard as dead/unresponsive and choosing 
> another for the next query.
>
> --
> Jan Høydahl  - search architect
> Cominvent AS - www.cominvent.com
>
> On 9. feb. 2010, at 04.36, Lance Norskog wrote:
>
>> At this point, Distributed Search does not support any recovery
>> when one or more shards fail. If any fail or time out, the whole query
>> fails.
>>
>> On Sat, Feb 6, 2010 at 9:34 AM, mike anderson  wrote:
>>> "so if we received the response from shard2 before shard1, we would just
>>> queue it up and wait for the response to shard1."
>>>
>>> This crossed my mind, but my concern was how to handle the case when shard1
>>> never responds. Is this something I need to worry about?
>>>
>>> -mike
>>>
>>> On Sat, Feb 6, 2010 at 11:33 AM, Yonik Seeley 
>>> wrote:
>>>
 It seems like changing an element in a priority queue breaks the
 invariants, and hence it's not doable with a priority queue and with
 the current strategy of adding sub-responses as they are received.

 One way to continue using a priority queue would be to add
 sub-responses to the queue in the preferred order... so if we received
 the response from shard2 before shard1, we would just queue it up and
 wait for the response to shard1.

 -Yonik
 http://www.lucidimagination.com


 On Sat, Feb 6, 2010 at 10:35 AM, mike anderson 
 wrote:
> I have a need to favor documents from one shard over another when duplicates
> occur. I found this code in the query component:
>
>          String prevShard = uniqueDoc.put(id, srsp.getShard());
>          if (prevShard != null) {
>            // duplicate detected
>            numFound--;
>
>            // For now, just always use the first encountered since we can't
>            // currently remove the previous one added to the priority queue.
>            // If we switched to the Java5 PriorityQueue, this would be easier.
>            continue;
>            // make which duplicate is used deterministic based on shard
>            // if (prevShard.compareTo(srsp.shard) >= 0) {
>            //  TODO: remove previous from priority queue
>            //  continue;
>            // }
>          }
>
>
> Is there a ticket open for this issue? What would it take to fix?
>
> Thanks,
> Mike
>

>>>
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>
>
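Yonik's suggestion in the thread above, buffering sub-responses and feeding the merge in a fixed shard order, can be sketched as a plain map lookup. Names are illustrative, not Solr's actual classes; the early `break` is exactly the point Mike raises, where a never-responding shard would stall the merge without a timeout:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: instead of merging shard responses as they arrive, buffer them
// and release them in a fixed preferred order, so the "first encountered"
// duplicate is always deterministic.
public class OrderedResponses {
    public static List<String> inShardOrder(List<String> shardOrder,
                                            Map<String, String> arrived) {
        List<String> out = new ArrayList<>();
        for (String shard : shardOrder) {
            String resp = arrived.get(shard);
            if (resp != null) out.add(resp);   // process in preferred order
            else break;                        // wait for the missing shard
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> arrived = new HashMap<>();
        arrived.put("shard2", "r2"); // shard2 answered first
        // shard1 has not arrived yet: nothing can be processed
        System.out.println(inShardOrder(List.of("shard1", "shard2"), arrived));
        arrived.put("shard1", "r1");
        System.out.println(inShardOrder(List.of("shard1", "shard2"), arrived));
    }
}
```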


Re: priority queue in query component

2010-02-09 Thread Jan Høydahl / Cominvent
Isn't that OK as long as there is the option of allowing partial results if you 
really want?
Keeping the logic simple has its benefits. Let client be responsible for query 
resubmit strategy, and let load balancer (or shard manager) be responsible for 
marking a node/shard as dead/unresponsive and choosing another for the next 
query.

--
Jan Høydahl  - search architect
Cominvent AS - www.cominvent.com

On 9. feb. 2010, at 04.36, Lance Norskog wrote:

> At this point, Distributed Search does not support any recovery
> when one or more shards fail. If any fail or time out, the whole query
> fails.
> 
> On Sat, Feb 6, 2010 at 9:34 AM, mike anderson  wrote:
>> "so if we received the response from shard2 before shard1, we would just
>> queue it up and wait for the response to shard1."
>> 
>> This crossed my mind, but my concern was how to handle the case when shard1
>> never responds. Is this something I need to worry about?
>> 
>> -mike
>> 
>> On Sat, Feb 6, 2010 at 11:33 AM, Yonik Seeley 
>> wrote:
>> 
>>> It seems like changing an element in a priority queue breaks the
>>> invariants, and hence it's not doable with a priority queue and with
>>> the current strategy of adding sub-responses as they are received.
>>> 
>>> One way to continue using a priority queue would be to add
>>> sub-responses to the queue in the preferred order... so if we received
>>> the response from shard2 before shard1, we would just queue it up and
>>> wait for the response to shard1.
>>> 
>>> -Yonik
>>> http://www.lucidimagination.com
>>> 
>>> 
>>> On Sat, Feb 6, 2010 at 10:35 AM, mike anderson 
>>> wrote:
 I have a need to favor documents from one shard over another when duplicates
 occur. I found this code in the query component:
 
  String prevShard = uniqueDoc.put(id, srsp.getShard());
  if (prevShard != null) {
// duplicate detected
numFound--;
 
    // For now, just always use the first encountered since we can't
    // currently remove the previous one added to the priority queue.
    // If we switched to the Java5 PriorityQueue, this would be easier.
continue;
// make which duplicate is used deterministic based on shard
// if (prevShard.compareTo(srsp.shard) >= 0) {
//  TODO: remove previous from priority queue
//  continue;
// }
  }
 
 
 Is there a ticket open for this issue? What would it take to fix?
 
 Thanks,
 Mike
 
>>> 
>> 
> 
> 
> 
> -- 
> Lance Norskog
> goks...@gmail.com
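The commented-out block in the snippet above sketches the intended deterministic fix: keep, for each unique id, the document from the lexicographically smaller shard, regardless of arrival order. Outside the priority-queue constraint this can be expressed as a plain map merge; the sketch below is illustrative, not Solr's actual QueryComponent code:

```java
import java.util.HashMap;
import java.util.Map;

// Deterministic duplicate resolution: a plain HashMap sidesteps the
// "can't remove from the priority queue" problem, because the queue can
// be built only after all shard responses have been merged.
public class ShardDedup {
    // idShardPairs: (docId, shardName) pairs in arrival order.
    public static Map<String, String> dedup(String[][] idShardPairs) {
        Map<String, String> uniqueDoc = new HashMap<>();
        for (String[] pair : idShardPairs) {
            String id = pair[0], shard = pair[1];
            String prev = uniqueDoc.get(id);
            // Prefer the "smaller" shard, whichever response arrived first
            // (mirrors the commented-out prevShard.compareTo(...) >= 0 idea).
            if (prev == null || shard.compareTo(prev) < 0) {
                uniqueDoc.put(id, shard);
            }
        }
        return uniqueDoc;
    }

    public static void main(String[] args) {
        Map<String, String> m = dedup(new String[][]{
            {"doc1", "shard2"}, {"doc1", "shard1"}, {"doc2", "shard2"}});
        System.out.println(m.get("doc1")); // shard1 wins despite arriving later
    }
}
```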