[jira] [Commented] (SOLR-2700) transaction logging

2011-11-03 Thread Mike Anderson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143211#comment-13143211
 ] 

Mike Anderson commented on SOLR-2700:
-

Will the transaction log be available via API? It would be very useful for 
application debugging if it were possible to query a record's transaction log 
and see a history of updates. 
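
As a rough illustration of what such a per-record history lookup could look like, here is a minimal in-memory sketch. It is not taken from the SOLR-2700 patch and does not reflect any Solr API; the class name `UpdateHistoryLog`, the `append` and `history` methods, and the entry fields are all invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: an append-only log that can be queried by record id.
class UpdateHistoryLog {
    // One logged update: a monotonically increasing version plus the operation applied.
    static final class Entry {
        final long version;
        final String op;      // assumed operation names, e.g. "add" or "delete"
        final String payload; // serialized document, or null for a delete

        Entry(long version, String op, String payload) {
            this.version = version;
            this.op = op;
            this.payload = payload;
        }
    }

    private final Map<String, List<Entry>> entriesById = new HashMap<>();
    private long nextVersion = 1;

    // Records an update for the given record id and returns the version assigned to it.
    synchronized long append(String id, String op, String payload) {
        long version = nextVersion++;
        entriesById.computeIfAbsent(id, key -> new ArrayList<>())
                   .add(new Entry(version, op, payload));
        return version;
    }

    // Returns a copy of the record's updates, oldest first; empty if the id was never logged.
    synchronized List<Entry> history(String id) {
        List<Entry> entries = entriesById.get(id);
        if (entries == null) {
            return Collections.emptyList();
        }
        return new ArrayList<>(entries);
    }
}
```

A debugging endpoint built on something like `history(id)` would let an application see every add and delete applied to one document, in order.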

> transaction logging
> ---
>
> Key: SOLR-2700
> URL: https://issues.apache.org/jira/browse/SOLR-2700
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
> Attachments: SOLR-2700.patch, SOLR-2700.patch, SOLR-2700.patch, 
> SOLR-2700.patch, SOLR-2700.patch, SOLR-2700.patch, SOLR-2700.patch
>
>
> A transaction log is needed for durability of updates, for a more performant 
> realtime-get, and for replaying updates to recovering peers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Fwd: Any contribs available for Range field type?

2011-02-15 Thread mike anderson
-- Forwarded message --
From: kenf_nc 
Date: Fri, Feb 11, 2011 at 8:49 AM
Subject: Any contribs available for Range field type?
To: solr-u...@lucene.apache.org



I have a huge need for a new field type. It would be a Poly field, similar to
Point or Payload. It would take 2 data elements and a search would return a
hit if the search term fell within the range of the elements. For example
let's say I have a document representing an Employment record. I may want to
create a field for years_of_service where it would take values 1999,2004.
Then in a query q=years_of_service:2001 would be a hit,
q=years_of_service:2010 would not. The field would need to take a data type
attribute as a parameter. I may need to do integer ranges, float/double
ranges, date ranges. I don't see the need now, but heck maybe even a string
range. This would be useful for things like Event dates. An event often
spans several days (or hours), but the query is something like "what
events are happening today". If I did q=event_date:NOW (or similar) it
should hit all documents where event_date has a range that is inclusive of
today. Another example would be a product category document. A specific
automobile may have a fixed price, but a category of auto (2010 BMW 3-series
for example) would have a price range.
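
The matching rule described above can be sketched in a few lines of Java. This is not an existing contrib. It assumes the range is indexed as two ordinary fields with the hypothetical suffixes `_start` and `_end`, and rewrites a point lookup into a pair of standard range clauses.

```java
// Hypothetical sketch: emulate a "range field" with two plain fields.
class RangeFieldSketch {
    // True when value falls inside the inclusive range [start, end].
    static boolean contains(long start, long end, long value) {
        return start <= value && value <= end;
    }

    // Rewrites field:value into range clauses over the assumed
    // <field>_start and <field>_end fields.
    static String toQuery(String field, long value) {
        return field + "_start:[* TO " + value + "] AND "
             + field + "_end:[" + value + " TO *]";
    }
}
```

With years_of_service stored as 1999 and 2004, `contains(1999, 2004, 2001)` is a hit and `contains(1999, 2004, 2010)` is not, which matches the behaviour asked for. A real field type would still need to handle the data type parameter (integer, float/double, date).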

I hope you get the point. My question (finally) is, does anyone know of an
existing contribution to the public domain that already does this? I'm more
of a .Net/C# developer than a Java developer. I know my way around Java, but
don't really have the right tools to build/test/etc. So was hoping to borrow
rather than build if I could.

Thanks,
Ken
--
View this message in context:
http://lucene.472066.n3.nabble.com/Any-contribs-available-for-Range-field-type-tp2473601p2473601.html
Sent from the Solr - User mailing list archive at Nabble.com.


Fwd: Any contribs available for Range field type?

2011-02-15 Thread mike anderson
-- Forwarded message --
From: kenf_nc 
Date: Tue, Feb 15, 2011 at 10:49 AM
Subject: Re: Any contribs available for Range field type?
To: solr-u...@lucene.apache.org



I've tried several times to get an active account on
solr-...@lucene.apache.org and the mailing list won't send me a confirmation
email, and therefore won't let me post because I'm not confirmed. Could I
get someone that is a member of Solr-Dev to post either my original request
in this thread, or a link to this thread on the Dev mailing list? I really
was hoping for more response than this to this question. This would be a
terrifically useful field type to just about any solr index.

Thanks,
Ken
--
View this message in context:
http://lucene.472066.n3.nabble.com/Any-contribs-available-for-Range-field-type-tp2473601p2502203.html
Sent from the Solr - User mailing list archive at Nabble.com.


[jira] Commented: (SOLR-880) SolrCore should have a STOP option and a lazy startup option

2010-10-27 Thread Mike Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925654#action_12925654
 ] 

Mike Anderson commented on SOLR-880:


I'm not entirely clear on how START would differ from CREATE, and how STOP 
would differ from UNLOAD. I gather there are certain tasks that occur in CREATE 
that would be skipped in a START command, and that a START command could only 
be issued on a STOPPED core, which had previously been CREATED (but not 
UNLOADED). 
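
To make that reading concrete, here is a small state-machine sketch of the four commands. It encodes only the interpretation given in the paragraph above, not the behaviour of any Solr release or of a SOLR-880 patch; the `State` names and the `apply` method are invented for illustration.

```java
// Hypothetical sketch of the core lifecycle as interpreted in this comment.
class CoreLifecycleSketch {
    enum State { ABSENT, RUNNING, STOPPED }
    enum Command { CREATE, START, STOP, UNLOAD }

    // Returns the state after applying cmd, or throws if cmd is not allowed.
    static State apply(State current, Command cmd) {
        switch (cmd) {
            case CREATE: // registers the core and brings it up
                if (current == State.ABSENT) return State.RUNNING;
                break;
            case STOP:   // releases resources but keeps the core registered
                if (current == State.RUNNING) return State.STOPPED;
                break;
            case START:  // only valid for a core that was created and then stopped
                if (current == State.STOPPED) return State.RUNNING;
                break;
            case UNLOAD: // removes the registration entirely
                if (current != State.ABSENT) return State.ABSENT;
                break;
        }
        throw new IllegalStateException(cmd + " is not valid for a core that is " + current);
    }
}
```

Under this model START differs from CREATE only in that the core is already known to the container, which is exactly the distinction the questions below ask about.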

This issue hasn't been touched in over a year. Is it still thought to be the 
right approach to improving multicore? What are the savings of START/STOP vs. 
CREATE/UNLOAD? Is it an issue of speed? Memory? 

> SolrCore should have a STOP option and a lazy startup option
> 
>
> Key: SOLR-880
> URL: https://issues.apache.org/jira/browse/SOLR-880
> Project: Solr
>  Issue Type: Improvement
>  Components: multicore
>Reporter: Noble Paul
>Assignee: Shalin Shekhar Mangar
>
> * We must have an option to STOP and START a core. 
> * a core should have an option of loadOnStartup=true|false. default should be 
> true
> * A list command which can give the names of all cores and some meta 
> information like status
> If there are too many cores (tens of thousands) where each of them may be 
> used occasionally, we should not load all of them at once. At runtime I 
> should be able to STOP and START a core on demand. A listing command would 
> let me know which ones are present and which are up or down. A stopped 
> core must not use any resources.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.





Re: distributed search on duplicate shards

2010-09-30 Thread mike anderson
Thanks for the feedback. I ended up posting a patch to JIRA (SOLR-2132),
although I've made a few changes since that patch. Already from our initial
tests we've seen a 10% improvement in the 90% line for response times, which
translates to a 50% improvement in the average time.

It would be nice to know more about the current plans for SolrCloud and its
future development road map. I've seen a few threads on here asking for more
information, but it doesn't seem like a popular subject. I'll keep an eye on
it though.

Cheers,
Mike


On Wed, Sep 29, 2010 at 2:46 PM, Chris Hostetter wrote:

>
> : 4. The first shard from a set (solr1a, solr1b) to successfully return is
> : honored, and the other requests (solr1b, if solr1a responds first, for
> : instance) are removed/ignored
> : 5. The response is completed and returned as soon as one shard from each
> : set responds
>
> It seems like a useful feature to me ... i know some folks who have
> (non Solr/Lucene based) custom search infrastructures that do roughly
> the same thing.
>
> : 1. What are the known disadvantages to such a strategy? (we've thought of
> : a few, like sets being out of sync, but they don't bother us too much)
>
> you wind up burning a lot of CPU, but that's not a disadvantage as much as
> it is a trade off -- the whole point of doing something like this is that
> you'd rather burn CPU (and waste network IO) in order to improve your
> worst case latency.
>
> : 2. What would this type of a feature be called? This way I can open a
> : Jira ticket for it
>
> no idea ... "redundant shard requests" comes to mind.
>
> : 3. Is there a preferred way to do this? My current patch (which I can post
> : soon) works in the HTTPClient portion of SearchHandler. I keep a hash map
> : of the shard sets and cancel the Futures in the corresponding
> : set when each response comes back.
> ...
> : P.S. I'd like to write a test for this feature but it wasn't clear from
> : the distributed test how to do so. Could somebody point me in the right
> : direction (an existing test, perhaps) for how to accomplish this?
>
> I don't really have a good answer for either of those questions, but the
> one thing i can suggest is that you take a look at the SolrCloud branch
> and think about how this functionality would integrate with that (both in
> terms of implementation and in how SolrCloud unit tests work)
>
> As you mentioned: the current approach in SolrCloud is to load balance
> against identical shards on multiple nodes in the cluster, but that's not
> contradictory with your idea: they can work in conjunction with each other
> (ie: imagine "shard1" has four physical instances: "shard1Ax", "shard1Ay",
> "shard1Bq" and "shard1Bp" ... a request for "shard1" could trigger two
> "redundant parallel shard requests" for "shard1A" and "shard1B" and each
> of those requests could then load balance between the respective
> underlying physical shards).
>
>
>
> -Hoss


[jira] Updated: (SOLR-2132) Distributed query to duplicate shards

2010-09-24 Thread Mike Anderson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Anderson updated SOLR-2132:


Attachment: SOLR-2132.patch

Here's a patch that accomplishes the above; however, I didn't write tests for 
it. 

-Mike

> Distributed query to duplicate shards
> -
>
> Key: SOLR-2132
> URL: https://issues.apache.org/jira/browse/SOLR-2132
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Mike Anderson
>Priority: Minor
> Attachments: SOLR-2132.patch
>
>
> I think it would be useful to have the option of specifying shard "sets" in 
> the shards parameter, such that if all shards in a set are replicating from 
> the same master (and thus have the same documents) the HTTPCommComponent will 
> honor the first shard to respond and not wait for the subsequent shards in 
> the same set. This will improve performance in the use case when one shard is 
> occasionally slow and holds up the entire response. I'm not sure if this is a 
> feature that other people want, but I thought I'd post the code nonetheless.
> -Mike




[jira] Created: (SOLR-2132) Distributed query to duplicate shards

2010-09-24 Thread Mike Anderson (JIRA)
Distributed query to duplicate shards
-

 Key: SOLR-2132
 URL: https://issues.apache.org/jira/browse/SOLR-2132
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Mike Anderson
Priority: Minor


I think it would be useful to have the option of specifying shard "sets" in the 
shards parameter, such that if all shards in a set are replicating from the 
same master (and thus have the same documents) the HTTPCommComponent will honor 
the first shard to respond and not wait for the subsequent shards in the same 
set. This will improve performance in the use case when one shard is 
occasionally slow and holds up the entire response. I'm not sure if this is a 
feature that other people want, but I thought I'd post the code nonetheless.

-Mike




Fwd: distributed search on duplicate shards

2010-09-23 Thread mike anderson
Just wanted to poke this since it got buried under a dozen or so Jira
updates. I also sent it to the deprecated list, though I think it should
have forwarded.

-mike

-- Forwarded message --
From: mike anderson 
Date: Thu, Sep 23, 2010 at 7:06 PM
Subject: distributed search on duplicate shards
To: solr-...@lucene.apache.org


Hi all,

My company is currently running a distributed Solr cluster with about 15
shards. We occasionally find that one shard will be relatively slow and thus
hold up the entire response. To remedy this we thought it might be useful to
have a system such that:

1. We can duplicate each shard, and thus have "sets" of shards, each with
the same index
2. We can pass in these sets of shards along with the query (for instance,
if "!" is the delimiter, shards=solr1a!solr1b,solr2a!solr2b)
3. The request goes out to /all/ shards (unlike load balancing in Solr
Cloud)
4. The first shard from a set (solr1a, solr1b) to successfully return is
honored, and the other requests (solr1b, if solr1a responds first, for
instance) are removed/ignored
5. The response is completed and returned as soon as one shard from each set
responds
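
The `shards` syntax from step 2 could be parsed along the following lines. This is only a sketch under the assumptions stated in the list ("," separates sets, "!" separates the duplicates within a set); the class name `ShardSetParser` is made up and the code is not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split a shards parameter into sets of duplicate shards.
class ShardSetParser {
    // "solr1a!solr1b,solr2a!solr2b" -> [[solr1a, solr1b], [solr2a, solr2b]]
    static List<List<String>> parse(String shardsParam) {
        List<List<String>> sets = new ArrayList<>();
        for (String set : shardsParam.split(",")) {
            List<String> members = new ArrayList<>();
            for (String shard : set.split("!")) {
                String trimmed = shard.trim();
                if (!trimmed.isEmpty()) {
                    members.add(trimmed); // skip empty names from stray delimiters
                }
            }
            if (!members.isEmpty()) {
                sets.add(members);
            }
        }
        return sets;
    }
}
```

A plain `shards=solr1,solr2` value still parses, yielding sets of size one, so the existing behaviour would be the degenerate case.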


I've written a patch to accomplish this, but have a few questions:

1. What are the known disadvantages to such a strategy? (we've thought of a
few, like sets being out of sync, but they don't bother us too much)
2. What would this type of a feature be called? This way I can open a Jira
ticket for it
3. Is there a preferred way to do this? My current patch (which I can post
soon) works in the HTTPClient portion of SearchHandler. I keep a hash map of
the shard sets and cancel the Futures in the corresponding set when each
response comes back.
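
The bookkeeping described in question 3 might look roughly like the sketch below. It uses only `java.util.concurrent` and is not the code from the patch or from SearchHandler; the class `FirstResponderSketch`, the method `queryAll`, and the idea of modelling each shard request as a `Callable` are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical sketch: send every request at once, keep the first success per set.
class FirstResponderSketch {
    // Returns a map from shard-set index to the first successful response of that set.
    static <T> Map<Integer, T> queryAll(List<List<Callable<T>>> shardSets, ExecutorService pool)
            throws InterruptedException {
        CompletionService<T> completed = new ExecutorCompletionService<>(pool);
        Map<Future<T>, Integer> setOf = new HashMap<>();          // future -> owning set
        List<List<Future<T>>> futuresBySet = new ArrayList<>();
        int pending = 0;
        for (int i = 0; i < shardSets.size(); i++) {
            List<Future<T>> futures = new ArrayList<>();
            for (Callable<T> request : shardSets.get(i)) {
                Future<T> future = completed.submit(request);     // all requests go out up front
                setOf.put(future, i);
                futures.add(future);
                pending++;
            }
            futuresBySet.add(futures);
        }
        Map<Integer, T> winners = new HashMap<>();
        while (pending > 0 && winners.size() < shardSets.size()) {
            Future<T> done = completed.take();                    // blocks until any request finishes
            pending--;
            int set = setOf.get(done);
            if (winners.containsKey(set) || done.isCancelled()) {
                continue;                                         // this set is already answered
            }
            try {
                winners.put(set, done.get());
            } catch (ExecutionException failed) {
                continue;                                         // replica failed; wait for a sibling
            }
            for (Future<T> sibling : futuresBySet.get(set)) {
                if (sibling != done) {
                    sibling.cancel(true);                         // stop waiting on the duplicates
                }
            }
        }
        return winners;
    }
}
```

The loop returns as soon as every set has one answer, and the `pending` counter keeps it from blocking forever when all members of a set fail. A set with no successful member is simply absent from the result, so a caller would have to decide how to report that.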

Thanks in advance,
Mike

P.S I'd like to write a test for this feature but it wasn't clear from the
distributed test how to do so. Could somebody point me in the right
direction (an existing test, perhaps) for how to accomplish this?


distributed search on duplicate shards

2010-09-23 Thread mike anderson
Hi all,

My company is currently running a distributed Solr cluster with about 15
shards. We occasionally find that one shard will be relatively slow and thus
hold up the entire response. To remedy this we thought it might be useful to
have a system such that:

1. We can duplicate each shard, and thus have "sets" of shards, each with
the same index
2. We can pass in these sets of shards along with the query (for instance,
if "!" is the delimiter, shards=solr1a!solr1b,solr2a!solr2b)
3. The request goes out to /all/ shards (unlike load balancing in Solr
Cloud)
4. The first shard from a set (solr1a, solr1b) to successfully return is
honored, and the other requests (solr1b, if solr1a responds first, for
instance) are removed/ignored
5. The response is completed and returned as soon as one shard from each set
responds


I've written a patch to accomplish this, but have a few questions:

1. What are the known disadvantages to such a strategy? (we've thought of a
few, like sets being out of sync, but they don't bother us too much)
2. What would this type of a feature be called? This way I can open a Jira
ticket for it
3. Is there a preferred way to do this? My current patch (which I can post
soon) works in the HTTPClient portion of SearchHandler. I keep a hash map of
the shard sets and cancel the Futures in the corresponding set when each
response comes back.

Thanks in advance,
Mike

P.S I'd like to write a test for this feature but it wasn't clear from the
distributed test how to do so. Could somebody point me in the right
direction (an existing test, perhaps) for how to accomplish this?