Re: Replicating Between Solr Clouds

2014-03-05 Thread Toby Lazar
Unless Solr is your system of record, aren't you already replicating your 
source data across the WAN?  If so, could you load Solr in colo B from your 
colo B data source?  You may be duplicating some indexing work, but at least 
your colo B Solr would be more closely in sync with your colo B data.

Toby
Sent via BlackBerry by AT&T

-----Original Message-----
From: Tim Potter tim.pot...@lucidworks.com
Date: Wed, 5 Mar 2014 02:51:21 
To: solr-user@lucene.apache.org
Reply-To: solr-user@lucene.apache.org
Subject: RE: Replicating Between Solr Clouds

Unfortunately, there is no out-of-the-box solution for this at the moment. 

In the past, I solved this using a couple of different approaches. They 
weren't all that elegant, but they served the purpose and were simple enough 
for the ops folks to set up monitors and alerts if things didn't work.

1) use DIH's Solr entity processor to pull data from one Solr to another, see: 
http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor

This only works if you store all fields; in my use case that was OK because I 
also did lots of partial document updates, which required storing all fields 
anyway.
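
For reference, a minimal DIH config for that approach might look something 
like this (the hostname, collection name, and rows value are placeholders, 
not from my setup):

  <dataConfig>
    <document>
      <entity name="sourceSolr" processor="SolrEntityProcessor"
              url="http://source-host:8983/solr/collection1"
              query="*:*" rows="500"/>
    </document>
  </dataConfig>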

2) use the replication handler's snapshot support to create snapshots on a 
regular basis and then move the files over the network

This one works but required read and write aliases and two collections in the 
remote (slave) data center, so that I could rebuild my write collection from 
the snapshots and then update the aliases to point reads at the updated 
collection. Work on an automated backup/restore solution is planned (see 
https://issues.apache.org/jira/browse/SOLR-5750), but if you need something 
sooner, you can write a backup driver using SolrJ that uses CloudSolrServer 
to get the addresses of all the shard leaders, initiates the backup command 
on each leader, polls the replication details handler for snapshot completion 
on each shard, and then ships the files across the network. Obviously, this 
isn't a solution for NRT multi-homing ;-)
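
As a very rough sketch of that driver, the per-leader requests look like the 
following (class and method names are mine, the leader addresses are 
hard-coded placeholders for what CloudSolrServer's cluster state would 
return, and polling, error handling, and the file transfer are omitted):

```java
public class BackupDriverSketch {

    // Replication-handler URL that triggers a snapshot on one core.
    static String backupUrl(String coreUrl) {
        return coreUrl + "/replication?command=backup";
    }

    // URL polled afterwards for snapshot completion details.
    static String detailsUrl(String coreUrl) {
        return coreUrl + "/replication?command=details";
    }

    public static void main(String[] args) {
        // In the real driver these would come from CloudSolrServer's
        // cluster state, one address per shard leader.
        String[] leaders = {
            "http://solr1:8983/solr/collection1_shard1_replica1",
            "http://solr2:8983/solr/collection1_shard2_replica1"
        };
        for (String leader : leaders) {
            System.out.println(backupUrl(leader));
            System.out.println(detailsUrl(leader));
        }
    }
}
```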

Lastly, these aren't the only ways to go about this; I just wanted to share 
some high-level details about what has worked.

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: perdurabo robert_par...@volusion.com
Sent: Tuesday, March 04, 2014 1:04 PM
To: solr-user@lucene.apache.org
Subject: Replicating Between Solr Clouds

We are looking to set up a highly available failover site across a WAN for our
SolrCloud instance.  The main production instance is at colo center A and
consists of a 3-node ZooKeeper ensemble managing configs for a 4-node
SolrCloud running Solr 4.6.1.  We only have one collection among the 4 cores
and there are two shards in the collection, one master node and one replica
node for each shard.  Our search and indexing services address the Solr
cloud through a load balancer VIP, not a compound API call.

Anyway, the Solr wiki explains fairly well how to replicate single node Solr
collections, but I do not see an obvious way to replicate a SolrCloud's
indices over a WAN to another SolrCloud.  I need a SolrCloud in another
data center to be able to replicate both shards of the collection in the
other data center over the WAN.  It needs to replicate from a load balancer
VIP, which round-robins across all four nodes/2 shards for high availability,
not from a single named server in the SolrCloud.

I've searched high and low for a white paper or some discussion of how to do
this and haven't found anything.  Any ideas?

Thanks in advance.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Replicating-Between-Solr-Clouds-tp4121196.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: to reduce indexing time

2014-03-05 Thread Toby Lazar
I believe SolrJ uses XML under the covers.  If so, I don't think you would
improve performance by switching to SolrJ, since the client would convert
it to XML before sending it on the wire.

Toby

***
  Toby Lazar
  Capital Technology Group
  Email: tla...@capitaltg.com
  Mobile: 646-469-5865
***


On Wed, Mar 5, 2014 at 3:25 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 One thing to consider: I think SolrNet uses XML updates, and there is XML
 parsing overhead with that.
 Switching to SolrJ or CSV could yield an additional gain.

 http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

 Ahmet


 On Wednesday, March 5, 2014 10:13 PM, sweety sweetyshind...@yahoo.com
 wrote:
 I will surely read about JVM Garbage collection. Thanks a lot, all of you.

 But, is the time required for my indexing good enough? I don't know about
 the ideal timings.
 I think that my indexing is taking more time.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/to-reduce-indexing-time-tp4121391p4121483.html

 Sent from the Solr - User mailing list archive at Nabble.com.




Re: to reduce indexing time

2014-03-05 Thread Toby Lazar
Thanks Ahmet for the correction.  I used wireshark to capture an
UpdateRequest to solr and saw this XML:

<add><doc boost="1.0"><field name="caseID">123</field><field
name="caseName">blah</field></doc></add>

and figured that javabin was only for the responses.  Does wt apply to how
SolrJ sends requests to Solr?  Could this HTTP content be in javabin format?

Toby


On Wed, Mar 5, 2014 at 4:34 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi Toby,

 SolrJ uses javabin by default.

 Ahmet


 On Wednesday, March 5, 2014 11:31 PM, Toby Lazar tla...@capitaltg.com
 wrote:
 I believe SolrJ uses XML under the covers.  If so, I don't think you would
 improve performance by switching to SolrJ, since the client would convert
 it to XML before sending it on the wire.

 Toby

 ***
   Toby Lazar
   Capital Technology Group
   Email: tla...@capitaltg.com
   Mobile: 646-469-5865
 ***



 On Wed, Mar 5, 2014 at 3:25 PM, Ahmet Arslan iori...@yahoo.com wrote:

  Hi,
 
  One thing to consider is, I think solrnet use xml update, there is xml
  parsing overhead with it.
  Switching to solrJ or CSV can cause additional gain.
 
  http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
 
  Ahmet
 
 
  On Wednesday, March 5, 2014 10:13 PM, sweety sweetyshind...@yahoo.com
  wrote:
  I will surely read about JVM Garbage collection. Thanks a lot, all of
 you.
 
  But, is the time required for my indexing good enough? I dont know about
  the
  ideal timings.
  I think that my indexing is taking more time.
 
 
 
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/to-reduce-indexing-time-tp4121391p4121483.html
 
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 




Re: to reduce indexing time

2014-03-05 Thread Toby Lazar
OK, I was using HttpSolrServer since I haven't yet migrated to
CloudSolrServer.  I added the line:

   solrServer.setRequestWriter(new BinaryRequestWriter());

after creating the server object and now see the difference through
wireshark.  Is it fair to assume that this usage is multi-thread safe?

Thank you Shawn and Ahmet,

Toby

***
  Toby Lazar
  Capital Technology Group
  Email: tla...@capitaltg.com
  Mobile: 646-469-5865
***


On Wed, Mar 5, 2014 at 4:46 PM, Shawn Heisey s...@elyograg.org wrote:

 On 3/5/2014 2:31 PM, Toby Lazar wrote:

 I believe SolrJ uses XML under the covers.  If so, I don't think you would
 improve performance by switching to SolrJ, since the client would convert
 it to XML before sending it on the wire.


 Until recently, SolrJ always used XML by default for requests and javabin
 for responses.  That is moving to javabin for both.  This is already the
 case in the newest versions for CloudSolrServer.  HttpSolrServer is still
 using the XML RequestWriter by default, but you can change this very easily
 to BinaryRequestWriter.  If you plan to use SolrJ, it's a change I would
 highly recommend.

 Thanks,
 Shawn




Re: Facet field query on subset of documents

2013-12-20 Thread Toby Lazar
Luis (or anyone else),

Did you ever find a solution for this problem?  If not, is querying twice
the way to go?  I'm looking to do the same with no luck yet.

Thanks,

Toby

***
  Toby Lazar
  Capital Technology Group
  Email: tla...@capitaltg.com
  Mobile: 646-469-5865
***


On Thu, Nov 21, 2013 at 5:44 PM, Luis Lebolo luis.leb...@gmail.com wrote:

 Hi Erick,

 Thanks for the reply and sorry, my fault, wasn't clear enough. I was
 wondering if there was a way to remove terms that would always be zero
 (because the term came from a document that didn't match the filter query).

 Here's an example. I have a bunch of documents with fields 'manufacturer'
 and 'location'. If I set my filter query to manufacturer = Sony and all
 Sony documents had a value of 'Florida' for location, then I want 'Florida'
 NOT to show up in my facet field results. Instead, it shows up with a count
 of zero (and it'll always be zero because of my filter query).

 Using mincount = 1 doesn't solve my problem because I don't want it to hide
 zeroes that came from documents that actually pass my filter query.

 Does that make more sense?
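
 For what it's worth, the query shape in question is roughly this (the
 collection name and mincount are assumptions of mine); facet counts are
 tabulated only over documents matching both q and fq:

 http://localhost:8983/solr/collection/select?q=*:*&fq=manufacturer:Sony&facet=true&facet.field=location&facet.mincount=1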


 On Thu, Nov 21, 2013 at 4:36 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  That's what faceting does. The facets are only tabulated
  for documents that satisfy the query, including all of
  the filter queries and any other criteria.
 
  Otherwise, facet counts would be the same no matter
  what the query was.
 
  Or I'm completely misunderstanding your question...
 
  Best,
  Erick
 
 
  On Thu, Nov 21, 2013 at 4:22 PM, Luis Lebolo luis.leb...@gmail.com
  wrote:
 
   Hi All,
  
   Is it possible to perform a facet field query on a subset of documents
  (the
   subset being defined via a filter query for instance)?
  
   I understand that facet pivoting might work, but it would require that
  the
   subset be defined by some field hierarchy, e.g. manufacturer - price
  (then
   only look at the results for the manufacturer I'm interested in).
  
   What if I wanted to define a more complex subset (where the name starts
   with A but ends with Z and some other field is greater than 5 and yet
   another field is not 'x', etc.)?
  
   Ideally I would then define a facet field constraining query to
 include
   only terms from documents that pass this query.
  
   Thanks,
   Luis
  
 



Re: How to get similarity score between 0 and 1 not relative score

2013-10-31 Thread Toby Lazar
I think you are looking for something like this, though you can omit the fq
section:


http://localhost:8983/solr/collection/select?abc=text:bob&q={!func}scale(product(query($abc),1),0,1)&fq={!frange l=0.9}$q

Also, I don't understand all the fuss about normalized scores.  In the
linked example, I can see an interest in searching for "apple banana",
"zzz yyy xxx qqq kkk ttt rrr 111", etc. and wanting only close matches as of
that point in time.  Would this be a good use for this approach?  I
understand that the results can change if the documents in the index change.

Thanks,

Toby



On Thu, Oct 31, 2013 at 12:56 AM, Anshum Gupta ans...@anshumgupta.net wrote:

 Hi Susheel,

 Have a look at this:
 http://wiki.apache.org/lucene-java/ScoresAsPercentages

 You may really want to reconsider doing that.




 On Thu, Oct 31, 2013 at 9:41 AM, sushil sharma sushil2...@yahoo.co.in
 wrote:

  Hi,
 
  We have a requirement where the user would like to see a score (between 0
  and 1) which can tell how close the input search string is to the result
  string.
  So if the input was very close but not an exact match, the score could be
  .90, etc.
 
  I do understand that we can get the score from Solr and divide by the
  highest score, but that will always show 1 even if the match was not exact.
 
  Regards,
  Susheel




 --

 Anshum Gupta
 http://www.anshumgupta.net



Re: pivot range faceting

2013-10-20 Thread Toby Lazar
Thanks for confirming my fears.  I saw some presentations where I thought
this feature was used, but perhaps it was done performing multiple range
queries.

Any chance there is a way for copyField to copy a function of a field
instead of the original itself?  Or must this be done by the
application?

Thank you again for your help.

Toby

***
  Toby Lazar
  Capital Technology Group
  Email: tla...@capitaltg.com
  Mobile: 646-469-5865
***


On Sun, Oct 20, 2013 at 2:39 PM, Upayavira u...@odoko.co.uk wrote:



 On Sun, Oct 20, 2013, at 04:04 AM, Toby Lazar wrote:
  Is it possible to get pivot info on a range-faceted query?  For example,
  if
  I want to query the number of orders placed in January, February, etc., I
  know I can use a simple range search.  If I want to get the number of
  orders by category, I can do that easily by faceting on category.  I'm
  wondering if I can get the number of all orders by month, and also broken
  down by category.  Is that possible in a single query?

 You can't yet include a range facet within a pivot. The way to achieve
 this is to store a version of your date field rounded to the nearest
 month, then you will be able to use that field in a pivot facet.
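
 A hypothetical app-side version of that rounding, for an extra month field
 whose name and format are placeholders of mine:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class MonthBucket {
    // Round a timestamp down to its month, suitable for an extra
    // "order_month" string field used in the pivot facet.
    static String monthOf(Date d) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(d);
    }

    public static void main(String[] args) {
        System.out.println(monthOf(new Date(0L))); // epoch start -> 1970-01
    }
}
```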

 Obviously, this requires index-time effort, which is less than ideal.

 I guess this is a feature just waiting for someone to implement it.

 Upayavira



pivot range faceting

2013-10-19 Thread Toby Lazar
Is it possible to get pivot info on a range-faceted query?  For example, if
I want to query the number of orders placed in January, February, etc., I
know I can use a simple range search.  If I want to get the number of
orders by category, I can do that easily by faceting on category.  I'm
wondering if I can get the number of all orders by month, and also broken
down by category.  Is that possible in a single query?

Thanks,

Toby