Re: Do we really need to build all existing index data again after changing schema? [scottchu]

2016-06-21 Thread Alexandre Rafalovitch
Well, if you are only changing the query analyzer chain, you can get away
without reindexing.

But if you change the index analyzer chain, your older tokens are
stored in a different way. Even if Solr does not complain, they will
not match and may lead to very obscure issues.

And if you change the field type radically (string to text, or to
integer), I am quite sure Solr will complain a lot.
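
For example, a field type along these lines (type and filter names are
only illustrative) keeps separate index-time and query-time chains; it is
the type="index" analyzer whose change invalidates already-indexed tokens:

<fieldType name="text_example" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/>
  </analyzer>
</fieldType>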


Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 22 June 2016 at 16:49, scott.chu  wrote:
>
> According to https://wiki.apache.org/solr/HowToReindex, if I understand
> right, once we change the schema after the index is built, we have to rebuild
> the whole index. Is there really no other way to keep the existing index data
> and still apply the new schema?
>
> scott.chu,scott@udngroup.com
> 2016/6/22 (週三)


Do we really need to build all existing index data again after changing schema? [scottchu]

2016-06-21 Thread scott.chu

According to https://wiki.apache.org/solr/HowToReindex, if I understand right,
once we change the schema after the index is built, we have to rebuild the
whole index. Is there really no other way to keep the existing index data and
still apply the new schema?

scott.chu,scott@udngroup.com
2016/6/22 (週三)


Re: SolrCloud: Adding a very large collection to a pre-existing cluster

2016-06-21 Thread Erick Erickson
One other option is to index "somewhere else", then use the collections API
to "addreplica"s on your prod cluster. Then perhaps delete replica on the
nodes that are "somewhere else".
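
Assuming a collection named "prod" and hypothetical node/replica names,
the two Collections API calls would look roughly like:

/admin/collections?action=ADDREPLICA&collection=prod&shard=shard1&node=newnode1:8983_solr
/admin/collections?action=DELETEREPLICA&collection=prod&shard=shard1&replica=core_node3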

Best,
Erick
On Jun 21, 2016 4:27 PM, "Jeff Wartes"  wrote:


There’s no official way of doing #1, but there are some less official ways:
1. The Backup/Restore API provides some hooks into loading pre-existing
data dirs into an existing collection. Lots of caveats.
2. If you don’t have many shards, there’s always rsync/reload.
3. There are some third-party tools that help with this kind of thing:
a. https://github.com/whitepages/solrcloud_manager (primarily a command
line tool)
b. https://github.com/bloomreach/solrcloud-haft (primarily a library)

For #2, absolutely. Spin up some new nodes in your cluster, and then use
the “createNodeSet” parameter when creating the new collection to restrict
to those new nodes:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api1
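
A sketch of such a CREATE call (collection and node names hypothetical):

/admin/collections?action=CREATE&name=newcollection&numShards=2&replicationFactor=1&createNodeSet=newnode1:8983_solr,newnode2:8983_solr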




On 6/21/16, 12:33 PM, "Kelly, Frank"  wrote:

>We have about 200 million documents (~70 GB) we need to keep indexed
across 3 collections.
>
>Currently 2 of the 3 collections are already indexed (roughly 90m docs).
>
>We'd like to create the remaining collection (about 100 m documents) but
minimizing the performance impact on the existing collections on Solr
servers during that Time.
>
>Is there some way to do this either by
>
>  1.  Creating the collection in another environment and shipping the
(underlying Lucene) index files
>  2.  Creating the collection on (dedicated) new machines that we add to
the SolrCloud cluster?
>
>Thoughts, comments or suggestions appreciated,
>
>Best
>
>-Frank Kelly
>


Re: Solr Query Processing Detail

2016-06-21 Thread Erick Erickson
Where are you seeing that this does anything? It wouldn't be the first time
new functionality happened that I totally missed, but I've never seen that
config.

You might get some mileage out of the ReRankQParserPlugin though, which
re-ranks the top N documents from one query using another.
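
A rough sketch of a rerank request (query terms hypothetical):

q=greetings&rq={!rerank reRankQuery=$rqq reRankDocs=1000 reRankWeight=3}&rqq=(hi hello hey)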

Best,
Erick
On Jun 21, 2016 2:43 PM, "John Bickerstaff" 
wrote:

> Hi all,
>
> I have a question about whether sub-queries in Solr requestHandlers go
> against the total index or against the results of the previous query.
>
> Here's a simple example:
>
> Query1: {!edismax qf=blah, blah}
>
> Query2: {!edismax qf=blah, blah}
>
> My question is:
>
> What does Query2 run "against"?
>   a. The entire Solr Index
>   b. The results of Query1
>
> If this is clearly documented anywhere, I'm very interested in a link.
>
> Thanks
>


Re: AW: How many cores is too many cores?

2016-06-21 Thread Erick Erickson
Good luck! This really requires that you can configure your solr to cache N
cores and don't expect more than N-M users in parallel. Or you don't mind
if some users see periodic slow responses. You can sometimes make a hidden
call when a user signs on to pre-load her core, but again your usage
pattern may not tolerate that.

Finally, note that the lazy-load parameter does NOT require a transient
cache. That avoids the "load all cores at startup" delay.

Best,
Erick
On Jun 21, 2016 4:54 AM, "Sebastian Riemer"  wrote:

> Thanks for your response, Erick!
>
> Currently we are trying to keep things simple so we don't use SolrCloud.
>
> I'll give it a look, configuration seems easy, however testing with many
> clients in parallel seems not so much.
>
> Thanks again,
> Sebastian
>
> -Ursprüngliche Nachricht-
> Von: Erick Erickson [mailto:erickerick...@gmail.com]
> Gesendet: Dienstag, 21. Juni 2016 01:52
> An: solr-user 
> Betreff: Re: How many cores is too many cores?
>
> Sebastian:
>
> It Depends (tm). Solr can handle this, but there are caveats. Is this
> SolrCloud or not? Each core will consume some resources and there are some
> JIRAs out there about specifically that many cores in SolrCloud.
> If your problem space works with the LotsOfCores, start here:
> https://cwiki.apache.org/confluence/display/solr/Format+of+solr.xml
> and
> https://cwiki.apache.org/confluence/display/solr/Defining+core.properties
> The idea is that if your access pattern is
> > sign on
> > ask some questions
> > go away
> you can configure that only N cores are loaded at any one time.
> Theoretically you can have a huge number of cores (I've tested with
> 15,000) defined, but only say 100 active at a time.
>
> There are also options you can specify that cause a core to not be loaded
> until requested, but not aged out.
>
> The 1,500 core case will keep Solr from coming up until all of the cores
> have been opened, which can be lengthy. But you can define the number of
> threads that are running in parallel to open the cores
> but the default is unlimited so you can run out of threads (really
> memory).
>
> So the real answer is "it's not insane, but you really need to test it
> operationally and tweak a bunch of settings before making your decision"
>
> Best,
> Erick
>
> On Mon, Jun 20, 2016 at 12:49 PM, Sebastian Riemer 
> wrote:
> > Hi,
> >
> > Currently I have a single solr server handling 5 cores which differ in
> the content they provide.
> >
> > However, each of them might hold data for many different
> clients/customers. Let's say for example one day there might be 300
> different clients each storing their data in those 5 cores.
> >
> > Every client can make backups of his data and import that data back into
> our system. That however, makes re-indexing all of his documents in the
> cores necessary, which A) is very slow at the moment since fetching the
> data from MySQL-DB is slow and B) would slow down searches for all other
> clients while the reindexing is taking place, right?
> >
> > Now my idea would be:
> >
> > What if each client gets his own 5 cores? Then instead of re-indexing I
> could simply copy back the solr-index files (which I copied while making
> the backup) into his core-directories, right?
> >
> > That would lead to about 5 x 300 cores, equals 1500 cores.
> >
> > Am I insane by thinking that way?
> >
> > Best regards,
> > Sebastian
> >
>


RE: Solr 5.5 | Field boosting not working as per expectation

2016-06-21 Thread Erick Erickson
I strongly suspect you're making a false correlation between insertion time
and results. Or you somehow have a sort by date clause. Solr does not do
anything to automatically consider recency, you have to tell it to...

Best,
Erick
On Jun 21, 2016 2:51 AM, "Megha Bhandari"  wrote:

> We have the following in solrconfig.xml, nothing related to timestamp
>
> <lst name="defaults">
>   <str name="defType">edismax</str>
>   <str name="qf">
>     metatag.keywords^90.1 metatag.description^50.1
>     title^1.1 h1^1000.7 h2^700.6 h3^10.1 h4^5.4 h5^1.3 h6^1.2 _text_^1.0
>   </str>
> </lst>
> And the explain query doesn’t mention anything w.r.t. timestamp
>
> '/content/dam/uhcdotcom/en/qa_workarea/Silver-Choice-5000-E.pdf'=>'
> 0.050655644 = max of:
>   0.050655644 = weight(title:florida in 0) [ClassicSimilarity], result of:
> 0.050655644 = score(doc=0,freq=1.0), product of:
>   0.0075930804 = queryWeight, product of:
> 1.1 = boost
> 6.6712904 = idf(docFreq=21, maxDocs=6389)
> 1.1381613E-7 = queryNorm
>   6.6712904 = fieldWeight in 0, product of:
> 1.0 = tf(freq=1.0), with freq of:
>   1.0 = termFreq=1.0
> 6.6712904 = idf(docFreq=21, maxDocs=6389)
> 1.0 = fieldNorm(doc=0)
>   3.462133E-7 = weight(_text_:florida in 0) [ClassicSimilarity], result of:
> 3.462133E-7 = score(doc=0,freq=2.0), product of:
>   4.222856E-7 = queryWeight, product of:
> 3.710244 = idf(docFreq=424, maxDocs=6389)
> 1.1381613E-7 = queryNorm
>   0.8198558 = fieldWeight in 0, product of:
> 1.4142135 = tf(freq=2.0), with freq of:
>   2.0 = termFreq=2.0
> 3.710244 = idf(docFreq=424, maxDocs=6389)
> 0.15625 = fieldNorm(doc=0)
> ',
>   'https://10.209.5.171/contact-us/florida'=>'
> 0.02968075 = max of:
>   0.0011872416 = weight(title:florida in 380) [ClassicSimilarity], result
> of:
> 0.0011872416 = score(doc=380,freq=1.0), product of:
>   0.0075930804 = queryWeight, product of:
> 1.1 = boost
> 6.6712904 = idf(docFreq=21, maxDocs=6389)
> 1.1381613E-7 = queryNorm
>   0.15635836 = fieldWeight in 380, product of:
> 1.0 = tf(freq=1.0), with freq of:
>   1.0 = termFreq=1.0
> 6.6712904 = idf(docFreq=21, maxDocs=6389)
> 0.0234375 = fieldNorm(doc=380)
>   5.724965E-9 = weight(_text_:florida in 380) [ClassicSimilarity], result
> of:
> 5.724965E-9 = score(doc=380,freq=14.0), product of:
>   4.222856E-7 = queryWeight, product of:
> 3.710244 = idf(docFreq=424, maxDocs=6389)
> 1.1381613E-7 = queryNorm
>   0.013557092 = fieldWeight in 380, product of:
> 3.7416575 = tf(freq=14.0), with freq of:
>   14.0 = termFreq=14.0
> 3.710244 = idf(docFreq=424, maxDocs=6389)
> 9.765625E-4 = fieldNorm(doc=380)
>   8.445298E-5 = weight(h1:florida in 380) [ClassicSimilarity], result of:
> 8.445298E-5 = score(doc=380,freq=1.0), product of:
>   8.387785E-4 = queryWeight, product of:
> 1000.7 = boost
> 7.3644376 = idf(docFreq=10, maxDocs=6389)
> 1.1381613E-7 = queryNorm
>   0.10068567 = fieldWeight in 380, product of:
> 1.0 = tf(freq=1.0), with freq of:
>   1.0 = termFreq=1.0
> 7.3644376 = idf(docFreq=10, maxDocs=6389)
> 0.013671875 = fieldNorm(doc=380)
>   0.02968075 = weight(metatag.description:florida in 380)
> [ClassicSimilarity], result of:
> 0.02968075 = score(doc=380,freq=1.0), product of:
>   0.3796503 = queryWeight, product of:
> 50.1 = boost
> 6.6712904 = idf(docFreq=21, maxDocs=6389)
> 1.1381613E-7 = queryNorm
>   0.07817918 = fieldWeight in 380, product of:
> 1.0 = tf(freq=1.0), with freq of:
>   1.0 = termFreq=1.0
> 6.6712904 = idf(docFreq=21, maxDocs=6389)
> 0.01171875 = fieldNorm(doc=380)
> ',
>
> So not able to fathom what is going wrong.
>
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Tuesday, June 21, 2016 1:50 PM
> To: solr-user
> Subject: Re: Solr 5.5 | Field boosting not working as per expectation
>
> Sounds strange.
>
> Are you absolutely sure this insertion-time factor is not something
> you are doing explicitly? I would add the debug=all parameter to see that
> you don't have some unexpectedly-complex biasing formula hiding in
> a solrconfig.xml parameter.
>
> Regards,
>Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 21 June 2016 at 16:53, Megha Bhandari  wrote:
> > After further investigation we have found that the latest inserted documents
> > are getting higher priority and coming to the top of the search results,
> > ignoring the field boosting, when the documents were inserted a day apart.
> >
> > Is there a configuration to switch off the insertion-time factor? As per our
> > understanding, field boosting should take precedence.
> >
> > Thanks in advance for any inputs you can give.

Re: Updating solr schema for a collection in place

2016-06-21 Thread Erick Erickson
Well, if it works... changing schema factories should be fine, assuming
you've configured things correctly before the reload, i.e., pointed at the
right schema file etc. My comments were more about changing the schema rather
than the config.

Best,
Erick
On Jun 20, 2016 10:26 PM, "Stephen Lewis"  wrote:

Oh, also I see when I first replied, I missed addressing this


> For instance,
> ​ ​
> having a field defined with docValues set to false, indexing some data
> then changing that field to docValues="true" and indexing some more data

> will give you "interesting" results.


The way we update our data model is to run fields in parallel as we migrate
fields through a "rebuild" we do in the background. For any update
requiring in place updates of fields (which we've yet to do), we would have
stood up a parallel cloud and run a data migration. (We weren't actually
100% sure this would be strictly necessary.) If I understand you right, we
could use the managed schema factory
(https://cwiki.apache.org/confluence/display/solr/Schema+Factory+Definition+in+SolrConfig)
and perform an atomic field update to the schema like this, in place. That
could make testing and migration even quicker for us. I think in prod we
would be able to make good use of this too, though we would probably still
want to run a parallel cloud in isolation while doing this update if there
were a risk of delay in write throughput or heavy perf at peak time.
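
A minimal sketch of that solrconfig.xml setting, assuming the default
resource name:

<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>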

In my test environment noodling, I noticed that even when using a managed
schema, I could update the solrconfig.xml through a reload. Is it generally
safe to switch between schema factories through schema reloads, or is this
getting on the "cavalier" side of things? :)

On Mon, Jun 20, 2016 at 9:51 PM, Stephen Lewis  wrote:

> ​Thanks for the advice! I haven't encountered those nuances yet so it's
> great to be aware of them now.
>
> I manage our solr clouds through an OO python package which models our
> search stack. We use this package deploy to stacks which are isolated and
> configurable, but otherwise identical. We push our updates to the config
> through to our test environment for a test pass, next to our production
> clouds in parallel, and finally we flight to users. It's been a pretty
good
> system so far, and generally I haven't had many issues using solr 6.0. We
> were using 4.9 until relatively recently, and we did have some troubles
> with the collections API. In those cases, we resolved by recreating the
> collection. So far 6.0 seems to hum along gracefully as we use the API.
>
> Thanks again for letting us know to keep a sharp eye on the details and to
> be on the lookout for interesting behavior :)
>
> Best,
> Stephen
>
> On Mon, Jun 20, 2016 at 7:56 PM, Erick Erickson 
> wrote:
>
>> Glad you found the issue. The switch to managed has tripped up
>> more people than just you!
>>
>> Do be a little cautious about changing the schema however. There
>> are some "benign" changes you can do when you already have data
>> indexed and a series of others that are not benign. For instance,
>> having a field defined with docValues set to false, indexing some data
>> then changing that field to docValues="true" and indexing some more data
>> will give you "interesting" results.
>>
>> Other operations, like adding new fieldTypes or new Fields are entirely
>> benign.
>>
>> Mostly, this is just a caution that if you are changing your schema
>> and find results wonky (e.g. facet counts not correct, docs not being
>> found
>> when you change stemming, etc). to consider deleting/recreating the
>> collection before tearing your hair out.
>>
>> Best,
>> Erick
>>
>> On Mon, Jun 20, 2016 at 10:37 PM, Stephen Lewis 
>> wrote:
>> > I'm happy to say I figured out the issue. Looking through previous
>> > questions in this forum, I was able to find someone hitting the same
>> issue
>> > which I was. After upgrading versions, we switched to the managed
>> instead
>> > of the ClassicIndexSchemaFactory unintentionally. Sorry for the bother!
>> >
>> > On Mon, Jun 20, 2016 at 7:01 PM, Stephen Lewis 
>> wrote:
>> >
>> >> Hello,
>> >>
>> >> I've recently set up a solr cloud using solr 6.0, and I've been having
>> >> some trouble getting our collections to pick up schema updates.
>> Following
>> >> the docs on zkcli.sh
>> >> (https://cwiki.apache.org/confluence/display/solr/Using+ZooKeeper+to+Manage+Configuration+Files)
>> and
>> >> the collections API
>> >> (https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api2),
>> >> I have uploaded the new schema by placing it onto a solr node at
>> >> /opt/solr/server/configsets/my_collection/conf/schema.xml and running
>> >>
>> >> /opt/solr/cloud-scripts/zkcli.sh \
>> >>
>> >> -zkhost zkdns.foo.bar \
>> >>
>> >> -cmd upconfig \
>> >>
>> >> -confname my_collection \
>> >>
>> >> -confdir /opt/solr/server/configsets/my_collection/conf
>> >>
>> >>
>> >> and then triggering a reload of the collection by hitting
>> >>
>> >>
>> >>
>>
sol

Multiple context field / filters in Solr suggester

2016-06-21 Thread Shamik Bandopadhyay
Hi,

  Just trying to understand if Solr suggester supports multiple filtering
through the "contextField" option. As shown in the config below, is it
possible to have two contextFields defined where I can use  "cat" and
"manu" as filtering criteria on the suggested result ?


  
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">name</str>
    <str name="weightField">price</str>
    <str name="contextField">cat</str>
    <str name="contextField">manu</str>
    <str name="suggestAnalyzerFieldType">string</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>


The only reference around this seemed to be SOLR-7888, but based on my
understanding, it talks about boolean support for a given context.

Any pointers will be appreciated.

Thanks,
Shamik


Re: [ANN] Relevant Search by Manning out! (Thanks Solr community!)

2016-06-21 Thread Trey Grainger
Congrats Doug and John! Writing a book like this is a very long, arduous
process (as several folks on this list can attest to). Writing a great book
like this is considerably more challenging.

I read through this entire book a few months ago before they put the final
touches on it, and (for anyone on the mailing list who is contemplating
buying it), it is a REALLY great book that will teach you the ins and outs
of how search relevancy works under the covers and how you can manipulate
and improve it. It's very well-written, and definitely worth the read.

Congrats again, guys.

Trey Grainger
Co-author, Solr in Action
SVP of Engineering @ Lucidworks

On Tue, Jun 21, 2016 at 2:12 PM, Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> Not much more to add than my post here! This book is targeted towards
> Lucene-based search (Elasticsearch and Solr) relevance.
>
> Announcement with discount code:
> http://opensourceconnections.com/blog/2016/06/21/relevant-search-published/
>
> Related hacker news thread:
> https://news.ycombinator.com/item?id=11946636
>
> Thanks to everyone in the Solr community that was helpful to my efforts.
> Specifically Trey Grainger, Eric Pugh (for keeping me employed), Charlie
> Hull and the Flax team, Alex Rafalovitch, Timothy Potter, Yonik Seeley,
> Grant Ingersoll (for basically teaching me Solr back in the day), Drew
> Farris (for encouraging my early blogging), everyone at OSC, and many
> others I'm probably forgetting!
>
> Best
> -Doug
>


Re: [ANN] Relevant Search by Manning out! (Thanks Solr community!)

2016-06-21 Thread Erik Hatcher
Hat tip, indeed.   It’s a painful process and those that have survived it get 
my respect. 

Who doesn’t want Relevant Search?

—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com 



> On Jun 21, 2016, at 2:12 PM, Doug Turnbull 
>  wrote:
> 
> Not much more to add than my post here! This book is targeted towards
> Lucene-based search (Elasticsearch and Solr) relevance.
> 
> Announcement with discount code:
> http://opensourceconnections.com/blog/2016/06/21/relevant-search-published/
> 
> Related hacker news thread:
> https://news.ycombinator.com/item?id=11946636
> 
> Thanks to everyone in the Solr community that was helpful to my efforts.
> Specifically Trey Grainger, Eric Pugh (for keeping me employed), Charlie
> Hull and the Flax team, Alex Rafalovitch, Timothy Potter, Yonik Seeley,
> Grant Ingersoll (for basically teaching me Solr back in the day), Drew
> Farris (for encouraging my early blogging), everyone at OSC, and many
> others I'm probably forgetting!
> 
> Best
> -Doug



Re: SolrCloud: Adding a very large collection to a pre-existing cluster

2016-06-21 Thread Jeff Wartes

There’s no official way of doing #1, but there are some less official ways:
1. The Backup/Restore API provides some hooks into loading pre-existing data 
dirs into an existing collection. Lots of caveats.
2. If you don’t have many shards, there’s always rsync/reload.
3. There are some third-party tools that help with this kind of thing:
a. https://github.com/whitepages/solrcloud_manager (primarily a command line 
tool)
b. https://github.com/bloomreach/solrcloud-haft (primarily a library)

For #2, absolutely. Spin up some new nodes in your cluster, and then use the 
“createNodeSet” parameter when creating the new collection to restrict to those 
new nodes:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api1




On 6/21/16, 12:33 PM, "Kelly, Frank"  wrote:

>We have about 200 million documents (~70 GB) we need to keep indexed across 3 
>collections.
>
>Currently 2 of the 3 collections are already indexed (roughly 90m docs).
>
>We'd like to create the remaining collection (about 100 m documents) but 
>minimizing the performance impact on the existing collections on Solr servers 
>during that Time.
>
>Is there some way to do this either by
>
>  1.  Creating the collection in another environment and shipping the 
> (underlying Lucene) index files
>  2.  Creating the collection on (dedicated) new machines that we add to the 
> SolrCloud cluster?
>
>Thoughts, comments or suggestions appreciated,
>
>Best
>
>-Frank Kelly
>



Re: [ANN] Relevant Search by Manning out! (Thanks Solr community!)

2016-06-21 Thread Alexandre Rafalovitch
Congratulations.

I know it was a very long road. MEAP was really good, I am looking
forward to reading the final version.

Regards,
Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 22 June 2016 at 04:12, Doug Turnbull
 wrote:
> Not much more to add than my post here! This book is targeted towards
> Lucene-based search (Elasticsearch and Solr) relevance.
>
> Announcement with discount code:
> http://opensourceconnections.com/blog/2016/06/21/relevant-search-published/
>
> Related hacker news thread:
> https://news.ycombinator.com/item?id=11946636
>
> Thanks to everyone in the Solr community that was helpful to my efforts.
> Specifically Trey Grainger, Eric Pugh (for keeping me employed), Charlie
> Hull and the Flax team, Alex Rafalovitch, Timothy Potter, Yonik Seeley,
> Grant Ingersoll (for basically teaching me Solr back in the day), Drew
> Farris (for encouraging my early blogging), everyone at OSC, and many
> others I'm probably forgetting!
>
> Best
> -Doug


Re: [ANN] Relevant Search by Manning out! (Thanks Solr community!)

2016-06-21 Thread Alessandro Benedetti
Congrats Doug ! I already pre-ordered a copy :)
Well done !
Cheers !

On Tue, Jun 21, 2016 at 7:12 PM, Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> Not much more to add than my post here! This book is targeted towards
> Lucene-based search (Elasticsearch and Solr) relevance.
>
> Announcement with discount code:
> http://opensourceconnections.com/blog/2016/06/21/relevant-search-published/
>
> Related hacker news thread:
> https://news.ycombinator.com/item?id=11946636
>
> Thanks to everyone in the Solr community that was helpful to my efforts.
> Specifically Trey Grainger, Eric Pugh (for keeping me employed), Charlie
> Hull and the Flax team, Alex Rafalovitch, Timothy Potter, Yonik Seeley,
> Grant Ingersoll (for basically teaching me Solr back in the day), Drew
> Farris (for encouraging my early blogging), everyone at OSC, and many
> others I'm probably forgetting!
>
> Best
> -Doug
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Solr Query Processing Detail

2016-06-21 Thread John Bickerstaff
Hi all,

I have a question about whether sub-queries in Solr requestHandlers go
against the total index or against the results of the previous query.

Here's a simple example:



  Query1: {!edismax qf=blah, blah}

  Query2: {!edismax qf=blah, blah}



My question is:

What does Query2 run "against"?
  a. The entire Solr Index
  b. The results of Query1

If this is clearly documented anywhere, I'm very interested in a link.

Thanks


Re: How do we get terms suggestion from SuggestComponent?

2016-06-21 Thread Ahmet Arslan
Hi,

With grams parameter of FreeTextLookupFactory, no?
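
A rough sketch of such a suggester definition (field and type names
hypothetical):

<lst name="suggester">
  <str name="name">freeTextSuggester</str>
  <str name="lookupImpl">FreeTextLookupFactory</str>
  <str name="dictionaryImpl">DocumentDictionaryFactory</str>
  <str name="field">text</str>
  <str name="ngrams">2</str>
  <str name="suggestFreeTextAnalyzerFieldType">text_general</str>
</lst>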

Ahmet



On Tuesday, June 21, 2016 1:19 PM, solr2020  wrote:
Thanks Ahmet.

It is working fine. Now I would like to get suggestions for multiple terms.
How do I get suggestions for multiple terms?




Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-do-we-get-terms-suggestion-from-SuggestComponent-tp4283399p4283584.html

Sent from the Solr - User mailing list archive at Nabble.com.


SolrCloud: Adding a very large collection to a pre-existing cluster

2016-06-21 Thread Kelly, Frank
We have about 200 million documents (~70 GB) we need to keep indexed across 3 
collections.

Currently 2 of the 3 collections are already indexed (roughly 90m docs).

We'd like to create the remaining collection (about 100 m documents) but 
minimizing the performance impact on the existing collections on Solr servers 
during that Time.

Is there some way to do this either by

  1.  Creating the collection in another environment and shipping the 
(underlying Lucene) index files
  2.  Creating the collection on (dedicated) new machines that we add to the 
SolrCloud cluster?

Thoughts, comments or suggestions appreciated,

Best

-Frank Kelly



RE: [ANN] Relevant Search by Manning out! (Thanks Solr community!)

2016-06-21 Thread Allison, Timothy B.
Not that I need any other book beyond this one... but I didn't realize that the 
50% discount code applies to all books in the order. :)

Congratulations, Doug and John!

-Original Message-
From: Doug Turnbull [mailto:dturnb...@opensourceconnections.com] 
Sent: Tuesday, June 21, 2016 2:12 PM
To: solr-user@lucene.apache.org
Cc: John Berryman 
Subject: [ANN] Relevant Search by Manning out! (Thanks Solr community!)

Not much more to add than my post here! This book is targeted towards 
Lucene-based search (Elasticsearch and Solr) relevance.

Announcement with discount code:
http://opensourceconnections.com/blog/2016/06/21/relevant-search-published/

Related hacker news thread:
https://news.ycombinator.com/item?id=11946636

Thanks to everyone in the Solr community that was helpful to my efforts.
Specifically Trey Grainger, Eric Pugh (for keeping me employed), Charlie Hull 
and the Flax team, Alex Rafalovitch, Timothy Potter, Yonik Seeley, Grant 
Ingersoll (for basically teaching me Solr back in the day), Drew Farris (for 
encouraging my early blogging), everyone at OSC, and many others I'm probably 
forgetting!

Best
-Doug


RE: SpanQuery - How to wrap a NOT subquery

2016-06-21 Thread Allison, Timothy B.
>Awesome, 0 pre and 1 post works!

Great!

> What if I wanted to match thirty, but exclude if six or seven are included 
> anywhere in the document?

Any time you need "anywhere in the document", use a "regular" query (not 
SpanQuery).  As you wrote initially, you can construct a BooleanQuery that 
includes a complex SpanQuery and another Query that is 
BooleanClause.Occur.MUST_NOT.

> I also tried 0 pre and 0 post
You'd use those if you wanted to find something that didn't contain something 
else: 

["William Clinton"~2 Jefferson]!~0,0

Find 'william' within two words of 'clinton', but not if 'jefferson' appears 
between them.

> I replaced pre with Integer.MAX_VALUE and post with Integer.MAX_VALUE - 5 and 
> it works!
I'll have to think about this one...



Re: [ANN] Relevant Search by Manning out! (Thanks Solr community!)

2016-06-21 Thread John Blythe
awesome, will definitely pick up a copy. booking my ticket to revolution
before the next early bird special lapses, see some of you there!

-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713

On Tue, Jun 21, 2016 at 2:21 PM, Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> Thanks Will and John! You both have also been helpful.
>
> 
> If you want a great relevance-heavy search conference, you better be at
> Lucene/Solr Revolution in October!!
> http://lucenerevolution.org/
> "Lucene Revolution: Officially Endorsed by the Authors of Relevant Search"
>
> See everyone on the mailing list there!!
> 
>
> Best
> -Doug
>
>
>
>
> On Tue, Jun 21, 2016 at 2:16 PM Will Hayes  wrote:
>
> > W00t! Congrats!
> > On Jun 21, 2016 8:12 PM, "Doug Turnbull" <
> > dturnb...@opensourceconnections.com> wrote:
> >
> > > Not much more to add than my post here! This book is targeted towards
> > > Lucene-based search (Elasticsearch and Solr) relevance.
> > >
> > > Announcement with discount code:
> > >
> >
> http://opensourceconnections.com/blog/2016/06/21/relevant-search-published/
> > >
> > > Related hacker news thread:
> > > https://news.ycombinator.com/item?id=11946636
> > >
> > > Thanks to everyone in the Solr community that was helpful to my
> efforts.
> > > Specifically Trey Grainger, Eric Pugh (for keeping me employed),
> Charlie
> > > Hull and the Flax team, Alex Rafalovitch, Timothy Potter, Yonik Seeley,
> > > Grant Ingersoll (for basically teaching me Solr back in the day), Drew
> > > Farris (for encouraging my early blogging), everyone at OSC, and many
> > > others I'm probably forgetting!
> > >
> > > Best
> > > -Doug
> > >
> >
>


Re: SpanQuery - How to wrap a NOT subquery

2016-06-21 Thread Brandon Miller
Awesome, 0 pre and 1 post works!
I replaced pre with Integer.MAX_VALUE and post with Integer.MAX_VALUE - 5
and it works!

If I replace post with Integer.MAX_VALUE - 4 (or -3, -2, -1, -0), it fails.
But, if it's -(5+), it appears to work.

Thank you guys for suffering through my inexperience with Solr.



*NOTE*: In case someone would find it helpful to follow my reasoning before
I discovered the work-around above:

I don't understand why 0,1 works and Integer.MAX_VALUE, Integer.MAX_VALUE
doesn't.  I mean I know that six and seven both come one word after thirty *in
this case*.  This is case-dependent.  What if I wanted to match thirty, but
exclude if six or seven are included anywhere in the document?

How will I know what numbers to plug into pre and post when they could be
anywhere in the document?

In this case, those numbers worked.  Why didn't the big numbers work?
After all, six and seven were unique throughout the whole number (i.e. six
and seven were only at the end of the document).

I also tried 0 pre and 0 post, but that gave me the same as I had when I
had pre and post as really large numbers.



On Tue, Jun 21, 2016 at 12:50 PM, Allison, Timothy B. 
wrote:

> >Perhaps I'm misunderstanding the pre/post parameters?
>
> Pre/post parameters: " 'six' or 'seven' should not appear $pre tokens
> before 'thirty' or $post tokens after 'thirty'
>
> Maybe something like this:
> spanNear([
>   spanNear([field:one, field:thousand, field:one, field:hundred], 0, true),
>   spanNot(field:thirty, spanOr([field:six, field:seven]), 0, 1)
>   ], 0, true)
>
>


Re: [ANN] Relevant Search by Manning out! (Thanks Solr community!)

2016-06-21 Thread Doug Turnbull
Thanks Will and John! You both have also been helpful.


If you want a great relevance-heavy search conference, you better be at
Lucene/Solr Revolution in October!!
http://lucenerevolution.org/
"Lucene Revolution: Officially Endorsed by the Authors of Relevant Search"

See everyone on the mailing list there!!


Best
-Doug




On Tue, Jun 21, 2016 at 2:16 PM Will Hayes  wrote:

> W00t! Congrats!
> On Jun 21, 2016 8:12 PM, "Doug Turnbull" <
> dturnb...@opensourceconnections.com> wrote:
>
> > Not much more to add than my post here! This book is targeted towards
> > Lucene-based search (Elasticsearch and Solr) relevance.
> >
> > Announcement with discount code:
> >
> http://opensourceconnections.com/blog/2016/06/21/relevant-search-published/
> >
> > Related hacker news thread:
> > https://news.ycombinator.com/item?id=11946636
> >
> > Thanks to everyone in the Solr community that was helpful to my efforts.
> > Specifically Trey Grainger, Eric Pugh (for keeping me employed), Charlie
> > Hull and the Flax team, Alex Rafalovitch, Timothy Potter, Yonik Seeley,
> > Grant Ingersoll (for basically teaching me Solr back in the day), Drew
> > Farris (for encouraging my early blogging), everyone at OSC, and many
> > others I'm probably forgetting!
> >
> > Best
> > -Doug
> >
>


Re: [ANN] Relevant Search by Manning out! (Thanks Solr community!)

2016-06-21 Thread John Bickerstaff
Congrats!

Now you can enjoy those huge royalty payments that I'm sure are coming
in... 

Great book and it's been hugely helpful to me.

--JohnB

On Tue, Jun 21, 2016 at 12:12 PM, Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> Not much more to add than my post here! This book is targeted towards
> Lucene-based search (Elasticsearch and Solr) relevance.
>
> Announcement with discount code:
> http://opensourceconnections.com/blog/2016/06/21/relevant-search-published/
>
> Related hacker news thread:
> https://news.ycombinator.com/item?id=11946636
>
> Thanks to everyone in the Solr community that was helpful to my efforts.
> Specifically Trey Grainger, Eric Pugh (for keeping me employed), Charlie
> Hull and the Flax team, Alex Rafalovitch, Timothy Potter, Yonik Seeley,
> Grant Ingersoll (for basically teaching me Solr back in the day), Drew
> Farris (for encouraging my early blogging), everyone at OSC, and many
> others I'm probably forgetting!
>
> Best
> -Doug
>


Re: [ANN] Relevant Search by Manning out! (Thanks Solr community!)

2016-06-21 Thread Will Hayes
W00t! Congrats!
On Jun 21, 2016 8:12 PM, "Doug Turnbull" <
dturnb...@opensourceconnections.com> wrote:

> Not much more to add than my post here! This book is targeted towards
> Lucene-based search (Elasticsearch and Solr) relevance.
>
> Announcement with discount code:
> http://opensourceconnections.com/blog/2016/06/21/relevant-search-published/
>
> Related hacker news thread:
> https://news.ycombinator.com/item?id=11946636
>
> Thanks to everyone in the Solr community that was helpful to my efforts.
> Specifically Trey Grainger, Eric Pugh (for keeping me employed), Charlie
> Hull and the Flax team, Alex Rafalovitch, Timothy Potter, Yonik Seeley,
> Grant Ingersoll (for basically teaching me Solr back in the day), Drew
> Farris (for encouraging my early blogging), everyone at OSC, and many
> others I'm probably forgetting!
>
> Best
> -Doug
>


[ANN] Relevant Search by Manning out! (Thanks Solr community!)

2016-06-21 Thread Doug Turnbull
Not much more to add than my post here! This book is targeted towards
Lucene-based search (Elasticsearch and Solr) relevance.

Announcement with discount code:
http://opensourceconnections.com/blog/2016/06/21/relevant-search-published/

Related hacker news thread:
https://news.ycombinator.com/item?id=11946636

Thanks to everyone in the Solr community that was helpful to my efforts.
Specifically Trey Grainger, Eric Pugh (for keeping me employed), Charlie
Hull and the Flax team, Alex Rafalovitch, Timothy Potter, Yonik Seeley,
Grant Ingersoll (for basically teaching me Solr back in the day), Drew
Farris (for encouraging my early blogging), everyone at OSC, and many
others I'm probably forgetting!

Best
-Doug


RE: SpanQuery - How to wrap a NOT subquery

2016-06-21 Thread Allison, Timothy B.
>Perhaps I'm misunderstanding the pre/post parameters?

Pre/post parameters: " 'six' or 'seven' should not appear $pre tokens before 
'thirty' or $post tokens after 'thirty'

Maybe something like this:
spanNear([
  spanNear([field:one, field:thousand, field:one, field:hundred], 0, true),
  spanNot(field:thirty, spanOr([field:six, field:seven]), 0, 1)
  ], 0, true)



Re: SpanQuery - How to wrap a NOT subquery

2016-06-21 Thread Brandon Miller
I saw the second post--the first post was new to me.

We plan on connecting with those people later on, but right now, I'm trying
to write a stop-gap dtSearch compiler until we can at least secure the
funding we need to employ their help.

Right now, I have a very functional query parser, with just a few holes
needing to be patched.

I rewrote my AND NOT and OR NOT queries.

Now I'm perplexed why this query is not working as expected:
spanNear([
  spanNear([field:one, field:thousand, field:one, field:hundred], 0, true),
  spanNot(field:thirty, spanOr([field:six, field:seven]), 2147483647, 2147483647)
  ], 0, true)

is returning 1130..1139.

expected:<[1130, 1131, 1132, 1133, 1134, 1135, 1138, 1139]> but was:<[1130,
1131, 1132, 1133, 1134, 1135, 1136, 1137, 1138, 1139]>

I would've expected 1136 and 1137 to have been excluded.


Original dtSearch string: "one thousand one hundred" pre/0 (thirty and not
(six or seven))
I even tried it with pre/5 to see if there was something funny going on
with that, but it gave the same results: 1130..1139.

If you can tell me what it should look like when the SpanQuery is converted
to a string, I should be able to figure out the rest.

Perhaps I'm misunderstanding the pre/post parameters?

Thank you for any help!

On Tue, Jun 21, 2016 at 9:46 AM, Allison, Timothy B. 
wrote:

> > dtSearch allows a user to have NOTs embedded in proximity searches.
>
> And, if you're heading down the path of building your own queryparser to
> handle dtSearch's syntax, please read and heed Charlie Hull's post:
>
> http://www.flax.co.uk/blog/2016/05/13/old-new-query-parser/
>
> See also:
>
>
> http://www.flax.co.uk/blog/2012/04/24/dtsolr-an-open-source-replacement-for-the-dtsearch-closed-source-search-engine/
>
>


RE: SpanQuery - How to wrap a NOT subquery

2016-06-21 Thread Allison, Timothy B.
> dtSearch allows a user to have NOTs embedded in proximity searches.

And, if you're heading down the path of building your own queryparser to handle 
dtSearch's syntax, please read and heed Charlie Hull's post:

http://www.flax.co.uk/blog/2016/05/13/old-new-query-parser/

See also:

http://www.flax.co.uk/blog/2012/04/24/dtsolr-an-open-source-replacement-for-the-dtsearch-closed-source-search-engine/
 



RE: All Datanodes are Bad

2016-06-21 Thread Markus Jelsma
Hello Joseph,

Your datanodes are in a bad state, you probably overwhelmed it when indexing. 
Check your max open files on those nodes. Usual default of 1024 is way too low.
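
For reference, on Linux something like this checks and raises the limit
(the value is only a suggestion; adjust the user name to whatever runs Solr):

ulimit -n                     # show the current limit for this shell
# /etc/security/limits.conf, for the user running Solr:
solr  soft  nofile  65536
solr  hard  nofile  65536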

Markus

 
 
-Original message-
> From:Joseph Obernberger 
> Sent: Monday 20th June 2016 19:36
> To: solr-user@lucene.apache.org
> Subject: All Datanodes are Bad
> 
> Anyone ever seen an error like this?  We are running using HDFS for the
> index.  At the time of the error, we are doing a lot of indexing.
> 
> Two errors:
> java.io.IOException: All datanodes DatanodeInfoWithStorage[
> 172.16.100.220:50010,DS-4b806395-0661-4a70-a32b-deef82a85359,DISK] are bad.
> Aborting...
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1357)
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1119)
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:622)
> 
> and
> 
> auto commit error...:org.apache.solr.common.SolrException:
> java.io.IOException: All datanodes DatanodeInfoWithStorage[
> 172.16.100.220:50010,DS-4b806395-0661-4a70-a32b-deef82a85359,DISK] are bad.
> Aborting...
> at
> org.apache.solr.update.HdfsTransactionLog.close(HdfsTransactionLog.java:321)
> at org.apache.solr.update.TransactionLog.decref(TransactionLog.java:510)
> at org.apache.solr.update.UpdateLog.addOldLog(UpdateLog.java:372)
> at org.apache.solr.update.UpdateLog.postCommit(UpdateLog.java:668)
> at
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:658)
> at org.apache.solr.update.CommitTracker.run(CommitTracker.java:217)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: All datanodes DatanodeInfoWithStorage[
> 172.16.100.220:50010,DS-4b806395-0661-4a70-a32b-deef82a85359,DISK] are bad.
> Aborting...
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1357)
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1119)
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:622)
> 
> On the client side doing the indexing, we see:
> 
> org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error
> from server at http://deimos:9100/solr/UNCLASS: java.io.IOException: All
> datanodes 
> DatanodeInfoWithStorage[172.16.100.220:50010,DS-f0e14105-9557-4a59-8918-43724aaa8346,DISK]
> are bad. Aborting...
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:632)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:981)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:870)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:806)
> at
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149)
> at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:106)
> at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:71)
> at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:85)
> .
> .
> .
> Caused by:
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
> from server at http://deimos:9100/solr/UNCLASS: java.io.IOException: All
> datanodes 
> DatanodeInfoWithStorage[172.16.100.220:50010,DS-f0e14105-9557-4a59-8918-43724aaa8346,DISK]
> are bad. Aborting...
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:576)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:240)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:229)
> at
> org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:372)
> at
> org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:325)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient$2.call(CloudSolrClient.java:607)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient$2.call(CloudSolrClient.java:604)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:231)

RE: SpanQuery - How to wrap a NOT subquery

2016-06-21 Thread Allison, Timothy B.
In the syntax for LUCENE-5205’s SpanQueryParser 
[0], that’d be

[“one thousand one hundred thirty” (six seven)]!~0,1

In English: find “one thousand one hundred thirty”, but not if six or seven 
comes immediately after it.

[0] https://github.com/tballison/lucene-addons/tree/master/lucene-5205

From: Brandon Miller [mailto:computerengineer.bran...@gmail.com]
Sent: Monday, June 20, 2016 4:12 PM
To: Allison, Timothy B. ; solr-user@lucene.apache.org
Subject: Re: SpanQuery - How to wrap a NOT subquery

Thank you, Timothy.

I have support for and am using SpanNotQuery elsewhere.  Maybe there is another 
use for it that I'm not considering.  I'm wondering if there's a clever way of 
reusing it in order to satisfy the requirements of proximity NOTs, too.

dtSearch allows a user to have NOTs embedded in proximity searches.
I.e.
Let's say you have an index whose ID has been converted to English phrases, 
like 1001 would be "One thousand one"

"one thousand one hundred" pre/0 (thirty and not (six or seven))
Returns: 1130, 1131, 1132, 1133, 1134, 1135,1138, 1139

Perhaps I've been staring at the screen too long and the obvious answer is 
hiding from me.

Here's how I'm trying to implement it, but it's incorrect...  It's giving me 
1130..1139 without excluding anything.



public Query visitNot_expr(Not_exprContext ctx) {
    // ProximityNotSupportedFor("NOT");
    Query subquery = visit(ctx.expr());
    BooleanQuery.Builder query = new BooleanQuery.Builder();
    query.add(subquery, BooleanClause.Occur.MUST_NOT);
    // TODO: Consolidate this so that we don't use MatchAllDocsQuery,
    // but using the other query, to increase performance
    query.add(new MatchAllDocsQuery(), BooleanClause.Occur.SHOULD);

    if (currentlyInASpanQuery) {
        SpanQuery matchAllDocs = getSpanWildcardQuery(new Term(defaultFieldName, "*"));
        SpanNotQuery snq = new SpanNotQuery(matchAllDocs, (SpanQuery) subquery,
                Integer.MAX_VALUE, Integer.MAX_VALUE);
        return snq;
    } else {
        return query.build();
    }
}

protected SpanQuery getSpanWildcardQuery(Term term) {
    WildcardQuery wq = new WildcardQuery(term);
    SpanQuery swq = new SpanMultiTermQueryWrapper<>(wq);
    return swq;
}


On Mon, Jun 20, 2016 at 2:53 PM, Allison, Timothy B. 
<talli...@mitre.org> wrote:
Bouncing over to user’s list.

As you’ve found, spans are different from regular queries.  MUST_NOT at the 
BooleanQuery level means that the term must not appear anywhere in the 
document; whereas spans focus on terms near each other.

Have you tried SpanNotQuery?  This would allow you at least to do something 
like:

termA but not if zyx or yyy appears X words before or Y words after
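
A minimal sketch in Lucene terms (field and term values hypothetical;
the pre/post ints stand in for the X and Y above):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.spans.*;

SpanQuery include = new SpanTermQuery(new Term("body", "terma"));
SpanQuery exclude = new SpanOrQuery(
    new SpanTermQuery(new Term("body", "zyx")),
    new SpanTermQuery(new Term("body", "yyy")));
// pre = 2 words before the include span, post = 3 words after it
Query q = new SpanNotQuery(include, exclude, 2, 3);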



From: Brandon Miller 
[mailto:computerengineer.bran...@gmail.com]
Sent: Monday, June 20, 2016 2:36 PM
To: d...@lucene.apache.org
Subject: SpanQuery - How to wrap a NOT subquery

Greetings!

I'm wanting to support this:
TermA within_N_terms_of (abc and cba or xyz and not zyx or not yyy)

Focusing on the sub-query:
I have ANDs and ORs figured out (special tricks playing with slops and such).

I'm having the hardest time figuring out how to wrap a NOT.

Outside of SpanQuery, I'm using a BooleanQuery with a MUST_NOT clause.  That's 
fine (if you know another way, I'd like to hear that, too, but this appears to 
work dandy).

However, SpanQuery requires subqueries that are also of type SpanQuery, though
SpanMultiTermQueryWrapper will allow you to throw in anything derived from
MultiTermQuery (which includes AutomatonQuery).

Right now, I'm at a loss.  We have huge, complex, nested boolean queries inside 
proximity operators with our current solution.

If I need to write a custom solution, then that's what I need to hear and 
perhaps a couple of pointers.

Thanks a bunch and God bless!

Brandon



AW: How many cores is too many cores?

2016-06-21 Thread Sebastian Riemer
Thanks for your response, Erick!

Currently we are trying to keep things simple so we don't use SolrCloud.

I'll give it a look, configuration seems easy, however testing with many 
clients in parallel seems not so much.

Thanks again,
Sebastian

-Ursprüngliche Nachricht-
Von: Erick Erickson [mailto:erickerick...@gmail.com] 
Gesendet: Dienstag, 21. Juni 2016 01:52
An: solr-user 
Betreff: Re: How many cores is too many cores?

Sebastian:

It Depends (tm). Solr can handle this, but there are caveats. Is this SolrCloud 
or not? Each core will consume some resources and there are some JIRAs out 
there about specifically that many cores in SolrCloud.
If your problem space works with the LotsOfCores, start here:
https://cwiki.apache.org/confluence/display/solr/Format+of+solr.xml
and
https://cwiki.apache.org/confluence/display/solr/Defining+core.properties
The idea is that if your access pattern is
> sign on
> ask some questions
> go away
you can configure that only N cores are loaded at any one time.
Theoretically you can have a huge number of cores (I've tested with
15,000) defined, but only say 100 active at a time.

There are also options you can specify that cause a core to not be loaded until 
requested, but not aged out.
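
As a sketch, the relevant solr.xml bit looks something like this (the size
is hypothetical):

<solr>
  <int name="transientCacheSize">100</int>
</solr>

and each core that may be aged in and out gets, in its core.properties:

transient=true
loadOnStartup=false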

The 1,500 core case will keep Solr from coming up until all of the cores have 
been opened, which can be lengthy. But you can define the number of threads 
that are running in parallel to open the cores
but the default is unlimited so you can run out of threads (really memory).

So the real answer is "it's not insane, but you really need to test it 
operationally and tweak a bunch of settings before making your decision"

Best,
Erick

On Mon, Jun 20, 2016 at 12:49 PM, Sebastian Riemer  wrote:
> Hi,
>
> Currently I have a single solr server handling 5 cores which differ in the 
> content they provide.
>
> However, each of them might hold data for many different clients/customers. 
> Let's say for example one day there might be 300 different clients each 
> storing their data in those 5 cores.
>
> Every client can make backups of his data and import that data back into our 
> system. That however, makes re-indexing all of his documents in the cores 
> necessary, which A) is very slow at the moment since fetching the data from 
> MySQL-DB is slow and B) would slow down searches for all other clients while 
> the reindexing is taking place, right?
>
> Now my idea would be:
>
> What if each client gets his own 5 cores? Then instead of re-indexing I could 
> simply copy back the solr-index files (which I copied while making the 
> backup) into his core-directories, right?
>
> That would lead to about 5 x 300 cores, equals 1500 cores.
>
> Am I insane by thinking that way?
>
> Best regards,
> Sebastian
>


Re: How do we get terms suggestion from SuggestComponent?

2016-06-21 Thread solr2020
Thanks Ahmet.

It is working fine. Now I would like to get suggestions for multiple terms.
How do I get suggestions for multiple terms?



Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-do-we-get-terms-suggestion-from-SuggestComponent-tp4283399p4283584.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr 5.5 | Field boosting not working as per expectation

2016-06-21 Thread Megha Bhandari
We have the following in solrconfig.xml, nothing related to timestamp



<lst name="defaults">
  <str name="defType">edismax</str>
  <str name="qf">
    metatag.keywords^90.1 metatag.description^50.1 title^1.1
    h1^1000.7 h2^700.6 h3^10.1 h4^5.4 h5^1.3 h6^1.2 _text_^1.0
  </str>
</lst>
And the explain query doesn’t mention anything w.r.t. timestamp

'/content/dam/uhcdotcom/en/qa_workarea/Silver-Choice-5000-E.pdf'=>'
0.050655644 = max of:
  0.050655644 = weight(title:florida in 0) [ClassicSimilarity], result of:
0.050655644 = score(doc=0,freq=1.0), product of:
  0.0075930804 = queryWeight, product of:
1.1 = boost
6.6712904 = idf(docFreq=21, maxDocs=6389)
1.1381613E-7 = queryNorm
  6.6712904 = fieldWeight in 0, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
6.6712904 = idf(docFreq=21, maxDocs=6389)
1.0 = fieldNorm(doc=0)
  3.462133E-7 = weight(_text_:florida in 0) [ClassicSimilarity], result of:
3.462133E-7 = score(doc=0,freq=2.0), product of:
  4.222856E-7 = queryWeight, product of:
3.710244 = idf(docFreq=424, maxDocs=6389)
1.1381613E-7 = queryNorm
  0.8198558 = fieldWeight in 0, product of:
1.4142135 = tf(freq=2.0), with freq of:
  2.0 = termFreq=2.0
3.710244 = idf(docFreq=424, maxDocs=6389)
0.15625 = fieldNorm(doc=0)
',
  'https://10.209.5.171/contact-us/florida'=>'
0.02968075 = max of:
  0.0011872416 = weight(title:florida in 380) [ClassicSimilarity], result of:
0.0011872416 = score(doc=380,freq=1.0), product of:
  0.0075930804 = queryWeight, product of:
1.1 = boost
6.6712904 = idf(docFreq=21, maxDocs=6389)
1.1381613E-7 = queryNorm
  0.15635836 = fieldWeight in 380, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
6.6712904 = idf(docFreq=21, maxDocs=6389)
0.0234375 = fieldNorm(doc=380)
  5.724965E-9 = weight(_text_:florida in 380) [ClassicSimilarity], result of:
5.724965E-9 = score(doc=380,freq=14.0), product of:
  4.222856E-7 = queryWeight, product of:
3.710244 = idf(docFreq=424, maxDocs=6389)
1.1381613E-7 = queryNorm
  0.013557092 = fieldWeight in 380, product of:
3.7416575 = tf(freq=14.0), with freq of:
  14.0 = termFreq=14.0
3.710244 = idf(docFreq=424, maxDocs=6389)
9.765625E-4 = fieldNorm(doc=380)
  8.445298E-5 = weight(h1:florida in 380) [ClassicSimilarity], result of:
8.445298E-5 = score(doc=380,freq=1.0), product of:
  8.387785E-4 = queryWeight, product of:
1000.7 = boost
7.3644376 = idf(docFreq=10, maxDocs=6389)
1.1381613E-7 = queryNorm
  0.10068567 = fieldWeight in 380, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
7.3644376 = idf(docFreq=10, maxDocs=6389)
0.013671875 = fieldNorm(doc=380)
  0.02968075 = weight(metatag.description:florida in 380) [ClassicSimilarity], 
result of:
0.02968075 = score(doc=380,freq=1.0), product of:
  0.3796503 = queryWeight, product of:
50.1 = boost
6.6712904 = idf(docFreq=21, maxDocs=6389)
1.1381613E-7 = queryNorm
  0.07817918 = fieldWeight in 380, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
6.6712904 = idf(docFreq=21, maxDocs=6389)
0.01171875 = fieldNorm(doc=380)
',

So not able to fathom what is going wrong.

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Tuesday, June 21, 2016 1:50 PM
To: solr-user
Subject: Re: Solr 5.5 | Field boosting not working as per expectation

Sounds strange.

Are you absolutely sure this insertion-time factor is not something
you are doing explicitly? I would add the debug=all parameter to see that
you don't have some unexpectedly-complex biasing formula hiding in
a solrconfig.xml parameter.
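
For instance (core name hypothetical):

http://localhost:8983/solr/mycore/select?q=florida&debug=all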

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 21 June 2016 at 16:53, Megha Bhandari  wrote:
> After further investigation we have found that the latest inserted documents
> are getting higher priority and coming to the top of the search results,
> ignoring the field boosting, when the documents were inserted a day apart.
>
> Is there a configuration to switch off the insertion-time factor? As per our
> understanding, field boosting should take precedence.
>
> Thanks in advance for any inputs you can give.
>
> -Original Message-
> From: Megha Bhandari [mailto:mbhanda...@sapient.com]
> Sent: Monday, June 20, 2016 4:37 PM
> To: solr-user@lucene.apache.org
> Subject: Solr 5.5 | Field boosting not working as per expectation
>
> Hi
>
> Problem statement: the metatag.description field has the highest boost, and
> documents with a match in this field should come first. However,
> Silver-Choice-5000-E.pdf comes before /contact-us/florida even though the
> search term matches mo

Re: Regarding CDCR SOLR 6

2016-06-21 Thread Renaud Delbru

Hi,

On 15/06/16 03:18, Bharath Kumar wrote:

Hi Renaud,

Thank you so much for your response. It is very helpful and it helped me
understand the need for turning on buffering.

Is it recommended to keep the buffering enabled all the time on the
source cluster? If the target cluster is up and running and the cdcr is
started, can I turn off the buffering on the source site?


yes, no need to keep buffering on if your target cluster is up and 
running and cdcr replication is started.
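
The buffer can be toggled at runtime through the CDCR API, e.g. (host and
collection names hypothetical):

http://source-host:8983/solr/mycollection/cdcr?action=DISABLEBUFFER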



As you have mentioned, the transaction logs are kept on the source
cluster until the data is replicated on the target cluster, once
cdcr is started. Is there a possibility that the target cluster gets out of
sync with the source cluster and we need to do a hard recovery from the
source cluster to sync up the target cluster?


If the target cluster goes down while cdcr is replicating, there should 
be no loss of information. The source cluster will try from time to time 
to communicate with the target and continue the replication until the 
target cluster is back up and running. Until it can resume 
communication, the source cluster will keep a pointer on where the 
replication should resume, and therefore the update log will not be 
cleaned up to this point.


The pointer on the source cluster is not persistent (maybe that could be 
something to implement). Therefore if the source cluster is restarted, 
the pointer will be lost, and buffer should be activated until the 
target cluster is up and running.
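
As a side note, if you want buffering to be on from node startup, the
buffer's default state can be set in the /cdcr request handler
configuration. A minimal sketch (element names as I recall them from
the cwiki page; please double-check there, as this may depend on the
Solr version):

    <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
      <lst name="buffer">
        <str name="defaultState">enabled</str>
      </lst>
    </requestHandler>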




Also I have the below configuration on the source cluster to synchronize
the update logs:

    <lst name="updateLogSynchronizer">
      <str name="schedule">1000</str>
    </lst>

Regarding the monitoring of the replication, I am planning to add a
script to check the queue size, to make sure the disk is not full in
case the target site is down and the transaction log size keeps growing
on the source site.
Is there any other recommended approach?


The best is to use the monitoring API, which provides some metrics on how
the replication is going. In the cwiki [1], there are also some
recommendations on how to monitor the system.


[1] 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62687462
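
For example, the QUEUES action of the CDCR API reports, per target
cluster, the queue size plus the total size and count of the
transaction logs (again a sketch with placeholder host and collection
names):

    http://source-host:8983/solr/mycoll/cdcr?action=QUEUES

Polling the tlogTotalSize and tlogTotalCount values from that response
is an easy way to raise an alert before the disk fills up.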


Kind Regards
--
Renaud Delbru


Thanks again, your inputs were very helpful.

On Tue, Jun 14, 2016 at 7:10 PM, Bharath Kumar
mailto:bharath.mvku...@gmail.com>> wrote:

Hi Renaud,

Thank you so much for your response. It is very helpful and it
helped me understand the need for turning on buffering.

Is it recommended to keep the buffering enabled all the time on the
source cluster? If the target cluster is up and running and the cdcr
is started, can I turn off the buffering on the source site?

As you have mentioned, the transaction logs are kept on the source
cluster, until the data is replicated on the target cluster, once
the cdcr is started. Is there a possibility that the target cluster is
out of sync with the source cluster and we need to do a hard recovery
from the source cluster to sync up the target cluster?



On Tue, Jun 14, 2016 at 6:50 AM, Davis, Daniel (NIH/NLM) [C]
mailto:daniel.da...@nih.gov>> wrote:

I must chime in to clarify something - in case 2, would the
source cluster eventually start a log reader on its own?   That
is, would CDCR heal over time, or would manual action be
required?

-Original Message-
From: Renaud Delbru [mailto:renaud@siren.solutions
]
Sent: Tuesday, June 14, 2016 4:51 AM
To: solr-user@lucene.apache.org 
Subject: Re: Regarding CDCR SOLR 6

Hi Bharath,

The buffer is useful when you need to buffer updates on the
source cluster before starting cdcr, if the source cluster might
receive updates in the meanwhile and you want to be sure to not
miss them.

To understand this better, you need to understand how cdcr cleans
transaction logs. Cdcr when started (with the START action) will
instantiate a log reader for each target cluster. The position
of the log reader will indicate cdcr which transaction logs it
can clean. If all the log readers are beyond a certain point,
then cdcr can clean all the transaction logs up to this point.

However, there might be cases when the source cluster will be up
without any log readers instantiated:
1) The source cluster is started, but cdcr is not started yet
2) the source cluster is started, cdcr is started, but the
target cluster was not accessible when cdcr was started. In this
case, cdcr will not be able to instantiate a log reader for this
cluster.

In these two scenarios, if updates are received by the source
cluster, then they might be cleaned out from the transaction log
as per the normal update log cleaning procedure.
That is where the buffer becomes useful. When you keep the buffer
enabled, the transaction logs are not cleaned up, so these updates are
preserved until the replication can process them.

Re: Encryption to Solr indexes – Using Custom Codec

2016-06-21 Thread Renaud Delbru

Hi,

Maybe it is the way you created the jar? Why not apply the patch to
lucene/solr trunk and use ant jar instead to get the codecs jar created
for you?
Also, I think the directory where you put the jars should be called
"lib" instead of "Lib".

You can also try to use the lib directives in your solrconfig.xml [1].

[1] 
https://cwiki.apache.org/confluence/display/solr/Lib+Directives+in+SolrConfig
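
For example, something along these lines in solrconfig.xml (the path is
a placeholder; point it at wherever the codec jar actually lives):

    <lib dir="${solr.install.dir:../../../..}/contrib/codecs/lib" regex=".*\.jar" />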


--
Renaud Delbru

On 20/06/16 15:42, Sidana, Mohit wrote:

Hello,

As part of my studies I am exploring solutions that can be used for
Lucene/Solr index encryption.

I found the patch open on Apache JIRA - Codec for index-level encryption
(LUCENE-6966).
https://issues.apache.org/jira/browse/LUCENE-6966 and I am currently
trying to test this Custom codec with Solr to perform secure search over
some sensitive records.

I've decided to follow the path described in the Solr wiki for setting up
the SimpleText codec, and then tried to use the encrypted codec source.

*Here are the additional details.*

I've created a basic jar file out of this source code (built as a jar
from Eclipse using the Maven plugin).

The Solr installation I'm using to test this is the Solr 6.0.0 unzipped,
and started via its embedded Jetty server and using the single core.

I've placed my jar with the codec in [My_Core\ instance Dir.]\ Lib

In:

[$SolrDir]\Solr\ My_Core \conf\solrconfig.xml

I've added the following lines:

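(The XML itself was stripped by the list archive. Following the Solr
wiki recipe, this was presumably the schema-aware codec factory
declaration, something like:)

    <codecFactory class="solr.SchemaCodecFactory"/>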

Then in the schema.xml file, I've declared some field and field types
that should use this codec:



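(Again the XML was stripped by the archive. A plausible sketch; the
postingsFormat name here is hypothetical, and would be whatever the
LUCENE-6966 codec registers:)

    <fieldType name="text_encrypted" class="solr.TextField" postingsFormat="EncryptedText">
      <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
    </fieldType>
    <field name="name" type="text_encrypted" indexed="true" stored="true"/>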

I'm pretty sure I've followed all the steps described in the Solr wiki;
however, when I actually try to use the custom codec implementation (named
"Encrypted Codec") to index some sample CSV data using the simple post tool

java -Dtype=text/csv -Durl=http://localhost:8983/solr/My_Core/update -jar post.jar Sales.csv

and I have also tried doing the same with SolrJ, but I faced the same
error.

import java.io.IOException;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

SolrClient server = new HttpSolrClient("http://localhost:8983/solr/My_Core");

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "1234");
doc.addField("name", "A lovely summer holiday");

try {
    server.add(doc);
    server.commit();
    System.out.println("Document added!");
} catch (SolrServerException | IOException e) {
    e.printStackTrace();
}

I get the attached errors in the Solr log.

org.apache.solr.common.SolrException: Exception writing document id
b3e01ada-d0f1-4ddf-ad6a-2828bfe619a3 to the index; possible analysis error.

 at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:181)

 at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:68)

 at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)

 at
org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:335)

 at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)

 at
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)

 at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)

 at
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)

 at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)

 at
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)

 at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)

 at
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)

 at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)

 at
org.apache.solr.update.processor.FieldNameMutatingUpdateProcessorFactory$1.processAdd(FieldNameMutatingUpdateProcessorFactory.java:74)

 at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)

 at
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)

 at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)

 at
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:936)

 at

RE: Encryption to Solr indexes – Using Custom Codec

2016-06-21 Thread Sidana, Mohit
Hello Bosco,

As part of my thesis I am evaluating the existing solutions for index-level
encryption.

I believe using this custom codec implementation would be a better choice, as
it provides very low-level access and better control over index data. (I am
only interested in how the term dictionary and stored fields are stored on disk.)

Therefore, I am trying to test this approach with Solr.


Thanks,
Mohit

-Original Message-
From: Don Bosco Durai [mailto:bo...@apache.org] 
Sent: Monday, June 20, 2016 7:10 PM
To: solr-user@lucene.apache.org
Subject: Re: Encryption to Solr indexes – Using Custom Codec

Mohit, just curious. Have you considered just encrypting the volume where the
index is stored? It will be done at the OS level, and performance-wise it will
be much better and easier to manage. Would it address your requirement? Or is
it purely an academic exercise for you?

Bosco


From:  "Sidana, Mohit" 
Reply-To:  
Date:  Monday, June 20, 2016 at 7:42 AM
To:  "solr-user@lucene.apache.org" 
Subject:   Encryption to Solr indexes – Using Custom Codec

Hello,

As part of my studies I am exploring solutions that can be used for
Lucene/Solr index encryption.

I found the patch open on Apache JIRA - Codec for index-level encryption 
(LUCENE-6966). https://issues.apache.org/jira/browse/LUCENE-6966 and I am 
currently trying to test this Custom codec with Solr to perform secure search 
over some sensitive records. 

I've decided to follow the path described in the Solr wiki for setting up the
SimpleText codec, and then tried to use the encrypted codec source.

Here are the additional details.

I've created a basic jar file out of this source code (built as a jar from
Eclipse using the Maven plugin).

The Solr installation I'm using to test this is the Solr 6.0.0 unzipped, and 
started via its embedded Jetty server and using the single core.

I've placed my jar with the codec in [My_Core\ instance Dir.]\ Lib

In:

[$SolrDir]\Solr\ My_Core \conf\solrconfig.xml

I've added the following lines:

 



Then in the schema.xml file, I've declared some field and field types that
should use this codec:



  

  

  

  

  

  

 







 

I'm pretty sure I've followed all the steps described in the Solr wiki; however,
when I actually try to use the custom codec implementation (named "Encrypted
Codec") to index some sample CSV data using the simple post tool

java -Dtype=text/csv -Durl=http://localhost:8983/solr/My_Core/update -jar post.jar Sales.csv

and I have also tried doing the same with SolrJ, but I faced the same error.

import java.io.IOException;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

SolrClient server = new HttpSolrClient("http://localhost:8983/solr/My_Core");

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "1234");
doc.addField("name", "A lovely summer holiday");

try {
    server.add(doc);
    server.commit();
    System.out.println("Document added!");
} catch (SolrServerException | IOException e) {
    e.printStackTrace();
}

I get the attached errors in the Solr log.

org.apache.solr.common.SolrException: Exception writing document id 
b3e01ada-d0f1-4ddf-ad6a-2828bfe619a3 to the index; possible analysis error.

at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:181)

at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:68)

at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)

at 
org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:335)

at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)

at 
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)

at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)

at 
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)

at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)

at 
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)

at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)

at 
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)

at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)

at 
org.apache.sol

Re: Solr 5.5 | Field boosting not working as per expectation

2016-06-21 Thread Alexandre Rafalovitch
Sounds strange.

Are you absolutely sure this insertion-time factor is not something
you are doing explicitly? I would add the debug=all parameter to see that
you don't have some unexpectedly complex biasing formula hiding in a
solrconfig.xml parameter.
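
For example (a sketch; the core name and search term are placeholders):

    http://localhost:8983/solr/mycore/select?q=florida&defType=edismax&debug=all

The debug section of the response shows the fully expanded query, so any
extra boost function coming from solrconfig.xml defaults would be visible
there.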

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 21 June 2016 at 16:53, Megha Bhandari  wrote:
> After further investigation we have found that the most recently inserted
> documents are getting higher priority and appearing at the top of the search
> results, ignoring the field boosting, when the documents were inserted a day apart.
>
> Is there a configuration to switch off this insertion-time factor? As per our
> understanding, field boosting should take precedence.
>
> Thanks in advance for any inputs you can give.
>
> -Original Message-
> From: Megha Bhandari [mailto:mbhanda...@sapient.com]
> Sent: Monday, June 20, 2016 4:37 PM
> To: solr-user@lucene.apache.org
> Subject: Solr 5.5 | Field boosting not working as per expectation
>
> Hi
>
> Problem statement: the metatag.description field has the highest boost, and
> documents with a match in this field should come first. However,
> Silver-Choice-5000-E.pdf comes before /contact-us/florida even though the
> search term matches more fields in the /contact-us/florida page. In
> Silver-Choice-5000-E.pdf, matches are found in the title and _text fields. In
> /contact-us/florida, matches are found in the metatag.description, title, h1,
> and _text fields.
>
> We have the following in solrconfig.xml
>
> <str name="defType">edismax</str>
> <str name="qf">
>   metatag.keywords^90.1 metatag.description^50.1 title^1.1
>   h1^1000.7 h2^700.6 h3^10.1 h4^5.4 h5^1.3 h6^1.2 _text_^1.0
> </str>
>
> When searching for Florida we get the following results.
> --
>
> {
>
> 
> 'id'=>'/content/dam/uhcdotcom/en/qa_workarea/Silver-Choice-5000-E.pdf',
>
> 'title'=>'Florida',
>
> 'metatag.description'=>'mental health',
>
> 'itemtype'=>'pdf',
>
> 'playerid'=>'',
>
> 'playerkey'=>'',
>
> 'metatag.topresultthumbnailalt'=>'Florida',
>
> 'lang'=>'en',
>
> 'metatag.hideininternalsearch'=>'false'},
>
>   {
>
> 'lang'=>'en',
>
> 'metatag.topresultthumbnailurl'=>'',
>
> 'id'=>'https://10.209.5.171/contact-us/florida',
>
> 'title'=>'Florida',
>
> 'metatag.topresultthumbnailalt'=>'',
>
> 'metatag.hideininternalsearch'=>'false',
>
> 'metatag.description'=>'Contact UnitedHealthcare in Florida.'}
>
> ---
> With following debug information
>
> '/content/dam/uhcdotcom/en/qa_workarea/Silver-Choice-5000-E.pdf'=>'
> 0.050655644 = max of:
>   0.050655644 = weight(title:florida in 0) [ClassicSimilarity], result of:
> 0.050655644 = score(doc=0,freq=1.0), product of:
>   0.0075930804 = queryWeight, product of:
> 1.1 = boost
> 6.6712904 = idf(docFreq=21, maxDocs=6389)
> 1.1381613E-7 = queryNorm
>   6.6712904 = fieldWeight in 0, product of:
> 1.0 = tf(freq=1.0), with freq of:
>   1.0 = termFreq=1.0
> 6.6712904 = idf(docFreq=21, maxDocs=6389)
> 1.0 = fieldNorm(doc=0)
>   3.462133E-7 = weight(_text_:florida in 0) [ClassicSimilarity], result of:
> 3.462133E-7 = score(doc=0,freq=2.0), product of:
>   4.222856E-7 = queryWeight, product of:
> 3.710244 = idf(docFreq=424, maxDocs=6389)
> 1.1381613E-7 = queryNorm
>   0.8198558 = fieldWeight in 0, product of:
> 1.4142135 = tf(freq=2.0), with freq of:
>   2.0 = termFreq=2.0
> 3.710244 = idf(docFreq=424, maxDocs=6389)
> 0.15625 = fieldNorm(doc=0)
> ',
>   'https://10.209.5.171/contact-us/florida'=>'
> 0.02968075 = max of:
>   0.0011872416 = weight(title:florida in 380) [ClassicSimilarity], result of:
> 0.0011872416 = score(doc=380,freq=1.0), product of:
>   0.0075930804 = queryWeight, product of:
> 1.1 = boost
> 6.6712904 = idf(docFreq=21, maxDocs=6389)
> 1.1381613E-7 = queryNorm
>   0.15635836 = fieldWeight in 380, product of:
> 1.0 = tf(freq=1.0), with freq of:
>   1.0 = termFreq=1.0
> 6.6712904 = idf(docFreq=21, maxDocs=6389)
> 0.0234375 = fieldNorm(doc=380)
>   5.724965E-9 = weight(_text_:florida in 380) [ClassicSimilarity], result of:
> 5.724965E-9 = score(doc=380,freq=14.0), product of:
>   4.222856E-7 = queryWeight, product of:
> 3.710244 = idf(docFreq=424, maxDocs=6389)
> 1.1381613E-7 = queryNorm
>   0.013557092 = fieldWeight in 380, product of:
> 3.7416575 = tf(freq=14.0), with freq of:
>   14.0 = termFreq=14.0
> 3.710244 = idf(docFreq=424, maxDocs=6389)
> 9.765625E-4 = fieldNorm(doc=380)
>   8.445298E-5 = weight(h1:florida in 380) [ClassicSimilarity], result of:
>