date:20160510

Thank you, Yes i am aware that surround with quotes will result in match for
space but i am trying to match word based on input which cant be controlled. 
I need to search solr for %rek Dr%  and return all result which has "rek Dr"
without qoutes.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-search-in-solr-for-words-like-rek-Dr-tp4275854p4276027.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Transforming SolrDocument to SolrInputDocument in Solr 6.0

2016-05-10 Thread Alexandre Rafalovitch

Not sure if that's useful, but the samples that ship with Solr show how to
transform Solr XML output into Solr Update XML format using XSLT
post-processing.

Regards,
   Alex.


Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/

On 11 May 2016 at 01:36, Stephan Schubert  wrote:

> In Solr 6.0 the method ClientUtils.toSolrInputDocument() was removed
> (deprecated since 5.5.1, see
> https://issues.apache.org/jira/browse/SOLR-8339). What is the best way
> now to transform a SolrDocument into a SolrInputDocument?
>
> Mit freundlichen Grüßen / Best regards
>
> Stephan Schubert
> Senior Web Application Engineer  |   IT Engineering Information Oriented
> Applications
>
>
>
> SICK AG  |  Erwin-Sick-Str. 1  |  79183 Waldkirch  |  Germany
> Phone +49 7681 202-3751  |  stephan.schub...@sick.de  |
> http://www.sick.de
> 
> __
>
> SICK AG  |   Sitz: Waldkirch i. Br.  |   Handelsregister: Freiburg i. Br.
> HRB 280355
> Vorstand: Dr. Robert Bauer (Vorsitzender)  |  Reinhard Bösl  |  Dr. Mats
> Gökstorp  |  Dr. Martin Krämer  |  Markus Vatter
> Aufsichtsrat: Gisela Sick (Ehrenvorsitzende)  |  Klaus M. Bukenberger
> (Vorsitzender)

How do we generate SHA256 password for Authentication

2016-05-10 Thread Shamik Bandopadhyay

Hi,

  I'm trying to setup Authentication and Role-based authorization in Solr
5.5. Beside "Solr" user from example, I've created another user "dev". I've
used the following website to generate sha256 encoded password.

http://www.lorem-ipsum.co.uk/hasher.php

I've used password as "password" .

Here's my security.json

{
  "authentication": {
"blockUnknown": false,
"class": "solr.BasicAuthPlugin",
"credentials": {
  "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0=
Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c=",
  "dev":"
5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8",
}
  },
  "authorization": {
"class": "solr.RuleBasedAuthorizationPlugin",
"permissions": [
  {
"name": "security-edit",
"role": "admin"
  },
  {
"name": "schema-edit",
"role": "admin"
  },
  {
"name": "config-edit",
"role": "admin"
  },
  {
"name": "collection-admin-edit",
"role": "admin"
  },
  {
"name": "all-admin",
"collection": null,
"path": "/*",
"role": "adminAllRole"
  },
  {
"name": "all-core-handlers",
"path": "/*",
"role": "adminAllHandler"
  },
  {
"name": "update",
"role": "updateRole"
  },
  {
"name": "read",
"role": "readRole"
  },
  {
"name": "browse",
"collection": "gettingstarted",
"path": "/browse",
"role": "browseRole"
  },
  {
"name": "select",
"collection": "gettingstarted",
"path": "/select/*",
"role": "selectRole"
  }
],
"user-role": {
  "solr": [
"admin",
"adminAllRole",
"adminAllHandler",
"updateRole"
  ],
  "dev": [
"readRole"
  ]
}
  }
}

Here's what I'm doing.
1. I started Solr in Cloud mode "solr start -e cloud -noprompt"
2. zkcli.bat -zkhost localhost:9983 -cmd putfile /security.json
security.json
3. tried http://localhost:8983/solr/gettingstarted/browse , provided
dev/password but I'm getting the following exception:

[c:gettingstarted s:shard2 r:core_node3 x:gettingstarted_shard2_replica2]
org.apache.solr.servlet.HttpSolrCall; USER_REQUIRED auth header Basic
c29scjpTb2xyUm9ja3M= context : userPrincipal: [[principal: solr]] type:
[UNKNOWN], collections: [gettingstarted,], Path: [/browse] path : /browse
params :

Looks like I'm using the wrong way of generating the password.
solr/SolrRocks works as expected.

Also, sure what's wrong with the "readRole" . It doesn't seem to work when
I try with user "solr".

Any pointers will be appreciated.

-Thanks,
Shamik

Re: How to search in solr for words like %rek Dr%

2016-05-10 Thread Walter Underwood

That is going to be a very slow search in Solr.

But if you want to match space separated words, that is very easy and fast in 
Solr. Surround the phrase in quotes: “N Derek”.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On May 10, 2016, at 3:53 PM, Thrinadh Kuppili  wrote:
> 
> Thanks Nick, will look into it.
> 
> My main moto is to able to search like %xxx xxx% similar to database search
> of contians with.
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-search-in-solr-for-words-like-rek-Dr-tp4275854p4275970.html
> Sent from the Solr - User mailing list archive at Nabble.com.

Re: How to search in solr for words like %rek Dr%

Thanks Nick, will look into it.

My main moto is to able to search like %xxx xxx% similar to database search
of contians with.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-search-in-solr-for-words-like-rek-Dr-tp4275854p4275970.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Simulate doc linking via post filter cache check

2016-05-10 Thread tedsolr

Mikhail, that's an interesting idea. If a terms list could stand in for a
cache that may be helpful. What I don't fully see is how the search would
work. Building an explicit negative terms query with returned IDs doesn't
seem possible as that list would be in the millions. To drastically speed my
process up I need to stop updating the data docs and only update the marker
(linked) docs.

Starting with 0 terms indexed for field "doclist" the very first search is
easy:
- put all result IDs in the doclist
Second search must exclude results that are already represented in the
doclist field. How is that possible?

I should mention I do an explicit hard commit after running each saved
search, to prevent consecutive searches from overlapping. That is probably
costing me. I didn't know it was possible to do an explicit soft commit. How
do you do that with SolrJ (not by setting maxDocs=1 in the config I hope)?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Simulate-doc-linking-via-post-filter-cache-check-tp4275842p4275929.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Simulate doc linking via post filter cache check

2016-05-10 Thread Mikhail Khludnev

The problem description is really long, you know.
I'd attack the statement:

> Since it's not possible to do a RDBMS like search joining the 2
> doc types, I need to run the saved search: find docs where name=Johnson,
> then drop the docs that are not in a doclist.
>

And also, if you remove all markers or starting from empty collection, and
do softCommit after every add, you can use /terms (TermsComponent) as a
"cache of" inserted doclist_ids.

For me it seems more like transient cache for ETL process, this state makes
sense only for single load operation; and not a search engine concern,
really.

Also, you can think from the opposite side:
after you search for the first request: q=name:Johnson and add it result to
markers, the second request might be q=name:Jacobson -name:Johnson etc,
until you exceed maxBooleanClauses limit, that can be leveraged by another
meanings.

and also every request can append list of responded ids into the growing
list of negative terms query:
q=name:Jacobson -{terms f=ids v=$alreadyseen}&alreadyseen=2,4,6,8,...

or they might be joined from markers, if you can afford often softCommit,
etc there are plenty of approaches to keep your hair.


On Tue, May 10, 2016 at 6:44 PM, tedsolr  wrote:

> I'm pulling my hair out on this one - and there's not much of that to begin
> with. The problem I have is that updating 10M denormalized docs in a
> collection takes about 5 hours. Soon there will be collections with 100M
> docs and a 50 hour update cycle will not be acceptable. The process
> involves
> cleaning (deleting) the marker fields, querying the collection with user
> defined saved searches, then updating the marker fields in every matched
> doc. If I can normalize based on the searches the processing time should go
> way down: delete marker docs, query the collection with user defined saved
> searches, then insert marker docs. The time savings comes from 1) deleting
> and inserting docs is faster than updating docs, 2) the number of saved
> searches is at least 1000X less than the number of docs.
>
> A doc may have a couple hundred fields, but looks sorta like this:
> {"id":123_5677899","searchid":"34","name":"Johnson", ...}
>
> To normalize I would remove the searchid into a new doc:
> {"id":"S234","searchid":"34","doclist":["123_5677899","123_5677898",...]}
>
> The "link" is established by the doclist field which is multivalued and
> contains the ids from the real docs. All this is doable, the problem is
> that
> when users create saved searches they must only match docs that have not
> already been matched by another search. That's why there's only one doc
> "type" now - every matched doc has a marker (searchid) which makes the Solr
> search work. Since it's not possible to do a RDBMS like search joining the
> 2
> doc types, I need to run the saved search: find docs where name=Johnson,
> then drop the docs that are not in a doclist.
>
> So, maybe if I manage a custom cache of matched doc ids, I can check each
> returned id against the cache and drop the docs that are not in it. I think
> this could be done in a post filter. There will be a big memory hit to
> maintain this cache, but does this seem like a performant solution to my
> problem?
>
> Thanks!
> v5.2.1
> All collections are one shard with replication factor 2
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Simulate-doc-linking-via-post-filter-cache-check-tp4275842.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

Re: Facet ignoring repeated word

2016-05-10 Thread Toke Eskildsen

G, Rajesh  wrote:
> Thanks Toke. The issue I have is I cannot look for a specific word e.g. ddr
> in termfreq(%27name%27,%20%27ddr%27). I have to find count of all words
> and their sum

Is that really the case? As your field is a comment field, your word cloud 
could easily contain tens or hundreds of thousands of words. That is pretty 
hard to display. Normally a word cloud consists of a small amount of words, 
just as seen in the example you link to. The point of using facet + stats is 
that facets gives you a rough list and stats gives you the real count.

If a usable word cloud consists of 50 words, you could use something like 
facet.limit=200 and feed those to your stats-request, then only use the top-50 
from there. I know that it does not guarantee that the words are the correct 
ones, but you can experiment with the facet.limit until you get a proper 
speed/accurracy trade-off.

- Toke Eskildsen

Re: Re: Transforming SolrDocument to SolrInputDocument in Solr 6.0

NP, Having to dive into the patch is kind of arcane...

On Tue, May 10, 2016 at 8:54 AM, Stephan Schubert 
wrote:

> Ouch... thanks a lot ;)
>
>
> Mit freundlichen Grüßen / Best regards
>
> Stephan Schubert
> Senior Web Application Engineer  |   IT Engineering Information Oriented
> Applications
>
>
>
> SICK AG  |  Erwin-Sick-Str. 1  |  79183 Waldkirch  |  Germany
> Phone +49 7681 202-3751  |  stephan.schub...@sick.de  |
> http://www.sick.de
> 
> __
>
> SICK AG  |   Sitz: Waldkirch i. Br.  |   Handelsregister: Freiburg i. Br.
> HRB 280355
> Vorstand: Dr. Robert Bauer (Vorsitzender)  |  Reinhard Bösl  |  Dr. Mats
> Gökstorp  |  Dr. Martin Krämer  |  Markus Vatter
> Aufsichtsrat: Gisela Sick (Ehrenvorsitzende)  |  Klaus M. Bukenberger
> (Vorsitzender)

Re: How to search in solr for words like %rek Dr%

2016-05-10 Thread Nick D

Don't really get what "Q= {!dismax qf=address} "rek Dr*" - It is not
allowed since perfix in Quotes is not allowed" means, why cant you use
exact phrase matching? Do you have some limitation of quoting as you are
specifically looking for an exact phrase I dont see why you wouldn't want
exact matching.

Anyways

You can look into using another type of tokenizer, my guess is you are
probably using the standard tokenizer or possibly the whitespace tokenizer.
You may want to try a different one a see what result you get. Also you
probably wont need to use the wildcards if you setup you gram sizes the way
you want.

The shingle factory can do stuff like (now my memory is a bit fuzzy on this
but I play with it in the admin page).

This is a sentence
shingle = 4
this_is_a_sentence

Combine that with your ngram factory and you can do something like. Mingram
= 4 max =50
this
this_i
this_is

this_is_a_sentence

his_i
his_is

his_is_a_sentence

etc.

Then apply the shingle factory on query to take something like

his is-> his_is and you will get that phrase back.

My personal favorite is just using edgengram and fixing something like but
the concept is the same with regular old ngram:

2001 N Drive Derek Fullerton

2
[32]
0
1
1
word
1
20
[32 30]
0
2
1
word
1
200
[32 30 30]
0
3
1
word
1
2001
[32 30 30 31]
0
4
1
word
1
n
[6e]
5
6
1
word
2
d
[64]
7
8
1
word
3
dr
[64 72]
7
9
1
word
3
dri
[64 72 69]
7
10
1
word
3
driv
[64 72 69 76]
7
11
1
word
3
drive
[64 72 69 76 65]
7
12
1
word
3
d
[64]
13
14
1
word
4
de
[64 65]
13
15
1
word
4
der
[64 65 72]
13
16
1
word
4
dere
[64 65 72 65]
13
17
1
word
4
derek
[64 65 72 65 6b]
13
18
1
word
4
f
[66]
19
20
1
word
5
fu
[66 75]
19
21
1
word
5
ful
[66 75 6c]
19
22
1
word
5
full
[66 75 6c 6c]
19
23
1
word
5
fulle
[66 75 6c 6c 65]
19
24
1
word
5
fuller
[66 75 6c 6c 65 72]
19
25
1
word
5
fullert
[66 75 6c 6c 65 72 74]
19
26
1
word
5
fullerto
[66 75 6c 6c 65 72 74 6f]
19
27
1
word
5
fullerton
[66 75 6c 6c 65 72 74 6f 6e]
19
28
1
word
5

Works great for a quick type-ahead field type.

Oh and by the way your ngram size is two small for _rek_ to be split up
from _derek_

Setting up a few different field types and playing with the analyzer in
admin page can give you a good idea about what both index and query time
results can be and with your tiny data set is the best way I can think of
to see instant results with your new field types.

Nick

On Tue, May 10, 2016 at 10:01 AM, Thrinadh Kuppili 
wrote:

> I have tried with  maxGramSize="12"/> and search using the Extended Dismax
>
> Q= {!dismax qf=address} rek Dr* - It did not work as expected since i am
> getting all the records which has rek, Dr .
>
> Q= {!dismax qf=address} "rek Dr*" - It is not allowed since perfix in
> Quotes
> is not allowed.
>
> Q= {!complexphrase inOrder=true}address:"rek dr*" - It did not work since
> it
> is searching for words starts with rek
>
> I am not aware of shingle factory as of now will try to use and findout how
> i can use.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-search-in-solr-for-words-like-rek-Dr-tp4275854p4275859.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

RE: Solr 5.4.1 Mergeindexes duplicate rows

2016-05-10 Thread Kalpana

As per Shawn's advice I deleted the index data using 
http://localhost:8983/solr/Sitecore_SharePoint/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&commit=true

and then stopped and started Solr and the duplicates were gone.

Will keep a watch!

Thanks much!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-5-4-1-Mergeindexes-duplicate-rows-tp4275153p4275869.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Re-indexing in SolRCloud while keeping the collection online -- Best practice?

2016-05-10 Thread Horváth Péter Gergely

Hi Erick,

Most of the time we have to do a full re-index: I do love your second idea,
I will take a look at the details of that. Thank you! :)

Cheers,
Peter

2016-05-10 17:10 GMT+02:00 Erick Erickson :

> Peter:
>
> Yeah, that would work, but there are a couple of alternatives:
> 1> If there's any way to know what the subset of docs that's
>  changed, just re-index _them_. The problem here is
>  picking up deletes. In the RDBMS case this is often done
>  by creating a trigger for deletes and then the last step
>  in your update is to remove the docs since the last time
>  you indexed using the deleted_docs table (or whatever).
>  This falls down if a> you require an instantaneous switch
>  from _all_ the old data to the new or b> you can't get a
>  list of deleted docs.
>
> 2> Use collection aliasing. The pattern is this: you have your
>  "Hot" collection (col1) serving queries that is pointed to
>  by alias "hot". You create a new collection (col2) and index
>  to it in the background. When done, use CREATEALIAS
>  to point "hot" to "col2". Now you can delete col1. There are
>  no restrictions on where these collections live, so this
>  allows you to move your collections around as you want. Plus
>  this keeps a better separation of old and new data...
>
> Best,
> Erick
>
> On Tue, May 10, 2016 at 4:32 AM, Horváth Péter Gergely
>  wrote:
> > Hi Everyone,
> >
> > I am wondering if there is any best practice regarding re-indexing
> > documents in SolrCloud 6.0.0 without making the data (or the underlying
> > collection) temporarily unavailable. Wiping all documents in a collection
> > and performing a full re-indexing is not a viable alternative for us.
> >
> > Say we had a massive Solr Cloud cluster with a number of separate nodes
> > that are used to host *multiple hundreds* of collections, with document
> > counts ranging from a couple of thousands to multiple (say up to 20)
> > millions of documents, each with 200-300 fields and a background batch
> > loader job that fetches data from a variety of source systems.
> >
> > We have to retain the cluster and ALL collections online all the time
> (365
> > x 24): We cannot allow queries to be blocked while data in a collection
> is
> > being updated and we cannot load everything in a single-shot jumbo commit
> > (the replication could overload the cluster).
> >
> > One solution I could imagine is storing an additional field "load
> > time-stamp" in all documents and the client (interactive query)
> application
> > extending all queries with an additional restriction, which requires
> > documents "load time-stamp" to be the latest known completed "load
> > time-stamp".
> >
> > This concept would work according to the following:
> > 1.) The batch job would simply start loading new documents, with the new
> > "load time-stamp". Existing documents would not be touched.
> > 2.) The client (interactive query) application would still use the old
> data
> > from the previous load (since all queries are restricted with the old
> "load
> > time-stamp")
> > 3.) The batch job would store the new "load time-stamp" as the one to be
> > used (e.g. in a separate collection etc.) -- after this, all queries
> would
> > return the most up-to-data documents
> > 4.) The batch job would purge all documents from the collection, where
> > the "load time-stamp" is not the same as the last one.
> >
> > This approach seems to be implementable, however, I definitely want to
> > avoid reinventing the wheel myself and wondering if there is any better
> > solution or built-in Solr Cloud feature to achieve the same or something
> > similar.
> >
> > Thanks,
> > Peter
>

Re: How to search in solr for words like %rek Dr%

I have tried with  and search using the Extended Dismax 

Q= {!dismax qf=address} rek Dr* - It did not work as expected since i am
getting all the records which has rek, Dr .

Q= {!dismax qf=address} "rek Dr*" - It is not allowed since perfix in Quotes
is not allowed.

Q= {!complexphrase inOrder=true}address:"rek dr*" - It did not work since it
is searching for words starts with rek 

I am not aware of shingle factory as of now will try to use and findout how
i can use.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-search-in-solr-for-words-like-rek-Dr-tp4275854p4275859.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: auto purge for embedded zookeeper

2016-05-10 Thread tedsolr

That makes perfect sense Shawn. I will clean up the old log data the old
fashioned way.

thanks, Ted



--
View this message in context: 
http://lucene.472066.n3.nabble.com/auto-purge-for-embedded-zookeeper-tp4275561p4275857.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Solr 5.4.1 Mergeindexes duplicate rows

2016-05-10 Thread Kalpana

Thanks for your reply!

Some questions:

Is Solr in cloud mode or running standalone?

Standalone

If you look at the core overview in the admin UI for these three cores,
can you tell me what Num Docs, Max Doc, and the index size is for all
three indexes?

SharePoint_All
Num Docs: 6211
Max Doc= 6211
Index: 29.82 MB

Sitecore_web_index
Num Docs: 5268
Max Doc= 5268
Index: 3.47 MB

Sitecore_SharePoint
Num Docs: 22958
Max Doc= 22958
Index: 78.84 MB

Are the schemas in these three indexes all using the same field name for
uniqueKey?

Yes
_uniqueid

Are you sure that you have only run the merge once?  Alternately, before
each merge attempt, you could entirely delete
$SOLR_HOME/Sitecore_Sharepoint/data and reload the core or restart Solr.

I am manually type the url and performing the merge. I stopped solr, deleted 
the files in the file system for index and then started Solr and ran the Url 
merge and still saw duplicates. I can try what you have recommended.

Thanks so much!.

From: Shawn Heisey-2 [via Lucene] 
[mailto:ml-node+s472066n4275813...@n3.nabble.com]
Sent: Tuesday, May 10, 2016 10:38 AM
To: Kalpana Sivanandan 
Subject: Re: Solr 5.4.1 Mergeindexes duplicate rows

On 5/9/2016 7:55 AM, Kalpana wrote:

> Can anyone help me with a merge. Currently I have the two cores already
> pulling data from SQL Table based on the query I set up.
>
> Solr is running
>
> I also have a third core set up with schema similar to the first two. and
> then I wrote this in the url and hit enter
> http://localhost:8983/solr/admin/cores?action=mergeindexes&core=Sitecore_SharePoint&srcCore=sitecore_web_index&srcCore=SharePoint_All
>
> I stop and start Solr and I see data with duplicates.
>
> Am I doing this right?

Some questions:

Is Solr in cloud mode or running standalone?

If you look at the core overview in the admin UI for these three cores,
can you tell me what Num Docs, Max Doc, and the index size is for all
three indexes?

Are the schemas in these three indexes all using the same field name for
uniqueKey?

Are you sure that you have only run the merge once?  Alternately, before
each merge attempt, you could entirely delete
$SOLR_HOME/Sitecore_Sharepoint/data and reload the core or restart Solr.

Thanks,
Shawn

If you reply to this email, your message will be added to the discussion below:
http://lucene.472066.n3.nabble.com/Solr-5-4-1-Mergeindexes-duplicate-rows-tp4275153p4275813.html
To unsubscribe from Solr 5.4.1 Mergeindexes duplicate rows, click 
here.
NAML

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-5-4-1-Mergeindexes-duplicate-rows-tp4275153p4275820.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: How to search in solr for words like %rek Dr%

2016-05-10 Thread Nick D

You can use a combination of ngram or edgengram fields and possibly the
shingle factory if you want to combine words. Also might want to have it as
exact text with no query sloop if the two words, even the partial text,
need to be right next to each other. Edge is great for left to right ngram
is great just to splitup by a size.  There are a number of tokenizers you
can try out.

Nick
On May 10, 2016 9:22 AM, "Thrinadh Kuppili"  wrote:

> I am trying to search a field named Address which has a space in it.
> Example :
> Address has the below values in it.
> 1. 2000 North Derek Dr Fullerton
> 2. 2011 N Derek Drive Fullerton
> 3. 2108 N Derek Drive Fullerton
> 4. 2100 N Derek Drive Fullerton
> 5. 2001 N Drive Derek Fullerton
>
> Search Query:- Derek Drive or rek Dr
> Expectation is it should return all  2,3,4 and it should not return 1 & 5 .
>
> Finally i am trying to find a word which can search similar to database
> search of %N Derek%
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-search-in-solr-for-words-like-rek-Dr-tp4275854.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

How to search in solr for words like %rek Dr%

I am trying to search a field named Address which has a space in it.
Example :
Address has the below values in it.
1. 2000 North Derek Dr Fullerton
2. 2011 N Derek Drive Fullerton 
3. 2108 N Derek Drive Fullerton
4. 2100 N Derek Drive Fullerton
5. 2001 N Drive Derek Fullerton

Search Query:- Derek Drive or rek Dr 
Expectation is it should return all  2,3,4 and it should not return 1 & 5 .

Finally i am trying to find a word which can search similar to database
search of %N Derek% 

 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-search-in-solr-for-words-like-rek-Dr-tp4275854.html
Sent from the Solr - User mailing list archive at Nabble.com.

Antwort: Re: Transforming SolrDocument to SolrInputDocument in Solr 6.0

2016-05-10 Thread Stephan Schubert

Ouch... thanks a lot ;)

Mit freundlichen Grüßen / Best regards

Stephan Schubert
Senior Web Application Engineer | IT Engineering
Information Oriented Applications

SICK AG | Erwin-Sick-Str. 1 | 79183 Waldkirch | Germany
Phone  +49 7681 202-3751 | Fax  | mailto:stephan.schub...@sick.de | 
http://www.sick.de
 

SICK AG  |  Sitz: Waldkirch i. Br.  |  Handelsregister: Freiburg i. Br. HRB 
280355 
Vorstand: Dr. Robert Bauer (Vorsitzender)  |  Reinhard Bösl  |  Dr. Mats 
Gökstorp  |  Dr. Martin Krämer  |  Markus Vatter
Aufsichtsrat: Gisela Sick (Ehrenvorsitzende)  |  Klaus M. Bukenberger 
(Vorsitzender)

Re: Transforming SolrDocument to SolrInputDocument in Solr 6.0

Hmm, looking at the patch I see:

DocumentObjectBinder binder = new DocumentObjectBinder();
.
.
.

SolrInputDocument solrInputDoc = binder.toSolrInputDocument(in);

But I confess I didn't actually try it.

On Tue, May 10, 2016 at 8:41 AM, Stephan Schubert
 wrote:
> In Solr 6.0 the method ClientUtils.toSolrInputDocument() was removed
> (deprecated since 5.5.1, see
> https://issues.apache.org/jira/browse/SOLR-8339). What is the best way now
> to transform a SolrDocument into a SolrInputDocument?
>
> Mit freundlichen Grüßen / Best regards
>
> Stephan Schubert
> Senior Web Application Engineer  |   IT Engineering Information Oriented
> Applications
>
>
>
> SICK AG  |  Erwin-Sick-Str. 1  |  79183 Waldkirch  |  Germany
> Phone +49 7681 202-3751  |  stephan.schub...@sick.de  |  http://www.sick.de
> __
>
> SICK AG  |   Sitz: Waldkirch i. Br.  |   Handelsregister: Freiburg i. Br.
> HRB 280355
> Vorstand: Dr. Robert Bauer (Vorsitzender)  |  Reinhard Bösl  |  Dr. Mats
> Gökstorp  |  Dr. Martin Krämer  |  Markus Vatter
> Aufsichtsrat: Gisela Sick (Ehrenvorsitzende)  |  Klaus M. Bukenberger
> (Vorsitzender)

RE: Solr edismax field boosting

2016-05-10 Thread Megha Bhandari

Hi Nick

We found the issue.

We had set the type of some of the fields as "string". After changing the 
fields to "text_general" boosting started working.
You were right , Solr was not finding the search term in those fields as String 
only supports exact match and doesn’t tokenise content.

Thanks

-Original Message-
From: Nick D [mailto:ndrake0...@gmail.com] 
Sent: Tuesday, May 10, 2016 9:05 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr edismax field boosting

Megha,

What are the field types for the fields you are trying to search through?
Grab a copy of the schema.xml and paste the relevant fields.

My guess is you have _text_ as some copy field for everything else and have
it stored=false correct? I am no seeing that field in the output above.
Also in you first post you show the /elevate requestHandler definition, is
that your default request handler or did you paste in the incorrect
handler.

The simple reason the boosting isn't working is Solr isnt finding a match
in that your query fields that you are applying a boost too it is only
finding the values in the _text_ field.

Also you probably should read up on BM25Similarity as this is the default
in the version of solr you are using.


Nick




On Tue, May 10, 2016 at 12:27 AM, Megha Bhandari 
wrote:

> Thanks Nick, got the response formatted. We are using Solr 5.5.
> Not able to understand why it is ignoring the boosts completely. What
> configuration is being missed? As you correctly pointed out it is only
> calculating based on the _text_ field.
>
> Query:
>
> http://10.203.101.42:8983/solr/uhc/select?defType=edismax&indent=on&mm=1&q=upendra&qf=h1
> ^9.0%20_text_^1.0&wt=ruby&debug=true
>
> Response with debug on:
> {
>   'responseHeader'=>{
> 'status'=>0,
> 'QTime'=>6,
> 'params'=>{
>   'mm'=>'1',
>   'q'=>'upendra',
>   'defType'=>'edismax',
>   'debug'=>'true',
>   'indent'=>'on',
>   'qf'=>'h1^9.0 _text_^1.0',
>   'wt'=>'ruby'}},
>   'response'=>{'numFound'=>6,'start'=>0,'maxScore'=>0.14641379,'docs'=>[
>   {
> 'h2'=>['Looks like your browser is a little out-of-date.'],
> 'h3'=>['Already a member?'],
> 'strtitle'=>['I m increasiing the the page title content Upendra
> Custon'],
> 'id'=>'http://localhost:4503/baseurl/upendra-custon.html',
> 'tstamp'=>'2016-05-10T05:50:22.316Z',
> 'metataghideininternalsearch'=>false,
> 'metatagtopresultthumbnailalt'=>',',
> 'segment'=>[20160510112017],
> 'digest'=>['fb988351afceb26a835fba68e2bcc33f'],
> 'boost'=>[1.4142135],
> 'lang'=>'en',
> 'metatagkeywords'=>[','],
> '_version_'=>1533919301006786560,
> 'host'=>'localhost',
> 'url'=>'http://localhost:4503/baseurl/upendra-custon.html',
> 'score'=>0.14641379},
>   {
> 'metatagdescription'=>['test'],
> 'h1'=>['Upendra'],
> 'h2'=>['Looks like your browser is a little out-of-date.'],
> 'h3'=>['Already a member?'],
> 'strtitle'=>['health care body content'],
> 'id'=>'
> http://localhost:4503/baseurl/upendra-custon/care-body-content.html',
> 'tstamp'=>'2016-05-10T05:50:22.269Z',
> 'metataghideininternalsearch'=>false,
> 'metatagtopresultthumbnailalt'=>',',
> 'segment'=>[20160510112017],
> 'digest'=>['dd4ef8879be2d4d3f28e24928e9b84c5'],
> 'boost'=>[1.4142135],
> 'lang'=>'en',
> 'metatagkeywords'=>[','],
> '_version_'=>1533919301071798272,
> 'host'=>'localhost',
> 'url'=>'
> http://localhost:4503/baseurl/upendra-custon/care-body-content.html',
> 'score'=>0.13738367},
>   {
> 'metatagdescription'=>['test'],
> 'h1'=>['health care keyword'],
> 'h2'=>['Looks like your browser is a little out-of-date.'],
> 'h3'=>['Already a member?'],
> 'strtitle'=>['health care keyword'],
> 'id'=>'
> http://localhost:4503/baseurl/upendra-custon/care-keyword.html',
> 'tstamp'=>'2016-05-10T05:50:22.300Z',
> 'metataghideininternalsearch'=>false,
> 'metatagtopresultthumbnailalt'=>',',
> 'segment'=>[20160510112017],
> 'digest'=>['4af11065d604bcec7aa4cbc1cf0fca59'],
> 'boost'=>[1.4142135],
> 'lang'=>'en',
> 'metatagkeywords'=>['upendra,upendra'],
> '_version_'=>1533919301088575488,
> 'host'=>'localhost',
> 'url'=>'
> http://localhost:4503/baseurl/upendra-custon/care-keyword.html',
> 'score'=>0.13738367},
>   {
> 'metatagdescription'=>['test'],
> 'h1'=>['Health care'],
> 'h2'=>['Looks like your browser is a little out-of-date.'],
> 'h3'=>['Already a member?'],
> 'strtitle'=>['This is the page Title Upendra, lets do the
> testing'],
> 'id'=>'http://localhost:4503/baseurl/upendra-custon/care.html',
> 'tstamp'=>'2016-05-10T05:50:22.518Z',
> 'metataghide

Simulate doc linking via post filter cache check

2016-05-10 Thread tedsolr

I'm pulling my hair out on this one - and there's not much of that to begin
with. The problem I have is that updating 10M denormalized docs in a
collection takes about 5 hours. Soon there will be collections with 100M
docs and a 50 hour update cycle will not be acceptable. The process involves
cleaning (deleting) the marker fields, querying the collection with user
defined saved searches, then updating the marker fields in every matched
doc. If I can normalize based on the searches the processing time should go
way down: delete marker docs, query the collection with user defined saved
searches, then insert marker docs. The time savings comes from 1) deleting
and inserting docs is faster than updating docs, 2) the number of saved
searches is at least 1000X less than the number of docs.

A doc may have a couple hundred fields, but looks sorta like this:
{"id":123_5677899","searchid":"34","name":"Johnson", ...}

To normalize I would remove the searchid into a new doc:
{"id":"S234","searchid":"34","doclist":["123_5677899","123_5677898",...]}

The "link" is established by the doclist field which is multivalued and
contains the ids from the real docs. All this is doable, the problem is that
when users create saved searches they must only match docs that have not
already been matched by another search. That's why there's only one doc
"type" now - every matched doc has a marker (searchid) which makes the Solr
search work. Since it's not possible to do a RDBMS like search joining the 2
doc types, I need to run the saved search: find docs where name=Johnson,
then drop the docs that are not in a doclist.

So, maybe if I manage a custom cache of matched doc ids, I can check each
returned id against the cache and drop the docs that are not in it. I think
this could be done in a post filter. There will be a big memory hit to
maintain this cache, but does this seem like a performant solution to my
problem?

Thanks!
v5.2.1
All collections are one shard with replication factor 2

--
View this message in context:
http://lucene.472066.n3.nabble.com/Simulate-doc-linking-via-post-filter-cache-check-tp4275842.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr 5.x bug with Service installation script?

2016-05-10 Thread A Laxmi

Hi Shawn -

You brought up a good point. This might be a possible reason. I'll test it
out. Thanks! My index (4.5g) usually takes about 15-20 secs to load.

One other observation - even though it says write.lock file in a specific
data directory path, when I look up the directory, I don't see any
write.lock file in there. It is really confusing.

AL

On Tue, May 10, 2016 at 10:37 AM, Shawn Heisey  wrote:

> On 5/9/2016 11:30 AM, A Laxmi wrote:
> > yes, I always shutdown both source and destination Solr before copying
> the
> > index over from one to another. Somehow the write.lock only happens when
> > Solr restarts from service script. If loads just fine when started
> manually.
>
> One possible problem:
>
> The bin/solr script (which is used by the init script) only waits for 5
> seconds for Solr to stop gracefully before killingit forcibly.  This can
> leave write.lock files behind.
>
> I thought it had increased to 30 seconds in a recent version and that it
> was possibly even configurable in solr.in.sh, but I just checked the
> 6.0.0 download.  It's still only 5 seconds, and the value is hard-coded
> in the script.  This is only enough time if you have a very small number
> of very small indexes.
>
> Thanks,
> Shawn
>
>

Transforming SolrDocument to SolrInputDocument in Solr 6.0

2016-05-10 Thread Stephan Schubert

In Solr 6.0 the method ClientUtils.toSolrInputDocument() was removed 
(deprecated since 5.5.1, see 
https://issues.apache.org/jira/browse/SOLR-8339). What is the best way now 
to transform a SolrDocument into a SolrInputDocument?
Mit freundlichen Grüßen / Best regards

Stephan Schubert
Senior Web Application Engineer | IT Engineering
Information Oriented Applications

SICK AG | Erwin-Sick-Str. 1 | 79183 Waldkirch | Germany
Phone  +49 7681 202-3751 | Fax  | mailto:stephan.schub...@sick.de | 
http://www.sick.de
 

SICK AG  |  Sitz: Waldkirch i. Br.  |  Handelsregister: Freiburg i. Br. HRB 
280355 
Vorstand: Dr. Robert Bauer (Vorsitzender)  |  Reinhard Bösl  |  Dr. Mats 
Gökstorp  |  Dr. Martin Krämer  |  Markus Vatter
Aufsichtsrat: Gisela Sick (Ehrenvorsitzende)  |  Klaus M. Bukenberger 
(Vorsitzender)

Re: Solr 5.x bug with Service installation script?

2016-05-10 Thread A Laxmi

Hi Erick - I used "sudo service solr stop" to shut it down.

On Tue, May 10, 2016 at 12:26 AM, Erick Erickson 
wrote:

> How do you shut down your Solrs? Any kind of un-graceful
> stopping (kill -9 is a favorite) may leave the lock file around.
>
> It can't be coming from nowhere, so my guess is that
> it's present in the source or destination before
> you do your copy...
>
> Best,
> Erick
>
> On Mon, May 9, 2016 at 10:30 AM, A Laxmi  wrote:
> > yes, I always shutdown both source and destination Solr before copying
> the
> > index over from one to another. Somehow the write.lock only happens when
> > Solr restarts from service script. If loads just fine when started
> manually.
> >
> > On Mon, May 9, 2016 at 1:20 PM, Abdel Belkasri 
> wrote:
> >
> >> Did you copy the core while solr is running? if yes, first shuown source
> >> and destination solr, copy intex to the other solr, then restat solr
> nodes.
> >> Lock files get written to the core while solr is running and doing
> indexing
> >> or searching, etc.
> >>
> >> On Mon, May 9, 2016 at 12:38 PM, A Laxmi 
> wrote:
> >>
> >> > Hi,
> >> >
> >> > I have installed Solr 5.3.1 using the Service Installation Script. I
> was
> >> > able to successfully start and stop Solr using service solr start/stop
> >> > commands and Solr loads up just fine.
> >> >
> >> > However, when I stop Solr service and copy an index of a core from one
> >> > server to another with same exact version of Solr and its
> corresponding
> >> > conf and restart the service, it complains about write.lock file when
> >> none
> >> > exists under the path that it specifies in the log.
> >> >
> >> > To validate whether the issue is with the data that is being copied or
> >> the
> >> > service script itself, I copied the collection directory with new
> index
> >> > into example-DIH directory and restarted Solr manually bin/solr start
> -e
> >> > dih -m 2g, it worked without any error. So, atleast this validates
> that
> >> > collection data is just fine and service script is creating a lock
> >> > everytime a new index is copied from another server though it has the
> >> same
> >> > exact Solr version.
> >> >
> >> > Did anyone experience the same? Any thoughts if this is a bug?
> >> >
> >> > Thanks!
> >> > AL
> >> >
> >>
> >>
> >>
> >> --
> >> Abdel K. Belkasri, PhD
> >>
>

Transforming SolrDocument to SolrInputDocument in Solr 6.0

2016-05-10 Thread Stephan Schubert

In Solr 6.0 the method ClientUtils.toSolrInputDocument() was removed 
(deprecated since 5.5.1, see 
https://issues.apache.org/jira/browse/SOLR-8339). What is the best way now 
to transform a SolrDocument into a SolrInputDocument?
Mit freundlichen Grüßen / Best regards

Stephan Schubert
Senior Web Application Engineer | IT Engineering
Information Oriented Applications

SICK AG | Erwin-Sick-Str. 1 | 79183 Waldkirch | Germany
Phone  +49 7681 202-3751 | Fax  | mailto:stephan.schub...@sick.de | 
http://www.sick.de
 

SICK AG  |  Sitz: Waldkirch i. Br.  |  Handelsregister: Freiburg i. Br. HRB 
280355 
Vorstand: Dr. Robert Bauer (Vorsitzender)  |  Reinhard Bösl  |  Dr. Mats 
Gökstorp  |  Dr. Martin Krämer  |  Markus Vatter
Aufsichtsrat: Gisela Sick (Ehrenvorsitzende)  |  Klaus M. Bukenberger 
(Vorsitzender)

Re: Unable to achieve boosting into solr 5.5

Please review: http://wiki.apache.org/solr/UsingMailingLists

You haven't shown us what you _do_ get, what you expect, why
you think there's an error. Adding &debug=query will show you
the parsed query and may give you a clue.

Best,
Erick

On Mon, May 9, 2016 at 11:02 PM, Upendra Kumar Baliyan
 wrote:
> Hi,
>
> We are using solr 5.5, but could not achieve field boosting. We are not 
> getting the result as per the below configuration.
>
>
>
> Below is configuration in solrconfig.xml for 
>
>
> 
>
> 
>
> edismax
>
>
>
>   metatag.keywords^10.0 metatag.description^9.0 h1^7.0 h2^6.0 h3^5.0 
> h4^4.0 _text_^1.0 id^0.5
>
>
>
>100%
>
>*:*
>
>10
>
>*,score
>
>   explicit
>
> 
>
>
>
> Any help ?
>
>
>
> Regards
>
> Upendra Kumar Baliyan
>

Re: Solr edismax field boosting

2016-05-10 Thread Nick D

Megha,

What are the field types for the fields you are trying to search through?
Grab a copy of the schema.xml and paste the relevant fields.

My guess is you have _text_ as some copy field for everything else and have
it stored=false correct? I am no seeing that field in the output above.
Also in you first post you show the /elevate requestHandler definition, is
that your default request handler or did you paste in the incorrect
handler.

The simple reason the boosting isn't working is Solr isnt finding a match
in that your query fields that you are applying a boost too it is only
finding the values in the _text_ field.

Also you probably should read up on BM25Similarity as this is the default
in the version of solr you are using.


Nick




On Tue, May 10, 2016 at 12:27 AM, Megha Bhandari 
wrote:

> Thanks Nick, got the response formatted. We are using Solr 5.5.
> Not able to understand why it is ignoring the boosts completely. What
> configuration is being missed? As you correctly pointed out it is only
> calculating based on the _text_ field.
>
> Query:
>
> http://10.203.101.42:8983/solr/uhc/select?defType=edismax&indent=on&mm=1&q=upendra&qf=h1
> ^9.0%20_text_^1.0&wt=ruby&debug=true
>
> Response with debug on:
> {
>   'responseHeader'=>{
> 'status'=>0,
> 'QTime'=>6,
> 'params'=>{
>   'mm'=>'1',
>   'q'=>'upendra',
>   'defType'=>'edismax',
>   'debug'=>'true',
>   'indent'=>'on',
>   'qf'=>'h1^9.0 _text_^1.0',
>   'wt'=>'ruby'}},
>   'response'=>{'numFound'=>6,'start'=>0,'maxScore'=>0.14641379,'docs'=>[
>   {
> 'h2'=>['Looks like your browser is a little out-of-date.'],
> 'h3'=>['Already a member?'],
> 'strtitle'=>['I m increasiing the the page title content Upendra
> Custon'],
> 'id'=>'http://localhost:4503/baseurl/upendra-custon.html',
> 'tstamp'=>'2016-05-10T05:50:22.316Z',
> 'metataghideininternalsearch'=>false,
> 'metatagtopresultthumbnailalt'=>',',
> 'segment'=>[20160510112017],
> 'digest'=>['fb988351afceb26a835fba68e2bcc33f'],
> 'boost'=>[1.4142135],
> 'lang'=>'en',
> 'metatagkeywords'=>[','],
> '_version_'=>1533919301006786560,
> 'host'=>'localhost',
> 'url'=>'http://localhost:4503/baseurl/upendra-custon.html',
> 'score'=>0.14641379},
>   {
> 'metatagdescription'=>['test'],
> 'h1'=>['Upendra'],
> 'h2'=>['Looks like your browser is a little out-of-date.'],
> 'h3'=>['Already a member?'],
> 'strtitle'=>['health care body content'],
> 'id'=>'
> http://localhost:4503/baseurl/upendra-custon/care-body-content.html',
> 'tstamp'=>'2016-05-10T05:50:22.269Z',
> 'metataghideininternalsearch'=>false,
> 'metatagtopresultthumbnailalt'=>',',
> 'segment'=>[20160510112017],
> 'digest'=>['dd4ef8879be2d4d3f28e24928e9b84c5'],
> 'boost'=>[1.4142135],
> 'lang'=>'en',
> 'metatagkeywords'=>[','],
> '_version_'=>1533919301071798272,
> 'host'=>'localhost',
> 'url'=>'
> http://localhost:4503/baseurl/upendra-custon/care-body-content.html',
> 'score'=>0.13738367},
>   {
> 'metatagdescription'=>['test'],
> 'h1'=>['health care keyword'],
> 'h2'=>['Looks like your browser is a little out-of-date.'],
> 'h3'=>['Already a member?'],
> 'strtitle'=>['health care keyword'],
> 'id'=>'
> http://localhost:4503/baseurl/upendra-custon/care-keyword.html',
> 'tstamp'=>'2016-05-10T05:50:22.300Z',
> 'metataghideininternalsearch'=>false,
> 'metatagtopresultthumbnailalt'=>',',
> 'segment'=>[20160510112017],
> 'digest'=>['4af11065d604bcec7aa4cbc1cf0fca59'],
> 'boost'=>[1.4142135],
> 'lang'=>'en',
> 'metatagkeywords'=>['upendra,upendra'],
> '_version_'=>1533919301088575488,
> 'host'=>'localhost',
> 'url'=>'
> http://localhost:4503/baseurl/upendra-custon/care-keyword.html',
> 'score'=>0.13738367},
>   {
> 'metatagdescription'=>['test'],
> 'h1'=>['Health care'],
> 'h2'=>['Looks like your browser is a little out-of-date.'],
> 'h3'=>['Already a member?'],
> 'strtitle'=>['This is the page Title Upendra, lets do the
> testing'],
> 'id'=>'http://localhost:4503/baseurl/upendra-custon/care.html',
> 'tstamp'=>'2016-05-10T05:50:22.518Z',
> 'metataghideininternalsearch'=>false,
> 'metatagtopresultthumbnailalt'=>',,,',
> 'segment'=>[20160510112017],
> 'digest'=>['711a059f2a05a6c03e59d490cd7008ff'],
> 'boost'=>[1.4142135],
> 'lang'=>'en',
> 'metatagkeywords'=>[',,,'],
> '_version_'=>1533919301088575489,
> 'host'=>'localhost',
> 'url'=>'http://localhost:4503/baseurl/upendra-custon/care.html',
> 'score'=>0.13286635},
>   {
> 'metatagdescription

Re: Filter queries & caching

No. Please re-read and use the admin plugins/stats page to examine for yourself.

1)  fq=filter(fromfield:[* TO NOW/DAY+1DAY]&& tofield:[NOW/DAY-7DAY TO *])
&& fq=type:abc

&& is totally unnecessary when using fq clauses, there is already an
implicit AND.
I'm not even sure what the above does, I don't quite know off the top of my head
how that would be parsed.

fq=filter() is unnecessary and in fact (apparently) uses extra
filterCache entries
to no purpose.

I'm guessing you're thinking of something like this

q=*:*&fq=(fromfield:[* TO NOW/DAY+1DAY] && tofield:[NOW/DAY-7DAY TO
*])&fq=type:abc

Would use 2 filterCache entries,

or maybe this: (notice this is "q=" not "fq=")

q=filter(fromfield:[* TO NOW/DAY+1DAY] && tofield:[NOW/DAY-7DAY TO *])
&& filter(type:abc)

would use two filter queries as well. Same thing essentially.

2) fq= fromfield:[* TO NOW/DAY+1DAY]&& fq=tofield:[NOW/DAY-7DAY TO *]) &&
fq=type:abc

This is syntactically incorrect, I assume you meant (added left paren
and again the && is unnecessary):

q=*:*&fq=(fromfield:[* TO NOW/DAY+1DAY] && fq=tofield:[NOW/DAY-7DAY TO *])&
fq=type:abc

As above the rewritten form would use two filterCache entries.

Best,
Erick

On Mon, May 9, 2016 at 11:03 PM, Jay Potharaju  wrote:
> Thanks for the explanation Eric.
>
> So that I understand this clearly
>
>
> 1)  fq=filter(fromfield:[* TO NOW/DAY+1DAY]&& tofield:[NOW/DAY-7DAY TO *])
> && fq=type:abc
> 2) fq= fromfield:[* TO NOW/DAY+1DAY]&& fq=tofield:[NOW/DAY-7DAY TO *]) &&
> fq=type:abc
>
> Using 1) would benefit from having 2 separate filter caches instead of 3
> slots in the cache. But in general both would be using the filter cache.
> And secondly it would  be more useful to use filter() in a scenario like
> above(mentioned in your email).
> Thanks
>
>
>
>
> On Mon, May 9, 2016 at 9:43 PM, Erick Erickson 
> wrote:
>
>> You're confusing a query clause with fq when thinking about filter() I
>> think.
>>
>> Essentially they don't need to be used together, i.e.
>>
>> q=myclause AND filter(field:value)
>>
>> is identical to
>>
>> q=myclause&fq=field:value
>>
>> both in docs returned and filterCache usage.
>>
>> q=myclause&filter(fq=field:value)
>>
>> actually uses two filterCache entries, so is probably not what you want to
>> use.
>>
>> the filter() syntax attached to a q clause (not an fq clause) is meant
>> to allow you to get speedups
>> you want to use compound clauses without having every combination be
>> separate filterCache entries.
>>
>> Consider the following:
>> fq=A OR B
>> fq=A AND B
>> fq=A
>> fq=B
>>
>> These would require 4 filterCache entries.
>>
>> q=filter(A) OR filter(B)
>> q=filter(A) AND filter(B)
>> q=filter(A)
>> q=filter(B)
>>
>> would only require two. Yet all of them would be satisfied only by
>> looking at the filterCache.
>>
>> Aside from the example immediately above, which one you use is largely
>> a matter of taste.
>>
>> Best,
>> Erick
>>
>> On Mon, May 9, 2016 at 12:47 PM, Jay Potharaju 
>> wrote:
>> > Thanks Ahmet...but I am not still clear how is adding filter() option
>> > better or is it the same as filtercache?
>> >
>> > My question is below.
>> >
>> > "As mentioned above adding filter() will add the filter query to the
>> cache.
>> > This would mean that results are fetched from cache instead of running n
>> > number of filter queries  in parallel.
>> > Is it necessary to use the filter() option? I was under the impression
>> that
>> > all filter queries will get added to the "filtercache". What is the
>> > advantage of using filter()?"
>> >
>> > Thanks
>> >
>> > On Sun, May 8, 2016 at 6:30 PM, Ahmet Arslan 
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> As I understand it useful incase you use an OR operator between two
>> >> restricting clauses.
>> >> Recall that multiple fq means implicit AND.
>> >>
>> >> ahmet
>> >>
>> >>
>> >>
>> >> On Monday, May 9, 2016 4:02 AM, Jay Potharaju 
>> >> wrote:
>> >> As mentioned above adding filter() will add the filter query to the
>> cache.
>> >> This would mean that results are fetched from cache instead of running n
>> >> number of filter queries  in parallel.
>> >> Is it necessary to use the filter() option? I was under the impression
>> that
>> >> all filter queries will get added to the "filtercache". What is the
>> >> advantage of using filter()?
>> >>
>> >> *From
>> >> doc:
>> >>
>> https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig
>> >> <
>> >>
>> https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig
>> >> >*
>> >> This cache is used by SolrIndexSearcher for filters (DocSets) for
>> unordered
>> >> sets of all documents that match a query. The numeric attributes control
>> >> the number of entries in the cache.
>> >> Solr uses the filterCache to cache results of queries that use the fq
>> >> search parameter. Subsequent queries using the same parameter setting
>> >> result in cache hits and rapid returns of results. See Searching for a
>> >> detailed discussion of the fq par

Re: solrcloud performance problem

On 5/9/2016 11:42 PM, lltvw wrote:
> By using jps command double check the parms used to start solr, i found that 
> the max  heap size already set to 10G. So I made a big mistake yesterday.
>
> But by using solr admin UI, I select the collection with performance problem, 
> in the overview page I find that the heap memory is about 8M. What is wrong.
>
> Every time I search difference characters, QTime from response header always 
> greater than 300ms. If I search again, cause i can hit cache, the response 
> time could become to about 30ms.

When my queries hit the cache, they only take a few milliseconds.  30
milliseconds for a cached query seems VERY slow.

Can you open the dashboard in the admin UI, make it large enough to see
everything, take a screenshot of the whole page, and included a URL
where that screenshot can be viewed?  I do not need to see the whole
browser window, just the whole dashboard.  Here's an example of what I
am looking for:

https://www.dropbox.com/s/ixu8dr954mst0c4/dashboard-just-page.png?dl=0

In my example, you can't see all of the JVM Args in the screenshot --
there are a lot more of them, and they wouldn't fit in the window even
when maximized.  So if your screenshot doesn't include all of them, you
probably should copy those as text and include them in your reply --
like this:

-DSTOP.KEY=solrrocks
-DSTOP.PORT=7982
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.port=18982
-Dcom.sun.management.jmxremote.rmi.port=18982
-Dcom.sun.management.jmxremote.ssl=false
-Djetty.home=/opt/solr5/server
-Djetty.port=8982
-Dlog4j.configuration=file:/index/solr5/log4j.properties
-Dsolr.install.dir=/opt/solr5
-Dsolr.solr.home=/index/solr5/data
-Duser.timezone=UTC
-XX:+CMSParallelRemarkEnabled
-XX:+CMSScavengeBeforeRemark
-XX:+ParallelRefProcEnabled
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:CMSInitiatingOccupancyFraction=70
-XX:CMSMaxAbortablePrecleanTime=2000
-XX:MaxTenuringThreshold=8
-XX:NewRatio=3
-XX:OnOutOfMemoryError=/opt/solr5/bin/oom_solr.sh 8982 /index/solr5/logs
-XX:PretenureSizeThreshold=64m
-XX:SurvivorRatio=4
-XX:TargetSurvivorRatio=90
-Xloggc:/index/solr5/logs/solr_gc.log
-Xms22g
-Xmx22g
-verbose:gc

How are you starting Solr?  With Solr 4.x, there are limitless numbers
of ways to install and start Solr, because it is released as a webapp
.war file.  When 5.0 was released, that was reduced to only a few
supported options.

Thanks,
Shawn

Re: query action with wrong result size zero

2016-05-10 Thread Mikhail Khludnev

Usually such issues are troubleshooted with: Solr admin: schema browser and
analysis. Also, you might need to check debugQuery=true output and perhaps
use explainOther param.
05 мая 2016 г. 18:58 пользователь "mixiangliu" <852262...@qq.com> написал:


i found a strange thing  with solr query，when i set the value of query
field like "brand:amd"，the  size of query result is zero,but the real data
is not zero，can  some body tell me why，thank you very much！！
my english is not very good，wish some body understand my words!

Re: Re-indexing in SolRCloud while keeping the collection online -- Best practice?

Peter:

Yeah, that would work, but there are a couple of alternatives:
1> If there's any way to know what the subset of docs that's
 changed, just re-index _them_. The problem here is
 picking up deletes. In the RDBMS case this is often done
 by creating a trigger for deletes and then the last step
 in your update is to remove the docs since the last time
 you indexed using the deleted_docs table (or whatever).
 This falls down if a> you require an instantaneous switch
 from _all_ the old data to the new or b> you can't get a
 list of deleted docs.

2> Use collection aliasing. The pattern is this: you have your
 "Hot" collection (col1) serving queries that is pointed to
 by alias "hot". You create a new collection (col2) and index
 to it in the background. When done, use CREATEALIAS
 to point "hot" to "col2". Now you can delete col1. There are
 no restrictions on where these collections live, so this
 allows you to move your collections around as you want. Plus
 this keeps a better separation of old and new data...

Best,
Erick

On Tue, May 10, 2016 at 4:32 AM, Horváth Péter Gergely
 wrote:
> Hi Everyone,
>
> I am wondering if there is any best practice regarding re-indexing
> documents in SolrCloud 6.0.0 without making the data (or the underlying
> collection) temporarily unavailable. Wiping all documents in a collection
> and performing a full re-indexing is not a viable alternative for us.
>
> Say we had a massive Solr Cloud cluster with a number of separate nodes
> that are used to host *multiple hundreds* of collections, with document
> counts ranging from a couple of thousands to multiple (say up to 20)
> millions of documents, each with 200-300 fields and a background batch
> loader job that fetches data from a variety of source systems.
>
> We have to retain the cluster and ALL collections online all the time (365
> x 24): We cannot allow queries to be blocked while data in a collection is
> being updated and we cannot load everything in a single-shot jumbo commit
> (the replication could overload the cluster).
>
> One solution I could imagine is storing an additional field "load
> time-stamp" in all documents and the client (interactive query) application
> extending all queries with an additional restriction, which requires
> documents "load time-stamp" to be the latest known completed "load
> time-stamp".
>
> This concept would work according to the following:
> 1.) The batch job would simply start loading new documents, with the new
> "load time-stamp". Existing documents would not be touched.
> 2.) The client (interactive query) application would still use the old data
> from the previous load (since all queries are restricted with the old "load
> time-stamp")
> 3.) The batch job would store the new "load time-stamp" as the one to be
> used (e.g. in a separate collection etc.) -- after this, all queries would
> return the most up-to-data documents
> 4.) The batch job would purge all documents from the collection, where
> the "load time-stamp" is not the same as the last one.
>
> This approach seems to be implementable, however, I definitely want to
> avoid reinventing the wheel myself and wondering if there is any better
> solution or built-in Solr Cloud feature to achieve the same or something
> similar.
>
> Thanks,
> Peter

How to restrict outside IP access in Solr with internal jetty server

2016-05-10 Thread Mugeesh Husain


I am using solr 5.3 version with inbuilt jetty server.

I am looking for a proxy kind of thing which i could prevent outside User
access for all of the link, I would give only access select and select core
url accessibility other than this should be not open.

Please give me some suggestion.

Thanks
 Mugeesh




--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-restrict-outside-IP-access-in-Solr-with-internal-jetty-server-tp4275822.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Replicate Between sites

bq: Why not use classic replication between one node in the cluster and another
node in the other cluster.

First of all in SolrCloud I'm pretty sure you can't do that, where
"that" is have
classic replication operate where the source was one SolrCloud and
the destination is another SolrCloud cluster. There's all the
replication logic between leaders and followers that you'd be interfering
with.

Step back for a minute though. Even if you set that up you'd be
replicating your index across your admittedly slow DC/DC
connection. The merging process creates new segments from various
subsets of current segments, and the new segment would be
copied to the backup DC. In some cases the entire index will be merged
into a single segment (admittedly rarely). It seems far more bandwidth-efficient
to just index the raw docs to each DC from the client.

Best,
Erick

On Tue, May 10, 2016 at 6:52 AM, Abdel Belkasri  wrote:
> Erick,
>
> That's not what I was going for. No code porting. I was thinking this:
> Why not use classic replication between one node in the cluster and another
> node in the other cluster?
> something along this line.
>
> Thanks,
> --Abdel.
>
> On Tue, May 10, 2016 at 12:21 AM, Erick Erickson 
> wrote:
>
>> bq: How similar thing could be done in 4.9.1?
>>
>> That's not going to happen. More precisely,
>> there is zero chance that anyone will take on that
>> work unless it's a custom one-off that you
>> hire done or develop internally. And even
>> if someone took this on, it'd never be officially
>> released.
>>
>> IOW, if you want to try backporting it on your own,
>> have at it but that'll be completely unsupported.
>>
>> One thing people have done is create two
>> independent clusters, complete to separate ZK
>> ensembles and have the indexing client send
>> updates to both DCs. At that point it also makes
>> sense to have them both serve queries.
>>
>> Another choice is to have your system-of-record
>> replicated to both DCs, and have the indexing
>> process run in both DCs from the local copy of
>> the system-of-record to the local Solr
>> clusters independently of each other.
>>
>> Best,
>> Erick
>>
>> On Mon, May 9, 2016 at 12:31 PM, Abdel Belkasri 
>> wrote:
>> > Hi Alex,
>> >
>> > just started reading about CDCR, looks very promissing. Is this only in
>> > 6.0? our PROD server are running 4.9.1 and we cannot upgrade just yet.
>> How
>> > similar thing could be done in 4.9.1?
>> >
>> > Thanks,
>> > --Abdel
>> >
>> > On Mon, May 9, 2016 at 2:59 PM, Alexandre Rafalovitch <
>> arafa...@gmail.com>
>> > wrote:
>> >
>> >> Have you looked at Cross Data Center replication that's the new big
>> >> feature in Solr 6.0?
>> >>
>> >> Regards,
>> >>Alex.
>> >> 
>> >> Newsletter and resources for Solr beginners and intermediates:
>> >> http://www.solr-start.com/
>> >>
>> >>
>> >> On 10 May 2016 at 02:13, Abdel Belkasri  wrote:
>> >> > Hi there,
>> >> >
>> >> > we have the main site setup as follows:
>> >> > solrCould:
>> >> > App --> smart Client (solrj) --> ensemble of zookeeper --> SolrCloud
>> Noes
>> >> > (with slice/shard/recplica)
>> >> > Works fine.
>> >> >
>> >> > On the DR site we have a mirror setup, how can we keep the two site in
>> >> > sync, so that if something happened we point the app to DR and get
>> back
>> >> up
>> >> > and running?
>> >> >
>> >> > Note: making zookeeper span the two sites is not an option because of
>> >> > network latency.
>> >> >
>> >> > We are looking for replication (kind of master-slave that exists in
>> Solr
>> >> > classic)...how that is achieved in SolrCloud?
>> >> >
>> >> > Thanks,
>> >> > --Abdel.
>> >>
>> >
>> >
>> >
>> > --
>> > Abdel K. Belkasri, PhD
>>
>
>
>
> --
> Abdel K. Belkasri, PhD

Using Ping Request Handler in SolrCloud within a load balancer

2016-05-10 Thread Sandy Foley

A couple of questions ...
We've upconfig'd the ping request handler to ZooKeeper within the 
solrconfig.xml.  SolrCloud and ZooKeeper are working fine.
I understand that the /solr/admin/ping command is for a ping on its local 
server only (not from a remote machine).  This is working.I also understand 
that /solr/[core]/admin/ping can be used from a load balancer to ping a 
particular core on a server. This is working also.
Question #1:Is there a SINGLE command that can be issued to each server from a 
load balancer to check the ping status of each server?
Question #2:When running /solr/admin/ping from the load balancer to each Solr 
node, one of the three nodes returns a status ok.  It's the same node every 
time; it's the first node that we set up of the 3 (which is not always the 
leader). The zkcli upconfig command has always been issued from this first 
node.Out of curiosity, if this command is for local ping only, why does this 
return status ok on one node (issued from the load balancer) and not the other 
nodes?

Configuration:
WindowsTomcat 8.0SolrCloud 4.10.3 (3 nodes)External ensemble ZooKeeper 3.4.6 - 
3 servers
Thank you.

Re-indexing in SolRCloud while keeping the collection online -- Best practice?

2016-05-10 Thread Horváth Péter Gergely

Hi Everyone,

I am wondering if there is any best practice regarding re-indexing
documents in SolrCloud 6.0.0 without making the data (or the underlying
collection) temporarily unavailable. Wiping all documents in a collection
and performing a full re-indexing is not a viable alternative for us.

Say we had a massive Solr Cloud cluster with a number of separate nodes
that are used to host *multiple hundreds* of collections, with document
counts ranging from a couple of thousands to multiple (say up to 20)
millions of documents, each with 200-300 fields and a background batch
loader job that fetches data from a variety of source systems.

We have to retain the cluster and ALL collections online all the time (365
x 24): We cannot allow queries to be blocked while data in a collection is
being updated and we cannot load everything in a single-shot jumbo commit
(the replication could overload the cluster).

One solution I could imagine is storing an additional field "load
time-stamp" in all documents and the client (interactive query) application
extending all queries with an additional restriction, which requires
documents "load time-stamp" to be the latest known completed "load
time-stamp".

This concept would work according to the following:
1.) The batch job would simply start loading new documents, with the new
"load time-stamp". Existing documents would not be touched.
2.) The client (interactive query) application would still use the old data
from the previous load (since all queries are restricted with the old "load
time-stamp")
3.) The batch job would store the new "load time-stamp" as the one to be
used (e.g. in a separate collection etc.) -- after this, all queries would
return the most up-to-data documents
4.) The batch job would purge all documents from the collection, where
the "load time-stamp" is not the same as the last one.

This approach seems to be implementable, however, I definitely want to
avoid reinventing the wheel myself and wondering if there is any better
solution or built-in Solr Cloud feature to achieve the same or something
similar.

Thanks,
Peter

Re: auto purge for embedded zookeeper

On 5/9/2016 1:11 PM, tedsolr wrote:
> I have a development environment that is using an embedded zookeeper, and the
> zoo_data folder continues to grow. It's filled with snapshot files that are
> not getting purged. zoo.cfg has properties
> autopurge.snapRetainCount=10
> autopurge.purgeInterval=1
> Perhaps it's not in the correct location so its not getting read? Or maybe
> these props don't apply for embedded instances?
>
> Anyone know? Thanks!
> v5.2.1

Reading the source for the SolrZkServer class, it appears that only a
limited set of properties in that config file is parsed by the embedded
zookeeper.  Only these properties are used to configure the server, all
others are ignored:

server.*
group.*
weight.*
dataDir
dataLogDir
clientPort
tickTime
initLimit
syncLimit
electionAlg
maxClientCnxns

The reason that it ignores everything else is that this code is copied
from Zookeeper 3.2 -- which is over six years old.  Zookeeper did not
have snapshot purging functionality back then.

Although this is something we can fix by copying some of the code from
the latest Zookeeper into Solr and making some changes, the way Solr
implements the embedded zookeeper functionality will be susceptible to
similar problems in the future unless we upgrade zookeeper to 3.5.x and
change the embedded zookeeper implementation. ZK 3.5 is only available
as an alpha version, and may not be available in a stable version for a
few months.

You have mentioned that it's a dev environment.  A production
environment configured according to recommendations (no embedded
zookeeper) would not have this problem.

I would recommend scripting something yourself to clean up the zookeeper
data directory ... because even if we do fix this problem, the fix won't
likely be available in a regular Solr release for several weeks, and
will only be available in a new 6.x version, not anything in 4.x or 5.x.

Thanks,
Shawn

what scene using carrot2 cluster

2016-05-10 Thread xiangliumi

hi，all

does someone have used carrot2 with solr，please give me a scene description 
when using carrot2，and best give me some links about deploying solr5.x and 
carrot2. thanks for your help!


thanks
Max Mi

Sent using CloudMagic Email 
[https://cloudmagic.com/k/d/mailapp?ct=pa&cv=8.4.52&pv=4.4.2&source=email_footer_2]

Re: query action with wrong result size zero

2016-05-10 Thread xiangliumi

Hi,Erick

thank you very much, i will do more try following your words.


thanks
Max Mi

Sent using CloudMagic Email 
[https://cloudmagic.com/k/d/mailapp?ct=pa&cv=8.4.52&pv=4.4.2&source=email_footer_2]
 On Fri, May 06, 2016 at 11:39 PM, Erick < erickerick...@gmail.com 
[erickerick...@gmail.com] > wrote:
bq: does this means that different kinds of docs can not be put into
the same solr core

You can certainly put different kinds of docs in the same core,
you just have to search them appropriately, something like
q=field1:value OR field2:value

Say doc1 had "value" in field1 (but did not have field2)
and doc2 had "value" in field2 (but did not have field1)

Then the above query would return both docs.

However, this may have surprising results since presumably
the different "types" of docs represent very different things.
Let's say you have "people" and "places" docs. Ogden is a
surname, but there is also a city in Utah called "Ogden".
A search like above might return both and if the user expected
to be searching places they'd be surprised to see a person.

So, to sum up there's no restriction on having different types
of docs with different fields in Solr, you just have to search
them appropriately (and so the users get what they expect).

Very often, people will put a "type" field in the doc and restrict
what kinds of docs are returned with an fq clause (fq=type:people
in the above example for instance) when appropriate.

Best,
Erick

On Thu, May 5, 2016 at 10:58 PM, 梦在远方  wrote:
> thank you ,Jay Potharaju
>
>
> I got a discover, in the same one solr core , i put two kinds of docs, which 
> means that they does not have the same fields, does this means that different 
> kinds of docs can not be put into the same solr core?
>
>
> thanks!
> 
> max mi
>
>
>
>
> -- 原始邮件 --
> 发件人: "Erick Erickson";;
> 发送时间: 2016年5月6日(星期五) 中午12:14
> 收件人: "solr-user";
>
> 主题: Re: query action with wrong result size zero
>
>
>
> Please show us:
> 1> a sample doc that you expect to be returned
> 2> the results of adding '&debug=query' to the URL
> 3> the schema definition for the field you're querying against.
>
> It is likely that your query isn't quite what you think it is, is going
> against a different field than you think or your schema isn't
> quite doing what you think...
>
> On Thu, May 5, 2016 at 9:40 AM, Jay Potharaju  wrote:
>> Can you check if the field you are searching on is case sensitive? You can
>> quickly test it by copying the exact contents of the brand field into your
>> query and comparing it against the query you have posted above.
>>
>> On Thu, May 5, 2016 at 8:57 AM, mixiangliu <852262...@qq.com> wrote:
>>
>>>
>>> i found a strange thing with solr query，when i set the value of query
>>> field like "brand:amd"，the size of query result is zero,but the real data
>>> is not zero，can some body tell me why，thank you very much！！
>>> my english is not very good，wish some body understand my words!
>>>
>>
>>
>>
>> --
>> Thanks
>> Jay Potharaju

Re: Solr 5.4.1 Mergeindexes duplicate rows

On 5/9/2016 7:55 AM, Kalpana wrote:
> Can anyone help me with a merge. Currently I have the two cores already
> pulling data from SQL Table based on the query I set up.
>
> Solr is running
>
> I also have a third core set up with schema similar to the first two. and
> then I wrote this in the url and hit enter 
> http://localhost:8983/solr/admin/cores?action=mergeindexes&core=Sitecore_SharePoint&srcCore=sitecore_web_index&srcCore=SharePoint_All
>
> I stop and start Solr and I see data with duplicates.
>
> Am I doing this right? 

Some questions:

Is Solr in cloud mode or running standalone?

If you look at the core overview in the admin UI for these three cores,
can you tell me what Num Docs, Max Doc, and the index size is for all
three indexes?

Are the schemas in these three indexes all using the same field name for
uniqueKey?

Are you sure that you have only run the merge once?  Alternately, before
each merge attempt, you could entirely delete
$SOLR_HOME/Sitecore_Sharepoint/data and reload the core or restart Solr.

Thanks,
Shawn

Re: Solr 5.x bug with Service installation script?