Re: Multiple unique key in Schema

2015-11-17 Thread Erik Hatcher
Make each document have a composite unique key: user-1, user-2, review-1,... 
Etc. 

Easier said than done if you're just posting the CSV directly to Solr but an 
update script could help. 

Or perhaps use the UUID auto id feature. 

  Erik

> On Nov 17, 2015, at 08:14, Mugeesh Husain  wrote:
> 
> Hi!
> 
> I have three CSV files:
> 1.) Restaurant
> 2.) User
> 3.) Review
> 
> Every CSV has its own unique key, so how can I configure multiple unique
> keys in Solr?
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Multiple-unique-key-in-Schema-tp4240550.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multiple unique key in Schema

2015-11-17 Thread Alexandre Rafalovitch
When you index into Solr, you are overlapping the definitions into one
schema. Therefore, you will need a unified uniqueKey.

There are a couple of approaches:
1) Maybe you don't actually store the data as three types of entities.
Think about what you will want to find and structure the data to
match. Doing JOINs in Solr is a bad idea, even if sometimes possible.
2) Make a compositeKey as uniqueKey by adding a type prefix to your key
ids when exporting from SQL (select concat('r', id), ...)
3) Make a compositeKey as unique key by using UpdateRequestProcessors
to manipulate the value of the uniqueKey field. You'd need three
different update chains to apply different prefixes, but you can pass
the chain name as a request parameter. You can find the full list of
the URPs at: http://www.solr-start.com/info/update-request-processors/
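Option 2 can also be done on the CSV files themselves before posting them to Solr. A minimal pre-processing sketch (the `id` column name and the `user-`/`review-` style prefixes are assumptions, not taken from any actual schema here):

```python
import csv

def prefix_ids(in_path, out_path, prefix, id_col="id"):
    """Rewrite a CSV so its id column carries a type prefix, e.g. 1 -> 'review-1'."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # prepend the entity type so keys from different CSVs can't collide
            row[id_col] = f"{prefix}-{row[id_col]}"
            writer.writerow(row)

# prefix_ids("review.csv", "review_keyed.csv", "review")  # then post the keyed file
```

Run once per CSV with a distinct prefix ("restaurant", "user", "review"), then post the rewritten files as usual.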

Regards,
Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 17 November 2015 at 08:14, Mugeesh Husain  wrote:
> Hi!
>
> I have three CSV files:
> 1.) Restaurant
> 2.) User
> 3.) Review
>
> Every CSV has its own unique key, so how can I configure multiple unique
> keys in Solr?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Multiple-unique-key-in-Schema-tp4240550.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Multiple unique key in Schema

2015-11-17 Thread Mugeesh Husain
Hi!

I have three CSV files:
1.) Restaurant
2.) User
3.) Review

Every CSV has its own unique key, so how can I configure multiple unique
keys in Solr?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-unique-key-in-Schema-tp4240550.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Query gives response multiple times

2015-11-17 Thread Shane McCarthy
Yes, a complex situation, and I am having trouble knowing who to ask for an
explanation.

I have been given access to the Solr Admin page.   Using the persistent
identifier PID, I have done a query for PID = *. The PID that are found
with that query all have cml_hfenergy_md, a multivalued double field,
repeated 8 or 4 times.

I have multiple string and multiple date fields that are also repeated.

There are single string and integer fields that are not repeated.

Do you know if all results of the query are returned with the Solr Admin?
It tells me the numFound but does not give me all the fields requested
for all the results.  Is this usual behaviour?

Cheers,

Shane


On Mon, Nov 16, 2015 at 7:11 PM, Alexandre Rafalovitch 
wrote:

> On 16 November 2015 at 17:40, Shane McCarthy  wrote:
> > I am using an instance of Islandora.
>
> Ah. This complicates the situation as there is an unknown - to most of
> us - layer in between. So, it is not clear whether this multiplication
> is happening in Solr or in Islandora.
>
> Your best option is to hit Solr server directly and basically do a
> query for a specific record's id with the fields that you are having a
> problem with. If that field for that record shows the problem the same
> way as through the full Islandora path, the problem is Solr. Then, you
> review the copyFields, etc. If it does not...
>
> Also, is this only happening with one "double" field but not another,
> with all "double" fields or with some other combination?
>
> And did it start at some point or was this always like that?
>
> You need to figure out something to contrast the observed behavior against.
>
> Regards,
> Alex.
>
>
>
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>


Re: search for documents where all words of field present in the query

2015-11-17 Thread Alexandre Rafalovitch
Are you sure your original description is not a reverse of your
use-case? Now, it seems like you just want mm=100% which means
"samsung" will match all entries, but "samsung 32G" will only match 3
of them.

https://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
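The mm=100% behaviour can be sketched client-side: every token of the *query* must be found in the document. This is a deliberately simplified lowercase/whitespace model of Solr's analysis, not how Solr evaluates it internally:

```python
def matches_mm100(query: str, doc: str) -> bool:
    # mm=100%: every query token must match the document
    return set(query.lower().split()) <= set(doc.lower().split())

products = ["Samsung s3 32g BLACK", "Samsung s3 BLACK", "Samsung s3 32G",
            "Smartphone Samsung s5", "Samsung s6 black", "Samsung s6 32G black"]

print(sum(matches_mm100("samsung", p) for p in products))      # all 6 match
print(sum(matches_mm100("samsung 32G", p) for p in products))  # only 3 match
```

Note the direction: mm constrains the query's terms, not the field's terms, which is why it does not solve the original "every word of the field must be in the query" requirement.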

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 17 November 2015 at 10:28, superjim  wrote:
> Thank you so much for the answer!
>
> I'll check the Luwak solution.
>
> My business case is very common and simple.
>
> 1) A user searches for products.
> Sample real query: smartphone samsung s3 black 32G
>
> 2) I have a really big database of products.
> I want to return to the user all products from my database like:
> "Samsung s3 32g BLACK"
> "Samsung s3 BLACK"
> "Samsung s3 32G"
>
> I also have products like these (they must not be in the result!):
> "Smartphone Samsung s5"
> "Samsung s6 black"
> "Samsung s6 32G black"
>
> So I want: ALL PRODUCTS WHOSE WORDS ALL APPEAR IN THE USER QUERY
>
> Are you sure that this is not possible to do with Solr?
> I am already using Solr for suggestions and it works perfectly!
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/search-for-documents-where-all-words-of-field-present-in-the-query-tp4240564p4240569.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Best way to backup and restore an index for a cloud setup in 4.6.1?

2015-11-17 Thread KNitin
You can use solrcloud-haft:
https://github.com/bloomreach/solrcloud-haft

We use it in our production against 4.6.1.

Nitin

On Monday, May 11, 2015, Shalin Shekhar Mangar 
wrote:

> Hi John,
>
> There are a few HTTP APIs for replication, one of which can let you take a
> backup of the index. Restoring can be as simple as just copying over the
> index in the right location on the disk. A new restore API will be released
> with the next version of Solr which will make some of these tasks easier.
>
> See
>
> https://cwiki.apache.org/confluence/display/solr/Index+Replication#IndexReplication-HTTPAPICommandsfortheReplicationHandler
>
> On Fri, May 8, 2015 at 10:26 PM, John Smith  > wrote:
>
> > All,
> >
> > With a cloud setup for a collection in 4.6.1, what is the most elegant
> way
> > to backup and restore an index?
> >
> > We are specifically looking into the application of when doing a full
> > reindex, with the idea of building an index on one set of servers,
> backing
> > up the index, and then restoring that backup on another set of servers.
> Is
> > there a better way to rebuild indexes on another set of servers?
> >
> > We are not sharding if that makes any difference.
> >
> > Thanks,
> > g10vstmoney
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: search for documents where all words of field present in the query

2015-11-17 Thread Alexandre Rafalovitch
This sounds more like a use case for https://github.com/flaxsearch/luwak

Or a variation of Ted Sullivan's work:
http://lucidworks.com/blog/author/tedsullivan/

I do not think this can be done in Solr directly. If your matched
fields were always 2-tokens, you could do complex mm param. If the
words in the query always appear in the same order as in the field,
you probably do some sort of auto-phrasing or n-gram matching. But
just as you described - it is unlikely.

Can you explain your business case, maybe there are different ways to
index the data to match the requirements.

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 17 November 2015 at 09:46, superjim  wrote:
> There the same questions I've found in google:
>
> Solr query must match all words/tokens in a field
> http://stackoverflow.com/questions/10508078/solr-query-must-match-all-words-tokens-in-a-field
>
> Syntax for query where all words in field must be present in query
> http://stackoverflow.com/questions/18390892/syntax-for-query-where-all-words-in-field-must-be-present-in-query
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/search-for-documents-where-all-words-of-field-present-in-the-query-tp4240564p4240565.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Limiting number of parallel queries per user

2015-11-17 Thread deansg
Hello,
My team is trying to write a SearchComponent that will limit the number of
queries a certain user can run in parallel at any given moment. We want to
do this to prevent one user from slowing Solr down too much.

In the search component, we can identify the user sending the request, and
we keep a static map containing data on how many queries the user is
currently running. In the prepare method we increment the entry in the map
and assert that it is not too big. In finishStage we decrement it.

However, when I tried testing my component by running several queries in
parallel from a single user, it seemed like Solr somehow runs the queries
serially and not in parallel. Every time my component's code is executed,
it believes the user isn't running any other queries, as finishStage had
been called on the previous query before prepare is called on the new
query. What do you think? Is writing such a component a bad idea? Am I
missing something?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Limiting-number-of-parallel-queries-per-user-tp4240566.html
Sent from the Solr - User mailing list archive at Nabble.com.


Data Import Handler / Backup indexes

2015-11-17 Thread Brian Narsi
I am using Data Import Handler to retrieve data from a database with

full-import, clean = true, commit = true and optimize = true

This has always worked correctly without any errors.

But just to be on the safe side, I am thinking that we should do a backup
before initiating Data Import Handler. And just in case something happens
restore the backup.

Can backup be done automatically (before initiating Data Import Handler)?

Thanks


Re: search for documents where all words of field present in the query

2015-11-17 Thread superjim
Thank you so much for the answer!

I'll check the Luwak solution.

My business case is very common and simple.

1) A user searches for products.
Sample real query: smartphone samsung s3 black 32G

2) I have a really big database of products.
I want to return to the user all products from my database like:
"Samsung s3 32g BLACK"
"Samsung s3 BLACK"
"Samsung s3 32G"

I also have products like these (they must not be in the result!):
"Smartphone Samsung s5"
"Samsung s6 black"
"Samsung s6 32G black"

So I want: ALL PRODUCTS WHOSE WORDS ALL APPEAR IN THE USER QUERY

Are you sure that this is not possible to do with Solr?
I am already using Solr for suggestions and it works perfectly!
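The requirement is the reverse of normal matching: every token of the *product* must appear in the query. As a client-side sketch of the desired logic (simplified lowercase/whitespace tokenization, not a Solr query):

```python
def all_product_words_in_query(query: str, product: str) -> bool:
    # match only if the product's tokens are a subset of the query's tokens
    return set(product.lower().split()) <= set(query.lower().split())

query = "smartphone samsung s3 black 32G"
products = ["Samsung s3 32g BLACK", "Samsung s3 BLACK", "Samsung s3 32G",
            "Smartphone Samsung s5", "Samsung s6 black", "Samsung s6 32G black"]

hits = [p for p in products if all_product_words_in_query(query, p)]
# hits == ["Samsung s3 32g BLACK", "Samsung s3 BLACK", "Samsung s3 32G"]
```

This only illustrates the semantics; doing it at scale is what Luwak-style "reverse search" addresses, as discussed elsewhere in this thread.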





--
View this message in context: 
http://lucene.472066.n3.nabble.com/search-for-documents-where-all-words-of-field-present-in-the-query-tp4240564p4240569.html
Sent from the Solr - User mailing list archive at Nabble.com.


Date Math, NOW and filter queries

2015-11-17 Thread Mugeesh Husain
Hi,

http://lucidworks.com/blog/2012/02/23/date-math-now-and-filter-queries/

For date range queries I am following the article above.

I tried querying fq=date:[NOW/DAY-7DAYS TO NOW/DAY] and it works fine,

but when I fire the query fq=date:[NOW/DAY-7DAYS TO NOW/DAY+1DAY], it gives
the error below:

"fq":"initial_release_date:[NOW/DAY-7DAYS TO NOW/DAY 1DAY]",
  "rows":"32"}},
  "error":{
"msg":"org.apache.solr.search.SyntaxError: Cannot parse
'initial_release_date:[NOW/DAY-7DAYS TO NOW/DAY 1DAY]': Encountered \"
 \"1DAY \"\" at line 1, column 47.\nWas expecting one of:\n   
\"]\" ...\n\"}\" ...\n",
"code":400}}


Why is it giving this error?

Thanks
mugeesh





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Date-Math-NOW-and-filter-queries-tp4240561.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: search for documents where all words of field present in the query

2015-11-17 Thread superjim
There the same questions I've found in google:

Solr query must match all words/tokens in a field
http://stackoverflow.com/questions/10508078/solr-query-must-match-all-words-tokens-in-a-field

Syntax for query where all words in field must be present in query
http://stackoverflow.com/questions/18390892/syntax-for-query-where-all-words-in-field-must-be-present-in-query





--
View this message in context: 
http://lucene.472066.n3.nabble.com/search-for-documents-where-all-words-of-field-present-in-the-query-tp4240564p4240565.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multiple unique key in Schema

2015-11-17 Thread Mugeesh Husain
>>Or perhaps use the UUID auto id feature. 
If I use a UUID, then how can I update a particular document? I think with
this approach there will not be any stable document identity.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-unique-key-in-Schema-tp4240550p4240557.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multiple unique key in Schema

2015-11-17 Thread Mugeesh Husain
>>Or perhaps use the UUID auto id feature. 
If I use a UUID, then how can I update a particular document? I think with
this approach there will not be any stable document identity.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-unique-key-in-Schema-tp4240550p4240563.html
Sent from the Solr - User mailing list archive at Nabble.com.


search for documents where all words of field present in the query

2015-11-17 Thread superjim
How would I form a query where all of the words in a field must be present in
the query (but possibly more)? For example, if I have the following words in
a text field: "John Smith"

A query for "John" should return no results

A query for "Smith" should return no results

A query for "John Smith" should return that one result

A query for "banana John Smith purple monkey dishwasher" should return that
one result





--
View this message in context: 
http://lucene.472066.n3.nabble.com/search-for-documents-where-all-words-of-field-present-in-the-query-tp4240564.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH Caching w/ BerkleyBackedCache

2015-11-17 Thread Todd Long
Mikhail Khludnev wrote
> It's worth to mention that for really complex relations scheme it might be
> challenging to organize all of them into parallel ordered streams.

This will most likely be the issue for us which is why I would like to have
the Berkley cache solution to fall back on, if possible. Again, I'm not sure
why but it appears that the Berkley cache is overwriting itself (i.e.
cleaning up unused data) when building the database... I've read plenty of
other threads where it appears folks are having success using that caching
solution.


Mikhail Khludnev wrote
> threads... you said? Which ones? Declarative parallelization in
> EntityProcessor worked only with certain 3.x version.

We are running multiple DIH instances which query against specific
partitions of the data (i.e. mod of the document id we're indexing).



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-Caching-w-BerkleyBackedCache-tp4240142p4240562.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: CloudSolrClient Connect To Zookeeper with ACL Protected files

2015-11-17 Thread Kevin Lee
Does anyone know if it is possible to set the ACL credentials in 
CloudSolrClient needed to access a protected resource in Zookeeper?

Thanks!

> On Nov 13, 2015, at 1:20 PM, Kevin Lee  wrote:
> 
> Hi,
> 
> Is there a way to use CloudSolrClient and connect to a Zookeeper instance 
> where ACL is enabled and resources/files like /live_nodes, etc are ACL 
> protected?  Couldn’t find a way to set the ACL credentials.
> 
> Thanks,
> Kevin



Re: Query gives response multiple times

2015-11-17 Thread Alexandre Rafalovitch
If you have access to the Admin UI, go to the Schema Browser field
under the core (I assume Solr 4+ here, never actually asked for your
version). https://cwiki.apache.org/confluence/display/solr/Schema+Browser+Screen

You can see in the example that when you select a field, it will show
whether it is copied to from other fields. That's the copyField I was
talking about earlier. Check that first.

You might also be able to get your schema.xml and solrconfig.xml on
the Files screen:
https://cwiki.apache.org/confluence/display/solr/Files+Screen. Check
there for definitions. A definition for "fl" under "/select" or
perhaps your custom request handler may restrict the fields you are
getting.

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 17 November 2015 at 11:32, Shane McCarthy  wrote:
> Yes a complex situation and I am having trouble knowing who to ask for an
> explanation.
>
> I have been given access to the Solr Admin page.   Using the persistent
> identifier PID, I have done a query for PID = *. The PID that are found
> with that query all have cml_hfenergy_md, a multivalued double field,
> repeated 8 or 4 times.
>
> I have multiple string, multiple date fields that are also repeated
>
> There are single string and integer fields that are not repeated.
>
> Do you know if all results of the query are returned with the Solr Admin?
> It tells me the numFound but does not give me all the fields requested
> for all the results.  Is this usual behaviour?
>
> Cheers,
>
> Shane
>
>
> On Mon, Nov 16, 2015 at 7:11 PM, Alexandre Rafalovitch 
> wrote:
>
>> On 16 November 2015 at 17:40, Shane McCarthy  wrote:
>> > I am using an instance of Islandora.
>>
>> Ah. This complicates the situation as there is an unknown - to most of
>> us - layer in between. So, it is not clear whether this multiplication
>> is happening in Solr or in Islandora.
>>
>> Your best option is to hit Solr server directly and basically do a
>> query for a specific record's id with the fields that you are having a
>> problem with. If that field for that record shows the problem the same
>> way as through the full Islandora path, the problem is Solr. Then, you
>> review the copyFields, etc. If it does not...
>>
>> Also, is this only happening with one "double" field but not another,
>> with all "double" fields or with some other combination?
>>
>> And did it start at some point or was this always like that?
>>
>> You need to figure out something to contrast the observed behavior against.
>>
>> Regards,
>> Alex.
>>
>>
>>
>> 
>> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> http://www.solr-start.com/
>>


Re: Date Math, NOW and filter queries

2015-11-17 Thread Erick Erickson
Congratulations, you are in "URL escaping hell" ;)

The '+' sign is a URL escape for a space, which you
see in the error message.

Escape it as %2B and you should be fine.

Best,
Erick

On Tue, Nov 17, 2015 at 6:07 AM, Mugeesh Husain  wrote:
> hi!,
>
> http://lucidworks.com/blog/2012/02/23/date-math-now-and-filter-queries/
>
> for date range query i am following above article,in this article
>
> I try to querying fq=date:[NOW/DAY-7DAYS TO NOW/DAY], it is working fine,
>
> when i fire query fq=date:[NOW/DAY-7DAYS TO NOW/DAY+1DAY], it is giving
> below error
>
> "fq":"initial_release_date:[NOW/DAY-7DAYS TO NOW/DAY 1DAY]",
>   "rows":"32"}},
>   "error":{
> "msg":"org.apache.solr.search.SyntaxError: Cannot parse
> 'initial_release_date:[NOW/DAY-7DAYS TO NOW/DAY 1DAY]': Encountered \"
>  \"1DAY \"\" at line 1, column 47.\nWas expecting one of:\n
> \"]\" ...\n\"}\" ...\n",
> "code":400}}
>
>
> why it is giving error
>
> Thanks
> mugeesh
>
>
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Date-Math-NOW-and-filter-queries-tp4240561.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr 5.0 cloud, leader's load is much higher than others

2015-11-17 Thread Erick Erickson
First of all, why are you using RAMDirectory? This is
NOT recommended except for very special
circumstances, and it will not increase search speed.

Before worrying about the CPU usage on the leader, I'd
like to understand why RAMDirectory is in the mix at all.

Best,
Erick

On Tue, Nov 17, 2015 at 12:55 AM, 初十  wrote:
> Has anyone encountered the same problem, and how did you solve it?
>
> 2015-11-17 16:52 GMT+08:00 初十 :
>
>>
>> Hello everyone!
>>
>> I use solr 5.0 with the RAMDirectory and a collection with 3 replication
>> and 1 shard,
>>
>> the leader's load is much higher than the others. Is it a bug?
>>


Re: CloudSolrCloud - Commit returns but not all data is visible (occasionally)

2015-11-17 Thread Erick Erickson
That's what was behind my earlier comment about perhaps
the call is timing out, thus the commit call is returning
_before_ the actual searcher is opened. But the call
coming back is not a return from commit, but from Jetty
even though the commit hasn't really returned.

Just a guess however.

Best,
Erick

On Tue, Nov 17, 2015 at 12:11 AM, adfel70  wrote:
> Thanks Eric,
> I'll try to play with the autowarm config.
>
> But I have a more direct question - why does the commit return without
> waiting till the searchers are fully refreshed?
>
> Could it be that the parameter waitSearcher=true doesn't really work?
> or maybe I don't understand something here...
>
> Thanks,
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/CloudSolrCloud-Commit-returns-but-not-all-data-is-visible-occasionally-tp4240368p4240518.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Why can a dynamic field ONLY start or end with '*' but not both?

2015-11-17 Thread Frank Greguska
Hello,

Prior to the implementation of SOLR-3251
, it seems it was possible
to create dynamic fields using multiple 'glob' characters.

e.g. 

Since this commit
,
a constraint has been added that the name of a dynamic field must either
start or end with '*', but not both. Was this decision discussed anywhere?
It's causing a schema of mine to break after upgrading and I'm just curious
if there was a technical reason for implementing this constraint.

Thank you,

Frank

P.S. I've asked a related question
 on StackOverflow.


Re: Why can a dynamic field ONLY start or end with '*' but not both?

2015-11-17 Thread Erick Erickson
Starting and ending globs were never officially supported, at
least as far back as 3.6. They were never programmatically
enforced either apparently.

This is from the 3.6 schema.xml:

 

So  it was not really discussed that I know about relative to the JIRAs you
mentioned, more like you were using a feature that just happened to work
despite explicitly being unsupported.

Best,
Erick



On Tue, Nov 17, 2015 at 8:39 AM, Frank Greguska  wrote:
> Hello,
>
> Prior to the implementation of SOLR-3251
> , it seems it was possible
> to create dynamic fields using multiple 'glob' characters.
>
> e.g. 
>
> Since this commit
> ,
> a constraint has been added that the name of a dynamic field must either
> start or end with '*', but not both. Was this decision discussed anywhere?
> It's causing a schema of mine to break after upgrading and I'm just curious
> if there was a technical reason for implementing this constraint.
>
> Thank you,
>
> Frank
>
> P.S. I've asked a related question
>  on StackOverflow.


Re: Solr 4.10.4 dropping index on shutdown

2015-11-17 Thread Erick Erickson
Did you commit after indexing and before shutting down? Even if you didn't, I'm
still a bit surprised, but that's one possible explanation.

But this is the first time I've seen this problem mentioned...

Best,
Erick

On Tue, Nov 17, 2015 at 4:08 AM, Oliver Schrenk  wrote:
> Hi,
>
> since we upgraded our cluster from 4.7 to 4.10.4 we are experiencing issues. 
> When shutting down the service (with a confirmed graceful shutdown in the 
> logs), the index is dropped, with only one lonely `segments.gen` file left 
> for each shard and all other files being deleted.
>
> There is no message in the logs, other than graceful shutdown. Did anybody 
> have a similar issues and has some advice?
>
> Cheers,
> Oliver


Re: Query gives response multiple times

2015-11-17 Thread Erick Erickson
As far as getting fields back when you specify the "fl"
parameter, only _stored_ fields (i.e stored="true" in
the schema) are available.

As far as your doubled (or more) fields, I'm 99% certain
that somehow your input process is doing this (I've
seen SQL do "surprising" things, for instance), or you're
copying multiple inputs to the field you're looking at via the
copyField directive.

Best,
Erick

On Tue, Nov 17, 2015 at 8:32 AM, Shane McCarthy  wrote:
> Yes a complex situation and I am having trouble knowing who to ask for an
> explanation.
>
> I have been given access to the Solr Admin page.   Using the persistent
> identifier PID, I have done a query for PID = *. The PID that are found
> with that query all have cml_hfenergy_md, a multivalued double field,
> repeated 8 or 4 times.
>
> I have multiple string, multiple date fields that are also repeated
>
> There are single string and integer fields that are not repeated.
>
> Do you know if all results of the query are returned with the Solr Admin?
> It tells me the numFound but does not give me all the fields requested
> for all the results.  Is this usual behaviour?
>
> Cheers,
>
> Shane
>
>
> On Mon, Nov 16, 2015 at 7:11 PM, Alexandre Rafalovitch 
> wrote:
>
>> On 16 November 2015 at 17:40, Shane McCarthy  wrote:
>> > I am using an instance of Islandora.
>>
>> Ah. This complicates the situation as there is an unknown - to most of
>> us - layer in between. So, it is not clear whether this multiplication
>> is happening in Solr or in Islandora.
>>
>> Your best option is to hit Solr server directly and basically do a
>> query for a specific record's id with the fields that you are having a
>> problem with. If that field for that record shows the problem the same
>> way as through the full Islandora path, the problem is Solr. Then, you
>> review the copyFields, etc. If it does not...
>>
>> Also, is this only happening with one "double" field but not another,
>> with all "double" fields or with some other combination?
>>
>> And did it start at some point or was this always like that?
>>
>> You need to figure out something to contrast the observed behavior against.
>>
>> Regards,
>> Alex.
>>
>>
>>
>> 
>> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> http://www.solr-start.com/
>>


Re: Limiting number of parallel queries per user

2015-11-17 Thread Erick Erickson
This will be hard to do in SolrCloud assuming that the entire cluster
is fronted by a load balancer _or_ you're using CloudSolrClient
(CloudSolrServer in 4x), because the top-level request is distributed
across all the Solr nodes.

I'd probably go more simply (assuming the above is not a problem).
I'd create a first-component that checks whether to let the query
continue. If so, increment your counter.

Then a last-component to decrement the counter for that user.

As far as the serial processing is concerned, sounds more like
your client is somehow serializing that. Solr should be handling
lots of queries simultaneously.
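The first-component/last-component bookkeeping described above, reduced to a self-contained sketch of just the counter (language-neutral logic; note this is a per-node counter only, which is exactly the SolrCloud caveat mentioned earlier):

```python
import threading

class UserQueryThrottle:
    """Limit the number of queries a single user may run concurrently."""

    def __init__(self, max_parallel):
        self.max_parallel = max_parallel
        self.counts = {}              # user -> number of in-flight queries
        self.lock = threading.Lock()

    def try_acquire(self, user):
        # first-component: admit the query only if the user is under the limit
        with self.lock:
            if self.counts.get(user, 0) >= self.max_parallel:
                return False
            self.counts[user] = self.counts.get(user, 0) + 1
            return True

    def release(self, user):
        # last-component: always decrement, even if the query failed
        with self.lock:
            self.counts[user] = max(0, self.counts.get(user, 0) - 1)
```

The important design point is that release() runs unconditionally (last-component), otherwise a failed query would leak a slot and eventually lock the user out.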

Best,
Erick

On Tue, Nov 17, 2015 at 6:51 AM, deansg  wrote:
> Hello,
> My team is trying to write a SearchComponent that will limit the number of
> queries a certain user can run in parallel at any given moment. We want to
> do this to prevent one user from slowing Solr down too much.
>
> In the search component, we can identify the user sending the request, and
> we keep a static map containing data on how many queries the user is
> currently running. In the prepare method we increment the entry in the map
> and assert that it is not too big. In finishStage we decrement it.
>
> However, when I tried testing my component by running several queries in
> parallel from a single user, it seemed like Solr somehow runs the queries
> serially and not in parallel. Every time my component's code is executed,
> it believes the user isn't running any other queries, as finishStage had
> been called on the previous query before prepare is called on the new
> query. What do you think? Is writing such a component a bad idea? Am I
> missing something?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Limiting-number-of-parallel-queries-per-user-tp4240566.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Solr 4.10.4 dropping index on shutdown

2015-11-17 Thread Oliver Schrenk
Hi,

since we upgraded our cluster from 4.7 to 4.10.4 we are experiencing issues. 
When shutting down the service (with a confirmed graceful shutdown in the 
logs), the index is dropped, with only one lonely `segments.gen` file left for 
each shard and all other files being deleted. 

There is no message in the logs, other than graceful shutdown. Did anybody have 
a similar issues and has some advice?

Cheers,
Oliver

Re: Data Import Handler / Backup indexes

2015-11-17 Thread Brian Narsi
Sorry I forgot to mention that we are using SolrCloud 5.1.0.



On Tue, Nov 17, 2015 at 12:09 PM, KNitin  wrote:

> afaik Data import handler does not offer backups. You can try using the
> replication handler to backup data as you wish to any custom end point.
>
> You can also try out : https://github.com/bloomreach/solrcloud-haft.  This
> helps backup solr indices across clusters.
>
> On Tue, Nov 17, 2015 at 7:08 AM, Brian Narsi  wrote:
>
> > I am using Data Import Handler to retrieve data from a database with
> >
> > full-import, clean = true, commit = true and optimize = true
> >
> > This has always worked correctly without any errors.
> >
> > But just to be on the safe side, I am thinking that we should do a backup
> > before initiating Data Import Handler. And just in case something happens
> > restore the backup.
> >
> > Can backup be done automatically (before initiating Data Import Handler)?
> >
> > Thanks
> >
>
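The "replication handler backup before full-import" flow suggested above could be scripted roughly like this. The host, port, and core name are placeholders; the replication handler's command=backup and the DIH full-import parameters are the ones discussed in this thread:

```python
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR_CORE = "http://localhost:8983/solr/mycore"  # placeholder host/core

def backup_url(base, location=None):
    # Replication handler backup command; 'location' (target dir) is optional
    params = {"command": "backup"}
    if location:
        params["location"] = location
    return base + "/replication?" + urlencode(params)

def full_import_url(base):
    # DIH full-import with the options mentioned above
    return base + "/dataimport?" + urlencode(
        {"command": "full-import", "clean": "true",
         "commit": "true", "optimize": "true"})

# urlopen(backup_url(SOLR_CORE))       # 1. snapshot the index first
# urlopen(full_import_url(SOLR_CORE))  # 2. then kick off the import
```

Restoring would then be a matter of copying the snapshot back into place (or, on SolrCloud, using a tool such as solrcloud-haft as suggested above).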


Re: Date Math, NOW and filter queries

2015-11-17 Thread Shawn Heisey
On 11/17/2015 7:07 AM, Mugeesh Husain wrote:
> when i fire query fq=date:[NOW/DAY-7DAYS TO NOW/DAY+1DAY], it is giving
> below error 
>
> "fq":"initial_release_date:[NOW/DAY-7DAYS TO NOW/DAY 1DAY]",
>   "rows":"32"}},
>   "error":{
> "msg":"org.apache.solr.search.SyntaxError: Cannot parse
> 'initial_release_date:[NOW/DAY-7DAYS TO NOW/DAY 1DAY]': Encountered \"
>  \"1DAY \"\" at line 1, column 47.\nWas expecting one of:\n   
> \"]\" ...\n\"}\" ...\n",
> "code":400}}

How are you sending the queries to Solr?  I'm betting that you are
constructing a URL manually in your own code and sending it to Solr
directly via HTTP, rather than using a Solr library like SolrJ,
Solarium, Sunspot, etc.

https://wiki.apache.org/solr/IntegratingSolr

What's happening here is that the query is not URL escaped, which means
the URL contains an actual plus sign for the "NOW/DAY+1DAY" part of your
query.  A plus sign in a URL is interpreted by the webserver as a space,
in accordance with the standards that govern HTTP.  This is not a bug.

A Solr library would have taken care of all the URL escaping for
characters that require it, and the user code is typically a lot easier
to write than URL construction code.

Whatever programming language you are using to construct your queries
very likely has a function for doing URL escaping on the parameters for
your URL.  You would want to be careful to only run it on the values in
your parameters, not the entire URL, or it would escape everything and
the URL would not work.

The URL escaped version of a plus sign is %2B if you want to quickly fix
this before you look into URL escaping functions or a Solr library.
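The fix Shawn describes can be sketched with Python's standard library (a minimal illustration; any decent client library does this escaping for you):

```python
from urllib.parse import urlencode

# Build the filter query with date math; urlencode escapes the '+' in the
# date math as %2B (and other reserved characters), so Solr sees the query
# you intended rather than a space.
params = {
    "q": "*:*",
    "fq": "date:[NOW/DAY-7DAYS TO NOW/DAY+1DAY]",
    "rows": "32",
}
query_string = urlencode(params)
print(query_string)
```

Note that only the parameter values are encoded here; encoding the whole URL would also escape the `?`, `&`, and `=` separators and break the request.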

Thanks,
Shawn



Re: solr 5.0 cloud ,leader's load is more higer than others

2015-11-17 Thread Shawn Heisey
On 11/17/2015 1:52 AM, 初十 wrote:
> Hello everyone!
>
> I use solr 5.0 with the RAMDirectory and a collection with 3 replication
> and 1 shard,
>
> the leader's load is much higher than the others'. Is it a bug?

Version 5.2 includes a fix that balances the load better so there is not
such an imbalance between the leader and the other replicas when
indexing.  The leader will always have a slightly higher load even with
this fix, because it coordinates updates with the other nodes.

Upgrading to 5.3.1 would be sensible, but I would also stop using
RAMDirectory.  If you have enough memory to use RAMDirectory, then you
have enough memory to achieve stellar performance with the index on disk
too -- with the advantage that restarting Solr will not erase the
index.  RAMDirectory has problems of its own, which are also discussed
in this article:

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

If you switch to the typical default in 4.x and 5.x
(NRTCachingDirectoryFactory, which uses MMapDirectoryFactory) then you
can use a smaller heap.  The OS will take care of keeping the index in
memory, rather than Solr/Lucene.
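For reference, the directory factory is a one-line setting in solrconfig.xml; a sketch of the typical 4.x/5.x default Shawn mentions:

```xml
<!-- solrconfig.xml: the usual 4.x/5.x default. Lets the OS cache the
     on-disk index instead of holding it on the Java heap, and survives
     a Solr restart (unlike RAMDirectory). -->
<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
```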

Thanks,
Shawn



Re: Query gives response multiple times

2015-11-17 Thread Shane McCarthy
I forgot to let you know that I am using Solr 4.2.

On Tue, Nov 17, 2015 at 2:07 PM, Shane McCarthy  wrote:

> Thank you for the speedy responses.
>
> I will give the results I have found based on your comments.
>
> In the schema.xml there are 19 dynamicField are specified.  The query and
> the fields list I want are included in these 19 variables.  They are all
> stored so based on your comment Erick I would assume that I should be
> seeing them.  I am not.  Any ideas why I am not?
>
> There are 10 copyField directives in the schema.xml.  The only one that
> matches the variables with duplicated results is <copyField ... dest="catch_all_fields_mt"/>.
>
> How can I find if I am using a custom request handler?  I had assumed I
> was using the default as in the Request Handler box it has /select.
>
> Thanks,
>
> Shane
>
>
>
>
>
>
>
> On Tue, Nov 17, 2015 at 12:52 PM, Alexandre Rafalovitch <
> arafa...@gmail.com> wrote:
>
>> If you have access to the Admin UI, go to the Schema Browser field
>> under the core (I assume Solr 4+ here, never actually asked for your
>> version).
>> https://cwiki.apache.org/confluence/display/solr/Schema+Browser+Screen
>>
>> You can see in the example that when you select a field, it will show
>> whether it is copied to from other fields. That's the copyField I was
>> talking about earlier. Check that first.
>>
>> You might also be able to get your schema.xml and solrconfig.xml on
>> the Files screen:
>> https://cwiki.apache.org/confluence/display/solr/Files+Screen. Check
>> there for definitions. A definition for "fl" under "/select" or
>> perhaps your custom request handler may restrict the fields you are
>> getting.
>>
>> Regards,
>>Alex.
>> 
>> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> http://www.solr-start.com/
>>
>>
>> On 17 November 2015 at 11:32, Shane McCarthy  wrote:
>> > Yes a complex situation and I am having trouble knowing who to ask for
>> an
>> > explanation.
>> >
>> > I have been given access to the Solr Admin page.   Using the persistent
>> > identifier PID, I have done a query for PID = *. The PID that are found
>> > with that query all have cml_hfenergy_md, a multivalued double field,
>> > repeated 8 or 4 times.
>> >
>> > I have multiple string, multiple date fields that are also repeated
>> >
>> > There are single string and integer fields that are not repeated.
>> >
>> > Do you know if all results of the query are returned with the Solr
>> Admin?
>> > It tells me the numFound but does not give me the all the fields
>> requested
>> > for all the results.  Is this usual behaviour?
>> >
>> > Cheers,
>> >
>> > Shane
>> >
>> >
>> > On Mon, Nov 16, 2015 at 7:11 PM, Alexandre Rafalovitch <
>> arafa...@gmail.com>
>> > wrote:
>> >
>> >> On 16 November 2015 at 17:40, Shane McCarthy 
>> wrote:
>> >> > I am using an instance of Islandora.
>> >>
>> >> Ah. This complicates the situation as there is an unknown - to most of
>> >> us - layer in between. So, it is not clear whether this multiplication
>> >> is happening in Solr or in Islandora.
>> >>
>> >> Your best option is to hit Solr server directly and basically do a
>> >> query for a specific record's id with the fields that you are having a
>> >> problem with. If that field for that record shows the problem the same
>> >> way as through the full Islandora path, the problem is Solr. Then, you
>> >> review the copyFields, etc. If it does not...
>> >>
>> >> Also, is this only happening with one "double" field but not another,
>> >> with all "double" fields or with some other combination?
>> >>
>> >> And did it start at some point or was this always like that?
>> >>
>> >> You need to figure out something to contrast the observed behavior
>> against.
>> >>
>> >> Regards,
>> >> Alex.
>> >>
>> >>
>> >>
>> >> 
>> >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> >> http://www.solr-start.com/
>> >>
>>
>
>


Re: Multiple unique key in Schema

2015-11-17 Thread Erik Hatcher
Fair point indeed.  It depends on how your update process works, though.  One can 
do the trick of assigning a batch number to each indexing run and deleting 
documents that aren’t from the current reindexing run, for example, so it’s not 
necessary to overwrite documents to “replace” them per se.
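A sketch of that batch-number trick (the field name "batch_id" and the batch value are assumptions for illustration):

```python
# Hedged sketch of the batch-number trick Erik describes: tag every document
# in a reindexing run with a batch id, then delete whatever is left over
# from older runs once the new run has finished.
current_batch = 42
delete_query = "*:* -batch_id:%d" % current_batch
# send this as <delete><query>...</query></delete> in XML,
# or {"delete": {"query": ...}} via the JSON update endpoint
print(delete_query)  # → *:* -batch_id:42
```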

Erik


> On Nov 17, 2015, at 9:01 AM, Mugeesh Husain  wrote:
> 
>>> Or perhaps use the UUID auto id feature. 
> if i use  UUID, then how i can update particular document, i think using
> this ,there will not any document identity 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Multiple-unique-key-in-Schema-tp4240550p4240557.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr/jetty and datasource

2015-11-17 Thread fabigol
sun.java.command = start.jar --module=http (from the Java properties interface)

I want that line to read: sun.java.command = start.jar --module=http,jndi

How can I do that?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-jetty-and-datasource-tp4240426p4240619.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Query gives response multiple times

2015-11-17 Thread Shane McCarthy
Thank you for the speedy responses.

I will give the results I have found based on your comments.

In the schema.xml there are 19 dynamicField are specified.  The query and
the fields list I want are included in these 19 variables.  They are all
stored so based on your comment Erick I would assume that I should be
seeing them.  I am not.  Any ideas why I am not?

There are 10 copyField directives in the schema.xml.  The only one that
matches the variables with duplicated results is <copyField ... dest="catch_all_fields_mt"/>.

How can I find if I am using a custom request handler?  I had assumed I was
using the default as in the Request Handler box it has /select.

Thanks,

Shane







On Tue, Nov 17, 2015 at 12:52 PM, Alexandre Rafalovitch 
wrote:

> If you have access to the Admin UI, go to the Schema Browser field
> under the core (I assume Solr 4+ here, never actually asked for your
> version).
> https://cwiki.apache.org/confluence/display/solr/Schema+Browser+Screen
>
> You can see in the example that when you select a field, it will show
> whether it is copied to from other fields. That's the copyField I was
> talking about earlier. Check that first.
>
> You might also be able to get your schema.xml and solrconfig.xml on
> the Files screen:
> https://cwiki.apache.org/confluence/display/solr/Files+Screen. Check
> there for definitions. A definition for "fl" under "/select" or
> perhaps your custom request handler may restrict the fields you are
> getting.
>
> Regards,
>Alex.
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 17 November 2015 at 11:32, Shane McCarthy  wrote:
> > Yes a complex situation and I am having trouble knowing who to ask for an
> > explanation.
> >
> > I have been given access to the Solr Admin page.   Using the persistent
> > identifier PID, I have done a query for PID = *. The PID that are found
> > with that query all have cml_hfenergy_md, a multivalued double field,
> > repeated 8 or 4 times.
> >
> > I have multiple string, multiple date fields that are also repeated
> >
> > There are single string and integer fields that are not repeated.
> >
> > Do you know if all results of the query are returned with the Solr Admin?
> > It tells me the numFound but does not give me the all the fields
> requested
> > for all the results.  Is this usual behaviour?
> >
> > Cheers,
> >
> > Shane
> >
> >
> > On Mon, Nov 16, 2015 at 7:11 PM, Alexandre Rafalovitch <
> arafa...@gmail.com>
> > wrote:
> >
> >> On 16 November 2015 at 17:40, Shane McCarthy  wrote:
> >> > I am using an instance of Islandora.
> >>
> >> Ah. This complicates the situation as there is an unknown - to most of
> >> us - layer in between. So, it is not clear whether this multiplication
> >> is happening in Solr or in Islandora.
> >>
> >> Your best option is to hit Solr server directly and basically do a
> >> query for a specific record's id with the fields that you are having a
> >> problem with. If that field for that record shows the problem the same
> >> way as through the full Islandora path, the problem is Solr. Then, you
> >> review the copyFields, etc. If it does not...
> >>
> >> Also, is this only happening with one "double" field but not another,
> >> with all "double" fields or with some other combination?
> >>
> >> And did it start at some point or was this always like that?
> >>
> >> You need to figure out something to contrast the observed behavior
> against.
> >>
> >> Regards,
> >> Alex.
> >>
> >>
> >>
> >> 
> >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> >> http://www.solr-start.com/
> >>
>


Re: Date Math, NOW and filter queries

2015-11-17 Thread Chris Hostetter

: the '+' sign is a URL-escape for space, which you
: see in the error message.

more specifically, the error indicates that somewhere in the construction 
of your request to Solr, your HTTP request params are not getting properly 
escaped -- so the '+' is being sent literally over the wire, and Solr is 
URL-unescaping it as whitespace.

: Escape it as %2B and you should be fine.

Rather than manually "escaping" just the "+" character in your code, 
you should figure out where/how to ensure that *all* the HTTP 
communication you have with Solr gets properly escaped -- so you don't 
hit this problem over and over with other params/characters.

If you can tell us a bit more about how you communicate with Solr, we can 
try to help you with the larger problem you are having.  (Any decent HTTP 
client library should automatically escape any request param keys/values 
you ask it to include in the request.)

-Hoss
http://www.lucidworks.com/


Re: Data Import Handler / Backup indexes

2015-11-17 Thread KNitin
AFAIK the Data Import Handler does not offer backups. You can try using the
replication handler to back up data to any custom endpoint.

You can also try out : https://github.com/bloomreach/solrcloud-haft.  This
helps backup solr indices across clusters.
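A sketch of triggering a backup through the replication handler (host, core name, and backup location are placeholder assumptions):

```python
from urllib.parse import urlencode

# Hedged sketch: build the replication handler's backup request.
# An HTTP GET to this URL asks Solr to snapshot the index to `location`.
base = "http://localhost:8983/solr/collection1/replication"
url = base + "?" + urlencode({"command": "backup", "location": "/backups"})
print(url)
```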

On Tue, Nov 17, 2015 at 7:08 AM, Brian Narsi  wrote:

> I am using Data Import Handler to retrieve data from a database with
>
> full-import, clean = true, commit = true and optimize = true
>
> This has always worked correctly without any errors.
>
> But just to be on the safe side, I am thinking that we should do a backup
> before initiating Data Import Handler. And just in case something happens
> restore the backup.
>
> Can backup be done automatically (before initiating Data Import Handler)?
>
> Thanks
>


Performance testing on SOLR cloud

2015-11-17 Thread Aswath Srinivasan (TMS)
Hi fellow developers,

Please share your experience of performance testing Solr. What I'm trying to do is 
run SolrCloud on 3 Linux servers with 16 GB RAM each and index a total of 2.2 
million documents. I have yet to decide how many shards and replicas to use (any 
hint on this is welcome too; this is 'only' performance testing, so suggest shard 
and replica counts if you can). Ultimately, I'm trying to find the QPS that this 
SolrCloud setup can handle.

To summarize,

1.   Find the QPS that my solr cloud set up can support

2.   Using 5.3.1 version with external zookeeper

3.   3 Linux servers with 16 GB RAM and index a total of 2.2 million documents

4.   Yet to decide number of shards and replicas

5.   Not using any custom search application (performance testing for SOLR and 
not for Search portal)

Thank you


Re: Data Import Handler / Backup indexes

2015-11-17 Thread Jeff Wartes

https://github.com/whitepages/solrcloud_manager supports 5.x, and I added
some backup/restore functionality similar to SOLR-5750 in the last
release. 
Like SOLR-5750, this backup strategy requires a shared filesystem, but
note that unlike SOLR-5750, I haven’t yet added any backup functionality
for the contents of ZK. I’m currently working on some parts of that.


Making a copy of a collection is supported too, with some caveats.


On 11/17/15, 10:20 AM, "Brian Narsi"  wrote:

>Sorry I forgot to mention that we are using SolrCloud 5.1.0.
>
>
>
>On Tue, Nov 17, 2015 at 12:09 PM, KNitin  wrote:
>
>> afaik Data import handler does not offer backups. You can try using the
>> replication handler to backup data as you wish to any custom end point.
>>
>> You can also try out : https://github.com/bloomreach/solrcloud-haft.
>>This
>> helps backup solr indices across clusters.
>>
>> On Tue, Nov 17, 2015 at 7:08 AM, Brian Narsi  wrote:
>>
>> > I am using Data Import Handler to retrieve data from a database with
>> >
>> > full-import, clean = true, commit = true and optimize = true
>> >
>> > This has always worked correctly without any errors.
>> >
>> > But just to be on the safe side, I am thinking that we should do a
>>backup
>> > before initiating Data Import Handler. And just in case something
>>happens
>> > restore the backup.
>> >
>> > Can backup be done automatically (before initiating Data Import
>>Handler)?
>> >
>> > Thanks
>> >
>>



EdgeNGramFilterFactory not working? Solr 5.3.1

2015-11-17 Thread Daniel Valdivia
Hi,

I'm trying to get the EdgeNGramFilterFactory filter to work on a certain field, 
however after defining the fieldType, creating a field for it and copying the 
source, this doesn't seem to be working.

One catch that may be affecting the outcome: almost nothing in my index is 
stored; every field except the document id has stored=false.

I'm using Solr 5.3.1, and I know in my corpus the word "incident" is present, I 
can search for it, but looking for "inci" yields no results

http://localhost:8983/solr/superCore/select?q=inci=record_display_name=json=true

Any idea on what could I be doing wrong?

This is how I define the field type

{
  "add-field-type" : {
"indexed" : true,
"queryAnalyzer" : {
  "filters" : [
{
  "class" : "solr.LowerCaseFilterFactory"
}
  ],
  "tokenizer" : {
"class" : "solr.WhitespaceTokenizerFactory"
  }
},
"indexAnalyzer" : {
  "filters" : [
{
  "class" : "solr.LowerCaseFilterFactory"
},
{
  "class" : "solr.EdgeNGramFilterFactory",
  "minGramSize" : "2",
  "maxGramSize" : "10"
}
  ],
  "tokenizer" : {
"class" : "solr.WhitespaceTokenizerFactory"
  }
},
"stored" : false,
"name" : "prefix",
"class" : "solr.TextField"
  }
}

Adding the field

{
  "add-field":{
 "name":"dispNamePrefix",
 "type":"prefix",
 "stored":false }
}

Copy field

{
  "add-copy-field":{
 "source":"record_display_name",
 "dest":[ "dispNamePrefix"]}
}
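For reference, the index analyzer above breaks each lowercased token into edge n-grams, while the query analyzer leaves "inci" whole, so "inci" can only match if the query actually targets the ngrammed field. A rough Python approximation of what the filter emits (not the actual Solr implementation):

```python
def edge_ngrams(token, min_gram=2, max_gram=10):
    """Rough approximation of what solr.EdgeNGramFilterFactory emits
    for a single token with the minGramSize/maxGramSize settings above."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

print(edge_ngrams("incident"))
# ['in', 'inc', 'inci', 'incid', 'incide', 'inciden', 'incident']
```

Because "inci" is among the indexed grams, a query of `dispNamePrefix:inci` should match once the documents are reindexed and committed.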

Re: EdgeNGramFilterFactory not working? Solr 5.3.1

2015-11-17 Thread Alexandre Rafalovitch
Here would be my debugging sequence:

1. Are you actually searching against: dispNamePrefix (and not against
the default text field which has its own analyzer stack)?
2. Do you see the field definition in the Schema Browser screen?
3. If, on that screen, you click "Load Term Info", do you see the partial terms?
4. If you go to the Analysis screen, you should be able to select the
field (or the type) from the drop-down, enter both text to index and
text to search, and see what happens to them and whether they match.

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 17 November 2015 at 18:17, Daniel Valdivia  wrote:
> Hi,
>
> I'm trying to get the EdgeNGramFilterFactory filter to work on a certain 
> field, however after defining the fieldType, creating a field for it and 
> copying the source, this doesn't seem to be working.
>
> One catch here, that I'm not sure if it's affecting the outcome is that none 
> of my fields are stored, everything but the document id in my index is 
> stored=false
>
> I'm using Solr 5.3.1, and I know in my corpus the word "incident" is present, 
> I can search for it, but looking for "inci" yields no results
>
> http://localhost:8983/solr/superCore/select?q=inci=record_display_name=json=true
>
> Any idea on what could I be doing wrong?
>
> This is how I define the field type
>
> {
>   "add-field-type" : {
> "indexed" : true,
> "queryAnalyzer" : {
>   "filters" : [
> {
>   "class" : "solr.LowerCaseFilterFactory"
> }
>   ],
>   "tokenizer" : {
> "class" : "solr.WhitespaceTokenizerFactory"
>   }
> },
> "indexAnalyzer" : {
>   "filters" : [
> {
>   "class" : "solr.LowerCaseFilterFactory"
> },
> {
>   "class" : "solr.EdgeNGramFilterFactory",
>   "minGramSize" : "2",
>   "maxGramSize" : "10"
> }
>   ],
>   "tokenizer" : {
> "class" : "solr.WhitespaceTokenizerFactory"
>   }
> },
> "stored" : false,
> "name" : "prefix",
> "class" : "solr.TextField"
>   }
> }
>
> Adding the field
>
> {
>   "add-field":{
>  "name":"dispNamePrefix",
>  "type":"prefix",
>  "stored":false }
> }
>
> Copy field
>
> {
>   "add-copy-field":{
>  "source":"record_display_name",
>  "dest":[ "dispNamePrefix"]}
> }


Re: EdgeNGramFilterFactory not working? Solr 5.3.1

2015-11-17 Thread Daniel Valdivia
Hi Markus,

I did; every time I run this experiment I start from 0 :)

However, after the last change it seems I forgot to commit, which is why I 
couldn't get results. Now I am getting results.

The resolution to this problem was searching explicitly against the 
dispNamePrefix field :O

Thanks Markus and Alexandre

> On Nov 17, 2015, at 3:40 PM, Markus Jelsma  wrote:
> 
> Hi - the usual suspect is: 'did you reindex?' Not seeing things change after 
> modifying index-time analysis chains means you need to reindex.
> 
> M.
> 
> 
> 
> -Original message-
>> From:Daniel Valdivia 
>> Sent: Wednesday 18th November 2015 0:17
>> To: solr-user@lucene.apache.org
>> Subject: EdgeNGramFilterFactory not working? Solr 5.3.1
>> 
>> Hi,
>> 
>> I'm trying to get the EdgeNGramFilterFactory filter to work on a certain 
>> field, however after defining the fieldType, creating a field for it and 
>> copying the source, this doesn't seem to be working.
>> 
>> One catch here, that I'm not sure if it's affecting the outcome is that none 
>> of my fields are stored, everything but the document id in my index is 
>> stored=false
>> 
>> I'm using Solr 5.3.1, and I know in my corpus the word "incident" is 
>> present, I can search for it, but looking for "inci" yields no results
>> 
>> http://localhost:8983/solr/superCore/select?q=inci=record_display_name=json=true
>> 
>> Any idea on what could I be doing wrong?
>> 
>> This is how I define the field type
>> 
>> {
>>  "add-field-type" : {
>>"indexed" : true,
>>"queryAnalyzer" : {
>>  "filters" : [
>>{
>>  "class" : "solr.LowerCaseFilterFactory"
>>}
>>  ],
>>  "tokenizer" : {
>>"class" : "solr.WhitespaceTokenizerFactory"
>>  }
>>},
>>"indexAnalyzer" : {
>>  "filters" : [
>>{
>>  "class" : "solr.LowerCaseFilterFactory"
>>},
>>{
>>  "class" : "solr.EdgeNGramFilterFactory",
>>  "minGramSize" : "2",
>>  "maxGramSize" : "10"
>>}
>>  ],
>>  "tokenizer" : {
>>"class" : "solr.WhitespaceTokenizerFactory"
>>  }
>>},
>>"stored" : false,
>>"name" : "prefix",
>>"class" : "solr.TextField"
>>  }
>> }
>> 
>> Adding the field
>> 
>> {
>>  "add-field":{
>> "name":"dispNamePrefix",
>> "type":"prefix",
>> "stored":false }
>> }
>> 
>> Copy field
>> 
>> {
>>  "add-copy-field":{
>> "source":"record_display_name",
>> "dest":[ "dispNamePrefix"]}
>> }



RE: Performance testing on SOLR cloud

2015-11-17 Thread Markus Jelsma
Hi - we use the Siege load-testing program. It can take a seed list of URLs, 
taken from actual user input, and can generate load in parallel. It won't reuse 
common queries unless you prepare your seed list appropriately. If your setup 
achieves the goal your client anticipates, then you are fine. Siege is not a 
good tool for testing extreme QPS due to obvious single-machine and network 
limitations.

Assuming your JVM heap settings and Solr cache settings are optimal, and your 
only remaining question is how many shards to use: increase the number of shards. 
Oversharding can be beneficial because more threads each process less data. 
Every single core search is single-threaded, so oversharding on the same hardware 
makes sense, and it tends to pay off.

Make sure you run multiple long stress tests and restart the JVMs in between, 
because a) query times and load tend to regress to the mean, and b) HotSpot 
needs to 'warm up', so short tests are less meaningful.
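A minimal sketch of a load driver along these lines (an assumption-laden illustration, not Siege itself; `send` stands in for whatever issues one HTTP query from the seed list):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_load(queries, send, workers=8):
    """Fire all queries through `send` concurrently and return achieved QPS.
    `send` is whatever issues one query, e.g. an HTTP GET against /select."""
    start = time.time()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # drain the iterator so all requests actually complete
        list(pool.map(send, queries))
    elapsed = max(time.time() - start, 1e-9)
    return len(queries) / elapsed
```

Per the advice above, run this repeatedly and for long periods, discarding the first warm-up runs, rather than trusting a single short burst.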

M.

 
 
-Original message-
> From:Aswath Srinivasan (TMS) 
> Sent: Tuesday 17th November 2015 23:46
> To: solr-user@lucene.apache.org
> Subject: Performance testing on SOLR cloud
> 
> Hi fellow developers,
> 
> Please share your experience, on how you did performance testing on SOLR? 
> What I'm trying to do is have SOLR cloud on 3 Linux servers with 16 GB RAM 
> and index a total of 2.2 million. Yet to decide how many shards and replicas 
> to have (Any hint on this is welcome too, basically 'only' performance 
> testing, so suggest the number of shards and replicas if you can). 
> Ultimately, I'm trying to find the QPS that this SOLR cloud set up can handle.
> 
> To summarize,
> 
> 1.   Find the QPS that my solr cloud set up can support
> 
> 2.   Using 5.3.1 version with external zookeeper
> 
> 3.   3 Linux servers with 16 GB RAM and index a total of 2.2 million documents
> 
> 4.   Yet to decide number of shards and replicas
> 
> 5.   Not using any custom search application (performance testing for SOLR 
> and not for Search portal)
> 
> Thank you
> 


RE: EdgeNGramFilterFactory not working? Solr 5.3.1

2015-11-17 Thread Markus Jelsma
Hi - the usual suspect is: 'did you reindex?' Not seeing things change after 
modifying index-time analysis chains means you need to reindex.

M.

 
 
-Original message-
> From:Daniel Valdivia 
> Sent: Wednesday 18th November 2015 0:17
> To: solr-user@lucene.apache.org
> Subject: EdgeNGramFilterFactory not working? Solr 5.3.1
> 
> Hi,
> 
> I'm trying to get the EdgeNGramFilterFactory filter to work on a certain 
> field, however after defining the fieldType, creating a field for it and 
> copying the source, this doesn't seem to be working.
> 
> One catch here, that I'm not sure if it's affecting the outcome is that none 
> of my fields are stored, everything but the document id in my index is 
> stored=false
> 
> I'm using Solr 5.3.1, and I know in my corpus the word "incident" is present, 
> I can search for it, but looking for "inci" yields no results
> 
> http://localhost:8983/solr/superCore/select?q=inci=record_display_name=json=true
> 
> Any idea on what could I be doing wrong?
> 
> This is how I define the field type
> 
> {
>   "add-field-type" : {
> "indexed" : true,
> "queryAnalyzer" : {
>   "filters" : [
> {
>   "class" : "solr.LowerCaseFilterFactory"
> }
>   ],
>   "tokenizer" : {
> "class" : "solr.WhitespaceTokenizerFactory"
>   }
> },
> "indexAnalyzer" : {
>   "filters" : [
> {
>   "class" : "solr.LowerCaseFilterFactory"
> },
> {
>   "class" : "solr.EdgeNGramFilterFactory",
>   "minGramSize" : "2",
>   "maxGramSize" : "10"
> }
>   ],
>   "tokenizer" : {
> "class" : "solr.WhitespaceTokenizerFactory"
>   }
> },
> "stored" : false,
> "name" : "prefix",
> "class" : "solr.TextField"
>   }
> }
> 
> Adding the field
> 
> {
>   "add-field":{
>  "name":"dispNamePrefix",
>  "type":"prefix",
>  "stored":false }
> }
> 
> Copy field
> 
> {
>   "add-copy-field":{
>  "source":"record_display_name",
>  "dest":[ "dispNamePrefix"]}
> }


Re: Date Math, NOW and filter queries

2015-11-17 Thread Mugeesh Husain
Thanks, all of you.
The problem was indeed that the '+' sign is a URL escape for a space.

Using %2B instead of the + sign works fine.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Date-Math-NOW-and-filter-queries-tp4240561p4240675.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Expand Component Fields Response

2015-11-17 Thread Sanders, Marshall (AT - Atlanta)
Well I didn't receive any responses and couldn't find any resources so I 
created a patch and a corresponding JIRA to allow the ExpandComponent to use 
the TotalHitCountCollector which will only return the total hit count when 
expand.rows=0 which more accurately reflected my use case.  (We don't care 
about the expanded document itself, just the number available so that we can 
show something like "52 more like this")

https://issues.apache.org/jira/browse/SOLR-8306
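A sketch of the kind of request this patch targets (the collapse field name "group_field" is a placeholder assumption):

```python
from urllib.parse import urlencode

# Hedged sketch: collapse on a grouping field, then ask the expand
# component for zero rows -- with SOLR-8306 this returns only the
# per-group numFound ("52 more like this"), not the documents.
params = {
    "q": "*:*",
    "fq": "{!collapse field=group_field}",
    "expand": "true",
    "expand.rows": "0",
}
print(urlencode(params))
```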

I'm not sure if this is the right forum for something like this, or how to get 
feedback on the patch.  If someone more knowledgeable could help me out in that 
area it would be excellent.  Thanks!

Marshall Sanders
Technical Lead - Software Engineer
Autotrader.com
404-568-7130

-Original Message-
From: Sanders, Marshall (AT - Atlanta) [mailto:marshall.sand...@autotrader.com] 
Sent: Monday, November 16, 2015 11:00 AM
To: solr-user@lucene.apache.org
Subject: Expand Component Fields Response

Is it possible to specify a separate set of fields to return from the expand 
component which is different from the standard fl parameter?  Something like 
this:

fl=fielda&expand.fl=fieldb

Our current use case means we actually only care about the numFound from the 
expand component and not any of the actual fields.  We could also use a facet 
on the field we're collapsing on, but that means mapping from the field we 
collapsed on to the different facets, which isn't very elegant, and we also have 
to ask for a large facet.limit to make sure that we get the appropriate counts 
back.  That is pretty poor for high-cardinality fields.  The alternative is the 
current approach, where we ask for the expand component and get TONS of 
information back that we don't care about.

Thanks for any help!

Marshall Sanders
Technical Lead - Software Engineer
Autotrader.com
404-568-7130



Re: Query gives response multiple times

2015-11-17 Thread Shane McCarthy
Thank you for the resources and all the help.  I hope that I clear this up
soon.  I will ask the Islandora folks for their thoughts.

Cheers,

Shane

On Tue, Nov 17, 2015 at 3:19 PM, Alexandre Rafalovitch 
wrote:

> Add echoParams=all to see what are default and other parameters that
> apply to your request
> https://wiki.apache.org/solr/CoreQueryParameters#echoParams .
> Specifically, what the 'fl' setting is. Or try setting 'fl' explicitly
> and see if the display changes.
>
> copyField: the one that I would expect to see is the one with your
> field as the destination, not the source.
>
> At this point, barring any other ideas, I would put the blame on
> Islandora. Or at least ask on their mailing list/forum.
>
> Or, if you want to troubleshoot on a lower level and can resubmit the
> documents to be reindexed via your pipeline, get your system
> administrator to put a WireShark on the Solr side or on a network in a
> sniffing mode. Or get him to modify Jetty that houses Solr to log
> incoming requests. Both are a bit hard to explain in emails, so either
> they know how to do it or not. But basically you are trying to catch
> an indexing operation in mid-flight in-between Solr and Islandora and
> see what kind of record is being sent to Solr. I expect it will have
> that multiplication issue already present.
>
> Regards,
> Alex.
>
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 17 November 2015 at 13:07, Shane McCarthy  wrote:
> > Thank you for the speedy responses.
> >
> > I will give the results I have found based on your comments.
> >
> > In the schema.xml there are 19 dynamicField are specified.  The query and
> > the fields list I want are included in these 19 variables.  They are all
> > stored so based on your comment Erick I would assume that I should be
> > seeing them.  I am not.  Any ideas why I am not?
> >
> > There are 10 copyField directives in the schema.xml.  The only one that
> > matches the variables with duplicated results is <copyField ... dest="catch_all_fields_mt"/>.
> >
> > How can I find if I am using a custom request handler?  I had assumed I
> was
> > using the default as in the Request Handler box it has /select.
> >
> > Thanks,
> >
> > Shane
> >
> >
> >
> >
> >
> >
> >
> > On Tue, Nov 17, 2015 at 12:52 PM, Alexandre Rafalovitch <
> arafa...@gmail.com>
> > wrote:
> >
> >> If you have access to the Admin UI, go to the Schema Browser field
> >> under the core (I assume Solr 4+ here, never actually asked for your
> >> version).
> >> https://cwiki.apache.org/confluence/display/solr/Schema+Browser+Screen
> >>
> >> You can see in the example that when you select a field, it will show
> >> whether it is copied to from other fields. That's the copyField I was
> >> talking about earlier. Check that first.
> >>
> >> You might also be able to get your schema.xml and solrconfig.xml on
Re: Performance testing on SOLR cloud

2015-11-17 Thread Erick Erickson
I wouldn't bother to shard either. YMMV of course, but 2.2M documents
is actually a pretty small number unless the docs themselves are huge.
Sharding introduces inevitable overhead, so it's usually the last
thing you resort to.

As far as the number of replicas is concerned, that's strictly a
function of what QPS you need. Let's say you do not shard and have a
query rate of 20 queries-per-second. If you need to support 100 QPS,
just add 4 more replicas, this can be done any time.
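The replica arithmetic here can be sketched as a quick capacity check (a sketch only, assuming QPS scales roughly linearly with replica count, which real tests should confirm):

```python
import math

def replicas_to_add(target_qps, qps_per_replica, current_replicas=1):
    # Total replicas needed to sustain target_qps, rounded up,
    # minus what is already running.
    needed = math.ceil(target_qps / qps_per_replica)
    return max(0, needed - current_replicas)

print(replicas_to_add(target_qps=100, qps_per_replica=20))  # -> 4
```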

Best,
Erick

On Tue, Nov 17, 2015 at 3:38 PM, Markus Jelsma
 wrote:
> Hi - we use the Siege load testing program. It can take a seed list of URL's, 
> taken from actual user input, and can put load in parallel. It won't reuse 
> common queries unless you prepare your seed list appropriately. If your setup 
> achieves the goal your client anticipates, then you are fine. Siege is not a 
> good tool to test extreme QPS due to obvious single machine and network 
> limitations.
>
> Assuming your JVM heap settings and Solr cache settings are optimal, and your 
> only question is how many shards, then increase the number of shards. 
> Oversharding can be beneficial because more threads process less data. Every 
> single core search is single threaded, so oversharding on the same hardware 
> makes sense, and it seems to pay off.
>
> Make sure you run multiple long stress tests and restart JVM's in between 
> because a) query times and load tend to regress to the mean and b) because 
> HotSpot needs to 'warm up' so short tests make less sense.
>
> M.
>
>
>
> -Original message-
>> From:Aswath Srinivasan (TMS) 
>> Sent: Tuesday 17th November 2015 23:46
>> To: solr-user@lucene.apache.org
>> Subject: Performance testing on SOLR cloud
>>
>> Hi fellow developers,
>>
>> Please share your experience on how you did performance testing on SOLR. 
>> What I'm trying to do is have a SOLR cloud on 3 Linux servers with 16 GB RAM 
>> and index a total of 2.2 million documents. Yet to decide how many shards and replicas 
>> to have (Any hint on this is welcome too, basically 'only' performance 
>> testing, so suggest the number of shards and replicas if you can). 
>> Ultimately, I'm trying to find the QPS that this SOLR cloud set up can 
>> handle.
>>
>> To summarize,
>>
>> 1.   Find the QPS that my solr cloud set up can support
>>
>> 2.   Using 5.3.1 version with external zookeeper
>>
>> 3.   3 Linux servers with 16 GB RAM and index a total of 2.2 million 
>> documents
>>
>> 4.   Yet to decide number of shards and replicas
>>
>> 5.   Not using any custom search application (performance testing for SOLR 
>> and not for Search portal)
>>
>> Thank you
>>
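Markus's point about preparing the Siege seed list from actual user input, deduplicated so common queries are not replayed from cache, might be sketched like this (hypothetical host, collection, and query sample; the resulting list would be written to a file and fed to siege with `-f`):

```python
from urllib.parse import urlencode

# Hypothetical sample of logged user queries (duplicates included).
raw_queries = ["solr cloud", "sharding", "solr cloud", "jvm heap"]

seen, seeds = set(), []
for q in raw_queries:
    if q not in seen:          # drop repeats so the test isn't all cache hits
        seen.add(q)
        seeds.append("http://localhost:8983/solr/collection1/select?"
                     + urlencode({"q": q, "rows": 10}))

print(len(seeds))   # unique URLs, ready for `siege -f urls.txt`
```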


Re: Split Shards

2015-11-17 Thread Erick Erickson
numShards is indeed the number of shards created when you create the collection.

numShards is irrelevant to the splitshard command.

You can look in your state.json (collectionstate.json in Solr 4x) to
find this number.
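For reference, SPLITSHARD divides the parent shard's hash range roughly in half rather than consulting the shard count you originally asked for. A sketch of that arithmetic (simplified; Solr's real ranges are signed 32-bit hashes, and splits can be uneven when a splitKey or explicit ranges are given):

```python
def split_range(lo, hi):
    # Two-way split of an inclusive hash range, roughly what
    # SPLITSHARD does by default.
    mid = lo + (hi - lo) // 2
    return (lo, mid), (mid + 1, hi)

left, right = split_range(0x80000000, 0xFFFFFFFF)
print([f"{a:08x}-{b:08x}" for a, b in (left, right)])
# -> ['80000000-bfffffff', 'c0000000-ffffffff']
```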

Best,
Erick

On Tue, Nov 17, 2015 at 2:25 PM, kiyer_adobe  wrote:
> Hi,
>
> Understand you provision the number of shards needed when you create the
> collection using num_shards parameter.
> Few questions:
> - Is this only for initial number of shards or would apply when you split
> the original shard as well?
> - What happens when the splits go over the number of shards that you
> initially allocated?
> - How/where can you see the number of shards allocated when you created the
> collection?
>
> Thanks.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Split-Shards-tp4240699.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Performance testing on SOLR cloud

2015-11-17 Thread Keith L
to add to Erick's point:

It's also highly dependent on the types of queries you expect (sorting,
faceting, fq, q, size of documents) and how many concurrent updates you
expect. If most queries are going to be similar and you are not going to be
updating very often, you can expect most of your index to be loaded into
page cache and lots of your queries to loaded from doc or query cache
(especially if you can optimize your fq to be similar, vs using q, which
introduces scoring overhead). Adding more replicas will help distribute the
load. Adding shards will allow you to parallelize things but add some
memory and latency overhead because results still need to be merged. If
your shards are across multiple machine you now introduce network latency.
I've seen good success with using many shards in the same jvm but this is
with collections with billions of documents.

On Tue, Nov 17, 2015 at 9:07 PM Erick Erickson 
wrote:

> I wouldn't bother to shard either. YMMV of course, but 2.2M documents
> is actually a pretty small number unless the docs themselves are huge.
> Sharding introduces inevitable overhead, so it's usually the last
> thing you resort to.
>
> As far as the number of replicas is concerned, that's strictly a
> function of what QPS you need. Let's say you do not shard and have a
> query rate of 20 queries-per-second. If you need to support 100 QPS,
> just add 4 more replicas, this can be done any time.
>
> Best,
> Erick
>
> On Tue, Nov 17, 2015 at 3:38 PM, Markus Jelsma
>  wrote:
> > Hi - we use the Siege load testing program. It can take a seed list of
> URL's, taken from actual user input, and can put load in parallel. It won't
> reuse common queries unless you prepare your seed list appropriately. If
> your setup achieves the goal your client anticipates, then you are fine.
> Siege is not a good tool to test extreme QPS due to obvious single machine
> and network limitations.
> >
> > Assuming your JVM heap settings and Solr cache settings are optimal, and
> your only question is how many shards, then increase the number of shards.
> Oversharding can be beneficial because more threads process less data.
> Every single core search is single threaded, so oversharding on the same
> hardware makes sense, and it seems to pay off.
> >
> > Make sure you run multiple long stress tests and restart JVM's in
> between because a) query times and load tend to regress to the mean and b)
> because HotSpot needs to 'warm up' so short tests make less sense.
> >
> > M.
> >
> >
> >
> > -Original message-
> >> From:Aswath Srinivasan (TMS) 
> >> Sent: Tuesday 17th November 2015 23:46
> >> To: solr-user@lucene.apache.org
> >> Subject: Performance testing on SOLR cloud
> >>
> >> Hi fellow developers,
> >>
> >> Please share your experience, on how you did performance testing on
> SOLR? What I'm trying to do is have SOLR cloud on 3 Linux servers with 16
> GB RAM and index a total of 2.2 million. Yet to decide how many shards and
> replicas to have (Any hint on this is welcome too, basically 'only'
> performance testing, so suggest the number of shards and replicas if you
> can). Ultimately, I'm trying to find the QPS that this SOLR cloud set up
> can handle.
> >>
> >> To summarize,
> >>
> >> 1.   Find the QPS that my solr cloud set up can support
> >>
> >> 2.   Using 5.3.1 version with external zookeeper
> >>
> >> 3.   3 Linux servers with 16 GB RAM and index a total of 2.2 million
> documents
> >>
> >> 4.   Yet to decide number of shards and replicas
> >>
> >> 5.   Not using any custom search application (performance testing for
> SOLR and not for Search portal)
> >>
> >> Thank you
> >>
>


Re: StringIndexOutOfBoundsException using spellcheck and synonyms

2015-11-17 Thread Derek Poh

Hi

Any advice on how to resolve or work around this issue?


On 11/17/2015 8:28 AM, Derek Poh wrote:

Hi Scott

I am using Solr 4.10.4.

On 11/16/2015 10:06 PM, Scott Stults wrote:

Hi Derek,

Could you please add what version of Solr you see this in? I didn't 
see a

related Jira, so this might warrant a new one.


k/r,
Scott

On Sun, Nov 15, 2015 at 11:01 PM, Derek Poh  
wrote:



Hi
I am using spellcheck and synonyms. I am getting
"java.lang.StringIndexOutOfBoundsException: String index out of 
range: -1"

for some keywords.

I think I managed to narrow down the likely cause of it.
I have this line of entry in the synonyms.txt file,

body spray,cologne,parfum,parfume,perfume,purfume,toilette

When I search for 'cologne' it will hit the exception.
If I remove the 'body spray' entry from the line, I will not hit the
exception.


cologne,parfum,parfume,perfume,purfume,toilette

It seems like it could be due to multi-term entries in the synonyms file, but
there are some keywords with multi-term entries in synonyms that do not have
the issue.
This line has a multi-term entry "paint ball" in it; when I search for paintball
or paintballs it does not hit the exception.

paintball,paintballs,paint ball


Any advice on how I can resolve this issue?


The field use for spellcheck:




 
   
 
 
 
 
   
   
 
 
 synonyms="synonyms.txt"

ignoreCase="true" expand="true"/>
 
   
 


Exception stacktrace:
2015-11-16T07:06:43,055 - ERROR [qtp744979286-193443:SolrException@142] - null:java.lang.StringIndexOutOfBoundsException: String index out of range: -1
 at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:789)
 at java.lang.StringBuilder.replace(StringBuilder.java:266)
 at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:235)
 at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:92)
 at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:230)
 at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:197)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1976)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
 at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
 at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
 at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
 at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
 at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
 at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
 at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
 at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
 at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
 at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
 at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
 at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
 at org.eclipse.jetty.server.Server.handle(Server.java:497)
 at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
 at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
 at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
 at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
 at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
 at java.lang.Thread.run(Thread.java:722)

Derek

--
CONFIDENTIALITY NOTICE
This e-mail (including any attachments) may contain confidential and/or
privileged information. If you are not the intended recipient or have
received this e-mail in error, please inform the sender immediately and
delete this e-mail (including any attachments) from your computer, 
and you
must not use, disclose to 

Re: Re: how to join search mutiple collection in sorlcloud

2015-11-17 Thread Paul Blanchaert
When you want the results of 'b' in the results of the join, you'll have to
reconsider and merge 'b' into 'a' as suggested by Erick. This is because the
results of the join are not a combination of the 2 collections (as with "
select a*,b.* " ).
In the 'search' world, you can look at a join as a fq (filter on 'to' field
with 'to'='from' field of query in fromindex "b") on the query in
collection "a". So results from query in fromindex "b" are not
used/retained when querying "a" except for the values used for setting the
'to' field in the filter.
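Expressed as a request, that join-as-filter reads like the following sketch (hypothetical field names; only fields from 'a' come back, and nothing from 'b' is retained):

```python
# Build a Solr request where collection "a" is filtered by a join against
# the "b" index, per the explanation above. Field names are illustrative.
params = {
    "q": "*:*",
    "fq": "{!join from=id to=b_id fromIndex=b}type:review",
}
query_string = "&".join(f"{k}={v}" for k, v in params.items())
print(query_string)
# -> q=*:*&fq={!join from=id to=b_id fromIndex=b}type:review
```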




--


Kind regards,

Paul Blanchaert 

www.search-solutions.net

Tel: +32 497 05.01.03

[image: View my profile on LinkedIn]
view
 or connect


--

Please consider the environment before printing this e-mail.

This message is explicitly subject to the conditions of the e-mail
disclaimer, available via the following link: mail-disclaimer
. If you are
unable to consult this e-mail disclaimer, please notify the sender at once.

On 17 November 2015 at 09:05, soledede_w...@ehsy.com  wrote:

>
> Yes, thanks. But does it only support (select * from A where A.id in (select id
> from B where ...))?
> I hope it can be (select a.*, b.* from A a join B b on A.id = B.id).
> How do I merge the results from the shards?
>
> Thanks
>
>
> soledede_w...@ehsy.com
>
> From: Paul Blanchaert
> Date: 2015-11-17 15:57
> To: solr-user
> Subject: Re: Re: how to join search mutiple collection in sorlcloud
> You might want to take a look at/follow up upon SOLR-8297
> 
>
>
> On Tue, 17 Nov 2015 at 04:14 soledede_w...@ehsy.com <
> soledede_w...@ehsy.com>
> wrote:
>
> >
> > Thanks Erick
> >
> > I think we could hash the doc_id across all shards, then do the join, and last
> > merge the results on one node.
> >
> >
> > soledede_w...@ehsy.com
> >
> > From: Erick Erickson
> > Date: 2015-11-17 11:10
> > To: solr-user
> > Subject: Re: how to join search mutiple collection in sorlcloud
> > In a word, no. At least probably not.
> >
> > There are some JIRA tickets dealing with distributed joins, and some
> > with certain restrictions, specifically if the second (from)
> > collection can be reproduced on every slice of the first (to)
> > collection.
> >
> > In the trunk (6.0), there's the ParallelSQL stuff which has some
> > relevance, but it's still not a full RDBMS type join.
> >
> > The usual recommendation is to flatten your data if at all possible so
> > you don't _have_ two collections.
> >
> > Solr is a wonderful search engine. It is not an RDBMS and whenever I
> > find myself trying to make it behave like an RDBMS I try to rethink
> > the architecture.
> >
> > On Mon, Nov 16, 2015 at 6:56 PM, soledede_w...@ehsy.com
> >  wrote:
> > > Dear @solr_lucene
> > Currently I am using Solr 5.3.1 and I have a requirement: I need to search like
> > in a relational database (select * from A, B where A.id = B.id). Can we implement
> > this with Solr 5.3 in SolrCloud mode? I have two collections, 2 shards per
> > collection.
> > >   Help me please.
> > >
> > > Thanks
> > >
> > >
> > > soledede_w...@ehsy.com
> >
>


Re: Undo Split Shard

2015-11-17 Thread Emir Arnautovic

Hi,
You can try manually adjusting cluster state in ZK to include parent 
shard and exclude splits, reload collection and try split again.


Btw. any error in logs when split failed?

Thanks,
Emir

On 17.11.2015 07:08, kiyer_adobe wrote:

We had 32 shards of 30GB each. The query performance was awful. We decided to
split shards for all of them. Most of them went fine, but 3 shards
got split with _1 at 16GB and _0 in the low MB's. The _1 is
fine but _0 is definitely wrong. The parent shard is inactive and now
the split shards are active.
I tried deleteshard on the split shards so I could split again, but it does not allow
deleteshard on active shards. Running splitshard again on the parent shard
failed.

I am unsure of what the options are at this point and query went from bad
performance to not working at all.

Please advise.

Thanks.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Undo-Split-Shard-tp4240508.html
Sent from the Solr - User mailing list archive at Nabble.com.


--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Undo Split Shard

2015-11-17 Thread kiyer_adobe
Thanks Emir. How do I update the cluster state in zk? Is there an API for it?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Undo-Split-Shard-tp4240508p4240523.html
Sent from the Solr - User mailing list archive at Nabble.com.


solr 5.0 cloud, leader's load is higher than others

2015-11-17 Thread 初十
Hello everyone!

I use Solr 5.0 with the RAMDirectory and a collection with 3 replicas
and 1 shard.

The leader's load is much higher than the others'. Is it a bug?


Re: CloudSolrCloud - Commit returns but not all data is visible (occasionally)

2015-11-17 Thread adfel70
Thanks Erick,
I'll try to play with the autowarm config.

But I have a more direct question - why does the commit return without
waiting till the searchers are fully refreshed?

Could it be that the parameter waitSearcher=true doesn't really work?
or maybe I don't understand something here...

Thanks,




--
View this message in context: 
http://lucene.472066.n3.nabble.com/CloudSolrCloud-Commit-returns-but-not-all-data-is-visible-occasionally-tp4240368p4240518.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr 5.0 cloud, leader's load is higher than others

2015-11-17 Thread 初十
Has anyone encountered the same problem? And how can I solve it?

2015-11-17 16:52 GMT+08:00 初十 :

>
> Hello everyone!
>
> I use Solr 5.0 with the RAMDirectory and a collection with 3 replicas
> and 1 shard.
>
> The leader's load is much higher than the others'. Is it a bug?
>


Re: Undo Split Shard

2015-11-17 Thread Jan Høydahl
Stop Solr.
Then use zkcli - 
https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities
You want to first do getfile for state.json, then modify it, then putfile to 
upload it again.
Start Solr
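The "modify it" step might look like this sketch, run against a stripped-down state.json with hypothetical shard names (the real file carries many more keys, replicas included, that must be left intact):

```python
import json

# Minimal state.json sketch: parent shard1 was split into shard1_0/shard1_1.
state = json.loads("""
{"collection1": {"shards": {
    "shard1":   {"state": "inactive", "range": "80000000-ffffffff"},
    "shard1_0": {"state": "active",   "range": "80000000-bfffffff"},
    "shard1_1": {"state": "active",   "range": "c0000000-ffffffff"}}}}
""")

shards = state["collection1"]["shards"]
shards["shard1"]["state"] = "active"      # bring the parent back
for child in ("shard1_0", "shard1_1"):    # drop the bad splits
    shards.pop(child, None)

print(json.dumps(state, indent=2))        # upload this with zkcli putfile
```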

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 17. nov. 2015 kl. 09.36 skrev kiyer_adobe :
> 
> Thanks Emir. How do I update the cluster state in zk? Is there an API for it?
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Undo-Split-Shard-tp4240508p4240523.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Re: how to join search mutiple collection in sorlcloud

2015-11-17 Thread soledede_w...@ehsy.com

Yes, thanks. But does it only support (select * from A where A.id in (select id from B
where ...))?
I hope it can be (select a.*, b.* from A a join B b on A.id = B.id).
How do I merge the results from the shards?

Thanks


soledede_w...@ehsy.com
 
From: Paul Blanchaert
Date: 2015-11-17 15:57
To: solr-user
Subject: Re: Re: how to join search mutiple collection in sorlcloud
You might want to take a look at/follow up upon SOLR-8297

 
 
On Tue, 17 Nov 2015 at 04:14 soledede_w...@ehsy.com 
wrote:
 
>
> Thanks Erick
>
> I think we could hash the doc_id across all shards, then do the join, and last
> merge the results on one node.
>
>
> soledede_w...@ehsy.com
>
> From: Erick Erickson
> Date: 2015-11-17 11:10
> To: solr-user
> Subject: Re: how to join search mutiple collection in sorlcloud
> In a word, no. At least probably not.
>
> There are some JIRA tickets dealing with distributed joins, and some
> with certain restrictions, specifically if the second (from)
> collection can be reproduced on every slice of the first (to)
> collection.
>
> In the trunk (6.0), there's the ParallelSQL stuff which has some
> relevance, but it's still not a full RDBMS type join.
>
> The usual recommendation is to flatten your data if at all possible so
> you don't _have_ two collections.
>
> Solr is a wonderful search engine. It is not an RDBMS and whenever I
> find myself trying to make it behave like an RDBMS I try to rethink
> the architecture.
>
> On Mon, Nov 16, 2015 at 6:56 PM, soledede_w...@ehsy.com
>  wrote:
> > Dear @solr_lucene
> > Currently I am using Solr 5.3.1 and I have a requirement: I need to search like
> > in a relational database (select * from A, B where A.id = B.id). Can we implement
> > this with Solr 5.3 in SolrCloud mode? I have two collections, 2 shards per
> > collection.
> >   Help me please.
> >
> > Thanks
> >
> >
> > soledede_w...@ehsy.com
>


Re: Solr Search: Access Control / Role based security

2015-11-17 Thread Noble Paul
I haven't evaluated manifoldCF for this .
However, my preference would be to have a generic mechanism built
into Solr to restrict user access to certain docs based on some field
values. Relying on external tools makes life complex for users who do
not want them.

Our strategy is

* Provide a pluggable framework so that custom external solutions can
be plugged in
* Provide a standard implementation which does not depend upon any
external solutions

any suggestions are welcome
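The field-value restriction under discussion, e.g. the allow/deny-token scheme quoted below, amounts to appending a server-side filter query per user. A sketch with hypothetical field names:

```python
def acl_fq(allow_tokens, deny_tokens=()):
    # Grant when any allow token matches; then subtract deny matches.
    # Field names allow_token/deny_token are illustrative.
    allow = " OR ".join(allow_tokens) if allow_tokens else "none"
    fq = f"allow_token:({allow})"
    if deny_tokens:
        fq += f" AND -deny_token:({' OR '.join(deny_tokens)})"
    return fq

print(acl_fq(["role_manager", "dept_42"], ["hr_only"]))
# -> allow_token:(role_manager OR dept_42) AND -deny_token:(hr_only)
```

As Alessandro notes below, this only holds if the endpoint is locked down so users cannot pass their own fq.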


On Wed, Nov 11, 2015 at 12:07 AM, Susheel Kumar  wrote:
> Thanks everyone for the suggestions.
>
> Hi Noble - Were there any thoughts made on utilizing Apache ManifoldCF
> while developing Authentication/Authorization plugins or anything to add
> there.
>
> Thanks,
> Susheel
>
> On Tue, Nov 10, 2015 at 5:01 AM, Alessandro Benedetti > wrote:
>
>> I've been working for a while with Apache ManifoldCF and Enterprise Search
>> in Solr ( with Document level security) .
>> Basically you can add a couple of extra fields , for example :
>>
>> allow_token : containing all the tokens that can view the document
>> deny_token : containing all the tokens that are denied to view the document
>>
>> Apache ManifoldCF provides an integration that add an additional layer, and
>> is able to combine different data sources permission schemes.
>> The Authority Service endpoint will take in input the user name and return
>> all the allow_token values and deny_token.
>> At this point you can append the related filter queries to your queries and
>> be sure that the user will only see what is supposed to see.
>>
>> It's basically an extension of the strategy you were proposing, role based.
>> Of course keep protected your endpoints and avoid users to put custom fq,
>> or all your document security model would be useless :)
>>
>> Cheers
>>
>>
>> On 9 November 2015 at 21:52, Scott Stults <
>> sstu...@opensourceconnections.com
>> > wrote:
>>
>> > Susheel,
>> >
>> > This is perfectly fine for simple use-cases and has the benefit that the
>> > filterCache will help things stay nice and speedy. Apache ManifoldCF
>> goes a
>> > bit further and ties back to your authentication and authorization
>> > mechanism:
>> >
>> >
>> >
>> http://manifoldcf.apache.org/release/trunk/en_US/concepts.html#ManifoldCF+security+model
>> >
>> >
>> > k/r,
>> > Scott
>> >
>> > On Thu, Nov 5, 2015 at 2:26 PM, Susheel Kumar 
>> > wrote:
>> >
>> > > Hi,
>> > >
>> > > I have seen couple of use cases / need where we want to restrict result
>> > of
>> > > search based on role of a user.  For e.g.
>> > >
>> > > - if user role is admin, any document from the search result will be
>> > > returned
>> > > - if user role is manager, only documents intended for managers will be
>> > > returned
>> > > - if user role is worker, only documents intended for workers will be
>> > > returned
>> > >
>> > > Typical practise is to tag the documents with the roles (using a
>> > > multi-valued field) during indexing and then during search append
>> filter
>> > > query to restrict result based on roles.
>> > >
>> > > Wondering if there is any other better way out there and if this common
>> > > requirement should be added as a Solr feature/plugin.
>> > >
>> > > The current security plugins are more towards making Solr
>> apis/resources
>> > > secure not towards securing/controlling data during search.
>> > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/solr/Authentication+and+Authorization+Plugins
>> > >
>> > >
>> > > Please share your thoughts.
>> > >
>> > > Thanks,
>> > > Susheel
>> > >
>> >
>> >
>> >
>> > --
>> > Scott Stults | Founder & Solutions Architect | OpenSource Connections,
>> LLC
>> > | 434.409.2780
>> > http://www.opensourceconnections.com
>> >
>>
>>
>>
>> --
>> --
>>
>> Benedetti Alessandro
>> Visiting card : http://about.me/alessandro_benedetti
>>
>> "Tyger, tyger burning bright
>> In the forests of the night,
>> What immortal hand or eye
>> Could frame thy fearful symmetry?"
>>
>> William Blake - Songs of Experience -1794 England
>>



-- 
-
Noble Paul


Re: Security Problems

2015-11-17 Thread Noble Paul
The authentication plugin is not expensive in the context of the
admin UI. After all, it is not hit at hundreds of requests
per second.

The simplest solution would be

provide a well known permission name called "admin-ui"

ensure that every admin page load makes a call to some resource say
"/admin/security-check"

Then we can just protect that .

The only concern that I have is the false sense of security it would
give to the user.

But, that is a different point altogether

On Wed, Nov 11, 2015 at 1:52 AM, Upayavira  wrote:
> Is the authentication plugin that expensive?
>
> I can help by minifying the UI down to a smaller number of CSS/JS/etc
> files :-)
>
> It may be overkill, but it would also give better experience. And isn't
> that what most applications do? Check authentication tokens on every
> request?
>
> Upayavira
>
> On Tue, Nov 10, 2015, at 07:33 PM, Anshum Gupta wrote:
>> The reason why we bypass that is so that we don't hit the authentication
>> plugin for every request that comes in for static content. I think we
>> could
>> call the authentication plugin for that but that'd be an overkill. Better
>> experience ? yes
>>
>> On Tue, Nov 10, 2015 at 11:24 AM, Upayavira  wrote:
>>
>> > Noble,
>> >
>> > I get that a UI which is open source does not benefit from ACL control -
>> > we're not giving away anything that isn't public (other than perhaps
>> > info that could be used to identify the version of Solr, or even the
>> > fact that it *is* solr).
>> >
>> > However, from a user experience point of view, requiring credentials to
>> > see the UI would be more conventional, and therefore lead to less
>> > confusion. Is it possible for us to protect the UI static files, only
>> > for the sake of user experience, rather than security?
>> >
>> > Upayavira
>> >
>> > On Tue, Nov 10, 2015, at 12:01 PM, Noble Paul wrote:
>> > > The admin UI is a bunch of static pages . We don't let the ACL control
>> > > static content
>> > >
>> > > you must blacklist all the core/collection apis and it is pretty much
>> > > useless for anyone to access the admin UI (w/o the credentials , of
>> > > course)
>> > >
>> > > On Tue, Nov 10, 2015 at 7:08 AM, 马柏樟  wrote:
>> > > > Hi,
>> > > >
>> > > > After I configure Authentication with Basic Authentication Plugin and
>> > Authorization with Rule-Based Authorization Plugin, How can I prevent the
>> > strangers from visiting my solr by browser? For example, if the stranger
>> > visit the http://(my host):8983, the browser will pop up a window and
>> > says "the server http://(my host):8983 requires a username and
>> > password"
>> > >
>> > >
>> > >
>> > > --
>> > > -
>> > > Noble Paul
>> >
>>
>>
>>
>> --
>> Anshum Gupta



-- 
-
Noble Paul


Re: Query gives response multiple times

2015-11-17 Thread Alexandre Rafalovitch
Add echoParams=all to see the default and other parameters that
apply to your request:
https://wiki.apache.org/solr/CoreQueryParameters#echoParams
Specifically, what the 'fl' setting is. Or try setting 'fl' explicitly
and see if the display changes.
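Hitting Solr directly with that check might look like this sketch (hypothetical host, core, and field names):

```python
from urllib.parse import urlencode

# echoParams=all makes Solr report the effective parameters (defaults,
# appends, invariants) in the response header; fl is set explicitly.
params = {"q": "PID:*", "fl": "PID,cml_hfenergy_md",
          "echoParams": "all", "wt": "json"}
url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)
print(url)
```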

As for copyField, the one that I would expect to see is the one with your
field as the destination, not the source.

At this point, barring any other ideas, I would put the blame on
Islandora. Or at least ask on their mailing list/forum.

Or, if you want to troubleshoot on a lower level and can resubmit the
documents to be reindexed via your pipeline, get your system
administrator to put a WireShark on the Solr side or on a network in a
sniffing mode. Or get them to modify the Jetty that houses Solr to log
incoming requests. Both are a bit hard to explain in emails, so either
they know how to do it or not. But basically you are trying to catch
an indexing operation in mid-flight in-between Solr and Islandora and
see what kind of record is being sent to Solr. I expect it will have
that multiplication issue already present.

Regards,
Alex.


Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 17 November 2015 at 13:07, Shane McCarthy  wrote:
> Thank you for the speedy responses.
>
> I will give the results I have found based on your comments.
>
> In the schema.xml there are 19 dynamicFields specified.  The query and
> the fields list I want are included in these 19 variables.  They are all
> stored, so based on your comment, Erick, I would assume that I should be
> seeing them.  I am not.  Any ideas why I am not?
>
> There are 10 copyField directives in the schema.xml.  The only one that
> matches the variables with duplicated results is the copyField with
> dest="catch_all_fields_mt".
>
> How can I find if I am using a custom request handler?  I had assumed I was
> using the default as in the Request Handler box it has /select.
>
> Thanks,
>
> Shane
>
>
>
>
>
>
>
> On Tue, Nov 17, 2015 at 12:52 PM, Alexandre Rafalovitch 
> wrote:
>
>> If you have access to the Admin UI, go to the Schema Browser field
>> under the core (I assume Solr 4+ here, never actually asked for your
>> version).
>> https://cwiki.apache.org/confluence/display/solr/Schema+Browser+Screen
>>
>> You can see in the example that when you select a field, it will show
>> whether it is copied to from other fields. That's the copyField I was
>> talking about earlier. Check that first.
>>
>> You might also be able to get your schema.xml and solrconfig.xml on
>> the Files screen:
>> https://cwiki.apache.org/confluence/display/solr/Files+Screen. Check
>> there for definitions. A definition for "fl" under "/select" or
>> perhaps your custom request handler may restrict the fields you are
>> getting.
>>
>> Regards,
>>Alex.
>> 
>> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> http://www.solr-start.com/
>>
>>
>> On 17 November 2015 at 11:32, Shane McCarthy  wrote:
>> > Yes a complex situation and I am having trouble knowing who to ask for an
>> > explanation.
>> >
>> > I have been given access to the Solr Admin page.   Using the persistent
>> > identifier PID, I have done a query for PID = *. The PID that are found
>> > with that query all have cml_hfenergy_md, a multivalued double field,
>> > repeated 8 or 4 times.
>> >
>> > I have multiple string, multiple date fields that are also repeated
>> >
>> > There are single string and integer fields that are not repeated.
>> >
>> > Do you know if all results of the query are returned with the Solr Admin?
>> > It tells me the numFound but does not give me all of the fields
>> > requested
>> > for all the results.  Is this usual behaviour?
>> >
>> > Cheers,
>> >
>> > Shane
>> >
>> >
>> > On Mon, Nov 16, 2015 at 7:11 PM, Alexandre Rafalovitch <
>> arafa...@gmail.com>
>> > wrote:
>> >
>> >> On 16 November 2015 at 17:40, Shane McCarthy  wrote:
>> >> > I am using an instance of Islandora.
>> >>
>> >> Ah. This complicates the situation as there is an unknown - to most of
>> >> us - layer in between. So, it is not clear whether this multiplication
>> >> is happening in Solr or in Islandora.
>> >>
>> >> Your best option is to hit Solr server directly and basically do a
>> >> query for a specific record's id with the fields that you are having a
>> >> problem with. If that field for that record shows the problem the same
>> >> way as through the full Islandora path, the problem is Solr. Then, you
>> >> review the copyFields, etc. If it does not...
>> >>
>> >> Also, is this only happening with one "double" field but not another,
>> >> with all "double" fields or with some other combination?
>> >>
>> >> And did it start at some point or was this always like that?
>> >>
>> >> You need to figure out something to contrast the observed behavior
>> against.
>> >>
>> >> Regards,
>> >> Alex.
>> >>
>> >>
>> >>
>> >> 
>> >> Solr Analyzers, 

Re: Expand Component Fields Response

2015-11-17 Thread Joel Bernstein
Hi Marshall,

This sounds pretty reasonable. I should have some time to review the patch later
in the week.

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Nov 17, 2015 at 3:42 PM, Sanders, Marshall (AT - Atlanta) <
marshall.sand...@autotrader.com> wrote:

> Well I didn't receive any responses and couldn't find any resources so I
> created a patch and a corresponding JIRA to allow the ExpandComponent to
> use the TotalHitCountCollector which will only return the total hit count
> when expand.rows=0 which more accurately reflected my use case.  (We don't
> care about the expanded document itself, just the number available so that
> we can show something like "52 more like this")
>
> https://issues.apache.org/jira/browse/SOLR-8306
>
> I'm not sure if this is the right forum for something like this, or how to
> get feedback on the patch.  If someone more knowledgeable could help me out
> in that area it would be excellent.  Thanks!
>
> Marshall Sanders
> Technical Lead - Software Engineer
> Autotrader.com
> 404-568-7130
>
> -Original Message-
> From: Sanders, Marshall (AT - Atlanta) [mailto:
> marshall.sand...@autotrader.com]
> Sent: Monday, November 16, 2015 11:00 AM
> To: solr-user@lucene.apache.org
> Subject: Expand Component Fields Response
>
> Is it possible to specify a separate set of fields to return from the
> expand component which is different from the standard fl parameter?
> Something like this:
>
> fl=fielda&expand.fl=fieldb
>
> Our current use case means we actually only care about the numFound from
> the expand component and not any of the actual fields.  We could also use a
> facet for the field we're collapsing on, but this means mapping from the
> field we collapsed on to the different facets and isn't very elegant, and
> we also have to ask for a large facet.limit to make sure that we get the
> appropriate counts back.  This is pretty poor for high cardinality fields.
> The alternative is the current where we ask for the expand component and
> get TONS of information back that we don't care about.
>
> Thanks for any help!
>
> Marshall Sanders
> Technical Lead - Software Engineer
> Autotrader.com
> 404-568-7130
>
>
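>
> [Editor's note: the count-only request described above — collapse on a
> field, keep the expand component only for its numFound — can be sketched as
> query parameters. This is a hedged sketch: expand.rows=0 behaving as a
> count-only mode relies on the SOLR-8306 patch being applied, and the field
> name is an assumption.]

```python
from urllib.parse import urlencode

# Sketch of a collapse + expand request where only the expand-section
# numFound is wanted. "group_field" is an illustrative field name;
# expand.rows=0 as a count-only mode (TotalHitCountCollector) depends on
# the SOLR-8306 patch.
params = {
    "q": "*:*",
    "fq": "{!collapse field=group_field}",  # one document per group
    "expand": "true",                       # ask for the expand section
    "expand.rows": 0,                       # counts only; no expanded docs
    "fl": "id",
}
query_string = urlencode(params)
print(query_string)
```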


Re: Undo Split Shard

2015-11-17 Thread kiyer_adobe
Thanks Jan.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Undo-Split-Shard-tp4240508p4240698.html
Sent from the Solr - User mailing list archive at Nabble.com.


Split Shards

2015-11-17 Thread kiyer_adobe
Hi,

I understand you provision the number of shards needed when you create the
collection using the numShards parameter.
Few questions:
- Is this only the initial number of shards, or does it also apply when you
split the original shard?
- What happens when the splits go over the number of shards that you
initially allocated? 
- How/where can you see the number of shards allocated when you created the
collection?

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Split-Shards-tp4240699.html
Sent from the Solr - User mailing list archive at Nabble.com.
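
[Editor's note: on the last question, the Collections API's CLUSTERSTATUS
action reports the shard list a collection currently has. Note that
SPLITSHARD names sub-shards like shard1_0 and shard1_1, so the live count
can exceed the original numShards. A minimal sketch against a hand-made
response; the collection name and shard layout are assumptions.]

```python
import json

# Shaped like a Collections API CLUSTERSTATUS response; in practice:
#   curl 'http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS'
# "mycoll" is illustrative; shard1_0/shard1_1 follow SPLITSHARD's sub-shard
# naming after shard1 was split.
sample = json.loads("""
{"cluster": {"collections": {"mycoll": {"shards": {
    "shard1_0": {"state": "active"},
    "shard1_1": {"state": "active"},
    "shard2":   {"state": "active"}
}}}}}
""")

def shard_count(status, collection):
    """Count the shards currently listed for `collection`."""
    return len(status["cluster"]["collections"][collection]["shards"])

print(shard_count(sample, "mycoll"))
# -> 3
```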