Does more shards in core improve performance?

2015-09-17 Thread Zheng Lin Edwin Yeo
Hi,

I would like to check: does creating more shards for a core improve the
overall performance? I'm using Solr 5.3.0.

I tried indexing a core with 1 shard and another core with 2
shards, but both take the same amount of time to do the indexing.

Currently, both my shards are on the same machine. Will performance be
improved if the shards are located on different machines?


Regards,
Edwin


Re: 5 second timeout in bin/solr stop command

2015-09-17 Thread Ere Maijala

16.9.2015, 16.16, Shawn Heisey wrote:

I agree here.  I don't like the forceful termination unless it becomes
truly necessary.

I changed the timeout to 20 seconds in the script installed in
/etc/init.d ... a bit of a brute force approach.  When I find some time,
I will think about how to make this better, and choose a better default
value.  30 seconds is probably good.  It should also be configurable,
probably in the /var/solr/solr.in.sh config fragment.


Thanks, Shawn. Inspired by this I filed an issue and attached a patch
in Jira, see https://issues.apache.org/jira/browse/SOLR-8065. The patch
makes the stop function behave like start, so that it waits up to 30
seconds for the process to shut down and checks the status once a
second. I didn't make the timeout configurable since I think 30 seconds
should be enough in any situation (this may be a statement I'll regret
later..) and the script doesn't wait any longer than necessary. But if
you find that a necessity, it shouldn't be too difficult to add.
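
For readers who want the gist without opening the Jira issue, the patched
behaviour is roughly equivalent to this sketch (the pid-file path is an
assumption for illustration, not the script's actual variable):

SOLR_PID=$(cat /var/solr/solr-8983.pid)    # assumed pid-file location
kill -TERM "$SOLR_PID"
timeout=30
while [ $timeout -gt 0 ] && kill -0 "$SOLR_PID" 2>/dev/null; do
  sleep 1                                  # one status check per second
  timeout=$((timeout - 1))
done
if kill -0 "$SOLR_PID" 2>/dev/null; then
  echo "Solr did not stop within 30 seconds, killing forcefully"
  kill -9 "$SOLR_PID"
fi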


--Ere

--
Ere Maijala
Kansalliskirjasto / The National Library of Finland


Re: Does more shards in core improve performance?

2015-09-17 Thread Toke Eskildsen
On Thu, 2015-09-17 at 12:04 +0530, Shalin Shekhar Mangar wrote:
> Yes, of course, the only reason to have more shards is so that they
> can reside on different machines (or use different disks, assuming you
> have enough CPU/memory etc) so that you can scale your indexing
> throughput.

For indexing, true. Due to Solr's 1-request-1-thread nature, sharding on
the same hardware can be used to lower latency for CPU-heavy searches.

We are running 25 shards/machine, where the machines have 16 HT CPU-cores.
Granted we also do it due to the pesky 2 billion limit, but the result
is that the CPU-cores are nicely utilized with our low queries/second
usage pattern.

- Toke Eskildsen, State and University Library, Denmark




Solr 'rq' parameter with QParserPlugin

2015-09-17 Thread Ajinkya Kale
Hi all,

I have an existing custom QParserPlugin which uses my custom implementation
of the CustomScoreQuery in its parse() method.

I am trying to see if I can use this with the rq parameter to re-rank the
top N documents after the first default ranking. All I have found so far is
that you can either use the "rerank" RankQuery implementation or implement
my own ReRankQParserPlugin.
I am evaluating whether, instead of writing a custom ReRankQParserPlugin, I
can use my custom QParserPlugin to provide the scoring function for the top
N documents.
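
For what it's worth, the stock "rerank" parser can wrap any registered query
parser, so a custom QParserPlugin can supply the re-ranking query without a
new ReRankQParserPlugin; a sketch, where {!myparser} is a hypothetical name
for the custom parser:

curl http://localhost:8983/solr/collection1/select \
  --data-urlencode 'q=title:solr' \
  --data-urlencode 'rq={!rerank reRankQuery=$rqq reRankDocs=200 reRankWeight=2}' \
  --data-urlencode 'rqq={!myparser}solr'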

--aj


Re: solr.SynonymFilterFactory

2015-09-17 Thread Koji Sekiguchi

Hi Vincenzo,

By intuition, regardless of what value you set for attributes such as expand
or ignoreCase, I think synonym records where LHS==RHS are meaningless. That
is, you can remove these lines.

Koji


On 2015/09/17 16:51, Vincenzo D'Amore wrote:

Hello,

this may be a silly question.
I have found a synonyms file with a lot of cases where LHS is equal to RHS.

airmax=>airmax
airplane=>airplane
airwell=>airwell
akai=>akai
akasa=>akasa
akea=>akea
akg=>akg

Given that the solr.SynonymFilterFactory is configured with expand="false"
ignoreCase="true"

May I remove all these lines?

Bests,
Vincenzo







Re: solr.SynonymFilterFactory

2015-09-17 Thread Alessandro Benedetti
I cannot see any reason to keep those lines; they are actually identity
mappings!

Cheers

2015-09-17 8:51 GMT+01:00 Vincenzo D'Amore :

> Hello,
>
> this may be a silly question.
> I have found a synonyms file with a lot of cases where LHS is equal to RHS.
>
> airmax=>airmax
> airplane=>airplane
> airwell=>airwell
> akai=>akai
> akasa=>akasa
> akea=>akea
> akg=>akg
>
> Given that the solr.SynonymFilterFactory is configured with expand="false"
> ignoreCase="true"
>
> May I remove all these lines?
>
> Bests,
> Vincenzo
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Does more shards in core improve performance?

2015-09-17 Thread Shalin Shekhar Mangar
Yes, of course, the only reason to have more shards is so that they
can reside on different machines (or use different disks, assuming you
have enough CPU/memory etc) so that you can scale your indexing
throughput. Move one of them to a different machine and measure the
performance.
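
One way to do that move without re-indexing is ADDREPLICA with an explicit
node, followed by DELETEREPLICA on the old copy once the new one is active;
a sketch, with hypothetical collection, shard, and node names:

curl "http://host1:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&node=host2:8983_solr"
# after the new replica reports active in the clusterstate:
curl "http://host1:8983/solr/admin/collections?action=DELETEREPLICA&collection=mycollection&shard=shard1&replica=core_node1"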

On Thu, Sep 17, 2015 at 11:32 AM, Zheng Lin Edwin Yeo
 wrote:
> Hi,
>
> I would like to check: does creating more shards for a core improve the
> overall performance? I'm using Solr 5.3.0.
>
> I tried indexing a core with 1 shard and another core with 2
> shards, but both take the same amount of time to do the indexing.
>
> Currently, both my shards are on the same machine. Will performance be
> improved if the shards are located on different machines?
>
>
> Regards,
> Edwin



-- 
Regards,
Shalin Shekhar Mangar.


Re: SolrCloud clarification/Question

2015-09-17 Thread Upayavira
and replicationFactor is the number of copies of your data, not the
number of servers marked 'replica'. So as has been said, if you have one
leader, and three replicas, your replicationFactor will be 4.
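
So for the four-machine scenario discussed in this thread, the collection
would be created with something like the following (hostname and config name
are placeholders):

curl "http://[hostname]:8983/solr/admin/collections?action=CREATE&name=test&collection.configName=test&numShards=1&replicationFactor=4&maxShardsPerNode=1"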

Upayavira

On Thu, Sep 17, 2015, at 03:29 AM, Erick Erickson wrote:
> Ravi:
> 
> Sameer is correct on how to get it done in one go.
> 
> Don't get too hung up on replicationFactor. You can always
> ADDREPLICA after the collection is created if you need to.
> 
> Best,
> Erick
> 
> 
> On Wed, Sep 16, 2015 at 12:44 PM, Sameer Maggon
>  wrote:
> > I just gave an example API call, but for your scenario, the
> > replicationFactor will be 4 (replicationFactor=4). In this way, all 4
> > machines will have the same copy of the data and you can put an LB in front
> > of those 4 machines.
> >
> > On Wed, Sep 16, 2015 at 12:00 PM, Ravi Solr  wrote:
> >
> >> OK... I understood numShards=1; when you say replicationFactor=2, what does
> >> it mean? I have 4 machines, then only 3 copies of data (1 at leader and 2
> >> replicas)?? So am I not under-utilizing one machine?
> >>
> >> I was thinking more along the lines of a mesh connectivity format, i.e.
> >> everybody has the others' copy, so that I can put all 4 machines behind a Load
> >> Balancer... Is that a wrong way to look at it?
> >>
> >> Thanks
> >>
> >> Ravi Kiran
> >>
> >> On Wed, Sep 16, 2015 at 2:51 PM, Sameer Maggon 
> >> wrote:
> >>
> >> > You'll have to say numShards=1 and replicationFactor=2.
> >> >
> >> > http://[hostname]:8983/solr/admin/collections?action=CREATE&name=test&collection.configName=test&numShards=1&replicationFactor=2
> >> >
> >> > On Wed, Sep 16, 2015 at 11:23 AM, Ravi Solr  wrote:
> >> >
> >> > > Thank you very much for responding Sameer, so numShards=0 and
> >> > > replicationFactor=4 if I have 4 machines ??
> >> > >
> >> > > Thanks
> >> > >
> >> > > Ravi Kiran Bhaskar
> >> > >
> >> > > On Wed, Sep 16, 2015 at 12:56 PM, Sameer Maggon <
> >> > sam...@measuredsearch.com
> >> > > >
> >> > > wrote:
> >> > >
> >> > > > Absolutely. You can have a collection with just replicas and no
> >> shards
> >> > > for
> >> > > > redundancy and have a load balancer in front of it that removes the
> >> > > > dependency on a single node. One of them will assume the role of a
> >> > > leader,
> >> > > > and in case that leader goes down, one of the replicas will be
> >> elected
> >> > > as a
> >> > > > leader and your application will be fine.
> >> > > >
> >> > > > Thanks,
> >> > > >
> >> > > > On Wed, Sep 16, 2015 at 9:44 AM, Ravi Solr 
> >> wrote:
> >> > > >
> >> > > > > Hello,
> >> > > > >  We are trying to move away from Master-Slave configuration
> >> > to
> >> > > a
> >> > > > > SolrCloud environment. I have a couple of questions. Currently in
> >> the
> >> > > > > Master-Slave setup we have 4 Machines 2 of which are indexers and 2
> >> > of
> >> > > > them
> >> > > > > are query servers. The query servers are fronted via Load Balancer.
> >> > > > >
> >> > > > > There are 3 solr cores for 3 different/separate applications
> >> > (mutually
> >> > > > > exclusive). Each core is a complete index of all docs (i.e. the
> >> data
> >> > is
> >> > > > not
> >> > > > > sharded).
> >> > > > >
> >> > > > >   We intend to keep it in a non-sharded mode even after the
> >> > > SolrCloud
> >> > > > > mode. The prime motivation to move to cloud is to effectively use
> >> all
> >> > > > > servers for indexing and querying (read fault tolerant/redundant).
> >> > > > >
> >> > > > > So, the real question is, can SolrCloud be used without shards ?
> >> > i.e. a
> >> > > > > "collection" resides entirely on one machine rather than
> >> partitioning
> >> > > > data
> >> > > > > onto different machines ?
> >> > > > >
> >> > > > > Thanks
> >> > > > >
> >> > > > > Ravi Kiran Bhaskar
> >> > > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > > *Sameer Maggon*
> >> > > > Measured Search
> >> > > > c: 310.344.7266
> >> > > > www.measuredsearch.com 
> >> > > >
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > *Sameer Maggon*
> >> > Measured Search
> >> > c: 310.344.7266
> >> > www.measuredsearch.com 
> >> >
> >>
> >
> >
> >
> > --
> > *Sameer Maggon*
> > Measured Search
> > c: 310.344.7266
> > www.measuredsearch.com 


solr.SynonymFilterFactory

2015-09-17 Thread Vincenzo D'Amore
Hello,

this may be a silly question.
I have found a synonyms file with a lot of cases where LHS is equal to RHS.

airmax=>airmax
airplane=>airplane
airwell=>airwell
akai=>akai
akasa=>akasa
akea=>akea
akg=>akg

Given that the solr.SynonymFilterFactory is configured with expand="false"
ignoreCase="true"

May I remove all these lines?

Bests,
Vincenzo


-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: solr.SynonymFilterFactory

2015-09-17 Thread Vincenzo D'Amore
Thanks a lot guys, I just wanted to be extremely sure I don't break anything.

On Thu, Sep 17, 2015 at 10:14 AM, Koji Sekiguchi <
koji.sekigu...@rondhuit.com> wrote:

> Hi Vincenzo,
>
> By intuition, regardless of what value you set for attributes such as
> expand or ignoreCase,
> I think synonym records where LHS==RHS are meaningless. That is, you can
> remove these lines.
>
> Koji
>
>
>
> On 2015/09/17 16:51, Vincenzo D'Amore wrote:
>
>> Hello,
>>
>> this may be a silly question.
>> I have found a synonyms file with a lot of cases where LHS is equal to
>> RHS.
>>
>> airmax=>airmax
>> airplane=>airplane
>> airwell=>airwell
>> akai=>akai
>> akasa=>akasa
>> akea=>akea
>> akg=>akg
>>
>> Given that the solr.SynonymFilterFactory is configured with expand="false"
>> ignoreCase="true"
>>
>> May I remove all these lines?
>>
>> Bests,
>> Vincenzo
>>
>>
>>
>
>


-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: Securing solr 5.2 basic auth permission rules

2015-09-17 Thread Aziz Gaou
thank you so much for your reply

2015-09-16 18:58 GMT+00:00 Anshum Gupta :

> Basic authentication (and the API support that you're trying to use) was
> only released with 5.3.0, so it wouldn't work with 5.2.
> 5.2 only had the authentication and authorization frameworks, and shipped
> with Kerberos authentication plugin out of the box.
>
> There are a few known issues with that though, and a 5.3.1 release is just
> around the corner.
>
> On Wed, Sep 16, 2015 at 10:11 AM, Aziz Gaou  wrote:
>
> > Hi,
> >
> > I try to follow:
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin
> > ,
> > to protect Solr 5.2 Admin with password, but I have not been able to
> > secure.
> >
> > 1) When I run the following command:
> >
> > curl --user solr:SolrRocks
> http://localhost:8983/solr/admin/authentication
> > -H 'Content-type:application/json' -d '{
> >   "set-user": {"tom" : "TomIsCool" }}'
> >
> > no update on the file security.json
> >
> > 2) I launched the following 2 commands:
> >
> > curl --user solr:SolrRocks
> http://localhost:8983/solr/admin/authorization
> > -H 'Content-type:application/json'-d '{"set-permission": {
> > "name":"updates", "collection":"MyCollection", "role": "dev"}}'
> >
> > curl --user solr:SolrRocks
> http://localhost:8983/solr/admin/authorization
> > -H 'Content-type:application/json' -d '{ "set-user-role":
> {"tom":["dev"}}'
> >
> > always MyCollection is not protected.
> >
> >
> > thank you for your help.
> >
>
>
>
> --
> Anshum Gupta
>
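
For reference, on 5.3 the two plugins are enabled by uploading a security.json
to ZooKeeper before the API calls above will work. A sketch using the stock
example credentials from the reference guide (user solr, password SolrRocks)
and assuming the embedded ZooKeeper on port 9983:

cat > security.json <<'EOF'
{"authentication":{
   "class":"solr.BasicAuthPlugin",
   "credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}
},
"authorization":{
   "class":"solr.RuleBasedAuthorizationPlugin",
   "permissions":[{"name":"security-edit","role":"admin"}],
   "user-role":{"solr":"admin"}
}}
EOF
server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd putfile /security.json security.json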


Re: Securing solr 5.2 basic auth permission rules

2015-09-17 Thread Aziz Gaou
thank you so much for your reply,

Now I am trying to protect the Apache Solr 5 admin with Jetty. When I change:

1) sudo nano /opt/solr/server/etc/webdefault.xml

<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr</web-resource-name>
    <url-pattern>/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>search-role</role-name>
  </auth-constraint>
</security-constraint>

<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Solr Realm</realm-name>
</login-config>

2) I also changed *jetty.xml* and *realm.properties*

3) the following message appears in the browser:

 - http://localhost:8983/solr/


HTTP ERROR: 503

Problem accessing /solr/. Reason:

Service Unavailable

--
*Powered by Jetty://*


Thanks for your help

2015-09-16 18:58 GMT+00:00 Anshum Gupta :

> Basic authentication (and the API support that you're trying to use) was
> only released with 5.3.0, so it wouldn't work with 5.2.
> 5.2 only had the authentication and authorization frameworks, and shipped
> with Kerberos authentication plugin out of the box.
>
> There are a few known issues with that though, and a 5.3.1 release is just
> around the corner.
>
> On Wed, Sep 16, 2015 at 10:11 AM, Aziz Gaou  wrote:
>
> > Hi,
> >
> > I try to follow:
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin
> > ,
> > to protect Solr 5.2 Admin with password, but I have not been able to
> > secure.
> >
> > 1) When I run the following command:
> >
> > curl --user solr:SolrRocks
> http://localhost:8983/solr/admin/authentication
> > -H 'Content-type:application/json' -d '{
> >   "set-user": {"tom" : "TomIsCool" }}'
> >
> > no update on the file security.json
> >
> > 2) I launched the following 2 commands:
> >
> > curl --user solr:SolrRocks
> http://localhost:8983/solr/admin/authorization
> > -H 'Content-type:application/json'-d '{"set-permission": {
> > "name":"updates", "collection":"MyCollection", "role": "dev"}}'
> >
> > curl --user solr:SolrRocks
> http://localhost:8983/solr/admin/authorization
> > -H 'Content-type:application/json' -d '{ "set-user-role":
> {"tom":["dev"}}'
> >
> > always MyCollection is not protected.
> >
> >
> > thank you for your help.
> >
>
>
>
> --
> Anshum Gupta
>


Re: Generating a document by group count and displaying the result

2015-09-17 Thread Sreekant Sreedharan
The solution we came up with is to do a faceted search and use a stylesheet
to 'flatten' the result.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Generating-a-document-by-group-count-and-displaying-the-result-tp4229183p4229712.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Does more shards in core improve performance?

2015-09-17 Thread Toke Eskildsen
On Thu, 2015-09-17 at 16:58 +0800, Zheng Lin Edwin Yeo wrote:

> I was trying with 2 shards and 4 shards but all on the same machine,
> and they have the same performance (no improvement in performance) as
> the one with 1 shard. My machine has 32GB of RAM.

As you are testing indexing speed, Shalin's post is spot-on: Sharding on
the same machine won't help you. I just added my comment on search to
help build a complete picture.

A simple metric is to look at CPU usage on the machine: If it is near
100% when you index, you will need extra hardware to get more speed.
If it is substantially less than 100%, then feed Solr from more than one
thread at a time.
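
A quick way to try multi-threaded feeding without touching the indexing code
is to run several posting clients in parallel; a sketch, assuming a directory
of JSON documents and the default update handler:

ls /data/docs/*.json | xargs -P 4 -I {} \
  curl -s "http://localhost:8983/solr/collection1/update" \
       -H 'Content-type: application/json' --data-binary @{}
curl "http://localhost:8983/solr/collection1/update?commit=true"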

- Toke Eskildsen, State and University Library, Denmark





Re: Problem with CoreAdmin API CREATE command

2015-09-17 Thread Yago Riveiro
I have a very old index with more than 12T (re-indexing the data is not an option ...)
that I want to upgrade to 5.3. I'm using lucene-core-4.10.4.jar (I'm on 4.10.4
right now) to upgrade old segments of data. With solr running I can't run the
command because solr holds the lock on the core.





I only want to unload it to perform an index upgrade command and then load it
again, without stopping the node that hosts the core.
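
(For context, the upgrade command in question is Lucene's IndexUpgrader,
which rewrites all older segments to the current format and needs exclusive
access to the index directory; the index path below is a placeholder:

java -cp lucene-core-4.10.4.jar org.apache.lucene.index.IndexUpgrader \
  -verbose /path/to/core/data/index
)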





The issue here is that DELETEREPLICA deletes the data … the UNLOAD command
doesn't save the original core.properties; it creates a new
core.properties.unloaded that does not store the shard param and creates a new
random coreNodeName that doesn't correspond to the name in the clusterstate.
The result: the core is loaded twice or, even worse, in some situations it is
attached to the wrong shard.








—/Yago Riveiro

On Wed, Sep 16, 2015 at 4:38 PM, Erick Erickson 
wrote:

> The not-very-helpful answer is that you're using the core admin API in
> a SolrCloud setup. Please do not do this as (you're well aware of this by 
> now!)
> it's far too easy to get "interesting" results.
> Instead, use the Collections API, specifically the ADDREPLICA and 
> DELETEREPLICA
> commands. Under the covers, they actually create a core via the core admin 
> API,
> but they insure that all the core create parameters are correct. For 
> ADDREPLICA,
> you can also easily insure that a node lands on a particular machine.
> Best,
> Erick
> On Tue, Sep 15, 2015 at 9:46 AM, Yago Riveiro  wrote:
>> Hi,
>>
>> I’m having some issues with the command CREATE of the CoreAdmin API in solr
>> 4.10.4.
>>
>> When I try to load a previously unloaded core with the CREATE command, the result
>> of this operation is 2 replicas in the down state: one with the original
>> coreNodeName set in clusterstate.json and another with a new one.
>>
>> I'm on the new solr.xml format and the core.properties of my cores looks like
>> this (I set this configuration when I upgraded from 4.6.1 with legacy solr.xml to
>> 4.10.4):
>>
>> name=bucket-15_shardX_replicaY
>> shard=shardX
>> collection=bucket-15
>>
>> The API command looks like:
>>
>> solr/admin/cores?action=CREATE&name=bucket-15_shardX_replicaY&collection=bucket-15&shard=shardX&wt=json
>>
>> What I’m doing wrong?
>>
>>
>>
>> -
>> Best regards
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Problem-with-CoreAdmin-API-CREATE-command-tp4229248.html
>> Sent from the Solr - User mailing list archive at Nabble.com.

Re: Problem with CoreAdmin API CREATE command

2015-09-17 Thread Shai Erera
Solr 5.3 can read Solr 4.10.4 indexes as-is. Why are you trying to upgrade
the indexes in the first place?

Shai

On Thu, Sep 17, 2015 at 3:05 PM, Yago Riveiro 
wrote:

> I have a very old index with more than 12T (re-indexing the data is not an option
> ...) that I want to upgrade to 5.3. I'm using lucene-core-4.10.4.jar (I'm on
> 4.10.4 right now) to upgrade old segments of data. With solr running I can't
> run the command because solr holds the lock on the core.
>
>
>
>
>
> I only want to unload it to perform an index upgrade command and then load it
> again, without stopping the node that hosts the core.
>
>
>
>
>
> The issue here is that DELETEREPLICA deletes the data … the UNLOAD
> command doesn't save the original core.properties; it creates a new
> core.properties.unloaded that does not store the shard param and creates a new
> random coreNodeName that doesn't correspond to the name in
> the clusterstate. The result: the core is loaded twice or, even worse, in
> some situations it is attached to the wrong shard.
>
>
>
>
>
>
>
>
> —/Yago Riveiro
>
> On Wed, Sep 16, 2015 at 4:38 PM, Erick Erickson 
> wrote:
>
> > The not-very-helpful answer is that you're using the core admin API in
> > a SolrCloud setup. Please do not do this as (you're well aware of this
> by now!)
> > it's far too easy to get "interesting" results.
> > Instead, use the Collections API, specifically the ADDREPLICA and
> DELETEREPLICA
> > commands. Under the covers, they actually create a core via the core
> admin API,
> > but they insure that all the core create parameters are correct. For
> ADDREPLICA,
> > you can also easily insure that a node lands on a particular machine.
> > Best,
> > Erick
> > On Tue, Sep 15, 2015 at 9:46 AM, Yago Riveiro 
> wrote:
> >> Hi,
> >>
> >> I’m having some issues with the command CREATE of the CoreAdmin API in
> solr
> >> 4.10.4.
> >>
> >> When I try to load a previously unloaded core with the CREATE command, the
> >> result of this operation is 2 replicas in the down state: one with the original
> >> coreNodeName set in clusterstate.json and another with a new one.
> >>
> >> I'm on the new solr.xml format and the core.properties of my cores looks like
> >> this (I set this configuration when I upgraded from 4.6.1 with legacy solr.xml
> >> to 4.10.4):
> >>
> >> name=bucket-15_shardX_replicaY
> >> shard=shardX
> >> collection=bucket-15
> >>
> >> The API command looks like:
> >>
> >>
> solr/admin/cores?action=CREATE&name=bucket-15_shardX_replicaY&collection=bucket-15&shard=shardX&wt=json
> >>
> >> What I’m doing wrong?
> >>
> >>
> >>
> >> -
> >> Best regards
> >> --
> >> View this message in context:
> http://lucene.472066.n3.nabble.com/Problem-with-CoreAdmin-API-CREATE-command-tp4229248.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.


Strange behaviour of ttf function query

2015-09-17 Thread Jie Gao
Hi,

I've found a very strange behaviour for ttf function query.

I can understand that ttf should be based on full-text query.

My query analyser is configured as follows:

[analyzer definition stripped by the list archive; per the discussion below
it included solr.StopFilterFactory followed by a lowercase filter]
When I query the term "D blooms" via
http://localhost:8983/solr/myCore/select?q=*:*&fl=ttf(content,%27D%20blooms%27)&rows=1

I got the result 0. However, in my text, there is 1 occurrence.

Then I changed the term to lower case, and I got 10632 occurrences, which
is exactly the same as the occurrences of "blooms".

Next, when I comment out solr.StopFilterFactory, I get 1 occurrence
for "d blooms", which is correct, as expected.

So my question is: if the query analyser applies to the ttf function query,
why doesn't the lowercase filter (or the whole pipeline) apply, so as to
allow me to query with upper case?

Note: full-text queries on this field don't have this problem.

Any idea?
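
One detail that would explain this: function queries such as ttf() look their
argument up as a raw indexed term, without running it through the query
analyzer, so the already-analyzed (lowercased, stop-filtered) token has to be
supplied by the caller. A sketch of that workaround:

curl "http://localhost:8983/solr/myCore/select" \
  --data-urlencode 'q=*:*' \
  --data-urlencode "fl=ttf(content,'blooms')" \
  --data-urlencode 'rows=1'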

Thanks,


Single Term-Multiple Keywords mapping not working in solr.

2015-09-17 Thread SatyasaiHariharaPrasad.Pulipaka
Hi,


I'm trying to set up some basic synonyms in Solr. I'm facing an issue with
the following entries in the synonyms.txt file:

Castle=> Castle,Cinderella Castle,Le Chateau de la Belle au Bois Dormant

When a user searches for castle, he gets back documents containing "Le Chateau
de la Belle au Bois Dormant" but not those containing "Castle" or "Cinderella Castle".

How do I get results for the other two keywords?
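
A useful way to debug this is the field analysis handler (registered at
/analysis/field in the example configs), which shows exactly which tokens the
synonym filter emits at index and query time; a sketch, with hypothetical
collection and field names:

curl "http://localhost:8983/solr/collection1/analysis/field" \
  --data-urlencode 'analysis.fieldname=text' \
  --data-urlencode 'analysis.fieldvalue=Cinderella Castle' \
  --data-urlencode 'analysis.query=castle' \
  --data-urlencode 'wt=json'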


Regards,

Satya


This e-mail and any files transmitted with it are for the sole use of the 
intended recipient(s) and may contain confidential and privileged information. 
If you are not the intended recipient(s), please reply to the sender and 
destroy all copies of the original message. Any unauthorized review, use, 
disclosure, dissemination, forwarding, printing or copying of this email, 
and/or any action taken in reliance on the contents of this e-mail is strictly 
prohibited and may be unlawful. Where permitted by applicable law, this e-mail 
and other e-mail communications sent to and from Cognizant e-mail addresses may 
be monitored.


Re: Does more shards in core improve performance?

2015-09-17 Thread Upayavira
How many CPUs on that machine? How many other requests using the server?

On Thu, Sep 17, 2015, at 09:58 AM, Zheng Lin Edwin Yeo wrote:
> Thanks for the information.
> 
> I was trying with 2 shards and 4 shards but all on the same machine, and
> they have the same performance (no improvement in performance) as the one
> with 1 shard. My machine has 32GB of RAM.
> 
> Probably I should try one of the shards on a different machine and see how
> it goes?
> 
> Regards,
> Edwin
> 
> 
> On 17 September 2015 at 15:37, Toke Eskildsen 
> wrote:
> 
> > On Thu, 2015-09-17 at 12:04 +0530, Shalin Shekhar Mangar wrote:
> > > Yes, of course, the only reason to have more shards is so that they
> > > can reside on different machines (or use different disks, assuming you
> > > have enough CPU/memory etc) so that you can scale your indexing
> > > throughput.
> >
> > For indexing, true. Due to Solr's 1-request-1-thread nature, sharding on
> > the same hardware can be used to lower latency for CPU-heavy searches.
> >
> > We are running 25 shards/machine, where the machines have 16 HT CPU-cores.
> > Granted we also do it due to the pesky 2 billion limit, but the result
> > is that the CPU-cores are nicely utilized with our low queries/second
> > usage pattern.
> >
> > - Toke Eskildsen, State and University Library, Denmark
> >
> >
> >


Re: Does more shards in core improve performance?

2015-09-17 Thread Zheng Lin Edwin Yeo
Thanks for the information.

I was trying with 2 shards and 4 shards but all on the same machine, and
they have the same performance (no improvement in performance) as the one
with 1 shard. My machine has 32GB of RAM.

Probably I should try one of the shards on a different machine and see how it
goes?

Regards,
Edwin


On 17 September 2015 at 15:37, Toke Eskildsen 
wrote:

> On Thu, 2015-09-17 at 12:04 +0530, Shalin Shekhar Mangar wrote:
> > Yes, of course, the only reason to have more shards is so that they
> > can reside on different machines (or use different disks, assuming you
> > have enough CPU/memory etc) so that you can scale your indexing
> > throughput.
>
> For indexing, true. Due to Solr's 1-request-1-thread nature, sharding on
> the same hardware can be used to lower latency for CPU-heavy searches.
>
> We are running 25 shards/machine, where the machines have 16 HT CPU-cores.
> Granted we also do it due to the pesky 2 billion limit, but the result
> is that the CPU-cores are nicely utilized with our low queries/second
> usage pattern.
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>


Atomic updates on multiple documents

2015-09-17 Thread Alfonso Muñoz-Pomer Fuentes

Hi, I have a question regarding atomic updates on multiple documents.

We’re using Solr 5.1.0, and I’m wondering if it’s possible to perform an 
atomic update on multiple documents with one request.


After having a look at http://yonik.com/solr/atomic-updates/ I tried to 
do the following after the example by Yonik:

$ curl http://localhost:8983/solr/demo/update -d '
[
 {"author_s"   : "Neal Stephenson",
  "cat_ss" : {"add":"Cyberpunk"}
 }
]'

I expected it would add the category “Cyberpunk” to all the books by 
“Neal Stephenson”, but instead a new document with only those fields was 
created.


Is there a way to achieve what I described above, without reading all
the relevant documents, modifying them, and then sending them back to Solr?


Thank you in advance.

--
Alfonso Muñoz-Pomer Fuentes
Software Engineer @ Expression Atlas Team
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Tel:+ 44 (0) 1223 49 2633
Skype: amunozpomer


Re: Understanding SOLR 5.3.0

2015-09-17 Thread Shawn Heisey
On 9/16/2015 3:47 PM, vetrik kumaran murugesan wrote:
> Can you please help me understand the usage of the below-mentioned
> jar files in Apache Solr 5.3.0:
>
> 1. Tagsoup 1.2.1
> 2. Junit4-ant v 2.1.13
> 3. com.googlecode.juniversalchardet v1.0.3
>
>
> 2. Is it right to ask: can we rebuild Solr 5.3.0 without/replacing the
> above-mentioned files?

Those jars are not used in the core of Solr at all.  They ARE included
in the binary download, for reasons outlined below:

The tagsoup and juniversalchardet jars are only used in the "extraction"
contrib, which is the add-on capability for Solr to extract text from
rich documents like Word, PDF, etc.  The junit4-ant jar is used by the
build system for Solr tests, and it is also used by the Solr test framework.

In order for Solr to build from unmodified source code, these jars will
be downloaded with ivy and must be present, but if you wanted to alter
the build system so the extraction contrib and the test framework were
not built, and if you never run the tests, you could probably exclude them.

Upayavira asked a very relevant question ... what are you actually
trying to accomplish?  This feels a little bit like an XY problem ...
and the Y is not very well-defined.

http://people.apache.org/~hossman/#xyproblem

Thanks,
Shawn



Re: Does more shards in core improve performance?

2015-09-17 Thread Zheng Lin Edwin Yeo
Thank you everyone for your reply.

> How many CPUs on that machine? How many other requests using the server?

A) There are 8 CPUs on the machine, and there are no other requests
using the server. Only the indexing script is running.

> A simple metric is to look at CPU usage on the machine: If it is near
100% when you index, you will need extra hardware to get more speed.
If it is substantially less than 100%, then feed Solr from more than one
thread at a time.

A) So far, from what I observe, the CPU usage is usually around 50% to 70%.
It hasn't gone up to 100% yet. But I'll probably try to do sharding on a
different machine, as that is probably the case for the real production
server.


Regards,
Edwin


On 17 September 2015 at 19:55, Toke Eskildsen 
wrote:

> On Thu, 2015-09-17 at 16:58 +0800, Zheng Lin Edwin Yeo wrote:
>
> > I was trying with 2 shards and 4 shards but all on the same machine,
> > and they have the same performance (no improvement in performance) as
> > the one with 1 shard. My machine has 32GB of RAM.
>
> As you are testing indexing speed, Shalin's post is spot-on: Sharding on
> the same machine won't help you. I just added my comment on search to
> help build a complete picture.
>
> A simple metric is to look at CPU usage on the machine: If it is near
> 100% when you index, you will need extra hardware to get more speed.
> If it is substantially less than 100%, then feed Solr from more than one
> thread at a time.
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>
>


Re: Single Term-Multiple Keywords mapping not working in solr.

2015-09-17 Thread Vincenzo D'Amore
Hi Satya,

these days I'm working on this problem, and as far as I understand,
it is a hard problem to solve.
You can add synonyms to your index in different ways, but the query parser
is unable to match them.

I have read many articles around, this is the best I found:

http://lucidworks.com/blog/automatic-phrase-tokenization-improving-lucene-search-precision-by-more-precise-linguistic-analysis/

For me there is only one big problem with this solution: I use edismax.
OTOH, the search handler available in this project extends
QParserPlugin, which is an abstract class that can be used to define a
customized user query process. Basically it only explains how to fix the
problem.

This is interesting but not very useful to me as-is, so I have modified the
plugin; now AutoPhrasingQParserPlugin extends ExtendedDismaxQParserPlugin,
adding auto-phrasing capability to edismax.

https://github.com/freedev/auto-phrase-tokenfilter

Consider that the code has been modified to work with SolrCloud 4.8.1,
but it is not very much work to adapt the code to another SolrCloud
version.

Best,
Vincenzo


On Thu, Sep 17, 2015 at 12:57 PM, <
satyasaihariharaprasad.pulip...@cognizant.com> wrote:

> Hi,
>
>
> I'm trying to set up some basic synonyms in Solr. I'm facing an issue with
> following entries in synonyms.txt file
>
> Castle=> Castle,Cinderella Castle,Le Chateau de la Belle au Bois Dormant
>
> When a user searches for castle, he gets back documents containing Le
> Chateau de la Belle au Bois Dormant but not for Castle and Cinderella
> Castle.
>
> How do I get results for other two keywords?
>
>
> Regards,
>
> Satya
>
>
> This e-mail and any files transmitted with it are for the sole use of the
> intended recipient(s) and may contain confidential and privileged
> information. If you are not the intended recipient(s), please reply to the
> sender and destroy all copies of the original message. Any unauthorized
> review, use, disclosure, dissemination, forwarding, printing or copying of
> this email, and/or any action taken in reliance on the contents of this
> e-mail is strictly prohibited and may be unlawful. Where permitted by
> applicable law, this e-mail and other e-mail communications sent to and
> from Cognizant e-mail addresses may be monitored.
>



-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: Solr DataImportHandler is not indexing all data defined

2015-09-17 Thread Alexandre Rafalovitch
Sanity check: did you restart Solr or reload the core after you
updated your schema definition? In the Admin UI, in the Schema
Browser, you should be able to see all the fields you defined. Are
those fields there?

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 17 September 2015 at 11:39, gaurav pant  wrote:
> Hi All,
>
> Greetings for the day.
>
> I am using solr5.3 and trying to upload a wikipedia page article dump to
> solr using "DataImportHandler", but I am getting only the id and title fields
> when I query.
> Below is my data-config.xml
>
> 
> <dataConfig>
>   <dataSource type="FileDataSource" encoding="UTF-8" />
>   <document>
>     <entity name="page"
>             processor="XPathEntityProcessor"
>             stream="true"
>             forEach="/mediawiki/page/"
>             url="/mnt/TEST/enwiki-20150602-pages-articles1.xml"
>             transformer="RegexTransformer,DateFormatTransformer">
>       <field column="id" xpath="/mediawiki/page/id" />
>       <field column="title" xpath="/mediawiki/page/title" />
>       <field column="revision" xpath="/mediawiki/page/revision/id" />
>       <field column="user" xpath="/mediawiki/page/revision/contributor/username" />
>       <field column="userId" xpath="/mediawiki/page/revision/contributor/id" />
>       <field column="text" xpath="/mediawiki/page/revision/text" />
>       <field column="timestamp" xpath="/mediawiki/page/revision/timestamp"
>              dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />
>       <field column="$skipDoc" regex="^#REDIRECT .*" replaceWith="true" sourceColName="text" />
>     </entity>
>   </document>
> </dataConfig>
>
> Also I have added the below entries to schema.xml:
>
> <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
> <field name="title" type="string" indexed="true" stored="true" />
> <field name="revision" type="int" indexed="true" stored="true" />
> <field name="user" type="string" indexed="true" stored="true" />
> <field name="userId" type="int" indexed="true" stored="true" />
> <field name="text" type="text_general" indexed="true" stored="true" />
> <field name="timestamp" type="date" indexed="true" stored="true" />
>
> I have copied schema.xml from
> "example/example-DIH/solr/solr/conf/schema.xml" and removed all field
> entries, with a few exceptions as mentioned in the comments.
>
> After importing data I am just trying to fetch all fields but I am getting
> only "Id" and "Title".
>
> Also, I tried to run the data import using debug mode so that I can get some
> information regarding indexing, but whenever I select debug mode
> it only imports 2 documents. I am not sure why. Due to this I
> am not able to debug the indexing process.
>
> Please guide me further.
>
> EDIT: I am now sure that the other fields are not getting indexed, because
> when I specify df=user or text, I get the below message:
>
> "msg": "undefined field user",
>
> I am querying like below:
> http://localhost:8983/solr/wiki/select?q=*%3A*&fl=id%2Ctitle%2Ctext%2Crevision&wt=json&indent=true&debugQuery=true
>
> --
> Regards
> Gaurav Pant
> +91-7709196607
>
>
>
> --
> Regards
> Gaurav Pant
> +91-7709196607


Re: Atomic updates on multiple documents

2015-09-17 Thread Shawn Heisey
On 9/17/2015 9:48 AM, Alfonso Muñoz-Pomer Fuentes wrote:
> Hi, I have a question regarding atomic updates on multiple documents.
>
> We’re using Solr 5.1.0, and I’m wondering if it’s possible to perform
> an atomic update on multiple documents with one request.
>
> After having a look at http://yonik.com/solr/atomic-updates/ I tried
> to do the following after the example by Yonik:
> $ curl http://localhost:8983/solr/demo/update -d '
> [
>  {"author_s"   : "Neal Stephenson",
>   "cat_ss" : {"add":"Cyberpunk"}
>  }
> ]'
>
> I expected it would add the category “Cyberpunk” to all the books by
> “Neal Stephenson”, but instead a new document with only those fields
> was created.
>
> Is there a way to achieve what I described above, without reading all
> the relevant documents, modifying them, and then sending them back to Solr?

I think you're probably looking at Yonik's example that looks like this:

$ curl http://localhost:8983/solr/demo/update -d '
[
 {"id"         : "book1",
  "author_s"   : {"set":"Neal Stephenson"},
  "copies_i"   : {"inc":3},
  "cat_ss"     : {"add":"Cyberpunk"}
 }
]'

What this means in practical terms is this:

 * Find the existing document with the id of "book1".
 * Set (replace if already present) the author.
 * Increment the number of copies by three.
 * Add Cyberpunk to the category list.

This assumes that the uniqueKey field is "id".  Unless your uniqueKey
field is "author_s" (which is highly unlikely), the JSON that you used
will not work.  Chances are that the request failed, that nothing happened.

Notice that the document (surrounded by curly braces) is inside square
brackets.  This is standard JSON syntax.  The curly braces mark a data
structure like a Java "Map" object, and the square brackets indicate an
array.  You can put multiple documents in the array, in accordance with
JSON syntax.  You will need to know all the ID values that you want to
update, and construct a document for each of them ... Solr does not have
anything like the SQL syntax that lets you update all rows that match a
WHERE clause.
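
In practice that two-step approach looks like this sketch (collection name
and ids are hypothetical):

# 1) collect the unique keys of the matching documents
curl "http://localhost:8983/solr/demo/select?q=author_s:%22Neal+Stephenson%22&fl=id&rows=1000&wt=json"

# 2) send one atomic-update request containing a document per id
curl http://localhost:8983/solr/demo/update -H 'Content-type: application/json' -d '
[
 {"id": "book1", "cat_ss": {"add":"Cyberpunk"}},
 {"id": "book2", "cat_ss": {"add":"Cyberpunk"}}
]'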

Thanks,
Shawn





Re: Atomic updates on multiple documents

2015-09-17 Thread Shawn Heisey
On 9/17/2015 10:14 AM, Shawn Heisey wrote:
> This assumes that the uniqueKey field is "id". Unless your uniqueKey
> field is "author_s" (which is highly unlikely), the JSON that you used
> will not work. Chances are that the request failed, that nothing happened.

On my first reading, I did not catch that you said it added a new
document with the fields you specified, so my assumption that the
request failed was clearly wrong.

I think this must mean that you have disabled (removed) the uniqueKey
setting in your schema -- adding a document that does not have the
uniqueKey field will fail.  I'm reasonably certain that you cannot do
atomic updates if you do not have a uniqueKey.

I have just checked our documentation for Atomic Updates ... and the
uniqueKey requirement is NOT mentioned.  I think that's a documentation bug.

Thanks,
Shawn



Re: solr training

2015-09-17 Thread Tim Dunphy
>
> How about in Denver?


Nah dude. I'm in Jersey. Denver's like a half a country away!

On Thu, Sep 17, 2015 at 12:18 AM, William Bell  wrote:

> How about in Denver?
>
> On Sun, Sep 13, 2015 at 7:53 PM, Otis Gospodnetić <
> otis.gospodne...@gmail.com> wrote:
>
> > Hi Tim,
> >
> > A slightly delayed reply ;)
> > We are running Solr training in NYC next month -
> > http://sematext.com/training/solr-training.html - 2nd seat is 50% off.
> >
> > Otis
> > --
> > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > Solr & Elasticsearch Support * http://sematext.com/
> >
> >
> > On Fri, May 1, 2015 at 2:18 PM, Tim Dunphy  wrote:
> >
> > > Hey guys,
> > >
> > >  My company has a training budget that it wants me to use. So what I'd
> > like
> > > to find out is if there is any instructor lead courses in the NY/NJ
> area,
> > > or courses online that are instructor lead that you could recommend?
> > >
> > > Thanks,
> > > Tim
> > >
> > > --
> > > GPG me!!
> > >
> > > gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
> > >
> >
>
>
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076
>



-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


Re: Problem with CoreAdmin API CREATE command

2015-09-17 Thread Shai Erera
That's definitely strange as Solr 5.x should support all Solr 4.x indexes.

Anyway, you can somewhat force an upgrade by running a forceMerge command
after you've upgraded the libraries to 5.3.0. This will rewrite the index
into one segment whose version will be 5.3. It is usually not recommended
to run this as it's a heavy operation, but if you want all index segments
to be upgraded, this is an option.
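
The forceMerge can be issued through the update handler; a sketch against a
single core (the core name is a placeholder):

curl "http://localhost:8983/solr/collection1/update?optimize=true&maxSegments=1"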

But again, I don't see any reason to do that. If you hit an error that you
can reproduce, it might be worth it to file a bug report.

Shai

On Thu, Sep 17, 2015 at 4:01 PM, Yago Riveiro 
wrote:

> 90% of my data was indexed in 4.6.1 or lower.
>
>
>
>
> My goal is to upgrade all data to 4.10.4 and then upgrade to 5.3.
>
>
>
>
> In previous tests that I did with 5.3 in a dev cluster, I saw some strange
> behaviour with data indexed with 4.6.1 that didn't reproduce in 4.10.4.
>
>
>
>
> Some queries against data indexed in 4.6.1 that return data on 4.6.1
> sometimes returned zero docs in 5.3 (in a random fashion; I can't find
> the pattern that produced the failure of the query). The same data upgraded
> to 4.10.4 worked as expected with the 5.3 source code, without any issue.
>
>
> —/Yago Riveiro
>
> On Thu, Sep 17, 2015 at 1:08 PM, Shai Erera  wrote:
>
> > Solr 5.3 can read Solr 4.10.4 indexes as-is. Why are you trying to
> upgrade
> > the indexes in the first place?
> > Shai
> > On Thu, Sep 17, 2015 at 3:05 PM, Yago Riveiro 
> > wrote:
> >> I have a very old index with more than 12T (re-indexing the data is not an
> >> option ...) that I want to upgrade to 5.3. I'm using lucene-core-4.10.4.jar
> >> (I'm on 4.10.4 right now) to upgrade old segments of data. With solr running
> >> I can't run the command because solr holds the lock on the core.
> >>
> >>
> >>
> >>
> >>
> >> I only want to unload it to perform an index upgrade command and then load
> >> it again, without stopping the node that hosts the core.
> >>
> >>
> >>
> >>
> >>
> >> The issue here is that DELETEREPLICA deletes the data … the UNLOAD
> >> command doesn't save the original core.properties; it creates a new
> >> core.properties.unloaded that does not store the shard param and creates a
> >> new random coreNodeName that doesn't correspond to the name in
> >> the clusterstate. The result: the core is loaded twice or, even worse,
> >> in some situations it is attached to the wrong shard.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> —/Yago Riveiro
> >>
> >> On Wed, Sep 16, 2015 at 4:38 PM, Erick Erickson <
> erickerick...@gmail.com>
> >> wrote:
> >>
> >> > The not-very-helpful answer is that you're using the core admin API in
> >> > a SolrCloud setup. Please do not do this as (you're well aware of this
> >> by now!)
> >> > it's far too easy to get "interesting" results.
> >> > Instead, use the Collections API, specifically the ADDREPLICA and
> >> DELETEREPLICA
> >> > commands. Under the covers, they actually create a core via the core
> >> admin API,
> >> > but they insure that all the core create parameters are correct. For
> >> ADDREPLICA,
> >> > you can also easily insure that a node lands on a particular machine.
> >> > Best,
> >> > Erick
> >> > On Tue, Sep 15, 2015 at 9:46 AM, Yago Riveiro  >
> >> wrote:
> >> >> Hi,
> >> >>
> >> >> I’m having some issues with the command CREATE of the CoreAdmin API
> in
> >> solr
> >> >> 4.10.4.
> >> >>
> >> >> When I try to load a previously unloaded core with the CREATE command, the
> >> >> result of this operation is 2 replicas in the down state: one with the
> >> >> original coreNodeName set in clusterstate.json and another with a new one.
> >> >>
> >> >> I'm on the new solr.xml format and the core.properties of my cores looks
> >> >> like this (I set this configuration when I upgraded from 4.6.1 with legacy
> >> >> solr.xml to 4.10.4):
> >> >>
> >> >> name=bucket-15_shardX_replicaY
> >> >> shard=shardX
> >> >> collection=bucket-15
> >> >>
> >> >> The API command looks like:
> >> >>
> >> >>
> >>
> solr/admin/cores?action=CREATE&name=bucket-15_shardX_replicaY&collection=bucket-15&shard=shardX&wt=json
> >> >>
> >> >> What I’m doing wrong?
> >> >>
> >> >>
> >> >>
> >> >> -
> >> >> Best regards
> >> >> --
> >> >> View this message in context:
> >>
> http://lucene.472066.n3.nabble.com/Problem-with-CoreAdmin-API-CREATE-command-tp4229248.html
> >> >> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Sorting parent documents based on a field from children

2015-09-17 Thread Florin Mandoc

Great, thank you very much for your help.


On 16.09.2015 13:07, Mikhail Khludnev wrote:

On Wed, Sep 16, 2015 at 12:15 PM, Florin Mandoc  wrote:


Is it possible to also add the "name_s:expensive" search term in q? I know I
can add it to fq, but then I will have no score boost.


Sure you can. But beware of the query syntax trap. It's explained by David
Smiley in a comment at
http://blog.griddynamics.com/2013/09/solr-block-join-support.html (sadly,
linking to a comment doesn't work).






Re: Problem with CoreAdmin API CREATE command

2015-09-17 Thread Yago Riveiro
90% of my data was indexed in 4.6.1 or lower.




My goal is to upgrade all data to 4.10.4 and then upgrade to 5.3.




In previous tests that I did with 5.3 in a dev cluster, I saw some strange
behaviour with data indexed with 4.6.1 that didn't reproduce in 4.10.4.




Some queries against data indexed in 4.6.1 that return data on 4.6.1
sometimes returned zero docs in 5.3 (in a random fashion; I can't find the
pattern that produced the failure of the query). The same data upgraded to
4.10.4 worked as expected with the 5.3 source code, without any issue.


—/Yago Riveiro

On Thu, Sep 17, 2015 at 1:08 PM, Shai Erera  wrote:

> Solr 5.3 can read Solr 4.10.4 indexes as-is. Why are you trying to upgrade
> the indexes in the first place?
> Shai
> On Thu, Sep 17, 2015 at 3:05 PM, Yago Riveiro 
> wrote:
>> I have a very old index with more than 12T (re-indexing the data is not an option
>> ...) that I want to upgrade to 5.3. I'm using lucene-core-4.10.4.jar (I'm on
>> 4.10.4 right now) to upgrade old segments of data. With solr running I can't
>> run the command because solr holds the lock on the core.
>>
>>
>>
>>
>>
>> I only want to unload it to perform an index upgrade command and then load it
>> again, without stopping the node that hosts the core.
>>
>>
>>
>>
>>
>> The issue here is that DELETEREPLICA deletes the data … the UNLOAD
>> command doesn't save the original core.properties; it creates a new
>> core.properties.unloaded that does not store the shard param and creates a new
>> random coreNodeName that doesn't correspond to the name in
>> the clusterstate. The result: the core is loaded twice or, even worse, in
>> some situations it is attached to the wrong shard.
>>
>>
>>
>>
>>
>>
>>
>>
>> —/Yago Riveiro
>>
>> On Wed, Sep 16, 2015 at 4:38 PM, Erick Erickson 
>> wrote:
>>
>> > The not-very-helpful answer is that you're using the core admin API in
>> > a SolrCloud setup. Please do not do this as (you're well aware of this
>> by now!)
>> > it's far too easy to get "interesting" results.
>> > Instead, use the Collections API, specifically the ADDREPLICA and
>> DELETEREPLICA
>> > commands. Under the covers, they actually create a core via the core
>> admin API,
>> > but they insure that all the core create parameters are correct. For
>> ADDREPLICA,
>> > you can also easily insure that a node lands on a particular machine.
>> > Best,
>> > Erick
>> > On Tue, Sep 15, 2015 at 9:46 AM, Yago Riveiro 
>> wrote:
>> >> Hi,
>> >>
>> >> I’m having some issues with the command CREATE of the CoreAdmin API in
>> solr
>> >> 4.10.4.
>> >>
>> >> When I try to load a previously unloaded core with the CREATE command, the
>> >> result of this operation is 2 replicas in the down state: one with the
>> >> original coreNodeName set in clusterstate.json and another with a new one.
>> >>
>> >> I'm on the new solr.xml format and the core.properties of my cores looks
>> >> like this (I set this configuration when I upgraded from 4.6.1 with legacy
>> >> solr.xml to 4.10.4):
>> >>
>> >> name=bucket-15_shardX_replicaY
>> >> shard=shardX
>> >> collection=bucket-15
>> >>
>> >> The API command looks like:
>> >>
>> >>
>> solr/admin/cores?action=CREATE&name=bucket-15_shardX_replicaY&collection=bucket-15&shard=shardX&wt=json
>> >>
>> >> What I’m doing wrong?
>> >>
>> >>
>> >>
>> >> -
>> >> Best regards
>> >> --
>> >> View this message in context:
>> http://lucene.472066.n3.nabble.com/Problem-with-CoreAdmin-API-CREATE-command-tp4229248.html
>> >> Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr facets implementation question

2015-09-17 Thread adfel70
Toke Eskildsen wrote:
> adfel70 wrote:
>> I am trying to understand why faceting on a field with lots of unique
>> values
>> has a great impact on query performance.
> 
> Faceting in Solr is performed in different ways. String faceting different
> from Numerics faceting, DocValued fields different from non-DocValued, fc
> different from enum. Let's look at String faceting with facet.method=fc
> and DocValues.
> 
> Strings (aka Terms) are represented in the faceting code with an ordinal,
> which is really just a number. The first term has number 0, the next
> number 1 and so forth. When doing a faceting call with the above premise,
> what happens is
> 
> 1) A counter of int[unique_values] is allocated.
> This is fairly fast, but still with a noticeable impact when the number of
> unique values creeps into the millions. On our machine it takes several
> hundred milliseconds for 100M values. Also relevant is the overall strain
> it puts on the garbage collector.
> 
> 2) For each hit in the result set, the corresponding ordinals are resolved
> and counter[ordinal]++ is triggered.
> This scales with the result set. Small sets are very fast, quite
> independent of the size of the counter-structure. Large result sets are
> (naturally) equally slow.
> 
> 3) The counter-structure is iterated and top-X are determined.
> This scales with the size of the counter-structure, (nearly) independent
> of the result set size.
> 
> 4) The Terms for the top-X ordinals are resolved from the index.
> This scales with X.
> 
> 
> Some of these parts have non-intuitive penalties: even very tiny
> result sets have a constant overhead from allocation and iteration. Asking
> for top-1M hits means that the underlying priority queue will probably no
> longer fit in the CPU cache and will slow things down. Resolving Terms
> from ordinals relies on fast IO, and a large number of unique Terms might
> mean that the disk cache is not large enough.
> 
> 
> Blatant plug: I have spent a fair amount of time trying to make some of
> this faster: http://tokee.github.io/lucene-solr/
> 
> - Toke Eskildsen

Hi Toke, thank you for the detailed explanation, that's exactly what I was
looking for, except this algorithm fits a single index only. Could you please
elaborate on what adjustments are needed for a distributed index?
The naive solution would count all terms on each shard, but the initial
shard (the one that executed the request) must have ALL results for correct
aggregation (it's easy to find an example which shows that the top K results
from every shard are not good enough).
Is that correct? I tried to verify this behaviour, but I didn't see that the
process that got the request from the user used more memory than the other
shards.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-facets-implementation-question-tp4227604p4229741.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Atomic updates on multiple documents

2015-09-17 Thread Alessandro Benedetti
You need to do that programmatically.
Using SolrJ it would not be so difficult to do that in a few lines of code.
Be careful with the stored fields if you don't want to lose anything.

Cheers

2015-09-17 17:48 GMT+01:00 Alfonso Muñoz-Pomer Fuentes :

> You’re right, we’re not working with a uniqueKey and I wasn’t aware of
> that requirement.
>
> What I’d like is to update the documents without having to retrieve all of
> them (or their unique ids). Basically, there are some data that all
> documents that match a query will share; for the sake of the example, all
> Neal Stephenson’s books are going to be categorised sci-fi, so something
> like the request I specified before (in the same way that multiple
> documents can be deleted with just one request).
>
> If Solr doesn’t offer that “out of the box”, could I accomplish that with
> a plug-in?
>
> Thanks a lot for the info.
>
>
> On 17/09/2015 17:23, Shawn Heisey wrote:
>
>> On 9/17/2015 10:14 AM, Shawn Heisey wrote:
>>
>>> This assumes that the uniqueKey field is "id". Unless your uniqueKey
>>> field is "author_s" (which is highly unlikely), the JSON that you used
>>> will not work. Chances are that the request failed, that nothing
>>> happened.
>>>
>>
>> On my first reading, I did not catch that you said it added a new
>> document with the fields you specified, so my assumption that the
>> request failed was clearly wrong.
>>
>> I think this must mean that you have disabled (removed) the uniqueKey
>> setting in your schema -- adding a document that does not have the
>> uniqueKey field will fail.  I'm reasonably certain that you cannot do
>> atomic updates if you do not have a uniqueKey.
>>
>> I have just checked our documentation for Atomic Updates ... and the
>> uniqueKey requirement is NOT mentioned.  I think that's a documentation
>> bug.
>>
>> Thanks,
>> Shawn
>>
>>
> --
> Alfonso Muñoz-Pomer Fuentes
> Software Engineer @ Expression Atlas Team
> European Bioinformatics Institute (EMBL-EBI)
> European Molecular Biology Laboratory
> Tel:+ 44 (0) 1223 49 2633
> Skype: amunozpomer
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Atomic updates on multiple documents

2015-09-17 Thread Alfonso Muñoz-Pomer Fuentes
We’re using SolrJ as well, but if I understood correctly I would need to 
have the uniqueKey values anyway, right? There’s no way to do what I 
want with one request. A simple outline is:


1. Get the uniqueKey values that match my query
2. Create a set of SolrInputDocument
3. Add the id and the additional data
4. Send the documents to Solr

Thanks a lot for the help.

On 17/09/2015 17:51, Alessandro Benedetti wrote:

You need to do that programmatically.
Using SolrJ would be not so difficult to do that in few line of codes.
Be careful to the stored fields if you don't want to lose anything.

Cheers

2015-09-17 17:48 GMT+01:00 Alfonso Muñoz-Pomer Fuentes :


You’re right, we’re not working with a uniqueKey and I wasn’t aware of
that requirement.

What I’d like is to update the documents without having to retrieve all of
them (or their unique ids). Basically, there are some data that all
documents that match a query will share; for the sake of the example, all
Neal Stephenson’s books are going to be categorised sci-fi, so something
like the request I specified before (in the same way that multiple
documents can be deleted with just one request).

If Solr doesn’t offer that “out of the box”, could I accomplish that with
a plug-in?

Thanks a lot for the info.


On 17/09/2015 17:23, Shawn Heisey wrote:


On 9/17/2015 10:14 AM, Shawn Heisey wrote:


This assumes that the uniqueKey field is "id". Unless your uniqueKey
field is "author_s" (which is highly unlikely), the JSON that you used
will not work. Chances are that the request failed, that nothing
happened.



On my first reading, I did not catch that you said it added a new
document with the fields you specified, so my assumption that the
request failed was clearly wrong.

I think this must mean that you have disabled (removed) the uniqueKey
setting in your schema -- adding a document that does not have the
uniqueKey field will fail.  I'm reasonably certain that you cannot do
atomic updates if you do not have a uniqueKey.

I have just checked our documentation for Atomic Updates ... and the
uniqueKey requirement is NOT mentioned.  I think that's a documentation
bug.

Thanks,
Shawn



--
Alfonso Muñoz-Pomer Fuentes
Software Engineer @ Expression Atlas Team
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Tel:+ 44 (0) 1223 49 2633
Skype: amunozpomer







--
Alfonso Muñoz-Pomer Fuentes
Software Engineer @ Expression Atlas Team
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Tel:+ 44 (0) 1223 49 2633
Skype: amunozpomer


Re: Atomic updates on multiple documents

2015-09-17 Thread Alessandro Benedetti
Hi, to highlight Shawn's answer: "Solr does not have anything like the SQL
syntax that lets you update all rows that match a WHERE clause."

In particular, when adding a document to Solr you are actually adding a new
document to the index (under the hood).
An atomic update is only a user-friendly way to send less information in
the REST API request; a simple in-place update never actually happens.

The id is used to retrieve the original document, if present.
The original stored fields are then retrieved and the new document is built
by applying the user's update.

What you were expecting is not so immediate, also because nothing in the
request tells Solr which query to use to select the set of documents
to update.
It could actually be a nice improvement, but it would have to be designed
and implemented.

Cheers

2015-09-17 17:23 GMT+01:00 Shawn Heisey :

> On 9/17/2015 10:14 AM, Shawn Heisey wrote:
> > This assumes that the uniqueKey field is "id". Unless your uniqueKey
> > field is "author_s" (which is highly unlikely), the JSON that you used
> > will not work. Chances are that the request failed, that nothing
> happened.
>
> On my first reading, I did not catch that you said it added a new
> document with the fields you specified, so my assumption that the
> request failed was clearly wrong.
>
> I think this must mean that you have disabled (removed) the uniqueKey
> setting in your schema -- adding a document that does not have the
> uniqueKey field will fail.  I'm reasonably certain that you cannot do
> atomic updates if you do not have a uniqueKey.
>
> I have just checked our documentation for Atomic Updates ... and the
> uniqueKey requirement is NOT mentioned.  I think that's a documentation
> bug.
>
> Thanks,
> Shawn
>
>


-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


RE: Securing solr 5.2 basic auth permission rules

2015-09-17 Thread Sanders, Marshall (AT - Atlanta)
I'm actually trying to do something similar with 5.3

We're in the process of upgrading from 4.10 and were previously using jaas to 
secure dih pages and a few others and had a config similar to what you 
described.

The error I get is the following (it might only be visible when you change the 
log4j startup log level; I didn't check what the default log level is):

2015-09-17 11:19:10,121 [main] WARN  xml.XmlConfiguration Config error at
  [JAAS config XML stripped by the list archive; it named the realm 
  "SolrRealm" and a "multilogin" login module]

From what I gather, with jetty 9 the modules now have to be enabled individually:
http://www.eclipse.org/jetty/documentation/current/startup-modules.html

However: when I run
java -jar start.jar --list-modules

I only get a few modules as possibilities (server,http,https,ssl).  I tried 
adding the jetty-jaas jar for the version of jetty that ships with 5.3 to /lib, 
but I still can't figure out how to turn it on, as it doesn't show up in the list.

I'm much less familiar with jetty than I am with other containers, so I'm still 
fumbling a bit here.  But it seems we need to:

1. Add the jetty-jaas.jar that's missing via an outside script  (Also note that 
if you want ldap you'll have to use an additional jar)
2. Execute the following (java -jar start.jar --add-to-startd=jaas)
3. Start the server (either with your own script or the new ./solr scripts)

I've got the jar added, but either it's not in the right place (I've got it in 
/lib; maybe it needs to be in /lib/ext?) or jetty needs to be configured to 
recognize it.

Not sure what the thinking was behind the decision that only people running 
solr cloud would want authentication, or even how solr made it to 5.2 before 
adding anything in at all!

We had all this working great in jetty8 solr versions but with the new jetty9 
modules/classloaders it's proving a challenge.

Marshall Sanders
Technical Lead – Software Engineer
Autotrader.com
404-568-7130

-Original Message-
From: Aziz Gaou [mailto:gaoua...@gmail.com] 
Sent: Thursday, September 17, 2015 5:55 AM
To: solr-user@lucene.apache.org
Subject: Re: Securing solr 5.2 basic auth permission rules

thank you so much for your reply,

Now I am trying to protect the Apache Solr 5 admin with jetty. When I change the following:

1) sudo nano /opt/solr/server/etc/webdefault.xml

<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr</web-resource-name>
    <url-pattern>/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>search-role</role-name>
  </auth-constraint>
</security-constraint>

<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Solr Realm</realm-name>
</login-config>

2) I also changed "jetty.xml" and "realm.properties"

3) the following message appears in the browser:

 - http://localhost:8983/solr/


HTTP ERROR: 503

Problem accessing /solr/. Reason:

Service Unavailable

--
*Powered by Jetty://*


Thanks for your help

2015-09-16 18:58 GMT+00:00 Anshum Gupta :

> Basic authentication (and the API support, that you're trying to use) 
> was only released with 5.3.0 so it wouldn't work with 5.2.
> 5.2 only had the authentication and authorization frameworks, and 
> shipped with the Kerberos authentication plugin out of the box.
>
> There are a few known issues with that though, and a 5.3.1 release is 
> just around the corner.
>
> On Wed, Sep 16, 2015 at 10:11 AM, Aziz Gaou  wrote:
>
> > Hi,
> >
> > I try to follow:
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+
> Plugin
> > ,
> > to protect Solr 5.2 Admin with a password, but I have not been able to 
> > secure it.
> >
> > 1) When I run the following command:
> >
> > curl --user solr:SolrRocks
> http://localhost:8983/solr/admin/authentication
> > -H 'Content-type:application/json' -d '{
> >   "set-user": {"tom" : "TomIsCool" }}'
> >
> > no update is made to the file security.json
> >
> > 2) I launched the following 2 commands:
> >
> > curl --user solr:SolrRocks
> http://localhost:8983/solr/admin/authorization
> > -H 'Content-type:application/json' -d '{"set-permission": { 
> > "name":"updates", "collection":"MyCollection", "role": "dev"}}'
> >
> > curl --user solr:SolrRocks
> http://localhost:8983/solr/admin/authorization
> > -H 'Content-type:application/json' -d '{ "set-user-role":
> {"tom":["dev"]}}'
> >
> > MyCollection is still not protected.
> >
> >
> > thank you for your help.
> >
>
>
>
> --
> Anshum Gupta
>


Re: Atomic updates on multiple documents

2015-09-17 Thread Alexandre Rafalovitch
You could probably do this as an UpdateRequestProcessor (a custom one)
that would take your submitted document, run a query, and expand it into
a bunch of documents; that is, do the ID mapping internally (see the
sketch below). But you would still need the IDs/uniqueKeys.

Definitely nothing out of the box, that I can think of.
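
Purely to illustrate the idea, a rough sketch of such a processor (a
sketch, not working code: the "expand.q" marker field and the class name
are invented, error handling is minimal, it assumes a string uniqueKey,
and the factory would have to sit before the distributed update processor
in the chain so the generated atomic updates still get applied):

import java.io.IOException;
import java.util.Collections;
import org.apache.lucene.document.Document;
import org.apache.lucene.search.Query;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList;
import org.apache.solr.search.QParser;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class QueryExpandingProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(final SolrQueryRequest req,
                                            SolrQueryResponse rsp,
                                            UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument in = cmd.getSolrInputDocument();
        Object expandQ = in.getFieldValue("expand.q");  // invented marker field
        if (expandQ == null) {
          super.processAdd(cmd);                        // ordinary add: pass through
          return;
        }
        String uniqueKey = req.getSchema().getUniqueKeyField().getName();
        SolrIndexSearcher searcher = req.getSearcher();
        try {
          Query query = QParser.getParser(expandQ.toString(), "lucene", req).getQuery();
          // Bounded for the sketch; a real implementation would page through.
          DocList hits = searcher.getDocList(query, (Query) null, null, 0, 10000);
          for (DocIterator it = hits.iterator(); it.hasNext(); ) {
            Document stored = searcher.doc(it.nextDoc(), Collections.singleton(uniqueKey));
            SolrInputDocument atomic = new SolrInputDocument();
            atomic.setField(uniqueKey, stored.get(uniqueKey));  // assumes string key
            for (String f : in.getFieldNames()) {               // payload as "set" ops
              if (!f.equals("expand.q") && !f.equals(uniqueKey)) {
                atomic.setField(f, Collections.singletonMap("set", in.getFieldValue(f)));
              }
            }
            AddUpdateCommand expanded = new AddUpdateCommand(req);
            expanded.solrDoc = atomic;
            super.processAdd(expanded);
          }
        } catch (Exception e) {
          throw new IOException(e);
        }
      }
    };
  }
}

A client would then post a single document carrying expand.q plus the
fields to set, and the processor fans it out into per-document atomic
updates.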

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 17 September 2015 at 12:58, Alfonso Muñoz-Pomer Fuentes
 wrote:
> We’re using SolrJ as well, but if I understood correctly I would need to
> have the uniqueKey values anyway, right? There’s no way to do what I want
> with one request. A simple outline is:
>
> 1. Get the uniqueKey values that match my query
> 2. Create a set of SolrInputDocument
> 3. Add the id and the additional data
> 4. Send the documents to Solr
>
> Thanks a lot for the help.
>
>
> On 17/09/2015 17:51, Alessandro Benedetti wrote:
>>
>> You need to do that programmatically.
>> Using SolrJ, it wouldn't be difficult to do that in a few lines of code.
>> Be careful with the stored fields if you don't want to lose anything.
>>
>> Cheers
>>
>> 2015-09-17 17:48 GMT+01:00 Alfonso Muñoz-Pomer Fuentes :
>>
>>> You’re right, we’re not working with a uniqueKey and I wasn’t aware of
>>> that requirement.
>>>
>>> What I’d like is to update the documents without having to retrieve all
>>> of
>>> them (or their unique ids). Basically, there are some data that all
>>> documents that match a query will share; for the sake of the example, all
>>> Neal Stephenson’s books are going to be categorised sci-fi, so something
>>> like the request I specified before (in the same way that multiple
>>> documents can be deleted with just one request).
>>>
>>> If Solr doesn’t offer that “out of the box”, could I accomplish that with
>>> a plug-in?
>>>
>>> Thanks a lot for the info.
>>>
>>>
>>> On 17/09/2015 17:23, Shawn Heisey wrote:
>>>
 On 9/17/2015 10:14 AM, Shawn Heisey wrote:

> This assumes that the uniqueKey field is "id". Unless your uniqueKey
> field is "author_s" (which is highly unlikely), the JSON that you used
> will not work. Chances are that the request failed, that nothing
> happened.
>

 On my first reading, I did not catch that you said it added a new
 document with the fields you specified, so my assumption that the
 request failed was clearly wrong.

 I think this must mean that you have disabled (removed) the uniqueKey
 setting in your schema -- adding a document that does not have the
 uniqueKey field will fail.  I'm reasonably certain that you cannot do
 atomic updates if you do not have a uniqueKey.

 I have just checked our documentation for Atomic Updates ... and the
 uniqueKey requirement is NOT mentioned.  I think that's a documentation
 bug.

 Thanks,
 Shawn


>>> --
>>> Alfonso Muñoz-Pomer Fuentes
>>> Software Engineer @ Expression Atlas Team
>>> European Bioinformatics Institute (EMBL-EBI)
>>> European Molecular Biology Laboratory
>>> Tel:+ 44 (0) 1223 49 2633
>>> Skype: amunozpomer
>>>
>>
>>
>>
>
> --
> Alfonso Muñoz-Pomer Fuentes
> Software Engineer @ Expression Atlas Team
> European Bioinformatics Institute (EMBL-EBI)
> European Molecular Biology Laboratory
> Tel:+ 44 (0) 1223 49 2633
> Skype: amunozpomer


RE: Securing solr 5.2 basic auth permission rules

2015-09-17 Thread Davis, Daniel (NIH/NLM) [C]
I had a similar problem attempting to use JNDI, since the Jetty included with 
Solr does not include jetty-plus... 
I'd like to second the suggestion to include more of jetty.

In my case, there was a better solution - I just wrote a JDBC driver to wrap 
each driverClass I needed (Oracle, MySQL, PostgreSQL), and that fixed the 
problem of getting passwords out of my data-config.xml files, which is 
important.
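
The wrapping idea might look something like this sketch (my
reconstruction, not the actual code; the class name, the "jdbc:wrapped:"
URL prefix, and the SOLR_DB_USER/SOLR_DB_PASS environment variables are
all invented):

import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Properties;
import java.util.logging.Logger;

// Delegates to the real driver but injects credentials from the
// environment, so data-config.xml never has to contain a password.
public class WrappedPostgresDriver implements Driver {
  private static final String PREFIX = "jdbc:wrapped:";
  private final Driver delegate = new org.postgresql.Driver();

  static {
    try {
      DriverManager.registerDriver(new WrappedPostgresDriver());
    } catch (SQLException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  @Override
  public boolean acceptsURL(String url) {
    return url != null && url.startsWith(PREFIX);
  }

  @Override
  public Connection connect(String url, Properties info) throws SQLException {
    if (!acceptsURL(url)) {
      return null;                      // JDBC contract: not our URL
    }
    Properties withCreds = new Properties();
    if (info != null) {
      withCreds.putAll(info);
    }
    // Assumed env vars; this fails fast (NPE) if they are missing.
    withCreds.setProperty("user", System.getenv("SOLR_DB_USER"));
    withCreds.setProperty("password", System.getenv("SOLR_DB_PASS"));
    return delegate.connect(url.substring(PREFIX.length()), withCreds);
  }

  @Override public int getMajorVersion() { return delegate.getMajorVersion(); }
  @Override public int getMinorVersion() { return delegate.getMinorVersion(); }
  @Override public boolean jdbcCompliant() { return delegate.jdbcCompliant(); }
  @Override public DriverPropertyInfo[] getPropertyInfo(String url, Properties info)
      throws SQLException { return delegate.getPropertyInfo(url, info); }
  @Override public Logger getParentLogger() throws SQLFeatureNotSupportedException {
    return delegate.getParentLogger();
  }
}

data-config.xml would then reference the wrapper class in its driver
attribute and use a url like jdbc:wrapped:jdbc:postgresql://host/db, with
no password attribute at all.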

-Original Message-
From: Sanders, Marshall (AT - Atlanta) [mailto:marshall.sand...@autotrader.com] 
Sent: Thursday, September 17, 2015 3:37 PM
To: solr-user@lucene.apache.org
Subject: RE: Securing solr 5.2 basic auth permission rules

So the issue is that when it's stated that solr runs on jetty 9, what it really 
means is that it runs on 5% of jetty 9 and the other 95% has been stripped out.  
(Why?!  It's only ~13 MB)

You'll need to download the appropriate version of jetty and, before starting up, 
do the following:

1. Copy modules/jaas.mod to the unpacked solr directory server/modules
2. Copy etc/jetty-jaas.xml to server/etc
3. Copy the jetty-jaas-.jar to server/lib
4. Call the following before starting solr: java -jar start.jar --add-to-startd=jaas

Now when you start solr JAAS will be available and you should be able to 
configure it with all of the defaults that you would expect.
http://www.eclipse.org/jetty/documentation/current/jaas-support.html


I'll reiterate that I think it's a pretty bad decision to have stripped out the 
modules from the version of jetty shipped.  Especially since they won't be 
loaded into the classloader with the new jetty modules setup.


Marshall Sanders
Technical Lead – Software Engineer
Autotrader.com
404-568-7130

-Original Message-
From: Sanders, Marshall (AT - Atlanta) [mailto:marshall.sand...@autotrader.com]
Sent: Thursday, September 17, 2015 2:28 PM
To: solr-user@lucene.apache.org
Subject: RE: Securing solr 5.2 basic auth permission rules

I'm actually trying to do something similar with 5.3

We're in the process of upgrading from 4.10 and were previously using jaas to 
secure dih pages and a few others and had a config similar to what you 
described.

The error I get is the following (it might only be visible when you change the 
log4j startup log level; I didn't check what the default log level is):

2015-09-17 11:19:10,121 [main] WARN  xml.XmlConfiguration Config error at
  [JAAS config XML stripped by the list archive; it named the realm 
  "SolrRealm" and a "multilogin" login module]

From what I gather, with jetty 9 the modules now have to be enabled individually:
http://www.eclipse.org/jetty/documentation/current/startup-modules.html

However: when I run
java -jar start.jar --list-modules

I only get a few modules as possibilities (server,http,https,ssl).  I tried 
adding the jetty-jaas jar for the version of jetty that ships with 5.3 to /lib, 
but I still can't figure out how to turn it on, as it doesn't show up in the list.

I'm much less familiar with jetty than I am with other containers, so I'm still 
fumbling a bit here.  But it seems we need to:

1. Add the jetty-jaas.jar that's missing via an outside script (Also note that 
if you want ldap you'll have to use an additional jar)
2. Execute the following (java -jar start.jar --add-to-startd=jaas)
3. Start the server (either with your own script or the new ./solr scripts)

I've got the jar added, but either it's not in the right place (I've got it in 
/lib; maybe it needs to be in /lib/ext?) or jetty needs to be configured to 
recognize it.

Not sure what the thinking was behind the decision that only people running 
solr cloud would want authentication, or even how solr made it to 5.2 before 
adding anything in at all!

We had all this working great in jetty8 solr versions but with the new jetty9 
modules/classloaders it's proving a challenge.

Marshall Sanders
Technical Lead – Software Engineer
Autotrader.com
404-568-7130

-Original Message-
From: Aziz Gaou [mailto:gaoua...@gmail.com]
Sent: Thursday, September 17, 2015 5:55 AM
To: solr-user@lucene.apache.org
Subject: Re: Securing solr 5.2 basic auth permission rules

thank you so much for your reply,

Now I am trying to protect the Apache Solr 5 admin with jetty. When I change the following:

1) sudo nano /opt/solr/server/etc/webdefault.xml

<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr</web-resource-name>
    <url-pattern>/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>search-role</role-name>
  </auth-constraint>
</security-constraint>

<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Solr Realm</realm-name>
</login-config>

2) I also changed "jetty.xml" and "realm.properties"

3) the following message appears in the browser:

 - http://localhost:8983/solr/


HTTP ERROR: 503

Problem accessing /solr/. Reason:

Service Unavailable

--
*Powered by Jetty://*


Thanks for your help

2015-09-16 18:58 GMT+00:00 Anshum Gupta :

> Basic authentication (and the API support, that you're trying to use) 
> was only released with 5.3.0 so it wouldn't work with 5.2.
> 5.2 only had the authentication and authorization frameworks, and 
> shipped with Kerberos 

RE: Securing solr 5.2 basic auth permission rules

2015-09-17 Thread Sanders, Marshall (AT - Atlanta)
So the issue is that when it's stated that solr runs on jetty 9, what it really 
means is that it runs on 5% of jetty 9 and the other 95% has been stripped out.  
(Why?!  It's only ~13 MB)

You'll need to download the appropriate version of jetty and, before starting up, 
do the following:

1. Copy modules/jaas.mod to the unpacked solr directory server/modules
2. Copy etc/jetty-jaas.xml to server/etc
3. Copy the jetty-jaas-.jar to server/lib
4. Call the following before starting solr: java -jar start.jar 
--add-to-startd=jaas

Now when you start solr JAAS will be available and you should be able to 
configure it with all of the defaults that you would expect.
http://www.eclipse.org/jetty/documentation/current/jaas-support.html


I'll reiterate that I think it's a pretty bad decision to have stripped out the 
modules from the version of jetty shipped.  Especially since they won't be 
loaded into the classloader with the new jetty modules setup.


Marshall Sanders
Technical Lead – Software Engineer
Autotrader.com
404-568-7130

-Original Message-
From: Sanders, Marshall (AT - Atlanta) [mailto:marshall.sand...@autotrader.com] 
Sent: Thursday, September 17, 2015 2:28 PM
To: solr-user@lucene.apache.org
Subject: RE: Securing solr 5.2 basic auth permission rules

I'm actually trying to do something similar with 5.3

We're in the process of upgrading from 4.10 and were previously using jaas to 
secure dih pages and a few others and had a config similar to what you 
described.

The error I get is the following (it might only be visible when you change the 
log4j startup log level; I didn't check what the default log level is):

2015-09-17 11:19:10,121 [main] WARN  xml.XmlConfiguration Config error at
  [JAAS config XML stripped by the list archive; it named the realm 
  "SolrRealm" and a "multilogin" login module]

From what I gather, with jetty 9 the modules now have to be enabled individually:
http://www.eclipse.org/jetty/documentation/current/startup-modules.html

However: when I run
java -jar start.jar --list-modules

I only get a few modules as possibilities (server,http,https,ssl).  I tried 
adding the jetty-jaas jar for the version of jetty that ships with 5.3 to /lib, 
but I still can't figure out how to turn it on, as it doesn't show up in the list.

I'm much less familiar with jetty than I am with other containers, so I'm still 
fumbling a bit here.  But it seems we need to:

1. Add the jetty-jaas.jar that's missing via an outside script (Also note that 
if you want ldap you'll have to use an additional jar)
2. Execute the following (java -jar start.jar --add-to-startd=jaas)
3. Start the server (either with your own script or the new ./solr scripts)

I've got the jar added, but either it's not in the right place (I've got it in 
/lib; maybe it needs to be in /lib/ext?) or jetty needs to be configured to 
recognize it.

Not sure what the thinking was behind the decision that only people running 
solr cloud would want authentication, or even how solr made it to 5.2 before 
adding anything in at all!

We had all this working great in jetty8 solr versions but with the new jetty9 
modules/classloaders it's proving a challenge.

Marshall Sanders
Technical Lead – Software Engineer
Autotrader.com
404-568-7130

-Original Message-
From: Aziz Gaou [mailto:gaoua...@gmail.com]
Sent: Thursday, September 17, 2015 5:55 AM
To: solr-user@lucene.apache.org
Subject: Re: Securing solr 5.2 basic auth permission rules

thank you so much for your reply,

Now I am trying to protect the Apache Solr 5 admin with jetty. When I change the following:

1) sudo nano /opt/solr/server/etc/webdefault.xml

<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr</web-resource-name>
    <url-pattern>/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>search-role</role-name>
  </auth-constraint>
</security-constraint>

<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Solr Realm</realm-name>
</login-config>

2) I also changed "jetty.xml" and "realm.properties"

3) the following message appears in the browser:

 - http://localhost:8983/solr/


HTTP ERROR: 503

Problem accessing /solr/. Reason:

Service Unavailable

--
*Powered by Jetty://*


Thanks for your help

2015-09-16 18:58 GMT+00:00 Anshum Gupta :

> Basic authentication (and the API support, that you're trying to use) 
> was only released with 5.3.0 so it wouldn't work with 5.2.
> 5.2 only had the authentication and authorization frameworks, and 
> shipped with the Kerberos authentication plugin out of the box.
>
> There are a few known issues with that though, and a 5.3.1 release is 
> just around the corner.
>
> On Wed, Sep 16, 2015 at 10:11 AM, Aziz Gaou  wrote:
>
> > Hi,
> >
> > I try to follow:
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+
> Plugin
> > ,
> > to protect Solr 5.2 Admin with a password, but I have not been able to 
> > secure it.
> >
> > 1) When I run the following command:
> >
> > curl --user solr:SolrRocks
> http://localhost:8983/solr/admin/authentication
> > -H 'Content-type:application/json' -d '{
> >   "set-user": {"tom" : "TomIsCool" }}'
> >
> > no 

Re: Atomic updates on multiple documents

2015-09-17 Thread Alfonso Muñoz-Pomer Fuentes
You’re right, we’re not working with a uniqueKey and I wasn’t aware of 
that requirement.


What I’d like is to update the documents without having to retrieve all 
of them (or their unique ids). Basically, there are some data that all 
documents that match a query will share; for the sake of the example, 
all Neal Stephenson’s books are going to be categorised sci-fi, so 
something like the request I specified before (in the same way that 
multiple documents can be deleted with just one request).


If Solr doesn’t offer that “out of the box”, could I accomplish that 
with a plug-in?


Thanks a lot for the info.

On 17/09/2015 17:23, Shawn Heisey wrote:

On 9/17/2015 10:14 AM, Shawn Heisey wrote:

This assumes that the uniqueKey field is "id". Unless your uniqueKey
field is "author_s" (which is highly unlikely), the JSON that you used
will not work. Chances are that the request failed, that nothing happened.


On my first reading, I did not catch that you said it added a new
document with the fields you specified, so my assumption that the
request failed was clearly wrong.

I think this must mean that you have disabled (removed) the uniqueKey
setting in your schema -- adding a document that does not have the
uniqueKey field will fail.  I'm reasonably certain that you cannot do
atomic updates if you do not have a uniqueKey.

I have just checked our documentation for Atomic Updates ... and the
uniqueKey requirement is NOT mentioned.  I think that's a documentation bug.

Thanks,
Shawn



--
Alfonso Muñoz-Pomer Fuentes
Software Engineer @ Expression Atlas Team
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Tel:+ 44 (0) 1223 49 2633
Skype: amunozpomer


Question related to reranking and RankQuery

2015-09-17 Thread Ajinkya Kale
Hi all,

I am new to Solr. I have a QParser plugin which uses an implementation
of CustomScoreQuery to provide a custom score for each document.
Is there a way I can use the same plugin to provide the score for the top N
documents after an initial query/sort?
I looked at the ReRankQParserPlugin, but writing a custom RankQuery
implementation looks a lot more involved. So I was wondering if I can reuse
my existing CustomScoreQuery implementation to score the top N documents
being reranked.
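
One thing that might work without writing a new RankQuery: point the stock
{!rerank} parser at the existing custom parser via parameter dereferencing.
A hedged SolrJ sketch ("myscore" is a placeholder for whatever name the
QParserPlugin is registered under, and the collection and query are made up):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class RerankSketch {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
             new HttpSolrClient("http://localhost:8983/solr/mycollection")) {
      SolrQuery q = new SolrQuery("title:snow");   // initial query and sort
      // Rerank the top 100 docs; the rerank query is resolved from $rqq.
      q.set("rq", "{!rerank reRankQuery=$rqq reRankDocs=100 reRankWeight=3}");
      q.set("rqq", "{!myscore}title:snow");        // custom-scored query
      QueryResponse rsp = client.query(q);
      System.out.println(rsp.getResults().getNumFound());
    }
  }
}

Whether the combined score behaves the way you want depends on how the
CustomScoreQuery scores interact with reRankWeight, so this is only a
starting point.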

On a side note, is there a place where the ReRankQParserPlugin is explained
in detail?

--aj