Re: Security Problems

2015-12-12 Thread Noble Paul
This could have multiple solutions

1) "read" should cover all the paths
2) system properties are a strict no. This can strictly be a property
of the Authentication plugin, so you can use the API to modify the
property.
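
For reference, here's a rough sketch of what such rules look like in
security.json with the RuleBasedAuthorizationPlugin (Solr 5.3+); the role
and user names are just illustrative:

{
  "authentication": {
    "class": "solr.BasicAuthPlugin"
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "permissions": [
      {"name": "read", "role": "reader"},
      {"name": "security-edit", "role": "admin"}
    ],
    "user-role": {"solradmin": ["admin", "reader"]}
  }
}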

On Sat, Nov 21, 2015 at 3:57 AM, Jan Høydahl  wrote:
>> ideally we should have a simple permission name called "all" (which we
>> don't have)
>>
>> so that one rule should be enough
>>
>> "name":"all",
>> "role":"somerole"
>>
>> Open a ticket and we should fix it for 5.4.0.
>> It should also include the admin paths.
>
> Yes, that would be convenient.
>
> I still don't like the existing "open-by-default" security mode of Solr. It
> is very fragile to misconfiguration without people noticing. Take the
> well-known permission "read" for instance. It protects /select and /get, but
> it won't protect /query, /browse, /export, /spell, /suggest, /tvrh, /terms,
> /clustering or /elevate, all of which also expose sensitive info.
>
> How about allowing to choose between three different security modes?
>
> -Dsolr.security.mode=open  : As today - paths not configured are wide 
> open
> -Dsolr.security.mode=authenticated : Paths not configured are open to any 
> authenticated user
> -Dsolr.security.mode=explicit  : Paths not configured are closed to all. 
> All access is explicitly configured
>
> /Jan



-- 
-
Noble Paul


Re: Getting a document version back after updating

2015-12-12 Thread Shalin Shekhar Mangar
You will have to request a real-time-get with the unique key of the
document you added/updated. In Solr 5.1+ you can go
client.getById(String id) to get this information.
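
A minimal sketch with SolrJ 5.x (the URL, collection name and id are
placeholders; exception handling omitted):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

SolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "doc-42");
client.add(doc);
// no commit needed: real-time get serves the latest version from the update log
SolrDocument latest = client.getById("doc-42");
System.out.println("version = " + latest.getFieldValue("_version_"));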

On Sat, Dec 12, 2015 at 10:19 AM, Debraj Manna  wrote:
> Is there a way I can get the version of a document back in response after
> adding or updating the document via Solrj 5.2.1?



-- 
Regards,
Shalin Shekhar Mangar.


RE: Use multiple instances simultaneously

2015-12-12 Thread Gian Maria Ricci - aka Alkampfer
Thanks a lot for all the clarifications.

Actually resources are not a big problem; I think the customer can afford 4 GB RAM 
Red Hat Linux machines for Zookeeper. The Solr machines will have 64 or 96 GB of 
RAM in production, depending on the size of the index.

My primary concern is maintenance of the structure. With single independent 
machines the situation is trivial: we can stop Solr on one of the machines 
during the night and take a full backup of the indexes. With a full backup of 
the indexes, rebuilding a machine from scratch in case of disaster is simple: 
just spin up a new virtual machine, restore the backup, restart Solr, and 
everything is OK.

If for any reason the SolrCloud cluster stops working, restoring everything is 
somewhat more complicated. Are there any best practices for backing up 
everything in SolrCloud, so we can restore the entire cluster if anything goes 
wrong?

Thanks a lot for the interesting discussion and for the really useful 
information you gave me.

--
Gian Maria Ricci
Cell: +39 320 0136949


-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: venerdì 11 dicembre 2015 17:11
To: solr-user@lucene.apache.org
Subject: Re: Use multiple instances simultaneously

On 12/11/2015 8:19 AM, Gian Maria Ricci - aka Alkampfer wrote:
> Thanks for all of your clarifications. I know that SolrCloud is a 
> really better configuration than any other, but it is also 
> considerably more complex. I just want to give you the pain points 
> I've noticed while gathering all the info I could on SolrCloud.
> 
> 1) zookeeper documentation says that for the best experience you 
> should have a dedicated filesystem for the persistence and it should 
> never swap to disk. I've not found any guidelines on how I should 
> size the zookeeper machine: how much RAM and disk? Can I install 
> zookeeper on the same machines where Solr resides? (I suspect not, 
> because the Solr machines are under stress, and if zookeeper starts 
> swapping it can lead to problems.)

Standalone zookeeper doesn't require much in the way of resources.
Unless the SolrCloud installation is enormous, a machine with 1-2GB of RAM is 
probably plenty, if the only thing it is doing is zookeeper and it's not 
running Windows.  If the SolrCloud install has a lot of collections, shards, 
and/or servers, then you might need more, because the zookeeper database will 
be larger.

> 2) What about upgrades? If I need to update my SolrCloud installation 
> and the new version requires a new version of zookeeper, which is the 
> path to take? Do I update zookeeper first, or upgrade Solr on the 
> existing machines, or something else?
> Maybe I did not search well, but I did not find a comprehensive 
> guideline telling me how to upgrade my SolrCloud installation in 
> various situations.

If you're following recommendations and using standalone zookeeper, then 
upgrading it is entirely separate from upgrading Solr.  It's probably a good 
idea to upgrade your three (or more) zookeeper servers first.

Here's a FAQ entry from zookeeper about upgrades:

https://wiki.apache.org/hadoop/ZooKeeper/FAQ#A6

> 3) Which are the best practices to run DIH in SolrCloud? I think I can 
> round-robin DIH imports across the different servers composing the 
> cloud infrastructure, or is there a better way? (I probably need to 
> trigger a DIH import every 5-10 minutes, but the number of new records 
> is really small.)

When checking the status of an import, you must send the status request to the 
same machine where you sent the command to start the import.

If you're only ever going to run one DIH at a time, then I don't see any reason 
to involve multiple servers.  If you want to run more than one simultaneously, 
then you might want to run each one on a different machine.
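
If it helps, here is a hedged SolrJ sketch of kicking off a DIH import and
then polling its status on the same node (assumes the stock /dataimport
handler; host and core names are placeholders, exception handling omitted):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

SolrClient node = new HttpSolrClient("http://host1:8983/solr/mycore");
SolrQuery trigger = new SolrQuery();
trigger.setRequestHandler("/dataimport");
trigger.set("command", "full-import");
node.query(trigger);                        // start the import

SolrQuery status = new SolrQuery();
status.setRequestHandler("/dataimport");
status.set("command", "status");
// poll until the handler reports "idle" again
System.out.println(node.query(status).getResponse().get("status"));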

> 4) Since I believe it is not best practice to install zookeeper on 
> the same Solr machine (as a separate process, not the built-in 
> zookeeper), I need at least three more machines to maintain / monitor / 
> upgrade, and I also need to monitor zookeeper, a new appliance that 
> needs to be mastered by the IT infrastructure team.

The only real reason to avoid zookeeper and Solr on the same machine is 
performance under high load, and mostly that comes down to I/O performance, so 
if you can put zookeeper on a separate set of disks, you're probably good.  If 
the query/update load will not be high, then sharing machines will likely work 
well, even if the disks are all shared.

> Are there any guidelines on how to automate promoting a slave to 
> master in a classic master/slave setup? I did not find anything 
> official, and auto-promoting a slave to master could solve my problem.

I don't know of any explicit information explaining how to promote a new 
master.  Basically what you have to do is reconfigure the new master's 
replication (so it stops trying to be a slave), reconfigure every slave to 
point to the new master, and reconfigure every client that makes updates so 
it sends them to the new master.
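
As a rough sketch, the classic replication config in solrconfig.xml has two
shapes to swap (host and core names here are placeholders):

<!-- on the promoted node: drop the slave section, keep a master section -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

<!-- on every remaining slave: point masterUrl at the new master -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://newmaster:8983/solr/mycore/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>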

Re: Solr Cloud 5.3.0 Read Time Outs

2015-12-12 Thread Shalin Shekhar Mangar
Yes, that is probably the cause. I think you have very aggressive
commit rates and Solr is not able to keep up. If you are sending
explicit commits, switch to using autoCommit with openSearcher=false
every 5-10 minutes (this depends on your indexing rate) and
autoSoftCommit every 2-5 minutes. Adjust as necessary.
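
A sketch of what that looks like in solrconfig.xml (the intervals are only
examples; tune them to your indexing rate):

<autoCommit>
  <maxTime>600000</maxTime>           <!-- hard commit every 10 minutes -->
  <openSearcher>false</openSearcher>  <!-- don't open a new searcher on hard commit -->
</autoCommit>
<autoSoftCommit>
  <maxTime>180000</maxTime>           <!-- soft commit every 3 minutes for visibility -->
</autoSoftCommit>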

On Sat, Dec 12, 2015 at 10:08 PM, Adrian Liew  wrote:
> Hi there,
>
> I am using Solr Cloud 5.3.0 on a multi-server cluster (3 servers), where 
> each server has 16 cores and 32 GB of RAM.
>
> I am facing regular errors -  Error sending update to http://someip:8983/solr 
>  - "Timeout occured while waiting response from server at server a"  ... 
> Caused by java.net.SocketTimeoutException: Read Timed out.
>
> I am not sure if this error can be caused by some preceding warnings 
> reported, such as:
>
> Error sending update to http://someip:8983/solr  -
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
> from server at http://someip:8983/solr/sitecore_master_index_shard1_replica3: 
> Error opening new searcher. exceeded limit of maxWarmingSearchers=6
>
> Can the maxWarmingSearchers error possibly cause the read timeouts? If 
> yes, once the maxWarmingSearchers warning is addressed, will that remove 
> the read timeout errors?
>
> Best regards,
> Adrian
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Getting a document version back after updating

2015-12-12 Thread Debraj Manna
I was thinking whether it is possible to get the version without making one
more call to getById. Can I get it as part of the update response when I am
updating or adding a new document?
On Dec 12, 2015 3:28 PM, "Shalin Shekhar Mangar" 
wrote:

> You will have to request a real-time-get with the unique key of the
> document you added/updated. In Solr 5.1+ you can go
> client.getById(String id) to get this information.
>
> On Sat, Dec 12, 2015 at 10:19 AM, Debraj Manna 
> wrote:
> > Is there a way I can get the version of a document back in response after
> > adding or updating the document via Solrj 5.2.1?
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Solr Cloud 5.3.0 Read Time Outs

2015-12-12 Thread Adrian Liew
Hi there,

I am using Solr Cloud 5.3.0 on a multi-server cluster (3 servers), where each 
server has 16 cores and 32 GB of RAM.

I am facing regular errors -  Error sending update to http://someip:8983/solr  
- "Timeout occured while waiting response from server at server a"  ... Caused 
by java.net.SocketTimeoutException: Read Timed out.

I am not sure if this error can be caused by some preceding warnings 
reported, such as:

Error sending update to http://someip:8983/solr  -
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at http://someip:8983/solr/sitecore_master_index_shard1_replica3: 
Error opening new searcher. exceeded limit of maxWarmingSearchers=6

Can the maxWarmingSearchers error possibly cause the read timeouts? If yes, 
once the maxWarmingSearchers warning is addressed, will that remove the read 
timeout errors?

Best regards,
Adrian



Re: Getting a document version back after updating

2015-12-12 Thread Alexandre Rafalovitch
Does "versions=true" flag match what you are looking for? It is
described towards the end of:
https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-OptimisticConcurrency
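
From SolrJ, a sketch like this should work (UpdateRequest.setParam just
passes the raw request parameter through; the exact shape of the response
may vary by version):

import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.client.solrj.response.UpdateResponse;

UpdateRequest req = new UpdateRequest();
req.add(doc);                      // doc is your SolrInputDocument
req.setParam("versions", "true");  // ask Solr to echo id -> _version_ back
UpdateResponse rsp = req.process(client);
// the assigned versions should appear in the response, e.g. under "adds"
System.out.println(rsp.getResponse());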

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 12 December 2015 at 11:35, Debraj Manna  wrote:
> I was thinking if it is possible to get the version without making one more
> extra call getById. Can I get that as part of the update response when I am
> updating or adding a new document?
> On Dec 12, 2015 3:28 PM, "Shalin Shekhar Mangar" 
> wrote:
>
>> You will have to request a real-time-get with the unique key of the
>> document you added/updated. In Solr 5.1+ you can go
>> client.getById(String id) to get this information.
>>
>> On Sat, Dec 12, 2015 at 10:19 AM, Debraj Manna 
>> wrote:
>> > Is there a way I can get the version of a document back in response after
>> > adding or updating the document via Solrj 5.2.1?
>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>


Re: Solrcloud 4.8.1 - Solr cores reload

2015-12-12 Thread Erick Erickson
Right. What's happening is essentially what used to happen in your
custom code, where individual core reload commands were being sent,
except it's all happening inside Solr. To wit:
1> the code looks at the collection state
2> for each replica it sends a core admin API reload command
 to the appropriate node.
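
For completeness, a sketch of issuing that collection-level RELOAD from
SolrJ (5.x class names; in 4.x the client classes are named *Server
rather than *Client):

import org.apache.solr.client.solrj.request.CollectionAdminRequest;

// client is a CloudSolrClient (or HttpSolrClient) for the cluster
CollectionAdminRequest.Reload reload = new CollectionAdminRequest.Reload();
reload.setCollectionName("collection1");
reload.process(client);  // Solr fans this out as per-replica core reloads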

It's really nothing different from what you probably had before,
but I'm much more confident in code that's
1> written by the same people who wrote the rest of the Cloud code
2> tested by the Solr test cases
3> not something I can forget to maintain ;)

Best,
Erick

On Fri, Dec 11, 2015 at 6:10 PM, Vincenzo D'Amore  wrote:
> Thanks for your suggestion Erick, I'm changing the code and I'll use the
> Collections API RELOAD.
> I have done a few tests, changing the synonyms dictionary or solrconfig, and
> everything works fine.
>
> Well, I think you already know, but looking at the solr.log file after the
> collections api reload call, I have seen a bunch of lines like this one:
>
> - Collection Admin sending CoreAdmin cmd to http://192.168.101.118:8080/solr
> params:action=RELOAD&core=collection1_shard1_replica1&qt=%2Fadmin%2Fcores
> ...
>
> Best regards and thanks again,
> Vincenzo
>
>
> On Fri, Dec 11, 2015 at 7:38 PM, Erick Erickson 
> wrote:
>
>> You should absolutely always use the Collection API rather than
>> any core admin API if at all possible. If for no other reason
>> than your client will be _lots_ simpler (i.e. you don't have
>> to find all the replicas and issue the core admin RELOAD
>> command for each one).
>>
>> I'm not entirely sure whether the RELOAD command is
>> synchronous or not though.
>>
>> Best,
>> erick
>>
>> On Fri, Dec 11, 2015 at 8:22 AM, Vincenzo D'Amore 
>> wrote:
>> > Hi all,
>> >
>> > in day by day work, often I need to change the solr configurations files.
>> > Often adding new synonyms, changing the schema or the solrconfig.xml.
>> >
>> > Everything is stored in zookeeper.
>> >
>> > But I have inherited a piece of code that, after every change, reload all
>> > the cores using CoreAdmin API.
>> >
>> > Now I have 15 replicas in the collection, and after every core reload the
>> > code waits for 60 seconds (I suppose because whoever wrote the code was
>> > worried about cache invalidation).
>> >
>> > Given that, it takes about 25 minutes to update all the cores. Obviously
>> > during this time we cannot modify the collection.
>> >
>> > The question is: to reduce this wait, if I use the Collections API
>> > RELOAD, what are the contraindications?
>> >
>> > Thanks in advance for your time,
>> > Vincenzo
>> >
>> >
>> > --
>> > Vincenzo D'Amore
>> > email: v.dam...@gmail.com
>> > skype: free.dev
>> > mobile: +39 349 8513251
>>
>
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251


Re: Solr Cloud 5.3.0 Read Time Outs

2015-12-12 Thread Erick Erickson
+1 to what Shalin said. You've adjusted maxWarmingSearchers up,
probably because you saw warnings in the log files. This is _not_
the solution to the "maxWarmingSearchers exceeded" error. The solution
is, as Shalin says, to decrease your commit frequency.
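
Once the commit rate is under control, the default in solrconfig.xml is
usually fine again, something like:

<maxWarmingSearchers>2</maxWarmingSearchers>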

Commit can be an expensive operation,
see: 
https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick

On Sat, Dec 12, 2015 at 9:54 AM, Shalin Shekhar Mangar
 wrote:
> Yes, that is probably the cause. I think you have very aggressive
> commit rates and Solr is not able to keep up. If you are sending
> explicit commits, switch to using autoCommit with openSearcher=false
> every 5-10 minutes (this depends on your indexing rate) and
> autoSoftCommit every 2-5 minutes. Adjust as necessary.
>
> On Sat, Dec 12, 2015 at 10:08 PM, Adrian Liew  wrote:
>> Hi there,
>>
>> I am using Solr Cloud 5.3.0 on a multi-server cluster (3 servers), where 
>> each server has 16 cores and 32 GB of RAM.
>>
>> I am facing regular errors -  Error sending update to 
>> http://someip:8983/solr  - "Timeout occured while waiting response from 
>> server at server a"  ... Caused by java.net.SocketTimeoutException: Read 
>> Timed out.
>>
>> I am not sure if this error can be caused by some preceding warnings 
>> reported, such as:
>>
>> Error sending update to http://someip:8983/solr  -
>> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
>> from server at 
>> http://someip:8983/solr/sitecore_master_index_shard1_replica3: Error opening 
>> new searcher. exceeded limit of maxWarmingSearchers=6
>>
>> Can the maxWarmingSearchers error possibly cause the read timeouts? If 
>> yes, once the maxWarmingSearchers warning is addressed, will that remove 
>> the read timeout errors?
>>
>> Best regards,
>> Adrian
>>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.


Re: SolrCloud 4.8.1 - commit wait

2015-12-12 Thread Erick Erickson
Autowarm times will only be logged when the commit has openSearcher=true
or on a soft commit. But maybe your log levels aren't at INFO for the right
code...

That said, your autowarm counts of 0 probably mean that you're not doing
any autowarming at all, so that might be a red herring. Your newSearcher
event in solrconfig.xml will still be fired, but it may be commented out.
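
For reference, such a listener looks something like this in solrconfig.xml
(the warming query shown is only a placeholder):

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">some static warming query</str>
      <str name="sort">price asc</str>
    </lst>
  </arr>
</listener>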

This is still something of a puzzle. With an index this size, your hard
commits should never take more than a second or two unless you're
in some very strange state. Stack traces would be in order if lengthening
the commit interval doesn't work.

Best,
Erick

On Fri, Dec 11, 2015 at 5:58 PM, Vincenzo D'Amore  wrote:
> Hi All,
>
> an update, I have switched logging from WARN to INFO for all except for
> those two:
>
> - org.apache.solr.core
> - org.apache.solr.handler.component.SpellCheckComponent
>
> Well, looking at the log file I'm unable to find any autowarm log line, even
> after a few updates and commits.
>
> Looking at solrconfig.xml I see most autowarmCount parameters are set to 0
>
> <filterCache ... autowarmCount="0" />
> <queryResultCache ... autowarmCount="0" />
> <documentCache ... autowarmCount="0" />
> <cache name="perSegFilter" ... size="10" initialSize="0" autowarmCount="10"
>   regenerator="solr.NoOpRegenerator" />
>
> Not sure what this means...
>
> On Sat, Dec 12, 2015 at 1:13 AM, Vincenzo D'Amore 
> wrote:
>
>> Thanks Erick, Mark,
>>
>> I'll raise maxTime asap.
>> Just to be sure I understand: given that I have openSearcher=false, I
>> suppose it shouldn't trigger autowarming at least until a commit is
>> executed, should it?
>>
>> Anyway, given that maxTime is very aggressive, I don't understand why
>> the hard commit takes so long.
>>
>> Thanks again for your answers.
>> Vincenzo
>>
>>
>> On Fri, Dec 11, 2015 at 7:22 PM, Erick Erickson 
>> wrote:
>>
>>> First of all, your autocommit settings are _very_ aggressive. Committing
>>> every second is far too frequent IMO.
>>>
>>> As an aside, I generally prefer to omit maxDocs as it's not all that
>>> predictable, but that's a personal preference and really doesn't bear on
>>> your problem.
>>>
>>> My _guess_ is that you are doing a lot of autowarming. The number of docs
>>> doesn't really matter if your autowarming is taking forever. Your Solr
>>> logs should report the autowarm times at INFO level; have you checked
>>> those?
>>>
>>> The commit settings shouldn't be a problem in terms of your server dying,
>>> the indexing process flushes docs to the tlog independent of committing so
>>> upon restart they should be recovered. Here's a blog on the subject:
>>>
>>>
>>> https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>>>
>>> Best,
>>> Erick
>>>
>>> On Fri, Dec 11, 2015 at 8:24 AM, Vincenzo D'Amore 
>>> wrote:
>>> > Hi all,
>>> >
>>> > I have a SolrCloud cluster with a collection (2.5M docs) with 3 shards
>>> and
>>> > 15 replicas.
>>> > There is a solrj application that feeds the collection, updating a few
>>> > documents every hour. I don't understand why, at the end of the process,
>>> > the hard commit takes about 8-10 minutes.
>>> >
>>> > Even if there are only a few hundred documents.
>>> >
>>> > This is the autocommit configuration:
>>> >
>>> > <autoCommit>
>>> >   <maxDocs>1</maxDocs>
>>> >   <maxTime>1000</maxTime>
>>> >   <openSearcher>false</openSearcher>
>>> > </autoCommit>
>>> >
>>> > In your experience, why does the hard commit take so long even for so
>>> > few documents?
>>> >
>>> > Now I'm changing the code to soft commit, calling commit(waitFlush =
>>> > false, waitSearcher = false, softCommit = true):
>>> >
>>> > solrServer.commit(false, false, true);
>>> >
>>> > I have configured NRTCachingDirectoryFactory, but I'm a little bit
>>> > worried that if a server goes down (something like kill -9, a SolrCloud
>>> > crash, out of memory, etc.), then, using this soft commit +
>>> > NRTCachingDirectory strategy, the SolrCloud instance might not be able
>>> > to recover the replica.
>>> >
>>> > Should I worry about this new configuration? I was thinking of taking a
>>> > daily snapshot of everything, in order to recover the index immediately.
>>> > Could this be considered a best practice?
>>> >
>>> > Thanks in advance for your time,
>>> > Vincenzo
>>> >
>>> > --
>>> > Vincenzo D'Amore
>>> > email: v.dam...@gmail.com
>>> > skype: free.dev
>>> > mobile: +39 349 8513251
>>>
>>
>>
>>
>> --
>> Vincenzo D'Amore
>> email: v.dam...@gmail.com
>> skype: free.dev
>> mobile: +39 349 8513251
>>
>
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251


Re: Getting a document version back after updating

2015-12-12 Thread Shalin Shekhar Mangar
Oh yes, I had forgotten about that! Thanks Alexandre!

On Sat, Dec 12, 2015 at 11:57 PM, Alexandre Rafalovitch
 wrote:
> Does "versions=true" flag match what you are looking for? It is
> described towards the end of:
> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-OptimisticConcurrency
>
> Regards,
>Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 12 December 2015 at 11:35, Debraj Manna  wrote:
>> I was thinking if it is possible to get the version without making one more
>> extra call getById. Can I get that as part of the update response when I am
>> updating or adding a new document?
>> On Dec 12, 2015 3:28 PM, "Shalin Shekhar Mangar" 
>> wrote:
>>
>>> You will have to request a real-time-get with the unique key of the
>>> document you added/updated. In Solr 5.1+ you can go
>>> client.getById(String id) to get this information.
>>>
>>> On Sat, Dec 12, 2015 at 10:19 AM, Debraj Manna 
>>> wrote:
>>> > Is there a way I can get the version of a document back in response after
>>> > adding or updating the document via Solrj 5.2.1?
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Shalin Shekhar Mangar.
>>>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Getting a document version back after updating

2015-12-12 Thread Debraj Manna
Thanks Alex. This is what I was looking for. One more query: how do I set
this from SolrJ while calling add()? Or do I have to make a curl call with
this param set?
On Dec 13, 2015 12:53 AM, "Shalin Shekhar Mangar" 
wrote:

> Oh yes, I had forgotten about that! Thanks Alexandre!
>
> On Sat, Dec 12, 2015 at 11:57 PM, Alexandre Rafalovitch
>  wrote:
> > Does "versions=true" flag match what you are looking for? It is
> > described towards the end of:
> >
> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-OptimisticConcurrency
> >
> > Regards,
> >Alex.
> > 
> > Newsletter and resources for Solr beginners and intermediates:
> > http://www.solr-start.com/
> >
> >
> > On 12 December 2015 at 11:35, Debraj Manna 
> wrote:
> >> I was thinking if it is possible to get the version without making one
> more
> >> extra call getById. Can I get that as part of the update response when
> I am
> >> updating or adding a new document?
> >> On Dec 12, 2015 3:28 PM, "Shalin Shekhar Mangar" <
> shalinman...@gmail.com>
> >> wrote:
> >>
> >>> You will have to request a real-time-get with the unique key of the
> >>> document you added/updated. In Solr 5.1+ you can go
> >>> client.getById(String id) to get this information.
> >>>
> >>> On Sat, Dec 12, 2015 at 10:19 AM, Debraj Manna <
> subharaj.ma...@gmail.com>
> >>> wrote:
> >>> > Is there a way I can get the version of a document back in response
> after
> >>> > adding or updating the document via Solrj 5.2.1?
> >>>
> >>>
> >>>
> >>> --
> >>> Regards,
> >>> Shalin Shekhar Mangar.
> >>>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>