Re: Changing definition of id field

2017-03-13 Thread danny teichthal
Thanks Shawn,
I understand that changing the id to a string is not an option.

I have a limitation that our Solr cluster is "live" during full indexing.
We have many updates and the index is updated incrementally.
There's no way to index everything into a side index and replace it.
So, I'm trying to find a solution where users can keep working without
depending on my full indexing.
We have a periodic indexing job that runs once every few weeks.

I thought about another option:
1. Add a new field - "id_str".
2. Let the periodic full indexing run.
3. Sometime later, during a full indexing run, change the uniqueKey from "id" to "id_str".

Could this work, assuming that all ids are unique?
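Just to make the plan concrete, a rough sketch of steps 1-2 on the indexing side (hypothetical values; the uniqueKey switch itself still happens in schema.xml):

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.common.SolrInputDocument;

    // While "id" is still the uniqueKey, every document is also written with
    // the new "id_str" field, so that once the periodic full indexing has run,
    // switching the uniqueKey to "id_str" finds all documents populated.
    public class DualIdIndexing {
        public static void index(SolrClient client, String id) throws Exception {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", id);      // current uniqueKey field
            doc.addField("id_str", id);  // new field, intended future uniqueKey
            client.add(doc);
        }
    }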



On Thu, Mar 9, 2017 at 5:14 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 3/9/2017 4:20 AM, danny teichthal wrote:
> > I have an "id" field that is defined on schema.xml with type long.
> > For some use cases the id that is indexed exceeds Max long limitation.
> > I thought about solving it by changing the id to type string.
> >
> > For my surprise, by only changing the definition on schema.xml and
> > restarting Solr, I was able to read documents and update without the need
> > to re-index.
>
> In general, changing the class used for a field won't work without a
> reindex.  You'll probably find that queries don't work to find the old
> documents, even if Solr is able to correctly show the field in results.
> I am very surprised that *anything* works with that change and an
> existing index.
>
> If "id" is your uniqueKey and you are relying on Solr's ability to
> replace existing documents using that field, you'll likely discover that
> reindexing existing documents will result in duplicates instead of
> deleting the old one.
>
> Thanks,
> Shawn
>
>


Changing definition of id field

2017-03-09 Thread danny teichthal
Hi,
I have an "id" field that is defined on schema.xml with type long.
For some use cases the id that is indexed exceeds Max long limitation.
I thought about solving it by changing the id to type string.

To my surprise, by only changing the definition in schema.xml and
restarting Solr, I was able to read and update documents without
re-indexing.

My questions are:
1. Is there another way to overcome the problem other than changing the id
to a string?
2. Are there any downsides to changing the id to a string? I know that the
example schema.xml comes with id as a string by default.
3. I want to avoid full indexing - is it valid to just change the field
type? Are there any known issues with this? There's a comment on the example
schema about not changing the type, mentioning document routing.



This is how the long and string types are defined:



I'm using Solr Cloud on version 5.2.1.
Multiple collections, every collection has only one shard.

Thanks,


Re: Solr 6.1.0, zookeeper 3.4.8, Solrj and SolrCloud

2016-08-21 Thread danny teichthal
Hi,
Not sure if it is related, but it could be - I see that you do this:
CloudSolrClient solrClient = new CloudSolrClient.Builder().withZkHost(zkHosts).build();
Are you creating a new client on each update?
If yes, note that the SolrClient should be a singleton.

Regarding the session timeout, what value did you set for zkClientTimeout?
The maxSessionTimeout parameter controls this timeout on the ZooKeeper side;
zkClientTimeout controls your client-side timeout.


Session expiry can also be affected by:
1. Garbage collection on Solr node/Zookeeper.
2. Slow IO on disk.
3. Network latency.

You should check these metrics on your system at the time you got the
expiry to see if any of them might be related.
If your zkClientTimeout is set to a small value, in combination with one of
the factors above you could get many of these exceptions.
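If it helps, a rough sketch of the singleton approach (SolrJ 6.x; the zkHost string and collection name are the ones from your mail) - build the client once at startup and reuse it for every update:

    import org.apache.solr.client.solrj.impl.CloudSolrClient;

    public class SolrClientHolder {

        private static final String ZK_HOSTS =
            "pcam-stg-app-02:2181,pcam-stg-app-03:2181,pcam-stg-app-04:2181/solr";

        // Single shared instance for the whole application.
        private static final CloudSolrClient CLIENT = build();

        private static CloudSolrClient build() {
            CloudSolrClient client = new CloudSolrClient.Builder()
                .withZkHost(ZK_HOSTS)
                .build();
            client.setDefaultCollection("scdata_test");
            // Client-side ZooKeeper session timeout; the effective value is also
            // bounded by the ZooKeeper server's min/maxSessionTimeout settings.
            client.setZkClientTimeout(30000);
            client.setZkConnectTimeout(15000);
            return client;
        }

        public static CloudSolrClient get() {
            return CLIENT;
        }
    }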






On Thu, Aug 18, 2016 at 6:51 PM, Narayana B  wrote:

> Hi SolrTeam,
>
> I see session exipre and my solr index fails.
>
> please help me here, my infra details are shared below
>
> I have total 3 compute
> nodes[pcam-stg-app-02,pcam-stg-app-03,pcam-stg-app-04]
>
> 1) 3 nodes are running with zoo1, zoo2, zoo3 instances
>
> /apps/scm-core/zookeeper/zkData/zkData1/myid  value 1
> /apps/scm-core/zookeeper/zkData/zkData2/myid  value 2
> /apps/scm-core/zookeeper/zkData/zkData3/myid  value 3
>
> zoo1.cfg my setup
>
> tickTime=2000
> initLimit=5
> syncLimit=2
> dataDir=/apps/scm-core/zookeeper/zkData/zkData1
> clientPort=2181
> server.1=pcam-stg-app-01:2888:3888
> server.2=pcam-stg-app-02:2888:3888
> server.3=pcam-stg-app-03:2888:3888
> server.4=pcam-stg-app-04:2888:3888
> dataLogDir=/apps/scm-core/zookeeper/zkLogData/zkLogData1
> # Default 64M, changed to 128M, represented in KiloBytes
> preAllocSize=131072
> # Default : 10
> snapCount=100
> globalOutstandingLimit=1000
> maxClientCnxns=100
> autopurge.snapRetainCount=3
> autopurge.purgeInterval=23
> minSessionTimeout=4
> maxSessionTimeout=30
>
> [zk: pcam-stg-app-02:2181(CONNECTED) 0] ls /
> [zookeeper, solr]
> [zk: pcam-stg-app-02:2181(CONNECTED) 1] ls /solr
> [configs, overseer, aliases.json, live_nodes, collections, overseer_elect,
> security.json, clusterstate.json]
>
>
>
> 2) 2 nodes are running solrcloud
> pcam-stg-app-03: solr port 8983, solr port 8984
> pcam-stg-app-04: solr port 8983, solr port 8984
>
>
> Config upload to zookeeper
>
> server/scripts/cloud-scripts/zkcli.sh -zkhost
> pcam-stg-app-02:2181,pcam-stg-app-03:2181,pcam-stg-app-04:2181/solr \
> -cmd upconfig -confname scdata -confdir
> /apps/scm-core/solr/solr-6.1.0/server/solr/configsets/data_
> driven_schema_configs/conf
>
> Collection creation url:
>
> http://pcam-stg-app-03:8983/solr/admin/collections?action=
> CREATE=scdata_test=2=
> 2=2=pcam-stg-app-03:
> 8983_solr,pcam-stg-app-03:8984_solr,pcam-stg-app-04:
> 8983_solr,pcam-stg-app-04:8984_solr=scdata
>
> solrj client
>
>
> String zkHosts =
> "pcam-stg-app-02:2181,pcam-stg-app-03:2181,pcam-stg-app-04:2181/solr";
> CloudSolrClient solrClient = new
> CloudSolrClient.Builder().withZkHost(zkHosts).build();
> solrClient.setDefaultCollection("scdata_test");
> solrClient.setParallelUpdates(true);
>
> List cpnSpendSavingsList = new ArrayList<>();
> i have done data setter to cpnSpendSavingsList
>
> solrClient.addBeans(cpnSpendSavingsList);
> solrClient.commit();
>
>
>
>
> SessionExpire Error for the collections
>
> Why this SessionExpire error comes when i start bulk insert/update to solr
>
>
> org.apache.solr.common.SolrException: Could not load collection from ZK:
> scdata_test
> at
> org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(
> ZkStateReader.java:1047)
> at
> org.apache.solr.common.cloud.ZkStateReader$LazyCollectionRef.get(
> ZkStateReader.java:610)
> at
> org.apache.solr.common.cloud.ClusterState.getCollectionOrNull(
> ClusterState.java:211)
> at
> org.apache.solr.common.cloud.ClusterState.hasCollection(
> ClusterState.java:113)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.getCollectionNames(
> CloudSolrClient.java:1239)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.
> requestWithRetryOnStaleState(CloudSolrClient.java:961)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.request(
> CloudSolrClient.java:934)
> at
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149)
> at org.apache.solr.client.solrj.SolrClient.add(SolrClient.
> java:106)
> at
> org.apache.solr.client.solrj.SolrClient.addBeans(SolrClient.java:357)
> at
> org.apache.solr.client.solrj.SolrClient.addBeans(SolrClient.java:329)
> at
> com.cisco.pcam.spark.stream.HiveDataProcessStream.main(
> HiveDataProcessStream.java:165)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:
> 62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(

Re: SOLR-7036 - a new Faster method for facet grouping

2016-08-14 Thread danny teichthal
Hi,
A reminder, in case anyone interested in the performance of grouped
facets missed the older mail:
there's a new patch improving grouped facet performance that works with
the latest branch.

It uses the UIF method from the JSON Facet API.
It can be a first step toward adding support for grouped facets in the JSON Facet API.

Please take a look at https://issues.apache.org/jira/browse/SOLR-7036
Comments and votes are welcome.
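For anyone unfamiliar with the feature, a minimal sketch of the kind of request this speeds up - facet counts computed per group rather than per document (the field names here are made up):

    import org.apache.solr.client.solrj.SolrQuery;

    public class GroupFacetQueryExample {
        static SolrQuery groupedFacetQuery() {
            SolrQuery q = new SolrQuery("*:*");
            q.set("group", true);
            q.set("group.field", "group_field");
            q.setFacet(true);
            q.addFacetField("category");
            // group.facet=true counts each facet value once per group --
            // the code path SOLR-7036 optimizes with UnInvertedField.
            q.set("group.facet", true);
            return q;
        }
    }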





On Wed, Jul 27, 2016 at 11:31 AM, danny teichthal <dannyt...@gmail.com>
wrote:

> Hi,
> SOLR-7036 introduced a new faster method for group.facet, which uses
> UnInvertedField.
> It was patched for version 4.x.
> Over the last week, my colleague uploaded a new patch that work against
> the trunk.
>
> We would really appreciate if anyone could take a look at it and give us
> some feedback about it.
> Full details and performance tests results were also added to the JIRA
> issue.
>
> We are willing to work at it and if possible backport it to an older
> branch.
>
> Link:
> https://issues.apache.org/jira/browse/SOLR-7036
>
>
> Thanks in advance,
>


Re: Solr DeleteByQuery vs DeleteById

2016-08-09 Thread danny teichthal
Hi Bharath,
I'm no expert, but we had some major problems because of deleteByQuery
(DBQ for short).
We ended up replacing all of our DBQs with deletes by id.

My suggestion is that if you don't really need it - don't use it.
Especially in your case, since you already know the set of ids, it is
redundant to query for them.

I don't know how CDCR works, but we have a replication factor of 2 on our
SolrCloud cluster.
Since Solr 5.x, DBQs were getting stuck for a long while on the replicas,
blocking all updates.
It appears that on the replica side there's an overhead of reordering and
executing the same DBQ over and over again, for consistency reasons.
The replica ends up buffering many delete-by-queries and blocking all updates.
In addition, there's another defect related to DBQ slowness - LUCENE-7049.
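A rough sketch of what we moved to instead (SolrJ; the 1 ms commitWithin mirrors the value in your mail):

    import java.io.IOException;
    import java.util.List;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;

    public class DeleteByIdExample {
        // When the ids are already known, send them with deleteById instead of
        // building an "(id:1 OR id:2 ...)" deleteByQuery; this avoids the
        // replica-side reorder/re-execution overhead described above.
        public static void deleteBatch(CloudSolrClient client, List<String> ids)
                throws SolrServerException, IOException {
            client.deleteById(ids, 1);
        }
    }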





On Tue, Aug 9, 2016 at 7:14 AM, Bharath Kumar 
wrote:

> Hi All,
>
> We are using SOLR 6.1 and i wanted to know which is better to use -
> deleteById or deleteByQuery?
>
> We have a program which deletes 10 documents every 5 minutes from the
> SOLR and we do it in a batch of 200 to delete those documents. For that we
> now use deleteById(List ids, 1) to delete.
> I wanted to know if we change it to deleteByQuery(query, 1) where the
> query is like this - (id:1 OR id:2 OR id:3 OR id:4). Will this have a
> performance impact?
>
> We use SOLR cloud with 3 SOLR nodes in the cluster and also we have a
> similar setup on the target site and we use Cross Data Center Replication
> to replicate from main site.
>
> Can you please let me know if using deleteByQuery will have any impact? I
> see it opens real time searcher on all the nodes in cluster.
>
> --
> Thanks & Regards,
> Bharath MV Kumar
>
> "Life is short, enjoy every moment of it"
>


SOLR-7036 - Faster method for group.facet - new patch for trunk

2016-07-27 Thread danny teichthal
Hi,
SOLR-7036 introduced a new faster method for group.facet, which uses
UnInvertedField.
It was patched for version 4.x.
Over the last week, my colleague uploaded a new patch that works against
the trunk.

We would really appreciate it if anyone could take a look at it and give us
some feedback.
Full details and performance test results were also added to the JIRA issue.

We are willing to work on it and, if possible, backport it to an older branch.

Link:
https://issues.apache.org/jira/browse/SOLR-7036


Thanks in advance,


Re: SOLR war for SOLR 6

2016-06-19 Thread danny teichthal
If you are running on Tomcat you will probably have a deployment problem.
On version 5.2.1 it worked fine for me; I manually packaged solr.war at
build time.
But when trying to upgrade to Solr 5.5.1, I had problems with an
incompatibility between the servlet-api of Solr's Jetty version and my
Tomcat's servlet-api.
Solr code explicitly uses some new methods that existed in Jetty but not in
my Tomcat.
For me it was a no-go, for all the reasons that Shawn stated.




On Sat, Jun 18, 2016 at 12:26 AM, Shawn Heisey  wrote:

> On 6/16/2016 1:20 AM, Bharath Kumar wrote:
> > I was trying to generate a solr war out of the solr 6 source, but even
> > after i create the war, i was not able to get it deployed correctly on
> > jboss. Wanted to know if anyone was able to successfully generate solr
> > war and deploy it on tomcat or jboss? Really appreciate your help on
> > this.
>
> FYI: If you do this, you're running an unsupported configuration.
> You're on your own for both getting it working AND any problems that are
> related to the deployment rather than Solr itself.
>
> You actually don't need to create a war.  Just run "ant clean server" in
> the solr directory of the source code and then install the exploded
> webapp (found in server/solr-webapp/webapp) into the container.  There
> should be instructions available for how to install an exploded webapp
> into tomcat or jboss.  As already stated, you are on your own for
> finding and following those instructions, and if Solr doesn't deploy,
> you will need to talk to somebody who knows the container for help.
> Once they are sure you have the config for the container right, they may
> refer you back here ... but because it's an unsupported config, the
> amount of support we can offer is minimal.
>
> https://wiki.apache.org/solr/WhyNoWar
>
> If you want the admin UI to work when you install into a user-supplied
> container, then you must set the context path for the app to "/solr".
> The admin UI in 6.x will not work if you use another path, and that is
> not considered a bug, because the only supported container has the path
> hardcoded to /solr.
>
> Thanks,
> Shawn
>
>


Re: Questions on SolrCloud core state, when will Solr recover a "DOWN" core to "ACTIVE" core.

2016-04-21 Thread danny teichthal
Hi Li,
If you could supply some more info from your logs, it would help.
We also had a similar issue. There were some bugs related to SolrCloud
that were solved in Solr 4.10.4 and further in Solr 5.x.
I would suggest you compare your logs with the defects in the 4.10.4 release
notes to see if they match.
Also, send the relevant Solr/ZooKeeper parts of the logs to the mailing list.


On Thu, Apr 21, 2016 at 1:50 AM, Li Ding  wrote:

> Hi All,
>
> We are using SolrCloud 4.6.1.  We have observed following behaviors
> recently.  A Solr node in a Solrcloud cluster is up but some of the cores
> on the nodes are marked as down in Zookeeper.  If the cores are parts of a
> multi-sharded collection with one replica,  the queries to that collection
> will fail.  However, when this happened, if we issue queries to the core
> directly, it returns 200 and correct info.  But once Solr got into the
> state, the core will be marked down forever unless we do a restart on Solr.
>
> Has anyone seen this behavior before?  Is there any to get out of the state
> on its own?
>
> Thanks,
>
> Li
>


Re: SolrCloud - Strategy for recovering cluster states

2016-03-02 Thread danny teichthal
According to what you describe, I really don't see the need for core
discovery in SolrCloud. It would only be used to eagerly load a core on
startup.
If I understand correctly, when ZK = truth, this eager loading can/should
be done by consulting ZooKeeper instead of the local disk.
I agree that it is really confusing.
The best strategy that I see for now is to stop relying on core.properties
and keep it all in ZooKeeper.


On Wed, Mar 2, 2016 at 7:54 PM, Jeff Wartes <jwar...@whitepages.com> wrote:

> Well, with the understanding that someone who isn’t involved in the
> process is describing something that isn’t built yet...
>
> I could imagine changes like:
>  - Core discovery ignores cores that aren’t present in the ZK cluster state
>  - New cores are automatically created to bring a node in line with ZK
> cluster state (addreplica, essentially)
>
> So if the clusterstate said “node XYZ has a replica of shard3 of
> collection1 and that’s all”, and you downed node XYZ and deleted the data
> directory, it’d get restored when you started the node again. And if you
> copied the core directory for shard1 of collection2 in there and restarted
> the node, it’d get ignored because the clusterstate says node XYZ doesn’t
> have that.
>
> More importantly, if you completely destroyed a node and rebuilt it from
> an image, (AWS?) that image wouldn't need any special core directories
> specific to that node. As long as the node name was the same, Solr would
> handle bringing that node back to where it was in the cluster.
>
> Back to opinions, I think mixing the cluster definition between local disk
> on the nodes and ZK clusterstate is just confusing. It should really be one
> or the other. Specifically, I think it should be local disk for
> non-SolrCloud, and ZK for SolrCloud.
>
>
>
>
>
> On 3/2/16, 12:13 AM, "danny teichthal" <dannyt...@gmail.com> wrote:
>
> >Thanks Jeff,
> >I understand your philosophy and it sounds correct.
> >Since we had many problems with zookeeper when switching to Solr Cloud. we
> >couldn't make it as a source of knowledge and had to relay on a more
> stable
> >source.
> >The issues is that when we get such an event of zookeeper, it brought our
> >system down, and in this case, clearing the core.properties were a life
> >saver.
> >We've managed to make it pretty stable not, but we will always need a
> >"dooms day" weapon.
> >
> >I looked into the related JIRA and it confused me a little, and raised a
> >few other questions:
> >1. What exactly defines zookeeper as a truth?
> >2. What is the role of core.properties if the state is only in zookeeper?
> >
> >
> >
> >Your tool is very interesting, I just thought about writing such a tool
> >myself.
> >From the sources I understand that you represent each node as a path in
> the
> >git repository.
> >So, I guess that for restore purposes I will have to do
> >the opposite direction and create a node for every path entry.
> >
> >
> >
> >
> >On Tue, Mar 1, 2016 at 11:36 PM, Jeff Wartes <jwar...@whitepages.com>
> wrote:
> >
> >>
> >> I’ve been running SolrCloud clusters in various versions for a few years
> >> here, and I can only think of two or three cases that the ZK-stored
> cluster
> >> state was broken in a way that I had to manually intervene by
> hand-editing
> >> the contents of ZK. I think I’ve seen Solr fixes go by for those cases,
> >> too. I’ve never completely wiped ZK. (Although granted, my ZK cluster
> has
> >> been pretty stable, and my collection count is smaller than yours)
> >>
> >> My philosophy is that ZK is the source of cluster configuration, not the
> >> collection of core.properties files on the nodes.
> >> Currently, cluster state is shared between ZK and core directories. I’d
> >> prefer, and I think Solr development is going this way, (SOLR-7269) that
> >> all cluster state exist and be managed via ZK, and all state be removed
> >> from the local disk of the cluster nodes. The fact that a node uses
> local
> >> disk based configuration to figure out what collections/replicas it has
> is
> >> something that should be fixed, in my opinion.
> >>
> >> If you’re frequently getting into bad states due to ZK issues, I’d
> suggest
> >> you file bugs against Solr for the fact that you got into the state, and
> >> then fix your ZK cluster.
> >>
> >> Failing that, can you just periodically back up your ZK data and restore
> >> it if something breaks? I wrote a little tool to watch clusterstate.j

Re: SolrCloud - Strategy for recovering cluster states

2016-03-02 Thread danny teichthal
Thanks Jeff,
I understand your philosophy and it sounds correct.
Since we had many problems with ZooKeeper when switching to SolrCloud, we
couldn't rely on it as a source of knowledge and had to rely on a more
stable source.
The issue is that when we got such a ZooKeeper event, it brought our
system down, and in that case clearing the core.properties files was a life
saver.
We've managed to make it pretty stable now, but we will always need a
"doomsday" weapon.

I looked into the related JIRA and it confused me a little, and raised a
few other questions:
1. What exactly defines ZooKeeper as the truth?
2. What is the role of core.properties if the state is only in ZooKeeper?



Your tool is very interesting; I just thought about writing such a tool
myself.
From the sources I understand that you represent each node as a path in the
git repository.
So, I guess that for restore purposes I will have to go in the opposite
direction and create a node for every path entry.




On Tue, Mar 1, 2016 at 11:36 PM, Jeff Wartes <jwar...@whitepages.com> wrote:

>
> I’ve been running SolrCloud clusters in various versions for a few years
> here, and I can only think of two or three cases that the ZK-stored cluster
> state was broken in a way that I had to manually intervene by hand-editing
> the contents of ZK. I think I’ve seen Solr fixes go by for those cases,
> too. I’ve never completely wiped ZK. (Although granted, my ZK cluster has
> been pretty stable, and my collection count is smaller than yours)
>
> My philosophy is that ZK is the source of cluster configuration, not the
> collection of core.properties files on the nodes.
> Currently, cluster state is shared between ZK and core directories. I’d
> prefer, and I think Solr development is going this way, (SOLR-7269) that
> all cluster state exist and be managed via ZK, and all state be removed
> from the local disk of the cluster nodes. The fact that a node uses local
> disk based configuration to figure out what collections/replicas it has is
> something that should be fixed, in my opinion.
>
> If you’re frequently getting into bad states due to ZK issues, I’d suggest
> you file bugs against Solr for the fact that you got into the state, and
> then fix your ZK cluster.
>
> Failing that, can you just periodically back up your ZK data and restore
> it if something breaks? I wrote a little tool to watch clusterstate.json
> and write every version to a local git repo a few years ago. I was mostly
> interested because I wanted to see changes that happened pretty fast, but
> it could also serve as a backup approach. Here’s a link, although I clearly
> haven’t touched it lately. Feel free to ask if you have issues:
> https://github.com/randomstatistic/git_zk_monitor
>
>
>
>
> On 3/1/16, 12:09 PM, "danny teichthal" <dannyt...@gmail.com> wrote:
>
> >Hi,
> >Just summarizing my questions if the long mail is a little intimidating:
> >1. Is there a best practice/automated tool for overcoming problems in
> >cluster state coming from zookeeper disconnections?
> >2. Creating a collection via core admin is discouraged, is it true also
> for
> >core.properties discovery?
> >
> >I would like to be able to specify collection.configName in the
> >core.properties and when starting server, the collection will be created
> >and linked to the config name specified.
> >
> >
> >
> >On Mon, Feb 29, 2016 at 4:01 PM, danny teichthal <dannyt...@gmail.com>
> >wrote:
> >
> >> Hi,
> >>
> >>
> >> I would like to describe a process we use for overcoming problems in
> >> cluster state when we have networking issues. Would appreciate if anyone
> >> can answer about what are the flaws on this solution and what is the
> best
> >> practice for recovery in case of network problems involving zookeeper.
> >> I'm working with Solr Cloud with version 5.2.1
> >> ~100 collections in a cluster of 6 machines.
> >>
> >> This is the short procedure:
> >> 1. Bring all the cluster down.
> >> 2. Clear all data from zookeeper.
> >> 3. Upload configuration.
> >> 4. Restart the cluster.
> >>
> >> We rely on the fact that a collection is created on core discovery
> >> process, if it does not exist. It gives us much flexibility.
> >> When the cluster comes up, it reads from core.properties and creates the
> >> collections if needed.
> >> Since we have only one configuration, the collections are automatically
> >> linked to it and the cores inherit it from the collection.
> >> This is a very robust procedure, that helped us overcome many problems
> >> until we stabilized 

Re: SolrCloud - Strategy for recovering cluster states

2016-03-01 Thread danny teichthal
Hi,
Just summarizing my questions, in case the long mail is a little intimidating:
1. Is there a best practice/automated tool for overcoming problems in
cluster state coming from ZooKeeper disconnections?
2. Creating a collection via the core admin is discouraged - is this true
also for core.properties discovery?

I would like to be able to specify collection.configName in core.properties
so that, when the server starts, the collection will be created and linked
to the specified config name.
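For illustration, a rough sketch of creating a collection bound to a specific config set through the Collections API instead of core discovery (shown with the static factory from newer SolrJ releases; on 5.2.1 the same request is available via CollectionAdminRequest.Create or the HTTP API; names are placeholders):

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class CreateCollectionExample {
        public static void create(CloudSolrClient client) throws Exception {
            // collection "collection1", config set "myConfigSet",
            // 1 shard, 2 replicas -- the layout described in this thread.
            CollectionAdminRequest
                .createCollection("collection1", "myConfigSet", 1, 2)
                .process(client);
        }
    }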



On Mon, Feb 29, 2016 at 4:01 PM, danny teichthal <dannyt...@gmail.com>
wrote:

> Hi,
>
>
> I would like to describe a process we use for overcoming problems in
> cluster state when we have networking issues. Would appreciate if anyone
> can answer about what are the flaws on this solution and what is the best
> practice for recovery in case of network problems involving zookeeper.
> I'm working with Solr Cloud with version 5.2.1
> ~100 collections in a cluster of 6 machines.
>
> This is the short procedure:
> 1. Bring all the cluster down.
> 2. Clear all data from zookeeper.
> 3. Upload configuration.
> 4. Restart the cluster.
>
> We rely on the fact that a collection is created on core discovery
> process, if it does not exist. It gives us much flexibility.
> When the cluster comes up, it reads from core.properties and creates the
> collections if needed.
> Since we have only one configuration, the collections are automatically
> linked to it and the cores inherit it from the collection.
> This is a very robust procedure, that helped us overcome many problems
> until we stabilized our cluster which is now pretty stable.
> I know that the leader might change in such case and may lose updates, but
> it is ok.
>
>
> The problem is that today I want to add a new config set.
> When I add it and clear zookeeper, the cores cannot be created because
> there are 2 configurations. This breaks my recovery procedure.
>
> I thought about a few options:
> 1. Put the config Name in core.properties - this doesn't work. (It is
> supported in CoreAdminHandler, but  is discouraged according to
> documentation)
> 2. Change recovery procedure to not delete all data from zookeeper, but
> only relevant parts.
> 3. Change recovery procedure to delete all, but recreate and link
> configurations for all collections before startup.
>
> Option #1 is my favorite, because it is very simple, it is currently not
> supported, but from looking on code it looked like it is not complex to
> implement.
>
>
>
> My questions are:
> 1. Is there something wrong in the recovery procedure that I described ?
> 2. What is the best way to fix problems in cluster state, except from
> editing clusterstate.json manually? Is there an automated tool for that? We
> have about 100 collections in a cluster, so editing is not really a
> solution.
> 3.Is creating a collection via core.properties is also discouraged?
>
>
>
> Would very appreciate any answers/ thoughts on that.
>
>
> Thanks,
>
>
>
>
>
>


SolrCloud - Strategy for recovering cluster states

2016-02-29 Thread danny teichthal
Hi,


I would like to describe a process we use for overcoming problems in
cluster state when we have networking issues. I would appreciate it if anyone
could point out the flaws in this solution and what the best practice is for
recovery in case of network problems involving ZooKeeper.
I'm working with SolrCloud version 5.2.1,
~100 collections in a cluster of 6 machines.

This is the short procedure:
1. Bring all the cluster down.
2. Clear all data from zookeeper.
3. Upload configuration.
4. Restart the cluster.

We rely on the fact that a collection is created during the core discovery
process if it does not exist. It gives us much flexibility.
When the cluster comes up, it reads core.properties and creates the
collections if needed.
Since we have only one configuration, the collections are automatically
linked to it and the cores inherit it from the collection.
This is a very robust procedure that helped us overcome many problems
until we stabilized our cluster, which is now pretty stable.
I know that the leader might change in such a case and updates may be lost,
but that is OK.


The problem is that today I want to add a new config set.
When I add it and clear ZooKeeper, the cores cannot be created because
there are 2 configurations. This breaks my recovery procedure.

I thought about a few options:
1. Put the config name in core.properties - this doesn't work. (It is
supported in CoreAdminHandler, but is discouraged according to the
documentation.)
2. Change the recovery procedure to not delete all data from ZooKeeper, but
only the relevant parts.
3. Change the recovery procedure to delete everything, but recreate and link
configurations for all collections before startup.

Option #1 is my favorite because it is very simple; it is currently not
supported, but from looking at the code it does not seem complex to
implement.



My questions are:
1. Is there something wrong with the recovery procedure that I described?
2. What is the best way to fix problems in cluster state, other than
editing clusterstate.json manually? Is there an automated tool for that? We
have about 100 collections in a cluster, so editing is not really a
solution.
3. Is creating a collection via core.properties also discouraged?



Would very much appreciate any answers/thoughts on that.


Thanks,


Re: Solr node 'Gone' status

2016-01-19 Thread danny teichthal
From my short experience, it indicates that the particular node lost its
connection with ZooKeeper.
Like Binoy said, it may be because the process/host is down, but it could
also be the result of a network problem.

On Tue, Jan 19, 2016 at 12:20 PM, Binoy Dalal 
wrote:

> In my experience 'Gone' indicates that that particular solr instance itself
> is down.
> To bring it back up, simply restart solr.
>
> On Tue, 19 Jan 2016, 11:16 davidphilip cherian <
> davidphilipcher...@gmail.com>
> wrote:
>
> > Hi,
> >
> > Solr-admin cloud view page has got another new radio button indicating
> > status of node :  'Gone' status. What does that mean?  One of my
> collection
> > is in that state and it is not serving any request. How to bring that up?
> >
> --
> Regards,
> Binoy Dalal
>


Re: Cluster down for long time after zookeeper disconnection

2015-08-11 Thread danny teichthal
1. Erick, thanks, I agree that it is really serious, but I think the 3
minutes in this case were not mandatory.
In my case it was a deadlock, which smells like some kind of bug.
One replica is waiting for the other to come up before it takes leadership,
while the other is waiting for the election results.
If I am able to reproduce it on 5.2.1, is it legitimate to file a JIRA
issue for it?

2. Regarding session timeouts, there's something about the configuration
that I don't understand.
If zkClientTimeout is set to 30 seconds, how come I see in the log that the
session expired after ~50 seconds?
Maybe I have a mismatch between the ZooKeeper and Solr configuration?

3. Returning to the question of the leaderVoteWait parameter, I have seen
in a few threads that it may be reduced to a minimum.
I'm not clear on its full meaning, but I understand that it is meant to
prevent loss of updates on cluster startup.
Can anyone confirm/clarify that?




Links for leaderVoteWait:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201307.mbox/%3ccajt9wnhivirpn79kttcn8ekafevhhmqwkfl-+i16kbz0ogl...@mail.gmail.com%3E

http://qnalist.com/questions/4812859/waitforleadertoseedownstate-when-leader-is-down

Relevant part from my ZooKeeper conf:
tickTime=2000
initLimit=10
syncLimit=5



On Tue, Aug 11, 2015 at 1:06 AM, Erick Erickson erickerick...@gmail.com
wrote:

 Not that I know of. With ZK as the one source of truth, dropping below
 quorum
 is Really Serious, so having to wait 3 minutes or so for action to be
 taken is the
 fallback.

 Best,
 Erick

 On Mon, Aug 10, 2015 at 1:34 PM, danny teichthal dannyt...@gmail.com
 wrote:
  Erick, I assume you are referring to zkClientTimeout, it is set to 30
  seconds. I also see these messages on Solr side:
   Client session timed out, have not heard from server in 48865ms for
  sessionid 0x44efbb91b5f0001, closing socket connection and attempting
  reconnect.
  So, I'm not sure what was the actual disconnection duration time, but it
  could have been up to a minute.
  We are working on finding the network issues root cause, but assuming
  disconnections will always occur, are there any other options to overcome
  this issues?
 
 
 
  On Mon, Aug 10, 2015 at 11:18 PM, Erick Erickson 
 erickerick...@gmail.com
  wrote:
 
  I didn't see the zk timeout you set (just skimmed). But if your
 Zookeeper
  was
  down _very_ termporarily, it may suffice to up the ZK timeout. The
 default
  in the 10.4 time-frame (if I remember correctly) was 15 seconds which
 has
  proven to be too short in many circumstances.
 
  Of course if your ZK was down for minutest this wouldn't help.
 
  Best,
  Erick
 
  On Mon, Aug 10, 2015 at 1:06 PM, danny teichthal dannyt...@gmail.com
  wrote:
   Hi Alexander ,
   Thanks for your reply, I looked at the release notes.
   There is one bug fix - SOLR-7503
   https://issues.apache.org/jira/browse/SOLR-7503 – register cores
   asynchronously.
   It may reduce the registration time since it is done on parallel, but
   still, 3 minutes (leaderVoteWait) is a long time to recover from a few
   seconds of disconnection.
  
   Except from that one I don't see any bug fix that addresses the same
   problem.
   I am able to reproduce it on 4.10.4 pretty easily, I will also try it
  with
   5.2.1 and see if it reproduces.
  
   Anyway, since migrating to 5.2.1 is not an option for me in the short
  term,
   I'm left with the question if reducing leaderVoteWait may help here,
 and
   what may be the consequences.
   If i understand correctly, there might be a chance of losing updates
 that
   were made on leader.
   From my side it is a lot worse to lose availability for 3 minutes.
  
   I would really appreciate a feedback on this.
  
  
  
  
   On Mon, Aug 10, 2015 at 6:55 PM, Alexandre Rafalovitch 
  arafa...@gmail.com
   wrote:
  
   Did you look at release notes for Solr versions after your own?
  
   I am pretty sure some similar things were identified and/or resolved
   for 5.x. It may not help if you cannot migrate, but would at least
   give a confirmation and maybe workaround on what you are facing.
  
   Regards,
  Alex.
   
   Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
   http://www.solr-start.com/
  
  
   On 10 August 2015 at 11:37, danny teichthal dannyt...@gmail.com
  wrote:
Hi,
We are using Solr cloud with solr 4.10.4.
On the passed week we encountered a problem where all of our
 servers
disconnected from zookeeper cluster.
This might be ok, the problem is that after reconnecting to
 zookeeper
  it
looks like for every collection both replicas do not have a leader
 and
   are
stuck in some kind of a deadlock for a few minutes.
   
From what we understand:
One of the replicas assume it ill be the leader and at some point
   starting
to wait on leaderVoteWait, which is by default 3 minutes.
The other replica is stuck on this part of code for a few minutes

Cluster down for long time after zookeeper disconnection

2015-08-10 Thread danny teichthal
Hi,
We are using SolrCloud with Solr 4.10.4.
Over the past week we encountered a problem where all of our servers
disconnected from the ZooKeeper cluster.
That might be OK; the problem is that after reconnecting to ZooKeeper, it
looks like for every collection both replicas have no leader and are
stuck in some kind of deadlock for a few minutes.

From what we understand:
One of the replicas assumes it will be the leader and at some point starts
waiting on leaderVoteWait, which is 3 minutes by default.
The other replica is stuck in this part of the code for a few minutes:
    at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:957)
    at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:921)
    at org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1521)
    at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:392)

Looks like replica 1 waits for a leader to be registered in ZooKeeper,
but replica 2 is waiting for replica 1
(org.apache.solr.cloud.ShardLeaderElectionContext.waitForReplicasToComeUp).

We have 100 collections distributed across 3 pairs of Solr nodes. Each
collection has one shard with 2 replicas.
As I understand from the code and logs, all the collections are registered
synchronously, which means that we have to wait 3 minutes * the number of
collections for the whole cluster to come up. It could be more than an hour!



1. We thought about lowering leaderVoteWait to solve the problem, but we
are not sure what the risk is.

2. The following thread is very similar to our case:
http://qnalist.com/questions/4812859/waitforleadertoseedownstate-when-leader-is-down.
Does anybody know if it is indeed a bug and if there's a related JIRA issue?

3. I see this in the logs before the reconnection: "Client session timed
out, have not heard from server in 48865ms for sessionid 0x44efbb91b5f0001,
closing socket connection and attempting reconnect". Does it mean that
there was a disconnection of over 50 seconds between Solr and ZooKeeper?


Thanks in advance for your kind answer


Re: Cluster down for long time after zookeeper disconnection

2015-08-10 Thread danny teichthal
Hi Alexander ,
Thanks for your reply, I looked at the release notes.
There is one bug fix - SOLR-7503
https://issues.apache.org/jira/browse/SOLR-7503 – register cores
asynchronously.
It may reduce the registration time since it is done in parallel, but
still, 3 minutes (leaderVoteWait) is a long time to recover from a few
seconds of disconnection.

Apart from that one, I don't see any bug fix that addresses the same
problem.
I can reproduce it on 4.10.4 pretty easily; I will also try it with
5.2.1 and see if it reproduces.

Anyway, since migrating to 5.2.1 is not an option for me in the short term,
I'm left with the question of whether reducing leaderVoteWait may help here,
and what the consequences may be.
If I understand correctly, there might be a chance of losing updates that
were made on the leader.
From my side, losing availability for 3 minutes is a lot worse.

I would really appreciate a feedback on this.




On Mon, Aug 10, 2015 at 6:55 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 Did you look at release notes for Solr versions after your own?

 I am pretty sure some similar things were identified and/or resolved
 for 5.x. It may not help if you cannot migrate, but would at least
 give a confirmation and maybe workaround on what you are facing.

 Regards,
Alex.
 
 Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
 http://www.solr-start.com/


 On 10 August 2015 at 11:37, danny teichthal dannyt...@gmail.com wrote:
  Hi,
  We are using Solr cloud with solr 4.10.4.
  On the passed week we encountered a problem where all of our servers
  disconnected from zookeeper cluster.
  This might be ok, the problem is that after reconnecting to zookeeper it
  looks like for every collection both replicas do not have a leader and
 are
  stuck in some kind of a deadlock for a few minutes.
 
  From what we understand:
  One of the replicas assume it ill be the leader and at some point
 starting
  to wait on leaderVoteWait, which is by default 3 minutes.
  The other replica is stuck on this part of code for a few minutes:
   at
 org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:957)
  at
  org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:921)
  at
 
 org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1521)
  at
 
 org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:392)
 
  Looks like replica 1 waits for a leader to be registered in the
 zookeeper,
  but replica 2 is waiting for replica 1.
 
 (org.apache.solr.cloud.ShardLeaderElectionContext.waitForReplicasToComeUp).
 
  We have 100 collections distributed in 3 pairs of Solr nodes. Each
  collection has one shard with 2 replicas.
  As I understand from code and logs, all the collections are being
  registered synchronously, which means that we have to wait 3 minutes *
  number of collections for the whole cluster to come up. It could be more
  than an hour!
 
 
 
  1. We thought about lowering leaderVoteWait to solve the problem, but we
  are not sure what is the risk?
 
  2. The following thread is very similar to our case:
 
 http://qnalist.com/questions/4812859/waitforleadertoseedownstate-when-leader-is-down
 .
  Does anybody know if it is indeed a bug and if there's a related JIRA
 issue?
 
  3. I see this on logs before the reconnection Client session timed out,
  have not heard from server in 48865ms for sessionid 0x44efbb91b5f0001,
  closing socket connection and attempting reconnect, does it mean that
  there was a disconnection of over 50 seconds between SOLR and zookeeper?
 
 
  Thanks in advance for your kind answer



Re: Cluster down for long time after zookeeper disconnection

2015-08-10 Thread danny teichthal
Erick, I assume you are referring to zkClientTimeout; it is set to 30
seconds. I also see these messages on the Solr side:
"Client session timed out, have not heard from server in 48865ms for
sessionid 0x44efbb91b5f0001, closing socket connection and attempting
reconnect."
So, I'm not sure what the actual disconnection duration was, but it
could have been up to a minute.
We are working on finding the root cause of the network issues, but assuming
disconnections will always occur, are there any other options to overcome
these issues?



On Mon, Aug 10, 2015 at 11:18 PM, Erick Erickson erickerick...@gmail.com
wrote:

 I didn't see the zk timeout you set (just skimmed). But if your Zookeeper
 was
 down _very_ termporarily, it may suffice to up the ZK timeout. The default
 in the 10.4 time-frame (if I remember correctly) was 15 seconds which has
 proven to be too short in many circumstances.

 Of course if your ZK was down for minutest this wouldn't help.

 Best,
 Erick

 On Mon, Aug 10, 2015 at 1:06 PM, danny teichthal dannyt...@gmail.com
 wrote:
  Hi Alexander ,
  Thanks for your reply, I looked at the release notes.
  There is one bug fix - SOLR-7503
  https://issues.apache.org/jira/browse/SOLR-7503 – register cores
  asynchronously.
  It may reduce the registration time since it is done on parallel, but
  still, 3 minutes (leaderVoteWait) is a long time to recover from a few
  seconds of disconnection.
 
  Except from that one I don't see any bug fix that addresses the same
  problem.
  I am able to reproduce it on 4.10.4 pretty easily, I will also try it
 with
  5.2.1 and see if it reproduces.
 
  Anyway, since migrating to 5.2.1 is not an option for me in the short
 term,
  I'm left with the question if reducing leaderVoteWait may help here, and
  what may be the consequences.
  If i understand correctly, there might be a chance of losing updates that
  were made on leader.
  From my side it is a lot worse to lose availability for 3 minutes.
 
  I would really appreciate a feedback on this.
 
 
 
 
  On Mon, Aug 10, 2015 at 6:55 PM, Alexandre Rafalovitch 
 arafa...@gmail.com
  wrote:
 
  Did you look at release notes for Solr versions after your own?
 
  I am pretty sure some similar things were identified and/or resolved
  for 5.x. It may not help if you cannot migrate, but would at least
  give a confirmation and maybe workaround on what you are facing.
 
  Regards,
 Alex.
  
  Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
  http://www.solr-start.com/
 
 
  On 10 August 2015 at 11:37, danny teichthal dannyt...@gmail.com
 wrote:
   Hi,
   We are using Solr cloud with solr 4.10.4.
   On the passed week we encountered a problem where all of our servers
   disconnected from zookeeper cluster.
   This might be ok, the problem is that after reconnecting to zookeeper
 it
   looks like for every collection both replicas do not have a leader and
  are
   stuck in some kind of a deadlock for a few minutes.
  
   From what we understand:
   One of the replicas assume it ill be the leader and at some point
  starting
   to wait on leaderVoteWait, which is by default 3 minutes.
   The other replica is stuck on this part of code for a few minutes:
at
  org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:957)
   at
  
 org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:921)
   at
  
 
 org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1521)
   at
  
 
 org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:392)
  
   Looks like replica 1 waits for a leader to be registered in the
  zookeeper,
   but replica 2 is waiting for replica 1.
  
 
 (org.apache.solr.cloud.ShardLeaderElectionContext.waitForReplicasToComeUp).
  
   We have 100 collections distributed in 3 pairs of Solr nodes. Each
   collection has one shard with 2 replicas.
   As I understand from code and logs, all the collections are being
   registered synchronously, which means that we have to wait 3 minutes *
   number of collections for the whole cluster to come up. It could be
 more
   than an hour!
  
  
  
   1. We thought about lowering leaderVoteWait to solve the problem, but
 we
   are not sure what is the risk?
  
   2. The following thread is very similar to our case:
  
 
 http://qnalist.com/questions/4812859/waitforleadertoseedownstate-when-leader-is-down
  .
   Does anybody know if it is indeed a bug and if there's a related JIRA
  issue?
  
   3. I see this on logs before the reconnection Client session timed
 out,
   have not heard from server in 48865ms for sessionid 0x44efbb91b5f0001,
   closing socket connection and attempting reconnect, does it mean that
   there was a disconnection of over 50 seconds between SOLR and
 zookeeper?
  
  
   Thanks in advance for your kind answer
 



SolrCloud updates are slow on replica (DBQ?)

2015-06-21 Thread danny teichthal
Hi,


We are experiencing some intermittent slowness on updates for one of our
collections.

We see user operations hanging on updates to SOLR via SolrJ client.

Every time during the period of slowness we see something like this in the
replica's log:

[org.apache.solr.update.UpdateHandler] Reordered DBQs detected.
Update=add{_version_=1504391336428568576,id=2392581250002321} DBQs=[DBQ{version=1504391337298886656,q=level_2_id:12345}]

After a while the DBQs pile up and we see the list of DBQs growing.




At some point the update time increases from 300 ms to 20 seconds, and
then in the leader log I see a read timeout exception and it initiates
recovery on the replica.

At that point all updates become very slow – from 20 seconds to 60
seconds, especially updates with deleteByQuery.

We are not sure if the DBQ is the cause or a symptom. But what does not make
sense to me is that the slowness is only on the replica side.

We suspect that the updates becoming slow on the replica cause a timeout on
the leader side, which triggers the recovery.


Would really appreciate any help on this.


Thanks,








Some info:

DBQs are sent as separate update requests from the add requests.


We currently use SolrCloud 4.9.0.

We have ~140 collections on 4 nodes – 1, 2, 3, 4.

Each collection has a single shard with a leader and another replica.

~70 collections are on nodes 1 and 2 (leader and replica) and the other
collections are on nodes 3 and 4.



On each node there’s about 65GB of index with 25,000,000 documents.



This is our update handler; autoSoftCommit is set to 2 seconds, but there
may be manual soft commits coming from user operations from time to time:



<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>1</maxDocs>
    <maxTime>12</maxTime>
    <openSearcher>true</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxDocs>1000</maxDocs>
    <maxTime>2000</maxTime>
  </autoSoftCommit>
  <updateLog />
</updateHandler>


Re: SolrCloud updates are slow on replica (DBQ?)

2015-06-21 Thread danny teichthal
Thanks Erick,
Actually, we only started using autoSoftCommit lately, because of the
performance warning, after reading the first link you provided.
Our application does frequent updates from batch and online requests.
Until now we issued a softCommit after each user transaction finished. We
were able to remove the vast majority of the manual commits and left a few
cases where the commit was essential for the screens not to fail.
Due to business requirements, 2 seconds is the maximum we can do for now.
But if you say that a few more seconds would make a difference, we will try
to increase it.

As for GC, we continuously check the GC logs and can say for sure that it
is not the problem in our case.
Regarding caches - we don't use autowarming at all. To my limited
understanding the caches are not that big; please correct me if I'm wrong.

<queryResultCache class="solr.LRUCache" size="64" initialSize="32" autowarmCount="0" />
<documentCache class="solr.FastLRUCache" size="4096" initialSize="1024" autowarmCount="0" />
<filterCache class="solr.FastLRUCache" size="4096" initialSize="1024" autowarmCount="0" />
<fieldValueCache class="solr.FastLRUCache" size="4096" initialSize="1024" autowarmCount="0" />



2 more questions, just for understanding:
1. What is the reason behind removing maxDocs? If I set no limit, couldn't
the transaction log explode in case of heavy indexing?
2. Do you think the DBQ is causing the problem or just indicating it? Is
there a problem with many DBQs?

We will probably start by removing maxDocs and openSearcher=true.
The link about indexing performance also looks very relevant - I will read
it thoroughly.

Thanks again,





On Sun, Jun 21, 2015 at 6:29 PM, Erick Erickson erickerick...@gmail.com
wrote:

 The very first thing I would do is straighten out your commit strategy,
 they are _very_ aggressive. I'd guess you're also seeing warnings in
 the logs about too many on deck searchers or something like, or
 you've upped your max warming searchers in solrconfig.xml.

 Soft commits aren't free. They're less expensive than hard
 commits (openSearcher=true), but they're not free. Here's a long
 writeup on this:

 https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

 What I'd do:
 1 remove maxDocs entirely
 2 set openSearcher=false for your autoCommit
 3 remove maxDocs from your autoSoftCommit
 4 lengthen the soft commit as much as you can stand.
 5 if you must have very short soft commits, consider
 turning off (or at least down) your caches in solrconfig.xml
 6 stop issuing any kind of commits from the client. This is
an anti-pattern except in very unusual circumstances and
in your setup you see all the docs 2 seconds later anyway
   so it is doing you no good and (maybe) active harm.

 If the problem persists, try looking at your garbage collection,
 you may well be hitting long GC pauses.

 Also note that there was a bottleneck in Solr prior to 5.2
 when replicas were present, see:
 http://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/

 Best,
 Erick

 On Sun, Jun 21, 2015 at 7:14 AM, danny teichthal dannyt...@gmail.com
 wrote:
  Hi,
 
 
  We are experiencing  some intermittent slowness on updates for one of our
  collections.
 
  We see user operations hanging on updates to SOLR via SolrJ client.
 
  Every time in the period of the slowness we see something like this in
 the
  log of the replica:
 
  [org.apache.solr.update.UpdateHandler] Reordered DBQs detected.
  Update=add{_version_=1504391336428568576,id=
 
  2392581250002321}
 DBQs=[DBQ{version=1504391337298886656,q=level_2_id:12345}]
 
  After  a while The DBQ is piling up and we see the list of DBQ growing.
 
 
 
 
  At some point the time of updates is increase from 300 ms to 20 seconds
 and
  then on the leader log I see read timeout exception and it initiates
  recovery on the replica.
 
  At that point all updates start to be very slow – from 20 seconds to 60
  seconds. Especially updates with deletByQuery.
 
  We are not sure if the DBQ is the cause or symptom. But, what does not
 make
  sense to me is that the slowness is only on the replica side.
 
  We suspect that the fact that the updates become slow on the replica
 cause
  a timeout on the leader side and cause the recovery.
 
 
  Would really appreciate any help on this.
 
 
  Thanks,
 
 
 
 
 
 
 
 
  Some info:
 
  DBQ are sent as a separate update request from the add requests.
 
 
  We currently use SolrCloud 4.9.0.
 
  We have ~140 collections on  4 nodes – 1,2,3,4.
 
  Each collection has a single shard with a leader and another replica.
 
  ~70 collections are on node 1 and 2 as leader and replica and the other
  collections are on 3 and 4.
 
 
 
  On each node there’s about 65GB of index with 25,000,000 documents.
 
 
 
  This is our update handler, autoSoftCommit is set to 2 seconds, but there
  may be manual soft commits coming from user operations from time to time:
 
 
 
  updateHandler class

Re: CopyField exclude patterns

2015-02-03 Thread danny teichthal
Alexandre and Jack, thanks for the replies.
Looking at both, I think that the CloneFieldUpdateProcessor can do what I
need without having to implement a custom one.
By the way, is there a performance penalty for an update processor compared
to copyField?



On Mon, Feb 2, 2015 at 4:29 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 Not on copyField,

 You can use UpdateRequestProcessor instead (

 http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/CloneFieldUpdateProcessorFactory.html
 ).

 This allows to specify both inclusion and exclusion patterns.

 Regards,
Alex.
 
 Sign up for my Solr resources newsletter at http://www.solr-start.com/


 On 2 February 2015 at 02:53, danny teichthal dannyt...@gmail.com wrote:
  Hi,
  Is there a way to make some patterns to be excluded on the source of a
  copyField?
 
  We are using globs to copy all our text fields to some target field.
  It looks something like this:
  <copyField source="prefix_*" dest="destination" />
 
  I would like a subset of the fields starting with prefix_ to be
 excluded
  and not copied to destination. (e.g. all fields with prefix_abc_* ).
  Is there a way to do it on SOLR?
 
  I couldn't find anything saying that it exists.
 
  Thanks



CopyField exclude patterns

2015-02-01 Thread danny teichthal
Hi,
Is there a way to exclude some patterns from the source of a copyField?

We are using globs to copy all our text fields to some target field.
It looks something like this:
<copyField source="prefix_*" dest="destination" />

I would like a subset of the fields starting with prefix_ to be excluded
and not copied to destination (e.g. all fields matching prefix_abc_*).
Is there a way to do this in Solr?

I couldn't find anything saying that it exists.

Thanks


Re: Different update handlers for auto commit configuration

2014-12-02 Thread danny teichthal
Thanks for the clarification, I had indeed confused it with UpdateRequestHandler.

On Mon, Dec 1, 2014 at 11:24 PM, Chris Hostetter hossman_luc...@fucit.org
wrote:


 : I thought that the auto commit is per update handler because they are
 : configured within the update handler tag.

 updateHandler is not the same thing as a requestHandler that does
 updates.

 there can be many Update request handlers configured, but there is only
 ever one updateHandler/ in a SolrCore.



 -Hoss
 http://www.lucidworks.com/



Re: Different update handlers for auto commit configuration

2014-12-01 Thread danny teichthal
Thanks for the reply Erick,
I thought that the auto commit was per update handler because it is
configured within the updateHandler tag.
From a little debugging of the Solr code, it looked as if the CommitTracker
will schedule a soft commit on an update if it gets a commitWithin value,
and if not it will use the autoSoftCommit timeUpperBound. So, according to
my understanding, autoSoftCommit should be equivalent to commitWithin.
Anyway, it looks like commitWithin is more flexible, so I'll try it.

Regarding the requirement, the problem is that we are supposed to be near
real time.
Since we are already in production it's harder to change the requirements.

You are right about the caches and warming; the cache is not used that much
and we cannot do warming at all.
But the trigger for the change is that under load the commit frequency
increases and we get the performance warnings.
We may have a few commits in the same second.
So, as a first step, if we set it to 2 seconds we will gain a benefit.
In addition, on our big cores we average about one commit per second over
24 hours, so there's still some benefit.
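A minimal sketch of the commitWithin approach, assuming SolrJ and the 2/10 second targets from my original mail (no explicit commit from the client; hard commits stay on the server's autoCommit):

    import java.util.List;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class CommitWithinExample {
        public static void index(CloudSolrClient client,
                                 List<SolrInputDocument> onlineDocs,
                                 List<SolrInputDocument> batchDocs) throws Exception {
            client.add(onlineDocs, 2000);   // online updates: visible within ~2 seconds
            client.add(batchDocs, 10000);   // batch updates: visible within ~10 seconds
            // no client.commit() here -- visibility is driven by commitWithin
        }
    }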



On Mon, Dec 1, 2014 at 12:42 AM, Erick Erickson erickerick...@gmail.com
wrote:

 Uhhhm, the soft/hard commit settings are global, not
 configured in each update handler.

 How are updates being done? Because if you're using SolrJ,
 you can just use the server.add(doclist, commitwithin) and it'll
 just be handled automatically.

 True, putting a 2 second commitwithin on an online update
 would pick up some batch updates that happened to come in, but
 that's probably OK.

 I'd also be sure that the 2 second requirement is real. Soft commits
 aren't as expensive as hard commits with openSearcher=true, but
 they aren't free either. At that fast a commit rate you probably won't
 get much benefit out of the top-level caches, and you'll be warming an
 awful lot.

 FWIW,
 Erick

 On Sun, Nov 30, 2014 at 12:32 PM, danny teichthal dannyt...@gmail.com
 wrote:
  Hi,
  On our system we currently initiate a soft commit to SOLR after each
  business transaction that initiate an update. Hard commits are automatic
  each 2 minutes.
  We want to limit the explicit commit and move to autoSoftCommit.
 
  Because of business restrictions:
  Online request should be available for searching after 2 seconds.
  Update from batch jobs can be available after 10 seconds. (maybe more,
  currently on discussion).
  There are some transactions that must be available immediately.
 
  Question
  I thought about creating 3 different update handlers, each containing a
  different autoSoftCommit configuration. Is this an acceptable solution,
 are
  there any downsides in using multiple update handlers?
 
  Thanks,



Different update handlers for auto commit configuration

2014-11-30 Thread danny teichthal
Hi,
On our system we currently initiate a soft commit to SOLR after each
business transaction that initiates an update. Hard commits are automatic
every 2 minutes.
We want to limit the explicit commits and move to autoSoftCommit.

Because of business restrictions:
Online requests should be available for searching within 2 seconds.
Updates from batch jobs can be available within 10 seconds (maybe more,
currently under discussion).
There are some transactions that must be available immediately.

Question
I thought about creating 3 different update handlers, each containing a
different autoSoftCommit configuration. Is this an acceptable solution, and
are there any downsides to using multiple update handlers?
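
(For reference - as noted in the replies above, these settings live on the single, global updateHandler; a minimal solrconfig.xml sketch with the 2-minute hard commit described here and an illustrative 2-second soft commit:)

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>120000</maxTime>      <!-- hard commit every 2 minutes -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>2000</maxTime>        <!-- soft commit, illustrative 2-second interval -->
  </autoSoftCommit>
</updateHandler>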

Thanks,


Solr performance: multiValued field vs separate fields

2014-05-16 Thread danny teichthal
I wonder about the performance difference between 2 indexing options:
1 - a single multiValued field
2 - separate fields

The case is as follows: each document has 100 “properties”: prop1..prop100.
The values are strings and there is no relation between different
properties. I would like to search by exact match on several properties
with known values (like ids). For example: search for all docs having
prop1=”blue” and prop6=”high”.

I can choose to build the index in 1 of 2 ways:
1 - the trivial way: 100 separate fields, 1 for each property, with
multiValued=false; the values are just the property values.
2 - 1 field (named “properties”) with multiValued=true; the field will have
100 values: value1=”prop1:blue” .. value6=”prop6:high”, etc.

Is it correct to say that option 1 will have much better search
performance? How about indexing performance?
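
(A minimal schema.xml sketch of the two variants, assuming string-typed values; the dynamicField in option 1 is only there to keep the example short, and the field names are the ones from the question:)

<!-- Option 1: one single-valued field per property -->
<dynamicField name="prop*" type="string" indexed="true" stored="true" multiValued="false" />

<!-- Option 2: one multiValued field holding "propN:value" tokens -->
<field name="properties" type="string" indexed="true" stored="true" multiValued="true" />

(With option 1 the example search becomes fq=prop1:blue&fq=prop6:high; with option 2 it becomes fq=properties:"prop1:blue"&fq=properties:"prop6:high".)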


Re: Nested documents, block join - re-indexing a single document upon update

2014-03-18 Thread danny teichthal
Thanks Jack,
I understand that updating a single document in a block is currently not
supported.
But atomic update of a single document does not have to be in conflict
with block joins.

If I got it right from the documentation:
currently, if a document is atomically updated, SOLR finds the stored
document and re-indexes it, changing only the fields that were specified
with an update operation.

It looked intuitive that if we specify _root_ while using an atomic
update, SOLR would find the whole block by _root_, update the changed
document, and re-index the whole block.

Of course I cannot estimate the feasibility and effort of implementing it,
but it looks like a nice enhancement.



On Sun, Mar 16, 2014 at 4:09 PM, Jack Krupansky j...@basetechnology.comwrote:

 You stumbled upon the whole point of block join - that the documents are
 and must be managed as a block and not individually.

 -- Jack Krupansky

 From: danny teichthal
 Sent: Sunday, March 16, 2014 6:47 AM
 To: solr-user@lucene.apache.org
 Subject: Nested documents, block join - re-indexing a single document upon
 update




 Hi All,




 To make things short, I would like to use block joins, but to be able to
 index each document on the block separately.

 Is it possible?



 In more details:



 We have some nested parent-child structure where:

 1.   Parent may have a single level of children

 2.   Parent and child documents may be updated independently.

 3.   We may want to search for parent by child info and vise versa.



 At first we thought of managing the parent and child in different
 documents, de-normalizing child data at parent level and parent data on
 child level.

 After reading Mikhail blog
 http://blog.griddynamics.com/2013/09/solr-block-join-support.html, we
 thought of using the block join for this purpose.



 But, I got into a wall when trying to update a single child document.

 For me, it's ok if SOLR will internally index the whole block, I just
 don't want to fetch the whole hierarchy from DB for update.



 I was trying to achieve this using atomic updates - since all the fields
 must be stored anyway - if I send an atomic update on one of the children
 with the _root_ field then there's no need to send the whole hierarchy.

 But, when I try this method, I see that the child document is indeed
 updated, but it's order is changed to be after the parent.



 This is what I did:

 1.   Change the root field to be stored - <field name="_root_"
 type="string" indexed="true" stored="true"/>

 2.   Put attached docs on example\exampledocs.

 3.   Run post.jar on parent-child.xml

 4.   Run post.jar on update-child-atomic.xml.

 5.   Now -
 http://localhost:8983/solr/collection1/select?q=%7B!parent+which%3D%27type_s%3Aparent%27%7D%2BCOLOR_s%3ARed+%2BSIZE_s%3AXL%0Awt=jsonindent=true,
 returns parent 10 as expected.

 6.   But,
 http://localhost:8983/solr/collection1/select?q={!parent+which%3D'type_s%3Aparent'}%2BCOLOR_s%3AGreen+%2BSIZE_s%3AXL%0Awt=jsonindent=true-
  returns nothing.

 7.   When searching *:* on Admin,   record with id=12 was updated with
 'Green', but it is returned below the parent record.

 8.



 Thanks in advance.



 In case the attachments does not work:

 1st file to post:



 <update>
   <delete><query>*:*</query></delete>
   <add>
     <doc>
       <field name="id">10</field>
       <field name="type_s">parent</field>
       <field name="BRAND_s">Nike</field>
       <doc>
         <field name="id">11</field>
         <field name="COLOR_s">Red</field>
         <field name="SIZE_s">XL</field>
       </doc>
       <doc>
         <field name="id">12</field>
         <field name="COLOR_s">Blue</field>
         <field name="SIZE_s">XL</field>
       </doc>
     </doc>
   </add>
   <commit/>
 </update>







 2nd file:



 <update>
   <add>
     <doc>
       <field name="id">12</field>
       <field name="COLOR_s" update="set">Green</field>
       <field name="SIZE_s">XL</field>
       <field name="_root_">10</field>
     </doc>
   </add>
   <commit/>
 </update>




Re: Nested documents, block join - re-indexing a single document upon update

2014-03-18 Thread danny teichthal
Thanks,
Indeed, the subject line was misleading.
Then I will file a new improvement request for block atomic update
support.




On Tue, Mar 18, 2014 at 2:08 PM, Jack Krupansky j...@basetechnology.comwrote:

 That's a reasonable request and worth a Jira, but different from what you
 have specified in your subject line: re-indexing a single document - the
 entire block needs to be re-indexed.

 I suppose people might want a block atomic update - where multiple child
 documents as well as the parent document can be updated with one rewrite of
 the entire block, and maybe some way to delete individual child documents
 as well.

 -- Jack Krupansky

 -Original Message- From: danny teichthal
 Sent: Tuesday, March 18, 2014 3:58 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Nested documents, block join - re-indexing a single document
 upon update


 Thanks Jack,
 I understand that updating a single document on a block is currently not
 supported.
 But,  atomic update to a single document does not have to be in conflict
 with block joins.

 If I got it right from the documentation:
 Currently, If a document is atomically  updated, SOLR finds the stored
 document, and re index it, changing only the fields that were specified as
 update=operation.

 It looked intuitive that if we specify the _root_ while using atomic
 update, SOLR will find the whole block by _root_, update the changed
 document, and re-index the whole block.

 Of course I cannot estimate the feasibility and effort of implementing it,
 but it looks like a nice enhancement.



 On Sun, Mar 16, 2014 at 4:09 PM, Jack Krupansky j...@basetechnology.com
 wrote:

  You stumbled upon the whole point of block join - that the documents are
 and must be managed as a block and not individually.

 -- Jack Krupansky

 From: danny teichthal
 Sent: Sunday, March 16, 2014 6:47 AM
 To: solr-user@lucene.apache.org
 Subject: Nested documents, block join - re-indexing a single document upon
 update




 Hi All,




 To make things short, I would like to use block joins, but to be able to
 index each document on the block separately.

 Is it possible?



 In more details:



 We have some nested parent-child structure where:

 1.   Parent may have a single level of children

 2.   Parent and child documents may be updated independently.

 3.   We may want to search for parent by child info and vise versa.



 At first we thought of managing the parent and child in different
 documents, de-normalizing child data at parent level and parent data on
 child level.

 After reading Mikhail blog
 http://blog.griddynamics.com/2013/09/solr-block-join-support.html, we
 thought of using the block join for this purpose.



 But, I got into a wall when trying to update a single child document.

 For me, it's ok if SOLR will internally index the whole block, I just
 don't want to fetch the whole hierarchy from DB for update.



 I was trying to achieve this using atomic updates - since all the fields
 must be stored anyway - if I send an atomic update on one of the children
 with the _root_ field then there's no need to send the whole hierarchy.

 But, when I try this method, I see that the child document is indeed
 updated, but it's order is changed to be after the parent.



 This is what I did:

 1.   Change the root field to be stored - <field name="_root_"
 type="string" indexed="true" stored="true"/>

 2.   Put attached docs on example\exampledocs.

 3.   Run post.jar on parent-child.xml

 4.   Run post.jar on update-child-atomic.xml.

 5.   Now -
 http://localhost:8983/solr/collection1/select?q=%7B!
 parent+which%3D%27type_s%3Aparent%27%7D%2BCOLOR_s%
 3ARed+%2BSIZE_s%3AXL%0Awt=jsonindent=true,
 returns parent 10 as expected.

 6.   But,
  http://localhost:8983/solr/collection1/select?q=%7B!parent+which%3D'type_s%3Aparent'%7D%2BCOLOR_s%3AGreen+%2BSIZE_s%3AXL%0Awt=jsonindent=true -
  returns nothing.


 7.   When searching *:* on Admin,   record with id=12 was updated with
 'Green', but it is returned below the parent record.

 8.



 Thanks in advance.



 In case the attachments does not work:

 1st file to post:



 <update>
   <delete><query>*:*</query></delete>
   <add>
     <doc>
       <field name="id">10</field>
       <field name="type_s">parent</field>
       <field name="BRAND_s">Nike</field>
       <doc>
         <field name="id">11</field>
         <field name="COLOR_s">Red</field>
         <field name="SIZE_s">XL</field>
       </doc>
       <doc>
         <field name="id">12</field>
         <field name="COLOR_s">Blue</field>
         <field name="SIZE_s">XL</field>
       </doc>
     </doc>
   </add>
   <commit/>
 </update>







 2nd file:



 <update>
   <add>
     <doc>
       <field name="id">12</field>
       <field name="COLOR_s" update="set">Green</field>
       <field name="SIZE_s">XL</field>
       <field name="_root_">10</field>
     </doc>
   </add>
   <commit/>
 </update>

Nested documents, block join - re-indexing a single document upon update

2014-03-16 Thread danny teichthal
 Hi All,


 To make things short, I would like to use block joins, but to be able to
index each document in the block separately.

Is it possible?



In more details:



We have some nested parent-child structure where:

1.   Parent may have a single level of children

2.   Parent and child documents may be updated independently.

3.   We may want to search for parent by child info and vice versa.



At first we thought of managing the parent and child in different
documents, de-normalizing child data at the parent level and parent data at
the child level.

After reading Mikhail's blog
http://blog.griddynamics.com/2013/09/solr-block-join-support.html, we
thought of using block join for this purpose.



But I ran into a wall when trying to update a single child document.

For me, it's OK if SOLR internally indexes the whole block; I just don't
want to fetch the whole hierarchy from the DB for the update.



I was trying to achieve this using atomic updates - since all the fields
must be stored anyway, if I send an atomic update on one of the children
with the _root_ field then there's no need to send the whole hierarchy.

But when I try this method, I see that the child document is indeed
updated, but its order is changed to be after the parent.



This is what I did:

1.   Change the root field to be stored - <field name="_root_"
type="string" indexed="true" stored="true"/>

2.   Put attached docs on example\exampledocs.

3.   Run post.jar on parent-child.xml

4.   Run post.jar on update-child-atomic.xml.

5.   Now -
http://localhost:8983/solr/collection1/select?q=%7B!parent+which%3D%27type_s%3Aparent%27%7D%2BCOLOR_s%3ARed+%2BSIZE_s%3AXL%0Awt=jsonindent=true,
returns parent 10 as expected.

6.   But,
http://localhost:8983/solr/collection1/select?q=%7b!parent+which%3D'type_s%3Aparent'%7d%2BCOLOR_s%3AGreen+%2BSIZE_s%3AXL%0Awt=jsonindent=true -
returns nothing. (Both queries are shown URL-decoded below.)

7.   When searching *:* in the Admin UI, the record with id=12 was updated
with 'Green', but it is returned below the parent record.

8.
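
(For readability, the two block-join queries from steps 5 and 6, URL-decoded:)

q={!parent which='type_s:parent'}+COLOR_s:Red +SIZE_s:XL      (step 5 - matches parent 10)
q={!parent which='type_s:parent'}+COLOR_s:Green +SIZE_s:XL    (step 6 - matches nothing after the atomic update)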



Thanks in advance.



In case the attachments does not work:

1st file to post:



<update>
  <delete><query>*:*</query></delete>
  <add>
    <doc>
      <field name="id">10</field>
      <field name="type_s">parent</field>
      <field name="BRAND_s">Nike</field>
      <doc>
        <field name="id">11</field>
        <field name="COLOR_s">Red</field>
        <field name="SIZE_s">XL</field>
      </doc>
      <doc>
        <field name="id">12</field>
        <field name="COLOR_s">Blue</field>
        <field name="SIZE_s">XL</field>
      </doc>
    </doc>
  </add>
  <commit/>
</update>







2nd file:



<update>
  <add>
    <doc>
      <field name="id">12</field>
      <field name="COLOR_s" update="set">Green</field>
      <field name="SIZE_s">XL</field>
      <field name="_root_">10</field>
    </doc>
  </add>
  <commit/>
</update>