Re: HDInsight with Solr 4.9.0 Create Collection

2018-03-09 Thread john spooner

Would be nice not to get this email.


On 3/9/2018 1:23 PM, Abhi Basu wrote:

This has been resolved!

It turned out to be a version difference in the schema and config files between 4.10 and 4.9.
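The "No enum constant org.apache.lucene.util.Version.4.10.3" error quoted further down suggests the config set declared a luceneMatchVersion newer than the running Solr. Assuming that was the mismatch (the thread only says "schema and config file version diff"), here is a minimal Python sketch that catches it before uploading. It reads luceneMatchVersion from solrconfig.xml and compares it with the Solr version you run. The helper names and the plain numeric comparison are hypothetical, not part of Solr, and schema.xml is not checked.

```python
import re
import xml.etree.ElementTree as ET


def parse_version(text):
    """Turn '4.10.3' or 'LUCENE_4_9' into a tuple of ints like (4, 10, 3)."""
    # Accept both dotted ("4.9") and enum-style ("LUCENE_4_9") spellings.
    parts = re.findall(r"\d+", text)
    if not parts:
        raise ValueError(f"no version numbers in {text!r}")
    return tuple(int(p) for p in parts)


def check_lucene_match_version(solrconfig_xml, running_version):
    """Return (declared, ok); ok is False if the config asks for a newer Lucene."""
    root = ET.fromstring(solrconfig_xml)
    # luceneMatchVersion is a direct child of <config> in solrconfig.xml.
    node = root.find("luceneMatchVersion")
    if node is None or not (node.text or "").strip():
        raise ValueError("solrconfig.xml has no luceneMatchVersion element")
    declared = node.text.strip()
    return declared, parse_version(declared) <= parse_version(running_version)
```

For example, pass `Path("/home/sshuser/abhi/ems-collection/conf/solrconfig.xml").read_bytes()` and `"4.9.0"`; a config written for 4.10.3 comes back with `ok == False`.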

Thanks,

Abhi

On Fri, Mar 9, 2018 at 11:41 AM, Abhi Basu <9000r...@gmail.com> wrote:


That was due to a folder not being present. Is this next error something to do with the
version?

http://hn0-esohad.mzwz3dh4pb1evcdwc1lcsddrbe.jx.internal.cloudapp.net:8983/solr/admin/collections?action=CREATE=ems-collection2=2=2=1


org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error
CREATEing SolrCore 'ems-collection2_shard2_replica2': Unable to create
core: ems-collection2_shard2_replica2 Caused by: No enum constant
org.apache.lucene.util.Version.4.10.3
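The archive has stripped the parameter names (such as &name=) out of the CREATE URL above, which makes it hard to check. Below is a minimal Python sketch that spells every Collections API parameter out and lets urlencode handle the escaping. Mapping the surviving values (ems-collection2, 2, 2, 1) onto name, numShards, replicationFactor and maxShardsPerNode is an assumption based on their order. Likewise, collection.configName assumes the config was uploaded with -confname ems-collection, and the helper name is hypothetical.

```python
from urllib.parse import urlencode


def collections_create_url(base_url, name, num_shards, replication_factor,
                           max_shards_per_node, config_name):
    """Build a Collections API CREATE URL with every parameter name spelled out."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "maxShardsPerNode": max_shards_per_node,
        # Must match the -confname used when the config was uploaded to ZooKeeper.
        "collection.configName": config_name,
    }
    # urlencode joins the pairs with '&' and escapes values as needed.
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)
```

As far as I know, Solr 4.x rejects a CREATE when numShards × replicationFactor exceeds live nodes × maxShardsPerNode. Under that rule, 2 shards × 2 replicas with maxShardsPerNode=1 needs at least four live Solr nodes.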

On Fri, Mar 9, 2018 at 11:11 AM, Abhi Basu <9000r...@gmail.com> wrote:


Ok, so I tried the following:

/usr/hdp/current/solr/example/scripts/cloud-scripts/zkcli.sh -cmd upconfig \
  -zkhost zk0-esohad.mzwz3dh4pb1evcdwc1lcsddrbe.jx.internal.cloudapp.net:2181 \
  -confdir /home/sshuser/abhi/ems-collection/conf -confname ems-collection

And got this exception:
java.lang.IllegalArgumentException: Illegal directory:
/home/sshuser/abhi/ems-collection/conf
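The reply above traces this exception to a missing folder, and a pre-flight check run before zkcli would catch that. Here is a minimal Python sketch. The function name is hypothetical. The list of required files (solrconfig.xml and schema.xml, the usual core of a 4.x config set) is an assumption; zkcli itself does not check it.

```python
from pathlib import Path

# Assumed minimum for a Solr 4.x config set; extend it if your
# solrconfig.xml or schema.xml references other files (stopwords, synonyms, ...).
REQUIRED_FILES = ("solrconfig.xml", "schema.xml")


def check_confdir(confdir):
    """Return a list of problems found in the directory passed to -confdir."""
    path = Path(confdir)
    if not path.is_dir():
        # The case behind "IllegalArgumentException: Illegal directory" in this thread.
        return [f"not a directory: {path}"]
    return [f"missing {name} in {path}" for name in REQUIRED_FILES
            if not (path / name).is_file()]
```

An empty list means the directory at least looks like a config set. Anything else is worth fixing before running upconfig.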


On Fri, Mar 9, 2018 at 10:43 AM, Abhi Basu <9000r...@gmail.com> wrote:


Thanks for the reply, this really helped me.

For Solr 4.9, what is the actual zkcli command to upload config?

java -classpath "example/solr-webapp/WEB-INF/lib/*" \
  org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 127.0.0.1:9983 \
  -confdir example/solr/collection1/conf -confname conf1 -solrhome example/solr

OR

./server/scripts/cloud-scripts/zkcli.sh -zkhost 127.0.0.1:9983 -cmd upconfig \
  -confname my_new_config -confdir server/solr/configsets/basic_configs/conf
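Whichever script ends up being used, the upload takes the same four options: -cmd upconfig, -zkhost, -confdir and -confname. Below is a hedged Python sketch that passes them as an argument list. That avoids the line-wrapping that split hostnames and paths elsewhere in this thread. The helper names are hypothetical, and dry_run only prints the command.

```python
import subprocess


def upconfig_argv(zkcli, zkhost, confdir, confname):
    """Argument list for 'zkcli.sh -cmd upconfig', one option or value per element."""
    return [zkcli, "-cmd", "upconfig", "-zkhost", zkhost,
            "-confdir", confdir, "-confname", confname]


def upload_config(zkcli, zkhost, confdir, confname, dry_run=True):
    """Print (dry run) or execute the upconfig command; returns the argv used."""
    argv = upconfig_argv(zkcli, zkhost, confdir, confname)
    if dry_run:
        print(" ".join(argv))
    else:
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(argv, check=True)
    return argv
```

On the HDInsight cluster this would be called with `/usr/hdp/current/solr/example/scripts/cloud-scripts/zkcli.sh`, the `zk0-...:2181` ZooKeeper host, `/home/sshuser/abhi/ems-collection/conf` and `ems-collection`.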

I don't know why HDP/HDInsight doesn't provide something like the solrctl
commands to make life easier for all!




On Thu, Mar 8, 2018 at 5:43 PM, Shawn Heisey wrote:


On 3/8/2018 1:26 PM, Abhi Basu wrote:

I'm in a bind. I added Solr 4.9.0 to an HDInsight cluster and found no
solrctl commands installed. So I am doing the following to create a
collection.

This 'solrctl' command is NOT part of Solr.  Google tells me it's part
of software from Cloudera.

You need to talk to Cloudera for support on that software.


I have my collection schema in a location:

/home/sshuser/abhi/ems-collection/conf

Using this command to create a collection:

http://headnode1:8983/solr/admin/cores?action=CREATE=ems-collection=/home/sshuser/abhi/ems-collection/conf

You're using the term "collection".  And later you mention ZooKeeper. So
you're almost certainly running in SolrCloud mode.  If your Solr is
running in SolrCloud mode, do not try to use the CoreAdmin API
(/solr/admin/cores).  Use the Collections API instead.  But before that,
you need to get the configuration into ZooKeeper.  For standard Solr
without Cloudera's tools, you would typically use the "zkcli" script
(either zkcli.sh or zkcli.bat).  See page 376 of the reference guide for
that specific version of Solr for help with the "upconfig" command for
that script:

http://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.9.pdf


I guess I need to register my config name with ZooKeeper. How do I
register the collection schema with ZooKeeper?

Is there a way to bypass the registration with ZooKeeper and build the
collection directly from my schema files at that folder location, like I
was able to do in Solr 4.10 in CDH 5.14:

solrctl --zk hadoop-dn6.eso.local:2181/solr instancedir --create
ems-collection /home/sshuser/abhi/ems-collection/

solrctl --zk hadoop-dn6.eso.local:2181/solr collection --create
ems-collection -s 3 -r 2

The solrctl command is not something we can help you with on this
mailing list.  Cloudera customizes Solr to the point where only they are
able to really provide support for their version.  Your best bet will be
to talk to Cloudera.

When Solr is running with ZooKeeper, it's in SolrCloud mode.  In
SolrCloud mode, you cannot create cores in the same way that you can in
standalone mode -- you MUST create collections, and all configuration
will be in zookeeper, not on the disk.

Thanks,
Shawn




--
Abhi Basu






Re: CDCR performance issues

2018-03-09 Thread john spooner

Please unsubscribe me. I tried to unsubscribe manually.


On 3/9/2018 12:59 PM, Tom Peters wrote:

Thanks. This was helpful. I did some tcpdumps and I'm noticing that the 
requests to the target data center are not batched in any way. Each update 
comes in as an independent update. Some follow-up questions:

1. Is it accurate that updates are not actually batched in transit from the 
source to the target and instead each document is posted separately?

2. Are they done synchronously? I assume yes (since you wouldn't want 
operations applied out of order).

3. If they are done synchronously and are not batched in any way, does that 
mean the best performance I can expect is bounded by the round-trip time for a 
single document? I.e., if my average ping is 25 ms, I can expect a peak 
performance of roughly 40 ops/s.

Thanks




On Mar 9, 2018, at 11:21 AM, Davis, Daniel (NIH/NLM) [C] wrote:

These are general guidelines. I've done loads of networking, but I may be less 
familiar with SolrCloud and CDCR architecture. However, I know it's all TCP 
sockets, so general guidelines do apply.

Check the round-trip time between the data centers using ping or TCP ping. 
Throughput tests may show high numbers, but if Solr has to wait for the response 
to one request before sending the next, then, just like any network protocol 
that does that, it will be slow.
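One way to do the "TCP ping" when ICMP is blocked is to time a TCP connect to the Solr port, since the handshake completes after about one round trip. Below is a minimal Python sketch. The function name and sample count are illustrative assumptions. The timing also includes a DNS lookup unless you pass an IP address.

```python
import socket
import statistics
import time


def tcp_ping_ms(host, port, samples=5, timeout=3.0):
    """Median TCP connect time in milliseconds, roughly one network round trip."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        # connect() returns once the SYN / SYN-ACK exchange has completed.
        with socket.create_connection((host, port), timeout=timeout):
            times.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(times)
```

Run it from the source leader against a target Solr node (port 8080 in this setup) and compare the result with the per-update rate implied by the CDCR OPS numbers.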

I'm pretty sure CDCR uses HTTP/HTTPS rather than just TCP, so also check 
whether some proxy/load balancer between the data centers is forcing a new 
connection for every operation. That will *kill* performance. Some proxies 
default to HTTP/1.0 (open, send request, server sends response, close), and that 
will hurt.
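To see why a connection per operation hurts so much, count round trips. Below is a rough Python model, offered as a sketch rather than a measurement. A reused keep-alive connection costs one RTT per request/response. A new TCP connection adds one RTT for the handshake, and a new TLS session adds about two more for a full TLS 1.2 handshake (one for TLS 1.3). The function name is hypothetical.

```python
def per_op_latency_ms(rtt_ms, new_connection_per_op, tls_handshake_rtts=0):
    """Rough network time for one request/response, counted in round trips.

    tls_handshake_rtts only applies when a new connection is opened per op.
    """
    rtts = 1  # the request/response itself
    if new_connection_per_op:
        rtts += 1 + tls_handshake_rtts  # TCP handshake, plus TLS if used
    return rtts * rtt_ms
```

With a 25 ms RTT this gives 25 ms per op on a reused connection (about 40 ops/s) and 50 ms with a new TCP connection per op (about 20 ops/s). Adding a full TLS 1.2 handshake raises it to 100 ms (about 10 ops/s).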

Why you should listen to me even without SolrCloud knowledge: check out the paper 
"Latency performance of SOAP Implementations". Same distribution of skills: 
I knew TCP well, but Apache Axis 1.1 not so well. I still improved the response 
time of Apache Axis 1.1 by 250 ms per call with one line of code.

-----Original Message-----
From: Tom Peters [mailto:tpet...@synacor.com]
Sent: Wednesday, March 7, 2018 6:19 PM
To: solr-user@lucene.apache.org
Subject: CDCR performance issues

I'm having trouble keeping the target collection up to date with indexing on 
the source collection using CDCR.

This is what I'm getting back in terms of OPS:

curl -s 'solr2-a:8080/solr/mycollection/cdcr?action=OPS' | jq .
{
  "responseHeader": {
    "status": 0,
    "QTime": 0
  },
  "operationsPerSecond": [
    "zook01,zook02,zook03/solr",
    [
      "mycollection",
      [
        "all",
        49.10140553500938,
        "adds",
        10.27612635309587,
        "deletes",
        38.82527896994054
      ]
    ]
  ]
}
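The OPS response uses Solr's flat NamedList form, where names and values alternate in one array. Below is a minimal Python sketch that turns it into nested dictionaries so the counters can be tracked or graphed by name. The helper names are hypothetical.

```python
import json


def pairs_to_dict(flat):
    """Turn Solr's flat NamedList form ["k1", v1, "k2", v2, ...] into a dict."""
    if len(flat) % 2:
        raise ValueError("flat NamedList must have an even number of items")
    return dict(zip(flat[0::2], flat[1::2]))


def cdcr_ops(response_text):
    """Map (zkHost, collection) -> {"all": ..., "adds": ..., "deletes": ...}."""
    body = json.loads(response_text)
    result = {}
    for zk_host, collections in pairs_to_dict(body["operationsPerSecond"]).items():
        for collection, counters in pairs_to_dict(collections).items():
            result[(zk_host, collection)] = pairs_to_dict(counters)
    return result
```

Feeding it the response above yields one entry keyed by `("zook01,zook02,zook03/solr", "mycollection")` with the all, adds and deletes rates.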

The source and target collections are in separate data centers.

A network test between the leader node in the source data center and the 
ZooKeeper nodes in the target data center shows decent enough network 
performance: ~181 Mbit/s
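To check whether bandwidth could explain ~49 ops/s, here is a back-of-the-envelope Python calculation. The 2 KB average document size used below is a hypothetical figure. It was not reported in the thread.

```python
def bandwidth_bound_docs_per_second(mbit_per_s, avg_doc_bytes):
    """Docs/s the link could carry if bandwidth were the only limit."""
    bytes_per_second = mbit_per_s * 1_000_000 / 8  # megabits -> bytes
    return bytes_per_second / avg_doc_bytes
```

At 181 Mbit/s and 2048-byte documents this comes to about 11,047 docs/s. That is more than two orders of magnitude above the observed ~49 ops/s, which points at per-request latency rather than bandwidth.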

I've tried playing around with the "batchSize" value (128, 512, 728, 1000, 
2000, 2500) and it hasn't made much of a difference.

Any suggestions on potential settings to tune to improve the performance?

Thanks

--

Here are some relevant log lines from the source data center's leader:

2018-03-07 23:16:11.984 INFO  
(cdcr-replicator-207-thread-3-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
2018-03-07 23:16:23.062 INFO  
(cdcr-replicator-207-thread-4-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 510 updates to target mycollection
2018-03-07 23:16:32.063 INFO  
(cdcr-replicator-207-thread-5-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
2018-03-07 23:16:36.209 INFO  
(cdcr-replicator-207-thread-1-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 512 updates to target mycollection
2018-03-07 23:16:42.091 INFO  
(cdcr-replicator-207-thread-2-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 512 updates to target mycollection
2018-03-07 23:16:46.790 INFO