Intermittent issue in solr index update

2017-10-18 Thread Bhaumik Joshi
Hi,


I am intermittently facing a "Cannot talk to ZooKeeper" issue during Solr
index updates. The strange thing is that while this issue occurs there are no
errors in the ZooKeeper logs, and all shards show as active in the Solr admin
panel.


Please find the detailed logs and the Solr server configuration below.


Logs:

ERROR (qtp41903949-261266) [c:documents s:shard1 r:core_node4 x:documents] 
o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Cannot talk to 
ZooKeeper - Updates are disabled.
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1490)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:678)
at 
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at 
org.apache.solr.update.processor.AsiteDocumentUpdateReqProcessor.processAdd(AsiteDocumentUpdateReqProcessorFactory.java:125)
at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:179)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:135)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:274)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:239)
at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:157)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:186)
at 
org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:107)
at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:54)
at 
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2036)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:657)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:518)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at 
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
at java.lang.Thread.run(Thread.java:745)


Solr server configuration:
Processor: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (2 processors)
RAM: 128 GB usable
System type: 64-bit
OS: Windows Server 2012 Standard


Thanks & Regards,

Bhaumik Joshi


Issue in SolrInputDocument

2016-07-21 Thread Bhaumik Joshi
Hi,

I am getting the error below while converting JSON to my object. I am using the
Gson library (gson-2.2.4.jar) to generate JSON from objects and objects from
JSON; Gson's fromJson() method throws the error below.
Note: This worked fine with solr-solrj-5.2.0.jar but fails with
solr-solrj-6.1.0.jar. As far as I can tell, the SolrInputDocument class changed
in solr-solrj-5.5.0.

java.lang.IllegalArgumentException: Can not set 
org.apache.solr.common.SolrInputDocument field 
com.test.common.MySolrMessage.body to com.google.gson.internal.LinkedTreeMap
at 
sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:167)
at 
sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:171)
at 
sun.reflect.UnsafeObjectFieldAccessorImpl.set(UnsafeObjectFieldAccessorImpl.java:81)
at java.lang.reflect.Field.set(Field.java:764)
at 
com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.read(ReflectiveTypeAdapterFactory.java:108)
at 
com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:185)
at 
com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.read(TypeAdapterRuntimeTypeWrapper.java:40)
at 
com.google.gson.internal.bind.CollectionTypeAdapterFactory$Adapter.read(CollectionTypeAdapterFactory.java:81)
at 
com.google.gson.internal.bind.CollectionTypeAdapterFactory$Adapter.read(CollectionTypeAdapterFactory.java:1)
at 
com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.read(ReflectiveTypeAdapterFactory.java:106)
at 
com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:185)
at com.google.gson.Gson.fromJson(Gson.java:825)
at com.google.gson.Gson.fromJson(Gson.java:790)
at com.google.gson.Gson.fromJson(Gson.java:739)
at com.google.gson.Gson.fromJson(Gson.java:711)


public class MySolrMessage<T> implements IMessage
{
    private static final long serialVersionUID = 1L;
    private T body = null;          // holds a SolrInputDocument at runtime
    private String collection;
    private int action;
    private int errorCode;
    private long msgId;
    // a few parameterized constructors
    // getters and setters for all of the above attributes
}

public interface IMessage extends Serializable
{
public long getMsgId();
public void setMsgId(long id);
public Object getBody();
public void setBody(Object o);
public void setErrorCode(int ec);
public int getErrorCode();
}

public class Request {
    LinkedList<IMessage> msgList = new LinkedList<IMessage>();

    public Request() {
    }

    public Request(LinkedList<IMessage> l) {
        this.msgList = l;
    }

    public LinkedList<IMessage> getMsgList() {
        return this.msgList;
    }
}

@JsonAutoDetect(JsonMethod.FIELD)
@JsonSerialize(include = JsonSerialize.Inclusion.NON_NULL)
public class Request2
{
    @JsonProperty
    @JsonDeserialize(as = LinkedList.class, contentAs = MySolrMessage.class)
    LinkedList<MySolrMessage> msgList = new LinkedList<MySolrMessage>();

    public Request2()
    {
    }

    public Request2(LinkedList<MySolrMessage> l)
    {
        this.msgList = l;
    }

    public LinkedList<MySolrMessage> getMsgList()
    {
        return this.msgList;
    }
}


public class Test {

    public static void main(String[] args) {
        SolrInputDocument solrDocument = new SolrInputDocument();
        solrDocument.addField("id", "1234");
        solrDocument.addField("name", "test");
        MySolrMessage<SolrInputDocument> asm =
                new MySolrMessage<SolrInputDocument>(solrDocument, "collection1", 1);
        IMessage message = asm;
        List<IMessage> msgList = new ArrayList<IMessage>();
        msgList.add(message);
        LinkedList<IMessage> ex = new LinkedList<IMessage>();
        ex.addAll(msgList);
        Request request = new Request(ex);
        try
        {
            Gson gson = (new GsonBuilder()).serializeNulls().create();
            String json = gson.toJson(request);
            Gson gson2 = new Gson();
            Request2 retObj = gson2.fromJson(json, Request2.class); // this throws the error above
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}

Any idea?
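For what it's worth, the exception originates in `Field.set` rejecting a value whose runtime type does not match the declared field type: Gson cannot recover the erased type of the `body` field, so it deserializes the nested JSON into its internal `LinkedTreeMap` and then fails to assign it. A minimal stdlib-only sketch of the same failure mechanism (the `Message` and `SolrDoc` classes here are stand-ins, not Solr or Gson classes):

```java
import java.lang.reflect.Field;
import java.util.HashMap;

// Stand-ins: SolrDoc plays the role of SolrInputDocument,
// Message the role of MySolrMessage.
class SolrDoc {}

class Message {
    SolrDoc body; // declared type is SolrDoc, like MySolrMessage.body
}

public class ReflectionDemo {
    public static void main(String[] args) throws Exception {
        Message msg = new Message();
        Field body = Message.class.getDeclaredField("body");
        body.setAccessible(true);
        try {
            // Without runtime type information Gson builds a Map
            // (LinkedTreeMap) for the nested object; assigning a Map to a
            // SolrDoc-typed field fails exactly like the stack trace above.
            body.set(msg, new HashMap<String, Object>());
        } catch (IllegalArgumentException e) {
            System.out.println("IllegalArgumentException: type mismatch");
        }
    }
}
```

A common workaround is to register a custom deserializer for the body type with `GsonBuilder.registerTypeAdapter`, or to carry the body as a plain `Map` and convert it explicitly, so Gson never has to guess the field's runtime type.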



Thanks & Regards,

Bhaumik Joshi


Re: Disabling solr scoring

2016-07-11 Thread Bhaumik Joshi
Thanks Hoss got the point.


Bhaumik Joshi


From: Chris Hostetter <hossman_luc...@fucit.org>
Sent: Friday, July 8, 2016 4:52 PM
To: solr-user
Subject: Re: Disabling solr scoring


: Can you please elaborate? I am passing user defined sort field and order 
whenever i search.

I think Mikhail just misunderstood your question -- he was giving an
example of how to override the default sort (which uses score) with one
that would ensure scores are not computed.

: > Is there any way to completely disable scoring in solr cloud as i am
: > always passing sort parameter whenever i search.

In general, you don't have to do anything special.

Solr's internal code looks at the sort specified, and the fields requested
(via the fl param), to determine if/when scores need to be computed while
collecting documents.  If scores aren't needed for any reason, then that
info is passed down to the low-level Lucene document matching/collection
code, which optimizes collection so scores aren't computed.
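As a concrete illustration (hypothetical field names), the first request below lets Solr skip score computation entirely, while the second forces scores to be computed because the score pseudo-field is requested in fl:

```
q=title:solr&sort=date_modified desc&fl=id,title
q=title:solr&sort=date_modified desc&fl=id,title,score
```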


-Hoss
http://www.lucidworks.com/





Re: Disabling solr scoring

2016-07-08 Thread Bhaumik Joshi
Can you please elaborate? I am passing a user-defined sort field and order 
whenever I search.


Thanks & Regards,

Bhaumik Joshi



From: Mikhail Khludnev <m...@apache.org>
Sent: Friday, July 8, 2016 4:13 AM
To: solr-user
Subject: Re: Disabling solr scoring

What about
sort=_docid_ asc
?
On 8 July 2016 at 13:50, "Bhaumik Joshi" <
bhaumik.jo...@outlook.com> wrote:

> Hi,
>
>
> Is there any way to completely disable scoring in solr cloud as i am
> always passing sort parameter whenever i search.
>
> And disabling scoring will improve performance?
>
>
> Thanks & Regards,
>
> Bhaumik Joshi
>


Disabling solr scoring

2016-07-08 Thread Bhaumik Joshi
Hi,


Is there any way to completely disable scoring in Solr cloud, as I always 
pass a sort parameter when I search?

And will disabling scoring improve performance?


Thanks & Regards,

Bhaumik Joshi


Re: Passing Ids in query takes more time

2016-05-08 Thread Bhaumik Joshi
Thanks Jeff. TermsQueryParser worked for me. 

Thanks & Regards,
Bhaumik Joshi


From: Jeff Wartes <jwar...@whitepages.com>
Sent: Thursday, May 5, 2016 8:19 AM
To: solr-user@lucene.apache.org
Subject: Re: Passing Ids in query takes more time

An ID lookup is a very simple and fast query, for one ID. Or'ing together a lookup for 
80k IDs, though, is basically 80k searches as far as Solr is concerned, so it's 
not altogether surprising that it takes a while. Your complaint seems to be 
that the query planner doesn't know in advance that the more selective clause should be 
run first, and then the ID selection applied to the reduced set.

So, I can think of a few things for you to look at, in no particular order:

1. TermsQueryParser is designed for lists of terms, you might get better 
results from that: 
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser

2. If your other clause is the real discriminating factor in your search, 
you could just search for that clause and then apply your ID list as a 
PostFilter: http://yonik.com/advanced-filter-caching-in-solr/
I guess that'd look something like fq={!terms f=doc_id cost=101}<your ids>. A cost >= 100 
should qualify it as a post filter, which only operates on an already-found 
result set instead of the full index. (Note: I haven't confirmed that the Terms 
query parser supports post filtering.)

3. I’m not really aware of any storage engine that’ll love doing a filter on 
80k ids at once, but a key-value store like Cassandra might work out better for 
that.

4. There is a thing called a JoinQParserPlugin 
(https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-JoinQueryParser)
 that can join to another collection 
(https://issues.apache.org/jira/browse/SOLR-4905). But I’ve never used it, and 
there are some significant restrictions.
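Option 1 above can be sketched as follows (a sketch only: the host, collection, and field names are placeholder assumptions, and in practice the ID list would hold the ~80k IDs fetched from collection1):

```java
import java.net.URLEncoder;
import java.util.List;

public class TermsFilterDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder IDs; real usage would pass the full list.
        List<String> ids = List.of("111", "222", "333");

        // {!terms} takes a comma-separated term list; cost=101 (>= 100)
        // asks Solr to apply it as a post filter after cheaper clauses.
        String fq = "{!terms f=doc_id cost=101}" + String.join(",", ids);

        String url = "http://localhost:8983/solr/collection2/select"
                + "?q=" + URLEncoder.encode("status:active", "UTF-8")
                + "&fq=" + URLEncoder.encode(fq, "UTF-8")
                + "&rows=250";
        System.out.println(url);
    }
}
```

Sending the ID list as a filter query rather than as part of q also keeps it out of scoring entirely.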




On 5/5/16, 2:46 AM, "Bhaumik Joshi" <bhaumik.jo...@outlook.com> wrote:

>Hi,
>
>
>I am retrieving ids from collection1 based on some query and passing those ids 
>as a query to collection2 so the query to collection2 which contains ids in it 
>takes much more time compare to normal query.
>
>
>Que. 1 - While passing ids to query why it takes more time compare to normal 
>query however we are narrowing the criteria by passing ids?
>
>e.g.  query-1: doc_id:(111 222 333 444 ...) AND <other criteria> is slower 
>(passing 80k IDs takes 7-9 sec) than query-2: <other criteria> only (700-800 
>ms). Both return 250 records with the same set of fields.
>
>
>Que. 2 - Any idea on how i can achieve above (get ids from one collection and 
>pass those ids to other one) in efficient manner or any other way to get data 
>from one collection based on response of other collection?
>
>
>Thanks & Regards,
>
>Bhaumik Joshi

Re: Passing IDs in query takes more time

2016-05-08 Thread Bhaumik Joshi
Thanks Erick. TermsQueryParser worked for me. 

Thanks & Regards,
Bhaumik Joshi


From: Erick Erickson <erickerick...@gmail.com>
Sent: Friday, May 6, 2016 10:00 AM
To: solr-user
Subject: Re: Passing IDs in query takes more time

Well, you're parsing 80K IDs and forming them into a query. Consider
what has to happen. Even in the very best case of the other clause
being evaluated first, for every doc that satisfies that clause the inverted
index must be examined 80,000 times to see if that doc matches
one of the IDs in your huge clause for scoring purposes.

You might be better off by moving the 80K list to an fq clause like
fq={!cache=false}docid:(111 222 333).

Additionally, you probably want to use the TermsQueryParser, something like:
fq={!terms f=id cache=false}111,222,333
see:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser

In any case, though, an 80K clause will slow things down considerably.

Best,
Erick

On Thu, May 5, 2016 at 2:42 AM, Bhaumik Joshi <bhaumik.jo...@outlook.com> wrote:
> Hi,
>
>
> I am retrieving ids from collection1 based on some query and passing those 
> ids as a query to collection2 so the query to collection2 which contains ids 
> in it takes much more time compare to normal query.
>
>
> Que. 1 - While passing ids to query why it takes more time compare to normal 
> query however we are narrowing the criteria by passing ids?
>
> e.g.  query-1: doc_id:(111 222 333 444 ...) AND <other criteria> is slower 
> (takes 7-9 sec) than
>
> query-2: <other criteria> only (700-800 ms). Please note that in this case I am 
> passing 80k IDs in query-1 and retrieving 250 rows.
>
>
> Que. 2 - Any idea on how i can achieve above (get ids from one collection and 
> pass those ids to other one) in efficient manner or any other way to get data 
> from one collection based on response of other collection?
>
>
> Thanks & Regards,
>
> Bhaumik Joshi

Passing Ids in query takes more time

2016-05-05 Thread Bhaumik Joshi
Hi,


I am retrieving IDs from collection1 based on some query and passing those IDs 
as a query to collection2, and the query to collection2 that contains the IDs 
takes much more time compared to a normal query.


Que. 1 - Why does passing IDs in the query take more time compared to a normal 
query, even though we are narrowing the criteria by passing IDs?

e.g.  query-1: doc_id:(111 222 333 444 ...) AND <other criteria> is slower 
(passing 80k IDs takes 7-9 sec) than query-2: <other criteria> only (700-800 
ms). Both return 250 records with the same set of fields.


Que. 2 - Any idea how I can achieve the above (get IDs from one collection and 
pass those IDs to the other) in an efficient manner, or any other way to get data 
from one collection based on the response of another collection?


Thanks & Regards,

Bhaumik Joshi


Passing IDs in query takes more time

2016-05-05 Thread Bhaumik Joshi
Hi,


I am retrieving IDs from collection1 based on some query and passing those IDs 
as a query to collection2, and the query to collection2 that contains the IDs 
takes much more time compared to a normal query.


Que. 1 - Why does passing IDs in the query take more time compared to a normal 
query, even though we are narrowing the criteria by passing IDs?

e.g.  query-1: doc_id:(111 222 333 444 ...) AND <other criteria> is slower (takes 
7-9 sec) than

query-2: <other criteria> only (700-800 ms). Please note that in this case I am 
passing 80k IDs in query-1 and retrieving 250 rows.


Que. 2 - Any idea how I can achieve the above (get IDs from one collection and 
pass those IDs to the other) in an efficient manner, or any other way to get data 
from one collection based on the response of another collection?


Thanks & Regards,

Bhaumik Joshi


Re: Solr Sharding Strategy

2016-04-15 Thread Bhaumik Joshi
Hi ,

Toke - I tried pausing indexing entirely but saw only a slight improvement, 
so the impact of indexing is not significant.

Shawn - To answer your question: I am sending one document per update request.

My test Solr cloud is configured with 2 shards on one machine, each of which has 
one replica on another machine. To check whether network latency is the 
bottleneck, I disabled the replicas and re-ran the test, but saw no 
improvement.

Another thing I tried, in order to balance the load and provide more CPU 
and memory resources, was configuring only 2 shards, each on a separate 
machine with no replicas; in that case performance got 
worse.

Regarding production, we want to have 2 shards to make the platform 
scalable and future-proof. Note that we have 22 collections in 
production, of which 4 are major in terms of volume and complexity and are 
frequently used for querying and indexing; the rest are comparatively 
minor with fewer query and index hits. Below are the production index 
statistics.

No. of collections: 22 collections holding 139 million documents with an index size 
of 85 GB.
Major collections: 4 collections holding 134 million documents with an index size 
of 77 GB.
Minor collections: 18 collections holding 5 million documents with an index size of 
8 GB.

So, any ideas on how to improve query performance with these statistics in an 
Index-heavy (100 index updates per sec) and Query-heavy (100 queries per sec) 
scenario?

Thanks & Regards,
Bhaumik Joshi


From: Shawn Heisey <apa...@elyograg.org>
Sent: Tuesday, April 12, 2016 7:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Sharding Strategy

On 4/11/2016 6:31 AM, Bhaumik Joshi wrote:
> We are using solr 5.2.0 and we have Index-heavy (100 index updates per
> sec) and Query-heavy (100 queries per sec) scenario.
>
> *Index stats: *10 million documents and 16 GB index size
>
>
>
> Which sharding strategy is best suited in above scenario?
>
> Please share reference resources which states detailed comparison of
> single shard over multi shard if any.
>
>
>
> Meanwhile we did some tests with SolrMeter (Standalone java tool for
> stress tests with Solr) for single shard and two shards.
>
> *Index stats of test solr cloud: *0.7 million documents and 1 GB index
> size.
>
> As observed in test average query time with 2 shards is much higher
> than single shard.
>

On the same hardware, multiple shards will usually be slower than one
shard, especially under a high load.  Sharding can give good results
with *more* hardware, providing more CPU and memory resources.  When the
query load is high, there should only be only one core (shard replica)
per server, and Solr works best when it is running on bare metal, not
virtualized.

Handling 100 queries per second will require multiple copies of your
index on separate hardware.  This is a fairly high query load.  There
are installations handling much higher loads, of course.  Those
installations have a LOT of replicas and some way to balance load across
them.

For 10 million documents and 16GB of index, I'm not sure that I would
shard at all, just make sure that each machine has plenty of memory --
probably somewhere in the neighborhood of 24GB to 32GB.  That assumes
that Solr is the only thing running on that server, and that if it's
virtualized, making sure that the physical server's memory is not
oversubscribed.

Regarding your specific numbers:

The low queries per second may be caused by one or more of these
problems, or perhaps something I haven't thought of:  1) your queries
are particularly heavy.  2) updates are interfering by tying up scarce
resources.  3) you don't have enough memory in the machine.

How many documents are in each update request that you are sending?  In
another thread on the list, you have stated that you have a 1 second
maxTime on autoSoftCommit.  This is *way* too low, and a *major* source
of performance issues.  Very few people actually need that level of
latency -- a maxTime measured in minutes may be fast enough, and is much
friendlier for performance.
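Shawn's suggestion corresponds to solrconfig.xml commit settings along these lines (a sketch with example values: the 15-second hard commit and 60-second soft commit are illustrative, not values from this thread):

```xml
<!-- Hard commit: flush to disk regularly without opening a new searcher -->
<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commit: controls visibility of new documents; prefer a maxTime
     measured in tens of seconds or minutes over the aggressive 1000 ms -->
<autoSoftCommit>
  <maxTime>60000</maxTime>
</autoSoftCommit>
```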

Thanks,
Shawn

Re: Soft commit does not affecting query performance

2016-04-13 Thread Bhaumik Joshi
Hi Bill,


Please find the reference below.

http://www.cloudera.com/documentation/enterprise/5-4-x/topics/search_tuning_solr.html
* "Enable soft commits and set the value to the largest value that 
meets your requirements. The default value of 1000 (1 second) is too aggressive 
for some environments."


Thanks & Regards,

Bhaumik Joshi



From: billnb...@gmail.com <billnb...@gmail.com>
Sent: Monday, April 11, 2016 7:07 AM
To: solr-user@lucene.apache.org
Subject: Re: Soft commit does not affecting query performance

Why do you think it would ?

Bill Bell
Sent from mobile


> On Apr 11, 2016, at 7:48 AM, Bhaumik Joshi <bjo...@asite.com> wrote:
>
> Hi All,
>
> We are doing query performance test with different soft commit intervals. In 
> the test with 1sec of soft commit interval and 1min of soft commit interval 
> we didn't notice any improvement in query timings.
>
>
>
> We did test with SolrMeter (Standalone java tool for stress tests with Solr) 
> for 1sec soft commit and 1min soft commit.
>
> Index stats of test solr cloud: 0.7 million documents and 1 GB index size.
>
> Solr cloud has 2 shard and each shard has one replica.
>
>
>
> Please find below detailed test readings: (all timings are in milliseconds)
>
>
> Soft commit - 1sec
> Queries/sec  Updates/sec  Total queries  Total Q time  Avg Q time  Total client time  Avg client time
> 1            5            100            44340         443         48834              488
> 5            5            101            128914        1276        143239             1418
> 10           5            104            295325        2839        330931             3182
> 25           5            102            675319        6620        793874             7783
>
> Soft commit - 1min
> Queries/sec  Updates/sec  Total queries  Total Q time  Avg Q time  Total client time  Avg client time
> 1            5            100            44292         442         48569              485
> 5            5            105            131389        1251        147174             1401
> 10           5            102            299518        2936        337748             3311
> 25           5            108            742639        6876        865222             8011
>
> As theory suggests soft commit affects query performance but in my case it 
> doesn't. Can you put some light on this?
> Also suggest if I am missing something here.
>
> Regards,
> Bhaumik Joshi
>
>
>
>
>
>
>
>
>
>
> [Asite]
>
> The Hyperloop Station Design Competition - A 48hr design collaboration, from 
> mid-day, 23rd May 2016.
> REGISTER HERE http://www.buildearthlive.com/hyperloop



>
> [Build Earth Live Hyperloop]<http://www.buildearthlive.com/hyperloop>
>
> [CC Award Winners 2015]


Re: Solr Sharding Strategy

2016-04-12 Thread Bhaumik Joshi
OK, I will try pausing indexing entirely and check the impact.

In the performance test, queries are issued sequentially.

Thanks & Regards,
Bhaumik Joshi

From: Toke Eskildsen <t...@statsbiblioteket.dk>
Sent: Monday, April 11, 2016 11:13 PM
To: Bhaumik Joshi
Cc: solr-user@lucene.apache.org
Subject: Re: Solr Sharding Strategy

On Tue, 2016-04-12 at 05:57 +0000, Bhaumik Joshi wrote:

> //Insert Document
> UpdateResponse resp = cloudServer.add(doc, 1000);
>
Don't insert documents one at a time, if it can be avoided:
https://lucidworks.com/blog/2015/10/05/really-batch-updates-solr-2/
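The batching advice can be sketched with plain Java (stdlib only; in SolrJ the inner step would be a single cloudServer.add(batch) per batch rather than one add per document, and BATCH_SIZE here is an arbitrary illustrative value):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchDemo {
    // Flush documents in batches instead of one add() call per document.
    static final int BATCH_SIZE = 500;

    static <T> List<List<T>> partition(List<T> docs, int size) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += size) {
            batches.add(docs.subList(i, Math.min(i + size, docs.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 1200; i++) docs.add(i);
        // 1200 docs in batches of 500 -> 500 + 500 + 200 = 3 batches.
        List<List<Integer>> batches = partition(docs, BATCH_SIZE);
        System.out.println(batches.size());
        // With SolrJ: for each batch, cloudServer.add(batch), then commit
        // once at the end, rather than committing per document.
    }
}
```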


Try pausing the indexing fully when you do your query test, to check how
big the impact of indexing is.

When you run your query performance test, are the queries issued
sequentially or in parallel?


- Toke Eskildsen, State and University Library, Denmark



Re: Solr Sharding Strategy

2016-04-11 Thread Bhaumik Joshi
Please note that all caches are disabled in the tests mentioned.


In 2 shards: intended queries and updates = 10 per sec; actual queries per sec = 
3.3; actual updates per sec = 10. So for 302 queries the avg query time is 2192 ms.

In 1 shard: intended queries and updates = 10 per sec; actual queries per sec = 
9.7; actual updates per sec = 10.3. So for 302 queries the avg query time is 83 ms.

We do soft commit when we insert/update document.

//Insert Document
UpdateResponse resp = cloudServer.add(doc, 1000);
if (resp.getStatus() == 0)
{
    success = true;
}

//Update Document
UpdateRequest req = new UpdateRequest();
req.setCommitWithin(1000);
req.add(docs);
UpdateResponse resp = req.process(cloudServer);
if (resp.getStatus() == 0)
{
    success = true;
}

Here are the commit settings in solrconfig.xml:

<autoCommit>
  <maxTime>60</maxTime>
  <maxDocs>2</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>


Thanks & Regards,

Bhaumik Joshi


From: Daniel Collins <danwcoll...@gmail.com>
Sent: Monday, April 11, 2016 8:12 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Sharding Strategy

I'd also ask about your indexing times, what QTime do you see for indexing
(in both scenarios), and what commit times are you using (which Toke
already asked).

Not entirely sure how to read your table, but looking at the indexing side
of things, with 2 shards, there is inherently more work to do, so you would
expect indexing latency to increase (we have to index in 1 shard, and then
index in the 2nd shard, so logically its twice the workload).

Your table suggests you managed 10 updates per second, but you never
managed 25 updates per second either with 1 shard or 2 shards.  Though the
numbers don't make sense, you managed 13.9 updates per sec on 1 shard, and
21.9 updates per sec on 2 shards.  That suggests to me that in the single
shard case, your searches are causing your indexing to throttle, maybe the
resourcing is favoring searches and so the indexing threads aren't getting
a look in...  Whereas in the 2 shard case, it seems clear (as Toke said),
that search isn't really hitting the index much, not sure where the
bottleneck is, but its not on the index, which is why your indexing load
can get more requests through.

On 11 April 2016 at 15:36, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:

> On Mon, 2016-04-11 at 11:23 +, Bhaumik Joshi wrote:
> > We are using solr 5.2.0 and we have Index-heavy (100 index updates per
> > sec) and Query-heavy (100 queries per sec) scenario.
>
> > Index stats: 10 million documents and 16 GB index size
>
> > Which sharding strategy is best suited in above scenario?
>
> Sharding reduces query throughput and can improve query latency as well
> as indexing speed. For small indexes, the overhead of sharding is likely
> to worsen query latency. So as always, it depends.
>
> Qualified guess: Don't use multiple shards, but consider using replicas.
>
> > Please share reference resources which states detailed comparison of
> > single shard over multi shard if any.
>
> Sorry, could not find the one I had in mind.
> >
> > Meanwhile we did some tests with SolrMeter (Standalone java tool for
> > stress tests with Solr) for single shard and two shards.
> >
> > Index stats of test solr cloud: 0.7 million documents and 1 GB index
> > size.
> >
> > As observed in test average query time with 2 shards is much higher
> > than single shard.
>
> Makes sense: Your shards are so small that the actual time spend on the
> queries is very low. So relatively, the overhead of distributed (aka
> multi-shard) searching is high, negating any search-gain you got by
> sharding. I would not have expected the performance drop-off to be that
> large (factor 20-60) though.
>
> Your query speed is unusually low for an index of your size, which leads
> me to believe that your indexing is slowing everything down. This is
> often due to too frequent commits and/or too many warm up queries.
>
> There is a bit about it at
> https://wiki.apache.org/solr/SolrPerformanceFactors
>
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>
>


Soft commit does not affecting query performance

2016-04-11 Thread Bhaumik Joshi
Hi All,

We are running query performance tests with different soft commit intervals. In 
tests with a 1 sec soft commit interval and a 1 min soft commit interval, we 
did not notice any difference in query timings.



We tested with SolrMeter (a standalone Java tool for stress testing Solr) 
for a 1 sec soft commit and a 1 min soft commit.

Index stats of the test Solr cloud: 0.7 million documents and 1 GB index size.

The Solr cloud has 2 shards and each shard has one replica.



Please find below detailed test readings: (all timings are in milliseconds)


Soft commit - 1sec
Queries/sec  Updates/sec  Total queries  Total Q time  Avg Q time  Total client time  Avg client time
1            5            100            44340         443         48834              488
5            5            101            128914        1276        143239             1418
10           5            104            295325        2839        330931             3182
25           5            102            675319        6620        793874             7783

Soft commit - 1min
Queries/sec  Updates/sec  Total queries  Total Q time  Avg Q time  Total client time  Avg client time
1            5            100            44292         442         48569              485
5            5            105            131389        1251        147174             1401
10           5            102            299518        2936        337748             3311
25           5            108            742639        6876        865222             8011

Theory suggests the soft commit interval affects query performance, but in my 
case it doesn't. Can you shed some light on this?
Also let me know if I am missing something here.

Regards,
Bhaumik Joshi












Solr Sharding Strategy

2016-04-11 Thread Bhaumik Joshi
Hi,



We are using Solr 5.2.0 and we have an Index-heavy (100 index updates per sec) and 
Query-heavy (100 queries per sec) scenario.

Index stats: 10 million documents and 16 GB index size.


Which sharding strategy is best suited to the above scenario?

Please share reference resources that compare single-shard and multi-shard 
setups in detail, if any.


Meanwhile we did some tests with SolrMeter (a standalone Java tool for stress 
testing Solr) for a single shard and two shards.

Index stats of the test Solr cloud: 0.7 million documents and 1 GB index size.

As observed in the test, the average query time with 2 shards is much higher than with a 
single shard.

Please find below detailed readings:
2 Shards

Intended q/s | Actual q/min | Actual q/s | Intended u/s | Actual u/min | Actual u/s | Total queries | Total Q time (ms) | Avg Q time (ms) | Avg Q time (sec) | Total client time (ms) | Avg client time (ms)
10           | 198          | 3.3        | 10           | 600          | 10         | 302           | 662176            | 2192            | 2.192            | 756603                 | 2505
25           | 168          | 2.8        | 25           | 1314         | 21.9       | 301           | 2019735           | 6710            | 6.71             | 2370018                | 7873


1 Shard

Intended queries per sec

Actual queries per min

Actual queries per sec

Intended updates per sec

Actual updates per min

Actual updates per sec

Total Queries

Total Q time (ms)

Avg Q Time (ms)

Avg Q Time (sec)

Total Client time (ms)

Avg Client time (ms)

10

582

9.7

10

618

10.3

302

25081

83

0.083

55612

184

25

1026

17.1

25

834

13.9

306

33366

109

0.109

259392

847


Note: Query returns 250 rows and matches 57880 documents
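To make the gap concrete, a small reader-added script (numbers copied directly from the tables above) comparing average query times:

```python
# Avg Q time (ms) from the SolrMeter readings above,
# keyed by intended queries per second.
two_shard = {10: 2192, 25: 6710}
one_shard = {10: 83, 25: 109}

for qps in sorted(two_shard):
    ratio = two_shard[qps] / one_shard[qps]
    print(f"{qps} qps: 2 shards ~{ratio:.0f}x slower than 1 shard")
```

With a 1 GB index, both shards fit comfortably on one node, so the extra inter-shard request routing and result-merge step dominates; sharding generally pays off only once a single shard is too large to serve queries quickly.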




Thanks & Regards,



Bhaumik Joshi
Developer



Asite, A4, Shivalik Business Center, B/h. Rajpath Club, Opp. Kens Ville Golf 
Academy, Bodakdev,
Ahmedabad 380054, Gujarat, India.
T: +91 (079) 4021 1900 Ext: 5234 | M: +91 94282 99055 | E: 
bjo...@asite.com<mailto:bjo...@asite.com>
W: www.asite.com<http://www.asite.com/> | Twitter: 
@Asite<https://twitter.com/Asite/> | Facebook: 
facebook.com/Asite<http://www.facebook.com/pages/ASITE/201872569531>


