Re: SOLR as nosql database store

2017-05-08 Thread Bharath Kumar
Thanks Hrishikesh and Dave. We use SolrCloud with 2 extra replicas; won't
that serve as a backup when something goes wrong? Also, we use the latest
Solr 6, and according to the Solr documentation its indexing performance is
good. The reason we want to move is that MySQL is currently the primary data
store, and its performance might not be optimal when we write data at a very
rapid rate. We already index almost half of the MySQL fields in Solr.

On Mon, May 8, 2017 at 9:24 PM, Dave  wrote:

> You will want to have both Solr and a SQL/NoSQL data storage option. They
> serve different purposes.
>
>
> > On May 8, 2017, at 10:43 PM, bharath.mvkumar 
> wrote:
> >
> > Hi All,
> >
> > We have a use case where a MySQL database stores documents, and some of
> > the fields in those documents are also indexed in Solr.
> > We plan to move all of those documents into Solr, making Solr the nosql
> > datastore for them. The reason we plan to do this is that we have to
> > support cross-data-center replication for both MySQL and Solr, so we are
> > in effect duplicating the same data. The number of writes we do per
> > second is around 10,000. Also, currently we have only one shard with
> > around 70 million records, and we plan to support close to 1 billion
> > records and also perform sharding.
> >
> > Is Solr a good choice as the nosql database, or should we look at
> > Cassandra for our use case?
> >
> > Thanks,
> > Bharath Kumar
> >
> >
> >
> > --
> > View this message in context: http://lucene.472066.n3.nabble.com/SOLR-as-nosql-database-store-tp4334119.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


Solr licensing for commercial product.

2017-05-08 Thread vrindavda
Hello,

Please let me know what I need to consider regarding licensing before
shipping Solr with a commercial product.

How will Solr know which client is using it?

Thank you,
Vrinda Davda 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-licensing-for-commercial-product-tp4334146.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Search inside grouping list

2017-05-08 Thread donjose
Hi Emir,

Grouping is enabled by default in the request handler configuration:

   <lst name="defaults">
     <bool name="group">true</bool>
     <str name="group.field">assetid</str>
     <bool name="group.ngroups">true</bool>
   </lst>
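
For comparison, the same grouping expressed as request-time parameters (a
sketch reusing the core name from the query earlier in this thread):

curl 'http://localhost:8983/solr/pema/select?q=*:*&fq=color:red&group=true&group.field=assetid&group.ngroups=true&wt=json'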

Don.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-inside-grouping-list-tp4333488p4334136.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR as nosql database store

2017-05-08 Thread Dave
You will want to have both Solr and a SQL/NoSQL data storage option. They
serve different purposes.


> On May 8, 2017, at 10:43 PM, bharath.mvkumar  
> wrote:
> 
> Hi All,
> 
> We have a use case where a MySQL database stores documents, and some of the
> fields in those documents are also indexed in Solr.
> We plan to move all of those documents into Solr, making Solr the nosql
> datastore for them. The reason we plan to do this is that we have to support
> cross-data-center replication for both MySQL and Solr, so we are in effect
> duplicating the same data. The number of writes we do per second is around
> 10,000. Also, currently we have only one shard with around 70 million
> records, and we plan to support close to 1 billion records and also perform
> sharding.
> 
> Is Solr a good choice as the nosql database, or should we look at Cassandra
> for our use case?
> 
> Thanks,
> Bharath Kumar
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SOLR-as-nosql-database-store-tp4334119.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR as nosql database store

2017-05-08 Thread Hrishikesh Gadre
Hi Bharath,

In general it's not a good idea to use Solr as the *primary data store*, for
the various reasons listed here:

https://wiki.apache.org/solr/HowToReindex


But if you design your system such that at least one copy of the raw data is
stored in some other storage system, then you can use Solr as the
operational database.

Hope this helps.

-Hrishikesh




On Mon, May 8, 2017 at 7:43 PM, bharath.mvkumar 
wrote:

> Hi All,
>
> We have a use case where a MySQL database stores documents, and some of the
> fields in those documents are also indexed in Solr.
> We plan to move all of those documents into Solr, making Solr the nosql
> datastore for them. The reason we plan to do this is that we have to support
> cross-data-center replication for both MySQL and Solr, so we are in effect
> duplicating the same data. The number of writes we do per second is around
> 10,000. Also, currently we have only one shard with around 70 million
> records, and we plan to support close to 1 billion records and also perform
> sharding.
>
> Is Solr a good choice as the nosql database, or should we look at Cassandra
> for our use case?
>
> Thanks,
> Bharath Kumar
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/SOLR-as-nosql-database-store-tp4334119.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


SOLR as nosql database store

2017-05-08 Thread bharath.mvkumar
Hi All,

We have a use case where a MySQL database stores documents, and some of the
fields in those documents are also indexed in Solr.
We plan to move all of those documents into Solr, making Solr the nosql
datastore for them. The reason we plan to do this is that we have to support
cross-data-center replication for both MySQL and Solr, so we are in effect
duplicating the same data. The number of writes we do per second is around
10,000. Also, currently we have only one shard with around 70 million
records, and we plan to support close to 1 billion records and also perform
sharding.

Is Solr a good choice as the nosql database, or should we look at Cassandra
for our use case?

Thanks,
Bharath Kumar



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-as-nosql-database-store-tp4334119.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: OutOfMemoryError and Too many open files

2017-05-08 Thread Erick Erickson
Solr/Lucene really like having a bunch of files available, so bumping
the ulimit is often the right thing to do.

This assumes you don't have any custom code that is failing to close
searchers and the like.
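
For reference, a quick way to check the limits the running Solr process
actually has (a sketch; assumes Linux and a pgrep-able start.jar process):

# Inspect the limits applied to the running Solr process (Linux)
SOLR_PID=$(pgrep -f start.jar | head -1)
grep -E 'Max (open files|processes)' "/proc/$SOLR_PID/limits"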

Best,
Erick

On Mon, May 8, 2017 at 10:40 AM, Satya Marivada
 wrote:
> Hi,
>
> Started getting below errors/exceptions. I have listed the resolution
> inline. Could you please see if I am headed right?
>
> The error below basically says that no more threads can be created because
> the limit has been reached. We have a big index, and I assume the threads
> are being created outside of the JVM and could not be created because of
> the low nproc ulimit setting (4096). It has been increased to 131072. This
> number can be found with ulimit -u.
>
> java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Thread.java:714)
> at
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
> at
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368)
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:214)
> at
> java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
> at
> org.apache.solr.common.cloud.SolrZkClient$3.process(SolrZkClient.java:268)
> at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
>
> The error below basically says that no more files can be opened because
> the limit has been reached. It has been increased from 4096 to 65536. This
> number can be found with ulimit -Hn and ulimit -Sn.
>
> java.io.IOException: Too many open files
> at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
> at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
> at
> org.eclipse.jetty.server.ServerConnector.accept(ServerConnector.java:382)
> at
> org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:593)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
>
> Thanks,
> Satya


Re: OutOfMemoryError and Too many open files

2017-05-08 Thread Shawn Heisey
On 5/8/2017 11:40 AM, Satya Marivada wrote:
> Started getting below errors/exceptions. I have listed the resolution
> inline. Could you please see if I am headed right?
>
> java.lang.OutOfMemoryError: unable to create new native thread

> java.io.IOException: Too many open files

I have never had any luck setting these limits with the ulimit command. 
On Linux, I have adjusted these in the /etc/security/limits.conf config
file.  This is what I added to the file:

solr  hard  nproc   61440
solr  soft  nproc   40960

solr  hard  nofile  65535
solr  soft  nofile  49151

A reboot shouldn't be needed to get the change to take effect, though
you may need to log out and back in again before attempting to restart
Solr.  A reboot would be the guaranteed option.
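
To check that the new limits actually took effect, a minimal sketch
(assumes a Linux system where Solr runs as the "solr" user):

# -u shows max user processes, -n shows max open files
su - solr -c 'ulimit -u -n'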

For an OS other than Linux, the method for changing these limits is
probably going to be different.

Thanks,
Shawn



Re: SessionExpiredException

2017-05-08 Thread Satya Marivada
This is on Solr 6.3.0 with an external ZooKeeper 3.4.9.

On Wed, May 3, 2017 at 11:39 PM Zheng Lin Edwin Yeo 
wrote:

> Are you using SolrCloud with external ZooKeeper, or Solr's internal
> ZooKeeper?
>
> Also, which version of Solr are you using?
>
> Regards,
> Edwin
>
> On 3 May 2017 at 21:32, Satya Marivada  wrote:
>
> > Hi,
> >
> > I see below exceptions in my logs sometimes. What could be causing it?
> >
> > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > KeeperErrorCode = Session expired for /overseer
> > at
> > org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> > at
> > org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> > at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
> >
> > Thanks,
> > Satya
> >
>


OutOfMemoryError and Too many open files

2017-05-08 Thread Satya Marivada
Hi,

Started getting below errors/exceptions. I have listed the resolution
inline. Could you please see if I am headed right?

The error below basically says that no more threads can be created because
the limit has been reached. We have a big index, and I assume the threads
are being created outside of the JVM and could not be created because of the
low nproc ulimit setting (4096). It has been increased to 131072. This
number can be found with ulimit -u.

java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at
java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
at
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:214)
at
java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
at
org.apache.solr.common.cloud.SolrZkClient$3.process(SolrZkClient.java:268)
at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
at
org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)

The error below basically says that no more files can be opened because the
limit has been reached. It has been increased from 4096 to 65536. This
number can be found with ulimit -Hn and ulimit -Sn.

java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
at
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
at
org.eclipse.jetty.server.ServerConnector.accept(ServerConnector.java:382)
at
org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:593)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)

Thanks,
Satya


Need Help on solr for Email Search

2017-05-08 Thread Udaya Ganga Santosh Kumar Palivela
HI Team,

We are using Solr for quick retrieval of search results.
Recently we encountered a problem while searching for email addresses in
Solr. Search performs well when I enter simple text, but when I enter any
special characters (like @ or a comma) it returns no results.

I have attached the schema file; please verify it and let us know how to
perform searches for email addresses in Solr.

Please get back to me as soon as possible.
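
One way to see what the analyzer does to such characters is Solr's field
analysis endpoint (a minimal sketch; the core and field names here are
assumptions):

# Show how the field's analyzer tokenizes an email address
curl -G 'http://localhost:8983/solr/corename/analysis/field' \
  --data-urlencode 'analysis.fieldname=email' \
  --data-urlencode 'analysis.fieldvalue=john.doe@example.com'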
-- 

*Thanks & Regards,*

*Santosh Palivela.*


Re: SessionExpiredException

2017-05-08 Thread Satya Marivada
The 3G heap is doing well, performing a GC at 600-700 MB.

-XX:+UseConcMarkSweepGC -XX:+UseParNewGC

Here are my JVM startup parameters:

java -server -Xms3g -Xmx3g -XX:NewRatio=3 -XX:SurvivorRatio=4
-XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ConcGCThreads=4
-XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark
-XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
-XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled
-XX:-OmitStackTraceInFastThrow -verbose:gc -XX:+PrintHeapAtGC
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
-XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
-Xloggc:/sanfs/mnt/vol01/solr/solr-6.3.0/server/logs/solr_gc.log
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M
-DzkClientTimeout=15000 ...
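
If the expirations line up with GC pauses, one mitigation is a larger ZK
session timeout. A minimal sketch for solr.in.sh (30000 ms is illustrative;
it must stay within the range the ZK server's session limits allow):

# solr.in.sh -- raise the ZK client/session timeout
ZK_CLIENT_TIMEOUT="30000"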

On Mon, May 8, 2017 at 11:50 AM Walter Underwood 
wrote:

> Which garbage collector are you using? The default GC will probably give
> long pauses.
>
> You need to use CMS or G1.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On May 8, 2017, at 8:48 AM, Erick Erickson 
> wrote:
> >
> > 3G of memory should not lead to long GC pauses unless you're running
> > very close to the edge of available memory. Paradoxically, running
> > with 6G of memory may lead to _fewer_ noticeable pauses since the
> > background threads can do the work, well, in the background.
> >
> > Best,
> > Erick
> >
> > On Mon, May 8, 2017 at 7:29 AM, Satya Marivada
> >  wrote:
> >> Hi Piyush and Shawn,
> >>
> >> May I ask what the solution is, if it is the long GC pauses? I suspect
> >> the same problem in our case too. We started with 3G of heap.
> >> Did you have to adjust some of the memory allotted? Very much
> >> appreciated.
> >>
> >> Thanks,
> >> Satya
> >>
> >> On Sat, May 6, 2017 at 12:36 PM Piyush Kunal 
> >> wrote:
> >>
> >>> We already faced this issue and found the cause to be long GC pauses
> >>> on either the client side or the server side.
> >>> Regards,
> >>> Piyush
> >>>
> >>> On Sat, May 6, 2017 at 6:10 PM, Shawn Heisey 
> wrote:
> >>>
>  On 5/3/2017 7:32 AM, Satya Marivada wrote:
> > I see below exceptions in my logs sometimes. What could be causing
> it?
> >
> > org.apache.zookeeper.KeeperException$SessionExpiredException:
> 
>  Based on my limited research, this would tend to indicate that the
>  heartbeats ZK uses to detect when sessions have gone inactive are not
>  occurring in a timely fashion.
> 
>  Common causes seem to be:
> 
>  JVM Garbage collections.  These can cause the entire JVM to pause for
> an
>  extended period of time, and this time may exceed the configured
> >>> timeouts.
> 
>  Excess client connections to ZK.  ZK limits the number of connections
>  from each client address, with the idea of preventing denial of
> service
>  attacks.  If a client is misbehaving, it may make more connections
> than
>  it should.  You can try increasing the limit in the ZK config, but if
>  this is the reason for the exception, then something's probably wrong,
>  and you may be just hiding the real problem.
> 
>  Although we might have bugs causing the second situation, the first
>  situation seems more likely.
> 
>  Thanks,
>  Shawn
> 
> 
> >>>
>
>


Re: SessionExpiredException

2017-05-08 Thread Walter Underwood
Which garbage collector are you using? The default GC will probably give long 
pauses.

You need to use CMS or G1.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On May 8, 2017, at 8:48 AM, Erick Erickson  wrote:
> 
> 3G of memory should not lead to long GC pauses unless you're running
> very close to the edge of available memory. Paradoxically, running
> with 6G of memory may lead to _fewer_ noticeable pauses since the
> background threads can do the work, well, in the background.
> 
> Best,
> Erick
> 
> On Mon, May 8, 2017 at 7:29 AM, Satya Marivada
>  wrote:
>> Hi Piyush and Shawn,
>> 
>> May I ask what the solution is, if it is the long GC pauses? I suspect
>> the same problem in our case too. We started with 3G of heap.
>> Did you have to adjust some of the memory allotted? Very much appreciated.
>> 
>> Thanks,
>> Satya
>> 
>> On Sat, May 6, 2017 at 12:36 PM Piyush Kunal 
>> wrote:
>> 
>>> We already faced this issue and found the cause to be long GC pauses
>>> on either the client side or the server side.
>>> Regards,
>>> Piyush
>>> 
>>> On Sat, May 6, 2017 at 6:10 PM, Shawn Heisey  wrote:
>>> 
 On 5/3/2017 7:32 AM, Satya Marivada wrote:
> I see below exceptions in my logs sometimes. What could be causing it?
> 
> org.apache.zookeeper.KeeperException$SessionExpiredException:
 
 Based on my limited research, this would tend to indicate that the
 heartbeats ZK uses to detect when sessions have gone inactive are not
 occurring in a timely fashion.
 
 Common causes seem to be:
 
 JVM Garbage collections.  These can cause the entire JVM to pause for an
 extended period of time, and this time may exceed the configured
>>> timeouts.
 
 Excess client connections to ZK.  ZK limits the number of connections
 from each client address, with the idea of preventing denial of service
 attacks.  If a client is misbehaving, it may make more connections than
 it should.  You can try increasing the limit in the ZK config, but if
 this is the reason for the exception, then something's probably wrong,
 and you may be just hiding the real problem.
 
 Although we might have bugs causing the second situation, the first
 situation seems more likely.
 
 Thanks,
 Shawn
 
 
>>> 



Re: SessionExpiredException

2017-05-08 Thread Erick Erickson
3G of memory should not lead to long GC pauses unless you're running
very close to the edge of available memory. Paradoxically, running
with 6G of memory may lead to _fewer_ noticeable pauses since the
background threads can do the work, well, in the background.

Best,
Erick

On Mon, May 8, 2017 at 7:29 AM, Satya Marivada
 wrote:
> Hi Piyush and Shawn,
>
> May I ask what the solution is, if it is the long GC pauses? I suspect
> the same problem in our case too. We started with 3G of heap.
> Did you have to adjust some of the memory allotted? Very much appreciated.
>
> Thanks,
> Satya
>
> On Sat, May 6, 2017 at 12:36 PM Piyush Kunal 
> wrote:
>
>> We already faced this issue and found the cause to be long GC pauses
>> on either the client side or the server side.
>> Regards,
>> Piyush
>>
>> On Sat, May 6, 2017 at 6:10 PM, Shawn Heisey  wrote:
>>
>> > On 5/3/2017 7:32 AM, Satya Marivada wrote:
>> > > I see below exceptions in my logs sometimes. What could be causing it?
>> > >
>> > > org.apache.zookeeper.KeeperException$SessionExpiredException:
>> >
>> > Based on my limited research, this would tend to indicate that the
>> > heartbeats ZK uses to detect when sessions have gone inactive are not
>> > occurring in a timely fashion.
>> >
>> > Common causes seem to be:
>> >
>> > JVM Garbage collections.  These can cause the entire JVM to pause for an
>> > extended period of time, and this time may exceed the configured
>> timeouts.
>> >
>> > Excess client connections to ZK.  ZK limits the number of connections
>> > from each client address, with the idea of preventing denial of service
>> > attacks.  If a client is misbehaving, it may make more connections than
>> > it should.  You can try increasing the limit in the ZK config, but if
>> > this is the reason for the exception, then something's probably wrong,
>> > and you may be just hiding the real problem.
>> >
>> > Although we might have bugs causing the second situation, the first
>> > situation seems more likely.
>> >
>> > Thanks,
>> > Shawn
>> >
>> >
>>


Re: distribution of leader and replica in SolrCloud

2017-05-08 Thread Erick Erickson
Also, you can specify custom placement rules, see:
https://cwiki.apache.org/confluence/display/solr/Rule-based+Replica+Placement

But Shawn's statement is the nub of what you're seeing: by default,
multiple JVMs on the same physical machine are considered separate
Solr instances.

Also note that if you want to, you can specify a nodeSet when you
create the nodes, and in particular the special value EMPTY. That'll
create a collection with no replicas and you can ADDREPLICA to
precisely place each one if you require that level of control.
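
A sketch of that approach with the Collections API (host names, ports, and
the collection name are illustrative):

# Create the collection with no replicas, then place each replica explicitly
curl 'http://server1:8983/solr/admin/collections?action=CREATE&name=mycoll&numShards=2&replicationFactor=1&createNodeSet=EMPTY'
curl 'http://server1:8983/solr/admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1&node=server1:8983_solr'
curl 'http://server1:8983/solr/admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1&node=server2:8983_solr'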

Best,
Erick

On Mon, May 8, 2017 at 7:44 AM, Shawn Heisey  wrote:
> On 5/8/2017 5:38 AM, Bernd Fehling wrote:
>> boss -- shard1 - server2:7574
>>| |-- server2:8983 (leader)
>
> The reason that this happened is because you've got two nodes running on
> every server.  From SolrCloud's perspective, there are ten distinct
> nodes, not five.
>
> SolrCloud doesn't notice the fact that different nodes are running on
> the same server(s).  If your reaction to hearing this is that it
> *should* notice, you're probably right, but in a typical use case, each
> server should only be running one Solr instance, so this would never happen.
>
> There is only one case I can think of where I would recommend running
> multiple instances per server, and that is when the required heap size for
> a single instance would be VERY large.  Running two instances with smaller
> heaps can yield better performance.
>
> See this issue:
>
> https://issues.apache.org/jira/browse/SOLR-6027
>
> Thanks,
> Shawn
>


Re: JSON facet performance for aggregations

2017-05-08 Thread Yonik Seeley
On Mon, May 8, 2017 at 3:55 AM, Mikhail Ibraheem
 wrote:
> Thanks Yonik.
> It is double because our use case allows grouping by any field of any type.

Grouping in Solr does not require a double type, so I'm not sure how
that logically follows.  Perhaps it's a limitation in the system using
Solr?

> According to your valuable explanation below, is it better in this case to
> use flat faceting instead of JSON faceting?

I don't think it would help.

I opened https://issues.apache.org/jira/browse/SOLR-10634 to address
this performance issue.

> Would indexing the field give us better performance than flat faceting?

Indexing the studentId field should give better performance wherever
you need to search for or filter by specific student ids.
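
For reference, a minimal sketch of making that change through the Schema API
(the collection name is a placeholder, and a full reindex is required before
the new definition takes effect):

# Re-declare studentId as indexed (Solr 6.x Schema API)
curl -X POST -H 'Content-type:application/json' \
  'http://localhost:8983/solr/mycollection/schema' -d '{
  "replace-field": {
    "name": "studentId",
    "type": "double",
    "indexed": true,
    "stored": true,
    "docValues": true
  }
}'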

-Yonik


> Would indexing the field give us better performance than flat faceting?
> Do you recommend streaming in that case?
>
> Please advise.
>
> Thanks
> Mikhail
>
> -Original Message-
> From: Yonik Seeley [mailto:ysee...@gmail.com]
> Sent: Sunday, May 07, 2017 6:25 PM
> To: solr-user@lucene.apache.org
> Subject: Re: JSON facet performance for aggregations
>
> OK, so I think I know what's going on.
>
> The current code is more optimized for finding the top K buckets from a total 
> of N.
> When one asks to return the top 10 buckets when there are potentially 
> millions of buckets, it makes sense to defer calculating other metrics for 
> those buckets until we know which ones they are.  After we identify the top 
> 10 buckets, we calculate the domain for that bucket and use that to calculate 
> the remaining metrics.
>
> The current method is obviously much slower when one is requesting
> *all* buckets.  We might as well just calculate all metrics in the first pass 
> rather than trying to defer them.
>
> This inefficiency is compounded by the fact that the fields are not indexed.  
> In the second phase, finding the domain for a bucket is a field query.  For 
> an indexed field, this would involve a single term lookup.  For a non-indexed 
> docValues field, this involves a full column scan.
>
> If you ever want to do quick lookups on studentId, it would make sense for it 
> to be indexed (and why is it a double, anyway?)
>
> I'll open up a JIRA issue for the first problem (don't defer metrics if we're 
> going to return all buckets anyway)
>
> -Yonik
>
>
> On Sun, Apr 30, 2017 at 8:58 AM, Mikhail Ibraheem 
>  wrote:
>> Hi Yonik,
>> We are using Solr 6.5
>> Both studentId and grades are double:
>> <field name="studentId" type="double" indexed="false" stored="true"
>> docValues="true" multiValued="false" required="false"/>
>>
>> We have 1.5 million records.
>>
>> Thanks
>> Mikhail
>>
>> -Original Message-
>> From: Yonik Seeley [mailto:ysee...@gmail.com]
>> Sent: Sunday, April 30, 2017 1:04 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: JSON facet performance for aggregations
>>
>> It is odd there would be quite such a big performance delta.
>> What version of solr are you using?
>> What is the fieldType of "grades"?
>> -Yonik
>>
>>
>> On Sun, Apr 30, 2017 at 5:15 AM, Mikhail Ibraheem 
>>  wrote:
>>> 1-
>>> studentId has docValues = true. It is of type double:
>>> <field name="studentId" type="double" indexed="false" stored="true"
>>> docValues="true" multiValued="false" required="false"/>
>>>
>>>
>>> 2- If we just facet without aggregation it finishes in good time 60ms:
>>>
>>> json.facet={
>>>studentId:{
>>>   type:terms,
>>>   limit:-1,
>>>   field:" studentId "
>>>
>>>}
>>> }
>>>
>>>
>>> Thanks
>>>
>>>
>>> -Original Message-
>>> From: Vijay Tiwary [mailto:vijaykr.tiw...@gmail.com]
>>> Sent: Sunday, April 30, 2017 10:44 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: RE: JSON facet performance for aggregations
>>>
>>> Please enable doc values and try.
>>> There is a bug in the source code which causes json facet on string field 
>>> to run very slow. On numeric fields it runs fine with doc value enabled.
>>>
>>> On Apr 30, 2017 1:41 PM, "Mikhail Ibraheem"
>>> 
>>> wrote:
>>>
 Hi Vijay,
 It is already numeric field.
 It is huge difference between json and flat here. Do you know the
 reason for this? Is there a way to improve it ?

 -Original Message-
 From: Vijay Tiwary [mailto:vijaykr.tiw...@gmail.com]
 Sent: Sunday, April 30, 2017 9:58 AM
 To: solr-user@lucene.apache.org
 Subject: Re: JSON facet performance for aggregations

 Json facet on string fields run lot slower than on numeric fields.
 Try and see if you can represent studentid as a numeric field.

 On Apr 30, 2017 1:19 PM, "Mikhail Ibraheem"
 
 wrote:

 > Hi,
 >
 > I am trying to do aggregation with JSON faceting but performance
 > is very bad for one of the requests:
 >
 > json.facet={
 >
 >studentId:{
 >
 >   type:terms,
 >
 >   limit:-1,
 >
 >   field:"studentId",
 >
 >   facet:{
 >
 >   x:"sum(grades)"
 >
 >  

Re: distribution of leader and replica in SolrCloud

2017-05-08 Thread Shawn Heisey
On 5/8/2017 5:38 AM, Bernd Fehling wrote:
> boss -- shard1 - server2:7574
>| |-- server2:8983 (leader)

The reason that this happened is because you've got two nodes running on
every server.  From SolrCloud's perspective, there are ten distinct
nodes, not five.

SolrCloud doesn't notice the fact that different nodes are running on
the same server(s).  If your reaction to hearing this is that it
*should* notice, you're probably right, but in a typical use case, each
server should only be running one Solr instance, so this would never happen.

There is only one case I can think of where I would recommend running
multiple instances per server, and that is when the required heap size for a
single instance would be VERY large.  Running two instances with smaller
heaps can yield better performance.

See this issue:

https://issues.apache.org/jira/browse/SOLR-6027

Thanks,
Shawn



Re: SessionExpiredException

2017-05-08 Thread Satya Marivada
Hi Piyush and Shawn,

May I ask what the solution is, if it is the long GC pauses? I suspect the
same problem in our case too. We started with 3G of heap.
Did you have to adjust some of the memory allotted? Very much appreciated.

Thanks,
Satya

On Sat, May 6, 2017 at 12:36 PM Piyush Kunal 
wrote:

> We already faced this issue and found the cause to be long GC pauses
> on either the client side or the server side.
> Regards,
> Piyush
>
> On Sat, May 6, 2017 at 6:10 PM, Shawn Heisey  wrote:
>
> > On 5/3/2017 7:32 AM, Satya Marivada wrote:
> > > I see below exceptions in my logs sometimes. What could be causing it?
> > >
> > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> >
> > Based on my limited research, this would tend to indicate that the
> > heartbeats ZK uses to detect when sessions have gone inactive are not
> > occurring in a timely fashion.
> >
> > Common causes seem to be:
> >
> > JVM Garbage collections.  These can cause the entire JVM to pause for an
> > extended period of time, and this time may exceed the configured
> timeouts.
> >
> > Excess client connections to ZK.  ZK limits the number of connections
> > from each client address, with the idea of preventing denial of service
> > attacks.  If a client is misbehaving, it may make more connections than
> > it should.  You can try increasing the limit in the ZK config, but if
> > this is the reason for the exception, then something's probably wrong,
> > and you may be just hiding the real problem.
> >
> > Although we might have bugs causing the second situation, the first
> > situation seems more likely.
> >
> > Thanks,
> > Shawn
> >
> >
>


Re: Adding shards dynamically when the automatic routing is used

2017-05-08 Thread Shawn Heisey
On 5/8/2017 2:07 AM, mganeshs wrote:
> Is there any possibility, in a coming release, of adding shards
> dynamically when compositeId (default) based routing is used? Currently
> the only option is to split a shard; instead, we should be able to add
> shards dynamically so that all new documents go to the new shards. Is
> there a plan to include this in a coming release?

Once a shard layout is created with compositeId routing and you index
data into that layout, you can't change it.  Each shard contains
documents that hash to a certain range of hash values.  If you change
the hash ranges without changing what documents are actually in each
shard (by reindexing) then SolrCloud functionality breaks, because the
internal consistency won't be there.
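
For context, the shard is chosen purely by hashing the document id --
optionally a "shardKey!" prefix -- against those ranges. A minimal sketch
(collection and field names are illustrative):

# With compositeId routing, the "tenantA!" prefix determines the owning shard
curl -H 'Content-type:application/json' \
  'http://localhost:8983/solr/mycoll/update?commit=true' \
  -d '[{"id":"tenantA!doc42","title_t":"example"}]'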

There is an issue to implement a rebalance API -- which could for
instance let you change from six shards to ten shards.  If implemented,
it involves completely rewriting the entire index, likely across
multiple servers, moving data from an old shard layout to a new one. 
The patch that has been supplied does apparently work, but it is
necessarily a relatively slow process.

https://issues.apache.org/jira/browse/SOLR-9241

Activity on the issue has stalled.  I do not know why.  It might be
because the patch includes no tests, so we are not assured that the
patch is as bulletproof as we want it to be.

Thanks,
Shawn



Re: distribution of leader and replica in SolrCloud

2017-05-08 Thread Bernd Fehling
And then delete replica shard2-->server1:8983 and add replica 
shard2-->server2:7574 ?

Would be nice to have some automatic logic like ES (_cluster/reroute with move).

Regards
Bernd


On 08.05.2017 at 14:16, Amrit Sarkar wrote:
> Bernd,
> 
> When you create a collection via the Collections API, the internal logic
> tries its best to distribute the nodes equally across the shards, but
> sometimes that doesn't happen.
> 
> The best thing about SolrCloud is that you can manipulate its cloud
> architecture on the fly using the Collections API. You can delete a replica
> of one particular shard and add a replica (on a specific machine/node) to
> any of the shards at any time, depending on your design.
> 
> For the above, you can simply:
> 
> call DELETEREPLICA api on shard1--->server2:7574 (or the other one)
> 
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-DELETEREPLICA:DeleteaReplica
> 
> boss -- shard1
>| |-- server2:8983 (leader)
>|
> --- shard2 - server1:8983
>| |-- server5:7575 (leader)
>|
> --- shard3 - server3:8983 (leader)
>| |-- server4:8983
>|
> --- shard4 - server1:7574 (leader)
>| |-- server4:7574
>|
> --- shard5 - server3:7574 (leader)
>  |-- server5:8983
> 
> call ADDREPLICA api on shard1--->server1:8983
> 
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-DELETEREPLICA:DeleteaReplica
> 
> boss -- shard1 - server1:8983
>| |-- server2:8983 (leader)
>|
> --- shard2 - server1:8983
>| |-- server5:7575 (leader)
>|
> --- shard3 - server3:8983 (leader)
>| |-- server4:8983
>|
> --- shard4 - server1:7574 (leader)
>| |-- server4:7574
>|
> --- shard5 - server3:7574 (leader)
>  |-- server5:8983
> 
> Hope this helps.
> 
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> 
> On Mon, May 8, 2017 at 5:08 PM, Bernd Fehling <
> bernd.fehl...@uni-bielefeld.de> wrote:
> 
>> My assumption was that the strength of SolrCloud is the distribution of
>> leaders and replicas within the cloud, making the cloud somewhat failsafe.
>> But after setting up SolrCloud with a collection, I have both leader and
>> replica of a shard on the same server. And this should be failsafe?
>>
>> o.a.s.h.a.CollectionsHandler Invoked Collection Action :create with params
>> replicationFactor=2&routerName=compositeId&collection.configName=boss&
>> maxShardsPerNode=1&name=boss&router.name=compositeId&action=
>> CREATE&numShards=5
>>
>> boss -- shard1 - server2:7574
>>| |-- server2:8983 (leader)
>>|
>> --- shard2 - server1:8983
>>| |-- server5:7575 (leader)
>>|
>> --- shard3 - server3:8983 (leader)
>>| |-- server4:8983
>>|
>> --- shard4 - server1:7574 (leader)
>>| |-- server4:7574
>>|
>> --- shard5 - server3:7574 (leader)
>>  |-- server5:8983
>>
>> From my point of view, if server2 is going to crash then shard1 will
>> disappear and
>> 1/5th of the index is missing.
>>
>> What is your opinion?
>>
>> Regards
>> Bernd
>>
>>
>>
>>
> 



Re: Search inside grouping list

2017-05-08 Thread Emir Arnautovic

Hi Don,

This is a query without grouping, and it returns the expected results. But
when you apply grouping by some field, you get wrong results? Can you share
the query with grouping and its results.


Emir


On 08.05.2017 14:28, donjose wrote:

Hi Emir,
Thank you for the response.

Please find the query that I am sending to Solr:
http://localhost:8983/solr/pema/select?fq=color:red&indent=on&q=*:*&wt=json

Regards,
Don.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-inside-grouping-list-tp4333488p4333936.html
Sent from the Solr - User mailing list archive at Nabble.com.


--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Search inside grouping list

2017-05-08 Thread donjose
Hi Emir,
Thank you for the response.

Please find the query that I am sending to Solr:
http://localhost:8983/solr/pema/select?fq=color:red&indent=on&q=*:*&wt=json

Regards,
Don.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-inside-grouping-list-tp4333488p4333936.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SPLITSHARD Working

2017-05-08 Thread Amrit Sarkar
Vrinda,

The expected behavior: if the parent shard 'shardA' resides on node'1',
node'2' ... node'n' and you do a SPLITSHARD on it, the child shards
shardA_0 and shardA_1 will also reside on node'1', node'2' ... node'n'.

shardA --- node'1' (leader) & node'2' (replica)

after splitshard;

shardA --- node'1' (leader) & node'2' (replica) (INACTIVE)
shardA_0 -- node'1' & node'2' (ACTIVE)
shardA_1 -- node'1' & node'2' (ACTIVE)

Any one of the nodes can host the leader or a replica for the child shards.
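
For reference, a minimal sketch of the call itself (collection and shard
names are placeholders; async mode avoids HTTP timeouts on large shards):

# Split shardA, then poll the async request status
curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycoll&shard=shardA&async=split-1'
curl 'http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=split-1'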

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Mon, May 8, 2017 at 4:32 PM, vrindavda  wrote:

> Thanks, I got it.
>
> But I see that the distribution of shards and replicas is not equal.
>
> For example, in my case:
> I had shard1 and shard2 on Node 1 and their replica_1 and replica_2 on
> Node 2.
> I did SPLITSHARD on shard1 to get shard1_0 and shard1_1, such that
> shard1_0_replica0 was created on Node 1, and shard1_0_replica1,
> shard1_1_replica1 and shard1_1_replica0 on Node 2.
>
> Is this expected behavior?
>
> Thank you,
> Vrinda Davda
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/SPLITSHARD-Working-tp4333876p4333922.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: distribution of leader and replica in SolrCloud

2017-05-08 Thread Amrit Sarkar
Bernd,

When you create a collection via the Collections API, the internal logic
tries its best to distribute the nodes equally across the shards, but
sometimes that doesn't happen.

The best thing about SolrCloud is that you can manipulate its cloud
architecture on the fly using the Collections API. You can delete a replica
of one particular shard and add a replica (on a specific machine/node) to
any of the shards at any time, depending on your design.

For the above, you can simply:

call DELETEREPLICA api on shard1--->server2:7574 (or the other one)

https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-DELETEREPLICA:DeleteaReplica

boss -- shard1
   | |-- server2:8983 (leader)
   |
--- shard2 - server1:8983
   | |-- server5:7575 (leader)
   |
--- shard3 - server3:8983 (leader)
   | |-- server4:8983
   |
--- shard4 - server1:7574 (leader)
   | |-- server4:7574
   |
--- shard5 - server3:7574 (leader)
 |-- server5:8983

call ADDREPLICA api on shard1--->server1:8983

https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-DELETEREPLICA:DeleteaReplica

boss -- shard1 - server1:8983
   | |-- server2:8983 (leader)
   |
--- shard2 - server1:8983
   | |-- server5:7575 (leader)
   |
--- shard3 - server3:8983 (leader)
   | |-- server4:8983
   |
--- shard4 - server1:7574 (leader)
   | |-- server4:7574
   |
--- shard5 - server3:7574 (leader)
 |-- server5:8983

Hope this helps.
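
As concrete requests, the two calls look roughly like this (host, port, and
the core_node name are placeholders; get the real replica name from
CLUSTERSTATUS first):

# Remove the co-located shard1 replica, then add one on the empty node
curl 'http://server1:8983/solr/admin/collections?action=DELETEREPLICA&collection=boss&shard=shard1&replica=core_nodeN'
curl 'http://server1:8983/solr/admin/collections?action=ADDREPLICA&collection=boss&shard=shard1&node=server1:8983_solr'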

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Mon, May 8, 2017 at 5:08 PM, Bernd Fehling <
bernd.fehl...@uni-bielefeld.de> wrote:

> My assumption was that the strength of SolrCloud is the distribution of
> leaders and replicas within the cloud, making the cloud somewhat failsafe.
> But after setting up SolrCloud with a collection, I have both leader and
> replica of a shard on the same server. And this should be failsafe?
>
> o.a.s.h.a.CollectionsHandler Invoked Collection Action :create with params
> replicationFactor=2&routerName=compositeId&collection.configName=boss&
> maxShardsPerNode=1&name=boss&router.name=compositeId&action=
> CREATE&numShards=5
>
> boss -- shard1 - server2:7574
>| |-- server2:8983 (leader)
>|
> --- shard2 - server1:8983
>| |-- server5:7575 (leader)
>|
> --- shard3 - server3:8983 (leader)
>| |-- server4:8983
>|
> --- shard4 - server1:7574 (leader)
>| |-- server4:7574
>|
> --- shard5 - server3:7574 (leader)
>  |-- server5:8983
>
> From my point of view, if server2 is going to crash then shard1 will
> disappear and
> 1/5th of the index is missing.
>
> What is your opinion?
>
> Regards
> Bernd
>
>
>
>


Re: Search inside grouping list

2017-05-08 Thread Emir Arnautovic

Hi,

Can you please provide the full query that you are sending to Solr?

Thanks,
Emir


On 08.05.2017 07:18, donjose wrote:

Could anyone please reply to this query?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-inside-grouping-list-tp4333488p4333870.html
Sent from the Solr - User mailing list archive at Nabble.com.


--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



distribution of leader and replica in SolrCloud

2017-05-08 Thread Bernd Fehling
My assumption was that the strength of SolrCloud is the distribution of
leaders and replicas within the cloud, making the cloud somewhat failsafe.
But after setting up SolrCloud with a collection, I have both leader and
replica of a shard on the same server. And this should be failsafe?

o.a.s.h.a.CollectionsHandler Invoked Collection Action :create with params
replicationFactor=2&routerName=compositeId&collection.configName=boss&
maxShardsPerNode=1&name=boss&router.name=compositeId&action=CREATE&numShards=5

boss -- shard1 - server2:7574
   | |-- server2:8983 (leader)
   |
--- shard2 - server1:8983
   | |-- server5:7575 (leader)
   |
--- shard3 - server3:8983 (leader)
   | |-- server4:8983
   |
--- shard4 - server1:7574 (leader)
   | |-- server4:7574
   |
--- shard5 - server3:7574 (leader)
 |-- server5:8983

From my point of view, if server2 is going to crash then shard1 will
disappear and 1/5th of the index is missing.

What is your opinion?

Regards
Bernd





Re: SPLITSHARD Working

2017-05-08 Thread vrindavda
Thanks, I got it.

But I see that the distribution of shards and replicas is not equal.

For example, in my case:
I had shard1 and shard2 on Node 1 and their replica_1 and replica_2 on
Node 2.
I did SPLITSHARD on shard1 to get shard1_0 and shard1_1, such that
shard1_0_replica0 was created on Node 1, and shard1_0_replica1,
shard1_1_replica1 and shard1_1_replica0 on Node 2.

Is this expected behavior?

Thank you,
Vrinda Davda



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SPLITSHARD-Working-tp4333876p4333922.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SPLITSHARD Working

2017-05-08 Thread Shalin Shekhar Mangar
No, split always happens on the original node. But you can move the
sub-shard leader to a new node once the split is complete by using
AddReplica/DeleteReplica collection API.
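
A sketch of that move for one sub-shard (names are placeholders; delete the
old replica only after the new one is active):

# Put a replica of the new sub-shard on the new node...
curl 'http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1_0&node=newhost:8983_solr'
# ...then drop the copy on the original node (replica name from CLUSTERSTATUS)
curl 'http://localhost:8983/solr/admin/collections?action=DELETEREPLICA&collection=mycoll&shard=shard1_0&replica=core_nodeN'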

On Mon, May 8, 2017 at 1:02 PM, vrindavda  wrote:
> Hi,
>
> I need to SPLITSHARD such that one split remains on the same machine as
> original and another uses new machines for leader and replicas. Is this
> possible ? Please let me know what properties do I need to specify in
> Collection API to achieve this.
>
> Thank you,
> Vrinda Davda
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SPLITSHARD-Working-tp4333876.html
> Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Regards,
Shalin Shekhar Mangar.


Re: Automatic conversion to Range Query

2017-05-08 Thread Rick Leir

Of course, I should have noticed he typed 3G instead of 32G.


On 2017-05-07 10:46 AM, Aman Deep Singh wrote:

Yes Rick,
Users are actually typing these kinds of queries; this was a random user
query picked from the access logs.


On 07-May-2017 7:29 PM, "Rick Leir"  wrote:

Hi Aman,
Is the user actually entering that query? It seems unlikely. Perhaps you
have a form selector for various Apple products. Could you not have an
enumerated type for the products, and simplify everything? I must be
missing something here. Cheers -- Rick

On May 6, 2017 8:38:14 AM EDT, Shawn Heisey  wrote:

On 5/5/2017 12:42 PM, Aman Deep Singh wrote:

Hi Erick, I don't want to do a range query. That is why I'm using the
pattern replace filter to replace all non-alphanumeric characters with
spaces, so that this type of situation doesn't arise, since the end user
can query anything. Also, in the query I haven't mentioned any
range-related keyword (TO). If my query is like [64GB/3GB] it works fine
and doesn't convert to a range query.

I hope I'm headed in the right direction here.

Square brackets are special characters to the query parser -- they are
typically used to specify a range query.  It's a little odd that Solr
would add the "TO" for you like it seems to be doing, but not REALLY
surprising.  This would be happening *before* the parts of the query
make it to your analysis chain where you have the pattern replace
filter.

If you want to NOT have special characters perform their special
function, but actually become part of the query, you'll need to escape
them with a backslash.  Escaping all the special characters in your
query yields this query:

xiomi Mi 5 \-white \[64GB\/ 3GB\]

It's difficult to decide whether the dash character before "white" was
intended as a "NOT" operator or to be part of the query.  You might not
want to escape that one.

Thanks,
Shawn
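
For reference, a minimal sketch of sending that escaped query from the shell
(host and collection are assumptions; --data-urlencode handles the URL
encoding of the backslashes and brackets):

# -G turns the url-encoded parameters into a GET query string
curl -G 'http://localhost:8983/solr/mycoll/select' \
  --data-urlencode 'q=xiomi Mi 5 \-white \[64GB\/ 3GB\]' \
  --data-urlencode 'wt=json'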

--
Sorry for being brief. Alternate email is rickleir at yahoo dot com





6.5.1. cloud went partially down

2017-05-08 Thread Markus Jelsma
Hi,

Multiple 6.5.1 clouds/collections went down this weekend around the same
time; they share the same ZK quorum. The nodes stayed up but did not rejoin
the cluster (they could not find or connect to ZK).

This is what the log told us:

2017-05-06 18:58:34.893 WARN  
(zkCallback-5-thread-9-processing-n:idx6.example.org:8983_solr) [   ] 
o.a.s.c.c.ConnectionManager Watcher 
org.apache.solr.common.cloud.ConnectionManager@4f97bdad name: ZooKe
eperConnection 
Watcher:89.188.14.10:2181,89.188.14.11:2181,89.188.14.12:2181/solr_collection_search
 got event WatchedEvent state:Disconnected type:None path:null path: null type: 
None
2017-05-06 18:58:34.893 WARN  
(zkCallback-5-thread-9-processing-n:idx6.example.org:8983_solr) [   ] 
o.a.s.c.c.ConnectionManager zkClient has disconnected
2017-05-06 18:58:35.001 WARN  
(zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr 
x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread) [c:search 
s:shard2 r:core_node6 x:search_shard2_replica3] o.a.s.c.c.ConnectionManager 
Watcher org.apache.solr.common.cloud.ConnectionManager@c226cc name: 
ZooKeeperConnection 
Watcher:89.188.14.10:2181,89.188.14.11:2181,89.188.14.12:2181/solr_collection_search
 got event WatchedEvent state:Disconnected type:None path:null path: null type: 
None
2017-05-06 18:58:35.010 WARN  
(zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr 
x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread) [c:search 
s:shard2 r:core_node6 x:search_shard2_replica3] o.a.s.c.c.ConnectionManager 
zkClient has disconnected
2017-05-06 18:58:45.360 WARN  
(zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr) [   ] 
o.a.s.c.c.ConnectionManager Watcher 
org.apache.solr.common.cloud.ConnectionManager@4f97bdad name: 
ZooKeeperConnection 
Watcher:89.188.14.10:2181,89.188.14.11:2181,89.188.14.12:2181/solr_collection_search
 got event WatchedEvent state:Expired type:None path:null path: null type: None
2017-05-06 18:58:45.360 WARN  
(zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr) [   ] 
o.a.s.c.c.ConnectionManager Our previous ZooKeeper session was expired. 
Attempting to reconnect to recover relationship with ZooKeeper...
2017-05-06 18:58:45.380 WARN  
(OverseerStateUpdate-97740792370385619-idx6.example.org:8983_solr-n_000558) 
[   ] o.a.s.c.Overseer Solr cannot talk to ZK, exiting Overseer main queue loop
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = 
Session expired for /overseer/queue
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
at 
org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:339)
at 
org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:336)
at 
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
at 
org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:336)
at 
org.apache.solr.cloud.DistributedQueue.fetchZkChildren(DistributedQueue.java:308)
at 
org.apache.solr.cloud.DistributedQueue.firstChild(DistributedQueue.java:285)
at 
org.apache.solr.cloud.DistributedQueue.firstElement(DistributedQueue.java:393)
at 
org.apache.solr.cloud.DistributedQueue.peek(DistributedQueue.java:159)
at 
org.apache.solr.cloud.DistributedQueue.peek(DistributedQueue.java:137)
at 
org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:180)
at java.lang.Thread.run(Thread.java:745)
2017-05-06 18:58:45.381 WARN  
(zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr) [   ] 
o.a.s.c.c.DefaultConnectionStrategy Connection expired - starting a new one...
2017-05-06 18:58:45.382 ERROR (OverseerExitThread) [   ] o.a.s.c.Overseer could 
not read the data
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = 
Session expired for /overseer_elect/leader
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at 
org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:356)
at 
org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:353)
at 
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
at 
org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:353)
at 
org.apache.solr.cloud.Overseer$ClusterStateUpdater.checkIfIamStillLeader(Overseer.java:287)
at java.lang.Thread.run(Thread.java:745)
2017-05-06 18:58:46.453 WARN  
(zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr 
x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread) [c:search 
s:shard2 r
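
When sessions expire across several clouds at once, a quick quorum sanity
check from the Solr hosts can help separate network or ZK-side trouble from
client-side GC pauses (a sketch; the addresses are taken from the log above):

# "ruok" should answer "imok" from every quorum member
for zk in 89.188.14.10 89.188.14.11 89.188.14.12; do
  echo -n "$zk: "; echo ruok | nc -w 2 "$zk" 2181; echo
done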

Adding shards dynamically when the automatic routing is used

2017-05-08 Thread mganeshs
All,

Is there any possibility, in a coming release, of adding shards dynamically
when compositeId (default) based routing is used?
Currently the only option is to split a shard; instead, we should be able to
add shards dynamically so that all new documents go to the new shards.
Is there a plan to include this in a coming release?

Regards,
Ganesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Adding-shards-dynamically-when-the-automatic-routing-is-used-tp4333883.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: JSON facet performance for aggregations

2017-05-08 Thread Mikhail Ibraheem
Thanks Yonik.
It is double because our use case allows grouping by any field of any type.
According to your valuable explanation below, is it better in this case to
use flat faceting instead of JSON faceting?
Would indexing the field give us better performance than flat faceting?
Do you recommend streaming in that case?

Please advise.

Thanks
Mikhail

-Original Message-
From: Yonik Seeley [mailto:ysee...@gmail.com] 
Sent: Sunday, May 07, 2017 6:25 PM
To: solr-user@lucene.apache.org
Subject: Re: JSON facet performance for aggregations

OK, so I think I know what's going on.

The current code is more optimized for finding the top K buckets from a total 
of N.
When one asks to return the top 10 buckets when there are potentially millions 
of buckets, it makes sense to defer calculating other metrics for those buckets 
until we know which ones they are.  After we identify the top 10 buckets, we 
calculate the domain for that bucket and use that to calculate the remaining 
metrics.

The current method is obviously much slower when one is requesting
*all* buckets.  We might as well just calculate all metrics in the first pass 
rather than trying to defer them.

This inefficiency is compounded by the fact that the fields are not indexed.  
In the second phase, finding the domain for a bucket is a field query.  For an 
indexed field, this would involve a single term lookup.  For a non-indexed 
docValues field, this involves a full column scan.

If you ever want to do quick lookups on studentId, it would make sense for it 
to be indexed (and why is it a double, anyway?)

I'll open up a JIRA issue for the first problem (don't defer metrics if we're 
going to return all buckets anyway)

-Yonik


On Sun, Apr 30, 2017 at 8:58 AM, Mikhail Ibraheem  
wrote:
> Hi Yonik,
> We are using Solr 6.5
> Both studentId and grades are double:
> <field name="studentId" type="double" indexed="false" stored="true"
> docValues="true" multiValued="false" required="false"/>
>
> We have 1.5 million records.
>
> Thanks
> Mikhail
>
> -Original Message-
> From: Yonik Seeley [mailto:ysee...@gmail.com]
> Sent: Sunday, April 30, 2017 1:04 PM
> To: solr-user@lucene.apache.org
> Subject: Re: JSON facet performance for aggregations
>
> It is odd there would be quite such a big performance delta.
> What version of solr are you using?
> What is the fieldType of "grades"?
> -Yonik
>
>
> On Sun, Apr 30, 2017 at 5:15 AM, Mikhail Ibraheem 
>  wrote:
>> 1-
>> studentId has docValues = true. It is of type double:
>> <field name="studentId" type="double" indexed="false" stored="true"
>> docValues="true" multiValued="false" required="false"/>
>>
>>
>> 2- If we just facet without aggregation it finishes in good time 60ms:
>>
>> json.facet={
>>studentId:{
>>   type:terms,
>>   limit:-1,
>>   field:" studentId "
>>
>>}
>> }
>>
>>
>> Thanks
>>
>>
>> -Original Message-
>> From: Vijay Tiwary [mailto:vijaykr.tiw...@gmail.com]
>> Sent: Sunday, April 30, 2017 10:44 AM
>> To: solr-user@lucene.apache.org
>> Subject: RE: JSON facet performance for aggregations
>>
>> Please enable doc values and try.
>> There is a bug in the source code which causes json facet on string field to 
>> run very slow. On numeric fields it runs fine with doc value enabled.
>>
>> On Apr 30, 2017 1:41 PM, "Mikhail Ibraheem"
>> 
>> wrote:
>>
>>> Hi Vijay,
>>> It is already numeric field.
>>> It is huge difference between json and flat here. Do you know the 
>>> reason for this? Is there a way to improve it ?
>>>
>>> -Original Message-
>>> From: Vijay Tiwary [mailto:vijaykr.tiw...@gmail.com]
>>> Sent: Sunday, April 30, 2017 9:58 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: JSON facet performance for aggregations
>>>
>>> Json facet on string fields run lot slower than on numeric fields.
>>> Try and see if you can represent studentid as a numeric field.
>>>
>>> On Apr 30, 2017 1:19 PM, "Mikhail Ibraheem"
>>> 
>>> wrote:
>>>
>>> > Hi,
>>> >
>>> > I am trying to do aggregation with JSON faceting but performance 
>>> > is very bad for one of the requests:
>>> >
>>> > json.facet={
>>> >
>>> >studentId:{
>>> >
>>> >   type:terms,
>>> >
>>> >   limit:-1,
>>> >
>>> >   field:"studentId",
>>> >
>>> >   facet:{
>>> >
>>> >   x:"sum(grades)"
>>> >
>>> >   }
>>> >
>>> >}
>>> >
>>> > }
>>> >
>>> >
>>> >
>>> > This request finishes in 250 seconds, and we can't paginate for 
>>> > this service for functional reason so we have to use limit:-1, and 
>>> > the cardinality of the studentId is 7500.
>>> >
>>> >
>>> >
>>> > If I try the same with flat facet it finishes in 3 seconds :
>>> > stats=true&facet=true&stats.field={!tag=piv1
>>> > sum=true}grades&facet.pivot={!stats=piv1}studentId
>>> >
>>> >
>>> >
>>> > We are hoping to use one approach json or flat for all our services.
>>> > JSON facet performance is better for many case.
>>> >
>>> >
>>> >
>>> > Please advise on why the performance for this is so bad and if we 
>>> > can improve it. Also what is the de

SPLITSHARD Working

2017-05-08 Thread vrindavda
Hi,

I need to SPLITSHARD such that one split remains on the same machine as
original and another uses new machines for leader and replicas. Is this
possible ? Please let me know what properties do I need to specify in
Collection API to achieve this.

Thank you,
Vrinda Davda



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SPLITSHARD-Working-tp4333876.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Fw: How to secure solr-6.2.0 in standalone mode?

2017-05-08 Thread Rick Leir

Christian

Cool, you prompted me to learn something. Is your answer in the 
following cwiki link?


https://cwiki.apache.org/confluence/display/solr/Kerberos+Authentication+Plugin

cheers -- Rick

google apache kerberos basic auth
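
If basic auth is all that is needed, standalone Solr 6.x reads a
security.json from SOLR_HOME. A minimal sketch (the credentials hash is the
well-known solr/SolrRocks example from the reference guide -- change it):

# Drop a Basic Auth security.json into SOLR_HOME and restart Solr
cat > "$SOLR_HOME/security.json" <<'EOF'
{
  "authentication": {
    "class": "solr.BasicAuthPlugin",
    "credentials": {
      "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
    }
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "user-role": { "solr": "admin" },
    "permissions": [ { "name": "security-edit", "role": "admin" } ]
  }
}
EOF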

On 2017-05-07 05:21 PM, FOTACHE CHRISTIAN wrote:
Hi

I'm using solr-6.2.0 in standalone mode and I need to set up security with
Kerberos (???) for standalone.
I have previously set up basic authentication for solr-6.1.0, but it seems
that solr-6.2.0 has a pretty different approach when it comes to security...
I can't make it happen. Please help.
Thank you,

Christian Fotache Tel: 0728.297.207