Re: Solr 8.0 Json Facets are slow - need help

2020-01-21 Thread kumar gaurav
Hi Mikhail,

Can you please help?

On Tue, Jan 21, 2020 at 7:48 PM kumar gaurav  wrote:

> Hi Mikhail
>
> Thanks for your reply. Please help me with this.
>
> Following are the screenshots:
>
> [image: image.png]
>
>
> [image: image.png]
>
>
> json.facet debug output:
>
> json: {
>   facet: {
>     color_refine: {
>       domain: {
>         excludeTags: "rassortment,top,top2,top3,top4,",
>         filter: [
>           "{!filters param=$child.fq excludeTags=rcolor_refine v=$sq}",
>           "{!child of=$pq filters=$fq}docType:(product collection)"
>         ]
>       },
>       type: "terms",
>       field: "color_refine",
>       limit: -1,
>       facet: {
>         productsCount: "uniqueBlock(_root_)"
>       }
>     },
>     size_refine: {
>       domain: {
>         excludeTags: "rassortment,top,top2,top3,top4,",
>         filter: [
>           "{!filters param=$child.fq excludeTags=rsize_refine v=$sq}",
>           "{!child of=$pq filters=$fq}docType:(product collection)"
>         ]
>       },
>       type: "terms",
>       field: "size_refine",
>       limit: -1,
>       facet: {
>         productsCount: "uniqueBlock(_root_)"
>       }
>     }
>   }
> }
>
>
>
> regards
> Kumar Gaurav
>
>
> On Tue, Jan 21, 2020 at 5:25 PM Mikhail Khludnev  wrote:
>
>> Hi.
>> Can you share debugQuery=true output?
>>
>> On Tue, Jan 21, 2020 at 1:37 PM kumar gaurav  wrote:
>>
>> > Hi,
>> >
>> > I have a parent-child query in which I use JSON facets for child
>> > faceting, as follows.
>> >
>> > qt=/dismax
>> > matchAllQueryRef1=+(+({!query v=$cq}))
>> > sq=+{!lucene v=$matchAllQueryRef1}
>> > q={!parent tag=top which=$pq filters=$child.fq score=max v=$cq}
>> > child.fq={!tag=rcolor_refine}filter({!term f=color_refine
>> > v=$qcolor_refine1}) filter({!term f=color_refine v=$qcolor_refine2})
>> > qcolor_refine1=Blue
>> > qcolor_refine2=Other clrs
>> > cq=+{!simpleFilter v=docType:sku}
>> > pq=docType:(product)
>> > facet=true
>> > facet.mincount=1
>> > facet.limit=-1
>> > facet.missing=false
>> > json.facet= {color_refine:{
>> > domain:{
>> > filter:["{!filters param=$child.fq excludeTags=rcolor_refine
>> > v=$sq}","{!child of=$pq filters=$fq}docType:(product)"]
>> >},
>> > type:terms,
>> > field:color_refine,
>> > limit:-1,
>> > facet:{productsCount:"uniqueBlock(_root_)"}}}
>> >
>> > schema :-
>> > > > multiValued="true" docValues="true"/>
>> >
>> > I have observed that the JSON facets are slow. They take much more
>> > time than expected.
>> > Can anyone please check this query, especially the child.fq and
>> > json.facet parts?
>> >
>> > Please help me with this.
>> >
>> > Thanks & regards
>> > Kumar Gaurav
>> >
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>
>
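Mikhail asked for debugQuery=true output; one quick way to issue the same kind of request with debug enabled is to build the URL by hand. The sketch below uses only the JDK; the host, collection name, and facet body are illustrative placeholders, not values taken from the thread:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds the query string for a JSON Facet request with debugQuery=true.
// Host, collection name and the facet body are illustrative placeholders.
public class JsonFacetDebugUrl {

    static String buildQuery(String q, String jsonFacet) {
        return "q=" + enc(q)
                + "&json.facet=" + enc(jsonFacet)
                + "&debugQuery=true";
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Minimal facet over one field, mirroring the shape used in the thread.
        String facet = "{color_refine:{type:terms,field:color_refine,limit:-1,"
                + "facet:{productsCount:\"uniqueBlock(_root_)\"}}}";
        System.out.println("http://localhost:8983/solr/mycollection/select?"
                + buildQuery("{!parent which=docType:product}docType:sku", facet));
    }
}
```

The timing breakdown for each facet then appears under the `debug` section of the response, which is what was requested above.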


Error when posting data to collection

2020-01-21 Thread Suhrid.Ghosh(tayana.in)
Hello,

I tried the steps below and am facing issues. Can you please suggest a
solution for the problem shown below:

[wtda@spark-1 solr-8.4.0]$ ./bin/solr start -e cloud
*** [WARN] *** Your open file limit is currently 1024.
 It should be set to 65000 to avoid operational disruption.
 If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in 
your profile or solr.in.sh
*** [WARN] ***  Your Max Processes Limit is currently 4096.
 It should be set to 65000 to avoid operational disruption.
 If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in 
your profile or solr.in.sh

Welcome to the SolrCloud example!

This interactive session will help you launch a SolrCloud cluster on your local 
workstation.
To begin, how many Solr nodes would you like to run in your local cluster? 
(specify 1-4 nodes) [2]:
1
Ok, let's start up 1 Solr nodes for your example SolrCloud cluster.
Please enter the port for node1 [8983]:
8000
Solr home directory /home/wtda/solr-8.4.0/example/cloud/node1/solr already 
exists.

Starting up Solr on port 8000 using command:
"bin/solr" start -cloud -p 8000 -s "example/cloud/node1/solr"

*** [WARN] *** Your open file limit is currently 4096.
 It should be set to 65000 to avoid operational disruption.
 If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in 
your profile or solr.in.sh
*** [WARN] ***  Your Max Processes Limit is currently 4096.
 It should be set to 65000 to avoid operational disruption.
 If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in 
your profile or solr.in.sh
Waiting up to 180 seconds to see Solr running on port 8000 [\]
Started Solr server on port 8000 (pid=11904). Happy searching!

   INFO  - 
2020-01-20 17:26:08.008; org.apache.solr.common.cloud.ConnectionManager; 
Waiting for client to connect to ZooKeeper
INFO  - 2020-01-20 17:26:08.027; 
org.apache.solr.common.cloud.ConnectionManager; zkClient has connected
INFO  - 2020-01-20 17:26:08.027; 
org.apache.solr.common.cloud.ConnectionManager; Client is connected to ZooKeeper
INFO  - 2020-01-20 17:26:08.046; org.apache.solr.common.cloud.ZkStateReader; 
Updated live nodes from ZooKeeper... (0) -> (3)
INFO  - 2020-01-20 17:26:08.065; 
org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider; Cluster at 
localhost:9000 ready

Now let's create a new collection for indexing documents in your 1-node cluster.
Please provide a name for your new collection: [gettingstarted]
techproducts

Collection 'techproducts' already exists!
Do you want to re-use the existing collection or create a new one? Enter 1 to 
reuse, 2 to create new [1]:
sampletechproducts
Please enter a 1 or 2 between 1 and 2 [1]:
2
Please provide a name for your new collection: [techproducts]
sampletechproducts
How many shards would you like to split sampletechproducts into? [2]
2
How many replicas per shard would you like to create? [2]
2
Please choose a configuration for the sampletechproducts collection, available 
options are:
_default or sample_techproducts_configs [_default]
sample_techproducts_configs
Created collection 'sampletechproducts' with 2 shard(s), 2 replica(s) with 
config-set 'sampletechproducts'

Enabling auto soft-commits with maxTime 3 secs using the Config API

POSTing request to Config API: 
http://localhost:8000/solr/sampletechproducts/config
{"set-property":{"updateHandler.autoSoftCommit.maxTime":"3000"}}
Successfully set-property updateHandler.autoSoftCommit.maxTime to 3000


SolrCloud example running, please visit: http://localhost:8000/solr

[wtda@spark-1 solr-8.4.0]$ bin/post -c sampletechproducts example/exampledocs/*
/usr/java/jdk1.8.0_162//bin/java -classpath 
/home/wtda/solr-8.4.0/dist/solr-core-8.4.0.jar -Dauto=yes 
-Dc=sampletechproducts -Ddata=files org.apache.solr.util.SimplePostTool 
example/exampledocs/books.csv example/exampledocs/books.json 
example/exampledocs/gb18030-example.xml example/exampledocs/hd.xml 
example/exampledocs/ipod_other.xml example/exampledocs/ipod_video.xml 
example/exampledocs/manufacturers.xml example/exampledocs/mem.xml 
example/exampledocs/money.xml example/exampledocs/monitor2.xml 
example/exampledocs/monitor.xml example/exampledocs/more_books.jsonl 
example/exampledocs/mp500.xml example/exampledocs/post.jar 
example/exampledocs/sample.html example/exampledocs/sd500.xml 
example/exampledocs/solr-word.pdf example/exampledocs/solr.xml 
example/exampledocs/test_utf8.sh example/exampledocs/utf8-example.xml 
example/exampledocs/vidcard.xml
SimplePostTool version 5.0.0
Posting files to [base] url 
http://localhost:8983/solr/sampletechproducts/update...
Entering auto mode. File endings considered are 
xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file books.csv (text/csv) to [base]
SimplePostTool: WARNING: Solr ret

Re: Index growing and growing until restart

2020-01-21 Thread Jörn Franke
OK, I created the collection from scratch based on the config.

Unfortunately, it does not improve. The index just keeps growing, except
when I stop Solr: during startup the unnecessary index files are purged.
Even with the previous config this did not happen in older Solr versions
(definitely not in 8.2, maybe in 8.3, but definitely in 8.4).

Reproduction is simple: just load documents into the index. Even during the
first load I observe a significant (roughly 4x) index size increase that is
then reduced after a restart.

I observe, though, that during metadata updates (= atomic updates) the index
roughly doubles in size (nowhere near what is expected from the update
itself) and then shrinks only slightly (a few megabytes, nothing compared to
the full size the index now has).

At the moment it looks to me like it is due to the Solr version, because the
config did not change (we have all of them versioned; I checked). However,
maybe I am overlooking something.

Furthermore, it seems that during segment merges old segments are not
deleted until restart (but again, this is speculation).
I suspect not many have observed this, because the only ways it would be
noticed are: 1) they index a collection completely from scratch and see huge
index file consumption, or 2) they update their collection a lot and hit a
disk space limit (which in some cases may not happen soon).

I created a JIRA: https://issues.apache.org/jira/browse/SOLR-14202

Please let me know if I can test anything else.

On Tue, Jan 21, 2020 at 10:58 PM Jörn Franke  wrote:

> After testing update?commit=true I now face an error: "Maximum lock
> count exceeded". Strange - this is the first time I have seen this in the
> lock files, and it happens when doing commit=true.
> java.lang.Error: Maximum lock count exceeded
> at
> java.base/java.util.concurrent.locks.ReentrantReadWriteLock$Sync.fullTryAcquireShared(ReentrantReadWriteLock.java:535)
> at
> java.base/java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryAcquireShared(ReentrantReadWriteLock.java:494)
> at
> java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1368)
> at
> java.base/java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:882)
> at
> org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:179)
> at
> org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:124)
> at
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:658)
> at
> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:102)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1079)
> at
> org.apache.solr.update.processor.DistributedZkUpdateProcessor.processCommit(DistributedZkUpdateProcessor.java:220)
> at
> org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:160)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
> at
> org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
> at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:62)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)
> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
> at
> org.eclipse.jetty.server.ha

Re: Index growing and growing until restart

2020-01-21 Thread Jörn Franke
After testing update?commit=true I now face an error: "Maximum lock
count exceeded". Strange - this is the first time I have seen this in the
lock files, and it happens when doing commit=true.
java.lang.Error: Maximum lock count exceeded
at
java.base/java.util.concurrent.locks.ReentrantReadWriteLock$Sync.fullTryAcquireShared(ReentrantReadWriteLock.java:535)
at
java.base/java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryAcquireShared(ReentrantReadWriteLock.java:494)
at
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1368)
at
java.base/java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:882)
at
org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:179)
at
org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:124)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:658)
at
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:102)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1079)
at
org.apache.solr.update.processor.DistributedZkUpdateProcessor.processCommit(DistributedZkUpdateProcessor.java:220)
at
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:160)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
at
org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:62)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)
at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)
at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678)
at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.Server.handle(Server.java:505)
at org.eclipse.jetty.ser

Re: Index growing and growing until restart

2020-01-21 Thread Jörn Franke
The only weird thing is that I see entries such as
${solr.autoCommit.maxTime:15000}. It looks like a template gone wrong, but
this was not caused by internal development; it must have come from a Solr
version.

On Tue, Jan 21, 2020 at 10:49 PM Jörn Franke  wrote:

> It is, by the way, a Linux system, and autoSoftCommit is set to -1.
> However, openSearcher is indeed set to false. A commit (commit=true) is
> issued after doing all the updates, but the index is not shrinking. The
> files do not disappear during shutdown, but they disappear after starting
> up again.
>
> On Tue, Jan 21, 2020 at 4:04 PM Jörn Franke  wrote:
>
>> thanks for the answer I will look into it - it is a possible explanation.
>>
>> > On 20.01.2020 at 14:30, Erick Erickson wrote:
>> >
>> > Jörn:
>> >
>> > The only thing I can think of that _might_ cause this (I’m not all that
>> familiar with the code) is if your solrconfig settings never open a
>> searcher. Either you need to be sure openSearcher is set to true in the
>> autocommit section in solrconfig.xml or your autoSoftCommit is set to
>> something other than -1. Real Time Get requires access to all segments and
>> it takes a new searcher being opened to release them. Actually, a very
>> quick test would be to submit 
>> “http://host:port/solr/collection/update?commit=true”
>> and see if the index shrinks as a result. You don’t need to change
>> solrconfig.xml for that test.
>> >
>> > If you are opening a new searcher, this is very concerning. There
>> shouldn’t be anything else you have to set to prevent the index from
>> growing. Could you check one thing? Compare the directory listing of the
>> data/index directory just before you shut down Solr and then just after.
>> What I’m  interested in is whether some subset of files disappears when you
>> shut down Solr. This assumes you’re running on a *nix system, if Windows
>> you may have to start Solr again to see the difference.
>> >
>> > So if you open a searcher and still see the problem, I can try to
>> reproduce it. Can you share your solrconfig file or at least the autocommit
>> and cache portions?
>> >
>> > Best,
>> > Erick
>> >
>> >> On Jan 20, 2020, at 5:40 AM, Jörn Franke  wrote:
>> >>
>> >> From what I see, it basically duplicates the index files, but does not
>> delete the old ones.
>> >> It uses caffeine cache.
>> >>
>> >> What I observe is that there is an exception when shutting down for
>> the collection that is updated - timeout waiting for all directory ref
>> counts to be released - gave up waiting on CacheDir.
>> >>
>>  On 20.01.2020 at 11:26, Jörn Franke wrote:
>> >>>
>> >>> Sorry, I missed a line: it is not the tlog that is growing but the
>> /data/index folder - until restart, when it seems to be purged.
>> >>>
>>  On 20.01.2020 at 10:47, Jörn Franke wrote:
>> 
>>  Hi,
>> 
>>  I have a test system here with Solr 8.4 (but this is also
>> reproducible in older Solr versions), which has an index that is growing
>> and growing - until the SolrCloud instance is restarted - then it is
>> reduced to the expected normal size.
>>  The collection is configured to auto-commit after 15000 ms. I
>> expect the index growth comes from the use of atomic updates, but I
>> would expect that, due to the auto commit, it does not grow all the time.
>>  After the atomic updates a commit is done in any case.
>> 
>>  I don’t see any error message in the log files, but the growth is
>> quite significant and frequent restarts are not a solution, of course.
>> 
>>  Maybe I am overlooking here a tiny configuration issue?
>> 
>>  Thank you.
>> 
>> 
>>  Best regards
>> >
>>
>


Re: Index growing and growing until restart

2020-01-21 Thread Jörn Franke
It is, by the way, a Linux system, and autoSoftCommit is set to -1. However,
openSearcher is indeed set to false. A commit (commit=true) is issued after
doing all the updates, but the index is not shrinking. The files do not
disappear during shutdown, but they disappear after starting up again.

On Tue, Jan 21, 2020 at 4:04 PM Jörn Franke  wrote:

> thanks for the answer I will look into it - it is a possible explanation.
>
> > On 20.01.2020 at 14:30, Erick Erickson wrote:
> >
> > Jörn:
> >
> > The only thing I can think of that _might_ cause this (I’m not all that
> familiar with the code) is if your solrconfig settings never open a
> searcher. Either you need to be sure openSearcher is set to true in the
> autocommit section in solrconfig.xml or your autoSoftCommit is set to
> something other than -1. Real Time Get requires access to all segments and
> it takes a new searcher being opened to release them. Actually, a very
> quick test would be to submit 
> “http://host:port/solr/collection/update?commit=true”
> and see if the index shrinks as a result. You don’t need to change
> solrconfig.xml for that test.
> >
> > If you are opening a new searcher, this is very concerning. There
> shouldn’t be anything else you have to set to prevent the index from
> growing. Could you check one thing? Compare the directory listing of the
> data/index directory just before you shut down Solr and then just after.
> What I’m  interested in is whether some subset of files disappears when you
> shut down Solr. This assumes you’re running on a *nix system, if Windows
> you may have to start Solr again to see the difference.
> >
> > So if you open a searcher and still see the problem, I can try to
> reproduce it. Can you share your solrconfig file or at least the autocommit
> and cache portions?
> >
> > Best,
> > Erick
> >
> >> On Jan 20, 2020, at 5:40 AM, Jörn Franke  wrote:
> >>
> >> From what I see, it basically duplicates the index files, but does not
> delete the old ones.
> >> It uses caffeine cache.
> >>
> >> What I observe is that there is an exception when shutting down for the
> collection that is updated - timeout waiting for all directory ref counts
> to be released - gave up waiting on CacheDir.
> >>
>  On 20.01.2020 at 11:26, Jörn Franke wrote:
> >>>
> >>> Sorry, I missed a line: it is not the tlog that is growing but the
> /data/index folder - until restart, when it seems to be purged.
> >>>
>  On 20.01.2020 at 10:47, Jörn Franke wrote:
> 
>  Hi,
> 
>  I have a test system here with Solr 8.4 (but this is also
> reproducible in older Solr versions), which has an index that is growing
> and growing - until the SolrCloud instance is restarted - then it is
> reduced to the expected normal size.
>  The collection is configured to auto-commit after 15000 ms. I
> expect the index growth comes from the use of atomic updates, but I
> would expect that, due to the auto commit, it does not grow all the time.
>  After the atomic updates a commit is done in any case.
> 
>  I don’t see any error message in the log files, but the growth is
> quite significant and frequent restarts are not a solution, of course.
> 
>  Maybe I am overlooking here a tiny configuration issue?
> 
>  Thank you.
> 
> 
>  Best regards
> >
>


BooleanQueryBuilder is not adding parentheses around the query

2020-01-21 Thread Arnold Bronley
Hi,

BooleanQueryBuilder is not adding parentheses around the query. It
only adds a + sign at the start of the query, but not parentheses around
the query. Why is that? How should I add them?

booleanQueryBuilder.add(query, BooleanClause.Occur.MUST)
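For context, Lucene's Query.toString only prints parentheses around a BooleanQuery that is nested inside another BooleanQuery; the top-level query itself is never parenthesized. So to get a parenthesized group, nest one builder inside another. A sketch (assumes lucene-core on the classpath; field and term values are illustrative):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Parentheses appear in toString() only for a BooleanQuery nested inside
// another BooleanQuery, so wrap the clauses in an inner builder.
public class NestedBooleanQuery {
    public static void main(String[] args) {
        Query a = new TermQuery(new Term("f", "a"));
        Query b = new TermQuery(new Term("f", "b"));

        // Inner group: f:a OR f:b
        BooleanQuery inner = new BooleanQuery.Builder()
                .add(a, BooleanClause.Occur.SHOULD)
                .add(b, BooleanClause.Occur.SHOULD)
                .build();

        // Outer query: the inner group is a required clause, so it is
        // rendered in parentheses.
        BooleanQuery outer = new BooleanQuery.Builder()
                .add(inner, BooleanClause.Occur.MUST)
                .build();

        System.out.println(outer); // +(f:a f:b)
    }
}
```

A single clause added directly to one builder, as in the snippet above, renders as just `+f:a` with no grouping, which matches the behavior described.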


Re: Need help in configuring Spell check in Apache Solr 8.4

2020-01-21 Thread kumar gaurav
Can you share the spellcheck component and handler configuration you have used?

On Mon, Jan 20, 2020 at 3:35 PM seeteshh  wrote:

> Hello all,
>
> I am not able to test the spell check feature in Apache Solr 8.4.
>
> Tried multiple examples including
>
>
> https://examples.javacodegeeks.com/enterprise-java/apache-solr/solr-spellcheck-example/
>
> However, I am not getting any results.
>
> Regards,
>
> Seetesh Hindlekar
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: ConnectionImpl.isValid() does not behave as described in Connection javadocs

2020-01-21 Thread Kevin Risden
Nick - Feel free to open a Jira and PR. I think the disconnect is the
meaning of timeout=0 between JDBC and the Solr client.

Kevin Risden


On Sun, Jan 19, 2020 at 3:34 PM Nick Vercammen 
wrote:

> I think so, as the ConnectionImpl in Solr is not in line with the
> description of the Java Connection interface.
>
> > On 19 Jan 2020 at 21:23, Erick Erickson wrote:
> >
> > Is this a Solr issue?
> >
> >> On Sun, Jan 19, 2020, 14:24 Nick Vercammen 
> >> wrote:
> >>
> >> Hello,
> >>
> >> I'm trying to write a solr driver for metabase. Internally metabase
> uses a
> >> C3P0 connection pool. Upon checkout of the connection from the pool the
> >> library does a call to isValid(0) (timeout = 0)
> >>
> >> According to the javadocs
> >> (https://docs.oracle.com/en/java/javase/11/docs/api/java.sql/java/sql/Connection.html#isValid(int)),
> >> a timeout = 0 means no timeout. In the current implementation a
> >> timeout = 0 means that the connection is always invalid.
> >>
> >> I can provide a PR for this.
> >>
> >> Nick
> >>
> >> --
> >> Zeticon
> >> Nick Vercammen
> >> CTO
> >> +32 9 275 31 31
> >> +32 471 39 77 36
> >> nick.vercam...@zeticon.com
> >> https://twitter.com/mediahaven
> >> www.zeticon.com
> >>
>
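The fix under discussion boils down to treating timeout = 0 as "no limit" when translating the JDBC contract into a client-side timeout. A minimal stdlib sketch of that contract (class and method names here are illustrative, not Solr's actual code):

```java
import java.sql.SQLException;

// Sketch of the isValid(timeout) contract from java.sql.Connection:
// 0 means "no timeout", and a negative value must raise SQLException.
// Names are illustrative, not Solr's actual implementation.
public class IsValidTimeout {

    /** Maps the JDBC timeout (seconds) to a client timeout in millis;
     *  0 stays 0, which the client should treat as "wait indefinitely". */
    static int toClientTimeoutMillis(int seconds) throws SQLException {
        if (seconds < 0) {
            // Required by the Connection#isValid(int) javadoc.
            throw new SQLException("timeout must be >= 0, got " + seconds);
        }
        return seconds * 1000;
    }

    public static void main(String[] args) throws SQLException {
        System.out.println(toClientTimeoutMillis(0));  // 0 -> no timeout
        System.out.println(toClientTimeoutMillis(5));  // 5000 ms
    }
}
```

Under this mapping, a C3P0-style call to isValid(0) validates the connection without a deadline instead of reporting it invalid, which is the behavior the javadoc describes.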


Re: Index growing and growing until restart

2020-01-21 Thread Jörn Franke
Thanks for the answer. I will look into it - it is a possible explanation.

> On 20.01.2020 at 14:30, Erick Erickson wrote:
> 
> Jörn:
> 
> The only thing I can think of that _might_ cause this (I’m not all that 
> familiar with the code) is if your solrconfig settings never open a searcher. 
> Either you need to be sure openSearcher is set to true in the autocommit 
> section in solrconfig.xml or your autoSoftCommit is set to something other 
> than -1. Real Time Get requires access to all segments and it takes a new 
> searcher being opened to release them. Actually, a very quick test would be 
> to submit “http://host:port/solr/collection/update?commit=true” and see if 
> the index shrinks as a result. You don’t need to change solrconfig.xml for 
> that test.
> 
> If you are opening a new searcher, this is very concerning. There shouldn’t 
> be anything else you have to set to prevent the index from growing. Could you 
> check one thing? Compare the directory listing of the data/index directory 
> just before you shut down Solr and then just after. What I’m  interested in 
> is whether some subset of files disappears when you shut down Solr. This 
> assumes you’re running on a *nix system, if Windows you may have to start 
> Solr again to see the difference.
> 
> So if you open a searcher and still see the problem, I can try to reproduce 
> it. Can you share your solrconfig file or at least the autocommit and cache 
> portions? 
> 
> Best,
> Erick
> 
>> On Jan 20, 2020, at 5:40 AM, Jörn Franke  wrote:
>> 
>> From what I see, it basically duplicates the index files, but does not
>> delete the old ones.
>> It uses caffeine cache.
>> 
>> What I observe is that there is an exception when shutting down for the 
>> collection that is updated - timeout waiting for all directory ref counts to 
>> be released - gave up waiting on CacheDir.
>> 
 On 20.01.2020 at 11:26, Jörn Franke wrote:
>>> 
>>> Sorry, I missed a line: it is not the tlog that is growing but the
>>> /data/index folder - until restart, when it seems to be purged.
>>> 
 On 20.01.2020 at 10:47, Jörn Franke wrote:
 
 Hi,
 
 I have a test system here with Solr 8.4 (but this is also reproducible in
 older Solr versions), which has an index that is growing and growing -
 until the SolrCloud instance is restarted - then it is reduced to the
 expected normal size.
 The collection is configured to auto-commit after 15000 ms. I expect
 the index growth comes from the use of atomic updates, but I would
 expect that, due to the auto commit, it does not grow all the time.
 After the atomic updates a commit is done in any case.
 
 I don’t see any error message in the log files, but the growth is quite
 significant and frequent restarts are not a solution, of course.
 
 Maybe I am overlooking here a tiny configuration issue? 
 
 Thank you.
 
 
 Best regards
> 
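Erick's suggestion above about openSearcher and autoSoftCommit corresponds to a solrconfig.xml commit section along these lines (values are illustrative, not taken from the thread):

```xml
<!-- solrconfig.xml (illustrative values): open a new searcher on hard
     commit so references to old segment files can be released -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
    <openSearcher>true</openSearcher>
  </autoCommit>
  <!-- Alternatively, set autoSoftCommit to something other than -1 so a
       soft commit opens the searcher instead -->
  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
  </autoSoftCommit>
</updateHandler>
```

Either path results in a new searcher being opened periodically, which is what allows the deleted segment files to be reclaimed without a restart.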


Call for presentations for ApacheCon North America 2020 now open

2020-01-21 Thread Rich Bowen

Dear Apache enthusiast,

(You’re receiving this message because you are subscribed to one or more 
project mailing lists at the Apache Software Foundation.)


The call for presentations for ApacheCon North America 2020 is now open 
at https://apachecon.com/acna2020/cfp


ApacheCon will be held at the Sheraton, New Orleans, September 28th 
through October 2nd, 2020.


As in past years, ApacheCon will feature tracks focusing on the various 
technologies within the Apache ecosystem, and so the call for 
presentations will ask you to select one of those tracks, or “General” 
if the content falls outside of one of our already-organized tracks. 
These tracks are:


Karaf
Internet of Things
Fineract
Community
Content Delivery
Solr/Lucene (Search)
Gobblin/Big Data Integration
Ignite
Observability
Cloudstack
Geospatial
Graph
Camel/Integration
Flagon
Tomcat
Cassandra
Groovy
Web/httpd
General/Other

The CFP will close Friday, May 1, 2020 8:00 AM (America/New_York time).

Submit early, submit often, at https://apachecon.com/acna2020/cfp

Rich, for the ApacheCon Planners


Re: Solr 8.0 Json Facets are slow - need help

2020-01-21 Thread kumar gaurav
Hi Mikhail,

Thanks for your reply. Please help me with this.

Following are the screenshots:

[image: image.png]


[image: image.png]


json.facet debug output:

json: {
  facet: {
    color_refine: {
      domain: {
        excludeTags: "rassortment,top,top2,top3,top4,",
        filter: [
          "{!filters param=$child.fq excludeTags=rcolor_refine v=$sq}",
          "{!child of=$pq filters=$fq}docType:(product collection)"
        ]
      },
      type: "terms",
      field: "color_refine",
      limit: -1,
      facet: {
        productsCount: "uniqueBlock(_root_)"
      }
    },
    size_refine: {
      domain: {
        excludeTags: "rassortment,top,top2,top3,top4,",
        filter: [
          "{!filters param=$child.fq excludeTags=rsize_refine v=$sq}",
          "{!child of=$pq filters=$fq}docType:(product collection)"
        ]
      },
      type: "terms",
      field: "size_refine",
      limit: -1,
      facet: {
        productsCount: "uniqueBlock(_root_)"
      }
    }
  }
}



regards
Kumar Gaurav


On Tue, Jan 21, 2020 at 5:25 PM Mikhail Khludnev  wrote:

> Hi.
> Can you share debugQuery=true output?
>
> On Tue, Jan 21, 2020 at 1:37 PM kumar gaurav  wrote:
>
> > Hi
> >
> > I have a parent-child query in which I use JSON facets for child
> > faceting, as follows.
> >
> > qt=/dismax
> > matchAllQueryRef1=+(+({!query v=$cq}))
> > sq=+{!lucene v=$matchAllQueryRef1}
> > q={!parent tag=top which=$pq filters=$child.fq score=max v=$cq}
> > child.fq={!tag=rcolor_refine}filter({!term f=color_refine
> > v=$qcolor_refine1}) filter({!term f=color_refine v=$qcolor_refine2})
> > qcolor_refine1=Blue
> > qcolor_refine2=Other clrs
> > cq=+{!simpleFilter v=docType:sku}
> > pq=docType:(product)
> > facet=true
> > facet.mincount=1
> > facet.limit=-1
> > facet.missing=false
> > json.facet= {color_refine:{
> > domain:{
> > filter:["{!filters param=$child.fq excludeTags=rcolor_refine
> > v=$sq}","{!child of=$pq filters=$fq}docType:(product)"]
> >},
> > type:terms,
> > field:color_refine,
> > limit:-1,
> > facet:{productsCount:"uniqueBlock(_root_)"}}}
> >
> > schema :-
> >  > multiValued="true" docValues="true"/>
> >
> > I have observed that the JSON facets are slow; they are taking much more
> > time than expected.
> > Can anyone please check this query, especially the child.fq and
> > json.facet parts?
> >
> > Please help me with this.
> >
> > Thanks & regards
> > Kumar Gaurav
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: Solr 8.0 Json Facets are slow - need help

2020-01-21 Thread Mikhail Khludnev
Hi.
Can you share debugQuery=true output?

On Tue, Jan 21, 2020 at 1:37 PM kumar gaurav  wrote:

> Hi
>
> I have a parent-child query in which I use JSON facets for child
> faceting, as follows.
>
> qt=/dismax
> matchAllQueryRef1=+(+({!query v=$cq}))
> sq=+{!lucene v=$matchAllQueryRef1}
> q={!parent tag=top which=$pq filters=$child.fq score=max v=$cq}
> child.fq={!tag=rcolor_refine}filter({!term f=color_refine
> v=$qcolor_refine1}) filter({!term f=color_refine v=$qcolor_refine2})
> qcolor_refine1=Blue
> qcolor_refine2=Other clrs
> cq=+{!simpleFilter v=docType:sku}
> pq=docType:(product)
> facet=true
> facet.mincount=1
> facet.limit=-1
> facet.missing=false
> json.facet= {color_refine:{
> domain:{
> filter:["{!filters param=$child.fq excludeTags=rcolor_refine
> v=$sq}","{!child of=$pq filters=$fq}docType:(product)"]
>},
> type:terms,
> field:color_refine,
> limit:-1,
> facet:{productsCount:"uniqueBlock(_root_)"}}}
>
> schema :-
>  multiValued="true" docValues="true"/>
>
> I have observed that the JSON facets are slow; they are taking much more
> time than expected.
> Can anyone please check this query, especially the child.fq and json.facet parts?
>
> Please help me with this.
>
> Thanks & regards
> Kumar Gaurav
>


-- 
Sincerely yours
Mikhail Khludnev


Solr 8.0 Json Facets are slow - need help

2020-01-21 Thread kumar gaurav
Hi

I have a parent-child query in which I use JSON facets for child
faceting, as follows.

qt=/dismax
matchAllQueryRef1=+(+({!query v=$cq}))
sq=+{!lucene v=$matchAllQueryRef1}
q={!parent tag=top which=$pq filters=$child.fq score=max v=$cq}
child.fq={!tag=rcolor_refine}filter({!term f=color_refine
v=$qcolor_refine1}) filter({!term f=color_refine v=$qcolor_refine2})
qcolor_refine1=Blue
qcolor_refine2=Other clrs
cq=+{!simpleFilter v=docType:sku}
pq=docType:(product)
facet=true
facet.mincount=1
facet.limit=-1
facet.missing=false
json.facet= {color_refine:{
domain:{
filter:["{!filters param=$child.fq excludeTags=rcolor_refine
v=$sq}","{!child of=$pq filters=$fq}docType:(product)"]
   },
type:terms,
field:color_refine,
limit:-1,
facet:{productsCount:"uniqueBlock(_root_)"}}}

schema :-


I have observed that the JSON facets are slow; they are taking much more
time than expected.
Can anyone please check this query, especially the child.fq and json.facet parts?

Please help me with this.
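(For anyone wanting to reproduce this: a sketch of the parameter set above assembled into a single request URL. The host, port, and collection name "products" are placeholders, not from the original mail; the parameter values are copied from the message.)

```python
from urllib.parse import urlencode

# Assemble the block-join query and json.facet as URL parameters.
# Endpoint and collection name are hypothetical placeholders.
params = {
    "q": "{!parent tag=top which=$pq filters=$child.fq score=max v=$cq}",
    "cq": "+{!simpleFilter v=docType:sku}",
    "pq": "docType:(product)",
    "child.fq": "{!tag=rcolor_refine}filter({!term f=color_refine v=$qcolor_refine1})"
                " filter({!term f=color_refine v=$qcolor_refine2})",
    "qcolor_refine1": "Blue",
    "qcolor_refine2": "Other clrs",
    "json.facet": '{color_refine:{domain:{filter:["{!filters param=$child.fq '
                  'excludeTags=rcolor_refine v=$sq}","{!child of=$pq filters=$fq}'
                  'docType:(product)"]},type:terms,field:color_refine,limit:-1,'
                  'facet:{productsCount:"uniqueBlock(_root_)"}}}',
}
url = "http://localhost:8983/solr/products/select?" + urlencode(params)
print(url)
```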

Thanks & regards
Kumar Gaurav


Re: regarding Extracting text from Images

2020-01-21 Thread Retro
Hello, thank you for the info, I will look into this as well. Yes, we plan to
use it in production, but in the longer run. For the moment I just need to
make it work as a test case.



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: regarding Extracting text from Images

2020-01-21 Thread Retro
Yes, I did. This manual refers to the standalone version of Tika, while I
have the built-in version.



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html