Re: NullPointerException in PeerSync.handleUpdates

2017-11-21 Thread Erick Erickson
Right, if there's no "fixed version" mentioned and if the resolution
is "unresolved", it's not in the code base at all. But that JIRA is
not apparently reproducible, especially on more recent versions that
6.2. Is it possible to test a more recent version (6.6.2 would be my
recommendation).

Erick

On Tue, Nov 21, 2017 at 9:58 PM, S G  wrote:
> My bad. I found it at https://issues.apache.org/jira/browse/SOLR-9453
> but I could not find it in CHANGES.txt, perhaps because it's not yet resolved.
>
> On Tue, Nov 21, 2017 at 9:15 AM, Erick Erickson 
> wrote:
>
>> Did you check the JIRA list? Or CHANGES.txt in more recent versions?
>>
>> On Tue, Nov 21, 2017 at 1:13 AM, S G  wrote:
>> > Hi,
>> >
>> > We are running 6.2 version of Solr and hitting this error frequently.
>> >
>> > Error while trying to recover. core=my_core:java.lang.NullPointerException
>> >     at org.apache.solr.update.PeerSync.handleUpdates(PeerSync.java:605)
>> >     at org.apache.solr.update.PeerSync.handleResponse(PeerSync.java:344)
>> >     at org.apache.solr.update.PeerSync.sync(PeerSync.java:257)
>> >     at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:376)
>> >     at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:221)
>> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> >     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> >     at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> >     at java.lang.Thread.run(Thread.java:745)
>> >
>> >
>> >
>> > Is this a known issue and fixed in some newer version?
>> >
>> >
>> > Thanks
>> > SG
>>


Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Erick Erickson
Well, you can always manually change the ZK nodes, but whether just
setting a node's state to "leader" in ZK and then starting the Solr
instance hosting that node would work... I don't know. Do consider
running CheckIndex on one of the replicas in question first, though.
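
For reference, a minimal CheckIndex run looks roughly like the sketch below
(the jar path, index path and version are placeholders; an HDFS-backed index
would first have to be copied to local disk, e.g. with "hdfs dfs -get"):

  # check a single replica's index directory (read-only unless -exorcise is given)
  java -cp server/solr-webapp/webapp/WEB-INF/lib/lucene-core-6.6.1.jar \
       org.apache.lucene.index.CheckIndex /tmp/UNCLASS_shard14_replica3/index -verbose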

Best,
Erick

On Tue, Nov 21, 2017 at 3:06 PM, Joe Obernberger
 wrote:
> One other data point I just saw on one of the nodes.  It has the following
> error:
> 2017-11-21 22:59:48.886 ERROR (coreZkRegister-1-thread-1-processing-n:leda:9100_solr) [c:UNCLASS s:shard14 r:core_node175 x:UNCLASS_shard14_replica3] o.a.s.c.ShardLeaderElectionContext There was a problem trying to register as the leader:org.apache.solr.common.SolrException: Leader Initiated Recovery prevented leadership
>     at org.apache.solr.cloud.ShardLeaderElectionContext.checkLIR(ElectionContext.java:521)
>     at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:424)
>     at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:170)
>     at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:135)
>     at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:307)
>     at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:216)
>     at org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:684)
>     at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:454)
>     at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:170)
>     at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:135)
>     at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:307)
>     at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:216)
>     at org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:684)
>     at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:454)
>     at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:170)
>     at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:135)
>     at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:307)
>     at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:216)
>
> This stack trace repeats for a long while; looks like a recursive call.
>
>
> -Joe
>
>
> On 11/21/2017 3:24 PM, Hendrik Haddorp wrote:
>>
>> We sometimes also have replicas not recovering. If one replica is left
>> active, the easiest fix is to delete the failed replica and create a new one.
>> When all replicas are down, it helps most of the time to restart one of the
>> nodes that contains a replica in the down state. If that also doesn't get the
>> replica to recover, I would check the logs of that node and also those of the
>> overseer node. I have seen the same issue on Solr using local storage. The
>> main HDFS-related issues we have had so far were the lock files, and that
>> deleting and recreating collections/cores sometimes leaves data behind in
>> HDFS that was not cleaned up, which then causes a conflict.
>>
>> Hendrik
>>
>> On 21.11.2017 21:07, Joe Obernberger wrote:
>>>
>>> We've never run an index this size in anything but HDFS, so I have no
>>> comparison.  What we've been doing is keeping two main collections - all
>>> data, and the last 30 days of data.  Then we handle queries based on date
>>> range. The 30 day index is significantly faster.
>>>
>>> My main concern right now is that 6 of the 100 shards are not coming back
>>> because of no leader.  I've never seen this error before.  Any ideas?
>>> ClusterStatus shows all three replicas with state 'down'.
>>>
>>> Thanks!
>>>
>>> -joe
>>>
>>>
>>> On 11/21/2017 2:35 PM, Hendrik Haddorp wrote:

 We actually also have some performance issues with HDFS at the moment. We
 are doing lots of soft commits for NRT search. Those seem to be slower than
 with local storage. The investigation is however not really far yet.

 We have a setup with 2000 collections, with one shard each and a
 replication factor of 2 or 3. When we restart nodes too fast, that causes
 problems with the overseer queue, which can lead to the queue getting out of
 control and Solr pretty much dying. We are still on Solr 6.3; 6.6 has some
 improvements and should handle these actions faster. I would check what you
 see for "/solr/admin/collections?action=OVERSEERSTATUS&wt=json". The
 critical part is the "overseer_queue_size" value. If this goes up to about
 1 it is pretty much game over on our setup. In that case it seems to be
 best to stop all nodes, clear the queue in ZK and then restart the nodes
 one by one with a gap of like 5min. That normally recovers pretty well.

Re: NullPointerException in PeerSync.handleUpdates

2017-11-21 Thread S G
My bad. I found it at https://issues.apache.org/jira/browse/SOLR-9453
but I could not find it in CHANGES.txt, perhaps because it's not yet resolved.

On Tue, Nov 21, 2017 at 9:15 AM, Erick Erickson 
wrote:

> Did you check the JIRA list? Or CHANGES.txt in more recent versions?
>
> On Tue, Nov 21, 2017 at 1:13 AM, S G  wrote:
> > Hi,
> >
> > We are running 6.2 version of Solr and hitting this error frequently.
> >
> > Error while trying to recover. core=my_core:java.lang.NullPointerException
> >     at org.apache.solr.update.PeerSync.handleUpdates(PeerSync.java:605)
> >     at org.apache.solr.update.PeerSync.handleResponse(PeerSync.java:344)
> >     at org.apache.solr.update.PeerSync.sync(PeerSync.java:257)
> >     at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:376)
> >     at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:221)
> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >     at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >     at java.lang.Thread.run(Thread.java:745)
> >
> >
> >
> > Is this a known issue and fixed in some newer version?
> >
> >
> > Thanks
> > SG
>


Re: tokenstream reusable

2017-11-21 Thread Mikhail Khludnev
Hello, Roxana.
You are probably looking for TeeSinkTokenFilter, but I believe the idea is
cumbersome to implement in Solr.
There is also a preanalyzed field type which can keep a token stream in an
external form.


Re: Merging of index in Solr

2017-11-21 Thread Zheng Lin Edwin Yeo
Hi,

I have encountered this error during the merging of the 3.5TB index.
What could be the cause that led to this?

Exception in thread "main" Exception in thread "Lucene Merge Thread #8"
java.io.

IOException: background merge hit exception: _6f(6.5.1):C7256757
_6e(6.5.1):C646

2072 _6d(6.5.1):C3750777 _6c(6.5.1):C2243594 _6b(6.5.1):C1015431
_6a(6.5.1):C105

0220 _69(6.5.1):c273879 _28(6.4.1):c79011/84:delGen=84
_26(6.4.1):c44960/8149:de

lGen=100 _29(6.4.1):c73855/68:delGen=68 _5(6.4.1):C46672/31:delGen=31
_68(6.5.1)

:c66 into _6g [maxNumSegments=1]

at
org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1931)



at
org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1871)



at
org.apache.lucene.misc.IndexMergeTool.main(IndexMergeTool.java:57)

Caused by: java.io.IOException: The requested operation could not be
completed d

ue to a file system limitation

at sun.nio.ch.FileDispatcherImpl.write0(Native Method)

at sun.nio.ch.FileDispatcherImpl.write(Unknown Source)

at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)

at sun.nio.ch.IOUtil.write(Unknown Source)

at sun.nio.ch.FileChannelImpl.write(Unknown Source)

at java.nio.channels.Channels.writeFullyImpl(Unknown Source)

at java.nio.channels.Channels.writeFully(Unknown Source)

at java.nio.channels.Channels.access$000(Unknown Source)

at java.nio.channels.Channels$1.write(Unknown Source)

at
org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory

.java:419)

at java.util.zip.CheckedOutputStream.write(Unknown Source)

at java.io.BufferedOutputStream.flushBuffer(Unknown Source)

at java.io.BufferedOutputStream.write(Unknown Source)

at
org.apache.lucene.store.OutputStreamIndexOutput.writeBytes(OutputStre

amIndexOutput.java:53)

at
org.apache.lucene.store.RateLimitedIndexOutput.writeBytes(RateLimited

IndexOutput.java:73)

at org.apache.lucene.store.DataOutput.writeBytes(DataOutput.java:52)

at
org.apache.lucene.codecs.lucene50.ForUtil.writeBlock(ForUtil.java:175

)

at
org.apache.lucene.codecs.lucene50.Lucene50PostingsWriter.addPosition(

Lucene50PostingsWriter.java:286)

at
org.apache.lucene.codecs.PushPostingsWriterBase.writeTerm(PushPosting

sWriterBase.java:156)

at
org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.w

rite(BlockTreeTermsWriter.java:866)

at
org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTr

eeTermsWriter.java:344)

at
org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105

)

at
org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter

.merge(PerFieldPostingsFormat.java:164)

at
org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:2

16)

at
org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:101)

at
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4353

)

at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3928)

at
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMe

rgeScheduler.java:624)

at
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(Conc

urrentMergeScheduler.java:661)

org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException:
The req

uested operation could not be completed due to a file system limitation

at
org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException

(ConcurrentMergeScheduler.java:703)

at
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(Conc

urrentMergeScheduler.java:683)

Caused by: java.io.IOException: The requested operation could not be
completed d

ue to a file system limitation

at sun.nio.ch.FileDispatcherImpl.write0(Native Method)

at sun.nio.ch.FileDispatcherImpl.write(Unknown Source)

at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)

at sun.nio.ch.IOUtil.write(Unknown Source)

at sun.nio.ch.FileChannelImpl.write(Unknown Source)

at java.nio.channels.Channels.writeFullyImpl(Unknown Source)

at java.nio.channels.Channels.writeFully(Unknown Source)

at java.nio.channels.Channels.access$000(Unknown Source)

at java.nio.channels.Channels$1.write(Unknown Source)

Regards,
Edwin
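
For reference, the IndexMergeTool invocation quoted below takes the destination
index directory first, followed by the source index directories. A fuller
sketch, with placeholder Windows paths:

  java -cp lucene-core-6.5.1.jar;lucene-misc-6.5.1.jar org.apache.lucene.misc.IndexMergeTool D:\merged\index D:\core1\data\index D:\core2\data\index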

On 22 November 2017 at 00:10, Zheng Lin Edwin Yeo 
wrote:

> I am using the IndexMergeTool from Solr, from the command below:
>
> java -classpath lucene-core-6.5.1.jar;lucene-misc-6.5.1.jar
> org.apache.lucene.misc.IndexMergeTool
>
> The heap size is 32GB. There are more than 20 million documents in the two
> cores.
>
> Regards,
> Edwin
>
>
>
> On 21 November 2017 at 21:54, Shawn Heisey  wrote:
>
>> On 11/20/2017 9:35 AM, Zheng Lin Edwin Yeo wrote:
>>
>>> Does anyone knows how long 

FORCELEADER not working - solr 6.6.1

2017-11-21 Thread Joe Obernberger
Hi All - sorry for the repeat, but I'm at a complete loss on this.  I 
have a collection with 100 shards and 3 replicas each.  6 of the shards 
will not elect a leader.  I've tried the FORCELEADER command, but 
nothing changes.


The log shows 'Force leader attempt 1.  Waiting 5 secs for an active 
leader'  It tries 9 times, and then stops.
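
For reference, the FORCELEADER call takes the collection and shard, roughly 
like this (host and port are placeholders for one of the nodes):

  curl 'http://localhost:8983/solr/admin/collections?action=FORCELEADER&collection=UNCLASS&shard=shard21'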


The error that I get for a shard in question is:

org.apache.solr.common.SolrException: Error getting leader from zk for shard shard21
    at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:996)
    at org.apache.solr.cloud.ZkController.register(ZkController.java:902)
    at org.apache.solr.cloud.ZkController.register(ZkController.java:846)
    at org.apache.solr.core.ZkContainer.lambda$registerInZk$0(ZkContainer.java:181)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.solr.common.SolrException: Could not get leader props
    at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1043)
    at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1007)
    at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:963)
    ... 7 more
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /collections/UNCLASS/leaders/shard21/leader
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
    at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:357)
    at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:354)
    at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
    at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:354)
    at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1021)
    ... 9 more

Please help.  Thank you!

-Joe



Re: Solr 7.x: Issues with unique()/hll() function on a string field nested in a range facet

2017-11-21 Thread Yonik Seeley
I opened https://issues.apache.org/jira/browse/SOLR-11664 to track this.
I should be able to look into this shortly if no one else does.

-Yonik


On Tue, Nov 21, 2017 at 6:02 PM, Yonik Seeley  wrote:
> Thanks for the complete info that allowed me to easily reproduce this!
> The bug seems to extend beyond hll/unique... I tried min(string_s) and
> got wonky results as well.
>
> -Yonik
>
>
> On Tue, Nov 21, 2017 at 7:47 AM, Volodymyr Rudniev  
> wrote:
>> Hello,
>>
>> I've encountered 2 issues while trying to apply unique()/hll() function to a
>> string field inside a range facet:
>>
>> Results are incorrect for a single-valued string field.
>> I’m getting ArrayIndexOutOfBoundsException for a multi-valued string field.
>>
>>
>> How to reproduce:
>>
>> Create a core based on the default configSet.
>> Add several simple documents to the core, like these:
>>
>> [
>>   {
>> "id": "14790",
>> "int_i": 2010,
>> "date_dt": "2010-01-01T00:00:00Z",
>> "string_s": "a",
>> "string_ss": ["a", "b"]
>>   },
>>   {
>> "id": "12254",
>> "int_i": 2014,
>> "date_dt": "2014-01-01T00:00:00Z",
>> "string_s": "e",
>> "string_ss": ["b", "c"]
>>   },
>>   {
>> "id": "12937",
>> "int_i": 2008,
>> "date_dt": "2008-01-01T00:00:00Z",
>> "string_s": "c",
>> "string_ss": ["c", "d"]
>>   },
>>   {
>> "id": "10575",
>> "int_i": 2008,
>> "date_dt": "2008-01-01T00:00:00Z",
>> "string_s": "b",
>> "string_ss": ["d", "e"]
>>   },
>>   {
>> "id": "13644",
>> "int_i": 2014,
>> "date_dt": "2014-01-01T00:00:00Z",
>> "string_s": "e",
>> "string_ss": ["e", "a"]
>>   },
>>   {
>> "id": "8405",
>> "int_i": 2014,
>> "date_dt": "2014-01-01T00:00:00Z",
>> "string_s": "d",
>> "string_ss": ["a", "b"]
>>   },
>>   {
>> "id": "6128",
>> "int_i": 2008,
>> "date_dt": "2008-01-01T00:00:00Z",
>> "string_s": "a",
>> "string_ss": ["b", "c"]
>>   },
>>   {
>> "id": "5220",
>> "int_i": 2015,
>> "date_dt": "2015-01-01T00:00:00Z",
>> "string_s": "d",
>> "string_ss": ["c", "d"]
>>   },
>>   {
>> "id": "6850",
>> "int_i": 2012,
>> "date_dt": "2012-01-01T00:00:00Z",
>> "string_s": "b",
>> "string_ss": ["d", "e"]
>>   },
>>   {
>> "id": "5748",
>> "int_i": 2014,
>> "date_dt": "2014-01-01T00:00:00Z",
>> "string_s": "e",
>> "string_ss": ["e", "a"]
>>   }
>> ]
>>
>> 3. Try queries like the following for a single-valued string field:
>>
>> q=*:*=0={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"int_i","gap":1,"missing":false,"start":2008,"end":2016,"type":"range","facet":{"distinct_count":"unique(string_s)"
>>
>> q=*:*=0={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"date_dt","gap":"%2B1YEAR","missing":false,"start":"2008-01-01T00:00:00Z","end":"2016-01-01T00:00:00Z","type":"range","facet":{"distinct_count":"unique(string_s)"
>>
>> Distinct counts returned are incorrect in general. For example, for the set
>> of documents above, the response will contain:
>>
>> {
>> "val": 2010,
>> "count": 1,
>> "distinct_count": 0
>> }
>>
>> and
>>
>> "between": {
>> "count": 10,
>> "distinct_count": 1
>> }
>>
>> (there should be 5 distinct values).
>>
>> Note, the result depends on the order in which the documents are added.
>>
>> 4. Try queries like the following for a multi-valued string field:
>>
>> q=*:*=0={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"int_i","gap":1,"missing":false,"start":2008,"end":2016,"type":"range","facet":{"distinct_count":"unique(string_ss)"
>>
>> q=*:*=0={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"date_dt","gap":"%2B1YEAR","missing":false,"start":"2008-01-01T00:00:00Z","end":"2016-01-01T00:00:00Z","type":"range","facet":{"distinct_count":"unique(string_ss)"
>>
>> I’m getting ArrayIndexOutOfBoundsException for such queries.
>>
>> Note, everything looks Ok for other field types (I tried single- and
>> multi-valued ints, doubles and dates) or when the enclosing facet is a terms
>> facet or there is no enclosing facet at all.
>>
>> I can reproduce these issues both for Solr 7.0.1 and 7.1.0. Solr 6.x and
>> 5.x, as it seems, do not have such issues.
>>
>> Is it a bug? Or maybe I’ve missed something?
>>
>> Thanks,
>>
>> Volodymyr
>>


Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Joe Obernberger
One other data point I just saw on one of the nodes.  It has the 
following error:
2017-11-21 22:59:48.886 ERROR (coreZkRegister-1-thread-1-processing-n:leda:9100_solr) [c:UNCLASS s:shard14 r:core_node175 x:UNCLASS_shard14_replica3] o.a.s.c.ShardLeaderElectionContext There was a problem trying to register as the leader:org.apache.solr.common.SolrException: Leader Initiated Recovery prevented leadership
    at org.apache.solr.cloud.ShardLeaderElectionContext.checkLIR(ElectionContext.java:521)
    at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:424)
    at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:170)
    at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:135)
    at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:307)
    at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:216)
    at org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:684)
    at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:454)
    at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:170)
    at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:135)
    at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:307)
    at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:216)
    at org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:684)
    at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:454)
    at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:170)
    at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:135)
    at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:307)
    at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:216)


This stack trace repeats for a long while; looks like a recursive call.

-Joe


On 11/21/2017 3:24 PM, Hendrik Haddorp wrote:
We sometimes also have replicas not recovering. If one replica is left 
active the easiest is to then to delete the replica and create a new 
one. When all replicas are down it helps most of the time to restart 
one of the nodes that contains a replica in down state. If that also 
doesn't get the replica to recover I would check the logs of the node 
and also that of the overseer node. I have seen the same issue on Solr 
using local storage. The main HDFS related issues we had so far was 
those lock files and if you delete and recreate collections/cores and 
it sometimes happens that the data was not cleaned up in HDFS and then 
causes a conflict.


Hendrik

On 21.11.2017 21:07, Joe Obernberger wrote:
We've never run an index this size in anything but HDFS, so I have no 
comparison.  What we've been doing is keeping two main collections - 
all data, and the last 30 days of data.  Then we handle queries based 
on date range. The 30 day index is significantly faster.


My main concern right now is that 6 of the 100 shards are not coming 
back because of no leader.  I've never seen this error before.  Any 
ideas?  ClusterStatus shows all three replicas with state 'down'.


Thanks!

-joe


On 11/21/2017 2:35 PM, Hendrik Haddorp wrote:
We actually also have some performance issue with HDFS at the 
moment. We are doing lots of soft commits for NRT search. Those seem 
to be slower then with local storage. The investigation is however 
not really far yet.


We have a setup with 2000 collections, with one shard each and a 
replication factor of 2 or 3. When we restart nodes too fast that 
causes problems with the overseer queue, which can lead to the queue 
getting out of control and Solr pretty much dying. We are still on 
Solr 6.3. 6.6 has some improvements and should handle these actions 
faster. I would check what you see for 
"/solr/admin/collections?action=OVERSEERSTATUS=json". The 
critical part is the "overseer_queue_size" value. If this goes up to 
about 1 it is pretty much game over on our setup. In that case 
it seems to be best to stop all nodes, clear the queue in ZK and 
then restart the nodes one by one with a gap of like 5min. That 
normally recovers pretty well.


regards,
Hendrik

On 21.11.2017 20:12, Joe Obernberger wrote:
We set the hard commit time long because we were having performance 
issues with HDFS, and thought that since the block size is 128M, 
having a longer hard commit made sense.  That was our hypothesis 
anyway. Happy to switch it back and see what happens.


I don't know what caused the cluster to go into recovery in the 
first place.  We had a server die over the weekend, but it's just 
one out of ~50.  Every shard is 3x replicated (and 3x replicated in 
HDFS...so 9 copies).  It was at this point that 

Re: Solr 7.x: Issues with unique()/hll() function on a string field nested in a range facet

2017-11-21 Thread Yonik Seeley
Thanks for the complete info that allowed me to easily reproduce this!
The bug seems to extend beyond hll/unique... I tried min(string_s) and
got wonky results as well.

-Yonik


On Tue, Nov 21, 2017 at 7:47 AM, Volodymyr Rudniev  wrote:
> Hello,
>
> I've encountered 2 issues while trying to apply unique()/hll() function to a
> string field inside a range facet:
>
> Results are incorrect for a single-valued string field.
> I’m getting ArrayIndexOutOfBoundsException for a multi-valued string field.
>
>
> How to reproduce:
>
> Create a core based on the default configSet.
> Add several simple documents to the core, like these:
>
> [
>   {
> "id": "14790",
> "int_i": 2010,
> "date_dt": "2010-01-01T00:00:00Z",
> "string_s": "a",
> "string_ss": ["a", "b"]
>   },
>   {
> "id": "12254",
> "int_i": 2014,
> "date_dt": "2014-01-01T00:00:00Z",
> "string_s": "e",
> "string_ss": ["b", "c"]
>   },
>   {
> "id": "12937",
> "int_i": 2008,
> "date_dt": "2008-01-01T00:00:00Z",
> "string_s": "c",
> "string_ss": ["c", "d"]
>   },
>   {
> "id": "10575",
> "int_i": 2008,
> "date_dt": "2008-01-01T00:00:00Z",
> "string_s": "b",
> "string_ss": ["d", "e"]
>   },
>   {
> "id": "13644",
> "int_i": 2014,
> "date_dt": "2014-01-01T00:00:00Z",
> "string_s": "e",
> "string_ss": ["e", "a"]
>   },
>   {
> "id": "8405",
> "int_i": 2014,
> "date_dt": "2014-01-01T00:00:00Z",
> "string_s": "d",
> "string_ss": ["a", "b"]
>   },
>   {
> "id": "6128",
> "int_i": 2008,
> "date_dt": "2008-01-01T00:00:00Z",
> "string_s": "a",
> "string_ss": ["b", "c"]
>   },
>   {
> "id": "5220",
> "int_i": 2015,
> "date_dt": "2015-01-01T00:00:00Z",
> "string_s": "d",
> "string_ss": ["c", "d"]
>   },
>   {
> "id": "6850",
> "int_i": 2012,
> "date_dt": "2012-01-01T00:00:00Z",
> "string_s": "b",
> "string_ss": ["d", "e"]
>   },
>   {
> "id": "5748",
> "int_i": 2014,
> "date_dt": "2014-01-01T00:00:00Z",
> "string_s": "e",
> "string_ss": ["e", "a"]
>   }
> ]
>
> 3. Try queries like the following for a single-valued string field:
>
> q=*:*=0={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"int_i","gap":1,"missing":false,"start":2008,"end":2016,"type":"range","facet":{"distinct_count":"unique(string_s)"
>
> q=*:*=0={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"date_dt","gap":"%2B1YEAR","missing":false,"start":"2008-01-01T00:00:00Z","end":"2016-01-01T00:00:00Z","type":"range","facet":{"distinct_count":"unique(string_s)"
>
> Distinct counts returned are incorrect in general. For example, for the set
> of documents above, the response will contain:
>
> {
> "val": 2010,
> "count": 1,
> "distinct_count": 0
> }
>
> and
>
> "between": {
> "count": 10,
> "distinct_count": 1
> }
>
> (there should be 5 distinct values).
>
> Note, the result depends on the order in which the documents are added.
>
> 4. Try queries like the following for a multi-valued string field:
>
> q=*:*=0={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"int_i","gap":1,"missing":false,"start":2008,"end":2016,"type":"range","facet":{"distinct_count":"unique(string_ss)"
>
> q=*:*=0={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"date_dt","gap":"%2B1YEAR","missing":false,"start":"2008-01-01T00:00:00Z","end":"2016-01-01T00:00:00Z","type":"range","facet":{"distinct_count":"unique(string_ss)"
>
> I’m getting ArrayIndexOutOfBoundsException for such queries.
>
> Note, everything looks Ok for other field types (I tried single- and
> multi-valued ints, doubles and dates) or when the enclosing facet is a terms
> facet or there is no enclosing facet at all.
>
> I can reproduce these issues both for Solr 7.0.1 and 7.1.0. Solr 6.x and
> 5.x, as it seems, do not have such issues.
>
> Is it a bug? Or maybe I’ve missed something?
>
> Thanks,
>
> Volodymyr
>


Re: Data inconsistencies and updates in solrcloud

2017-11-21 Thread Tom Barber

Thanks Erick!

As I said, user error! ;)

Tom

On 21/11/17 22:41, Erick Erickson wrote:

I think you're confusing shards with replicas.

numShards is 2, each with one replica. Therefore half of your docs
will wind up on one replica and half on the other. If you're adding a
single doc, by definition it'll be placed on only one of the two
shards. If your shards had multiple replicas, all of the replicas
associated with that shard would change.

Best,
Erick

On Tue, Nov 21, 2017 at 12:56 PM, Tom Barber  wrote:

Hi folks

I can't find an answer to this, and its clearly user error,  we have a 
collection in solrcloud that is started numShards=2 replicationFactor=1 solr 
seems happy the collection seems happy. Yet when we post and update to it and 
then look at the record again, it seems to only affect one core and not the 
second.

What are we likely to be doing wrong in our config or update to prevent the 
replication?

Thanks

Tom






Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Joe Obernberger
Hi Hendrik - the shards in question have three replicas.  I tried 
restarting each one (one by one) - no luck.  No leader is found.  I 
deleted one of the replicas and added a new one, and the new one also 
shows as 'down'.  I also tried the FORCELEADER call, but that had no 
effect.  I checked the OVERSEERSTATUS, but there is nothing unusual 
there.  I don't see anything useful in the logs except the error:


org.apache.solr.common.SolrException: Error getting leader from zk for shard shard21
    at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:996)
    at org.apache.solr.cloud.ZkController.register(ZkController.java:902)
    at org.apache.solr.cloud.ZkController.register(ZkController.java:846)
    at org.apache.solr.core.ZkContainer.lambda$registerInZk$0(ZkContainer.java:181)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.solr.common.SolrException: Could not get leader props
    at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1043)
    at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1007)
    at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:963)
    ... 7 more
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /collections/UNCLASS/leaders/shard21/leader
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
    at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:357)
    at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:354)
    at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
    at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:354)
    at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1021)
    ... 9 more

Can I modify zookeeper to force a leader?  Is there any other way to 
recover from this?  Thanks very much!
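
One way to see what ZooKeeper actually holds for that shard is Solr's zkcli 
script; a sketch, with the ZK connect string as a placeholder and the znode 
path taken from the error above (if the node really is missing, the command 
will report NoNode, which at least confirms what Solr sees):

  server/scripts/cloud-scripts/zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181 \
      -cmd get /collections/UNCLASS/leaders/shard21/leader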


-Joe


On 11/21/2017 3:24 PM, Hendrik Haddorp wrote:
We sometimes also have replicas not recovering. If one replica is left 
active the easiest is to then to delete the replica and create a new 
one. When all replicas are down it helps most of the time to restart 
one of the nodes that contains a replica in down state. If that also 
doesn't get the replica to recover I would check the logs of the node 
and also that of the overseer node. I have seen the same issue on Solr 
using local storage. The main HDFS related issues we had so far was 
those lock files and if you delete and recreate collections/cores and 
it sometimes happens that the data was not cleaned up in HDFS and then 
causes a conflict.


Hendrik

On 21.11.2017 21:07, Joe Obernberger wrote:
We've never run an index this size in anything but HDFS, so I have no 
comparison.  What we've been doing is keeping two main collections - 
all data, and the last 30 days of data.  Then we handle queries based 
on date range. The 30 day index is significantly faster.


My main concern right now is that 6 of the 100 shards are not coming 
back because of no leader.  I've never seen this error before.  Any 
ideas?  ClusterStatus shows all three replicas with state 'down'.


Thanks!

-joe


On 11/21/2017 2:35 PM, Hendrik Haddorp wrote:
We actually also have some performance issue with HDFS at the 
moment. We are doing lots of soft commits for NRT search. Those seem 
to be slower then with local storage. The investigation is however 
not really far yet.


We have a setup with 2000 collections, with one shard each and a 
replication factor of 2 or 3. When we restart nodes too fast that 
causes problems with the overseer queue, which can lead to the queue 
getting out of control and Solr pretty much dying. We are still on 
Solr 6.3. 6.6 has some improvements and should handle these actions 
faster. I would check what you see for 
"/solr/admin/collections?action=OVERSEERSTATUS=json". The 
critical part is the "overseer_queue_size" value. If this goes up to 
about 1 it is pretty much game over on our setup. In that case 
it seems to be best to stop all nodes, clear the queue in ZK and 
then restart the nodes one by one with a gap of like 5min. That 
normally recovers pretty well.


regards,
Hendrik

On 21.11.2017 20:12, Joe Obernberger wrote:
We set the hard commit time long because we were having performance 
issues with HDFS, and thought that since the block size is 128M, 
having a longer hard commit made sense.  That was our hypothesis 
anyway. Happy to switch it back and see what happens.


I don't know what caused 

Re: Data inconsistencies and updates in solrcloud

2017-11-21 Thread Erick Erickson
I think you're confusing shards with replicas.

numShards is 2, each with one replica. Therefore half of your docs
will wind up on one replica and half on the other. If you're adding a
single doc, by definition it'll be placed on only one of the two
shards. If your shards had multiple replicas, all of the replicas
associated with that shard would change.
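
If the goal is a full copy of every document on both nodes, the collection
would instead be created with one shard and two replicas; a hedged sketch
(host and collection name are placeholders, and collection.configName may be
needed depending on your configsets):

  curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=1&replicationFactor=2'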

Best,
Erick

On Tue, Nov 21, 2017 at 12:56 PM, Tom Barber  wrote:
> Hi folks
>
> I can't find an answer to this, and its clearly user error,  we have a 
> collection in solrcloud that is started numShards=2 replicationFactor=1 solr 
> seems happy the collection seems happy. Yet when we post and update to it and 
> then look at the record again, it seems to only affect one core and not the 
> second.
>
> What are we likely to be doing wrong in our config or update to prevent the 
> replication?
>
> Thanks
>
> Tom


Re: Possible to disable SynonymQuery and get legacy behavior?

2017-11-21 Thread Doug Turnbull
I have submitted a patch to make the query generated for overlapping query
terms somewhat configurable (w/ default being SynonymQuery), based on
practices I've seen in the field. I'd love to hear feedback

https://issues.apache.org/jira/browse/SOLR-11662

On Tue, Nov 21, 2017 at 12:37 PM Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> We help clients that perform semantic expansion to hypernyms at index time.
> For example, they will have a synonyms file that does the following:
>
> wing_tips => wing_tips, dress_shoes, shoes
> dress_shoes => dress_shoes, shoes
> oxfords => oxfords, dress_shoes, shoes
>
> Then at query time, we rely on the differing IDF of these terms in the same
> position to bring up the rare, specific term matches first, followed by
> increasingly semantically broad matches. Previously, a search for wing_tips
> would get turned into "wing_tips OR dress_shoes OR shoes". Shoes, being very
> common, would get scored lowest; wing tips, being very specific, would get
> scored very highly.
>
> ( I have a blog post about this (which uses Elasticsearch)
>
> http://opensourceconnections.com/blog/2016/12/23/elasticsearch-synonyms-patterns-taxonomies/
>  )
>
> As our clients upgrade to Solr 6 and above, we're noticing our technique
> no longer works due to SynonymQuery, which blends the doc freq of synonyms
> at query time. SynonymQuery seems to be the right direction for most
> people :) Still, I would like to figure out how/if there's a setting
> anywhere to return to the legacy behavior (a boolean query of term queries)
> so I don't have to go back to the drawing board for clients that rely on
> this technique.
>
> I've been going through QueryBuilder and I don't see where we could go
> back to the legacy behavior. It seems to be based on position overlap.
>
> Thanks!
> -Doug
>
>
>
> --
> Consultant, OpenSource Connections. Contact info at
> http://o19s.com/about-us/doug-turnbull/; Free/Busy (
> http://bit.ly/dougs_cal)
>
-- 
Consultant, OpenSource Connections. Contact info at
http://o19s.com/about-us/doug-turnbull/; Free/Busy (http://bit.ly/dougs_cal)


Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Joe Obernberger
Thank you Erick.  I've set the RamBufferSize to 1G; perhaps higher would 
be beneficial.  One more data point is that if I restart a node, more 
often than not, it goes into recovery, beats up the network for a while, 
and then goes green.  This happens even if I do no indexing between 
restarts.  Is that expected? Sometimes this can take longer than 20 
minutes.  No new data was added to the index between the restarts.


-Joe


On 11/21/2017 3:43 PM, Erick Erickson wrote:

bq: We are doing lots of soft commits for NRT search...

It's not surprising that this is slower than local storage, especially
if you have any autowarming going on. Opening  new searchers will need
to read data from disk for the new segments, and HDFS may be slower
here.

As far as the commit interval, an under-appreciated event is that when
RAMBufferSizeMB is exceeded (default 100M last I knew) new segments
are written _anyway_, they're just a little invisible. That is, the
segments_n file isn't updated even though they're closed IIUC at
least. So that very long interval isn't helping with that problem I
don't think

Evidence to the contrary trumps my understanding of course.

About starting all these collections up at once and the Overseer
queue. I've seen this in similar situations. There are a _lot_ of
messages flying back and forth for each replica on startup, and the
Overseer processing was very inefficient historically so that queue
could get in the 100s of K, I've seen some pathological situations
where it's over 1M. SOLR-10524 made this a lot better. There are still
a lot of messages written in a case like yours, but at least the
Overseer has a much better chance to keep up Solr 6.6... At that
point bringing up Solr took a very long time.


Erick

On Tue, Nov 21, 2017 at 12:24 PM, Hendrik Haddorp
 wrote:

We sometimes also have replicas not recovering. If one replica is left
active the easiest is to then to delete the replica and create a new one.
When all replicas are down it helps most of the time to restart one of the
nodes that contains a replica in down state. If that also doesn't get the
replica to recover I would check the logs of the node and also that of the
overseer node. I have seen the same issue on Solr using local storage. The
main HDFS related issues we had so far was those lock files and if you
delete and recreate collections/cores and it sometimes happens that the data
was not cleaned up in HDFS and then causes a conflict.

Hendrik


On 21.11.2017 21:07, Joe Obernberger wrote:

We've never run an index this size in anything but HDFS, so I have no
comparison.  What we've been doing is keeping two main collections - all
data, and the last 30 days of data.  Then we handle queries based on date
range.  The 30 day index is significantly faster.

My main concern right now is that 6 of the 100 shards are not coming back
because of no leader.  I've never seen this error before.  Any ideas?
ClusterStatus shows all three replicas with state 'down'.

Thanks!

-joe


On 11/21/2017 2:35 PM, Hendrik Haddorp wrote:

We actually also have some performance issue with HDFS at the moment. We
are doing lots of soft commits for NRT search. Those seem to be slower then
with local storage. The investigation is however not really far yet.

We have a setup with 2000 collections, with one shard each and a
replication factor of 2 or 3. When we restart nodes too fast that causes
problems with the overseer queue, which can lead to the queue getting out of
control and Solr pretty much dying. We are still on Solr 6.3. 6.6 has some
improvements and should handle these actions faster. I would check what you
see for "/solr/admin/collections?action=OVERSEERSTATUS=json". The
critical part is the "overseer_queue_size" value. If this goes up to about
1 it is pretty much game over on our setup. In that case it seems to be
best to stop all nodes, clear the queue in ZK and then restart the nodes one
by one with a gap of like 5min. That normally recovers pretty well.

regards,
Hendrik

On 21.11.2017 20:12, Joe Obernberger wrote:

We set the hard commit time long because we were having performance
issues with HDFS, and thought that since the block size is 128M, having a
longer hard commit made sense.  That was our hypothesis anyway.  Happy to
switch it back and see what happens.

I don't know what caused the cluster to go into recovery in the first
place.  We had a server die over the weekend, but it's just one out of ~50.
Every shard is 3x replicated (and 3x replicated in HDFS...so 9 copies).  It
was at this point that we noticed lots of network activity, and most of the
shards in this recovery, fail, retry loop.  That is when we decided to shut
it down resulting in zombie lock files.

I tried using the FORCELEADER call, which completed, but doesn't seem to
have any effect on the shards that have no leader. Kinda out of ideas for
that problem.  If I can get the cluster back up, I'll try a lower hard
commit time.  

Data inconsistencies and updates in solrcloud

2017-11-21 Thread Tom Barber
Hi folks

I can't find an answer to this, and it's clearly user error: we have a 
collection in SolrCloud that was created with numShards=2 and 
replicationFactor=1. Solr seems happy and the collection seems happy. Yet 
when we post an update to it and then look at the record again, the update 
seems to only affect one core and not the second.

What are we likely to be doing wrong in our config or update to prevent the 
replication?

Thanks

Tom


Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Erick Erickson
bq: We are doing lots of soft commits for NRT search...

It's not surprising that this is slower than local storage, especially
if you have any autowarming going on. Opening  new searchers will need
to read data from disk for the new segments, and HDFS may be slower
here.

As far as the commit interval, an under-appreciated event is that when
RAMBufferSizeMB is exceeded (default 100M last I knew) new segments
are written _anyway_, they're just a little invisible. That is, the
segments_n file isn't updated even though they're closed IIUC at
least. So that very long interval isn't helping with that problem I
don't think

Evidence to the contrary trumps my understanding of course.

About starting all these collections up at once and the Overseer
queue. I've seen this in similar situations. There are a _lot_ of
messages flying back and forth for each replica on startup, and the
Overseer processing was very inefficient historically so that queue
could get in the 100s of K, I've seen some pathological situations
where it's over 1M. SOLR-10524 made this a lot better. There are still
a lot of messages written in a case like yours, but at least the
Overseer has a much better chance to keep up Solr 6.6... At that
point bringing up Solr took a very long time.


Erick

On Tue, Nov 21, 2017 at 12:24 PM, Hendrik Haddorp
 wrote:
> We sometimes also have replicas not recovering. If one replica is left
> active the easiest is to then to delete the replica and create a new one.
> When all replicas are down it helps most of the time to restart one of the
> nodes that contains a replica in down state. If that also doesn't get the
> replica to recover I would check the logs of the node and also that of the
> overseer node. I have seen the same issue on Solr using local storage. The
> main HDFS related issues we had so far was those lock files and if you
> delete and recreate collections/cores and it sometimes happens that the data
> was not cleaned up in HDFS and then causes a conflict.
>
> Hendrik
>
>
> On 21.11.2017 21:07, Joe Obernberger wrote:
>>
>> We've never run an index this size in anything but HDFS, so I have no
>> comparison.  What we've been doing is keeping two main collections - all
>> data, and the last 30 days of data.  Then we handle queries based on date
>> range.  The 30 day index is significantly faster.
>>
>> My main concern right now is that 6 of the 100 shards are not coming back
>> because of no leader.  I've never seen this error before.  Any ideas?
>> ClusterStatus shows all three replicas with state 'down'.
>>
>> Thanks!
>>
>> -joe
>>
>>
>> On 11/21/2017 2:35 PM, Hendrik Haddorp wrote:
>>>
>>> We actually also have some performance issue with HDFS at the moment. We
>>> are doing lots of soft commits for NRT search. Those seem to be slower then
>>> with local storage. The investigation is however not really far yet.
>>>
>>> We have a setup with 2000 collections, with one shard each and a
>>> replication factor of 2 or 3. When we restart nodes too fast that causes
>>> problems with the overseer queue, which can lead to the queue getting out of
>>> control and Solr pretty much dying. We are still on Solr 6.3. 6.6 has some
>>> improvements and should handle these actions faster. I would check what you
>>> see for "/solr/admin/collections?action=OVERSEERSTATUS=json". The
>>> critical part is the "overseer_queue_size" value. If this goes up to about
>>> 1 it is pretty much game over on our setup. In that case it seems to be
>>> best to stop all nodes, clear the queue in ZK and then restart the nodes one
>>> by one with a gap of like 5min. That normally recovers pretty well.
>>>
>>> regards,
>>> Hendrik
>>>
>>> On 21.11.2017 20:12, Joe Obernberger wrote:

 We set the hard commit time long because we were having performance
 issues with HDFS, and thought that since the block size is 128M, having a
 longer hard commit made sense.  That was our hypothesis anyway.  Happy to
 switch it back and see what happens.

 I don't know what caused the cluster to go into recovery in the first
 place.  We had a server die over the weekend, but it's just one out of ~50.
 Every shard is 3x replicated (and 3x replicated in HDFS...so 9 copies).  It
 was at this point that we noticed lots of network activity, and most of the
 shards in this recovery, fail, retry loop.  That is when we decided to shut
 it down resulting in zombie lock files.

 I tried using the FORCELEADER call, which completed, but doesn't seem to
 have any effect on the shards that have no leader. Kinda out of ideas for
 that problem.  If I can get the cluster back up, I'll try a lower hard
 commit time.  Thanks again Erick!

 -Joe


 On 11/21/2017 2:00 PM, Erick Erickson wrote:
>
> Frankly with HDFS I'm a bit out of my depth so listen to Hendrik ;)...
>
> I need to back up a bit. Once nodes are in this state it's not

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Hendrik Haddorp
We sometimes also have replicas not recovering. If one replica is left 
active, the easiest fix is to delete the failed replica and create a new 
one. When all replicas are down, it helps most of the time to restart one 
of the nodes that contains a replica in the down state. If that also doesn't 
get the replica to recover, I would check the logs of that node and also 
those of the overseer node. I have seen the same issue on Solr using local 
storage. The main HDFS-related issues we have had so far were the lock 
files, and that deleting and recreating collections/cores sometimes leaves 
data behind in HDFS that was not cleaned up, which then causes a conflict.
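
As a rough sketch of that delete/recreate step via the Collections API (host,
collection, shard, replica and node names below are placeholders taken from
this thread):

  curl 'http://localhost:8983/solr/admin/collections?action=DELETEREPLICA&collection=UNCLASS&shard=shard14&replica=core_node175'
  curl 'http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=UNCLASS&shard=shard14&node=leda:9100_solr'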


Hendrik

On 21.11.2017 21:07, Joe Obernberger wrote:
We've never run an index this size in anything but HDFS, so I have no 
comparison.  What we've been doing is keeping two main collections - 
all data, and the last 30 days of data.  Then we handle queries based 
on date range.  The 30 day index is significantly faster.


My main concern right now is that 6 of the 100 shards are not coming 
back because of no leader.  I've never seen this error before.  Any 
ideas?  ClusterStatus shows all three replicas with state 'down'.


Thanks!

-joe


On 11/21/2017 2:35 PM, Hendrik Haddorp wrote:
We actually also have some performance issue with HDFS at the moment. 
We are doing lots of soft commits for NRT search. Those seem to be 
slower then with local storage. The investigation is however not 
really far yet.


We have a setup with 2000 collections, with one shard each and a 
replication factor of 2 or 3. When we restart nodes too fast that 
causes problems with the overseer queue, which can lead to the queue 
getting out of control and Solr pretty much dying. We are still on 
Solr 6.3. 6.6 has some improvements and should handle these actions 
faster. I would check what you see for 
"/solr/admin/collections?action=OVERSEERSTATUS=json". The critical 
part is the "overseer_queue_size" value. If this goes up to about 
1 it is pretty much game over on our setup. In that case it seems 
to be best to stop all nodes, clear the queue in ZK and then restart 
the nodes one by one with a gap of like 5min. That normally recovers 
pretty well.


regards,
Hendrik

On 21.11.2017 20:12, Joe Obernberger wrote:
We set the hard commit time long because we were having performance 
issues with HDFS, and thought that since the block size is 128M, 
having a longer hard commit made sense.  That was our hypothesis 
anyway.  Happy to switch it back and see what happens.


I don't know what caused the cluster to go into recovery in the 
first place.  We had a server die over the weekend, but it's just 
one out of ~50.  Every shard is 3x replicated (and 3x replicated in 
HDFS...so 9 copies).  It was at this point that we noticed lots of 
network activity, and most of the shards in this recovery, fail, 
retry loop.  That is when we decided to shut it down resulting in 
zombie lock files.


I tried using the FORCELEADER call, which completed, but doesn't 
seem to have any effect on the shards that have no leader. Kinda out 
of ideas for that problem.  If I can get the cluster back up, I'll 
try a lower hard commit time.  Thanks again Erick!


-Joe


On 11/21/2017 2:00 PM, Erick Erickson wrote:

Frankly with HDFS I'm a bit out of my depth so listen to Hendrik ;)...

I need to back up a bit. Once nodes are in this state it's not
surprising that they need to be forcefully killed. I was more thinking
about how they got in this situation in the first place. _Before_ you
get into the nasty state how are the Solr nodes shut down? Forcefully?

Your hard commit is far longer than it needs to be, resulting in much
larger tlog files etc. I usually set this at 15-60 seconds with local
disks, not quite sure whether longer intervals are helpful on HDFS.
What this means is that you can spend up to 30 minutes when you
restart solr _replaying the tlogs_! If Solr is killed, it may not have
had a chance to fsync the segments and may have to replay on startup.
If you have openSearcher set to false, the hard commit operation is
not horribly expensive, it just fsync's the current segments and opens
new ones. It won't be a total cure, but I bet reducing this interval
would help a lot.

Also, if you stop indexing there's no need to wait 30 minutes if you
issue a manual commit, something like
.../collection/update?commit=true. Just reducing the hard commit
interval will make the wait between stopping indexing and restarting
shorter all by itself if you don't want to issue the manual commit.
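
As a concrete sketch of both suggestions (host, collection name and values are
placeholders; updateHandler.autoCommit.maxTime is the documented Config API
property for the hard commit interval):

  # lower the hard commit interval to 60 seconds via the Config API
  curl 'http://localhost:8983/solr/UNCLASS/config' -H 'Content-type:application/json' \
       -d '{"set-property":{"updateHandler.autoCommit.maxTime":60000}}'
  # issue an explicit hard commit once indexing has stopped
  curl 'http://localhost:8983/solr/UNCLASS/update?commit=true'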

Best,
Erick

On Tue, Nov 21, 2017 at 10:34 AM, Hendrik Haddorp
 wrote:

Hi,

the write.lock issue I see as well when Solr is not been stopped 
gracefully.
The write.lock files are then left in the HDFS as they do not get 
removed

automatically when the client disconnects like a ephemeral node in
ZooKeeper. Unfortunately Solr does also not realize that it should 
be owning
the lock as it is marked in the 

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Joe Obernberger
We've never run an index this size in anything but HDFS, so I have no 
comparison.  What we've been doing is keeping two main collections - all 
data, and the last 30 days of data.  Then we handle queries based on 
date range.  The 30 day index is significantly faster.


My main concern right now is that 6 of the 100 shards are not coming 
back because of no leader.  I've never seen this error before.  Any 
ideas?  ClusterStatus shows all three replicas with state 'down'.
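
For reference, that per-shard view comes from a call roughly like this (host 
is a placeholder):

  curl 'http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=UNCLASS&wt=json'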


Thanks!

-joe


On 11/21/2017 2:35 PM, Hendrik Haddorp wrote:
We actually also have some performance issue with HDFS at the moment. 
We are doing lots of soft commits for NRT search. Those seem to be 
slower then with local storage. The investigation is however not 
really far yet.


We have a setup with 2000 collections, with one shard each and a 
replication factor of 2 or 3. When we restart nodes too fast that 
causes problems with the overseer queue, which can lead to the queue 
getting out of control and Solr pretty much dying. We are still on 
Solr 6.3. 6.6 has some improvements and should handle these actions 
faster. I would check what you see for 
"/solr/admin/collections?action=OVERSEERSTATUS=json". The critical 
part is the "overseer_queue_size" value. If this goes up to about 
1 it is pretty much game over on our setup. In that case it seems 
to be best to stop all nodes, clear the queue in ZK and then restart 
the nodes one by one with a gap of like 5min. That normally recovers 
pretty well.


regards,
Hendrik

On 21.11.2017 20:12, Joe Obernberger wrote:
We set the hard commit time long because we were having performance 
issues with HDFS, and thought that since the block size is 128M, 
having a longer hard commit made sense.  That was our hypothesis 
anyway.  Happy to switch it back and see what happens.


I don't know what caused the cluster to go into recovery in the first 
place.  We had a server die over the weekend, but it's just one out 
of ~50.  Every shard is 3x replicated (and 3x replicated in HDFS...so 
9 copies).  It was at this point that we noticed lots of network 
activity, and most of the shards in this recovery, fail, retry loop.  
That is when we decided to shut it down resulting in zombie lock files.


I tried using the FORCELEADER call, which completed, but doesn't seem 
to have any effect on the shards that have no leader. Kinda out of 
ideas for that problem.  If I can get the cluster back up, I'll try a 
lower hard commit time.  Thanks again Erick!


-Joe


On 11/21/2017 2:00 PM, Erick Erickson wrote:

Frankly with HDFS I'm a bit out of my depth so listen to Hendrik ;)...

I need to back up a bit. Once nodes are in this state it's not
surprising that they need to be forcefully killed. I was more thinking
about how they got in this situation in the first place. _Before_ you
get into the nasty state how are the Solr nodes shut down? Forcefully?

Your hard commit is far longer than it needs to be, resulting in much
larger tlog files etc. I usually set this at 15-60 seconds with local
disks, not quite sure whether longer intervals are helpful on HDFS.
What this means is that you can spend up to 30 minutes when you
restart solr _replaying the tlogs_! If Solr is killed, it may not have
had a chance to fsync the segments and may have to replay on startup.
If you have openSearcher set to false, the hard commit operation is
not horribly expensive, it just fsync's the current segments and opens
new ones. It won't be a total cure, but I bet reducing this interval
would help a lot.

Also, if you stop indexing there's no need to wait 30 minutes if you
issue a manual commit, something like
.../collection/update?commit=true. Just reducing the hard commit
interval will make the wait between stopping indexing and restarting
shorter all by itself if you don't want to issue the manual commit.

Best,
Erick

On Tue, Nov 21, 2017 at 10:34 AM, Hendrik Haddorp
 wrote:

Hi,

the write.lock issue I see as well when Solr is not been stopped 
gracefully.
The write.lock files are then left in the HDFS as they do not get 
removed

automatically when the client disconnects like a ephemeral node in
ZooKeeper. Unfortunately Solr does also not realize that it should 
be owning
the lock as it is marked in the state stored in ZooKeeper as the 
owner and
is also not willing to retry, which is why you need to restart the 
whole
Solr instance after the cleanup. I added some logic to my Solr 
start up
script which scans the log files in HDFS and compares that with the 
state in
ZooKeeper and then delete all lock files that belong to the node 
that I'm

starting.

regards,
Hendrik


On 21.11.2017 14:07, Joe Obernberger wrote:
Hi All - we have a system with 45 physical boxes running solr 
6.6.1 using

HDFS as the index.  The current index size is about 31TBytes. With 3x
replication that takes up 93TBytes of disk. Our main collection is 
split
across 100 shards with 3 replicas each.  The issue that we're 
running 

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Hendrik Haddorp
We actually also have some performance issues with HDFS at the moment. We 
are doing lots of soft commits for NRT search. Those seem to be slower 
than with local storage. The investigation is, however, not very far along yet.


We have a setup with 2000 collections, with one shard each and a 
replication factor of 2 or 3. When we restart nodes too fast that causes 
problems with the overseer queue, which can lead to the queue getting 
out of control and Solr pretty much dying. We are still on Solr 6.3. 6.6 
has some improvements and should handle these actions faster. I would 
check what you see for 
"/solr/admin/collections?action=OVERSEERSTATUS&wt=json". The critical 
part is the "overseer_queue_size" value. If this goes up to about 1 
it is pretty much game over on our setup. In that case it seems to be 
best to stop all nodes, clear the queue in ZK and then restart the nodes 
one by one with a gap of like 5min. That normally recovers pretty well.
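
A quick way to keep an eye on that value (a sketch; host, port, and the wt=json parameter are assumptions to adjust for your setup):

curl 'http://localhost:8983/solr/admin/collections?action=OVERSEERSTATUS&wt=json' | grep overseer_queue_size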


regards,
Hendrik

On 21.11.2017 20:12, Joe Obernberger wrote:
We set the hard commit time long because we were having performance 
issues with HDFS, and thought that since the block size is 128M, 
having a longer hard commit made sense.  That was our hypothesis 
anyway.  Happy to switch it back and see what happens.


I don't know what caused the cluster to go into recovery in the first 
place.  We had a server die over the weekend, but it's just one out of 
~50.  Every shard is 3x replicated (and 3x replicated in HDFS...so 9 
copies).  It was at this point that we noticed lots of network 
activity, and most of the shards in this recovery, fail, retry loop.  
That is when we decided to shut it down resulting in zombie lock files.


I tried using the FORCELEADER call, which completed, but doesn't seem 
to have any effect on the shards that have no leader.  Kinda out of 
ideas for that problem.  If I can get the cluster back up, I'll try a 
lower hard commit time.  Thanks again Erick!


-Joe


On 11/21/2017 2:00 PM, Erick Erickson wrote:

Frankly with HDFS I'm a bit out of my depth so listen to Hendrik ;)...

I need to back up a bit. Once nodes are in this state it's not
surprising that they need to be forcefully killed. I was more thinking
about how they got in this situation in the first place. _Before_ you
get into the nasty state how are the Solr nodes shut down? Forcefully?

Your hard commit is far longer than it needs to be, resulting in much
larger tlog files etc. I usually set this at 15-60 seconds with local
disks, not quite sure whether longer intervals are helpful on HDFS.
What this means is that you can spend up to 30 minutes when you
restart solr _replaying the tlogs_! If Solr is killed, it may not have
had a chance to fsync the segments and may have to replay on startup.
If you have openSearcher set to false, the hard commit operation is
not horribly expensive, it just fsync's the current segments and opens
new ones. It won't be a total cure, but I bet reducing this interval
would help a lot.

Also, if you stop indexing there's no need to wait 30 minutes if you
issue a manual commit, something like
.../collection/update?commit=true. Just reducing the hard commit
interval will make the wait between stopping indexing and restarting
shorter all by itself if you don't want to issue the manual commit.

Best,
Erick

On Tue, Nov 21, 2017 at 10:34 AM, Hendrik Haddorp
 wrote:

Hi,

the write.lock issue I see as well when Solr is not been stopped 
gracefully.
The write.lock files are then left in the HDFS as they do not get 
removed

automatically when the client disconnects like a ephemeral node in
ZooKeeper. Unfortunately Solr does also not realize that it should 
be owning
the lock as it is marked in the state stored in ZooKeeper as the 
owner and
is also not willing to retry, which is why you need to restart the 
whole

Solr instance after the cleanup. I added some logic to my Solr start up
script which scans the log files in HDFS and compares that with the 
state in
ZooKeeper and then delete all lock files that belong to the node 
that I'm

starting.

regards,
Hendrik


On 21.11.2017 14:07, Joe Obernberger wrote:
Hi All - we have a system with 45 physical boxes running solr 6.6.1 
using

HDFS as the index.  The current index size is about 31TBytes. With 3x
replication that takes up 93TBytes of disk. Our main collection is 
split
across 100 shards with 3 replicas each.  The issue that we're 
running into
is when restarting the solr6 cluster.  The shards go into recovery 
and start
to utilize nearly all of their network interfaces.  If we start too 
many of
the nodes at once, the shards will go into a recovery, fail, and 
retry loop

and never come up.  The errors are related to HDFS not responding fast
enough and warnings from the DFSClient.  If we stop a node when 
this is

happening, the script will force a stop (180 second timeout) and upon
restart, we have lock files (write.lock) inside of HDFS.

The process at this point is to 

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Hendrik Haddorp
Unfortunately I cannot upload my cleanup code, but the steps I'm doing 
are quite easy. I wrote it in Java using the HDFS API and Curator for 
ZooKeeper. The steps are:
    - read out the children of /collections in ZK so you know all the 
collection names
    - read /collections/<collection>/state.json to get the state
    - find the replicas in the state and filter out those that have a 
"node_name" matching your local node (the node name is basically a 
combination of your host name and the solr port)
    - if the replica data has "dataDir" set then you basically only 
need to add "index/write.lock" to it and you have the lock location
    - if "dataDir" is not set (not really sure why) then you need to 
construct it yourself: <HDFS home>/<collection>/<core name>/data/index/write.lock
    - if the lock file exists, delete it

I believe there is a small race condition in case you use replica auto 
fail-over. So I try to keep the time between checking the state in 
ZooKeeper and deleting the lock file as short as possible: rather than 
first determining all lock file locations and only then deleting them, 
I delete each lock while checking the state.
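
A rough sketch of that cleanup logic (not Hendrik's actual code; it assumes Jackson for JSON parsing, an already-connected Curator client, an already-configured Hadoop FileSystem, and the <HDFS home>/<collection>/<core name>/data layout described above):

import java.util.Iterator;
import java.util.Map;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.curator.framework.CuratorFramework;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteLockCleaner {
  public static void clean(CuratorFramework zk, FileSystem hdfs,
                           String myNodeName, String hdfsHome) throws Exception {
    ObjectMapper mapper = new ObjectMapper();
    for (String coll : zk.getChildren().forPath("/collections")) {
      byte[] raw;
      try {
        raw = zk.getData().forPath("/collections/" + coll + "/state.json");
      } catch (Exception e) {
        continue; // no per-collection state.json (older state format) - skip
      }
      JsonNode shards = mapper.readTree(raw).path(coll).path("shards");
      for (JsonNode shard : shards) {
        Iterator<Map.Entry<String, JsonNode>> replicas = shard.path("replicas").fields();
        while (replicas.hasNext()) {
          JsonNode replica = replicas.next().getValue();
          if (!myNodeName.equals(replica.path("node_name").asText())) {
            continue; // lock belongs to another node, leave it alone
          }
          String dataDir = replica.path("dataDir").asText(); // usually ends with "/"
          Path lock = dataDir.isEmpty()
              ? new Path(hdfsHome + "/" + coll + "/" + replica.path("core").asText()
                         + "/data/index/write.lock")
              : new Path(dataDir + "index/write.lock");
          if (hdfs.exists(lock)) {
            hdfs.delete(lock, false); // remove the stale write.lock right away
          }
        }
      }
    }
  }
}

Running something like this right before starting the local Solr instance mirrors the start-up hook described above; keeping the check and the delete in the same loop keeps the race window small.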


regards,
Hendrik

On 21.11.2017 19:53, Joe Obernberger wrote:
A clever idea.  Normally what we do when we need to do a restart, is 
to halt indexing, and then wait about 30 minutes.  If we do not wait, 
and stop the cluster, the default scripts 180 second timeout is not 
enough and we'll have lock files to clean up.  We use puppet to start 
and stop the nodes, but at this point that is not working well since 
we need to start one node at a time.  With each one taking hours, this 
is a lengthy process!  I'd love to see your script!


This new error is now coming up - see screen shot.  For some reason 
some of the shards have no leader assigned:


http://lovehorsepower.com/SolrClusterErrors.jpg

-Joe


On 11/21/2017 1:34 PM, Hendrik Haddorp wrote:

Hi,

the write.lock issue I see as well when Solr is not been stopped 
gracefully. The write.lock files are then left in the HDFS as they do 
not get removed automatically when the client disconnects like a 
ephemeral node in ZooKeeper. Unfortunately Solr does also not realize 
that it should be owning the lock as it is marked in the state stored 
in ZooKeeper as the owner and is also not willing to retry, which is 
why you need to restart the whole Solr instance after the cleanup. I 
added some logic to my Solr start up script which scans the log files 
in HDFS and compares that with the state in ZooKeeper and then delete 
all lock files that belong to the node that I'm starting.


regards,
Hendrik

On 21.11.2017 14:07, Joe Obernberger wrote:
Hi All - we have a system with 45 physical boxes running solr 6.6.1 
using HDFS as the index. The current index size is about 31TBytes. 
With 3x replication that takes up 93TBytes of disk. Our main 
collection is split across 100 shards with 3 replicas each.  The 
issue that we're running into is when restarting the solr6 cluster.  
The shards go into recovery and start to utilize nearly all of their 
network interfaces.  If we start too many of the nodes at once, the 
shards will go into a recovery, fail, and retry loop and never come 
up.  The errors are related to HDFS not responding fast enough and 
warnings from the DFSClient.  If we stop a node when this is 
happening, the script will force a stop (180 second timeout) and 
upon restart, we have lock files (write.lock) inside of HDFS.


The process at this point is to start one node, find out the lock 
files, wait for it to come up completely (hours), stop it, delete 
the write.lock files, and restart.  Usually this second restart is 
faster, but it still can take 20-60 minutes.


The smaller indexes recover much faster (less than 5 minutes). 
Should we have not used so many replicas with HDFS?  Is there a 
better way we should have built the solr6 cluster?


Thank you for any insight!

-Joe




---
This email has been checked for viruses by AVG.
http://www.avg.com







Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Joe Obernberger
We set the hard commit time long because we were having performance 
issues with HDFS, and thought that since the block size is 128M, having 
a longer hard commit made sense.  That was our hypothesis anyway.  Happy 
to switch it back and see what happens.


I don't know what caused the cluster to go into recovery in the first 
place.  We had a server die over the weekend, but it's just one out of 
~50.  Every shard is 3x replicated (and 3x replicated in HDFS...so 9 
copies).  It was at this point that we noticed lots of network activity, 
and most of the shards in this recovery, fail, retry loop.  That is when 
we decided to shut it down resulting in zombie lock files.


I tried using the FORCELEADER call, which completed, but doesn't seem to 
have any effect on the shards that have no leader.  Kinda out of ideas 
for that problem.  If I can get the cluster back up, I'll try a lower 
hard commit time.  Thanks again Erick!
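
(For reference, the shape of that call looks like the following, with the collection and shard names taken from the errors elsewhere in this thread; adjust host and names for your cluster:

curl 'http://localhost:8983/solr/admin/collections?action=FORCELEADER&collection=UNCLASS&shard=shard21' )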


-Joe


On 11/21/2017 2:00 PM, Erick Erickson wrote:

Frankly with HDFS I'm a bit out of my depth so listen to Hendrik ;)...

I need to back up a bit. Once nodes are in this state it's not
surprising that they need to be forcefully killed. I was more thinking
about how they got in this situation in the first place. _Before_ you
get into the nasty state how are the Solr nodes shut down? Forcefully?

Your hard commit is far longer than it needs to be, resulting in much
larger tlog files etc. I usually set this at 15-60 seconds with local
disks, not quite sure whether longer intervals are helpful on HDFS.
What this means is that you can spend up to 30 minutes when you
restart solr _replaying the tlogs_! If Solr is killed, it may not have
had a chance to fsync the segments and may have to replay on startup.
If you have openSearcher set to false, the hard commit operation is
not horribly expensive, it just fsync's the current segments and opens
new ones. It won't be a total cure, but I bet reducing this interval
would help a lot.

Also, if you stop indexing there's no need to wait 30 minutes if you
issue a manual commit, something like
.../collection/update?commit=true. Just reducing the hard commit
interval will make the wait between stopping indexing and restarting
shorter all by itself if you don't want to issue the manual commit.

Best,
Erick

On Tue, Nov 21, 2017 at 10:34 AM, Hendrik Haddorp
 wrote:

Hi,

the write.lock issue I see as well when Solr is not been stopped gracefully.
The write.lock files are then left in the HDFS as they do not get removed
automatically when the client disconnects like a ephemeral node in
ZooKeeper. Unfortunately Solr does also not realize that it should be owning
the lock as it is marked in the state stored in ZooKeeper as the owner and
is also not willing to retry, which is why you need to restart the whole
Solr instance after the cleanup. I added some logic to my Solr start up
script which scans the log files in HDFS and compares that with the state in
ZooKeeper and then delete all lock files that belong to the node that I'm
starting.

regards,
Hendrik


On 21.11.2017 14:07, Joe Obernberger wrote:

Hi All - we have a system with 45 physical boxes running solr 6.6.1 using
HDFS as the index.  The current index size is about 31TBytes. With 3x
replication that takes up 93TBytes of disk. Our main collection is split
across 100 shards with 3 replicas each.  The issue that we're running into
is when restarting the solr6 cluster.  The shards go into recovery and start
to utilize nearly all of their network interfaces.  If we start too many of
the nodes at once, the shards will go into a recovery, fail, and retry loop
and never come up.  The errors are related to HDFS not responding fast
enough and warnings from the DFSClient.  If we stop a node when this is
happening, the script will force a stop (180 second timeout) and upon
restart, we have lock files (write.lock) inside of HDFS.

The process at this point is to start one node, find out the lock files,
wait for it to come up completely (hours), stop it, delete the write.lock
files, and restart.  Usually this second restart is faster, but it still can
take 20-60 minutes.

The smaller indexes recover much faster (less than 5 minutes). Should we
have not used so many replicas with HDFS?  Is there a better way we should
have built the solr6 cluster?

Thank you for any insight!

-Joe


---
This email has been checked for viruses by AVG.
http://www.avg.com





NPE in modifyCollection

2017-11-21 Thread Nate Dire
Hi,

I'm trying to set a replica placement rule on an existing collection
and getting an NPE.  It looks like the update code assumes the property
already has a current value.

Collection: highspot_test operation: modifycollection
failed:java.lang.NullPointerException
at 
org.apache.solr.cloud.OverseerCollectionMessageHandler.modifyCollection(OverseerCollectionMessageHandler.java:677)

if (!updateKey.equals(ZkStateReader.COLLECTION_PROP)
    && !updateKey.equals(Overseer.QUEUE_OPERATION)
    && !collection.get(updateKey).equals(updateEntry.getValue())){
  areChangesVisible = false;
  break;
}
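
The third clause dereferences collection.get(updateKey), which is null when the property is not yet set on the collection. A null-safe variant of that check might look like this (a sketch, not a committed fix):

if (!updateKey.equals(ZkStateReader.COLLECTION_PROP)
    && !updateKey.equals(Overseer.QUEUE_OPERATION)
    // treat a missing current value as "not equal" instead of NPE-ing on it
    && !java.util.Objects.equals(collection.get(updateKey), updateEntry.getValue())) {
  areChangesVisible = false;
  break;
}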

I'm on 6.5.1, but the code looks the same in head.

I didn't see anything related in Jira; is this a new ticket?

Thanks,

Nate
--
Nate Dire
Software Engineer
Highspot


tokenstream reusable

2017-11-21 Thread Roxana Danger
Hello all,

I would like to reuse the token stream generated for one field to create a
new token stream for another field, without executing the whole analysis
again.

The particular application is:
- I have a field *tokens* with an analyzer that generates the tokens (and
maintains the token type attributes)
- I would like to have two new fields: *verbs* and *adjectives*. These
should reuse the token stream generated for the field *tokens* and filter
out the verbs and adjectives to add them to the respective fields.

Is this feasible? How should it be implemented?
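
One common workaround, short of literally reusing the stream, is to copyField the source text into *verbs* and *adjectives* and give those fields the same analysis chain plus a type filter; the analysis does run again, but the filtering by token type is declarative. A sketch only (the field names, file name, and tokenizer are assumptions; verb_types.txt would list the token types to keep, one per line):

<fieldType name="text_verbs" class="solr.TextField">
  <analyzer>
    <!-- same tokenizer/filters as the *tokens* field go here -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.TypeTokenFilterFactory" types="verb_types.txt" useWhitelist="true"/>
  </analyzer>
</fieldType>

<copyField source="tokens" dest="verbs"/>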

Many thanks.
Roxana


Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Erick Erickson
Frankly with HDFS I'm a bit out of my depth so listen to Hendrik ;)...

I need to back up a bit. Once nodes are in this state it's not
surprising that they need to be forcefully killed. I was more thinking
about how they got in this situation in the first place. _Before_ you
get into the nasty state how are the Solr nodes shut down? Forcefully?

Your hard commit is far longer than it needs to be, resulting in much
larger tlog files etc. I usually set this at 15-60 seconds with local
disks, not quite sure whether longer intervals are helpful on HDFS.
What this means is that you can spend up to 30 minutes when you
restart solr _replaying the tlogs_! If Solr is killed, it may not have
had a chance to fsync the segments and may have to replay on startup.
If you have openSearcher set to false, the hard commit operation is
not horribly expensive, it just fsync's the current segments and opens
new ones. It won't be a total cure, but I bet reducing this interval
would help a lot.

Also, if you stop indexing there's no need to wait 30 minutes if you
issue a manual commit, something like
.../collection/update?commit=true. Just reducing the hard commit
interval will make the wait between stopping indexing and restarting
shorter all by itself if you don't want to issue the manual commit.
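
For illustration, a hard commit configuration along these lines in solrconfig.xml would be a reasonable starting point (a sketch only; the 15-second value is an assumption to tune for your setup):

<autoCommit>
  <maxTime>15000</maxTime>            <!-- hard commit every 15 seconds -->
  <openSearcher>false</openSearcher>  <!-- fsync segments without opening a new searcher -->
</autoCommit>

And the manual commit before a planned restart could look like:

curl 'http://localhost:8983/solr/<collection>/update?commit=true'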

Best,
Erick

On Tue, Nov 21, 2017 at 10:34 AM, Hendrik Haddorp
 wrote:
> Hi,
>
> the write.lock issue I see as well when Solr is not been stopped gracefully.
> The write.lock files are then left in the HDFS as they do not get removed
> automatically when the client disconnects like a ephemeral node in
> ZooKeeper. Unfortunately Solr does also not realize that it should be owning
> the lock as it is marked in the state stored in ZooKeeper as the owner and
> is also not willing to retry, which is why you need to restart the whole
> Solr instance after the cleanup. I added some logic to my Solr start up
> script which scans the log files in HDFS and compares that with the state in
> ZooKeeper and then delete all lock files that belong to the node that I'm
> starting.
>
> regards,
> Hendrik
>
>
> On 21.11.2017 14:07, Joe Obernberger wrote:
>>
>> Hi All - we have a system with 45 physical boxes running solr 6.6.1 using
>> HDFS as the index.  The current index size is about 31TBytes. With 3x
>> replication that takes up 93TBytes of disk. Our main collection is split
>> across 100 shards with 3 replicas each.  The issue that we're running into
>> is when restarting the solr6 cluster.  The shards go into recovery and start
>> to utilize nearly all of their network interfaces.  If we start too many of
>> the nodes at once, the shards will go into a recovery, fail, and retry loop
>> and never come up.  The errors are related to HDFS not responding fast
>> enough and warnings from the DFSClient.  If we stop a node when this is
>> happening, the script will force a stop (180 second timeout) and upon
>> restart, we have lock files (write.lock) inside of HDFS.
>>
>> The process at this point is to start one node, find out the lock files,
>> wait for it to come up completely (hours), stop it, delete the write.lock
>> files, and restart.  Usually this second restart is faster, but it still can
>> take 20-60 minutes.
>>
>> The smaller indexes recover much faster (less than 5 minutes). Should we
>> have not used so many replicas with HDFS?  Is there a better way we should
>> have built the solr6 cluster?
>>
>> Thank you for any insight!
>>
>> -Joe
>>
>


Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Joe Obernberger
A clever idea.  Normally what we do when we need to do a restart is 
halt indexing and then wait about 30 minutes.  If we do not wait and 
stop the cluster, the default script's 180-second timeout is not enough 
and we'll have lock files to clean up.  We use puppet to start and stop 
the nodes, but at this point that is not working well since we need to 
start one node at a time.  With each one taking hours, this is a lengthy 
process!  I'd love to see your script!
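
When the lock files do get left behind, finding and removing them can be scripted against HDFS directly (a sketch; the index root path is taken from error messages elsewhere in this thread and may differ on your cluster):

hdfs dfs -ls -R /solr6.6.0/UNCLASS | grep write.lock
hdfs dfs -rm /solr6.6.0/UNCLASS/<core_node>/data/index/write.lock   # repeat for each stale lock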


This new error is now coming up - see screen shot.  For some reason some 
of the shards have no leader assigned:


http://lovehorsepower.com/SolrClusterErrors.jpg

-Joe


On 11/21/2017 1:34 PM, Hendrik Haddorp wrote:

Hi,

the write.lock issue I see as well when Solr is not been stopped 
gracefully. The write.lock files are then left in the HDFS as they do 
not get removed automatically when the client disconnects like a 
ephemeral node in ZooKeeper. Unfortunately Solr does also not realize 
that it should be owning the lock as it is marked in the state stored 
in ZooKeeper as the owner and is also not willing to retry, which is 
why you need to restart the whole Solr instance after the cleanup. I 
added some logic to my Solr start up script which scans the log files 
in HDFS and compares that with the state in ZooKeeper and then delete 
all lock files that belong to the node that I'm starting.


regards,
Hendrik

On 21.11.2017 14:07, Joe Obernberger wrote:
Hi All - we have a system with 45 physical boxes running solr 6.6.1 
using HDFS as the index.  The current index size is about 31TBytes. 
With 3x replication that takes up 93TBytes of disk. Our main 
collection is split across 100 shards with 3 replicas each.  The 
issue that we're running into is when restarting the solr6 cluster.  
The shards go into recovery and start to utilize nearly all of their 
network interfaces.  If we start too many of the nodes at once, the 
shards will go into a recovery, fail, and retry loop and never come 
up.  The errors are related to HDFS not responding fast enough and 
warnings from the DFSClient.  If we stop a node when this is 
happening, the script will force a stop (180 second timeout) and upon 
restart, we have lock files (write.lock) inside of HDFS.


The process at this point is to start one node, find out the lock 
files, wait for it to come up completely (hours), stop it, delete the 
write.lock files, and restart.  Usually this second restart is 
faster, but it still can take 20-60 minutes.


The smaller indexes recover much faster (less than 5 minutes). Should 
we have not used so many replicas with HDFS?  Is there a better way 
we should have built the solr6 cluster?


Thank you for any insight!

-Joe




---
This email has been checked for viruses by AVG.
http://www.avg.com





Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Hendrik Haddorp

Hi,

I see the write.lock issue as well when Solr has not been stopped 
gracefully. The write.lock files are then left in HDFS, as they do 
not get removed automatically when the client disconnects the way an 
ephemeral node does in ZooKeeper. Unfortunately, Solr also does not realize 
that it should be owning the lock, as it is marked in the state stored in 
ZooKeeper as the owner, and it is also not willing to retry, which is why 
you need to restart the whole Solr instance after the cleanup. I added 
some logic to my Solr start-up script which scans the lock files in HDFS, 
compares that with the state in ZooKeeper, and then deletes all lock 
files that belong to the node that I'm starting.


regards,
Hendrik

On 21.11.2017 14:07, Joe Obernberger wrote:
Hi All - we have a system with 45 physical boxes running solr 6.6.1 
using HDFS as the index.  The current index size is about 31TBytes. 
With 3x replication that takes up 93TBytes of disk. Our main 
collection is split across 100 shards with 3 replicas each.  The issue 
that we're running into is when restarting the solr6 cluster.  The 
shards go into recovery and start to utilize nearly all of their 
network interfaces.  If we start too many of the nodes at once, the 
shards will go into a recovery, fail, and retry loop and never come 
up.  The errors are related to HDFS not responding fast enough and 
warnings from the DFSClient.  If we stop a node when this is 
happening, the script will force a stop (180 second timeout) and upon 
restart, we have lock files (write.lock) inside of HDFS.


The process at this point is to start one node, find out the lock 
files, wait for it to come up completely (hours), stop it, delete the 
write.lock files, and restart.  Usually this second restart is faster, 
but it still can take 20-60 minutes.


The smaller indexes recover much faster (less than 5 minutes). Should 
we have not used so many replicas with HDFS?  Is there a better way we 
should have built the solr6 cluster?


Thank you for any insight!

-Joe





Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Joe Obernberger
Erick - thank you very much for the reply.  I'm still working through 
restarting the nodes one by one.


I'm stopping the nodes with the script, but yes - they are being killed 
forcefully because they are in this recovery, failed, retry loop.  I 
could increase the timeout, but they never seem to recover.


The largest tlog file that I see currently is 222MBytes. Autocommit is 
set to 180 and autoSoftCommit is set to 12. Errors when they are 
in the long recovery are things like:


2017-11-20 21:41:29.755 ERROR 
(recoveryExecutor-3-thread-4-processing-n:frodo:9100_solr 
x:UNCLASS_shard37_replica1 s:shard37 c:UNCLASS r:core_node196) 
[c:UNCLASS s:shard37 r:core_node196 x:UNCLASS_shard37_replica1] 
o.a.s.h.IndexFetcher Error closing file: _8dmn.cfs
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
/solr6.6.0/UNCLASS/core_node196/data/index.20171120195705961/_8dmn.cfs 
could only be replicated to 0 nodes instead of minReplication (=1).  
There are 39 datanode(s) running and no node(s) are excluded in this 
operation.
    at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1716)
    at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3385)


Complete log is here for one of the shards that was forcefully stopped.
http://lovehorsepower.com/solr.log

As to what is in the logs when it is recovering for several hours, it's 
many WARN messages from the DFSClient such as:
Abandoning 
BP-1714598269-10.2.100.220-1341346046854:blk_4366207808_1103082741732

and
Excluding datanode 
DatanodeInfoWithStorage[172.16.100.229:50010,DS-5985e40d-830a-44e7-a2ea-fc60bebabc30,DISK]


or from the IndexFetcher:

File _a96y.cfe did not match. expected checksum is 3502268220 and actual 
is checksum 2563579651. expected length is 405 and actual length is 405


Unfortunately, I'm now getting errors from some of the nodes (still 
working through restarting them) about ZooKeeper:


org.apache.solr.common.SolrException: Could not get leader props
    at 
org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1043)
    at 
org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1007)

    at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:963)
    at org.apache.solr.cloud.ZkController.register(ZkController.java:902)
    at org.apache.solr.cloud.ZkController.register(ZkController.java:846)
    at 
org.apache.solr.core.ZkContainer.lambda$registerInZk$0(ZkContainer.java:181)
    at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
KeeperErrorCode = NoNode for /collections/UNCLASS/leaders/shard21/leader
    at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:111)

    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
    at 
org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:357)
    at 
org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:354)
    at 
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
    at 
org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:354)
    at 
org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1021)


Any idea what those could be?  Those shards are not coming back up. 
Sorry for so many questions!


-Joe

On 11/21/2017 12:11 PM, Erick Erickson wrote:

How are you stopping Solr? Nodes should not go into recovery on
startup unless Solr was killed un-gracefully (i.e. kill -9 or the
like). If you use the bin/solr script to stop Solr and see a message
about "killing XXX forcefully" then you can lengthen out the time the
script waits for shutdown (there's a sysvar you can set, look in the
script).

Actually I'll correct myself a bit. Shards _do_ go into recovery but
it should be very short in the graceful shutdown case. Basically
shards temporarily go into recovery as part of normal processing just
long enough to see there's no recovery necessary, but that should be
measured in a few seconds.

What it sounds like from this "The shards go into recovery and start
to utilize nearly all of their network" is that your nodes go into
"full recovery" where the entire index is copied down because the
replica thinks it's "too far" out of date. That indicates something
weird about the state when the Solr nodes stopped.

wild-shot-in-the-dark question. How big are your tlogs? If you don't
hard commit very often, the tlogs can replay at startup for a very
long time.

This makes no sense to me, I'm surely missing something:

The 

Possible to disable SynonymQuery and get legacy behavior?

2017-11-21 Thread Doug Turnbull
We help clients that perform index-time semantic expansion to hypernyms.
For example, they will have a synonyms file that does the
following:

wing_tips => wing_tips, dress_shoes, shoes
dress_shoes => dress_shoes, shoes
oxfords => oxfords, dress_shoes, shoes

Then at query time, we rely on the differing IDF of these terms in the same
position to bring up the rare, specific term matches first, followed by
increasingly semantically broad matches. For example, previously a search
for wing_tips would get turned into "wing_tips OR dress_shoes OR shoes".
Shoes, being very common, would get scored lowest. Wing tips, being very
specific, would get scored very highly.

( I have a blog post about this (which uses Elasticsearch)
http://opensourceconnections.com/blog/2016/12/23/elasticsearch-synonyms-patterns-taxonomies/
 )

As our clients upgrade to Solr 6 and above, we're noticing our technique no
longer works due to SynonymQuery, which blends the doc freq of the synonyms
at query time. SynonymQuery seems to be the right direction for
most people :) Still, I would like to figure out how/if there's a setting
anywhere to return to the legacy behavior (a boolean query of term queries)
so I don't have to go back to the drawing board for clients that rely on
this technique.

I've been going through QueryBuilder and I don't see where we could go back
to the legacy behavior. It seems to be based on position overlap.
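
One escape hatch, assuming you can plug in a custom query parser: Lucene's QueryBuilder exposes newSynonymQuery(Term[]) as a protected hook, so an override can fall back to the old boolean-of-term-queries shape. A sketch (wiring it into Solr, e.g. via a custom QParserPlugin, is left out):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.util.QueryBuilder;

public class LegacySynonymQueryBuilder extends QueryBuilder {
  public LegacySynonymQueryBuilder(Analyzer analyzer) {
    super(analyzer);
  }

  @Override
  protected Query newSynonymQuery(Term[] terms) {
    // pre-SynonymQuery behavior: a plain OR of term queries, each scored with its own IDF
    BooleanQuery.Builder bq = new BooleanQuery.Builder();
    for (Term term : terms) {
      bq.add(new TermQuery(term), BooleanClause.Occur.SHOULD);
    }
    return bq.build();
  }
}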

Thanks!
-Doug



-- 
Consultant, OpenSource Connections. Contact info at
http://o19s.com/about-us/doug-turnbull/; Free/Busy (http://bit.ly/dougs_cal)


Re: OutOfMemoryError in 6.5.1

2017-11-21 Thread Shawn Heisey
On 11/21/2017 9:17 AM, Walter Underwood wrote:
> All our customizations are in solr.in.sh. We’re using the one we configured 
> for 6.3.0. I’ll check for any differences between that and the 6.5.1 script.

The order looks correct to me -- the arguments for the OOM killer are
listed *before* the "-jar start.jar" part of the command, so they should
be taking effect.  Take a look at /apps/solr6/bin/oom_solr.sh and make
sure it's marked as executable for the user that Solr is running under,
and that the shebang at the top of the script is correct and executable
as well.
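
A quick sanity check along those lines (paths taken from the argument list quoted below):

ls -l /apps/solr6/bin/oom_solr.sh      # execute bit set for the solr user?
head -n 1 /apps/solr6/bin/oom_solr.sh  # shebang points at an existing shell?
chmod +x /apps/solr6/bin/oom_solr.sh   # only if the execute bit is missing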

> I don’t see any arguments at all in the dashboard. I do see them in a ps 
> listing, right at the end.

This UI problem is documented/handled in SOLR-11645.  Your argument list
includes "-Dsolr.log.muteconsole" twice, which triggers the problem.

https://issues.apache.org/jira/browse/SOLR-11645

The fix isn't available in a released version yet, but the patch can
easily be applied to a downloaded/installed Solr without compiling
source code.  Your browser will be caching the old version, so you'll
have to deal with that.

> I’m still confused why we are hitting OOM in 6.5.1 but weren’t in 6.3.0. Our 
> load benchmarks use prod logs. We added suggesters, but those use analyzing 
> infix, so they are search indexes, not in-memory.

It can be very difficult to figure out what's causing OOM issues,
especially if the config, index, and queries are identical between one
version without the problem and another version with the problem.  It
sounds like you and Erick have some theories about it.  What is the
exact message on the OOME that you're getting?

Thanks,
Shawn



Re: NullPointerException in PeerSync.handleUpdates

2017-11-21 Thread Erick Erickson
Did you check the JIRA list? Or CHANGES.txt in more recent versions?

On Tue, Nov 21, 2017 at 1:13 AM, S G  wrote:
> Hi,
>
> We are running 6.2 version of Solr and hitting this error frequently.
>
> Error while trying to recover. core=my_core:java.lang.NullPointerException
> at org.apache.solr.update.PeerSync.handleUpdates(PeerSync.java:605)
> at org.apache.solr.update.PeerSync.handleResponse(PeerSync.java:344)
> at org.apache.solr.update.PeerSync.sync(PeerSync.java:257)
> at 
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:376)
> at 
> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:221)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
>
>
> Is this a known issue and fixed in some newer version?
>
>
> Thanks
> SG


Re: OutOfMemoryError in 6.5.1

2017-11-21 Thread Erick Erickson
Walter:

Yeah, I've seen this on occasion. IIRC, the OOM exception will be
specific to running out of stack space, or at least slightly different
from the "standard" OOM error. That would be the "smoking gun" for too
many threads.

Erick

On Tue, Nov 21, 2017 at 9:00 AM, Walter Underwood  wrote:
> I do have one theory about the OOM. The server is running out of memory 
> because there are too many threads. Instead of queueing up overload in the 
> load balancer, it is queue in new threads waiting to run. Setting 
> solr.jetty.threads.max to 10,000 guarantees this will happen under overload.
>
> New Relic shows this clearly. CPU hits 100% at 15:40, thread count and load 
> average start climbing. At 15:43, it reaches 3000 threads and starts throwing 
> OOM. After that, the server is in a stable congested state.
>
> I understand why the Jetty thread max was set so high, but I think the cure 
> is worse than the disease. We’ll run another load benchmark with thread max 
> at something realistic, like 200.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>> On Nov 21, 2017, at 8:17 AM, Walter Underwood  wrote:
>>
>> All our customizations are in solr.in.sh. We’re using the one we configured 
>> for 6.3.0. I’ll check for any differences between that and the 6.5.1 script.
>>
>> I don’t see any arguments at all in the dashboard. I do see them in a ps 
>> listing, right at the end.
>>
>> java -server -Xms8g -Xmx8g -XX:+UseG1GC -XX:+ParallelRefProcEnabled 
>> -XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=200 -XX:+UseLargePages 
>> -XX:+AggressiveOpts -XX:+HeapDumpOnOutOfMemoryError -verbose:gc 
>> -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
>> -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution 
>> -XX:+PrintGCApplicationStoppedTime -Xloggc:/solr/logs/solr_gc.log 
>> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M 
>> -Dcom.sun.management.jmxremote 
>> -Dcom.sun.management.jmxremote.local.only=false 
>> -Dcom.sun.management.jmxremote.ssl=false 
>> -Dcom.sun.management.jmxremote.authenticate=false 
>> -Dcom.sun.management.jmxremote.port=18983 
>> -Dcom.sun.management.jmxremote.rmi.port=18983 
>> -Djava.rmi.server.hostname=new-solr-c01.test3.cloud.cheggnet.com 
>> -DzkClientTimeout=15000 
>> -DzkHost=zookeeper1.test3.cloud.cheggnet.com:2181,zookeeper2.test3.cloud.cheggnet.com:2181,zookeeper3.test3.cloud.cheggnet.com:2181/solr-cloud
>>  -Dsolr.log.level=WARN -Dsolr.log.dir=/solr/logs -Djetty.port=8983 
>> -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks 
>> -Dhost=new-solr-c01.test3.cloud.cheggnet.com -Duser.timezone=UTC 
>> -Djetty.home=/apps/solr6/server -Dsolr.solr.home=/apps/solr6/server/solr 
>> -Dsolr.install.dir=/apps/solr6 -Dgraphite.prefix=solr-cloud.new-solr-c01 
>> -Dgraphite.host=influx.test.cheggnet.com 
>> -javaagent:/apps/solr6/newrelic/newrelic.jar -Dnewrelic.environment=test3 
>> -Dsolr.log.muteconsole -Xss256k -Dsolr.log.muteconsole 
>> -XX:OnOutOfMemoryError=/apps/solr6/bin/oom_solr.sh 8983 /solr/logs -jar 
>> start.jar --module=http
>>
>> I’m still confused why we are hitting OOM in 6.5.1 but weren’t in 6.3.0. Our 
>> load benchmarks use prod logs. We added suggesters, but those use analyzing 
>> infix, so they are search indexes, not in-memory.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>>
>>> On Nov 21, 2017, at 5:46 AM, Shawn Heisey  wrote:
>>>
>>> On 11/20/2017 6:17 PM, Walter Underwood wrote:
 When I ran load benchmarks with 6.3.0, an overloaded cluster would get 
 super slow but keep functioning. With 6.5.1, we hit 100% CPU, then start 
 getting OOMs. That is really bad, because it means we need to reboot every 
 node in the cluster.
 Also, the JVM OOM hook isn’t running the process killer (JVM 
 1.8.0_121-b13). Using the G1 collector with the Shawn Heisey settings in 
 an 8G heap.
>>> 
 This is not good behavior in prod. The process goes to the bad place, then 
 we need to wait until someone is paged and kills it manually. Luckily, it 
 usually drops out of the live nodes for each collection and doesn’t take 
 user traffic.
>>>
>>> There was a bug, fixed long before 6.3.0, where the OOM killer script 
>>> wasn't working because the arguments enabling it were in the wrong place.  
>>> It was fixed in 5.5.1 and 6.0.
>>>
>>> https://issues.apache.org/jira/browse/SOLR-8145
>>>
>>> If the scripts that you are using to get Solr started originated with a 
>>> much older version of Solr than you are currently running, maybe you've got 
>>> the arguments in the wrong order.
>>>
>>> Do you see the commandline arguments for the OOM killer (only available on 
>>> *NIX systems, not Windows) on the admin UI dashboard?  If they are properly 
>>> placed, you will see them on the dashboard, but if they aren't properly 
>>> placed, then you won't 

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Erick Erickson
How are you stopping Solr? Nodes should not go into recovery on
startup unless Solr was killed un-gracefully (i.e. kill -9 or the
like). If you use the bin/solr script to stop Solr and see a message
about "killing XXX forcefully" then you can lengthen out the time the
script waits for shutdown (there's a sysvar you can set, look in the
script).
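
A sketch of that setting (SOLR_STOP_WAIT is the variable name recent bin/solr scripts read; confirm against your install):

# in solr.in.sh: seconds to wait for a graceful stop before the script force-kills Solr
SOLR_STOP_WAIT=300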

Actually I'll correct myself a bit. Shards _do_ go into recovery but
it should be very short in the graceful shutdown case. Basically
shards temporarily go into recovery as part of normal processing just
long enough to see there's no recovery necessary, but that should be
measured in a few seconds.

What it sounds like from this "The shards go into recovery and start
to utilize nearly all of their network" is that your nodes go into
"full recovery" where the entire index is copied down because the
replica thinks it's "too far" out of date. That indicates something
weird about the state when the Solr nodes stopped.

wild-shot-in-the-dark question. How big are your tlogs? If you don't
hard commit very often, the tlogs can replay at startup for a very
long time.

This makes no sense to me, I'm surely missing something:

The process at this point is to start one node, find out the lock
files, wait for it to come up completely (hours), stop it, delete the
write.lock files, and restart.  Usually this second restart is faster,
but it still can take 20-60 minutes.

When you start one node it may take a few minutes for leader electing
to kick in (the default is 180 seconds) but absent replication it
should just be there. Taking hours totally violates my expectations.
What does Solr _think_ it's doing? What's in the logs at that point?
And if you stop solr gracefully, there shouldn't be a problem with
write.lock.

You could also try increasing the timeouts, and the HDFS directory
factory has some parameters to tweak that are a mystery to me...

All in all, this is behavior that I find mystifying.

Best,
Erick

On Tue, Nov 21, 2017 at 5:07 AM, Joe Obernberger
 wrote:
> Hi All - we have a system with 45 physical boxes running solr 6.6.1 using
> HDFS as the index.  The current index size is about 31TBytes.  With 3x
> replication that takes up 93TBytes of disk. Our main collection is split
> across 100 shards with 3 replicas each.  The issue that we're running into
> is when restarting the solr6 cluster.  The shards go into recovery and start
> to utilize nearly all of their network interfaces.  If we start too many of
> the nodes at once, the shards will go into a recovery, fail, and retry loop
> and never come up.  The errors are related to HDFS not responding fast
> enough and warnings from the DFSClient.  If we stop a node when this is
> happening, the script will force a stop (180 second timeout) and upon
> restart, we have lock files (write.lock) inside of HDFS.
>
> The process at this point is to start one node, find out the lock files,
> wait for it to come up completely (hours), stop it, delete the write.lock
> files, and restart.  Usually this second restart is faster, but it still can
> take 20-60 minutes.
>
> The smaller indexes recover much faster (less than 5 minutes). Should we
> have not used so many replicas with HDFS?  Is there a better way we should
> have built the solr6 cluster?
>
> Thank you for any insight!
>
> -Joe
>


Re: Custom analyzer & frequency

2017-11-21 Thread Erick Erickson
One thing you might do is use the termfreq function to see that it
looks like in the index. Also the schema/analysis page will put terms
in "buckets" by power-of-2 so that might help too.

Best,
Erick

On Tue, Nov 21, 2017 at 7:55 AM, Barbet Alain  wrote:
> You rock, thank you so much for this clear answer, I loose 2 days for
> nothing as I've already the term freq but now I've an answer :-)
> (And yes I check it's the doc freq, not the term freq).
>
> Thank you again !
>
> 2017-11-21 16:34 GMT+01:00 Emir Arnautović :
>> Hi Alain,
>> As explained in prev mail that is doc frequency and each doc is counted 
>> once. I am not sure if Luke can provide you information about overall term 
>> frequency - sum of term frequency of all docs.
>>
>> Regards,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>
>>
>>
>>> On 21 Nov 2017, at 16:30, Barbet Alain  wrote:
>>>
$ cat add_test.sh
DATA='
<add>
  <doc>
    <field name="id">666</field>
    <field name="titi_txt_fr">toto titi tata toto tutu titi</field>
  </doc>
</add>
'
$ sh add_test.sh
<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">484</int></lst>
</response>


$ curl 'http://localhost:8983/solr/alian_test/terms?terms.fl=titi_txt_fr&terms.sort=index'
<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst>
  <lst name="terms">
    <lst name="titi_txt_fr"><int name="tata">1</int><int name="titi">1</int><int name="toto">1</int><int name="tutu">1</int></lst>
  </lst>
</response>

>>>
>>> So it's not only on Luke Side, it's come from Solr. Does it sound normal ?
>>>
>>> 2017-11-21 11:43 GMT+01:00 Barbet Alain :
 Hi,

 I build a custom analyzer & setup it in solr, but doesn't work as I expect.
 I always get 1 as frequency for each word even if it's present
 multiple time in the text.

 So I try with default analyzer & find same behavior:
 My schema

  

  
  >>> stored="true"/>
  

 alian@yoda:~/solr> cat add_test.sh
 DATA='
 
  
666
toto titi tata toto tutu titi
  
 
 '
 curl -X POST -H 'Content-Type: text/xml'
 'http://localhost:8983/solr/alian_test/update?commit=true'
 --data-binary "$DATA"

 When I test in solr interface / analyze, I find the right behavior
 (find titi & toto 2 times).
 But when I look in solr index with Luke or solr interface / schema,
 the top term always get 1 as frequency. Can someone give me the thing
 I forget ?

 (solr 6.5)

 Thank you !
>>


Re: OutOfMemoryError in 6.5.1

2017-11-21 Thread Walter Underwood
I do have one theory about the OOM. The server is running out of memory because 
there are too many threads. Instead of the overload queueing up in the load 
balancer, it queues up as new threads waiting to run. Setting 
solr.jetty.threads.max to 10,000 guarantees this will happen under overload.

New Relic shows this clearly. CPU hits 100% at 15:40, thread count and load 
average start climbing. At 15:43, it reaches 3000 threads and starts throwing 
OOM. After that, the server is in a stable congested state.

I understand why the Jetty thread max was set so high, but I think the cure is 
worse than the disease. We’ll run another load benchmark with thread max at 
something realistic, like 200.
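
For the benchmark, something like this in solr.in.sh should cap the pool (a sketch; in 6.x the jetty.xml shipped with Solr reads solr.jetty.threads.max, but double-check your install):

SOLR_OPTS="$SOLR_OPTS -Dsolr.jetty.threads.max=200"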

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Nov 21, 2017, at 8:17 AM, Walter Underwood  wrote:
> 
> All our customizations are in solr.in.sh. We’re using the one we configured 
> for 6.3.0. I’ll check for any differences between that and the 6.5.1 script.
> 
> I don’t see any arguments at all in the dashboard. I do see them in a ps 
> listing, right at the end.
> 
> java -server -Xms8g -Xmx8g -XX:+UseG1GC -XX:+ParallelRefProcEnabled 
> -XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=200 -XX:+UseLargePages 
> -XX:+AggressiveOpts -XX:+HeapDumpOnOutOfMemoryError -verbose:gc 
> -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
> -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution 
> -XX:+PrintGCApplicationStoppedTime -Xloggc:/solr/logs/solr_gc.log 
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M 
> -Dcom.sun.management.jmxremote 
> -Dcom.sun.management.jmxremote.local.only=false 
> -Dcom.sun.management.jmxremote.ssl=false 
> -Dcom.sun.management.jmxremote.authenticate=false 
> -Dcom.sun.management.jmxremote.port=18983 
> -Dcom.sun.management.jmxremote.rmi.port=18983 
> -Djava.rmi.server.hostname=new-solr-c01.test3.cloud.cheggnet.com 
> -DzkClientTimeout=15000 
> -DzkHost=zookeeper1.test3.cloud.cheggnet.com:2181,zookeeper2.test3.cloud.cheggnet.com:2181,zookeeper3.test3.cloud.cheggnet.com:2181/solr-cloud
>  -Dsolr.log.level=WARN -Dsolr.log.dir=/solr/logs -Djetty.port=8983 
> -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks 
> -Dhost=new-solr-c01.test3.cloud.cheggnet.com -Duser.timezone=UTC 
> -Djetty.home=/apps/solr6/server -Dsolr.solr.home=/apps/solr6/server/solr 
> -Dsolr.install.dir=/apps/solr6 -Dgraphite.prefix=solr-cloud.new-solr-c01 
> -Dgraphite.host=influx.test.cheggnet.com 
> -javaagent:/apps/solr6/newrelic/newrelic.jar -Dnewrelic.environment=test3 
> -Dsolr.log.muteconsole -Xss256k -Dsolr.log.muteconsole 
> -XX:OnOutOfMemoryError=/apps/solr6/bin/oom_solr.sh 8983 /solr/logs -jar 
> start.jar --module=http
> 
> I’m still confused why we are hitting OOM in 6.5.1 but weren’t in 6.3.0. Our 
> load benchmarks use prod logs. We added suggesters, but those use analyzing 
> infix, so they are search indexes, not in-memory.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> 
>> On Nov 21, 2017, at 5:46 AM, Shawn Heisey  wrote:
>> 
>> On 11/20/2017 6:17 PM, Walter Underwood wrote:
>>> When I ran load benchmarks with 6.3.0, an overloaded cluster would get 
>>> super slow but keep functioning. With 6.5.1, we hit 100% CPU, then start 
>>> getting OOMs. That is really bad, because it means we need to reboot every 
>>> node in the cluster.
>>> Also, the JVM OOM hook isn’t running the process killer (JVM 
>>> 1.8.0_121-b13). Using the G1 collector with the Shawn Heisey settings in an 
>>> 8G heap.
>> 
>>> This is not good behavior in prod. The process goes to the bad place, then 
>>> we need to wait until someone is paged and kills it manually. Luckily, it 
>>> usually drops out of the live nodes for each collection and doesn’t take 
>>> user traffic.
>> 
>> There was a bug, fixed long before 6.3.0, where the OOM killer script wasn't 
>> working because the arguments enabling it were in the wrong place.  It was 
>> fixed in 5.5.1 and 6.0.
>> 
>> https://issues.apache.org/jira/browse/SOLR-8145
>> 
>> If the scripts that you are using to get Solr started originated with a much 
>> older version of Solr than you are currently running, maybe you've got the 
>> arguments in the wrong order.
>> 
>> Do you see the commandline arguments for the OOM killer (only available on 
>> *NIX systems, not Windows) on the admin UI dashboard?  If they are properly 
>> placed, you will see them on the dashboard, but if they aren't properly 
>> placed, then you won't see them.  This is what the argument looks like for 
>> one of my Solr installs:
>> 
>> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /var/solr/logs
>> 
>> Something which you probably already know:  If you're hitting OOM, you need 
>> a larger heap, or you need to adjust the config so it uses less memory.  
>> There are no other ways to "fix" OOM problems.
>> 
>> Thanks,
>> Shawn
> 



Re: OutOfMemoryError in 6.5.1

2017-11-21 Thread Erick Erickson
bq: but those use analyzing infix, so they are search indexes, not in-memory

Sure, but they still can consume heap. Most of the index is MMapped of
course, but there are some control structures, indexes and the like
still kept on the heap.

I suppose not using the suggester would nail it though.

I guess the second thing I'd be interested in is a heap dump of the
two to get a sense of whether something really wonky crept in between
those versions. Certainly nothing intentional that I know of.

Erick

On Tue, Nov 21, 2017 at 8:17 AM, Walter Underwood  wrote:
> All our customizations are in solr.in.sh. We’re using the one we configured 
> for 6.3.0. I’ll check for any differences between that and the 6.5.1 script.
>
> I don’t see any arguments at all in the dashboard. I do see them in a ps 
> listing, right at the end.
>
> java -server -Xms8g -Xmx8g -XX:+UseG1GC -XX:+ParallelRefProcEnabled 
> -XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=200 -XX:+UseLargePages 
> -XX:+AggressiveOpts -XX:+HeapDumpOnOutOfMemoryError -verbose:gc 
> -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
> -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution 
> -XX:+PrintGCApplicationStoppedTime -Xloggc:/solr/logs/solr_gc.log 
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M 
> -Dcom.sun.management.jmxremote 
> -Dcom.sun.management.jmxremote.local.only=false 
> -Dcom.sun.management.jmxremote.ssl=false 
> -Dcom.sun.management.jmxremote.authenticate=false 
> -Dcom.sun.management.jmxremote.port=18983 
> -Dcom.sun.management.jmxremote.rmi.port=18983 
> -Djava.rmi.server.hostname=new-solr-c01.test3.cloud.cheggnet.com 
> -DzkClientTimeout=15000 
> -DzkHost=zookeeper1.test3.cloud.cheggnet.com:2181,zookeeper2.test3.cloud.cheggnet.com:2181,zookeeper3.test3.cloud.cheggnet.com:2181/solr-cloud
>  -Dsolr.log.level=WARN -Dsolr.log.dir=/solr/logs -Djetty.port=8983 
> -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks 
> -Dhost=new-solr-c01.test3.cloud.cheggnet.com -Duser.timezone=UTC 
> -Djetty.home=/apps/solr6/server -Dsolr.solr.home=/apps/solr6/server/solr 
> -Dsolr.install.dir=/apps/solr6 -Dgraphite.prefix=solr-cloud.new-solr-c01 
> -Dgraphite.host=influx.test.cheggnet.com 
> -javaagent:/apps/solr6/newrelic/newrelic.jar -Dnewrelic.environment=test3 
> -Dsolr.log.muteconsole -Xss256k -Dsolr.log.muteconsole 
> -XX:OnOutOfMemoryError=/apps/solr6/bin/oom_solr.sh 8983 /solr/logs -jar 
> start.jar --module=http
>
> I’m still confused why we are hitting OOM in 6.5.1 but weren’t in 6.3.0. Our 
> load benchmarks use prod logs. We added suggesters, but those use analyzing 
> infix, so they are search indexes, not in-memory.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>> On Nov 21, 2017, at 5:46 AM, Shawn Heisey  wrote:
>>
>> On 11/20/2017 6:17 PM, Walter Underwood wrote:
>>> When I ran load benchmarks with 6.3.0, an overloaded cluster would get 
>>> super slow but keep functioning. With 6.5.1, we hit 100% CPU, then start 
>>> getting OOMs. That is really bad, because it means we need to reboot every 
>>> node in the cluster.
>>> Also, the JVM OOM hook isn’t running the process killer (JVM 
>>> 1.8.0_121-b13). Using the G1 collector with the Shawn Heisey settings in an 
>>> 8G heap.
>> 
>>> This is not good behavior in prod. The process goes to the bad place, then 
>>> we need to wait until someone is paged and kills it manually. Luckily, it 
>>> usually drops out of the live nodes for each collection and doesn’t take 
>>> user traffic.
>>
>> There was a bug, fixed long before 6.3.0, where the OOM killer script wasn't 
>> working because the arguments enabling it were in the wrong place.  It was 
>> fixed in 5.5.1 and 6.0.
>>
>> https://issues.apache.org/jira/browse/SOLR-8145
>>
>> If the scripts that you are using to get Solr started originated with a much 
>> older version of Solr than you are currently running, maybe you've got the 
>> arguments in the wrong order.
>>
>> Do you see the commandline arguments for the OOM killer (only available on 
>> *NIX systems, not Windows) on the admin UI dashboard?  If they are properly 
>> placed, you will see them on the dashboard, but if they aren't properly 
>> placed, then you won't see them.  This is what the argument looks like for 
>> one of my Solr installs:
>>
>> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /var/solr/logs
>>
>> Something which you probably already know:  If you're hitting OOM, you need 
>> a larger heap, or you need to adjust the config so it uses less memory.  
>> There are no other ways to "fix" OOM problems.
>>
>> Thanks,
>> Shawn
>


Re: Solr cloud in kubernetes

2017-11-21 Thread Upayavira
We hopefully will switch to Kubernetes/Rancher 2.0 from Rancher
1.x/Docker, soon.

Here are some utilities that we've used as run-once containers to start
everything up:

https://github.com/odoko-devops/solr-utils

Using a single image, run with many different configurations, we have
been able to stand up an entire Solr stack, from scratch, including
ZooKeeper, Solr, solr.xml, config upload, collection creation, replica
creation, content indexing, etc. It is a delight to see when it works.

Upayavira

On Mon, 20 Nov 2017, at 09:30 AM, Björn Häuser wrote:
> Hi Raja,
> 
> we are using solrcloud as a statefulset and every pod has its own storage
> attached to it.
> 
> Thanks
> Björn
> 
> > On 20. Nov 2017, at 05:59, rajasaur  wrote:
> > 
> > Hi Bjorn,
> > 
> > Im trying a similar approach now (to get solrcloud working on kubernetes). I
> > have run Zookeeper as a statefulset, but not running SolrCloud, which is
> > causing an issue when my pods get destroyed and restarted. 
> > I will try running the -h option so that the SOLR_HOST is used when
> > connecting to itself (and to zookeeper).
> > 
> > On another note, how do you store the indexes ? I had an issue with my GCE
> > node (Node NotReady), which had its kubelet to be restarted, but with that,
> > since solrcloud pods were restarted, all the data got wiped out. Just
> > wondering how you have setup your indexes with the solrcloud kubernetes
> > setup.
> > 
> > Thanks
> > Raja
> > 
> > 
> > 
> > 
> > --
> > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> 


Re: OutOfMemoryError in 6.5.1

2017-11-21 Thread Walter Underwood
All our customizations are in solr.in.sh. We’re using the one we configured for 
6.3.0. I’ll check for any differences between that and the 6.5.1 script.

I don’t see any arguments at all in the dashboard. I do see them in a ps 
listing, right at the end.

java -server -Xms8g -Xmx8g -XX:+UseG1GC -XX:+ParallelRefProcEnabled 
-XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=200 -XX:+UseLargePages 
-XX:+AggressiveOpts -XX:+HeapDumpOnOutOfMemoryError -verbose:gc 
-XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
-XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution 
-XX:+PrintGCApplicationStoppedTime -Xloggc:/solr/logs/solr_gc.log 
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M 
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false 
-Dcom.sun.management.jmxremote.ssl=false 
-Dcom.sun.management.jmxremote.authenticate=false 
-Dcom.sun.management.jmxremote.port=18983 
-Dcom.sun.management.jmxremote.rmi.port=18983 
-Djava.rmi.server.hostname=new-solr-c01.test3.cloud.cheggnet.com 
-DzkClientTimeout=15000 
-DzkHost=zookeeper1.test3.cloud.cheggnet.com:2181,zookeeper2.test3.cloud.cheggnet.com:2181,zookeeper3.test3.cloud.cheggnet.com:2181/solr-cloud
 -Dsolr.log.level=WARN -Dsolr.log.dir=/solr/logs -Djetty.port=8983 
-DSTOP.PORT=7983 -DSTOP.KEY=solrrocks 
-Dhost=new-solr-c01.test3.cloud.cheggnet.com -Duser.timezone=UTC 
-Djetty.home=/apps/solr6/server -Dsolr.solr.home=/apps/solr6/server/solr 
-Dsolr.install.dir=/apps/solr6 -Dgraphite.prefix=solr-cloud.new-solr-c01 
-Dgraphite.host=influx.test.cheggnet.com 
-javaagent:/apps/solr6/newrelic/newrelic.jar -Dnewrelic.environment=test3 
-Dsolr.log.muteconsole -Xss256k -Dsolr.log.muteconsole 
-XX:OnOutOfMemoryError=/apps/solr6/bin/oom_solr.sh 8983 /solr/logs -jar 
start.jar --module=http

I’m still confused why we are hitting OOM in 6.5.1 but weren’t in 6.3.0. Our 
load benchmarks use prod logs. We added suggesters, but those use analyzing 
infix, so they are search indexes, not in-memory.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Nov 21, 2017, at 5:46 AM, Shawn Heisey  wrote:
> 
> On 11/20/2017 6:17 PM, Walter Underwood wrote:
>> When I ran load benchmarks with 6.3.0, an overloaded cluster would get super 
>> slow but keep functioning. With 6.5.1, we hit 100% CPU, then start getting 
>> OOMs. That is really bad, because it means we need to reboot every node in 
>> the cluster.
>> Also, the JVM OOM hook isn’t running the process killer (JVM 1.8.0_121-b13). 
>> Using the G1 collector with the Shawn Heisey settings in an 8G heap.
> 
>> This is not good behavior in prod. The process goes to the bad place, then 
>> we need to wait until someone is paged and kills it manually. Luckily, it 
>> usually drops out of the live nodes for each collection and doesn’t take 
>> user traffic.
> 
> There was a bug, fixed long before 6.3.0, where the OOM killer script wasn't 
> working because the arguments enabling it were in the wrong place.  It was 
> fixed in 5.5.1 and 6.0.
> 
> https://issues.apache.org/jira/browse/SOLR-8145
> 
> If the scripts that you are using to get Solr started originated with a much 
> older version of Solr than you are currently running, maybe you've got the 
> arguments in the wrong order.
> 
> Do you see the commandline arguments for the OOM killer (only available on 
> *NIX systems, not Windows) on the admin UI dashboard?  If they are properly 
> placed, you will see them on the dashboard, but if they aren't properly 
> placed, then you won't see them.  This is what the argument looks like for 
> one of my Solr installs:
> 
> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /var/solr/logs
> 
> Something which you probably already know:  If you're hitting OOM, you need a 
> larger heap, or you need to adjust the config so it uses less memory.  There 
> are no other ways to "fix" OOM problems.
> 
> Thanks,
> Shawn



Re: Merging of index in Solr

2017-11-21 Thread Zheng Lin Edwin Yeo
I am using the IndexMergeTool from Solr, from the command below:

java -classpath lucene-core-6.5.1.jar;lucene-misc-6.5.1.jar
org.apache.lucene.misc.IndexMergeTool
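The full command is of this form - the directories below are placeholders, with the first one being the merge target and the rest being the source indexes:

java -classpath lucene-core-6.5.1.jar;lucene-misc-6.5.1.jar org.apache.lucene.misc.IndexMergeTool <merged-index-dir> <core1-index-dir> <core2-index-dir>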

The heap size is 32GB. There are more than 20 million documents in the two
cores.

Regards,
Edwin



On 21 November 2017 at 21:54, Shawn Heisey  wrote:

> On 11/20/2017 9:35 AM, Zheng Lin Edwin Yeo wrote:
>
>> Does anyone knows how long usually the merging in Solr will take?
>>
>> I am currently merging about 3.5TB of data, and it has been running for
>> more than 28 hours and it is not completed yet. The merging is running on
>> SSD disk.
>>
>
> The following will apply if you mean Solr's "optimize" feature when you
> say "merging".
>
> In my experience, merging proceeds at about 20 to 30 megabytes per second
> -- even if the disks are capable of far faster data transfer.  Merging is
> not just copying the data. Lucene is completely rebuilding very large data
> structures, and *not* including data from deleted documents as it does so.
> It takes a lot of CPU power and time.
>
> If we average the data rates I've seen to 25, then that would indicate
> that an optimize on a 3.5TB is going to take about 39 hours, and might take
> as long as 48 hours.  And if you're running SolrCloud with multiple
> replicas, multiply that by the number of copies of the 3.5TB index.  An
> optimize on a SolrCloud collection handles one shard replica at a time and
> works its way through the entire collection.
>
> If you are merging different indexes *together*, which a later message
> seems to state, then the actual Lucene operation is probably nearly
> identical, but I'm not really familiar with it, so I cannot say for sure.
>
> Thanks,
> Shawn
>
>


Re: Custom analyzer & frequency

2017-11-21 Thread Barbet Alain
You rock, thank you so much for this clear answer. I lost 2 days for
nothing, as I already have the term freq, but now I have an answer :-)
(And yes, I checked: it is the doc freq, not the term freq.)

Thank you again !

2017-11-21 16:34 GMT+01:00 Emir Arnautović :
> Hi Alain,
> As explained in prev mail that is doc frequency and each doc is counted once. 
> I am not sure if Luke can provide you information about overall term 
> frequency - sum of term frequency of all docs.
>
> Regards,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
>> On 21 Nov 2017, at 16:30, Barbet Alain  wrote:
>>
>> $ cat add_test.sh
>> DATA='
>> 
>>  
>>666
>>toto titi tata toto tutu titi
>>  
>> 
>> '
>> $ sh add_test.sh
>> 
>> 
>> 0> name="QTime">484
>> 
>>
>>
>> $ curl 
>> 'http://localhost:8983/solr/alian_test/terms?terms.fl=titi_txt_fr=index'
>> 
>> 
>> 0> name="QTime">0> name="titi_txt_fr">1> name="titi">11> name="tutu">1
>> 
>>
>>
>> So it's not only on Luke Side, it's come from Solr. Does it sound normal ?
>>
>> 2017-11-21 11:43 GMT+01:00 Barbet Alain :
>>> Hi,
>>>
>>> I build a custom analyzer & setup it in solr, but doesn't work as I expect.
>>> I always get 1 as frequency for each word even if it's present
>>> multiple time in the text.
>>>
>>> So I try with default analyzer & find same behavior:
>>> My schema
>>>
>>>  
>>>
>>>  
>>>  >> stored="true"/>
>>>  
>>>
>>> alian@yoda:~/solr> cat add_test.sh
>>> DATA='
>>> 
>>>  
>>>666
>>>toto titi tata toto tutu titi
>>>  
>>> 
>>> '
>>> curl -X POST -H 'Content-Type: text/xml'
>>> 'http://localhost:8983/solr/alian_test/update?commit=true'
>>> --data-binary "$DATA"
>>>
>>> When I test in solr interface / analyze, I find the right behavior
>>> (find titi & toto 2 times).
>>> But when I look in solr index with Luke or solr interface / schema,
>>> the top term always get 1 as frequency. Can someone give me the thing
>>> I forget ?
>>>
>>> (solr 6.5)
>>>
>>> Thank you !
>


Re: Custom analyzer & frequency

2017-11-21 Thread Emir Arnautović
Hi Alain,
As explained in the previous mail, that is doc frequency, and each doc is counted once. I
am not sure whether Luke can provide information about the overall term frequency -
the sum of term frequencies across all docs.
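If you need the numbers from Solr itself, function queries can report both per-document and collection-wide term frequency, e.g. (field, doc id and term taken from your test):

# termfreq() = occurrences in that document, ttf() = total occurrences across the whole index
curl "http://localhost:8983/solr/alian_test/select?q=id:666&fl=id,termfreq(titi_txt_fr,'toto'),ttf(titi_txt_fr,'toto')"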

Regards,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 21 Nov 2017, at 16:30, Barbet Alain  wrote:
> 
> $ cat add_test.sh
> DATA='
> 
>  
>666
>toto titi tata toto tutu titi
>  
> 
> '
> $ sh add_test.sh
> 
> 
> 0 name="QTime">484
> 
> 
> 
> $ curl 
> 'http://localhost:8983/solr/alian_test/terms?terms.fl=titi_txt_fr=index'
> 
> 
> 0 name="QTime">0 name="titi_txt_fr">1 name="titi">11 name="tutu">1
> 
> 
> 
> So it's not only on Luke Side, it's come from Solr. Does it sound normal ?
> 
> 2017-11-21 11:43 GMT+01:00 Barbet Alain :
>> Hi,
>> 
>> I build a custom analyzer & setup it in solr, but doesn't work as I expect.
>> I always get 1 as frequency for each word even if it's present
>> multiple time in the text.
>> 
>> So I try with default analyzer & find same behavior:
>> My schema
>> 
>>  
>>
>>  
>>  > stored="true"/>
>>  
>> 
>> alian@yoda:~/solr> cat add_test.sh
>> DATA='
>> 
>>  
>>666
>>toto titi tata toto tutu titi
>>  
>> 
>> '
>> curl -X POST -H 'Content-Type: text/xml'
>> 'http://localhost:8983/solr/alian_test/update?commit=true'
>> --data-binary "$DATA"
>> 
>> When I test in solr interface / analyze, I find the right behavior
>> (find titi & toto 2 times).
>> But when I look in solr index with Luke or solr interface / schema,
>> the top term always get 1 as frequency. Can someone give me the thing
>> I forget ?
>> 
>> (solr 6.5)
>> 
>> Thank you !



Re: Custom analyzer & frequency

2017-11-21 Thread Emir Arnautović
Hi Alain,
I haven’t been using the Luke UI in a while, but if you are talking about top terms
for some field, that might be doc freq, not term freq, where every doc is counted
once - that is equivalent to “Load Term Info” under “Schema” in the Solr Admin console.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 21 Nov 2017, at 16:21, Barbet Alain  wrote:
> 
> Thank you very much for your answer.
> 
> It was an error on copy / paste on my mail sorry about that !
> So it was already a text field, so omitTermFrequenciesAndPosition was
> already on “false”
> 
> So I forget my custom analyzer and try to test with an already defined
> field_type (text_fr) and see same behaviour in luke !
> So I look better.
> On Luke when I took term one by one on "Document" tab, I see my
> frequency set to 2.
> But in first panel of Luke "overview", in "show top terms" Freq is
> still at 1 for all values.
> 
> I use Solr 6.5 & Luke 7.1. It didn't see this behavior if I open a
> Lucene base I build outside Solr, I see top terms freq same on 2
> panels.
> Do you know a reason for that ?
> Does this have an impact on Solr search ? Does bad freq in "top terms"
> come from Luke or Solr ?
> 
> 
> 2017-11-21 12:08 GMT+01:00 Emir Arnautović :
>> Hi Alain,
>> You did not provided definition of used field type - you use “nametext” type 
>> and pasted “text_ami” field type. It is possible that you have 
>> omitTermFrequenciesAndPosition=“true” on nametext field type. The default 
>> value for text fields should be false.
>> 
>> HTH,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> 
>> 
>> 
>>> On 21 Nov 2017, at 11:43, Barbet Alain  wrote:
>>> 
>>> Hi,
>>> 
>>> I build a custom analyzer & setup it in solr, but doesn't work as I expect.
>>> I always get 1 as frequency for each word even if it's present
>>> multiple time in the text.
>>> 
>>> So I try with default analyzer & find same behavior:
>>> My schema
>>> 
>>> 
>>>   
>>> 
>>> >> stored="true"/>
>>> 
>>> 
>>> alian@yoda:~/solr> cat add_test.sh
>>> DATA='
>>> 
>>> 
>>>   666
>>>   toto titi tata toto tutu titi
>>> 
>>> 
>>> '
>>> curl -X POST -H 'Content-Type: text/xml'
>>> 'http://localhost:8983/solr/alian_test/update?commit=true'
>>> --data-binary "$DATA"
>>> 
>>> When I test in solr interface / analyze, I find the right behavior
>>> (find titi & toto 2 times).
>>> But when I look in solr index with Luke or solr interface / schema,
>>> the top term always get 1 as frequency. Can someone give me the thing
>>> I forget ?
>>> 
>>> (solr 6.5)
>>> 
>>> Thank you !
>> 



Re: Custom analyzer & frequency

2017-11-21 Thread Barbet Alain
$ cat add_test.sh
DATA='
<add>
  <doc>
    <field name="id">666</field>
    <field name="titi_txt_fr">toto titi tata toto tutu titi</field>
  </doc>
</add>
'
$ sh add_test.sh
<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">484</int></lst>
</response>

$ curl 'http://localhost:8983/solr/alian_test/terms?terms.fl=titi_txt_fr&terms.sort=index'
<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst>
  <lst name="terms">
    <lst name="titi_txt_fr"><int name="tata">1</int><int name="titi">1</int><int name="toto">1</int><int name="tutu">1</int></lst>
  </lst>
</response>

So it's not only on the Luke side, it comes from Solr itself. Does that sound normal?

2017-11-21 11:43 GMT+01:00 Barbet Alain :
> Hi,
>
> I build a custom analyzer & setup it in solr, but doesn't work as I expect.
> I always get 1 as frequency for each word even if it's present
> multiple time in the text.
>
> So I try with default analyzer & find same behavior:
> My schema
>
>   
> 
>   
>stored="true"/>
>   
>
> alian@yoda:~/solr> cat add_test.sh
> DATA='
> 
>   
> 666
> toto titi tata toto tutu titi
>   
> 
> '
> curl -X POST -H 'Content-Type: text/xml'
> 'http://localhost:8983/solr/alian_test/update?commit=true'
> --data-binary "$DATA"
>
> When I test in solr interface / analyze, I find the right behavior
> (find titi & toto 2 times).
> But when I look in solr index with Luke or solr interface / schema,
> the top term always get 1 as frequency. Can someone give me the thing
> I forget ?
>
> (solr 6.5)
>
> Thank you !


Re: Custom analyzer & frequency

2017-11-21 Thread Barbet Alain
Thank you very much for your answer.

It was a copy/paste error in my mail, sorry about that!
So it was already a text field, which means omitTermFreqAndPositions was
already "false".

So I dropped my custom analyzer and tested with an already defined
field type (text_fr), and I see the same behaviour in Luke!
So I looked closer.
In Luke, when I take the terms one by one on the "Documents" tab, I see my
frequency set to 2.
But in the first panel of Luke ("Overview"), under "Show top terms", Freq is
still 1 for all values.

I use Solr 6.5 & Luke 7.1. I don't see this behavior if I open a
Lucene index I built outside Solr; there the top-terms freq matches on both
panels.
Do you know a reason for that?
Does it have an impact on Solr search? Does the wrong freq in "top terms"
come from Luke or from Solr?


2017-11-21 12:08 GMT+01:00 Emir Arnautović :
> Hi Alain,
> You did not provided definition of used field type - you use “nametext” type 
> and pasted “text_ami” field type. It is possible that you have 
> omitTermFrequenciesAndPosition=“true” on nametext field type. The default 
> value for text fields should be false.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
>> On 21 Nov 2017, at 11:43, Barbet Alain  wrote:
>>
>> Hi,
>>
>> I build a custom analyzer & setup it in solr, but doesn't work as I expect.
>> I always get 1 as frequency for each word even if it's present
>> multiple time in the text.
>>
>> So I try with default analyzer & find same behavior:
>> My schema
>>
>>  
>>
>>  
>>  > stored="true"/>
>>  
>>
>> alian@yoda:~/solr> cat add_test.sh
>> DATA='
>> 
>>  
>>666
>>toto titi tata toto tutu titi
>>  
>> 
>> '
>> curl -X POST -H 'Content-Type: text/xml'
>> 'http://localhost:8983/solr/alian_test/update?commit=true'
>> --data-binary "$DATA"
>>
>> When I test in solr interface / analyze, I find the right behavior
>> (find titi & toto 2 times).
>> But when I look in solr index with Luke or solr interface / schema,
>> the top term always get 1 as frequency. Can someone give me the thing
>> I forget ?
>>
>> (solr 6.5)
>>
>> Thank you !
>


Re: Merging of index in Solr

2017-11-21 Thread Shawn Heisey

On 11/20/2017 9:35 AM, Zheng Lin Edwin Yeo wrote:

Does anyone knows how long usually the merging in Solr will take?

I am currently merging about 3.5TB of data, and it has been running for
more than 28 hours and it is not completed yet. The merging is running on
SSD disk.


The following will apply if you mean Solr's "optimize" feature when you 
say "merging".


In my experience, merging proceeds at about 20 to 30 megabytes per 
second -- even if the disks are capable of far faster data transfer.  
Merging is not just copying the data. Lucene is completely rebuilding 
very large data structures, and *not* including data from deleted 
documents as it does so.  It takes a lot of CPU power and time.


If we average the data rates I've seen to 25, then that would indicate 
that an optimize on a 3.5TB is going to take about 39 hours, and might 
take as long as 48 hours.  And if you're running SolrCloud with multiple 
replicas, multiply that by the number of copies of the 3.5TB index.  An 
optimize on a SolrCloud collection handles one shard replica at a time 
and works its way through the entire collection.
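(As a back-of-the-envelope check: 3.5TB is roughly 3,500,000 MB, and 3,500,000 MB / 25 MB per second = 140,000 seconds, or just under 39 hours for a single copy of the index.)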


If you are merging different indexes *together*, which a later message 
seems to state, then the actual Lucene operation is probably nearly 
identical, but I'm not really familiar with it, so I cannot say for sure.


Thanks,
Shawn



Re: Merging of index in Solr

2017-11-21 Thread Emir Arnautović
Hi Edwin,
I’ll let somebody with more knowledge about merging comment on the merge aspects.
What do you use to merge those cores - the merge tool, or do you run it through Solr’s
core API? What is the heap size? How many documents are in those two cores?
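By the core API I mean the CoreAdmin MERGEINDEXES action; roughly, with a placeholder core name and index paths:

curl "http://localhost:8983/solr/admin/cores?action=MERGEINDEXES&core=targetCore&indexDir=/path/to/core1/data/index&indexDir=/path/to/core2/data/index"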

Regards,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 21 Nov 2017, at 14:20, Zheng Lin Edwin Yeo  wrote:
> 
> Hi Emir,
> 
> Thanks for your reply.
> 
> There are only 1 host, 1 nodes and 1 shard for these 3.5TB.
> The merging has already written the additional 3.5TB to another segment.
> However, it is still not a single segment, and the size of the folder where
> the merged index is supposed to be is now 4.6TB, This excludes the original
> 3.5TB, meaning it is already using up 8.1TB of space, but the merging is
> still going on.
> 
> The index are currently updates free. We have only index the data in 2
> different collections, and we now need to merge them into a single
> collection.
> 
> Regards,
> Edwin
> 
> On 21 November 2017 at 16:52, Emir Arnautović 
> wrote:
> 
>> Hi Edwin,
>> How many host/nodes/shard are those 3.5TB? I am not familiar with merge
>> code, but trying to think what it might include, so don’t take any of
>> following as ground truth.
>> Merging for sure will include segments rewrite, so you better have
>> additional 3.5TB if you are merging it to a single segment. But that should
>> not last days on SSD. My guess is that you are running on the edge of your
>> heap and doing a lot GCs and maybe you will OOM at some point. I would
>> guess that merging is memory intensive operation and even if not holding
>> large structures in memory, it will probably create a lot of garbage.
>> Merging requires a lot of comparison so it is also a possibility that you
>> are exhausting CPU resources.
>> Bottom line - without more details and some monitoring tool, it is hard to
>> tell why it is taking that much.
>> And there is also a question if merging is good choice in you case - is
>> index static/updates free?
>> 
>> Regards,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> 
>> 
>> 
>>> On 20 Nov 2017, at 17:35, Zheng Lin Edwin Yeo 
>> wrote:
>>> 
>>> Hi,
>>> 
>>> Does anyone knows how long usually the merging in Solr will take?
>>> 
>>> I am currently merging about 3.5TB of data, and it has been running for
>>> more than 28 hours and it is not completed yet. The merging is running on
>>> SSD disk.
>>> 
>>> I am using Solr 6.5.1.
>>> 
>>> Regards,
>>> Edwin
>> 
>> 



Re: OutOfMemoryError in 6.5.1

2017-11-21 Thread Shawn Heisey

On 11/20/2017 6:17 PM, Walter Underwood wrote:

When I ran load benchmarks with 6.3.0, an overloaded cluster would get super 
slow but keep functioning. With 6.5.1, we hit 100% CPU, then start getting 
OOMs. That is really bad, because it means we need to reboot every node in the 
cluster.

Also, the JVM OOM hook isn’t running the process killer (JVM 1.8.0_121-b13). 
Using the G1 collector with the Shawn Heisey settings in an 8G heap.



This is not good behavior in prod. The process goes to the bad place, then we 
need to wait until someone is paged and kills it manually. Luckily, it usually 
drops out of the live nodes for each collection and doesn’t take user traffic.


There was a bug, fixed long before 6.3.0, where the OOM killer script 
wasn't working because the arguments enabling it were in the wrong 
place.  It was fixed in 5.5.1 and 6.0.


https://issues.apache.org/jira/browse/SOLR-8145

If the scripts that you are using to get Solr started originated with a 
much older version of Solr than you are currently running, maybe you've 
got the arguments in the wrong order.
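If memory serves, the crux of that bug was the position relative to start.jar: anything placed after "-jar start.jar" is handed to Jetty as a program argument and the JVM never sees it. Schematically (paths are just examples):

# picked up by the JVM:
java ... -XX:OnOutOfMemoryError="/opt/solr/bin/oom_solr.sh 8983 /var/solr/logs" -jar start.jar --module=http
# ignored by the JVM, treated as an argument to start.jar:
java ... -jar start.jar --module=http -XX:OnOutOfMemoryError="/opt/solr/bin/oom_solr.sh 8983 /var/solr/logs"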


Do you see the commandline arguments for the OOM killer (only available 
on *NIX systems, not Windows) on the admin UI dashboard?  If they are 
properly placed, you will see them on the dashboard, but if they aren't 
properly placed, then you won't see them.  This is what the argument 
looks like for one of my Solr installs:


-XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /var/solr/logs

Something which you probably already know:  If you're hitting OOM, you 
need a larger heap, or you need to adjust the config so it uses less 
memory.  There are no other ways to "fix" OOM problems.


Thanks,
Shawn


Re: Merging of index in Solr

2017-11-21 Thread Zheng Lin Edwin Yeo
Hi Emir,

Thanks for your reply.

There is only 1 host, 1 node and 1 shard for these 3.5TB.
The merging has already written an additional 3.5TB to another segment.
However, it is still not a single segment, and the size of the folder where
the merged index is supposed to be is now 4.6TB. This excludes the original
3.5TB, meaning it is already using up 8.1TB of space, but the merging is
still going on.

The index are currently updates free. We have only index the data in 2
different collections, and we now need to merge them into a single
collection.

Regards,
Edwin

On 21 November 2017 at 16:52, Emir Arnautović 
wrote:

> Hi Edwin,
> How many host/nodes/shard are those 3.5TB? I am not familiar with merge
> code, but trying to think what it might include, so don’t take any of
> following as ground truth.
> Merging for sure will include segments rewrite, so you better have
> additional 3.5TB if you are merging it to a single segment. But that should
> not last days on SSD. My guess is that you are running on the edge of your
> heap and doing a lot GCs and maybe you will OOM at some point. I would
> guess that merging is memory intensive operation and even if not holding
> large structures in memory, it will probably create a lot of garbage.
> Merging requires a lot of comparison so it is also a possibility that you
> are exhausting CPU resources.
> Bottom line - without more details and some monitoring tool, it is hard to
> tell why it is taking that much.
> And there is also a question if merging is good choice in you case - is
> index static/updates free?
>
> Regards,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 20 Nov 2017, at 17:35, Zheng Lin Edwin Yeo 
> wrote:
> >
> > Hi,
> >
> > Does anyone knows how long usually the merging in Solr will take?
> >
> > I am currently merging about 3.5TB of data, and it has been running for
> > more than 28 hours and it is not completed yet. The merging is running on
> > SSD disk.
> >
> > I am using Solr 6.5.1.
> >
> > Regards,
> > Edwin
>
>


Recovery Issue - Solr 6.6.1 and HDFS

2017-11-21 Thread Joe Obernberger
Hi All - we have a system with 45 physical boxes running solr 6.6.1 
using HDFS as the index.  The current index size is about 31TBytes.  
With 3x replication that takes up 93TBytes of disk. Our main collection 
is split across 100 shards with 3 replicas each.  The issue that we're 
running into is when restarting the solr6 cluster.  The shards go into 
recovery and start to utilize nearly all of their network interfaces.  
If we start too many of the nodes at once, the shards will go into a 
recovery, fail, and retry loop and never come up.  The errors are 
related to HDFS not responding fast enough and warnings from the 
DFSClient.  If we stop a node when this is happening, the script will 
force a stop (180 second timeout) and upon restart, we have lock files 
(write.lock) inside of HDFS.


The process at this point is to start one node, find out the lock files, 
wait for it to come up completely (hours), stop it, delete the 
write.lock files, and restart.  Usually this second restart is faster, 
but it still can take 20-60 minutes.
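Finding and clearing the locks is at least scriptable; roughly, with a placeholder HDFS index root:

hdfs dfs -ls -R /solr | grep write.lock
hdfs dfs -rm /solr/<collection>/<core_node>/data/index/write.lock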


The smaller indexes recover much faster (less than 5 minutes). Should we 
have not used so many replicas with HDFS?  Is there a better way we 
should have built the solr6 cluster?


Thank you for any insight!

-Joe



Solr 7.x: Issues with unique()/hll() function on a string field nested in a range facet

2017-11-21 Thread Volodymyr Rudniev
Hello,

I've encountered 2 issues while trying to apply unique()/hll() function to
a string field inside a range facet:

   1. Results are incorrect for a single-valued string field.
   2. I’m getting ArrayIndexOutOfBoundsException for a multi-valued string
   field.


How to reproduce:

   1. Create a core based on the default configSet.
   2. Add several simple documents to the core, like these:

[
  {
"id": "14790",
"int_i": 2010,
"date_dt": "2010-01-01T00:00:00Z",
"string_s": "a",
"string_ss": ["a", "b"]
  },
  {
"id": "12254",
"int_i": 2014,
"date_dt": "2014-01-01T00:00:00Z",
"string_s": "e",
"string_ss": ["b", "c"]
  },
  {
"id": "12937",
"int_i": 2008,
"date_dt": "2008-01-01T00:00:00Z",
"string_s": "c",
"string_ss": ["c", "d"]
  },
  {
"id": "10575",
"int_i": 2008,
"date_dt": "2008-01-01T00:00:00Z",
"string_s": "b",
"string_ss": ["d", "e"]
  },
  {
"id": "13644",
"int_i": 2014,
"date_dt": "2014-01-01T00:00:00Z",
"string_s": "e",
"string_ss": ["e", "a"]
  },
  {
"id": "8405",
"int_i": 2014,
"date_dt": "2014-01-01T00:00:00Z",
"string_s": "d",
"string_ss": ["a", "b"]
  },
  {
"id": "6128",
"int_i": 2008,
"date_dt": "2008-01-01T00:00:00Z",
"string_s": "a",
"string_ss": ["b", "c"]
  },
  {
"id": "5220",
"int_i": 2015,
"date_dt": "2015-01-01T00:00:00Z",
"string_s": "d",
"string_ss": ["c", "d"]
  },
  {
"id": "6850",
"int_i": 2012,
"date_dt": "2012-01-01T00:00:00Z",
"string_s": "b",
"string_ss": ["d", "e"]
  },
  {
"id": "5748",
"int_i": 2014,
"date_dt": "2014-01-01T00:00:00Z",
"string_s": "e",
"string_ss": ["e", "a"]
  }
]

3. Try queries like the following for a single-valued string field:

q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"int_i","gap":1,"missing":false,"start":2008,"end":2016,"type":"range","facet":{"distinct_count":"unique(string_s)"}}}}

q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"date_dt","gap":"%2B1YEAR","missing":false,"start":"2008-01-01T00:00:00Z","end":"2016-01-01T00:00:00Z","type":"range","facet":{"distinct_count":"unique(string_s)"}}}}

Distinct counts returned are incorrect in general. For example, for the set
of documents above, the response will contain:

{
"val": 2010,
"count": 1,
"distinct_count": 0
}

and

"between": {
"count": 10,
"distinct_count": 1
}

(there should be 5 distinct values).

Note, the result depends on the order in which the documents are added.

4. Try queries like the following for a multi-valued string field:

q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"int_i","gap":1,"missing":false,"start":2008,"end":2016,"type":"range","facet":{"distinct_count":"unique(string_ss)"}}}}

q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"date_dt","gap":"%2B1YEAR","missing":false,"start":"2008-01-01T00:00:00Z","end":"2016-01-01T00:00:00Z","type":"range","facet":{"distinct_count":"unique(string_ss)"}}}}

I’m getting ArrayIndexOutOfBoundsException for such queries.

Note, everything looks Ok for other field types (I tried single- and
multi-valued ints, doubles and dates) or when the enclosing facet is a
terms facet or there is no enclosing facet at all.

I can reproduce these issues both for Solr 7.0.1 and 7.1.0. Solr 6.x and
5.x, as it seems, do not have such issues.

Is it a bug? Or, may be, I’ve missed something?

Thanks,

Volodymyr
[Attachment: docs_1-10.json (application/json)]

Re: Issue facing with spell text field containing hyphen

2017-11-21 Thread Atita Arora
I was about to suggest the same - the Analysis panel is the savior in such
cases of doubt.

-Atita

On Tue, Nov 21, 2017 at 7:26 AM, Rick Leir  wrote:

> Chirag
> Look in Sor Admin, the Analysis panel. Put spider-man in the left and
> right text inputs, and see how it gets analysed. Cheers -- Rick
>
> On November 20, 2017 10:00:49 PM EST, Chirag garg 
> wrote:
> >Hi Rick,
> >
> >Actually my spell field also contains text with hyphen i.e. it contains
> >"spider-man" even then also i am not able to search it.
> >
> >Regards,
> >Chirag
> >
> >
> >
> >--
> >Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com
>


Re: Issue facing with spell text field containing hyphen

2017-11-21 Thread Rick Leir
Chirag
Look in Solr Admin, the Analysis panel. Put spider-man in the left and right
text inputs, and see how it gets analysed. Cheers -- Rick
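The same check can also be run over HTTP against the field analysis handler, e.g. (core and field names below are placeholders):

curl "http://localhost:8983/solr/mycore/analysis/field?analysis.fieldname=spell&analysis.fieldvalue=spider-man&analysis.query=spider-man&wt=json"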

On November 20, 2017 10:00:49 PM EST, Chirag garg  wrote:
>Hi Rick,
>
>Actually my spell field also contains text with hyphen i.e. it contains
>"spider-man" even then also i am not able to search it.
>
>Regards,
>Chirag
>
>
>
>--
>Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Please help me with solr plugin

2017-11-21 Thread Binoy Dalal
Zara,
If you're looking for custom search components, request handlers or update
processors, you can check out my github repo with examples here:
https://github.com/bdalal/SolrPluginsExamples/

On Tue, Nov 21, 2017 at 3:58 PM Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Zara,
> What sort of plugins are you trying to build? What sort os issues did you
> run into? Maybe you are not too far from having running custom plugin. I
> would recommend you try running some of existing plugins as your own - just
> to make sure that you are able to build and configure custom plugin. After
> that you can concentrate on custom logic.
>
> Regards,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 21 Nov 2017, at 11:22, Zara Parst  wrote:
> >
> > Hi,
> >
> > I have spent too much time learning plugin for Solr. I am about give up.
> If
> > some one has experience writing it. Please contact me. I am open to all
> > options. I want to learn it at any cost.
> >
> > Thanks
> > Zara
>
> --
Regards,
Binoy Dalal


Re: Custom analyzer & frequency

2017-11-21 Thread Emir Arnautović
Hi Alain,
You did not provide the definition of the field type you use - you use the “nametext” type
but pasted the “text_ami” field type. It is possible that you have
omitTermFreqAndPositions=“true” on the nametext field type. The default value
for text fields should be false.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 21 Nov 2017, at 11:43, Barbet Alain  wrote:
> 
> Hi,
> 
> I build a custom analyzer & setup it in solr, but doesn't work as I expect.
> I always get 1 as frequency for each word even if it's present
> multiple time in the text.
> 
> So I try with default analyzer & find same behavior:
> My schema
> 
>  
>
>  
>   stored="true"/>
>  
> 
> alian@yoda:~/solr> cat add_test.sh
> DATA='
> 
>  
>666
>toto titi tata toto tutu titi
>  
> 
> '
> curl -X POST -H 'Content-Type: text/xml'
> 'http://localhost:8983/solr/alian_test/update?commit=true'
> --data-binary "$DATA"
> 
> When I test in solr interface / analyze, I find the right behavior
> (find titi & toto 2 times).
> But when I look in solr index with Luke or solr interface / schema,
> the top term always get 1 as frequency. Can someone give me the thing
> I forget ?
> 
> (solr 6.5)
> 
> Thank you !



Custom analyzer & frequency

2017-11-21 Thread Barbet Alain
Hi,

I built a custom analyzer & set it up in Solr, but it doesn't work as I expect.
I always get 1 as the frequency for each word, even if it's present
multiple times in the text.

So I tried with the default analyzer & found the same behavior:
My schema

  

  
  
  

alian@yoda:~/solr> cat add_test.sh
DATA='
<add>
  <doc>
    <field name="id">666</field>
    <field name="...">toto titi tata toto tutu titi</field>
  </doc>
</add>
'
curl -X POST -H 'Content-Type: text/xml'
'http://localhost:8983/solr/alian_test/update?commit=true'
--data-binary "$DATA"

When I test in the Solr interface / Analysis screen, I see the right behavior
(titi & toto are found 2 times).
But when I look at the Solr index with Luke, or in the Solr interface / Schema,
the top terms always show 1 as the frequency. Can someone tell me what
I am missing?

(solr 6.5)

Thank you !


Re: Please help me with solr plugin

2017-11-21 Thread Emir Arnautović
Hi Zara,
What sort of plugins are you trying to build? What sort of issues did you run
into? Maybe you are not too far from having a running custom plugin. I would
recommend you try running one of the existing plugins as your own - just to make
sure that you are able to build and configure a custom plugin. After that you can
concentrate on the custom logic.
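A rough sketch of that loop, with hypothetical names and paths:

# build the plugin jar and drop it where the core can load it
mvn -q package
cp target/my-plugin.jar /var/solr/data/mycore/lib/
# reference the class from that core's solrconfig.xml (e.g. as a searchComponent), then reload it:
curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=mycore"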

Regards,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 21 Nov 2017, at 11:22, Zara Parst  wrote:
> 
> Hi,
> 
> I have spent too much time learning plugin for Solr. I am about give up. If
> some one has experience writing it. Please contact me. I am open to all
> options. I want to learn it at any cost.
> 
> Thanks
> Zara



Please help me with solr plugin

2017-11-21 Thread Zara Parst
Hi,

I have spent too much time learning how to write plugins for Solr. I am about to give up. If
someone has experience writing them, please contact me. I am open to all
options. I want to learn it at any cost.

Thanks
Zara


NullPointerException in PeerSync.handleUpdates

2017-11-21 Thread S G
Hi,

We are running 6.2 version of Solr and hitting this error frequently.

Error while trying to recover. core=my_core:java.lang.NullPointerException
at org.apache.solr.update.PeerSync.handleUpdates(PeerSync.java:605)
at org.apache.solr.update.PeerSync.handleResponse(PeerSync.java:344)
at org.apache.solr.update.PeerSync.sync(PeerSync.java:257)
at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:376)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:221)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)



Is this a known issue and fixed in some newer version?


Thanks
SG


Re: Merging of index in Solr

2017-11-21 Thread Emir Arnautović
Hi Edwin,
How many hosts/nodes/shards are those 3.5TB spread over? I am not familiar with the merge
code, but I'm trying to think about what it might involve, so don’t take any of the
following as ground truth.
Merging will for sure include a segment rewrite, so you had better have an additional
3.5TB free if you are merging to a single segment. But that should not take days
on SSD. My guess is that you are running on the edge of your heap, doing a
lot of GCs, and maybe you will OOM at some point. I would guess that merging is a
memory-intensive operation, and even if it does not hold large structures in memory,
it will probably create a lot of garbage. Merging also requires a lot of comparisons,
so it is also possible that you are exhausting CPU resources.
Bottom line - without more details and some monitoring tool, it is hard to tell
why it is taking that long.
And there is also the question of whether merging is a good choice in your case - is the
index static/update free?
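On the monitoring point, even something as simple as jstat will show whether the merge is GC-bound (the pid below is a placeholder for the Solr process):

jstat -gcutil <solr-pid> 5s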

Regards,
Emir 
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 20 Nov 2017, at 17:35, Zheng Lin Edwin Yeo  wrote:
> 
> Hi,
> 
> Does anyone knows how long usually the merging in Solr will take?
> 
> I am currently merging about 3.5TB of data, and it has been running for
> more than 28 hours and it is not completed yet. The merging is running on
> SSD disk.
> 
> I am using Solr 6.5.1.
> 
> Regards,
> Edwin