[ANNOUNCE] Apache Gora 0.8 Release

2017-09-20 Thread lewis john mcgibbney
Hi Folks,

The Apache Gora team are pleased to announce the immediate availability of
Apache Gora 0.8.

The Apache Gora open source framework provides an in-memory data model and
persistence for big data. Gora supports persisting to

   - column stores,
   - key value stores,
   - document stores,
   - distributed in-memory key/value stores,
   - in-memory data grids,
   - in-memory caches,
   - distributed multi-model stores, and
   - hybrid in-memory architectures

Gora also enables analysis of data with extensive Apache Hadoop™ MapReduce
and Apache Spark™ support. Gora uses the Apache Software License v2.0.

Gora is released both as source code, which can be downloaded from
our downloads page [0], and as Maven artifacts, which can be found on
Maven Central [1].
The DOAP file for Gora can be found here [2].

This release addresses a modest 35 issues, including the addition of new
datastores for OrientDB and Aerospike. The full Jira release report can be
found here [3].
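
For readers new to the project, a minimal sketch of the Gora programming model
in Java follows. It assumes a hypothetical persistent bean Employee generated
from an Avro schema by the Gora compiler, plus a gora.properties on the
classpath selecting one of the backends listed below; it is an illustration,
not code shipped with this release:

import org.apache.gora.store.DataStore;
import org.apache.gora.store.DataStoreFactory;
import org.apache.hadoop.conf.Configuration;

public class GoraSketch {
    public static void main(String[] args) throws Exception {
        // The concrete backend (HBase, Cassandra, MongoDB, ...) is chosen by
        // gora.properties and its mapping file, not by this code.
        Configuration conf = new Configuration();
        DataStore<String, Employee> store =
            DataStoreFactory.getDataStore(String.class, Employee.class, conf);

        Employee e = store.newPersistent();  // fresh persistent bean
        e.setName("Ada");                    // hypothetical generated setter
        store.put("employee:1", e);          // upsert by key
        store.flush();                       // push buffered writes to the store
        System.out.println(store.get("employee:1").getName());
        store.close();
    }
}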

Suggested Gora database support is as follows:

   - Apache Avro 1.8.1
   - Apache Hadoop 2.5.2
   - Apache HBase 1.2.3
   - Apache Cassandra 3.11.0 (DataStax Java Driver 3.3.0)
   - Apache Solr 6.5.1
   - MongoDB (driver) 3.5.0
   - Apache Accumulo 1.7.1
   - Apache Spark 1.4.1
   - Apache CouchDB 1.4.2 (test containers 1.1.0)
   - Amazon DynamoDB (driver) 1.10.55
   - Infinispan 7.2.5.Final
   - JCache 1.0.0 with Hazelcast 3.6.4 support
   - OrientDB 2.2.22
   - Aerospike 4.0.6


Thank you

Lewis

(on behalf of Gora PMC)

[0] http://gora.apache.org/downloads.html
[1] http://search.maven.org/#search|ga|1|g%3A%22org.apache.gora%22
[2] https://svn.apache.org/repos/asf/gora/committers/doap_Gora.rdf
[3] https://s.apache.org/3YdY

--
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Seeing very low ingestion performance for a single non-cloud Solr core

2017-09-20 Thread saiks
Hi,

Environment:
- Solr is running in non-cloud mode on 6.4.2, Sun Java8, Linux
4.4.0-31-generic x86_64
- Ingesting into a single core
- SoftCommit = 5 seconds, HardCommit = 10 seconds
- System has 16 CPUs and 32 GB of memory (Solr is given 20 GB of JVM heap)
- text = StandardTokenizer, id = solr.StrField/docValues, hostname =
solr.StrField/docValues, app = solr.StrField/docValues, epoch =
solr.TrieLongField/docValues

I am using jmeter to ingest into the Solr core using the UpdateRequestHandler
("/update/json"), sending a batch of 1000 messages (the same message) in a
single JSON array.

Sample message
[{"text":"May 11 10:18:22 scrooge Web-Requests: May 11 10:18:22
@IunAIir17k-- EVENT_WR-Y-attack-600 SG_child[823]: [event.error]
Possible attack - 5 blocked requests within 120 seconds",
 "id":"id1",
 "hostname": "xx.com",
 "app": "",
 "epoch": 1483667347941
 }]

Jmeter is configured to run 10 threads in parallel repeating the request
1000 times, which should ingest 10,000,000 messages in total.
Jmeter post url:
"/solr/mycore/update/json?overwrite=false=json=false"

Jmeter summary:
summary =   5000 in 00:03:07 =   26.7/s Avg:   370 Min:27 Max:  1734
Err: 0 (0.00%)

I am only able to ingest ~26,000 messages per second. Looking at system
resources, only one or two CPUs are at 25-30% while the rest sit idle,
the Solr heap is flat at 3 GB, and there is no iowait on the devices.
Increasing parallelism in Jmeter to 20 threads did not increase the number
of ingested messages per second, but it did double the latency of each
request.

I don't understand why Solr is not able to use all the CPUs on the host when I
increase Jmeter parallelism from 10 -> 20 -> 40. What can I do to gain
performance and make Solr utilize system resources to their maximum?
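
For reference, here is a rough SolrJ equivalent of this ingestion job - a
sketch only, with a placeholder URL, queue size, and thread count, not a
description of the setup above. ConcurrentUpdateSolrClient batches documents
internally and streams them over several connections in parallel, which is
usually what it takes to push a single core hard from one client process:

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BulkIngest {
    public static void main(String[] args) throws Exception {
        // Placeholder core URL; adjust to the target core.
        String url = "http://localhost:8983/solr/mycore";
        try (ConcurrentUpdateSolrClient client =
                 new ConcurrentUpdateSolrClient.Builder(url)
                     .withQueueSize(10000)   // documents buffered client-side
                     .withThreadCount(8)     // parallel update connections
                     .build()) {
            for (int i = 0; i < 10_000_000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "id" + i);
                doc.addField("text", "sample message " + i);
                doc.addField("epoch", System.currentTimeMillis());
                client.add(doc);            // buffered, sent in batches
            }
            client.blockUntilFinished();    // drain the outstanding queue
            // Rely on the soft/hard autoCommit settings above for visibility.
        }
    }
}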

Please help.

Thank you






--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Replicates not recovering after rolling restart

2017-09-20 Thread Bill Oconnor
I have no clue where that number comes from; it does not seem to be in the 
actual post to the leader as seen in my tcpdump. It is a mystery.


From: Walter Underwood 
Sent: Wednesday, September 20, 2017 7:00:53 PM
To: solr-user@lucene.apache.org
Subject: Re: Replicates not recovering after rolling restart


> On Sep 20, 2017, at 6:15 PM, Bill Oconnor  wrote:
>
> I restart using the standard "sudo service solr start/stop"

You might look into what that actually does.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Replicates not recovering after rolling restart

2017-09-20 Thread Walter Underwood

> On Sep 20, 2017, at 6:15 PM, Bill Oconnor  wrote:
> 
> I restart using the standard "sudo service solr start/stop"

You might look into what that actually does.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Replicates not recovering after rolling restart

2017-09-20 Thread Bill Oconnor
Thanks everyone for the response.


I do not think we changed anything other than the JVM memory size.


I did leave out one piece of info - one of the hosts is a replicate in another 
shard.


collection1 -> shard1 -> *h1, h2, h3, h4    (the star marks the leader)

collection2 -> shard1 -> *h5, h3


When I restart, *h1 works fine; h2, h3, h4 go into recovery but still respond to 
requests. *h1 starts getting the posts from the recovering servers and responds 
with the 500 Server Error until the servers quit.


Collection2 with h3 is active and fine even though h3 is recovering in 
collection1.


This happened before and I resolved it by deleting and then creating a new 
collection.


I restart using the standard "sudo service solr start/stop"


I have to say I am not comfortable with having multiple shards shared on 
the same host. The production servers will not be configured this way, but 
these servers are for development.


From: Erick Erickson 
Sent: Wednesday, September 20, 2017 3:35:16 PM
To: solr-user
Subject: Re: Replicates not recovering after rolling restart

The numberformatexception is...odd. Clearly that's too big a number
for an integer, did anything in the underlying schema change?

Best,
Erick

On Wed, Sep 20, 2017 at 3:00 PM, Walter Underwood  wrote:
> Rolling restarts work fine for us. I often include installing new configs 
> with that. Here is our script. Pass it any hostname in the cluster. I use the 
> load balancer name. You’ll need to change the domain and the install 
> directory of course.
>
> #!/bin/bash
>
> cluster=$1
>
> hosts=`curl -s 
> "http://${cluster}:8983/solr/admin/collections?action=CLUSTERSTATUS=json; 
> | jq -r '.cluster.live_nodes[]' | sort`
>
> for host in $hosts
> do
> host="${host}.cloud.cheggnet.com"
> echo restarting Solr on $host
> ssh $host 'cd /apps/solr6 ; sudo -u bin bin/solr stop; sudo -u bin 
> bin/solr start -cloud -h `hostname`'
> done
>
>
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>> On Sep 20, 2017, at 1:42 PM, Bill Oconnor  wrote:
>>
>> Hello,
>>
>>
>> Background:
>>
>>
>> We have been successfully using Solr for over 5 years and we recently made 
>> the decision to move into SolrCloud. For the most part that has been easy 
>> but we have repeated problems with our rolling restarts, where servers remain 
>> functional but stay in Recovery until they stop trying. We restarted because 
>> we increased the memory from 12GB to 16GB on the JVM.
>>
>>
>> Does anyone have any insight as to what is going on here?
>>
>> Is there a special procedure I should use for starting and stopping a host?
>>
>> Is it ok to do a rolling restart on all the nodes in a shard?
>>
>>
>> Any insight would be appreciated.
>>
>>
>> Configuration:
>>
>>
>> We have a group of servers with multiple collections. Each collection 
>> consists of one shard and multiple replicates. We are running the latest 
>> stable version of SolrCloud 6.6 on Ubuntu LTS and Oracle Corporation Java 
>> HotSpot(TM) 64-Bit Server VM 1.8.0_66 25.66-b17
>>
>>
>> (collection)  (shard)  (replicates)
>>
>> journals_stage   ->  shard1  ->  solr-220 (leader) , solr-223, solr-221, 
>> solr-222 (replicates)
>>
>>
>> Problem:
>>
>>
>> Restarting the system puts the replicates in a recovery state they never 
>> exit from. They eventually give up after 500 tries.  If I go to the 
>> individual replicates and execute a query the data is still available.
>>
>>
>> Using tcpdump I find the replicates sending this request to the leader (the 
>> leader appears to be active).
>>
>>
>> The exchange goes  like this - :
>>
>>
>> solr-220 is the leader.
>>
>> Solr-221 to Solr-220
>>
>>
>> 10:18:42.426823 IP solr-221:54341 > solr-220:8983:
>>
>>
>> POST /solr/journals_stage_shard1_replica1/update HTTP/1.1
>> Content-Type: application/x-www-form-urlencoded; charset=UTF-8
>> User-Agent: 
>> Solr[org.apache.solr.client.solrj.impl.HttpSolrClient]
>>  1.0
>> Content-Length: 108
>> Host: solr-220:8983
>> Connection: Keep-Alive
>>
>>
>> commit_end_point=true&openSearcher=false&commit=true&softCommit=false&waitSearcher=true&wt=javabin&version=2
>>
>>
>> Solr-220 back to Solr-221
>>
>>
>> IP solr-220:8983 > solr-221:54341: Flags [P.], seq 1:5152, ack 385, win 235, 
>> options [nop,nop,
>> TS val 85813 ecr 858107069], length 5151
>> ..HTTP/1.1 500 Server Error
>> Content-Type: application/octet-stream
>> Content-Length: 5060
>>
>>
>> .responseHeader..%QTimeC.%error..#msg?.For input string: 
>> "1578578283947098112".%trace?.: For
>> input string: "1578578283947098112"
>> at 
>> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>> at java.lang.Integer.parseInt(Integer.java:583)
>> at java.lang.Integer.parseInt(Integer.java:615)
>> at 
>> org.apache.lucene.queries.function.docvalues.IntDocValues.getRangeScorer(IntDocValues.java:89)
>> at 
>> 

Re: Solr- Data search across multiple cores

2017-09-20 Thread Rick Leir
Harshal,
You could send your Solr query to both cores, but then you could have problems 
combining the results because the scores are not absolute: they just give a 
ranking within their own core. It might be ok if you are searching on fields which 
are common to both cores.
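
Concretely, that fan-out can be done with Solr's distributed-search 'shards'
parameter. A minimal SolrJ sketch, assuming two hypothetical local cores named
core1 and core2 sharing a compatible uniqueKey and a common query field (the
same score caveat applies):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class TwoCoreSearch {
    public static void main(String[] args) throws Exception {
        // Placeholder host and core names.
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/core1").build()) {
            SolrQuery q = new SolrQuery("common_field:widget");
            // Fan the query out to both cores; Solr merges the results by score.
            q.set("shards", "localhost:8983/solr/core1,localhost:8983/solr/core2");
            System.out.println(client.query(q).getResults().getNumFound());
        }
    }
}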

But I suspect that you would do better to tell us what problem you are trying 
to solve. And I am almost certain that you should re-index all your data into 
one core. Cheers -- Rick

On September 18, 2017 1:42:00 AM EDT, "Agrawal, Harshal (GE Digital)" 
 wrote:
>Hello Folks,
>
>I want to search data in two separate cores. The cores are not identical;
>only a few fields are common between them.
>I don't want to join data. Is it possible to search data from two
>cores?
>
>I read about the distributed search concept but am not able to understand
>it. Is it the only way to search across multiple cores?
>
>Regards
>Harshal

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Replicates not recovering after rolling restart

2017-09-20 Thread Walter Underwood
1578578283947098112 needs 61 bits. Is it being parsed into a 32 bit target? 

That doesn’t explain where it came from, of course.
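
For concreteness, the failure mode in the stack trace reproduces in a few lines
of plain Java, nothing Solr-specific:

public class ParseDemo {
    public static void main(String[] args) {
        String s = "1578578283947098112";
        System.out.println(Long.parseLong(s));  // fits in 64 bits, parses fine
        // Integer.parseInt(s) throws NumberFormatException: For input string:
        // "1578578283947098112" - the exact message in the tcpdump, which is why
        // IntDocValues.getRangeScorer in the trace points at an int-typed field.
        Integer.parseInt(s);
    }
}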

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Sep 20, 2017, at 3:35 PM, Erick Erickson  wrote:
> 
> The numberformatexception is...odd. Clearly that's too big a number
> for an integer, did anything in the underlying schema change?
> 
> Best,
> Erick
> 
> On Wed, Sep 20, 2017 at 3:00 PM, Walter Underwood  
> wrote:
>> Rolling restarts work fine for us. I often include installing new configs 
>> with that. Here is our script. Pass it any hostname in the cluster. I use 
>> the load balancer name. You’ll need to change the domain and the install 
>> directory of course.
>> 
>> #!/bin/bash
>> 
>> cluster=$1
>> 
>> hosts=`curl -s 
>> "http://${cluster}:8983/solr/admin/collections?action=CLUSTERSTATUS=json; 
>> | jq -r '.cluster.live_nodes[]' | sort`
>> 
>> for host in $hosts
>> do
>>host="${host}.cloud.cheggnet.com"
>>echo restarting Solr on $host
>>ssh $host 'cd /apps/solr6 ; sudo -u bin bin/solr stop; sudo -u bin 
>> bin/solr start -cloud -h `hostname`'
>> done
>> 
>> 
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>>> On Sep 20, 2017, at 1:42 PM, Bill Oconnor  wrote:
>>> 
>>> Hello,
>>> 
>>> 
>>> Background:
>>> 
>>> 
>>> We have been successfully using Solr for over 5 years and we recently made 
>>> the decision to move into SolrCloud. For the most part that has been easy 
>>> but we have repeated problems with our rolling restarts, where servers remain 
>>> functional but stay in Recovery until they stop trying. We restarted 
>>> because we increased the memory from 12GB to 16GB on the JVM.
>>> 
>>> 
>>> Does anyone have any insight as to what is going on here?
>>> 
>>> Is there a special procedure I should use for starting and stopping a host?
>>> 
>>> Is it ok to do a rolling restart on all the nodes in a shard?
>>> 
>>> 
>>> Any insight would be appreciated.
>>> 
>>> 
>>> Configuration:
>>> 
>>> 
>>> We have a group of servers with multiple collections. Each collection 
>>> consists of one shard and multiple replicates. We are running the latest 
>>> stable version of SolrCloud 6.6 on Ubuntu LTS and Oracle Corporation Java 
>>> HotSpot(TM) 64-Bit Server VM 1.8.0_66 25.66-b17
>>> 
>>> 
>>> (collection)  (shard)  (replicates)
>>> 
>>> journals_stage   ->  shard1  ->  solr-220 (leader) , solr-223, solr-221, 
>>> solr-222 (replicates)
>>> 
>>> 
>>> Problem:
>>> 
>>> 
>>> Restarting the system puts the replicates in a recovery state they never 
>>> exit from. They eventually give up after 500 tries.  If I go to the 
>>> individual replicates and execute a query the data is still available.
>>> 
>>> 
>>> Using tcpdump I find the replicates sending this request to the leader (the 
>>> leader appears to be active).
>>> 
>>> 
>>> The exchange goes  like this - :
>>> 
>>> 
>>> solr-220 is the leader.
>>> 
>>> Solr-221 to Solr-220
>>> 
>>> 
>>> 10:18:42.426823 IP solr-221:54341 > solr-220:8983:
>>> 
>>> 
>>> POST /solr/journals_stage_shard1_replica1/update HTTP/1.1
>>> Content-Type: application/x-www-form-urlencoded; charset=UTF-8
>>> User-Agent: 
>>> Solr[org.apache.solr.client.solrj.impl.HttpSolrClient]
>>>  1.0
>>> Content-Length: 108
>>> Host: solr-220:8983
>>> Connection: Keep-Alive
>>> 
>>> 
>>> commit_end_point=true&openSearcher=false&commit=true&softCommit=false&waitSearcher=true&wt=javabin&version=2
>>> 
>>> 
>>> Solr-220 back to Solr-221
>>> 
>>> 
>>> IP solr-220:8983 > solr-221:54341: Flags [P.], seq 1:5152, ack 385, win 
>>> 235, options [nop,nop,
>>> TS val 85813 ecr 858107069], length 5151
>>> ..HTTP/1.1 500 Server Error
>>> Content-Type: application/octet-stream
>>> Content-Length: 5060
>>> 
>>> 
>>> .responseHeader..%QTimeC.%error..#msg?.For input string: 
>>> "1578578283947098112".%trace?.: For
>>> input string: "1578578283947098112"
>>> at 
>>> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>>> at java.lang.Integer.parseInt(Integer.java:583)
>>> at java.lang.Integer.parseInt(Integer.java:615)
>>> at 
>>> org.apache.lucene.queries.function.docvalues.IntDocValues.getRangeScorer(IntDocValues.java:89)
>>> at 
>>> org.apache.solr.search.function.ValueSourceRangeFilter$1.iterator(ValueSourceRangeFilter.java:83)
>>> at 
>>> org.apache.solr.search.SolrConstantScoreQuery$ConstantWeight.scorer(SolrConstantScoreQuery.java:100)
>>> at org.apache.lucene.search.Weight.scorerSupplier(Weight.java:126)
>>> at 
>>> org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:400)
>>> at org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:381)
>>> at 
>>> org.apache.solr.update.DeleteByQueryWrapper$1.scorer(DeleteByQueryWrapper.java:90)
>>> at 
>>> 

Re: Indexes don't synch when node with old data is returned to cluster

2017-09-20 Thread Erick Erickson
This _should_ be OK. I'd expect the new nodes to see that more than
100 docs have been indexed and do a full sync.

However, you can ensure this by removing the entire data directory
from the nodes that are down (rm -rf data). They'll
come back up, do a full sync, and start answering queries only after
the full sync is completed.

Best,
Erick

On Wed, Sep 20, 2017 at 3:00 PM, Joe Heasly  wrote:
> Hello,
>
> We have just moved from solr 4.6 master/slave to 6.4.2 SolrCloud.  We have 
> three collections, each with a single shard and a varying number of replicas, 
> all kept by an ensemble of three zooKeepers (on their own hosts).  As an 
> ecommerce site, our capacity needs vary so we add and remove replicas with 
> some frequency.  The basic topology is like this:
>
> solr1
>   |- collection1
> |- shard1 - replica1
>   |- collection2
> |- shard1 - replica1
>   |- collection3
> |- shard1 - replica1
>  .
>  .
>  .
> solrN
>   |- collection1
> |- shard1 - replicaN
>   |- collection2
> |- shard1 - replicaN
>   |- collection3
> |- shard1 - replicaN
>
> Where N varies between three and six most of the time.
>
> During a recent test, we ran our indexing processes to a set of nodes, and 
> then two nodes were removed from our configuration.  Subsequently the 
> remaining nodes were reindexed, without problems.  The two nodes that had 
> been previously removed (by simply stopping solr on those boxes) were brought 
> back into the cluster by starting solr with the appropriate zkHost strings.  
> (These were the same zkHosts as when the instances were stopped.)  We found 
> that the indexes did not synch up until we re-indexed the entire cluster.
>
> What are we missing?  We need the re-added indexes to synchronize with those 
> already active in the cluster.  If we have to re-index the whole cluster, we 
> risk inconsistent results being served from the new nodes while indexing is 
> going on.  In reviewing the Reference Guide and doing various searches, I 
> haven't found anything that clearly references adding replicas to a cluster 
> when the cores already contain data.
>
> Thank you for any insights,
> Joe
>
> Joe Heasly, Systems Analyst I
> L.L.Bean, Inc. ~ Direct Channel Business & Technology Team
> Office: 207.552.2254
> Cell:207.756.9250
>


Re: Replicates not recovering after rolling restart

2017-09-20 Thread Erick Erickson
The numberformatexception is...odd. Clearly that's too big a number
for an integer, did anything in the underlying schema change?

Best,
Erick

On Wed, Sep 20, 2017 at 3:00 PM, Walter Underwood  wrote:
> Rolling restarts work fine for us. I often include installing new configs 
> with that. Here is our script. Pass it any hostname in the cluster. I use the 
> load balancer name. You’ll need to change the domain and the install 
> directory of course.
>
> #!/bin/bash
>
> cluster=$1
>
> hosts=`curl -s 
> "http://${cluster}:8983/solr/admin/collections?action=CLUSTERSTATUS=json; 
> | jq -r '.cluster.live_nodes[]' | sort`
>
> for host in $hosts
> do
> host="${host}.cloud.cheggnet.com"
> echo restarting Solr on $host
> ssh $host 'cd /apps/solr6 ; sudo -u bin bin/solr stop; sudo -u bin 
> bin/solr start -cloud -h `hostname`'
> done
>
>
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>> On Sep 20, 2017, at 1:42 PM, Bill Oconnor  wrote:
>>
>> Hello,
>>
>>
>> Background:
>>
>>
>> We have been successfully using Solr for over 5 years and we recently made 
>> the decision to move into SolrCloud. For the most part that has been easy 
>> but we have repeated problems with our rolling restarts, where servers remain 
>> functional but stay in Recovery until they stop trying. We restarted because 
>> we increased the memory from 12GB to 16GB on the JVM.
>>
>>
>> Does anyone have any insight as to what is going on here?
>>
>> Is there a special procedure I should use for starting and stopping a host?
>>
>> Is it ok to do a rolling restart on all the nodes in a shard?
>>
>>
>> Any insight would be appreciated.
>>
>>
>> Configuration:
>>
>>
>> We have a group of servers with multiple collections. Each collection 
>> consists of one shard and multiple replicates. We are running the latest 
>> stable version of SolrCloud 6.6 on Ubuntu LTS and Oracle Corporation Java 
>> HotSpot(TM) 64-Bit Server VM 1.8.0_66 25.66-b17
>>
>>
>> (collection)  (shard)  (replicates)
>>
>> journals_stage   ->  shard1  ->  solr-220 (leader) , solr-223, solr-221, 
>> solr-222 (replicates)
>>
>>
>> Problem:
>>
>>
>> Restarting the system puts the replicates in a recovery state they never 
>> exit from. They eventually give up after 500 tries.  If I go to the 
>> individual replicates and execute a query the data is still available.
>>
>>
>> Using tcpdump I find the replicates sending this request to the leader (the 
>> leader appears to be active).
>>
>>
>> The exchange goes  like this - :
>>
>>
>> solr-220 is the leader.
>>
>> Solr-221 to Solr-220
>>
>>
>> 10:18:42.426823 IP solr-221:54341 > solr-220:8983:
>>
>>
>> POST /solr/journals_stage_shard1_replica1/update HTTP/1.1
>> Content-Type: application/x-www-form-urlencoded; charset=UTF-8
>> User-Agent: 
>> Solr[org.apache.solr.client.solrj.impl.HttpSolrClient]
>>  1.0
>> Content-Length: 108
>> Host: solr-220:8983
>> Connection: Keep-Alive
>>
>>
>> commit_end_point=true&openSearcher=false&commit=true&softCommit=false&waitSearcher=true&wt=javabin&version=2
>>
>>
>> Solr-220 back to Solr-221
>>
>>
>> IP solr-220:8983 > solr-221:54341: Flags [P.], seq 1:5152, ack 385, win 235, 
>> options [nop,nop,
>> TS val 85813 ecr 858107069], length 5151
>> ..HTTP/1.1 500 Server Error
>> Content-Type: application/octet-stream
>> Content-Length: 5060
>>
>>
>> .responseHeader..%QTimeC.%error..#msg?.For input string: 
>> "1578578283947098112".%trace?.: For
>> input string: "1578578283947098112"
>> at 
>> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>> at java.lang.Integer.parseInt(Integer.java:583)
>> at java.lang.Integer.parseInt(Integer.java:615)
>> at 
>> org.apache.lucene.queries.function.docvalues.IntDocValues.getRangeScorer(IntDocValues.java:89)
>> at 
>> org.apache.solr.search.function.ValueSourceRangeFilter$1.iterator(ValueSourceRangeFilter.java:83)
>> at 
>> org.apache.solr.search.SolrConstantScoreQuery$ConstantWeight.scorer(SolrConstantScoreQuery.java:100)
>> at org.apache.lucene.search.Weight.scorerSupplier(Weight.java:126)
>> at 
>> org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:400)
>> at org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:381)
>> at 
>> org.apache.solr.update.DeleteByQueryWrapper$1.scorer(DeleteByQueryWrapper.java:90)
>> at 
>> org.apache.lucene.index.BufferedUpdatesStream.applyQueryDeletes(BufferedUpdatesStream.java:709)
>>
>> at 
>> org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:267)
>>
>>
>


Re: SolrCloud Merge Indexes in Solr is running very slow

2017-09-20 Thread Erick Erickson
My very first question is whether you're _absolutely sure_ that the
indexes you're merging have the same hash range. If not you're in for
a world of hurt.

You might try merging on local disks rather than in HDFS...

Best,
Erick

On Wed, Sep 20, 2017 at 11:05 AM, Avinash Patil
 wrote:
>
>
> I have a lot of data coming into SolrCloud and we create multiple collections 
> dynamically after a collection threshold is reached. Currently, to maintain 
> fast search response speeds, a new collection is triggered after 100M docs 
> (300G in HDFS). SolrCloud (CDH Solr 4.10.3) eventually reaches 150 - 200 
> collections, so I am trying to merge 10 collections into a single new 
> collection shard using the merge API in SolrJ. But the merge index API is running 
> very slowly, i.e. almost 5 minutes to merge a single collection shard.
>
> CoreAdminResponse mergeIndexes = CoreAdminRequest.mergeIndexes(destShard, 
> arr, new String[0], secClient);
> LOGGER.debug(" merge response - {} ", mergeIndexes);
> Thread.sleep(1000L);
> secClient.commit(destCollection);
>
> I need information on whether there is a way to make the merge faster. I tried 
> optimizing and committing a collection before running the merge.
>
> Also, any information on how the merge runs in the background (does it copy the 
> entire index folder?) would be useful.
>
>
>
> Regards
> Avinash Patil
> Software Engineer
> Securonix
> Security Analytics. Delivered. 
> Mobile:
> Email:
> Winner of 12 2016 Information Security Product Guide Global Excellence Awards 
> 
> Winner of Seven Golden Bridge Awards 
> 
> Named 2015 Innovator in Cyber Threat Analysis and Intelligence by SC Magazine 
>
> 
>
>
>
>
>
> --
> This message (including any attachments) contains confidential information
> intended for a specific individual and purpose, and is protected by law. If
> you are not the intended recipient, you should delete this message and any
> disclosure, copying, or distribution of this message, or the taking of any
> action based on it, by you is strictly prohibited.


SolrCloud Merge Indexes in Solr is running very slow

2017-09-20 Thread Avinash Patil


I have a lot of data coming into SolrCloud and we create multiple collections 
dynamically after a collection threshold is reached. Currently, to maintain fast 
search response speeds, a new collection is triggered after 100M docs (300G in 
HDFS). SolrCloud (CDH Solr 4.10.3) eventually reaches 150 - 200 collections, so 
I am trying to merge 10 collections into a single new collection shard 
using the merge API in SolrJ. But the merge index API is running very slowly, i.e. 
almost 5 minutes to merge a single collection shard. 

// Merge the index directories in arr into the destination shard's core.
CoreAdminResponse mergeIndexes = CoreAdminRequest.mergeIndexes(destShard, 
arr, new String[0], secClient);
LOGGER.debug(" merge response - {} ", mergeIndexes);
Thread.sleep(1000L);
secClient.commit(destCollection);

I need information on whether there is a way to make the merge faster. I tried 
optimizing and committing a collection before running the merge. 

Also, any information on how the merge runs in the background (does it copy the 
entire index folder?) would be useful.  



Regards
Avinash Patil
Software Engineer
Securonix
Security Analytics. Delivered. 
Mobile: 
Email:  
Winner of 12 2016 Information Security Product Guide Global Excellence Awards 

Winner of Seven Golden Bridge Awards 

Named 2015 Innovator in Cyber Threat Analysis and Intelligence by SC Magazine   
 






-- 
This message (including any attachments) contains confidential information 
intended for a specific individual and purpose, and is protected by law. If 
you are not the intended recipient, you should delete this message and any 
disclosure, copying, or distribution of this message, or the taking of any 
action based on it, by you is strictly prohibited.


Re: Replicates not recovering after rolling restart

2017-09-20 Thread Walter Underwood
Rolling restarts work fine for us. I often include installing new configs with 
that. Here is our script. Pass it any hostname in the cluster. I use the load 
balancer name. You’ll need to change the domain and the install directory of 
course.

#!/bin/bash

cluster=$1

hosts=`curl -s 
"http://${cluster}:8983/solr/admin/collections?action=CLUSTERSTATUS=json; | 
jq -r '.cluster.live_nodes[]' | sort`

for host in $hosts
do
host="${host}.cloud.cheggnet.com"
echo restarting Solr on $host
ssh $host 'cd /apps/solr6 ; sudo -u bin bin/solr stop; sudo -u bin bin/solr 
start -cloud -h `hostname`'
done


Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Sep 20, 2017, at 1:42 PM, Bill Oconnor  wrote:
> 
> Hello,
> 
> 
> Background:
> 
> 
> We have been successfully using Solr for over 5 years and we recently made 
> the decision to move into SolrCloud. For the most part that has been easy but 
> we have repeated problems with our rolling restarts, where servers remain 
> functional but stay in Recovery until they stop trying. We restarted because 
> we increased the memory from 12GB to 16GB on the JVM.
> 
> 
> Does anyone have any insight as to what is going on here?
> 
> Is there a special procedure I should use for starting and stopping a host?
> 
> Is it ok to do a rolling restart on all the nodes in a shard?
> 
> 
> Any insight would be appreciated.
> 
> 
> Configuration:
> 
> 
> We have a group of servers with multiple collections. Each collection consists 
> of one shard and multiple replicates. We are running the latest stable 
> version of SolrCloud 6.6 on Ubuntu LTS and Oracle Corporation Java 
> HotSpot(TM) 64-Bit Server VM 1.8.0_66 25.66-b17
> 
> 
> (collection)  (shard)  (replicates)
> 
> journals_stage   ->  shard1  ->  solr-220 (leader) , solr-223, solr-221, 
> solr-222 (replicates)
> 
> 
> Problem:
> 
> 
> Restarting the system puts the replicates in a recovery state they never exit 
> from. They eventually give up after 500 tries.  If I go to the individual 
> replicates and execute a query the data is still available.
> 
> 
> Using tcpdump I find the replicates sending this request to the leader (the 
> leader appears to be active).
> 
> 
> The exchange goes  like this - :
> 
> 
> solr-220 is the leader.
> 
> Solr-221 to Solr-220
> 
> 
> 10:18:42.426823 IP solr-221:54341 > solr-220:8983:
> 
> 
> POST /solr/journals_stage_shard1_replica1/update HTTP/1.1
> Content-Type: application/x-www-form-urlencoded; charset=UTF-8
> User-Agent: 
> Solr[org.apache.solr.client.solrj.impl.HttpSolrClient]
>  1.0
> Content-Length: 108
> Host: solr-220:8983
> Connection: Keep-Alive
> 
> 
> commit_end_point=true&openSearcher=false&commit=true&softCommit=false&waitSearcher=true&wt=javabin&version=2
> 
> 
> Solr-220 back to Solr-221
> 
> 
> IP solr-220:8983 > solr-221:54341: Flags [P.], seq 1:5152, ack 385, win 235, 
> options [nop,nop,
> TS val 85813 ecr 858107069], length 5151
> ..HTTP/1.1 500 Server Error
> Content-Type: application/octet-stream
> Content-Length: 5060
> 
> 
> .responseHeader..%QTimeC.%error..#msg?.For input string: 
> "1578578283947098112".%trace?.: For
> input string: "1578578283947098112"
> at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Integer.parseInt(Integer.java:583)
> at java.lang.Integer.parseInt(Integer.java:615)
> at 
> org.apache.lucene.queries.function.docvalues.IntDocValues.getRangeScorer(IntDocValues.java:89)
> at 
> org.apache.solr.search.function.ValueSourceRangeFilter$1.iterator(ValueSourceRangeFilter.java:83)
> at 
> org.apache.solr.search.SolrConstantScoreQuery$ConstantWeight.scorer(SolrConstantScoreQuery.java:100)
> at org.apache.lucene.search.Weight.scorerSupplier(Weight.java:126)
> at 
> org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:400)
> at 
> org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:381)
> at 
> org.apache.solr.update.DeleteByQueryWrapper$1.scorer(DeleteByQueryWrapper.java:90)
> at 
> org.apache.lucene.index.BufferedUpdatesStream.applyQueryDeletes(BufferedUpdatesStream.java:709)
> 
> at 
> org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:267)
> 
> 



Indexes don't synch when node with old data is returned to cluster

2017-09-20 Thread Joe Heasly
Hello,

We have just moved from solr 4.6 master/slave to 6.4.2 SolrCloud.  We have 
three collections, each with a single shard and a varying number of replicas, 
all kept by an ensemble of three zooKeepers (on their own hosts).  As an 
ecommerce site, our capacity needs vary so we add and remove replicas with some 
frequency.  The basic topology is like this:

solr1
  |- collection1
|- shard1 - replica1
  |- collection2
|- shard1 - replica1
  |- collection3
|- shard1 - replica1
 .
 .
 .
solrN
  |- collection1
|- shard1 - replicaN
  |- collection2
|- shard1 - replicaN
  |- collection3
|- shard1 - replicaN

Where N varies between three and six most of the time.

During a recent test, we ran our indexing processes to a set of nodes, and then 
two nodes were removed from our configuration.  Subsequently the remaining 
nodes were reindexed, without problems.  The two nodes that had been previously 
removed (by simply stopping solr on those boxes) were brought back into the 
cluster by starting solr with the appropriate zkHost strings.  (These were the 
same zkHosts as when the instances were stopped.)  We found that the indexes 
did not synch up until we re-indexed the entire cluster.

What are we missing?  We need the re-added indexes to synchronize with those 
already active in the cluster.  If we have to re-index the whole cluster, we 
risk inconsistent results being served from the new nodes while indexing is 
going on.  In reviewing the Reference Guide and doing various searches, I 
haven't found anything that clearly references adding replicas to a cluster 
when the cores already contain data.

Thank you for any insights,
Joe

Joe Heasly, Systems Analyst I
L.L.Bean, Inc. ~ Direct Channel Business & Technology Team
Office: 207.552.2254
Cell:207.756.9250



Replicates not recovering after rolling restart

2017-09-20 Thread Bill Oconnor
Hello,


Background:


We have been successfully using Solr for over 5 years and we recently made the 
decision to move into SolrCloud. For the most part that has been easy but we 
have repeated problems with our rolling restarts, where servers remain functional 
but stay in Recovery until they stop trying. We restarted because we increased 
the memory from 12GB to 16GB on the JVM.


Does anyone have any insight as to what is going on here?

Is there a special procedure I should use for starting and stopping a host?

Is it ok to do a rolling restart on all the nodes in a shard?


Any insight would be appreciated.


Configuration:


We have a group of servers with multiple collections. Each collection consists 
of one shard and multiple replicates. We are running the latest stable version 
of SolrCloud 6.6 on Ubuntu LTS and Oracle Corporation Java HotSpot(TM) 64-Bit 
Server VM 1.8.0_66 25.66-b17


(collection)  (shard)  (replicates)

journals_stage   ->  shard1  ->  solr-220 (leader) , solr-223, solr-221, 
solr-222 (replicates)


Problem:


Restarting the system puts the replicates in a recovery state they never exit 
from. They eventually give up after 500 tries.  If I go to the individual 
replicates and execute a query the data is still available.


Using tcpdump I find the replicates sending this request to the leader (the 
leader appears to be active).


The exchange goes  like this - :


solr-220 is the leader.

Solr-221 to Solr-220


10:18:42.426823 IP solr-221:54341 > solr-220:8983:


POST /solr/journals_stage_shard1_replica1/update HTTP/1.1
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
User-Agent: 
Solr[org.apache.solr.client.solrj.impl.HttpSolrClient]
 1.0
Content-Length: 108
Host: solr-220:8983
Connection: Keep-Alive


commit_end_point=true&openSearcher=false&commit=true&softCommit=false&waitSearcher=true&wt=javabin&version=2


Solr-220 back to Solr-221


IP solr-220:8983 > solr-221:54341: Flags [P.], seq 1:5152, ack 385, win 235, 
options [nop,nop,
TS val 85813 ecr 858107069], length 5151
..HTTP/1.1 500 Server Error
Content-Type: application/octet-stream
Content-Length: 5060


.responseHeader..%QTimeC.%error..#msg?.For input string: 
"1578578283947098112".%trace?.: For
input string: "1578578283947098112"
at 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:583)
at java.lang.Integer.parseInt(Integer.java:615)
at 
org.apache.lucene.queries.function.docvalues.IntDocValues.getRangeScorer(IntDocValues.java:89)
at 
org.apache.solr.search.function.ValueSourceRangeFilter$1.iterator(ValueSourceRangeFilter.java:83)
at 
org.apache.solr.search.SolrConstantScoreQuery$ConstantWeight.scorer(SolrConstantScoreQuery.java:100)
at org.apache.lucene.search.Weight.scorerSupplier(Weight.java:126)
at 
org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:400)
at org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:381)
at 
org.apache.solr.update.DeleteByQueryWrapper$1.scorer(DeleteByQueryWrapper.java:90)
at 
org.apache.lucene.index.BufferedUpdatesStream.applyQueryDeletes(BufferedUpdatesStream.java:709)

at 
org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:267)




Re: question about an entry in the log file

2017-09-20 Thread kaveh minooie

Thanks Shalin, that was very helpful.

On 09/20/2017 01:02 PM, Shalin Shekhar Mangar wrote:

That log shows that the searcher being opened is the "realtime"
searcher as opposed to the "main" searcher. The realtime searcher is
quite lightweight. It causes a flush of built index segments from the
memory to the disk and opens a new searcher over them. No autowarming
or fsync happens for realtime searcher. It is used for realtime gets,
atomic updates and some other internal housekeeping tasks. This
searcher is essential and unavoidable.

The "main" searcher, on the other hand, is used for searching and is
heavy-weight. This is controlled by the "openSearcher" request
parameter and configuration.

On Wed, Sep 20, 2017 at 12:07 PM, kaveh minooie  wrote:

Hi Erick

Thanks for your response. I understand your point, but what I was asking was:
does Solr reopen searchers after a commit call even if the commit was called
with openSearcher=false? That is what seems to be happening based on
these log entries.

Also, it seems that if autoCommit is configured with
<openSearcher>false</openSearcher> and softAutoCommit is set to -1, then in the
absence of any other commit call the new updates, although committed, would remain
invisible forever. My problem here is that I am going for maximum indexing
performance: millions of documents per batch job, a couple of batch jobs
per day. Their immediate visibility or even delayed visibility is not a
priority, but they have to become visible at some point, preferably at the
end of each batch job. What do you think would be the best way to go about
this?

thanks,


On 09/20/2017 10:17 AM, Erick Erickson wrote:


First, I would not recommend you call commit from the client. It's
usually far better to let your autocommit settings in solrconfig.xml
deal with it. When you need to search, you either need to configure
<autoCommit> with <openSearcher>true</openSearcher>

or set <autoSoftCommit> to something other than -1.


https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick

On Tue, Sep 19, 2017 at 4:53 PM, kaveh minooie  wrote:


Hi everyone

I am trying to figure out why calling commit from my client takes a very
long time in an environment with concurrent updates, and I see the following
snippet in the Solr log files when the client calls for commit. My question is
regarding the third INFO entry: what is it opening? And how can I make Solr
stop doing that?


INFO  - 2017-09-19 16:42:20.557; [c:dosweb2016 s:shard2 r:core_node5
x:dosweb2016] org.apache.solr.update.DirectUpdateHandler2; start

commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
INFO  - 2017-09-19 16:42:20.557; [c:dosweb2016 s:shard2 r:core_node5
x:dosweb2016] org.apache.solr.update.SolrIndexWriter; Calling
setCommitData
with IW:org.apache.solr.update.SolrIndexWriter@3ee73284
INFO  - 2017-09-19 16:42:20.660; [c:dosweb2016 s:shard2 r:core_node5
x:dosweb2016] org.apache.solr.search.SolrIndexSearcher; Opening
[Searcher@644a8d33[dosweb2016] realtime]
INFO  - 2017-09-19 16:42:20.668; [c:dosweb2016 s:shard2 r:core_node5
x:dosweb2016] org.apache.solr.update.DirectUpdateHandler2;
end_commit_flush


thanks,
--
Kaveh Minooie



--
Kaveh Minooie






--
Kaveh Minooie


Re: question about an entry in the log file

2017-09-20 Thread Shalin Shekhar Mangar
That log shows that the searcher being opened is the "realtime"
searcher as opposed to the "main" searcher. The realtime searcher is
quite lightweight. It causes a flush of built index segments from the
memory to the disk and opens a new searcher over them. No autowarming
or fsync happens for realtime searcher. It is used for realtime gets,
atomic updates and some other internal housekeeping tasks. This
searcher is essential and unavoidable.

The "main" searcher, on the other hand, is used for searching and is
heavy-weight. This is controlled by the "openSearcher" request
parameter and configuration.

On Wed, Sep 20, 2017 at 12:07 PM, kaveh minooie  wrote:
> Hi Erick
>
> Thanks for your response. I understand your point, but what I was asking was:
> does Solr reopen searchers after a commit call even if the commit was called
> with openSearcher=false? That is what seems to be happening based on
> these log entries.
>
> Also, it seems that if autoCommit is configured with
> <openSearcher>false</openSearcher> and softAutoCommit is set to -1, then in the
> absence of any other commit call the new updates, although committed, would remain
> invisible forever. My problem here is that I am going for maximum indexing
> performance: millions of documents per batch job, a couple of batch jobs
> per day. Their immediate visibility or even delayed visibility is not a
> priority, but they have to become visible at some point, preferably at the
> end of each batch job. What do you think would be the best way to go about
> this?
>
> thanks,
>
>
> On 09/20/2017 10:17 AM, Erick Erickson wrote:
>>
>> First, I would not recommend you call commit from the client. It's
>> usually far better to let your autocommit settings in solrconfig.xml
>> deal with it. When you need to search, you either need to configure
>> <autoCommit> with <openSearcher>true</openSearcher>
>>
>> or set <autoSoftCommit> to something other than -1.
>>
>>
>> https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>>
>> Best,
>> Erick
>>
>> On Tue, Sep 19, 2017 at 4:53 PM, kaveh minooie  wrote:
>>>
>>> Hi everyone
>>>
>>> I am trying to figure out why calling commit from my client takes a very
>>> long time in an environment with concurrent updates, and I see the following
>>> snippet in the Solr log files when the client calls for commit. My question is
>>> regarding the third INFO entry: what is it opening? And how can I make Solr
>>> stop doing that?
>>>
>>>
>>> INFO  - 2017-09-19 16:42:20.557; [c:dosweb2016 s:shard2 r:core_node5
>>> x:dosweb2016] org.apache.solr.update.DirectUpdateHandler2; start
>>>
>>> commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
>>> INFO  - 2017-09-19 16:42:20.557; [c:dosweb2016 s:shard2 r:core_node5
>>> x:dosweb2016] org.apache.solr.update.SolrIndexWriter; Calling
>>> setCommitData
>>> with IW:org.apache.solr.update.SolrIndexWriter@3ee73284
>>> INFO  - 2017-09-19 16:42:20.660; [c:dosweb2016 s:shard2 r:core_node5
>>> x:dosweb2016] org.apache.solr.search.SolrIndexSearcher; Opening
>>> [Searcher@644a8d33[dosweb2016] realtime]
>>> INFO  - 2017-09-19 16:42:20.668; [c:dosweb2016 s:shard2 r:core_node5
>>> x:dosweb2016] org.apache.solr.update.DirectUpdateHandler2;
>>> end_commit_flush
>>>
>>>
>>> thanks,
>>> --
>>> Kaveh Minooie
>
>
> --
> Kaveh Minooie



-- 
Regards,
Shalin Shekhar Mangar.


Rescoring from 0 - full

2017-09-20 Thread Dariusz Wojtas
Hi,
When I use the boosting functionality, it is always about adding to or
multiplying the score calculated in the 'q' param.
I may use function queries inside 'q', but this may hit performance by
calling multiple nested functions.
I thought that 'rerank' could help, but it is still about changing the
original score, not a full recalculation.

How can I take full control of the score in rerank? Is it possible?

Best regards,
Dariusz Wojtas


Re: [ANNOUNCE] Apache Solr 7.0.0 released

2017-09-20 Thread Anshum Gupta
It’s strange, but something seems to have stripped all the formatting from 
the announce mail. Here’s a plain text version of the same, and I hope this is 
more readable.


20 September 2017, Apache Solr™ 7.0.0 available

Solr is the popular, blazing fast, open source NoSQL search platform from the 
Apache Lucene project. Its major features include powerful full-text search, 
hit highlighting, faceted search, dynamic clustering, database integration, 
rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly 
scalable, providing fault tolerant distributed search and indexing, and powers 
the search and navigation features of many of the world's largest internet 
sites. 

Solr 7.0.0 is available for immediate download at: 
http://lucene.apache.org/solr/mirrors-solr-latest-redir.html 


See http://lucene.apache.org/solr/7_0_0/changes/Changes.html 
 for a full list of 
details. 

  * Replica Types - Solr 7 supports different replica types, which handle 
updates differently. In addition to pure NRT operation where all replicas build 
an index and keep a replication log, you can now also add so called PULL 
replicas, achieving the read-speed optimized benefits of a master/slave setup 
while at the same time keeping index redundancy. 

  * Auto-scaling. Solr can now allocate new replicas to nodes using a new auto 
scaling policy framework. This framework will in future releases enable Solr to 
move shards around based on load, disk etc. 

  * Indented JSON is now the default response format for all APIs, pass wt=xml 
and/or indent=off to use the previous unindented XML format. 

  * The JSON Facet API now supports two-phase facet refinement to ensure 
accurate counts and statistics for facet buckets returned in distributed mode. 

  * Streaming Expressions adds a new statistical programming syntax for the 
statistical analysis of sql queries, random samples, time series and graph 
result sets. 

  * Analytics Component version 2.0, which now supports distributed 
collections, expressions over multivalued fields, a new JSON request language, 
and more. 

  * The new v2 API, exposed at /api/ and also supported via SolrJ, is now the 
preferred API, but /solr/ continues to work. 

  * A new '_default' configset is used if no config is specified at collection 
creation. The data-driven functionality of this configset indexes strings as 
analyzed text while at the same time copying to a '*_str' field suitable for 
faceting. 

  * Solr 7 is tested with and verified to support Java 9. 
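
As an illustration of the replica-types item above, SolrJ 7 can request the
new types at collection-creation time. A hedged sketch, assuming the 7.0
CollectionAdminRequest overload that takes NRT/TLOG/PULL counts; the
collection name, configset, and ZooKeeper address are placeholders:

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class CreateWithPullReplicas {
    public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient.Builder()
                 .withZkHost("zk1:2181")   // placeholder ZooKeeper ensemble
                 .build()) {
            // 1 shard, 1 NRT replica for indexing, 0 TLOG replicas,
            // and 2 read-optimized PULL replicas that only replicate segments.
            CollectionAdminRequest
                .createCollection("mycoll", "_default", 1, 1, 0, 2)
                .process(client);
        }
    }
}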

Being a major release, Solr 7 removes many deprecated APIs, changes various 
parameter defaults and behavior. Some changes may require a re-index of your 
content. You are thus encouraged to thoroughly read the "Upgrade Notes" at 
http://lucene.apache.org/solr/7_0_0/changes/Changes.html 
 or in the 
CHANGES.txt file accompanying the release. 

Solr 7.0.0 also includes many other new features as well as numerous 
optimizations and bugfixes of the corresponding Apache Lucene release. 

Please report any feedback to the mailing lists 
(http://lucene.apache.org/solr/discussion.html 
) 

Note: The Apache Software Foundation uses an extensive mirroring network for 
distributing releases. It is possible that the mirror you are using may not 
have replicated the release yet. If that is the case, please try another 
mirror. This also goes for Maven access.

-Anshum



> On Sep 20, 2017, at 12:09 PM, Anshum Gupta  wrote:
> 
> 20 September 2017, Apache Solr™ 7.0.0 available
> 
> Solr is the popular, blazing fast, open source NoSQL search platform from the 
> Apache Lucene project. Its major features include powerful full-text search, 
> hit highlighting, faceted search, dynamic clustering, database integration, 
> rich document (e.g., Word, PDF) handling, and geospatial search. Solr is 
> highly scalable, providing fault tolerant distributed search and indexing, 
> and powers the search and navigation features of many of the world's largest 
> internet sites. 
> 
> Solr 7.0.0 is available for immediate download at: 
> http://lucene.apache.org/solr/mirrors-solr-latest-redir.html 
> 
> See http://lucene.apache.org/solr/7_0_0/changes/Changes.html 
>  for a full list of 
> details. 
> 
> Replica Types - Solr 7 supports different replica types, which handle updates 
> differently. In addition to pure NRT operation where all replicas build an 
> index and keep a replication log, you can now also add so called PULL 
> replicas, achieving the read-speed optimized benefits of a master/slave setup 
> while at the same time keeping index redundancy. 
> Auto-scaling. Solr can now 

[ANNOUNCE] Apache Solr 7.0.0 released

2017-09-20 Thread Anshum Gupta
20 September 2017, Apache Solr™ 7.0.0 available

Solr is the popular, blazing fast, open source NoSQL search platform from the 
Apache Lucene project. Its major features include powerful full-text search, 
hit highlighting, faceted search, dynamic clustering, database integration, 
rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly 
scalable, providing fault tolerant distributed search and indexing, and powers 
the search and navigation features of many of the world's largest internet 
sites. 

Solr 7.0.0 is available for immediate download at: 
http://lucene.apache.org/solr/mirrors-solr-latest-redir.html 

See http://lucene.apache.org/solr/7_0_0/changes/Changes.html 
 for a full list of 
details. 

Replica Types - Solr 7 supports different replica types, which handle updates 
differently. In addition to pure NRT operation where all replicas build an 
index and keep a replication log, you can now also add so called PULL replicas, 
achieving the read-speed optimized benefits of a master/slave setup while at 
the same time keeping index redundancy. 
Auto-scaling. Solr can now allocate new replicas to nodes using a new auto 
scaling policy framework. This framework will in future releases enable Solr to 
move shards around based on load, disk etc. 
Indented JSON is now the default response format for all APIs, pass wt=xml 
and/or indent=off to use the previous unindented XML format. 
The JSON Facet API now supports two-phase facet refinement to ensure accurate 
counts and statistics for facet buckets returned in distributed mode. 
Streaming Expressions adds a new statistical programming syntax for the 
statistical analysis of sql queries, random samples, time series and graph 
result sets. 
Analytics Component version 2.0, which now supports distributed collections, 
expressions over multivalued fields, a new JSON request language, and more. 
The new v2 API, exposed at /api/ and also supported via SolrJ, is now the 
preferred API, but /solr/ continues to work. 
A new '_default' configset is used if no config is specified at collection 
creation. The data-driven functionality of this configset indexes strings as 
analyzed text while at the same time copying to a '*_str' field suitable for 
faceting. 
Solr 7 is tested with and verified to support Java 9. 
Being a major release, Solr 7 removes many deprecated APIs, changes various 
parameter defaults and behavior. Some changes may require a re-index of your 
content. You are thus encouraged to thoroughly read the "Upgrade Notes" at 
http://lucene.apache.org/solr/7_0_0/changes/Changes.html 
 or in the 
CHANGES.txt file accompanying the release. 

Solr 7.0.0 also includes many other new features as well as numerous 
optimizations and bugfixes of the corresponding Apache Lucene release. 

Please report any feedback to the mailing lists 
(http://lucene.apache.org/solr/discussion.html 
) 

Note: The Apache Software Foundation uses an extensive mirroring network for 
distributing releases. It is possible that the mirror you are using may not 
have replicated the release yet. If that is the case, please try another 
mirror. This also goes for Maven access.


Anshum Gupta





Re: question about an entry in the log file

2017-09-20 Thread kaveh minooie

Hi Erick

Thanks for your response. I understand your point, but what I was asking 
was: does Solr reopen searchers after a commit call even if the commit 
was called with openSearcher=false? That is what seems to be 
happening based on these log entries.


Also, it seems that if autoCommit is configured with 
<openSearcher>false</openSearcher> and softAutoCommit is set to -1, then in 
the absence of any other commit call the new updates, although committed, 
would remain invisible forever. My problem here is that I am going for 
maximum indexing performance: millions of documents per batch job, a 
couple of batch jobs per day. Their immediate visibility or even delayed 
visibility is not a priority, but they have to become visible at some 
point, preferably at the end of each batch job. What do you think would 
be the best way to go about this?
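
For illustration, the end-of-batch visibility I am after would look something 
like this from SolrJ (a sketch only; the URL is a placeholder and the core 
name is taken from the log below):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class EndOfBatchCommit {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; core name from the log snippet below.
        try (SolrClient client = new HttpSolrClient.Builder(
                 "http://localhost:8983/solr/dosweb2016").build()) {
            // ... index the whole batch here; hard autoCommit with
            // openSearcher=false keeps the transaction log bounded ...

            // One explicit commit at the end; openSearcher defaults to true,
            // so this opens a new main searcher and makes the batch visible.
            client.commit();
        }
    }
}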


thanks,

On 09/20/2017 10:17 AM, Erick Erickson wrote:

First, I would not recommend you call commit from the client. It's
usually far better to let your autocommit settings in solrconfig.xml
deal with it. When you need to search, you either need to configure
<autoCommit> with <openSearcher>true</openSearcher>

or set <autoSoftCommit> to something other than -1.

https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick

On Tue, Sep 19, 2017 at 4:53 PM, kaveh minooie  wrote:

Hi everyone

I am trying to figure out why calling commit from my client takes a very
long time in an environment with concurrent updates, and I see the following
snippet in the Solr log files when the client calls for commit. My question is
regarding the third INFO entry: what is it opening? And how can I make Solr stop
doing that?


INFO  - 2017-09-19 16:42:20.557; [c:dosweb2016 s:shard2 r:core_node5
x:dosweb2016] org.apache.solr.update.DirectUpdateHandler2; start
commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
INFO  - 2017-09-19 16:42:20.557; [c:dosweb2016 s:shard2 r:core_node5
x:dosweb2016] org.apache.solr.update.SolrIndexWriter; Calling setCommitData
with IW:org.apache.solr.update.SolrIndexWriter@3ee73284
INFO  - 2017-09-19 16:42:20.660; [c:dosweb2016 s:shard2 r:core_node5
x:dosweb2016] org.apache.solr.search.SolrIndexSearcher; Opening
[Searcher@644a8d33[dosweb2016] realtime]
INFO  - 2017-09-19 16:42:20.668; [c:dosweb2016 s:shard2 r:core_node5
x:dosweb2016] org.apache.solr.update.DirectUpdateHandler2; end_commit_flush


thanks,
--
Kaveh Minooie


--
Kaveh Minooie


Strange Behavior When Extracting Features

2017-09-20 Thread Michael Alcorn
Hi all,

I'm getting some extremely strange behavior when trying to extract features
for a learning to rank model. The following query incorrectly says all
features have zero values:

http://gss-test-fusion.usersys.redhat.com:8983/solr/access/query?q=added
couple of fiber channel&rq={!ltr model=redhat_efi_model reRankDocs=1
efi.case_summary=the efi.case_description=added couple of fiber channel
efi.case_issue=the efi.case_environment=the}&fl=id,score,[features]&rows=10

But this query, which simply moves the word "added" from the front of the
provided text to the back, properly fills in the feature values:

http://gss-test-fusion.usersys.redhat.com:8983/solr/access/query?q=couple
of fiber channel added&rq={!ltr model=redhat_efi_model reRankDocs=1
efi.case_summary=the efi.case_description=couple of fiber channel added
efi.case_issue=the efi.case_environment=the}&fl=id,score,[features]&rows=10

The explain output for the failing query can be found here:

https://gist.github.com/manisnesan/18a8f1804f29b1b62ebfae1211f38cc4

and the explain output for the properly functioning query can be found here:

https://gist.github.com/manisnesan/47685a561605e2229434b38aed11cc65

Have any of you run into this issue? Seems like it could be a bug.

Thanks,
Michael A. Alcorn


Re: Solr replication

2017-09-20 Thread Satyaprashant Bezwada
Thanks Erick, fixed the issue. The IT team corrected the solrconfig.xml but 
forgot to execute the zkcli.sh script on the Solr node. After I executed the script 
it's working now.


On 9/20/17, 10:20 AM, "Erick Erickson"  wrote:

WARNING - External email; exercise caution.


Your solrconfig.xml file is malformed. The smoking gun is:

 Exception during parsing file: solrconfig.xml

Best,
Erick

On Tue, Sep 19, 2017 at 4:48 PM, Satyaprashant Bezwada
 wrote:
> Need some inputs or help in resolving replication across solr nodes. We 
have installed Solr 6.5 in cloud mode and have 3 ZooKeepers and 2 Solr nodes 
configured. Enabled Solr replication in my Solrj client but the replication 
fails and is unable to create a collection.
>
> The same code works in our different environment wherein I have 1 
zookeeper and 3 Solr nodes configured. Here is the exception I see on one of 
the nodes of Solr when I try to create a collection in the environment where it 
fails. I have compared the Solrconfig.xml on both the environments and didn’t 
see any difference.
>
>
>
> 2017-09-19 22:09:35.471 ERROR 
(OverseerThreadFactory-8-thread-1-processing-n:sr01:8983_solr) [   ] 
o.a.s.c.OverseerCollectionMessageHandler Cleaning up collection [CIUZLW].
> 2017-09-19 22:09:35.475 INFO  
(OverseerThreadFactory-8-thread-1-processing-n:sr01:8983_solr) [   ] 
o.a.s.c.OverseerCollectionMessageHandler Executing Collection Cmd : 
action=UNLOAD&deleteInstanceDir=true&deleteDataDir=true
> 2017-09-19 22:09:35.486 INFO  (qtp401424608-15) [   ] 
o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores 
params={deleteInstanceDir=true&core=CIUZLW_shard1_replica1&qt=/admin/cores&deleteDataDir=true&action=UNLOAD&wt=javabin&version=2}
 status=0 QTime=6
> 2017-09-19 22:09:36.194 INFO  
(OverseerThreadFactory-8-thread-1-processing-n:sr01:8983_solr) [   ] 
o.a.s.c.CreateCollectionCmd Cleaned up artifacts for failed create collection 
for [CIUZLW]
> 2017-09-19 22:09:38.008 INFO  
(OverseerCollectionConfigSetProcessor-170740497916499012-sr01:8983_solr-n_29)
 [   ] o.a.s.c.OverseerTaskQueue Response ZK path: 
/overseer/collection-queue-work/qnr-000410 doesn't exist.  Requestor may 
have disconnected from ZooKeeper
> 2017-09-19 22:38:36.634 INFO  (ShutdownMonitor) [   ] 
o.a.s.c.CoreContainer Shutting down CoreContainer instance=1549725679
> 2017-09-19 22:38:36.644 INFO  (ShutdownMonitor) [   ] o.a.s.c.Overseer 
Overseer (id=170740497916499012-sr01:8983_solr-n_29) closing
> 2017-09-19 22:38:36.645 INFO  
(OverseerStateUpdate-170740497916499012-sr01:8983_solr-n_29) [   ] 
o.a.s.c.Overseer Overseer Loop exiting : sr01:8983_solr
> 2017-09-19 22:38:36.654 INFO  (ShutdownMonitor) [   ] 
o.a.s.m.SolrMetricManager Closing metric reporters for: solr.node
> ^CCorp-QA-West [sbezw...@boardvantage.net@sr1501:1 ~]$
> Corp-QA-West [sbezw...@boardvantage.net@sr1501:1 ~]$ sudo tail -f 
/var/solr/logs/solr.log
> at java.lang.Thread.run(Thread.java:745)
> 2017-09-19 22:09:35.471 ERROR 
(OverseerThreadFactory-8-thread-1-processing-n:sr01:8983_solr) [   ] 
o.a.s.c.OverseerCollectionMessageHandler Cleaning up collection [CIUZLW].
> 2017-09-19 22:09:35.475 INFO  
(OverseerThreadFactory-8-thread-1-processing-n:sr01:8983_solr) [   ] 
o.a.s.c.OverseerCollectionMessageHandler Executing Collection Cmd : 
action=UNLOAD&deleteInstanceDir=true&deleteDataDir=true
> 2017-09-19 22:09:35.486 INFO  (qtp401424608-15) [   ] 
o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores 
params={deleteInstanceDir=true&core=CIUZLW_shard1_replica1&qt=/admin/cores&deleteDataDir=true&action=UNLOAD&wt=javabin&version=2}
 status=0 QTime=6
> 2017-09-19 22:09:36.194 INFO  
(OverseerThreadFactory-8-thread-1-processing-n:sr01:8983_solr) [   ] 
o.a.s.c.CreateCollectionCmd Cleaned up artifacts for failed create collection 
for [CIUZLW]
> 2017-09-19 22:09:38.008 INFO  
(OverseerCollectionConfigSetProcessor-170740497916499012-sr01:8983_solr-n_29)
 [   ] o.a.s.c.OverseerTaskQueue Response ZK path: 
/overseer/collection-queue-work/qnr-000410 doesn't exist.  Requestor may 
have disconnected from ZooKeeper
> 2017-09-19 22:38:36.634 INFO  (ShutdownMonitor) [   ] 
o.a.s.c.CoreContainer Shutting down CoreContainer instance=1549725679
> 2017-09-19 22:38:36.644 INFO  (ShutdownMonitor) [   ] o.a.s.c.Overseer 
Overseer (id=170740497916499012-sr01:8983_solr-n_29) closing
> 2017-09-19 22:38:36.645 INFO  
(OverseerStateUpdate-170740497916499012-sr01:8983_solr-n_29) [   ] 
o.a.s.c.Overseer Overseer Loop exiting : sr01:8983_solr
> 2017-09-19 22:38:36.654 INFO  (ShutdownMonitor) [   ] 
o.a.s.m.SolrMetricManager Closing metric reporters for: solr.node
> ^CCorp-QA-West [sbezw...@boardvantage.net@sr1501:1 ~]$ sudo tail -f 
/var/solr/logs/solr.log
> 2017-09-19 23:12:22.230 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter 
Loading solr.xml from SolrHome (not found in ZooKeeper)
> 

Re: Solr replication

2017-09-20 Thread Erick Erickson
Your solrconfig.xml file is malformed. The smoking gun is:

 Exception during parsing file: solrconfig.xml

Best,
Erick

On Tue, Sep 19, 2017 at 4:48 PM, Satyaprashant Bezwada
 wrote:
> Need some inputs or help in resolving replication across solr nodes. We have 
> installed Solr 6.5 in cloud mode and have 3 ZooKeepers and 2 Solr nodes 
> configured. Enabled Solr replication in my Solrj client but the replication 
> fails and is unable to create a collection.
>
> The same code works in our different environment wherein I have 1 zookeeper 
> and 3 Solr nodes configured. Here is the exception I see on one of the nodes 
> of Solr when I try to create a collection in the environment where it fails. 
> I have compared the Solrconfig.xml on both the environments and didn’t see 
> any difference.
>
>
>
> 2017-09-19 22:09:35.471 ERROR 
> (OverseerThreadFactory-8-thread-1-processing-n:sr01:8983_solr) [   ] 
> o.a.s.c.OverseerCollectionMessageHandler Cleaning up collection [CIUZLW].
> 2017-09-19 22:09:35.475 INFO  
> (OverseerThreadFactory-8-thread-1-processing-n:sr01:8983_solr) [   ] 
> o.a.s.c.OverseerCollectionMessageHandler Executing Collection Cmd : 
> action=UNLOAD&deleteInstanceDir=true&deleteDataDir=true
> 2017-09-19 22:09:35.486 INFO  (qtp401424608-15) [   ] o.a.s.s.HttpSolrCall 
> [admin] webapp=null path=/admin/cores 
> params={deleteInstanceDir=true&core=CIUZLW_shard1_replica1&qt=/admin/cores&deleteDataDir=true&action=UNLOAD&wt=javabin&version=2}
>  status=0 QTime=6
> 2017-09-19 22:09:36.194 INFO  
> (OverseerThreadFactory-8-thread-1-processing-n:sr01:8983_solr) [   ] 
> o.a.s.c.CreateCollectionCmd Cleaned up artifacts for failed create collection 
> for [CIUZLW]
> 2017-09-19 22:09:38.008 INFO  
> (OverseerCollectionConfigSetProcessor-170740497916499012-sr01:8983_solr-n_29)
>  [   ] o.a.s.c.OverseerTaskQueue Response ZK path: 
> /overseer/collection-queue-work/qnr-000410 doesn't exist.  Requestor may 
> have disconnected from ZooKeeper
> 2017-09-19 22:38:36.634 INFO  (ShutdownMonitor) [   ] o.a.s.c.CoreContainer 
> Shutting down CoreContainer instance=1549725679
> 2017-09-19 22:38:36.644 INFO  (ShutdownMonitor) [   ] o.a.s.c.Overseer 
> Overseer (id=170740497916499012-sr01:8983_solr-n_29) closing
> 2017-09-19 22:38:36.645 INFO  
> (OverseerStateUpdate-170740497916499012-sr01:8983_solr-n_29) [   ] 
> o.a.s.c.Overseer Overseer Loop exiting : sr01:8983_solr
> 2017-09-19 22:38:36.654 INFO  (ShutdownMonitor) [   ] 
> o.a.s.m.SolrMetricManager Closing metric reporters for: solr.node
> ^CCorp-QA-West [sbezw...@boardvantage.net@sr1501:1 ~]$
> Corp-QA-West [sbezw...@boardvantage.net@sr1501:1 ~]$ sudo tail -f 
> /var/solr/logs/solr.log
> at java.lang.Thread.run(Thread.java:745)
> 2017-09-19 22:09:35.471 ERROR 
> (OverseerThreadFactory-8-thread-1-processing-n:sr01:8983_solr) [   ] 
> o.a.s.c.OverseerCollectionMessageHandler Cleaning up collection [CIUZLW].
> 2017-09-19 22:09:35.475 INFO  
> (OverseerThreadFactory-8-thread-1-processing-n:sr01:8983_solr) [   ] 
> o.a.s.c.OverseerCollectionMessageHandler Executing Collection Cmd : 
> action=UNLOAD&deleteInstanceDir=true&deleteDataDir=true
> 2017-09-19 22:09:35.486 INFO  (qtp401424608-15) [   ] o.a.s.s.HttpSolrCall 
> [admin] webapp=null path=/admin/cores 
> params={deleteInstanceDir=true&core=CIUZLW_shard1_replica1&qt=/admin/cores&deleteDataDir=true&action=UNLOAD&wt=javabin&version=2}
>  status=0 QTime=6
> 2017-09-19 22:09:36.194 INFO  
> (OverseerThreadFactory-8-thread-1-processing-n:sr01:8983_solr) [   ] 
> o.a.s.c.CreateCollectionCmd Cleaned up artifacts for failed create collection 
> for [CIUZLW]
> 2017-09-19 22:09:38.008 INFO  
> (OverseerCollectionConfigSetProcessor-170740497916499012-sr01:8983_solr-n_29)
>  [   ] o.a.s.c.OverseerTaskQueue Response ZK path: 
> /overseer/collection-queue-work/qnr-000410 doesn't exist.  Requestor may 
> have disconnected from ZooKeeper
> 2017-09-19 22:38:36.634 INFO  (ShutdownMonitor) [   ] o.a.s.c.CoreContainer 
> Shutting down CoreContainer instance=1549725679
> 2017-09-19 22:38:36.644 INFO  (ShutdownMonitor) [   ] o.a.s.c.Overseer 
> Overseer (id=170740497916499012-sr01:8983_solr-n_29) closing
> 2017-09-19 22:38:36.645 INFO  
> (OverseerStateUpdate-170740497916499012-sr01:8983_solr-n_29) [   ] 
> o.a.s.c.Overseer Overseer Loop exiting : sr01:8983_solr
> 2017-09-19 22:38:36.654 INFO  (ShutdownMonitor) [   ] 
> o.a.s.m.SolrMetricManager Closing metric reporters for: solr.node
> ^CCorp-QA-West [sbezw...@boardvantage.net@sr1501:1 ~]$ sudo tail -f 
> /var/solr/logs/solr.log
> 2017-09-19 23:12:22.230 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter Loading 
> solr.xml from SolrHome (not found in ZooKeeper)
> 2017-09-19 23:12:22.233 INFO  (main) [   ] o.a.s.c.SolrXmlConfig Loading 
> container configuration from /var/solr/data/Node122/solr.xml
> 2017-09-19 23:12:22.644 INFO  (main) [   ] o.a.s.u.UpdateShardHandler 
> Creating UpdateShardHandler HTTP client with params: 
> socketTimeout=600000&connTimeout=60000&retry=true
> 2017-09-19 23:12:22.650 INFO  (main) [   ] o.a.s.c.ZkContainer 

Re: question about an entry in the log file

2017-09-20 Thread Erick Erickson
First, I would not recommend you call commit from the client. It's
usually far better to let your autocommit settings in solrconfig.xml
deal with it. When you need to search, you either need to configure
<autoCommit> with <openSearcher>true</openSearcher>,

or set <autoSoftCommit> to something other than -1.

https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
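A minimal solrconfig.xml sketch of the second option (soft commits for
visibility, hard commits for durability; the interval values are only
illustrative, tune them to your indexing pattern):

<autoCommit>
  <!-- hard commit: flushes to stable storage, does not open a searcher -->
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <!-- soft commit: makes newly indexed documents visible to searches -->
  <maxTime>5000</maxTime>
</autoSoftCommit>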

Best,
Erick

On Tue, Sep 19, 2017 at 4:53 PM, kaveh minooie  wrote:
> Hi everyone
>
> I am trying to figure out why calling commit from my client takes a very
> long time in an environment with concurrent updates, and I see the following
> snippet in the Solr log files when the client calls for commit. My question is
> regarding the third INFO entry: what is it opening, and how can I make Solr
> stop doing that?
>
>
> INFO  - 2017-09-19 16:42:20.557; [c:dosweb2016 s:shard2 r:core_node5
> x:dosweb2016] org.apache.solr.update.DirectUpdateHandler2; start
> commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> INFO  - 2017-09-19 16:42:20.557; [c:dosweb2016 s:shard2 r:core_node5
> x:dosweb2016] org.apache.solr.update.SolrIndexWriter; Calling setCommitData
> with IW:org.apache.solr.update.SolrIndexWriter@3ee73284
> INFO  - 2017-09-19 16:42:20.660; [c:dosweb2016 s:shard2 r:core_node5
> x:dosweb2016] org.apache.solr.search.SolrIndexSearcher; Opening
> [Searcher@644a8d33[dosweb2016] realtime]
> INFO  - 2017-09-19 16:42:20.668; [c:dosweb2016 s:shard2 r:core_node5
> x:dosweb2016] org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
>
>
> thanks,
> --
> Kaveh Minooie


Re: no search results for specific search in solr 6.6.0

2017-09-20 Thread Erick Erickson
Just go to the admin/analysis page and enter the terms in the "index"
box (I usually uncheck the "verbose" checkbox). You will see exactly
what element in your analysis chain is doing this. You'll see light
gray two-letter codes on the side, e.g. "ST". Hover over it with your
mouse, and you should see exactly which class it is, and thus the
easily-identifiable element of your fieldType for the field in
question. For instance:

solr.StandardTokenizerFactory

text_general may have fixed _this_ problem, but it's not a great
solution. The french analysis chain is tuned to create a better
solution for, well, french. Likely solr.FrenchLightStemFilterFactory
is removing the last "o", but that's a guess.

In general, stemming is incompatible with wildcards. E.g. "running"
stems to "run", but "runni*" has no real algorithm that can stem.
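If you'd rather script it than use the UI, the field analysis handler returns
the same token-by-token view (host, core name, and field type below are
placeholders for your setup):

curl "http://localhost:8983/solr/mycore/analysis/field?analysis.fieldtype=text_fr&analysis.fieldvalue=FRaoo&wt=json"

The response lists the tokens emitted after each stage of the index-time
chain, so a stemmed "frao" would show up right after the stemmer.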

Best,
Erick

On Wed, Sep 20, 2017 at 5:18 AM, Sascha Tuschinski
 wrote:
> Hello Erik and Josh,
>
> Thanks for your hints and comments.
>
> I found out that the “text_fr” field type didn’t store “fraoo” as a term. 
> It stored “frao” only, maybe because of the French field type. This field had 
> been created automatically. I’m new to Solr, so maybe this is correct.
>
> I use “text_general” as the field type now and this works fine and 
> solves our problem.
>
> I can deliver the output of the debug query from admin/analysis for the 
> text_fr field type if required.
>
> Thanks again!
> Sascha
>
>
> On 19.09.17, 20:12, "Erick Erickson" wrote:
>
> Unfortunately the link you provided goes to "localhost", which isn't 
> accessible.
>
> The very first thing I'd do is go to the admin/analysis page and put
> the terms in both the "index" and "query" boxes for the field in
> question.
> Next, attach &debug=query to the query to see how the query is actually 
> parsed.
>
> My bet: You are using a different stemmer for the two cases and the
> actual token in the index is FRao in the problem field, but that's
> just a guess.
>
> It often fools people that the field returned in the document (i.e. in
> the fl list) is the _stored_ value, not the actual token in the index.
> You can also use the TermsComponent to see the actual terms in the
> index as well as the admin/schema_browser link.
>
> Best,
> Erick
>
>
> On Tue, Sep 19, 2017 at 9:01 AM, Sascha Tuschinski
>  wrote:
> > Hello Community,
> >
> > We are using a Solr Core with Solr 6.6.0 on Windows 10 (latest updates) 
> with field names defined like "f_1179014266_txt". The number in the middle of 
> the name differs for each field we use. For language specific fields we are 
adding a language-specific extension, e.g. "f_1179014267_txt_fr", 
> "f_1179014268_txt_de", "f_1179014269_txt_en" and so on.
> > We are having the following odd issue within the french "_fr" field 
> only:
> Field: f_1197829835_txt_fr
> Dynamic Field: *_txt_fr
> Type: text_fr
> >
> >   *   The saved value which had been added with no problem to the Solr 
> index is "FRaoo".
> >   *   When searching within the Solr query tool for 
> "f_1197829839_txt_fr:*FRao*" it returns the items matching the term as seen 
> below - OK.
> > {
> >   "responseHeader":{
> > "status":0,
> > "QTime":1,
> > "params":{
> >   "q":"f_1197829839_txt_fr:*FRao*",
> >   "indent":"on",
> >   "wt":"json",
> >   "_":"1505808887827"}},
> >   "response":{"numFound":1,"start":0,"docs":[
> >   {
> > "id":"129",
> > "f_1197829834_txt_en":"EnAir",
> > "f_1197829822_txt_de":"Lufti",
> > "f_1197829835_txt_fr":"FRaoi",
> > "f_1197829836_txt_it":"ITAir",
> > "f_1197829799_txt":["Lufti"],
> > "f_1197829838_txt_en":"EnAir",
> > "f_1197829839_txt_fr":"FRaoo",
> > "f_1197829840_txt_it":"ITAir",
> > "_version_":1578520424165146624}]
> >   }}
> >
> >   *   When searching for "f_1197829839_txt_fr:*FRaoo*" NO item is found 
> - Wrong!
> > {
> >   "responseHeader":{
> > "status":0,
> > "QTime":1,
> > "params":{
> >   "q":"f_1197829839_txt_fr:*FRaoo*",
> >   "indent":"on",
> >   "wt":"json",
> >   "_":"1505808887827"}},
> >   "response":{"numFound":0,"start":0,"docs":[]
> >   }}
> > When searching for "f_1197829839_txt_fr:FRaoo" (no wildcards) the 
> matching items are found - OK
> >
> > {
> >   "responseHeader":{
> > "status":0,
> > "QTime":1,
> > "params":{
> >   

TermVectors and ExactStatsCache

2017-09-20 Thread Patrick Plante
Hi!

I have a SolrCloud 6.6 collection with a 3-shard setup, where I need the 
TermVectors TF and DF values when querying.

I have configured the ExactStatsCache in the solrConfig:

<statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>

When I query "detector works" in my collection, it returns different docfreq 
values based on the shard the document comes from:

"termVectors":[
"27504103",[
  "uniqueKey","27504103",
  "kc",[
"detector works",[
  "tf",1,
  "df",3,
  "tf-idf",0.]]],
"27507925",[
  "uniqueKey","27507925",
  "kc",[
"detector works",[
  "tf",1,
  "df",3,
  "tf-idf",0.]]],
"27504105",[
  "uniqueKey","27504105",
  "kc",[
"detector works",[
  "tf",1,
  "df",2,
  "tf-idf",0.5]]],
"27507927",[
  "uniqueKey","27507927",
  "kc",[
"detector works",[
  "tf",1,
  "df",2,
  "tf-idf",0.5]]],
"27507929",[
  "uniqueKey","27507929",
  "kc",[
"detector works",[
  "tf",1,
  "df",1,
  "tf-idf",1.0]]],
"27504107",[
  "uniqueKey","27504107",
  "kc",[
"detector works",[
  "tf",1,
  "df",3,
  "tf-idf",0.}

I expect the DF values to be 6, and the TF-IDF to be adjusted based on that value. 
I can see in the debug logs that the cache was active.

I have found a pending bug (since Solr 5.5: 
https://issues.apache.org/jira/browse/SOLR-8893) that explains that this 
ExactStatsCache is used to compute the correct TF-IDF for the query but not for 
the TermVectors component.

Is there any way to get the correctly merged DF values (and TF-IDF) from 
multiple shards?

Is there a way to tell which shard a document comes from, so I could compute 
my own correct DF?
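One possible route for that last question is the [shard] document transformer,
which adds the address of the originating shard to each returned document
(host and collection here are placeholders):

http://localhost:8983/solr/mycollection/select?q=kc:"detector works"&fl=id,[shard]

Each document in the response then carries a "[shard]" field that could be used
to group the term vectors by shard before merging the DF values yourself.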

Thank you,
Patrick




cannot create core when SSL is enabled

2017-09-20 Thread Younge, Kent A - Norman, OK - Contractor
Hello,

I am getting an error message when trying to create a core when SSL is enabled:
ERROR: Certificate for  doesn't match any of the subject alternative
names:

However, if I turn off SSL I can create the core just fine. I have my
certificates in the solr-6.5.1 directory; should they be placed somewhere else
to resolve this issue?
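For reference, one way to check which hostnames the certificate actually
covers is to list the keystore and look at the SAN entries (keystore path and
password are placeholders):

keytool -list -v -keystore etc/solr-ssl.keystore.jks -storepass secret \
  | grep -A 1 "SubjectAlternativeName"

The names listed there need to include the hostname used to reach the Solr
node, or host verification fails with exactly this kind of error.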





Thanks,

Kent


Re: Not able to import timestamp data into Solr

2017-09-20 Thread Susheel Kumar
Check out this article for working with date types, formats, etc.:
http://lucene.apache.org/solr/guide/6_6/working-with-dates.html
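If the import goes through the DataImportHandler, the DateFormatTransformer is
the usual way to parse such strings into Solr dates. A sketch (the
dateTimeFormat pattern is an assumption to adapt to what the Cassandra driver
actually returns; note that SimpleDateFormat, which the transformer uses,
cannot parse microseconds):

<entity name="test_table"
        query="SELECT test_data1,test_data2,test_data3, upserttime from test_table"
        transformer="DateFormatTransformer">
  <field column="upserttime" dateTimeFormat="yyyy-MM-dd HH:mm:ss" />
</entity>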

On Wed, Sep 20, 2017 at 6:32 AM, shankhamajumdar <
shankha.majum...@lexmark.com> wrote:

> Hi,
>
> I have a field with timestamp data in Cassandra for example - 2017-09-20
> 10:25:46.752000+0000.
> I am not able to import the data using Solr DataImportHandler, getting the
> below error in the Solr log.
>
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of
> range: -1
>
> I am able to import data of other datatypes from Cassandra to Solr. I am using
> the below configuration:
> managed-schema
> <field name="test_data1" ... required="true" multiValued="false" />
> <field name="test_data2" ... required="true" multiValued="false" />
> <field name="test_data3" ... required="true" multiValued="false" />
> <field name="upserttime" ... required="true" multiValued="false" />
>
> dataconfig.xml
> <entity name="test_table"
> query="SELECT test_data1,test_data2,test_data3, upserttime from
> test_table"
> autoCommit="true">
> <field column="test_data1" ... />
> <field column="test_data2" ... />
> <field column="test_data3" ... />
> <field column="upserttime" ... />
> </entity>
>
> Regards,
> Shankha
>
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: no search results for specific search in solr 6.6.0

2017-09-20 Thread Sascha Tuschinski
Hello Erik and Josh,

Thanks for your hints and comments.

I found out that the “text_fr” field type didn’t store “fraoo” as a term. It 
stored “frao” only, maybe because of the French field type. This field had been 
created automatically. I’m new to Solr, so maybe this is correct.

I use “text_general” as the field type now and this works fine and solves our 
problem.

I can deliver the output of the debug query from admin/analysis for the text_fr 
field type if required.

Thanks again!
Sascha
 

On 19.09.17, 20:12, "Erick Erickson" wrote:

Unfortunately the link you provided goes to "localhost", which isn't 
accessible.

The very first thing I'd do is go to the admin/analysis page and put
the terms in both the "index" and "query" boxes for the field in
question.
Next, attach &debug=query to the query to see how the query is actually 
parsed.

My bet: You are using a different stemmer for the two cases and the
actual token in the index is FRao in the problem field, but that's
just a guess.

It often fools people that the field returned in the document (i.e. in
the fl list) is the _stored_ value, not the actual token in the index.
You can also use the TermsComponent to see the actual terms in the
index as well as the admin/schema_browser link.

Best,
Erick


On Tue, Sep 19, 2017 at 9:01 AM, Sascha Tuschinski
 wrote:
> Hello Community,
>
> We are using a Solr Core with Solr 6.6.0 on Windows 10 (latest updates) 
with field names defined like "f_1179014266_txt". The number in the middle of 
the name differs for each field we use. For language specific fields we are 
adding a language-specific extension, e.g. "f_1179014267_txt_fr", 
"f_1179014268_txt_de", "f_1179014269_txt_en" and so on.
> We are having the following odd issue within the french "_fr" field only:
> Field: f_1197829835_txt_fr
> Dynamic Field: *_txt_fr
> Type: text_fr
>
>   *   The saved value which had been added with no problem to the Solr 
index is "FRaoo".
>   *   When searching within the Solr query tool for 
"f_1197829839_txt_fr:*FRao*" it returns the items matching the term as seen 
below - OK.
> {
>   "responseHeader":{
> "status":0,
> "QTime":1,
> "params":{
>   "q":"f_1197829839_txt_fr:*FRao*",
>   "indent":"on",
>   "wt":"json",
>   "_":"1505808887827"}},
>   "response":{"numFound":1,"start":0,"docs":[
>   {
> "id":"129",
> "f_1197829834_txt_en":"EnAir",
> "f_1197829822_txt_de":"Lufti",
> "f_1197829835_txt_fr":"FRaoi",
> "f_1197829836_txt_it":"ITAir",
> "f_1197829799_txt":["Lufti"],
> "f_1197829838_txt_en":"EnAir",
> "f_1197829839_txt_fr":"FRaoo",
> "f_1197829840_txt_it":"ITAir",
> "_version_":1578520424165146624}]
>   }}
>
>   *   When searching for "f_1197829839_txt_fr:*FRaoo*" NO item is found - 
Wrong!
> {
>   "responseHeader":{
> "status":0,
> "QTime":1,
> "params":{
>   "q":"f_1197829839_txt_fr:*FRaoo*",
>   "indent":"on",
>   "wt":"json",
>   "_":"1505808887827"}},
>   "response":{"numFound":0,"start":0,"docs":[]
>   }}
> When searching for "f_1197829839_txt_fr:FRaoo" (no wildcards) the 
matching items are found - OK
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":1,
> "params":{
>   "q":"f_1197829839_txt_fr:FRaoo",
>   "indent":"on",
>   "wt":"json",
>   "_":"1505808887827"}},
>   "response":{"numFound":1,"start":0,"docs":[
>   {
> "id":"129",
> "f_1197829834_txt_en":"EnAir",
> "f_1197829822_txt_de":"Lufti",
> "f_1197829835_txt_fr":"FRaoi",
> "f_1197829836_txt_it":"ITAir",
> "f_1197829799_txt":["Lufti"],
> "f_1197829838_txt_en":"EnAir",
> "f_1197829839_txt_fr":"FRaoo",
> "f_1197829840_txt_it":"ITAir",
> "_version_":1578520424165146624}]
>   }}
> If we save exactly the same value into a different language field, e.g. one 
ending in "_en" such as "f_1197829834_txt_en", then the search 
"f_1197829834_txt_en:*FRaoo*" finds all items correctly!
> We have no idea what's wrong here; we even recreated the index and can 
reproduce this problem all the time. I can only see that the value starts with 
"FR" and the field extension ends with "fr", but this is not a problem for "en", 
"de" and so on. All fields are used in the same way and have the same field 

Not able to import timestamp data into Solr

2017-09-20 Thread shankhamajumdar
Hi,

I have a field with timestamp data in Cassandra for example - 2017-09-20
10:25:46.752000+0000.
I am not able to import the data using Solr DataImportHandler, getting the
below error in the Solr log.

Caused by: java.lang.StringIndexOutOfBoundsException: String index out of
range: -1

I am able to import data of other datatypes from Cassandra to Solr. I am using
the below configuration:
managed-schema
<field name="test_data1" ... required="true" multiValued="false" />
<field name="test_data2" ... required="true" multiValued="false" />
<field name="test_data3" ... required="true" multiValued="false" />
<field name="upserttime" ... required="true" multiValued="false" />

dataconfig.xml
<entity name="test_table"
query="SELECT test_data1,test_data2,test_data3, upserttime from test_table"
autoCommit="true">
<field column="test_data1" ... />
<field column="test_data2" ... />
<field column="test_data3" ... />
<field column="upserttime" ... />
</entity>

Regards,
Shankha





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Meet CorruptIndexException while shutdown one node in Solr cloud

2017-09-20 Thread wg85907
Hi Erick,
        Thanks for your advice that having openSearcher set to true is
unnecessary for my case. As for the CorruptIndexException issue, I think Solr
should handle this quite well too, because I always shut down Tomcat
gracefully.
        Recently I did a couple of tests on this issue. When I keep
posting update requests to Solr and stop one of the three Tomcat nodes in a
single-shard cluster, it is easy to reproduce the CorruptIndexException, no
matter whether the stopped node is the leader or a replica. So I think this is
a bug in Solr. Any idea how I can avoid this issue? For example, could I
remove one node from ZooKeeper before stopping it? Also, please let me know
whether rebooting the Tomcat node is the only way to resolve the memory issue.
If I can control the field cache size, then a reboot is unnecessary.

Below is the trace from when I start Tomcat and first hit the
CorruptIndexException issue:
2017-09-19 10:18:57,614 ERROR [RecoveryThread][RQ-Init]
(SolrException.java:142) - SnapPull failed
:org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1565)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1677)
at
org.apache.solr.handler.SnapPuller.openNewSearcherAndUpdateCommitPoint(SnapPuller.java:673)
at
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:493)
at
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:337)
at
org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:163)
at
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:447)
at
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)
Caused by: org.apache.lucene.index.CorruptIndexException:
liveDocs.count()=10309577 info.docCount=15057819 info.getDelCount()=4748252
(filename=_4y65a_13g.del)
at
org.apache.lucene.codecs.lucene40.Lucene40LiveDocsFormat.readLiveDocs(Lucene40LiveDocsFormat.java:96)
at
org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:116)
at
org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:144)
at
org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:238)
at
org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:104)
at
org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:422)
at
org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:279)
at
org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:251)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1476)
... 7 more


Regards.
Geng, Wei 




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Cannot load LTRQParserPlugin into my core

2017-09-20 Thread alessandro.benedetti
Hi Billy,
there is a README.TXT in the contrib/ltr directory.
Reading that, you find this useful link [1].
From that useful link you see where the jar of the plugin is located.
Specifically:

<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-ltr-\d.*\.jar" />

Taking a look at the contrib and dist structure, it seems quite a standard
approach to keep the README in contrib (while in the source code the
contrib modules contain the plugins' code).
The Solr binaries are located in the dist directory.
External libraries are in contrib.


[1]
https://cwiki.apache.org/confluence/display/solr/Learning+To+Rank#LearningToRank-Installation
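Beyond loading the jar, the LTR components also need to be declared in
solrconfig.xml; per that installation page they look like this (treat the
cache sizes as tunables):

<queryParser name="ltr" class="org.apache.solr.ltr.search.LTRQParserPlugin"/>

<cache name="QUERY_DOC_FV"
       class="solr.search.LRUCache"
       size="4096"
       initialSize="2048"
       autowarmCount="4096"
       regenerator="solr.search.NoOpRegenerator" />

<transformer name="features"
             class="org.apache.solr.ltr.response.transform.LTRFeatureLoggerTransformerFactory">
  <str name="fvCacheName">QUERY_DOC_FV</str>
</transformer>

The techproducts example additionally starts Solr with -Dsolr.ltr.enabled=true
to switch these sections on.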



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html