Solr 8.3.0

2019-11-17 Thread vishal patel
I have created 2 shards of Solr 8.3.0. After that, I created 10 collections and
re-indexed the data.

Some fields have changed in one collection. I deleted the version-2 folder from
zoo_data and uploaded the config for that collection (upconfig).

Is it necessary to create all the collections again? Do I also need to index
the data again?
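A minimal sketch of the step that normally follows an upconfig, assuming ZooKeeper still holds the cluster state and only this one collection's configset changed: upload the edited config (for example with `bin/solr zk upconfig -n <configName> -d <configDir> -z <zkHost>`) and then RELOAD just that collection through the Collections API. The host, port and collection name below are placeholders. Collections with their own, unchanged configsets are not affected by this. Whether the changed collection needs re-indexing depends on the kind of field change; generally, changes to how a field is indexed or analyzed require it, because RELOAD does not rewrite existing documents.

```python
from urllib.parse import urlencode

# Hypothetical base URL: substitute your own Solr host and port.
BASE = "http://localhost:8983/solr/admin/collections"


def reload_url(collection, base=BASE):
    """Build a Collections API RELOAD request for one collection.

    RELOAD makes every replica of the collection re-read the configset that
    is currently stored in ZooKeeper. It does not re-index existing documents.
    """
    params = {"action": "RELOAD", "name": collection, "wt": "json"}
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    # "mycollection" is a placeholder for the collection whose config changed.
    print(reload_url("mycollection"))
```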

Regards,
Vishal


Solr 8.3.0

2019-11-17 Thread vishal patel

I have created 2 shards of Solr 8.3.0. We have created 27 collections using the
request below:
http://191.162.100.148:7971/solr/admin/collections?_=1573813004271&action=CREATE&autoAddReplicas=false&collection.configName=actionscomments&maxShardsPerNode=1&name=actionscomments&numShards=2&replicationFactor=1&router.name=compositeId&wt=json


After re-indexing the data, I want to add a replica for each shard. How can I
add a replica without re-creating the collection and re-indexing?
Can I add one more shard dynamically without re-creating the collections and
re-indexing?
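A minimal sketch (Python, standard library only) of the Collections API requests behind both questions, reusing the host and collection name from the CREATE request above. The shard names shard1 and shard2 are an assumption (they are the usual default names for a compositeId collection created with numShards=2); check them with action=CLUSTERSTATUS first. ADDREPLICA adds a replica to an existing shard, and the new replica copies the index from its shard leader. As far as I know, CREATESHARD only works for collections that use the implicit router, so for a compositeId collection like this one the usual way to get more shards is SPLITSHARD, which splits an existing shard's index into two sub-shards without re-indexing. The code only builds the URLs; it does not send them.

```python
from urllib.parse import urlencode

# Host, port and collection name are copied from the CREATE request above.
BASE = "http://191.162.100.148:7971/solr/admin/collections"
COLLECTION = "actionscomments"
# Assumed shard names; confirm them with action=CLUSTERSTATUS first.
SHARDS = ["shard1", "shard2"]


def collections_api_url(action, base=BASE, **params):
    """Build a Collections API URL; wt=json is appended for readable output."""
    query = {"action": action, **params, "wt": "json"}
    return base + "?" + urlencode(query)


def add_replica_urls(collection, shards):
    """One ADDREPLICA request per shard; each new replica copies the index
    from its shard leader, so nothing has to be re-indexed."""
    return [collections_api_url("ADDREPLICA", collection=collection, shard=s)
            for s in shards]


def split_shard_url(collection, shard):
    """SPLITSHARD divides one shard's hash range (and index) into two
    sub-shards; the parent shard becomes inactive once the split finishes."""
    return collections_api_url("SPLITSHARD", collection=collection, shard=shard)


if __name__ == "__main__":
    for url in add_replica_urls(COLLECTION, SHARDS):
        print(url)
    print(split_shard_url(COLLECTION, "shard1"))
```

After a split, the inactive parent shard still exists until it is deleted, so it is worth checking CLUSTERSTATUS before cleaning up.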


Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-17 Thread Paras Lehana
Hi Guilherme,

Have you tried reindexing the documents and comparing the results? No problem
if you can't do that; let's try something else. I was going through the
whole mail and your files. You had said:

> As soon as I add dbId or stId (regardless of the boost, 1.0 or 100.0), then I
> don't get anything (which makes sense).


Why do you think that getting no results when you add dbId makes sense?
I'm asking because I may be missing something here.

Also, what is the purpose of so many qfs? Going through your documents and
config files, I found that your dbIds are strings of numbers, and I don't
think you want to find your query terms in dbId, right?
Do you want to boost the score by the values in dbId?

Your qf of dbId^100 boosts the score of documents whose dbId contains the
terms in q by 100x. Since your terms don't match the dbId value of any
document, the score contributed by this field is 0, and 100x or 1x of 0 is
still 0. I still need to check how these scores get added up in the edismax
parser, but do re-evaluate your use of these qfs. The same goes for the other
qf boosts. :)
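To make the "0 times anything is still 0" point concrete, here is a simplified model of how a dismax-style query scores a single query term across the qf fields: the best boosted field score, plus tie times the other boosted field scores (the shape of Lucene's DisjunctionMaxQuery). The per-field scores are made-up numbers, and the model covers scoring only. It says nothing about whether a document matches in the first place, which is what numFound reflects.

```python
def dismax_term_score(field_scores, boosts, tie=0.0):
    """Simplified score of ONE query term across several qf fields.

    field_scores: raw per-field scores for one document (0.0 = no match there).
    boosts: qf boosts per field; fields without a boost default to 1.0.
    tie: the dismax 'tie' parameter (0.0 = only the best field counts).
    """
    boosted = [boosts.get(field, 1.0) * score for field, score in field_scores.items()]
    if not boosted:
        return 0.0
    best = max(boosted)
    # DisjunctionMaxQuery shape: best clause + tie * (sum of the other clauses)
    return best + tie * (sum(boosted) - best)


if __name__ == "__main__":
    # Made-up scores: the term matches in name but not in dbId.
    scores = {"name": 2.5, "dbId": 0.0}
    print(dismax_term_score(scores, {"name": 1.0, "dbId": 100.0}))  # 2.5
    print(dismax_term_score(scores, {"name": 1.0, "dbId": 1.0}))    # 2.5
```

Because the dbId contribution is 0 either way, changing its boost from 1.0 to 100.0 leaves the score unchanged, which matches the observation that the boost value made no difference.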


On Fri, 15 Nov 2019 at 12:23, Guilherme Viteri  wrote:

> Hi Paras
> No worries.
> No, I didn’t find anything. This is annoying now...
> Yes! They do contain dbId. Absolutely all my docs contain dbId, and it is
> actually my key, if you check the schema.xml again.
>
> Cheers
> Guilherme
>
> On 15 Nov 2019, at 05:37, Paras Lehana  wrote:
>
> 
> Hey Guilherme,
>
> I was a bit busy for the past few days and couldn't read your mail. So,
> did you find anything? Anyway, as I had expected, the culprit is
> definitely among the qfs. Do the documents in question contain dbId? I
> suggest you cross-check the fields in your documents against those
> affecting the result in qf.
>
> On Tue, 12 Nov 2019 at 16:14, Guilherme Viteri  wrote:
>
>> What I can't understand is: if I search for the exact term
>> "Immunoregulatory interactions between a Lymphoid *and a* non-Lymphoid
>> cell", it doesn't work, but if I search for "Immunoregulatory interactions
>> between a Lymphoid *and* non-Lymphoid cell", then it works.
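One thing worth checking with these two queries: a stop filter removes "a" and "and" from both, leaving the same terms, but in current Lucene it keeps the positions of the removed words, so the gaps between the remaining terms differ. That can matter for phrase matching (pf boosts or quoted queries). A minimal sketch, assuming a hypothetical stopword list of just "a" and "and", lowercasing, and plain whitespace tokenization (a real StandardTokenizer would also split "non-Lymphoid" into two tokens):

```python
STOPWORDS = {"a", "and"}  # hypothetical, deliberately tiny stopword list


def analyze(text, stopwords=STOPWORDS):
    """Lowercase, split on whitespace, drop stopwords but keep positions.

    Returns (position, term) pairs. Removed words still consume a position,
    mimicking the gaps a stop filter leaves behind.
    """
    return [(pos, word) for pos, word in enumerate(text.lower().split())
            if word not in stopwords]


q1 = "Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell"
q2 = "Immunoregulatory interactions between a Lymphoid and non-Lymphoid cell"

if __name__ == "__main__":
    for q in (q1, q2):
        tokens = analyze(q)
        print([term for _, term in tokens])  # same terms for both queries
        print([pos for pos, _ in tokens])    # but different positions
```

In this model "lymphoid" and "non-lymphoid" are three positions apart in the first query and two apart in the second, so a phrase built from one will not line up with a document indexed from the other unless slop allows it.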
>>
>> On 11 Nov 2019, at 12:24, Guilherme Viteri  wrote:
>>
>> Thanks
>>
>> Removing stopwords is another story. I'm curious to find the reason
>> assuming that you keep on using stopwords. In some cases, stopwords are
>> really necessary.
>>
>> Yes. It has always made sense the way we've been using them.
>>
>> If q.alt is giving you responses, it's confirmed that your stopwords filter
>> is working as expected. The problem definitely lies in the configuration of
>> edismax.
>>
>> I see.
>>
>> *Let me explain again:* In your solrconfig.xml, look at your /search
>>
>> OK, using q now, I removed all qf, performed the search, and got 23
>> results, with the one I really want at the top.
>> As soon as I add dbId or stId (regardless of the boost, 1.0 or 100.0),
>> I don't get anything (which makes sense). However, if I query name_exact, I
>> get the 23 results again; unfortunately, if I query stId^1.0
>> name_exact^10.0, I still don't get any results.
>>
>> In summary:
>> - without qf - 23 results
>> - dbId - 0 results
>> - name_exact - 16 results
>> - name - 23 results
>> - dbId^1.0 name_exact^10.0 - 0 results
>> - 0 results if any other field (stId, dbId (the key)) is added on top of
>> name (name_exact, etc.).
>>
>> Definitely lost here! :-/
>>
>>
>> On 11 Nov 2019, at 07:59, Paras Lehana 
>> wrote:
>>
>> Hi
>>
>> So I don't think removing it completely is the way to go, given the
>> scenario we have.
>>
>> Removing stopwords is another story. I'm curious to find the reason
>> assuming that you keep on using stopwords. In some cases, stopwords are
>> really necessary.
>>
>>
>> Quite a considerable increase
>>
>>
>> If q.alt is giving you responses, it's confirmed that your stopwords filter
>> is working as expected. The problem definitely lies in the configuration of
>> edismax.
>>
>>
>>
>> I am sorry, but I didn't understand what exactly you want me to do with
>> the lst (??) and the qf and bf.
>>
>>
>>
>> What combinations did you try? I was referring to the field-level boosting
>> you have applied in your edismax config.
>>
>> *Let me explain again:* In your solrconfig.xml, look at your /search
>> request handler. There are many qf and some bq boosts. I want you to remove
>> all of these, check the response again (with q now), and keep adding them
>> back one by one while watching for when numFound changes drastically.
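The remove-then-add-back procedure can be scripted. A minimal sketch, assuming a hypothetical collection URL (SEARCH_URL) and that qf is set in the /search handler's defaults section, where request parameters override it (they do not override invariants, and appends are added on top). The field list reuses names and boosts mentioned in this thread; the order is arbitrary. Each step uses rows=0, so the response carries only numFound. Any bq in the handler's defaults still applies unless it is overridden too.

```python
from urllib.parse import urlencode

# Hypothetical URL: replace host, collection and handler with your own.
SEARCH_URL = "http://localhost:8983/solr/mycollection/search"

# Fields and boosts taken from this thread; the order is an arbitrary choice.
QF_FIELDS = ["name", "name_exact^10.0", "stId^1.0", "dbId^100.0"]


def incremental_qf(fields):
    """Yield qf values that add one field at a time: f1, f1 f2, f1 f2 f3, ..."""
    for i in range(1, len(fields) + 1):
        yield " ".join(fields[:i])


def debug_urls(q, fields, base=SEARCH_URL):
    """One edismax request per qf step; rows=0 keeps only numFound."""
    urls = []
    for qf in incremental_qf(fields):
        params = {"q": q, "defType": "edismax", "qf": qf, "rows": 0, "wt": "json"}
        urls.append(base + "?" + urlencode(params))
    return urls


if __name__ == "__main__":
    query = "Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell"
    for url in debug_urls(query, QF_FIELDS):
        print(url)
```

Once a step shows numFound dropping, re-running just that step with debugQuery=true shows the parsed query, which is usually where the culprit field becomes visible.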
>>
>> On Fri, 8 Nov 2019 at 23:47, David Hastings
>> wrote:
>>
>> I use 3-word shingles with stopwords for my MLT ML trainer, which worked
>> pretty well for such a solution, but for a full index the size became
>> prohibitive.
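Counting shingles shows why they inflate an index. A minimal sketch of word shingles in the style of Lucene's ShingleFilter with a maximum shingle size of 3 and unigrams kept. It is an approximation: the real filter also handles position gaps, filler tokens and token separators. Every position emits up to three terms instead of one, and because most multi-word combinations are rare, the number of distinct terms tends to grow much faster still.

```python
def shingles(tokens, max_size=3, output_unigrams=True):
    """Word shingles in ShingleFilter-like order: at each start position,
    emit the unigram (if enabled), then shingles of size 2..max_size."""
    out = []
    min_size = 1 if output_unigrams else 2
    for i in range(len(tokens)):
        for n in range(min_size, max_size + 1):
            if i + n <= len(tokens):
                out.append(" ".join(tokens[i:i + n]))
    return out


if __name__ == "__main__":
    print(shingles("the cat sat".split()))
    # A 1,000-token document yields 1000 + 999 + 998 = 2997 terms instead of 1000.
    doc = ["w%d" % i for i in range(1000)]
    print(len(shingles(doc)))
```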
>>
>> On Fri, Nov 8, 2019 at 12:13 PM Walter Underwood 
>> wrote:
>>
>> If we had IDF for phrases, they would be super effective. The 2X weight is
>> a hack that mostly works.
>>
>> Infoseek had phrase IDF and it was a killer algorithm for relevance.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.o
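A small illustration of Walter's point about phrase IDF, under stated assumptions: if a phrase had its own document frequency, an IDF of the same shape as Lucene's BM25 IDF, log(1 + (N - df + 0.5) / (df + 0.5)), would reward a rare phrase far more than a common one, whereas a fixed 2X phrase weight treats both identically. The collection size and phrase document frequencies below are hypothetical. As far as I know, Lucene's phrase scoring combines the IDFs of the individual terms rather than computing a phrase-level document frequency.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """IDF with the same shape as Lucene's BM25Similarity:
    log(1 + (N - df + 0.5) / (df + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


if __name__ == "__main__":
    n_docs = 1_000_000  # hypothetical collection size
    # Hypothetical phrase document frequencies: one rare phrase, one common one.
    for phrase_df in (10, 100_000):
        print(phrase_df, round(bm25_idf(n_docs, phrase_df), 2))
    # A fixed 2X phrase weight would give both phrases the same multiplier.
```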

Use of TLog

2019-11-17 Thread Sripra deep
Hi Guys,

 I observed a scenario involving tlog creation and usage, and I couldn't
find any use for the tlog.

Solr version: 7.1.0
Number of shards = 3
Number of replicas = 1
I indexed about 10k docs into the collection.

 Scenario 1:
  Using the ADDREPLICA Collections API, I created one more replica (I tried
both NRT and TLOG types); neither replica pulled the tlog files. Only the
index files were pulled from the master.
  * If the tlog is not present on a replica, how will the replica regain its
index without tlog files after an ungraceful shutdown of the Solr server?
  * To verify this scenario, I killed the newly added replica's server with
the kill -9 command and started it back up; I also stopped the leader node.

 Questions:
  1) If tlog files are not used even in the case of an ungraceful shutdown,
where else are they used?
  2) Tlog files don't get copied to the newly added replica. Does that mean
adding a new replica to an already created collection with data in data/index
is not advisable?
  3) Is there a way to make the newly added slave node replicate the tlog
files from the leader, as it does for the data/index files?
  4) Is it possible to use the tlog files or index files from an existing
Solr server to spin up a new Solr cluster?
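For questions 1 and 2, a toy model of the general write-ahead-log idea may help frame the discussion. It is a simplified sketch, not Solr's UpdateLog: updates are appended to a log and buffered in memory, a hard commit makes them durable in the index and starts a fresh log, and after a crash only the entries written since the last hard commit are replayed. Real Solr also keeps recent tlog records for peer sync and real-time get, which this model ignores. In this model, a replica that copies an already-committed index has nothing to replay; whether Solr's recovery maps exactly onto that is not something the sketch establishes.

```python
class ToyUpdateLog:
    """Toy write-ahead log (a simplified model, NOT Solr's UpdateLog).

    - add(): append the update to the durable log and buffer it in memory
    - hard_commit(): persist the buffer to the index and start a fresh log
    - crash(): lose the in-memory buffer (the log and the index survive)
    - replay(): re-apply the log entries written since the last hard commit
    """

    def __init__(self):
        self.index = {}    # committed documents (durable)
        self.buffer = {}   # uncommitted documents (lost on crash)
        self.log = []      # updates since the last hard commit (durable)

    def add(self, doc_id, doc):
        self.log.append((doc_id, doc))
        self.buffer[doc_id] = doc

    def hard_commit(self):
        self.index.update(self.buffer)
        self.buffer.clear()
        self.log = []  # real Solr keeps recent records for peer sync; ignored here

    def crash(self):
        self.buffer.clear()

    def replay(self):
        for doc_id, doc in self.log:
            self.buffer[doc_id] = doc
        self.hard_commit()


if __name__ == "__main__":
    node = ToyUpdateLog()
    node.add("1", {"title": "committed"})
    node.hard_commit()
    node.add("2", {"title": "not yet committed"})
    node.crash()               # like kill -9 before the next hard commit
    print(sorted(node.index))  # only the committed doc is in the index
    node.replay()
    print(sorted(node.index))  # replaying the log brings doc 2 back

    # A new replica that copies the committed index starts with an empty log:
    replica = ToyUpdateLog()
    replica.index = dict(node.index)
    print(replica.log)         # nothing to replay
```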


It would be very helpful for me to understand the core workings of the Solr
server.

Thanks,
Sripradeep P