Re: How to specify custom update chain in a SolrJ request

2019-01-29 Thread Chris Wareham

Answering myself, the solution is to update my code as follows:

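UpdateRequest request = new UpdateRequest();
// Route this request through the custom update chain.
request.setParam("update.chain", "skipexisting");

for (Map.Entry<?, ?> user : users.entrySet()) {
    SolrInputDocument document = new SolrInputDocument();
    document.addField("id", user.getKey().toString());
    // Atomic update: "set" replaces the current value of the field.
    document.addField("applications", Collections.singletonMap("set", user.getValue()));
    request.add(document);
}

// Send the batch once; calling process() inside the loop would resend
// every document added to the request so far.
request.process(solrClient);
solrClient.commit();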

On 29/01/2019 16:27, Chris Wareham wrote:
I'm trying to update records in my Solr core, and have configured a 
custom update chain that skips updates to records that don't exist:

[update chain XML definition stripped by the list archive; only two options set to true survive]

My SolrJ update code is currently:

for (Map.Entry<?, ?> user : users.entrySet()) {
    SolrInputDocument document = new SolrInputDocument();
    document.addField("id", user.getKey().toString());
    document.addField("applications", Collections.singletonMap("set", user.getValue()));

    solrClient.add(document);
}

solrClient.commit();

I can't seem to specify the update chain to use and I assume I need to 
use the UpdateRequest class. However, it's not clear how I go about 
setting a parameter on the UpdateRequest in order to specify the update 
chain.


Chris


How to specify custom update chain in a SolrJ request

2019-01-29 Thread Chris Wareham
I'm trying to update records in my Solr core, and have configured a 
custom update chain that skips updates to records that don't exist:

[update chain XML definition stripped by the list archive; only two options set to true survive]

My SolrJ update code is currently:

for (Map.Entry<?, ?> user : users.entrySet()) {
    SolrInputDocument document = new SolrInputDocument();
    document.addField("id", user.getKey().toString());
    document.addField("applications", Collections.singletonMap("set", user.getValue()));

    solrClient.add(document);
}

solrClient.commit();

I can't seem to specify the update chain to use and I assume I need to 
use the UpdateRequest class. However, it's not clear how I go about 
setting a parameter on the UpdateRequest in order to specify the update 
chain.


Chris


Re: PatternReplaceFilterFactory problem

2019-01-29 Thread Chris Wareham
Thanks for the help - changing the field type of the destination for the 
copy fields to "text_en" solved the problem. I'd foolishly assumed that 
the analysis of the source fields was applied first and the resulting 
tokens were then passed to the copy field, which doesn't really make 
sense now that I think about it!


So the indexing process is:

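+-----------+     +----------------+     +-------------+
|companyName|     |  companyName   |     | companyName |
|input data |---->|text_en analysis|---->|    index    |
+-----------+     +----------------+     +-------------+
      |
      |           +----------------+     +-------------+
      +---------->|      text      |---->|    text     |
                  |text_en analysis|     |    index    |
                  +----------------+     +-------------+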

Rather than:

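+-----------+     +----------------+          +-------------+
|companyName|     |  companyName   |          | companyName |
|input data |---->|text_en analysis|----+---->|    index    |
+-----------+     +----------------+    |     +-------------+
                                        |
                             +---------------------+     +-------------+
                             |        text         |---->|    text     |
                             |text_general analysis|     |    index    |
                             +---------------------+     +-------------+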
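
The difference between the two diagrams can be illustrated with a toy model in plain Java. This is only a sketch under loud assumptions: the two "analyzers" below are hypothetical stand-ins (lower-casing and whitespace splitting, plus the ".com" pattern replace for text_en), not Solr's real analysis chains, and all class and method names are invented for illustration. The point it shows is that a copy field receives the raw source value and is analyzed by the destination's own field type.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CopyFieldSketch {
    // Hypothetical stand-in for text_general: lower-case, then split on whitespace.
    static final Function<String, List<String>> TEXT_GENERAL = raw ->
        Arrays.stream(raw.toLowerCase(Locale.ROOT).trim().split("\\s+"))
              .collect(Collectors.toList());

    // Hypothetical stand-in for text_en: as above, plus the ".com" pattern replace.
    static final Function<String, List<String>> TEXT_EN = raw ->
        TEXT_GENERAL.apply(raw).stream()
              .map(token -> token.replaceAll("([-a-z])\\.com", "$1"))
              .collect(Collectors.toList());

    // The copy field gets the raw source value and is analyzed by the
    // destination field's type; the source field's tokens are not reused.
    static List<String> indexCopy(String rawSourceValue,
                                  Function<String, List<String>> destinationAnalysis) {
        return destinationAnalysis.apply(rawSourceValue);
    }

    public static void main(String[] args) {
        String companyName = "Mydomain.com Ltd";
        // Destination typed as text_general: ".com" survives in the indexed token.
        System.out.println(indexCopy(companyName, TEXT_GENERAL));
        // Destination typed as text_en: ".com" is stripped from the indexed token.
        System.out.println(indexCopy(companyName, TEXT_EN));
    }
}
```

In this model a search for "mydomain" can only match the copy when the destination is analyzed as text_en, which mirrors the fix described above.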


On 28/01/2019 12:37, Scott Stults wrote:

Hi Chris,

You've included the field definition of type text_en, but in your queries
you're searching the field "text", which is of type text_general. That may
be the source of your problem, but if looking into that doesn't help send
the definition of text_general as well.

Hope that helps!

-Scott

On Mon, Jan 28, 2019 at 6:02 AM Chris Wareham <
chris.ware...@graduate-jobs.com> wrote:


I'm trying to index some data which often includes domain names. I'd
like to remove the .com TLD, so I have modified the text_en field type
by adding a PatternReplaceFilterFactory filter. However, it doesn't
appear to be working as a search for "text:(mydomain.com)" matches
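records but "text:(mydomain)" does not.

[text_en field type definition; XML stripped by the list archive]

The actual field definitions are as follows:

[field definitions; XML stripped by the list archive]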


PatternReplaceFilterFactory problem

2019-01-28 Thread Chris Wareham
I'm trying to index some data which often includes domain names. I'd 
like to remove the .com TLD, so I have modified the text_en field type 
by adding a PatternReplaceFilterFactory filter. However, it doesn't 
appear to be working as a search for "text:(mydomain.com)" matches 
records but "text:(mydomain)" does not.


[text_en field type definition; element names stripped by the list archive. Surviving attributes, the last four lines appearing twice in the original, once per block:]

  positionIncrementGap="100"
  ignoreCase="true" synonyms="synonyms.txt"
  ignoreCase="true"
  pattern="([-a-z])\.com" replacement="$1"
  protected="protwords.txt"

The actual field definitions are as follows:

[field definitions; element names stripped by the list archive. Surviving attributes:]

  stored="true"  required="true"
  stored="true"  required="true"
  stored="false"
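
As a quick sanity check of the regular expression on its own, the following minimal Java sketch applies the same pattern and replacement with java.util.regex, the regex engine that PatternReplaceFilterFactory is built on. The class and method names are hypothetical, and the sketch ignores tokenization and the rest of the analysis chain; it only shows what the pattern does to a single lower-cased token.

```java
import java.util.regex.Pattern;

public class DotComPatternSketch {
    // Same pattern as in the filter definition: one letter or hyphen, then ".com".
    static final Pattern DOT_COM = Pattern.compile("([-a-z])\\.com");

    // Replace every match with the captured character, dropping ".com".
    static String stripDotCom(String token) {
        return DOT_COM.matcher(token).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(stripDotCom("mydomain.com"));      // mydomain
        System.out.println(stripDotCom("my-domain.com"));     // my-domain
        // The pattern is unanchored, so it also fires in the middle of a token.
        System.out.println(stripDotCom("example.community")); // examplemunity
    }
}
```

The pattern does strip the suffix from a token such as "mydomain.com", so the regex itself is unlikely to be the cause of the failed match. Because it is unanchored, appending `$` to the pattern would be one way to restrict it to a trailing ".com".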


  
  


Re: indexed and stored for fields that are sources of a copy field

2018-10-22 Thread Chris Wareham

Hi Emir,

Many thanks for the confirmation. I'd kind of inferred this was correct
from the paragraph starting with "Copying is done at the stream source
level", but it would be good to mention it in the "Copying Fields"
section of the Solr documentation. Should I create a JIRA issue asking
for this?

Regards,

Chris

On 22/10/2018 14:28, Emir Arnautović wrote:

Hi Chris,
Yes, you can do that. There is also type="ignored" that you can use in such 
a scenario.

HTH,
Emir




On 22 Oct 2018, at 15:22, Chris Wareham wrote:

Hi folks,

I have a number of fields defined in my managed-schema file that are used as 
the sources for a copy field:

[field and copy field definitions; XML stripped by the list archive]

Can I set both the indexed and stored values to false for the body, sectors and 
locations fields since I don't want to search or retrieve them?

Regards,

Chris




indexed and stored for fields that are sources of a copy field

2018-10-22 Thread Chris Wareham

Hi folks,

I have a number of fields defined in my managed-schema file that are 
used as the sources for a copy field:


[field definitions; element names stripped by the list archive. Surviving attributes:]

  stored="true"
  stored="true"  multiValued="true"
  stored="true"  multiValued="true"

  stored="false" multiValued="true"

Can I set both the indexed and stored values to false for the body, 
sectors and locations fields since I don't want to search or retrieve them?


Regards,

Chris