Re: support Rich Document

2021-02-10 Thread Jörn Franke
You can store them on the filesystem and store a link to them in Solr. Your
search application can then fetch them from the filesystem and serve them to
the users.

Alternatively, serve them via WebDAV, SharePoint or whatever your organization
uses as its standard.

It does not make sense to store them in Solr: they would just blow up the
index without adding any value.
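A minimal Python sketch of this approach, assuming hypothetical field names
content_txt (the extracted text) and file_path_s (the original's location
relative to a hypothetical DOC_ROOT directory). Solr holds only the searchable
text and the pointer; the application maps a hit back to the file it serves:

```python
import json
from pathlib import Path

# Hypothetical directory (or mounted share) holding the original files.
DOC_ROOT = Path("/srv/documents")


def build_solr_doc(doc_id: str, extracted_text: str, relative_path: str) -> dict:
    """Solr document with the extracted text plus a pointer to the original.

    The field names are placeholders; adapt them to your schema.
    """
    return {
        "id": doc_id,
        "content_txt": extracted_text,  # searchable text from Tika/OCR
        "file_path_s": relative_path,   # where the original file lives
    }


def resolve_original(hit: dict) -> Path:
    """Map a search hit back to the original file the application should serve."""
    path = (DOC_ROOT / hit["file_path_s"]).resolve()
    # Reject stored paths that escape DOC_ROOT, such as "../../etc/passwd".
    if DOC_ROOT.resolve() not in path.parents:
        raise ValueError(f"path outside document root: {path}")
    return path


doc = build_solr_doc("q1-report", "quarterly figures ...", "reports/q1.pdf")
print(json.dumps([doc]))  # request body for POST /solr/<collection>/update
print(resolve_original(doc))
```

Serving the originals over WebDAV or SharePoint instead would only change what
file_path_s points to.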

> On 11.02.2021 at 05:08, Luke wrote:
> 
> Hi,
> 
> I know Solr can index rich documents, but I have one requirement.
> 
> I have all kinds of documents, such as Word, PDF, Excel, PPT, JPG, etc.
> 
> When Solr indexes them with Tika or OCR, it extracts the text and saves it
> to Solr, but the formatting is lost, so when the user opens the document, it
> is not readable.
> 
> My question is whether Solr can keep the original documents somewhere, such
> as in an external field, so that when I load documents, the original
> document can be retrieved too.
> 
> Thanks


Down Replica is elected as Leader (solr v8.7.0)

2021-02-10 Thread mmb1234
Hello,

When one of the Solr nodes in the cluster reboots, we often see a
collection's shards with:
1. a LEADER replica in DOWN state, and/or
2. no LEADER at all.

The output from /solr/admin/collections?action=CLUSTERSTATUS is below.

Even after 5 to 10 minutes, the collection often does not recover. It is
unclear why this is happening and what we can do to prevent or remedy it.

PS: perReplicaState=true in Solr v8.8.0 didn't work well, because after a
rebalance all replicas somehow got a "leader":"true" status even though
state.json looked OK.

{
  "responseHeader": {
    "status": 0,
    "QTime": 2
  },
  "cluster": {
    "collections": {
      "datacore": {
        "pullReplicas": "0",
        "replicationFactor": "0",
        "shards": {
          "__": {
            "range": null,
            "state": "active",
            "replicas": {
              "core_node1": {
                "core": "datacore____replica_t187",
                "base_url": "http://solr-0.solr-headless:8983/solr",
                "node_name": "solr-0.solr-headless:8983_solr",
                "state": "down",
                "type": "TLOG",
                "force_set_state": "false",
                "property.preferredleader": "true",
                "leader": "true"
              },
              "core_node2": {
                "core": "datacore____replica_t188",
                "base_url": "http://solr-1.solr-headless:8983/solr",
                "node_name": "solr-1.solr-headless:8983_solr",
                "state": "active",
                "type": "TLOG",
                "force_set_state": "false"
              },
              "core_node3": {
                "core": "datacore____replica_t189",
                "base_url": "http://solr-2.solr-headless:8983/solr",
                "node_name": "solr-2.solr-headless:8983_solr",
                "state": "active",
                "type": "TLOG",
                "force_set_state": "false"
              }
            }
          },
          "__j": {
            "range": null,
            "state": "active",
            "replicas": {
              "core_node19": {
                "core": "datacore___j_replica_t187",
                "base_url": "http://solr-0.solr-headless:8983/solr",
                "node_name": "solr-0.solr-headless:8983_solr",
                "state": "down",
                "type": "TLOG",
                "force_set_state": "false",
                "property.preferredleader": "true"
              },
              "core_node20": {
                "core": "datacore___j_replica_t188",
                "base_url": "http://solr-1.solr-headless:8983/solr",
                "node_name": "solr-1.solr-headless:8983_solr",
                "state": "active",
                "type": "TLOG",
                "force_set_state": "false"
              },
              "core_node21": {
                "core": "datacore___j_replica_t189",
                "base_url": "http://solr-2.solr-headless:8983/solr",
                "node_name": "solr-2.solr-headless:8983_solr",
                "state": "active",
                "type": "TLOG",
                "force_set_state": "false"
              }
            }
          },
          "__": {
            "range": null,
            "state": "active",
            "replicas": {
              "core_node4": {
                "core": "datacore____replica_t91",
                "base_url": "http://solr-0...
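For triage, a small Python sketch that walks a CLUSTERSTATUS response and flags
the two symptoms listed above (a leader that is not active, no leader at all),
plus shards reporting more than one leader, as in the perReplicaState note. It
assumes, as in the output above, that "leader" is reported as the string "true"
and only on leader replicas. It only diagnoses; it does not repair anything:

```python
import json
import urllib.request


def fetch_cluster_status(base_url: str) -> dict:
    """Fetch CLUSTERSTATUS from one node, e.g. "http://solr-1.solr-headless:8983/solr"."""
    url = f"{base_url}/admin/collections?action=CLUSTERSTATUS&wt=json"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def leader_problems(status: dict) -> list:
    """Return (collection, shard, problem) tuples for unhealthy shard leadership.

    A replica counts as leader only if it carries "leader": "true",
    which is how the CLUSTERSTATUS output above reports it.
    """
    problems = []
    for coll_name, coll in status["cluster"]["collections"].items():
        for shard_name, shard in coll.get("shards", {}).items():
            replicas = shard.get("replicas", {}).values()
            leaders = [r for r in replicas if r.get("leader") == "true"]
            if not leaders:
                problems.append((coll_name, shard_name, "no leader"))
            elif len(leaders) > 1:
                problems.append((coll_name, shard_name, "multiple leaders"))
            elif leaders[0].get("state") != "active":
                state = leaders[0].get("state")
                problems.append((coll_name, shard_name, f"leader is {state}"))
    return problems
```

Usage would be, for example, leader_problems(fetch_cluster_status(...)) polled
after a reboot, to see which shards never recover within the 5 to 10 minutes.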



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


support Rich Document

2021-02-10 Thread Luke
Hi,

I know Solr can index rich documents, but I have one requirement.

I have all kinds of documents, such as Word, PDF, Excel, PPT, JPG, etc.

When Solr indexes them with Tika or OCR, it extracts the text and saves it to
Solr, but the formatting is lost, so when the user opens the document, it
is not readable.

My question is whether Solr can keep the original documents somewhere, such as
in an external field, so that when I load documents, the original document can
be retrieved too.

Thanks


Re: UPDATE collection's Rule-based Replica Placement

2021-02-10 Thread Ilan Ginzburg
Are you looking for something that would move existing collection replicas
to comply with a new set of rules?
I'm afraid that doesn't exist, but you can use the Collections API to
move replicas "manually".
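One Collections API action for this is MOVEREPLICA. The Python sketch below
only builds the request URL; the base URL, collection, replica and node names
are placeholders, and deciding which replicas violate the new rules is still up
to you:

```python
from urllib.parse import urlencode


def move_replica_url(base_url, collection, replica, target_node, async_id=None):
    """Build a Collections API MOVEREPLICA request URL.

    base_url     e.g. "http://localhost:8983/solr" (placeholder)
    replica      replica name as shown in CLUSTERSTATUS, e.g. "core_node3"
    target_node  node name, e.g. "host:8983_solr"
    async_id     optional request id so the move runs asynchronously
    """
    params = {
        "action": "MOVEREPLICA",
        "collection": collection,
        "replica": replica,
        "targetNode": target_node,
    }
    if async_id is not None:
        params["async"] = async_id
    return f"{base_url}/admin/collections?{urlencode(params)}"


print(move_replica_url("http://localhost:8983/solr", "mycoll", "core_node3",
                       "node-b:8983_solr", async_id="move-1"))
```

Sending the URL (e.g. with curl) performs the move; with async set, progress can
be checked with the REQUESTSTATUS action.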

Ilan

On Tue, Feb 9, 2021 at 1:10 PM mosheB wrote:
>
> Hi community,
> Using Solr 8.3, is there any way to change the replica placement of a
> "running" collection, say "from this point forward", or should I recreate
> the collection and migrate all my data from the existing collection to the
> new one?
> I tried the COLLECTIONPROP action, which doesn't do the job: it just updates
> the collectionprops.json file and does not actually affect replica placement
> enforcement.
>
> Thanks!


Collection Creation across DC

2021-02-10 Thread Revas
Hello,

Can we create a collection that spans data centers (i.e., a shard replica is
in a different data center) for HA?

Thanks
Revas


Index rich document and view

2021-02-10 Thread Luke Oak
Hi,

I have all kinds of rich documents, such as Excel, PPT, PDF, Word, JPG, etc. I
know Tika or OCR can convert them to text and index it. But when I open the
document, the format has changed. How can I keep the original document format?
Is that possible in Solr?

If not, can I use an external field type to save the original file and load it
when I want to view the document?

Thanks 


Without custom updateRequestProcessorChain: DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain

2021-02-10 Thread diego_70
Hello,

We are using SolrCloud 8.5.

Several times per hour we see this kind of error in the logs:

RunUpdateProcessor has received an AddUpdateCommand containing a document
that appears to still contain Atomic document update operations, most likely
because DistributedUpdateProcessorFactory was explicitly disabled from this
updateRequestProcessorChain.

As far as I understand, this error is related to a customized
updateRequestProcessorChain. But our main concern is that we are not using
those features: we have not defined any new updateRequestProcessorChain. We
use the default one, default-update-request-processor-chain.

I'm not able to reproduce the issue in a test environment; the same update
works fine there.

A typical failing update contains several atomic updates following the
structure "field": {"set": "value"}, "field1": {"set": "value1"}, etc.
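To make that structure concrete, a Python sketch that builds such an
atomic-update request body and lists which fields carry atomic modifiers. It
assumes the uniqueKey field is named id; field and field1 are the placeholder
names from the message, and ATOMIC_OPS is the modifier set documented in the
Solr Reference Guide. Since the error says the document reaching
RunUpdateProcessor still contained these modifier maps, a helper like
atomic_fields could be used to log which fields of a failing request carry them:

```python
import json

# Atomic-update modifiers documented for Solr's update handlers.
ATOMIC_OPS = {"set", "add", "add-distinct", "remove", "removeregex", "inc"}


def atomic_update(doc_id, **changes):
    """Build a document that replaces each given field via the "set" modifier."""
    doc = {"id": doc_id}  # assumes the uniqueKey field is named "id"
    for field, value in changes.items():
        doc[field] = {"set": value}
    return doc


def atomic_fields(doc):
    """Return the names of fields whose value is an atomic-update map."""
    return sorted(
        name for name, value in doc.items()
        if isinstance(value, dict) and value.keys() & ATOMIC_OPS
    )


payload = [atomic_update("doc-1", field="value", field1="value1")]
print(json.dumps(payload))  # request body for POST /solr/<collection>/update
```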

Do you have any idea what the root cause could be? Maybe performance issues,
too much load, or a problem with the tlog?

Thanks in advance


Diego
--
Senior Software Engineer
Telefónica Cybersecurity & Cloud Tech


