Re: Insert documents to a particular shard
Thanks Jörn for your suggestions. It was a sample schema; each document_type will have more fields.

1) Yes, I have explored graph traversal (gatherNodes) using streaming expressions, but we found a few issues. For example, to get the parent doc based on a grandchild doc filter:

    {!graph from=parentId to=parentId traversalFilter='document_type:parent' returnRoot=false}(name:David AND document_type:grandchild)

This graph query parser request returns all the fields of the parent doc. With gatherNodes I can gather only a single field of the parent doc, and then I have to run a second query to get all the fields. We are also looking for pagination, which streaming expressions do not support.

2) I tried document routing with the implicit router and it might work for us, but I have to explore more on what happens when we split the shards. For example:

    curl 'localhost:8983/solr/admin/collections?action=CREATE&name=family&router.name=implicit&router.field=rfield&collection.configName=base-config&shards=shard1,shard2&maxShardsPerNode=2&numShards=1&replicationFactor=2'

- When inserting the parent doc, I can randomly pick one of the shards (shard1 or shard2) for the rfield.
- When inserting any child doc or grandchild doc, I use the parent doc's rfield to keep them in the same shard.

Regards
sam

On Tue, Jun 2, 2020 at 10:35 PM Jörn Franke wrote:
> Hint: you can easily try out streaming expressions in the admin UI
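The two routing rules in point 2 of sam's reply can be sketched as follows. This is only an illustration of how the rfield value might be assigned before indexing, assuming the collection was created with the implicit router and router.field=rfield as in the curl command; the helper names route_parent and route_descendant are hypothetical and nothing here talks to Solr.

```python
import random

# Shard names taken from the CREATE command in the reply above.
SHARDS = ["shard1", "shard2"]


def route_parent(doc, rng=random):
    """Return a copy of a parent doc with a randomly chosen shard name in rfield."""
    routed = dict(doc)
    routed["rfield"] = rng.choice(SHARDS)
    return routed


def route_descendant(doc, parent):
    """Return a copy of a child or grandchild doc that reuses the parent's rfield."""
    routed = dict(doc)
    routed["rfield"] = parent["rfield"]
    return routed


parent = route_parent({"Id": "1", "document_type": "parent", "name": "John"})
child = route_descendant(
    {"Id": "2", "document_type": "child", "parentId": "1", "name": "Rodney"},
    parent,
)
grandchild = route_descendant(
    {"Id": "4", "document_type": "grandchild", "parentId": "1", "name": "David"},
    parent,
)
```

A possible variation, not something proposed in the thread: deriving rfield deterministically from the parent's Id (for example with a hash) would let a writer compute the shard for a child from its parentId alone, without fetching the parent first.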
Re: Insert documents to a particular shard
Hint: you can easily try out streaming expressions in the admin UI

On 03.06.2020 at 07:32, Jörn Franke wrote:
> You are trying to achieve data locality by having parents and children in the
> same shard?
> Does document routing address it?
Re: Insert documents to a particular shard
You are trying to achieve data locality by having parents and children in the same shard?
Does document routing address it?

https://lucene.apache.org/solr/guide/8_5/shards-and-indexing-data-in-solrcloud.html#document-routing

On a side note, I don't know your complete use case, but have you explored streaming expressions for graph traversal?

https://lucene.apache.org/solr/guide/8_5/graph-traversal.html

On 03.06.2020 at 00:37, sambasivarao giddaluri wrote:
> I am running solr in cloud mode in local with 2 shards and 2 replica on
> port 8983 and 7574 and figuring out how to insert document in to a
> particular shard
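As a rough illustration of the document routing Jörn points to: with Solr's default compositeId router, documents whose unique key shares the same prefix before a "!" are hashed to the same shard. The sketch below only builds such keys for the sample family; the function name composite_id and the choice of the parent's Id as the prefix are assumptions for illustration, and fields that reference other documents by Id (such as parentId) would need to be kept consistent with whatever key scheme is chosen.

```python
def composite_id(route_key, doc_id):
    """Build a compositeId-style key: '<route_key>!<doc_id>'."""
    return f"{route_key}!{doc_id}"


family = [
    {"Id": "1", "document_type": "parent", "name": "John"},
    {"Id": "2", "document_type": "child", "parentId": "1", "name": "Rodney"},
    {"Id": "3", "document_type": "child", "parentId": "1", "name": "George"},
    {"Id": "4", "document_type": "grandchild", "parentId": "1", "name": "David"},
]

routed = []
for doc in family:
    # A parent routes on its own Id; descendants route on their parentId,
    # so the whole family shares one prefix.
    route_key = doc.get("parentId", doc["Id"])
    routed.append({**doc, "Id": composite_id(route_key, doc["Id"])})

routed_ids = [doc["Id"] for doc in routed]
```

Unlike the implicit router, this approach lets Solr pick the shard from the hash of the prefix, so the writer never names a shard explicitly.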
Insert documents to a particular shard
Hi All,

I am running Solr in cloud mode locally with 2 shards and 2 replicas on ports 8983 and 7574, and I am trying to figure out how to insert a document into a particular shard. I read about implicit and composite routing, but I don't think it will work for my use case.

shard1: http://192.168.0.112:8983/family_shard1_replica_n1
        http://192.168.0.112:7574/family_shard1_replica_n2

shard2: http://192.168.0.112:8983/family_shard2_replica_n3
        http://192.168.0.112:7574/family_shard2_replica_n4

We have documents with a parent-child relationship, flattened out two levels down and referencing each other.

Family schema documents:

    {
      "Id": "1",
      "document_type": "parent",
      "name": "John"
    }
    {
      "Id": "2",
      "document_type": "child",
      "parentId": "1",
      "name": "Rodney"
    }
    {
      "Id": "3",
      "document_type": "child",
      "parentId": "1",
      "name": "George"
    }
    {
      "Id": "4",
      "document_type": "grandchild",
      "parentId": "1",
      "childIdId": "2",
      "name": "David"
    }

We have complex queries that fetch data with the graph query parser, and the graph query parser does not work on SolrCloud with multiple shards. I was trying to develop logic so that whenever a document is inserted or updated, it is saved in the same shard where the parent doc is stored. That way the graph query works, because all the family information will be in the same shard.

Approach:
1) If a new child/grandchild is being inserted, get the parent doc's shard details, add them to the document in a field (e.g. parentshard), and save the doc in that shard.
2) If a document is being updated, check whether the parentshard field exists; if so, update the doc in the same shard.

But all these checks will increase response time. Currently our development is done in cloud mode with a single shard, using SolrJ to save the data.

Also, I am unable to figure out the query to update a doc in a particular shard.

Any suggestions will help.

Thanks in advance
sam
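On the last question, one option worth checking against the Solr reference guide is the _route_ request parameter, which, for a collection created with the implicit router, names the shard an update should go to. The sketch below only builds such an update request with the Python standard library and does not send it; the host, the collection name family, the shard name and the function name build_update_request are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request


def build_update_request(base_url, collection, shard, docs):
    """Build (but do not send) a JSON update request routed to one named shard."""
    params = urllib.parse.urlencode({"_route_": shard, "commit": "true"})
    url = f"{base_url}/{collection}/update?{params}"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_update_request(
    "http://localhost:8983/solr",
    "family",
    "shard1",
    [{"Id": "2", "document_type": "child", "parentId": "1", "name": "Rodney"}],
)
# urllib.request.urlopen(req) would send the request; it is left out here
# so the sketch runs without a Solr node.
```

With the default compositeId router the shard is derived from a hash of the document key instead, so naming a shard this way is only expected to apply to implicit-router collections.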