[ https://issues.apache.org/jira/browse/SOLR-13167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Bachmann updated SOLR-13167:
----------------------------------
    Description: 
i have a product search hosted on a solr cloud with 2 shards running on two 
ec2 instances, with the following setup: 

a product has an unlimited number of children, which are small objects with 
shop information. these child documents define the shops where the product is 
available. my requirement is to update / sync the whole documents (parent and 
children) at least once a day. the availability information is stored in the 
child documents in a quantity field.
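
for illustration, one synced product could look roughly like the sketch below. 
the field names nodeType, shopId and quantity are the ones used in this report; 
the "product" node type, the id scheme and the helper build_product_doc are 
made up for the example. _childDocuments_ is the key that solr's json update 
format uses for nested documents.

```python
import json


def build_product_doc(product_id, shops):
    """Build one parent document with its shop children nested one level deep.

    `shops` is a list of (shop_id, quantity) pairs. The child id scheme
    (product id + shop id) is an assumption, not taken from the report.
    """
    children = [
        {
            "id": f"{product_id}_shop_{shop_id}",
            "nodeType": "shop",
            "shopId": shop_id,
            "quantity": quantity,
        }
        for shop_id, quantity in shops
    ]
    return {
        "id": product_id,
        "nodeType": "product",  # assumed marker for parent documents
        "_childDocuments_": children,
    }


doc = build_product_doc("p1", [(10, 3), (11, 0)])
# JSON body that a daily sync could send to the collection's update handler.
payload = json.dumps([doc], indent=2)
print(payload)
```

the point of the sketch is only that the expected shape is parent > child, 
with no children nested below a shop document.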

problem:
 # after every sync the number of child documents (shops) increases and they 
nest deeper, because the quantity changes and the child documents are 
apparently not updated by id but newly created with the same id 
(document duplicates, comparable to SOLR-5211, SOLR-6096, SOLR-12638). 
 # whenever i sync the products and their children with one level of depth 
(parent > child) i get parent > child > child > child > ... depending on how 
many children there are (see screenshot-4.png). these children also can't be 
retrieved with nodeType:shop.
 # whenever i request the products (parents) by a child attribute (shopId) 
the search is nondeterministic and does not return the correct products. many 
products contain children that were never assigned to them. some products are 
flooded with a huge number of children (>1000) although only about 10 are 
assigned to them. as screenshot-1 to screenshot-3 show, three exactly 
identical queries return different products. screenshot-1 with 26241 results 
has the correct count and the correct data, but the other two are completely 
wrong. 
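
a request for parents by a child attribute, as in point 3, can be sketched 
with solr's block join parent query parser. this is a minimal sketch, not the 
exact query from the screenshots: the helper parents_by_shop and the parent 
filter nodeType:product are assumptions. the which filter has to match all 
parent documents and no child documents.

```python
from urllib.parse import urlencode


def parents_by_shop(shop_id, parent_filter="nodeType:product"):
    """Build request parameters that return parents having a matching shop child.

    `parent_filter` is an assumed filter that identifies parent documents;
    nodeType and shopId are the field names used in this report.
    """
    child_query = f"+nodeType:shop +shopId:{shop_id}"
    return {
        # {!parent which=...} maps matching children to their parent document.
        "q": f'{{!parent which="{parent_filter}"}}{child_query}',
        "fl": "id",
    }


params = parents_by_shop(10)
# URL-encoded query string for a request to the select handler.
print(urlencode(params))
```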

i would really appreciate any workaround or help with these issues. this is a 
huge problem and my business depends on this (!):(

 

  was:
i have a product search hosted on a solr cloud with 2 shards running on two 
ec2 instances, with the following setup: 

a product has an unlimited number of children, which are small objects with 
shop information. these child documents define the shops where the product is 
available. my requirement is to update / sync the whole documents (parent and 
children) at least once a day. the availability information is stored in the 
child documents in a quantity field.

problem:
 # after every sync the number of child documents (shops) increases and they 
nest deeper, because the quantity changes and the child documents are 
apparently not updated by id but newly created with the same id (duplicates, 
comparable to SOLR-5211, SOLR-6096, SOLR-12638). 
 # whenever i sync the products and their children with one level of depth 
(parent > child) i get parent > child > child > child > ... depending on how 
many children there are (see screenshot-4.png). these children also can't be 
retrieved with nodeType:shop.
 # whenever i request the products (parents) by a child attribute (shopId) 
the search is nondeterministic and does not return the correct products. many 
products contain children that were never assigned to them. some products are 
flooded with a huge number of children (>1000) although only about 10 are 
assigned to them. as screenshot-1 to screenshot-3 show, three exactly 
identical queries return different products. screenshot-1 with 26241 results 
has the correct count and the correct data, but the other two are completely 
wrong. 

i would really appreciate any workaround or help with these issues. this is a 
huge problem and my business depends on this (!):(

 


> Duplicate Child Documents and undeterministic search
> ----------------------------------------------------
>
>                 Key: SOLR-13167
>                 URL: https://issues.apache.org/jira/browse/SOLR-13167
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: search, SolrCloud
>    Affects Versions: 7.5
>         Environment: SOLR 7.5 running on AWS EC2 instances with an AMI OS, 
> split into two shards running on two different EC2 instances with the 
> built-in ZooKeeper of SOLR
>            Reporter: Kevin Bachmann
>            Priority: Major
>         Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
