Re: Index Deeply Nested documents and retrieve a full nested document in solr

2020-09-24 Thread Alexandre Rafalovitch
The answer is yes to both questions, but for historical reasons I am
not sure the two features play well together.

For storing/parsing original JSON in any (custom) format:
https://lucene.apache.org/solr/guide/8_6/transforming-and-indexing-custom-json.html
(srcField parameter)
For indexing nested children (with named collections of subdocuments)
but in Solr's own JSON format:
https://lucene.apache.org/solr/guide/8_6/indexing-nested-documents.html
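
As a rough sketch of the second approach: in Solr's own JSON format the
named children ("therapeuticareas", "sites", ...) can stay as they are, but
each nested object needs its own unique id, which the sample document does
not have. The helper below (assign_child_ids is a made-up name, and the
"<parent id>/<field>#<position>" id convention is only an illustration)
adds one at every level and overwrites any id a child already carries. It
assumes the schema defines the _nest_path_ field described in the guide, so
that a query with fl=*,[child] can return the hierarchy.

```python
import json
from urllib.parse import urlencode


def assign_child_ids(doc, id_key="id"):
    """Return a copy of doc in which every nested object has a unique id.

    Child ids are built as <parent id>/<field name>#<position>. This is an
    illustrative convention only; an id a child already has is replaced.
    """
    out = {}
    for name, value in doc.items():
        is_child_list = (
            isinstance(value, list)
            and value
            and all(isinstance(v, dict) for v in value)
        )
        if is_child_list:
            children = []
            for pos, child in enumerate(value):
                child = dict(child)  # copy, so the input is not modified
                child[id_key] = f"{doc[id_key]}/{name}#{pos}"
                children.append(assign_child_ids(child, id_key))
            out[name] = children
        else:
            # Plain values, lists of strings and empty lists are kept as-is.
            out[name] = value
    return out


# A cut-down version of the sample document from the original message.
study = {
    "id": "NCT0102",
    "status": "Completed",
    "sponsorname": ["National Center for Research Resources (NCRR)"],
    "therapeuticareas": [
        {"taid": "ta1", "ta": "Lung Cancer", "pubmeds": [{"pmbid": "pm1"}]},
        {"taid": "ta2", "ta": "Breast Cancer", "pubmeds": []},
    ],
    "sites": [{"siteid": "site1", "city": "Dallas"}],
}

nested = assign_child_ids(study)

# Body that could be POSTed to the collection's /update handler.
payload = json.dumps([nested], indent=2)

# Query string asking for the parent together with its nested children.
query = urlencode({"q": "id:NCT0102", "fl": "*,[child]"})
```

Neither the payload nor the query is sent anywhere here; they only show the
shape of the request and should be checked against your Solr version.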

I am not sure whether defining additional fields as described in the
second document, while indexing the first way, will work. Feedback on
that would be useful.
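
If you want to try the first approach, a request along these lines might be
a starting point. It is only a sketch: the collection name "trials", the
_src_ field name and the chosen field mappings are assumptions, and
custom_json_update_url is a made-up helper. The srcField, split and f
parameters are the ones described in the first guide page, where srcField
requires split=/.

```python
from urllib.parse import urlencode


def custom_json_update_url(base_url, collection, field_paths, src_field="_src_"):
    """Build a URL for the /update/json/docs handler.

    field_paths maps a Solr field name to a JSON path in the input document.
    """
    # srcField is only usable when the whole input is one Solr document.
    params = [("split", "/"), ("srcField", src_field)]
    # One f=<field>:<json path> parameter per mapped field.
    params += [("f", f"{field}:{path}") for field, path in field_paths.items()]
    params.append(("commit", "true"))
    return f"{base_url}/{collection}/update/json/docs?{urlencode(params)}"


url = custom_json_update_url(
    "http://localhost:8983/solr",
    "trials",  # hypothetical collection name
    {"id": "/id", "title": "/title", "ta": "/therapeuticareas/ta"},
)
print(url)
```

Posting the original JSON to that URL and then looking at which fields were
actually indexed would show how the two features interact.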

Please also note that Solr is not intended to be the primary storage
(like a database). If you do atomic updates, the stored JSON will
get out of sync because it is not regenerated. Also, for advanced
searches, you may want to normalize your data differently from the
way your original data is structured. So you may want to consider
an architecture where that JSON is stored separately, or is retrieved
from the original database, and Solr is focused on good search and
returns just the record ID. That would actually allow you to
store a lot less in Solr (like just IDs) and focus on indexing in the
best way. I am not saying it is the right way for your needs, just that
it is a non-obvious architecture choice you may want to keep in mind as
you add Solr to your existing stack.
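
To make that concrete, here is a minimal sketch of the "Solr returns IDs,
the database returns documents" pattern. The function names and the
in-memory dict standing in for the primary database are made up for the
example; the response shape is the standard Solr JSON response.

```python
def id_only_params(user_query, rows=10):
    """Query parameters that ask Solr for matching record ids only."""
    return {"q": user_query, "fl": "id", "rows": rows}


def hydrate(solr_response, primary_store):
    """Look up the full records for the ids Solr returned, keeping Solr's order."""
    ids = [doc["id"] for doc in solr_response["response"]["docs"]]
    # Skip ids that are no longer in the primary store.
    return [primary_store[i] for i in ids if i in primary_store]


# Stand-in for the primary database holding the original nested JSON.
primary_store = {
    "NCT0102": {"id": "NCT0102", "sites": [{"siteid": "site1", "city": "Dallas"}]},
}

# Shape of a Solr JSON response when only the id field is requested.
solr_response = {"response": {"numFound": 1, "start": 0, "docs": [{"id": "NCT0102"}]}}

params = id_only_params("ta:\"Lung Cancer\"")
records = hydrate(solr_response, primary_store)
```

With this split, the nested JSON never has to be reconstructed from the
index, so it cannot drift out of sync with what Solr stores.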

Regards,
   Alex.

On Thu, 24 Sep 2020 at 10:23, Abhay Kumar wrote:
>
> Hello Team,
>
> Can someone please help me index the sample JSON document below into Solr?
>
> I have the following questions about indexing multi-level child documents:
>
>   1.  Can we give names to the levels of the document hierarchy, such as
>       "therapeuticareas" or "sites", while indexing?
>   2.  How can we index documents with a multi-level hierarchy?
>
> I have the following question about retrieving results:
>
>   1.  How can I retrieve a result with its full nested structure?

Index Deeply Nested documents and retrieve a full nested document in solr

2020-09-24 Thread Abhay Kumar
Hello Team,

Can someone please help me index the sample JSON document below into Solr?

I have the following questions about indexing multi-level child documents:

  1.  Can we give names to the levels of the document hierarchy, such as
      "therapeuticareas" or "sites", while indexing?
  2.  How can we index documents with a multi-level hierarchy?

I have the following question about retrieving results:

  1.  How can I retrieve a result with its full nested structure?

[{
  "id": "NCT0102",
  "title": "Congenital Adrenal Hyperplasia: Calcium Channels as Therapeutic Targets",
  "phase": "Phase 1/Phase 2",
  "status": "Completed",
  "studytype": "Interventional",
  "enrollmenttype": "",
  "sponsorname": ["National Center for Research Resources (NCRR)"],
  "sponsorrole": ["lead"],
  "score": [0],
  "source": "National Center for Research Resources (NCRR)",
  "therapeuticareas": [
    {
      "taid": "ta1",
      "ta": "Lung Cancer",
      "diseaseAreas": ["Oncology, Respiratory tract diseases"],
      "pubmeds": [{
        "pmbid": "pm1",
        "articleTitle": "Consensus minimum data set for lung cancer multidisciplinary teams Results of a Delphi process",
        "revisedDate": "2018-12-11T18:30:00Z"
      }],
      "conferences": [{
        "confid": "conf1",
        "conferencename": "American Academy of Neurology Annual Meeting",
        "conferencetopic": "Avances en el manejo de los trastornos del movimiento hipercineticos",
        "conferencedate": "2019-05-08T18:30:00Z"
      }]
    },
    {
      "taid": "ta2",
      "ta": "Breast Cancer",
      "diseaseAreas": ["Oncology"],
      "pubmeds": [],
      "conferences": []
    }
  ],

  "sites": [{
    "siteid": "site1",
    "type": "Hospital",
    "institutionname": "Methodist Health System",
    "country": "United States",
    "state": "Texas",
    "city": "Dallas",
    "zip": ""
  }],

  "investigators": [{
    "invid": "inv1",
    "investigatorname": "Bryan A Faller",
    "role": "Principal Investigator",
    "location": "",
    "score": ""
  }],

  "Drugs": [{
    "id": "11",
    "drugname": "Methotrexate",
    "activeIngredient": "Methotrexate Sodium"
  }]
}]

Thanks.
Abhay
