Re: Having trouble indexing nested docs using "split" feature.
On 12/2/2017 12:55 PM, David Lee wrote:
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":798}}
>
> Though the status indicates there was no error, when I try to query on the data using *:*, I get this:
>
> curl 'http://localhost:8983/solr/my_collection/select?q=*:*'
> {
>   "responseHeader":{
>     "zkConnected":true,
>     "status":0,
>     "QTime":6,
>     "params":{
>       "q":"*:*"}},
>   "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
>   }}
>
> So it looks like no documents were actually indexed from above. I'm trying to determine if this is due to an error in the reference manual, or if I haven't set up Solr correctly.

I don't know anything at all about the split feature or the parent/child document feature. I'm going to concentrate on the fact that numFound is zero.

With the indexing returning a success response, there should have been SOMETHING indexed. Did you ever do a commit operation? This can be an explicit operation, or there are some ways you can have it happen automatically.

If you include a commitWithin parameter on the indexing request, then there will be an automatic commit within that many milliseconds from when indexing started. You can also configure autoSoftCommit in solrconfig.xml, then reload the core/collection or restart Solr.

Unless there is a commit that opens a new searcher, changes made to the index will never be visible to clients.

https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

The article title says "SolrCloud" but all the information is just as applicable to standalone mode.

If you *have* done a commit with openSearcher set to true (which is the default setting for openSearcher), then we'll need to examine solr.log, and you'll need to be sure that the indexing request happened during the time the log was created.

Thanks,
Shawn
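As a rough illustration of the commit options described above, the sketch below builds the three URLs involved: an indexing request with commitWithin, an explicit commit, and the match-all query used to check numFound. The host, port, and collection name are taken from the thread; the 1000 ms value and the doc.json file name are made-up placeholders. It only prints the curl commands, so it can be tried without a running Solr.

```shell
#!/bin/sh
# Sketch only. SOLR_URL and COLLECTION come from the thread (localhost:8983,
# my_collection); change them to match your own setup.
SOLR_URL="http://localhost:8983/solr"
COLLECTION="my_collection"

# URL for an indexing request that asks Solr to commit within N milliseconds.
update_url_commit_within() {
  printf '%s/%s/update/json/docs?commitWithin=%s\n' "$SOLR_URL" "$COLLECTION" "$1"
}

# URL that triggers an explicit commit on the collection.
explicit_commit_url() {
  printf '%s/%s/update?commit=true\n' "$SOLR_URL" "$COLLECTION"
}

# URL for a match-all query, used to check numFound after the commit.
select_all_url() {
  printf '%s/%s/select?q=*:*\n' "$SOLR_URL" "$COLLECTION"
}

# Print the commands instead of running them. doc.json is a hypothetical file
# holding the JSON document to index; 1000 is an arbitrary example value.
echo "curl '$(update_url_commit_within 1000)' -H 'Content-type:application/json' -d @doc.json"
echo "curl '$(explicit_commit_url)'"
echo "curl '$(select_all_url)'"
```

When adding commitWithin to a request that already has parameters (such as the split example in the original post), it would be appended with & rather than ?. If numFound is still zero after an explicit commit, the next step suggested above is to look at solr.log.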
Re: Having trouble indexing nested docs using "split" feature.
Sorry about the formatting for the first part, hope this is clearer:

{
  "book_id": "1234",
  "book_title": "The Martian Chronicles",
  "author": "Ray Bradbury",
  "reviews": [
    {
      "reviewer": "John Smith",
      "reviewer_background": {
        "highest_rank": "Excellent",
        "latest_review": "10/15/2017 10:15:00.000 CST"
      }
    },
    {
      "reviewer": "Adam Smith",
      "reviewer_background": {
        "highest_rank": "Good",
        "latest_review": "10/10/2017 16:18:00.000 CST"
      }
    }
  ],
  "checkouts": [
    {
      "member_id": "aaabbbccc",
      "member_name": "Sam Jackson"
    },
    {
      "member_id": "bbbcccddd",
      "member_name": "Buddy Jones"
    }
  ]
}
Having trouble indexing nested docs using "split" feature.
Hi all,

I've been trying for some time now to find a suitable way to deal with JSON documents that have nested data. By suitable, I mean being able to index them and retrieve them so that they are in the same structure as when indexed.

I'm using Solr 7.1 under Linux Mint 18.3 with Oracle Java 1.8.0_151. After untarring the distribution, I ran through the "getting started" tutorial from the reference manual where it had me create the techproducts index. I then created another collection called my_collection so I could run the examples more easily. It used the _default schema.

Here is a sample:

{
  "book_id": "1234",
  "book_title": "The Martian Chronicles",
  "author": "Ray Bradbury",
  "reviews": [
    {
      "reviewer": "John Smith",
      "reviewer_background": {
        "highest_rank": "Excellent",
        "latest_review": "10/15/2017 10:15:00.000 CST"
      }
    },
    {
      "reviewer": "Adam Smith",
      "reviewer_background": {
        "highest_rank": "Good",
        "latest_review": "10/10/2017 16:18:00.000 CST"
      }
    }
  ],
  "checkouts": [
    {
      "member_id": "aaabbbccc",
      "member_name": "Sam Jackson"
    },
    {
      "member_id": "bbbcccddd",
      "member_name": "Buddy Jones"
    }
  ]
}

Obviously, I'll need to search at the parent level and child level. I started experimenting and tried to use one of the examples from "Transforming and Indexing Solr JSON".
However, when I tried the first example as follows:

curl 'http://localhost:8983/solr/my_collection/update/json/docs'\
'?split=/exams'\
'&f=first:/first'\
'&f=last:/last'\
'&f=grade:/grade'\
'&f=subject:/exams/subject'\
'&f=test:/exams/test'\
'&f=marks:/exams/marks'\
 -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
    {
      "subject": "Maths",
      "test"   : "term1",
      "marks"  : 90},
    {
      "subject": "Biology",
      "test"   : "term1",
      "marks"  : 86}
  ]
}'

I got this response:

{
  "responseHeader":{
    "status":0,
    "QTime":798}}

Though the status indicates there was no error, when I try to query on the data using *:*, I get this:

curl 'http://localhost:8983/solr/my_collection/select?q=*:*'
{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":6,
    "params":{
      "q":"*:*"}},
  "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
  }}

So it looks like no documents were actually indexed from above. I'm trying to determine if this is due to an error in the reference manual, or if I haven't set up Solr correctly.

I've tried other techniques (not using the split option) like from Yonik's site, but those are slightly dated and I was hoping there was a more practical approach with the release of Solr 7.

Any assistance would be appreciated.

Thank you.