Best approach for flattening data or using nested lists.
I have a project where I am working with nested data (not that deep, but multiple lists) and would love to get some advice from other experienced developers. I've read most of the books on Solr (including Solr in Action), and though they provide good (if dated) information on the actual indexing mechanism, not many deal with this issue in much depth. If there are other resources, not necessarily Solr-specific, that can help here, please feel free to point those out.

Here is the structure I'm working with. I've made it generic to simplify things, but the intent is there:

{
  "id": 1,
  "_type": "book",
  "name": "My Martian",
  "genre": "Science Fiction",
  "edits": [
    { "_type": "book_action", "action": "Modify", "chapter": 3,
      "description": "Corrected spelling for interstellar" },
    { "_type": "book_action", "action": "Removal", "chapter": 24,
      "description": "Removed chapter as it adds no value to the story" }
  ],
  "chapters": [
    { "_type": "book_chapter", "chapter_number": 1, "chapter_title": "The Test" },
    { "_type": "book_chapter", "chapter_number": 2, "chapter_title": "The Next Test" }
  ]
}

My first attempt was to just add both lists through SolrJ (you can't do this with the JSON interface, since it doesn't allow multiple _childDocuments_ at the same level). That works, and I'm able to use the _type value to distinguish between them. However, my problem is that the users want to be able to search on any field at the top level of the data as well as within the lists. For example (using SQL for clarity only):

select * from book_index where genre = "Science Fiction" and action = "Removal" and chapter_number = 2;

The problem I'm having with this sort of search is that, based on what I know, the {!child} and {!parent} parsers won't give me access to all fields like this.
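For what it's worth, that SQL can be approximated with block-join syntax, with one important caveat. A sketch (assuming `_type:book` reliably identifies the parent documents; field names are taken from the example above):

```
q=genre:"Science Fiction"
  AND {!parent which="_type:book" v="action:Removal"}
  AND {!parent which="_type:book" v="chapter_number:2"}
```

Each {!parent} clause independently matches books that have *some* child satisfying it; it does not require both child conditions to hold on the same child document, which matches the flattened semantics of the SQL above.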
I've looked at flattening the data, similar to the following:

{
  "id": 1,
  "name": "My Martian",
  "genre": "Science Fiction",
  "edit_action_3": { "action": "Modify", "chapter": 3,
    "description": "Corrected spelling for interstellar" },
  "edit_action_24": { "action": "Removal", "chapter": 24,
    "description": "Removed chapter as it adds no value to the story" },
  "chapter_1": { "chapter_number": 1, "chapter_title": "The Test" },
  "chapter_2": { "chapter_number": 2, "chapter_title": "The Next Test" }
}

This does flatten things out so that the above query would be able to search on any field, but it's a real kludge and makes it nearly impossible to get just a list of chapters or actions. So, anyone have any thoughts? (FYI, this is my first Solr project, so I'm really starting from scratch here.) Thanks
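If the goal is simply "every field searchable from one flat document", an alternative to the `edit_action_3`-style keyed fields is prefixed multivalued fields, which keeps field names predictable. A rough sketch of the transformation (plain Python, illustrative only; the `edits`/`chapters` names come from the example above):

```python
def flatten(doc, lists=("edits", "chapters")):
    """Collapse each nested list into prefixed multivalued fields,
    e.g. edits[*].action -> edits_action: [...]."""
    flat = {k: v for k, v in doc.items() if k not in lists}
    for name in lists:
        for child in doc.get(name, []):
            for key, value in child.items():
                if key == "_type":
                    continue  # discriminator is implied by the prefix
                flat.setdefault(f"{name}_{key}", []).append(value)
    return flat
```

The trade-off is the classic cross-matching problem: once action and chapter live in separate multivalued fields, a query can no longer insist that "Removal" and chapter 24 came from the same edit. That is exactly the relationship block joins preserve.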
Re: Having trouble indexing nested docs using "split" feature.
Sorry about the formatting for the first part; hope this is clearer:

{
  "book_id": "1234",
  "book_title": "The Martian Chronicles",
  "author": "Ray Bradbury",
  "reviews": [
    { "reviewer": "John Smith",
      "reviewer_background": {
        "highest_rank": "Excellent",
        "latest_review": "10/15/2017 10:15:00.000 CST" } },
    { "reviewer": "Adam Smith",
      "reviewer_background": {
        "highest_rank": "Good",
        "latest_review": "10/10/2017 16:18:00.000 CST" } }
  ],
  "checkouts": [
    { "member_id": "aaabbbccc", "member_name": "Sam Jackson" },
    { "member_id": "bbbcccddd", "member_name": "Buddy Jones" }
  ]
}
Having trouble indexing nested docs using "split" feature.
Hi all, I've been trying for some time now to find a suitable way to deal with JSON documents that have nested data. By suitable, I mean being able to index them and retrieve them so that they come back in the same structure as when indexed. I'm using version 7.1 under Linux Mint 18.3 with Oracle Java 1.8.0_151. After untarring the distribution, I ran through the "getting started" tutorial from the reference manual, where it had me create the techproducts index. I then created another collection called my_collection so I could run the examples more easily. It uses the _default schema. Here is a sample:

{
  "book_id": "1234",
  "book_title": "The Martian Chronicles",
  "author": "Ray Bradbury",
  "reviews": [
    { "reviewer": "John Smith",
      "reviewer_background": {
        "highest_rank": "Excellent",
        "latest_review": "10/15/2017 10:15:00.000 CST" } },
    { "reviewer": "Adam Smith",
      "reviewer_background": {
        "highest_rank": "Good",
        "latest_review": "10/10/2017 16:18:00.000 CST" } }
  ],
  "checkouts": [
    { "member_id": "aaabbbccc", "member_name": "Sam Jackson" },
    { "member_id": "bbbcccddd", "member_name": "Buddy Jones" }
  ]
}

Obviously, I'll need to search at both the parent level and the child level. I started experimenting and tried to use one of the examples from "Transforming and Indexing Custom JSON".
However, when I tried the first example as follows:

curl 'http://localhost:8983/solr/my_collection/update/json/docs'\
'?split=/exams'\
'&f=first:/first'\
'&f=last:/last'\
'&f=grade:/grade'\
'&f=subject:/exams/subject'\
'&f=test:/exams/test'\
'&f=marks:/exams/marks'\
 -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
    { "subject": "Maths", "test": "term1", "marks": 90 },
    { "subject": "Biology", "test": "term1", "marks": 86 }
  ]
}'

{"responseHeader":{"status":0,"QTime":798}}

Though the status indicates there was no error, when I try to query the data using *:*, I get this:

curl 'http://localhost:8983/solr/my_collection/select?q=*:*'

{"responseHeader":{"zkConnected":true,"status":0,"QTime":6,"params":{"q":"*:*"}},
 "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]}}

So it looks like no documents were actually indexed from the above. I'm trying to determine whether this is due to an error in the reference manual, or whether I haven't set up Solr correctly. I've tried other techniques (not using the split option), like those from Yonik's site, but those are slightly dated, and I was hoping there was a more practical approach with the release of Solr 7. Any assistance would be appreciated. Thank you.
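For what it's worth, what `split=/exams` with those `f=` mappings is meant to produce can be modeled in a few lines, which makes it easier to check query results against expectations. A sketch (plain Python, illustrative only; it mirrors the example request above, not Solr's actual implementation):

```python
def split_docs(doc, split_field="exams"):
    """Model the split=/exams mapping: one indexed document per array
    element, each combining the top-level fields with its own."""
    parent = {k: v for k, v in doc.items() if k != split_field}
    return [{**parent, **child} for child in doc.get(split_field, [])]
```

With the request body above, this yields two flat documents, each carrying first/last/grade plus one exam's subject/test/marks. Separately, a status-0 update followed by numFound:0 is often just a missing commit; appending &commit=true to the update URL is worth ruling out before suspecting the manual.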
Re: How to handle nested documents in solr (SolrJ)
Hi Rick, Adding to this subject, I do appreciate you pointing us to these articles, but I'm curious about how much of these take into account the latest versions of Solr (i.e., 6.5+ and 7), given the JSON split capabilities, etc. I know that is just on the indexing side, so the searches may be the same, but things are changing quickly these days (not a bad thing). Thanks, David

On 5/24/2017 4:26 AM, Rick Leir wrote: Prasad, Gee, you get confusion from a Google search for: nested documents site:mail-archives.apache.org/mod_mbox/lucene-solr-user/ But my recent posting might help: "Yonik has some good blogs on this." And Mikhail has an excellent blog: https://blog.griddynamics.com/how-to-use-block-join-to-improve-search-efficiency-with-nested-documents-in-solr cheers -- Rick

On 2017-05-24 02:53 AM, prasad chowdary wrote: Dear All, I have a requirement that I need to index documents in Solr using Java code. Each document contains a sub-document, like below (it's just for understanding my question):

student id: 123
student name: john
marks:
  maths: 90
  English: 95

student id: 124
student name: rack
marks:
  maths: 80
  English: 96

etc... So, as shown above, each document contains one child document, i.e. marks. Actually I don't need any joins or anything. My requirement is: if I query "English:95", it should return the complete document, i.e. the child along with the parent, like below:

student id: 123
student name: john
marks:
  maths: 90
  English: 95

and also if I query "student id: 123", it should return the whole document, same as above. Currently I am able to get the child along with the parent for a child match by using the extendedResults option.
But I am not able to get the children for a parent match.
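For a requirement like this, the [child] document transformer is the usual route: match the parent (directly, or via a block-join query for child matches) and ask Solr to attach the children to each result. A sketch (the doc_type discriminator field and the field names here are illustrative assumptions, not from the original post):

```
# Parent match, children attached to the result:
q=student_id:123&fl=*,[child parentFilter=doc_type:student]

# Child match, returning the parent with its children:
q={!parent which="doc_type:student"}English:95&fl=*,[child parentFilter=doc_type:student]
```

The parentFilter must match all parents and no children, which is why a dedicated type field on every document is the standard pattern.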
Re: Reload an unloaded core
I have similar needs but for a slightly different use-case. In my case, I am breaking up cores/indexes based on month and year so that I can add an alias that always points to the last few months; beyond that, I want to simply unload the other indexes once they get past a few months old. The indexes will remain on disk, but I simply don't want my queries to have to go through the older "archived" documents. However, users will occasionally need to have those indexes reloaded for research reasons, so what I was doing in ES was simply re-loading all of the indexes that fit within the range being searched and adding those to an alias (let's call it "archived", for example). Once they are finished querying that older data, I again unload those indexes and remove the alias. From what I'm reading in this thread, this isn't quite as straightforward in Solr, so I'm looking for other options. Thanks, David

On 5/2/2017 5:04 PM, Shashank Pedamallu wrote: Thank you Simon, Erick and Shawn for your replies. Unfortunately, restarting Solr is not an option for me. So, I'll try to follow the steps given by Shawn to see where I'm standing. Btw, I'm using Solr 6.4.2. Shawn, once again thank you very much for the detailed reply. Thanks, Shashank Pedamallu

On 5/2/17, 2:51 PM, "Shawn Heisey" wrote: On 5/2/2017 10:53 AM, Shashank Pedamallu wrote: I want to unload a core from Solr without deleting the data-dir or instance-dir. I'm performing some operations on the data-dir after this, and then I would like to reload the core from the same data-dir. These are the things I tried: 1. Reload API – throws an exception saying no such core exists. 2. Create API – throws an exception saying a core with the given name already exists. Can someone point me to an API I could use to achieve this? Please note that I'm working with Solr in non-cloud mode, without ZooKeeper, collections, etc.
The RELOAD command isn't going to work at all because the core has been unloaded -- Solr doesn't know about the core, so it can't reload it. This is a case where the language used is somewhat confusing, even though it's completely correct.

I am about 90 percent certain that the reason the CREATE command gave you an error message is because you tried to make a new core.properties file before you did the CREATE. When things are working correctly, the CREATE command itself is what will create core.properties. If it already exists, CoreAdmin will give you an error. This is the exact text of the error I encountered when trying to use CREATE after building a core.properties file manually:

Error CREATEing SolrCore 'foo': Could not create a new core in C:\Users\sheisey\Downloads\solr-6.5.1\server\solr\foo as another core is already defined there

That error message is confusing, so I will be fixing it: https://issues.apache.org/jira/browse/SOLR-10599

To verify what you need to do, I fired up Solr 6.5.1 from an extracted download directory. I created two cores, "foo" and "bar", using the command-line "bin\solr create" command. Then I went to the admin UI and unloaded foo. The foo directory was still there, but the core was gone from Solr's list. By clicking on the "Add Core" button in the Core Admin tab, typing "foo" into name and instanceDir, and clearing the other text boxes, the core was recreated exactly as it was before it was unloaded.
This is the log from the CREATE command that the admin UI sent:

2017-05-02 18:02:49.232 INFO (qtp1543727556-18) [ x:foo] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores params={schema=&dataDir=&name=foo&action=CREATE&config=&instanceDir=foo&wt=json&_=1493747904891} status=0 QTime=396

To double-check this and show how it can be done without the admin UI, I accessed these two URLs (in a browser) and accomplished the exact same thing again. The first URL unloads the core; the second asks Solr to find the core and re-add it with default settings.

http://localhost:8983/solr/admin/cores?action=UNLOAD&core=foo
http://localhost:8983/solr/admin/cores?action=CREATE&name=foo&instanceDir=foo

If you are using additional options with your cores, such as the configset parameter, you would need to include those options on your CREATE call, similar to what you might have done when you initially created the core. With some of the options you can
Re: Poll: Master-Slave or SolrCloud?
As someone who moved from ES to Solr, I can say that one of the things that makes ES so much easier to configure is that the majority of things that need to be set for a specific environment are all in pretty much one config file. Also, I didn't have to deal with the "magic stuff" that many people have talked about where SolrCloud is concerned.

Part of the problem is also due to the documentation and user blogs that discuss how to use SolrCloud. They all tell you how to create a config to run SolrCloud on one system using the -e cloud flag, but then that's it. They all seem to avoid discussing what to do from there in terms of best practices for distributing to other nodes. The information is out there, but in many cases the guides refer to older versions of Solr, so sometimes it is hard to know what versions people are writing about until you try their solutions and nothing works, and you finally figure out they are talking about a much older version.

I moved away from ES to Solr because I prefer the openness of Solr and the community participation, but I really haven't been very successful in deploying this in a production environment at this point. I'd say the two things I find I'm battling with the most are the cloud configuration and the work I'm having to do to get even the most basic JSON documents indexed correctly (specifically where I need block joins, etc.). I'm hopeful that the V2 API will help with the JSON issue, but it would be nice to have some documentation that goes more in-depth on how to set up additional nodes. Also, even though I use ZK for other parts of my application, I have no problem with a version running specifically for Solr if it makes the process more straightforward. David

On 4/27/2017 2:51 AM, Emir Arnautovic wrote: I think creating a poll for ES people with the question: "How do you run master nodes?
A) on some data nodes B) dedicated node C) dedicated server" would give some insight into how big an issue having ZK is, and whether hiding ZK behind Solr would do any good. Emir

On 25.04.2017 23:13, Otis Gospodnetić wrote: Hi Erick, Could one run *only* embedded ZK on some SolrCloud nodes, sans any data? It would be the equivalent of dedicated Elasticsearch master nodes, which is the current ES best practice/recommendation. I've never heard of anyone being scared of running 3 dedicated master ES nodes, so if SolrCloud offered the same, perhaps even completely hiding ZK from users, that would present the same level of complexity (err, simplicity) ES users love about ES. Don't want to talk about SolrCloud vs. ES here at all, just trying to share observations, since we work a lot with both Elasticsearch and Solr(Cloud) at Sematext. Otis -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/

On Tue, Apr 25, 2017 at 4:03 PM, Erick Erickson wrote: bq: I read somewhere that you should run your own ZK externally, and turn off SolrCloud

This is a bit confused. "Turn off SolrCloud" has nothing to do with running ZK internally or externally. SolrCloud requires ZK; whether internal or external is irrelevant to the term SolrCloud. On to running an external ZK ensemble: mostly, that's administratively by far the safest. If you're running the embedded ZK, then the ZK instances are tied to your Solr instances. Now if, for any reason, your Solr nodes hosting ZK go down, you lose ZK quorum, can't index, etc. Now consider a cluster with, say, 100 Solr nodes. Not talking replicas in a collection here; I'm talking 100 physical machines. BTW, this is not even close to the largest ones I'm aware of. Which three (for example) are running ZK? If I want to upgrade Solr, I'd better make really sure not to upgrade two of the Solr instances running ZK at once if I want my cluster to keep going. And ZK is sensitive to system resources.
So putting ZK on a Solr node then hosing, say, updates to my Solr cluster can cause ZK to be starved for resources. This is one of those deals where _functionally_, it's OK to run embedded ZK, but administratively it's suspect. Best, Erick On Tue, Apr 25, 2017 at 10:49 AM, Rick Leir wrote: All, I read somewhere that you should run your own ZK externally, and turn off SolrCloud. Comments please! Rick On April 25, 2017 1:33:31 PM EDT, "Otis Gospodnetić" < otis.gospodne...@gmail.com> wrote: This is interesting - that ZK is seen as adding so much complexity that it turns people off! If you think about it, Elasticsearch users have no choice -- except their "ZK" is built-in, hidden, so one doesn't have to think about it, at least not initially. I think I saw mentions (maybe on user or dev MLs or JIRA) about potentially, in the future, there only being SolrCloud mode (and dropping SolrCloud name in favour of Solr). If the above comment from Charlie about complexity is really true for Solr users, and if that's the reason why
analyzer of user queries in SOLR 4.10?
This may be a dumb question. At index time, we can specify the fieldType of different fields; thus, we know the analyzers for those fields. In schema.xml, I do not see how to configure the fieldType (and thus the analyzer) for runtime user queries. Can anyone help explain this? Thanks, DL
Re: analyzer of user queries in SOLR 4.10?
Thanks, Erik. I am actually using the edismax query parser in Solr. I can explicitly specify the fieldType (e.g., text_general or text_en) for different fields (e.g., title or description), but I do not see how to specify the fieldType (and thus the analyzer) for runtime queries. Thanks, DL

On Sun, Nov 23, 2014 at 10:21 AM, Erik Hatcher erik.hatc...@gmail.com wrote: Query-time analysis depends on the query parser in play. If a query parser chooses to analyze some or all of the query, it will use the same analysis as index time unless specified separately (in the field type definition itself, too). Erik

-- SeekWWW: the Search Engine of Choice www.seekwww.com
Re: analyzer of user queries in SOLR 4.10?
Yes, my edismax parser is configured to query multiple fields, including qf, pf, pf2, and pf3. Is there any online documentation on how multiple analysis chains get used, with each field using its own analysis chain? Thanks, DL

On Sun, Nov 23, 2014 at 1:34 PM, Shawn Heisey apa...@elyograg.org wrote: On 11/23/2014 2:13 PM, David Lee wrote: Thanks Erik. I am actually using edismax query parser in SOLR. I can explicitly specify the fieldType (e.g., text_general or text_en) for different fields (e.g., title or description). But I do not see how to specify the fieldType (thus analyzer) for runtime queries.

The query analysis is chosen by the field that you are querying. If the request sent to your edismax parser is configured to query multiple fields (qf, pf, etc.), then multiple analysis chains might get used -- each field uses its own analysis chain. Setting the debugQuery parameter to true will show you exactly how a query was analyzed. The same thing can happen when you use multiple field:value clauses in your query. Thanks, Shawn
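As a concrete illustration of Shawn's point, a request along these lines (the core and field names here are made up for the example) exposes the per-field analysis:

```
http://localhost:8983/solr/mycore/select?defType=edismax&q=Martian+Chronicles&qf=title+description&debugQuery=true
```

The debug section's parsedquery then shows something like +((title:martian | description:martian) ...), so you can see directly which analysis chain each field applied to the query terms.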
SOLR bf SyntaxError
Hi, I tried to use bf for boosting and got the following error:

org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: Unexpected text after function: )

Here's the bf boosting:

<str name="bf">sum(div(product(log(map(reviews,0,0,1)),rating),2.5),div(log(map(sales,0,0,1)),10))</str>

What's the syntax issue here? Thanks, DL
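As an aside on what the function computes: map(x,0,0,1) replaces a 0 with 1 so log never sees zero, and Solr's log() is base 10. The arithmetic can be checked in plain Python (a sketch of the formula only, not of Solr internals):

```python
import math

def map_fn(value, mn, mx, target):
    """Solr map(x,min,max,target): substitute target when min <= x <= max."""
    return target if mn <= value <= mx else value

def bf(reviews, rating, sales):
    """sum(div(product(log(map(reviews,0,0,1)),rating),2.5),
           div(log(map(sales,0,0,1)),10)) with base-10 log."""
    return (math.log10(map_fn(reviews, 0, 0, 1)) * rating) / 2.5 \
           + math.log10(map_fn(sales, 0, 0, 1)) / 10
```

On the Solr side, one common cause of "Unexpected text after function" is whitespace inside the expression: the bf parameter treats whitespace as a separator between multiple boost functions, so the whole expression must be a single unbroken token (or have its spaces URL-encoded).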
Index complex JSON data in SOLR
Hi All, How do I index complex JSON data in Solr? For example:

{"prices":[{"state":"CA","price":101.0},{"state":"NJ","price":102.0},{"state":"CO","price":102.0}]}

It's simple in Elasticsearch, but in Solr it always reports the following error:

Error parsing JSON field value. Unexpected OBJECT_START

Thanks, DL
Re: Index complex JSON data in SOLR
Thanks, Alex. I took a look at the approach of transforming the JSON document before mapping it to the Solr schema at http://lucidworks.com/blog/indexing-custom-json-data/. It's a workaround, but in my case, if every state has its own price, the number of documents that need to be indexed will increase 50 times, which may have a negative impact on performance, etc.

{"prices":[{"state":"CA","price":101.0},{"state":"NJ","price":102.0},{"state":"CO","price":102.0}]}

Is there any other better solution? Thanks, DL

On Sat, Nov 15, 2014 at 2:17 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: It's simple in Elasticsearch, but what you actually get is a single document, and all its children data ({state, price} entries) are joined together behind the scenes into multivalued fields. Which may or may not be an issue for you. For Solr, nested documents need to be separate parent/child documents, and the syntax is a bit more explicit. So, you can either provide more explicit JSON: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-NestedChildDocuments or transform the JSON document before mapping it to the Solr schema: http://lucidworks.com/blog/indexing-custom-json-data/ (latest 4.10 Solr). Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
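For concreteness, the "more explicit JSON" Alex mentions looks roughly like this (the id values and the type field are illustrative additions; each child document needs its own unique id):

```
{
  "id": "prod-1",
  "type": "product",
  "name": "Some Product",
  "_childDocuments_": [
    { "id": "prod-1-CA", "type": "price", "state": "CA", "price": 101.0 },
    { "id": "prod-1-NJ", "type": "price", "state": "NJ", "price": 102.0 },
    { "id": "prod-1-CO", "type": "price", "state": "CO", "price": 102.0 }
  ]
}
```

The type field then gives block-join queries (and the parentFilter of the [child] transformer) an unambiguous way to tell parents from children.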
Re: Index complex JSON data in SOLR
Assume that we are selling a product online in 50 states in the USA, but each state has its own price. Although the base product information is the same, the index size will increase 50 times if we index it that way. The usage is similar to searching for a product, but based on the location of the user (e.g., which state the user is from), we may show a different price.

On Sat, Nov 15, 2014 at 3:40 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: The first link shows how to create children with specific content, but you need to use "_childDocuments_": ... explicitly instead of "prices": and perhaps add "type": "price" or some such to differentiate record types. But I am not quite following why you say it will increase 50 times. By comparison to what? How did you want the child documents to be stored/found (in Elasticsearch or Solr)? One way to think through this problem is to be explicit about what the _search_ would look like and then adjust indexing accordingly. Regards, Alex.
Re: Index complex JSON data in SOLR
Thanks, Alex and William, for the suggestions. I'll try out the approach of storing the JSON string.

On Sat, Nov 15, 2014 at 5:27 PM, William Bell billnb...@gmail.com wrote: You can take 4.* of Solr and just apply my fix. Store the JSON stringified into a string field (make sure the field name ends in _json). Then you can output with: wt=json&json.fsuffix=_json OK? Use SOLR-4685.

On Sat, Nov 15, 2014 at 5:07 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: It sounds to me that you are not actually searching on the state or price. So, does it make sense to store it in Solr? Maybe it should stay in an external database and you merge it. Or store (not index) that JSON as a pure text field and parse what you need out of it manually, as you would with Elasticsearch. But if you want to store states/prices separately in Solr, then you do have to pay the price somehow, right? And 50 times more documents may not actually have any impact on your performance. Solr scales really well. Especially if you don't need to display some fields, because tokens in stored=false/indexed=true fields are only stored once. Regards, Alex.