Re: Index complex JSON data in SOLR
Hi David,

You might want to look at SIREn 1.4 [1], a plugin for Lucene/Solr that includes an update handler [2] which mimics the Elasticsearch index API. You can push JSON documents to the API, and it will dynamically flatten and index them into a set of fields (similar to Elasticsearch). It also indexes the full JSON into a SIREn field to support nested queries.

[1] http://siren.solutions/siren/downloads/
[2] http://siren.solutions/manual/solr-configuration-update-handler.html

--
Renaud Delbru

On 11/15/2014 10:05 PM, David Lee wrote:
Hi All, How do I index complex JSON data in SOLR? For example, {"prices":[{"state":"CA","price":101.0},{"state":"NJ","price":102.0},{"state":"CO","price":102.0}]} It's simple in ElasticSearch, but in SOLR it always reports the following error: Error parsing JSON field value. Unexpected OBJECT_START Thanks, DL
Index complex JSON data in SOLR
Hi All,

How do I index complex JSON data in SOLR? For example:

{"prices":[{"state":"CA","price":101.0},{"state":"NJ","price":102.0},{"state":"CO","price":102.0}]}

It's simple in ElasticSearch, but in SOLR it always reports the following error:

Error parsing JSON field value. Unexpected OBJECT_START

Thanks,
DL
Re: Index complex JSON data in SOLR
It's simple in Elasticsearch, but what you actually get is a single document in which all the child ({state, price}) entries are joined together behind the scenes into multivalued fields. That may or may not be an issue for you.

In Solr, nested documents have to be indexed as separate parent and child documents, and the syntax is a bit more explicit. So you can either provide more explicit JSON:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-NestedChildDocuments
or transform the JSON document before mapping it to the Solr schema (latest Solr, 4.10):
http://lucidworks.com/blog/indexing-custom-json-data/

Regards,
Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On 15 November 2014 17:05, David Lee seek...@gmail.com wrote:
Hi All, How do I index complex JSON data in SOLR? For example, {"prices":[{"state":"CA","price":101.0},{"state":"NJ","price":102.0},{"state":"CO","price":102.0}]} It's simple in ElasticSearch, but in SOLR it always reports the following error: Error parsing JSON field value. Unexpected OBJECT_START Thanks, DL
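To make the flattening point concrete, here is a minimal Python sketch. It is not Elasticsearch's actual code; the `flatten` and `matches` helpers and the dotted field names are assumptions made purely for illustration. It shows how collapsing the `prices` array into multivalued fields loses the pairing between a state and its price.

```python
import json


def flatten(doc, prefix=""):
    """Collapse nested objects into dotted, multivalued fields (illustrative only)."""
    fields = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Merge the child's fields into the parent's multivalued fields.
                for sub_name, sub_values in flatten(item, name + ".").items():
                    fields.setdefault(sub_name, []).extend(sub_values)
            else:
                fields.setdefault(name, []).append(item)
    return fields


def matches(flat, state, price):
    """Hypothetical 'state AND price' query evaluated against the flattened fields."""
    return state in flat["prices.state"] and price in flat["prices.price"]


doc = {"prices": [{"state": "CA", "price": 101.0},
                  {"state": "NJ", "price": 102.0},
                  {"state": "CO", "price": 102.0}]}

flat = flatten(doc)
print(json.dumps(flat))

# No single entry has state CA together with price 102.0, yet the
# flattened document still satisfies both conditions.
print(matches(flat, "CA", 102.0))
```

If that kind of cross-matching is acceptable for the searches you need, the flattened form may be fine; if not, the entries need to stay separate, which is what the parent/child approach is for.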
Re: Index complex JSON data in SOLR
Thanks Alex. I took a look at the approach of transforming the JSON document before mapping it to the Solr schema, described at http://lucidworks.com/blog/indexing-custom-json-data/ . It's a workaround. But in my case, if every state has its own price, the number of documents that need to be indexed will increase 50 times, which may have a negative impact on performance, etc.

{"prices":[{"state":"CA","price":101.0},{"state":"NJ","price":102.0},{"state":"CO","price":102.0}]}

Is there a better solution?

Thanks,
DL

On Sat, Nov 15, 2014 at 2:17 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:
It's simple in Elasticsearch, but what you actually get is a single document in which all the child ({state, price}) entries are joined together behind the scenes into multivalued fields. That may or may not be an issue for you. In Solr, nested documents have to be indexed as separate parent and child documents, and the syntax is a bit more explicit. So you can either provide more explicit JSON: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-NestedChildDocuments or transform the JSON document before mapping it to the Solr schema: http://lucidworks.com/blog/indexing-custom-json-data/ (latest Solr, 4.10). Regards, Alex.

--
SeekWWW: the Search Engine of Choice www.seekwww.com
Re: Index complex JSON data in SOLR
The first link shows how to create children with specific content, but you need to use _childDocuments_:... explicitly instead of prices:, and perhaps add type: price or some such to differentiate record types.

But I am not quite following why you say it will increase 50 times. Compared to what? How did you want the child documents to be stored/found (in Elasticsearch or Solr)?

One way to think through this problem is to be explicit about what the _search_ would look like and then adjust the indexing accordingly.

Regards,
Alex.

On 15 November 2014 18:24, David Lee seek...@gmail.com wrote:
Thanks Alex. I took a look at the approach of transforming the JSON document before mapping it to the Solr schema, described at http://lucidworks.com/blog/indexing-custom-json-data/ . It's a workaround. But in my case, if every state has its own price, the number of documents that need to be indexed will increase 50 times, which may have a negative impact on performance, etc. {"prices":[{"state":"CA","price":101.0},{"state":"NJ","price":102.0},{"state":"CO","price":102.0}]} Is there a better solution? Thanks, DL
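As a rough illustration of the explicit form, the Python sketch below rewrites the original `prices` document into a parent document with `_childDocuments_`. The `id` values, the `type` field, and the `to_nested_solr_doc` helper are assumptions for the example; the real field names depend on your schema (each child typically needs its own unique key).

```python
import json


def to_nested_solr_doc(product_id, doc):
    """Turn {"prices": [...]} into a parent doc with explicit child docs.

    Field names ("type", "state", "price") and the id scheme are
    hypothetical; adjust them to match your Solr schema.
    """
    children = []
    for entry in doc["prices"]:
        children.append({
            "id": f"{product_id}-{entry['state']}",  # assumed unique key per child
            "type": "price",                         # distinguishes record types
            "state": entry["state"],
            "price": entry["price"],
        })
    return {"id": product_id, "type": "product", "_childDocuments_": children}


source = {"prices": [{"state": "CA", "price": 101.0},
                     {"state": "NJ", "price": 102.0},
                     {"state": "CO", "price": 102.0}]}

solr_doc = to_nested_solr_doc("p1", source)

# The JSON update handler accepts a list of documents as the request body.
print(json.dumps([solr_doc], indent=2))
```

With documents indexed this way, a block join query along the lines of `{!parent which="type:product"}state:CA` should return the parent products that have a matching child, assuming the `type` field is used as shown here.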
Re: Index complex JSON data in SOLR
Assume that we are selling a product online in 50 states in the USA, but each state has its own price. Although the base product information is the same, the index size will increase 50 times if we index it that way. The usage is similar to searching for a product, but based on the location of the user (e.g., which state the user is from) we may show a different price.

On Sat, Nov 15, 2014 at 3:40 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:
The first link shows how to create children with specific content, but you need to use _childDocuments_:... explicitly instead of prices:, and perhaps add type: price or some such to differentiate record types. But I am not quite following why you say it will increase 50 times. Compared to what? How did you want the child documents to be stored/found (in Elasticsearch or Solr)? One way to think through this problem is to be explicit about what the _search_ would look like and then adjust the indexing accordingly. Regards, Alex.
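A back-of-the-envelope comparison may help separate "50 times more documents" from "50 times more index". The Python sketch below uses made-up byte sizes (2000 bytes of base product data, 30 bytes per state/price entry) purely as assumptions; it only compares raw payload sizes and says nothing about how Solr actually lays out its index.

```python
def denormalized_bytes(base_bytes, price_bytes, states):
    """Full product copied once per state, each copy carrying one price."""
    return states * (base_bytes + price_bytes)


def nested_bytes(base_bytes, price_bytes, states):
    """One parent with the base data plus one small child per state."""
    return base_bytes + states * price_bytes


# Hypothetical sizes, for illustration only.
BASE_BYTES = 2000
PRICE_BYTES = 30
STATES = 50

copied = denormalized_bytes(BASE_BYTES, PRICE_BYTES, STATES)
nested = nested_bytes(BASE_BYTES, PRICE_BYTES, STATES)

print(f"copied per state: {copied} bytes, {STATES} documents per product")
print(f"parent/child:     {nested} bytes, {STATES + 1} documents per product")
```

Under these assumptions the document count grows in both layouts, but only the copy-per-state layout repeats the base product data 50 times.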
Re: Index complex JSON data in SOLR
It sounds to me like you are not actually searching on the state or price. So does it make sense to store them in Solr? Maybe that data should stay in an external database and you merge it in. Or store (but do not index) the JSON as a plain text field and parse what you need out of it manually, as you would with Elasticsearch.

But if you want to store states/prices separately in Solr, then you do have to pay the price somehow, right? And 50 times more documents may not actually have any impact on your performance. Solr scales really well, especially if you don't need to display some fields, because tokens in stored=false/indexed=true fields are only stored once.

Regards,
Alex.

On 15 November 2014 18:53, David Lee seek...@gmail.com wrote:
Assume that we are selling a product online in 50 states in the USA, but each state has its own price. Although the base product information is the same, the index size will increase 50 times if we index it that way. The usage is similar to searching for a product, but based on the location of the user (e.g., which state the user is from) we may show a different price.
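The "store the JSON as text and parse it yourself" option could look roughly like the Python sketch below on the client side. The `prices_txt` field name, the shape of the search hit, and the `price_for_state` helper are assumptions for illustration; the point is only that the per-state lookup happens after the search, outside Solr.

```python
import json


def price_for_state(stored_prices_json, state, default=None):
    """Pick the price for one state out of the stored JSON text."""
    for entry in json.loads(stored_prices_json):
        if entry["state"] == state:
            return entry["price"]
    return default


# A hypothetical search hit: the prices travel as an opaque stored string.
hit = {
    "id": "p1",
    "name": "Widget",
    "prices_txt": json.dumps([{"state": "CA", "price": 101.0},
                              {"state": "NJ", "price": 102.0},
                              {"state": "CO", "price": 102.0}]),
}

user_state = "NJ"
print(hit["name"], price_for_state(hit["prices_txt"], user_state))
```

This keeps one Solr document per product, at the cost of not being able to search or sort by price inside Solr.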
Re: Index complex JSON data in SOLR
You can take Solr 4.* and just apply my fix. Store the JSON stringified in a string field (make sure the field name ends in _json). Then you can output it with: wt=json&json.fsuffix=_json

OK? Use SOLR-4685.

On Sat, Nov 15, 2014 at 5:07 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:
It sounds to me like you are not actually searching on the state or price. So does it make sense to store them in Solr? Maybe that data should stay in an external database and you merge it in. Or store (but do not index) the JSON as a plain text field and parse what you need out of it manually, as you would with Elasticsearch. But if you want to store states/prices separately in Solr, then you do have to pay the price somehow, right? And 50 times more documents may not actually have any impact on your performance. Solr scales really well, especially if you don't need to display some fields, because tokens in stored=false/indexed=true fields are only stored once. Regards, Alex.

--
Bill Bell
billnb...@gmail.com
cell 720-256-8076
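On the indexing side, the stringified-field idea might be sketched in Python as below. The `prices_json` field name and the `to_solr_doc` helper are assumptions; the `json.fsuffix` parameter is taken from the message above and is only available with the SOLR-4685 patch applied, so treat the query string as illustrative rather than something stock Solr understands.

```python
import json
from urllib.parse import urlencode


def to_solr_doc(product_id, doc):
    """Keep the nested prices as one string field whose name ends in _json."""
    return {
        "id": product_id,
        # Compact separators keep the stored string small.
        "prices_json": json.dumps(doc["prices"], separators=(",", ":")),
    }


source = {"prices": [{"state": "CA", "price": 101.0},
                     {"state": "NJ", "price": 102.0},
                     {"state": "CO", "price": 102.0}]}

solr_doc = to_solr_doc("p1", source)
print(json.dumps([solr_doc]))

# Query parameters as described in the message; json.fsuffix assumes the
# SOLR-4685 patch is applied to the server.
params = urlencode({"q": "id:p1", "wt": "json", "json.fsuffix": "_json"})
print(params)
```

Without the patch, the field would simply come back as an escaped string, and the client would need to call `json.loads` on it.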
Re: Index complex JSON data in SOLR
Thanks Alex and William for the suggestions. I'll try out the approach of storing the JSON string.

On Sat, Nov 15, 2014 at 5:27 PM, William Bell billnb...@gmail.com wrote:
You can take Solr 4.* and just apply my fix. Store the JSON stringified in a string field (make sure the field name ends in _json). Then you can output it with: wt=json&json.fsuffix=_json OK? Use SOLR-4685.