Re: Index complex JSON data in SOLR

2014-11-20 Thread Renaud Delbru

Hi David,

you might want to look at SIREn 1.4 [1], a plugin for Lucene/Solr that
includes an update handler [2] which mimics the Elasticsearch index API. You
can push JSON documents to the API and it will dynamically flatten and
index them into a set of fields (similar to Elasticsearch). It also
indexes the full JSON into a SIREn field to support nested queries.
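To illustrate what "dynamically flatten" means, here is a minimal Python sketch that turns the nested example into dotted, multivalued field names. The dotted naming is an assumption for illustration only. It is not SIREn's actual field-naming scheme or API.

```python
import json
from typing import Any, Dict, List


def flatten(doc: Dict[str, Any]) -> Dict[str, List[Any]]:
    """Flatten nested JSON into dotted field names holding multivalued lists.

    Objects add a path segment; array elements are merged into the same
    field, so the pairing between sibling values is not preserved.
    """
    fields: Dict[str, List[Any]] = {}

    def walk(value: Any, path: str) -> None:
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            for item in value:
                walk(item, path)  # arrays do not add a path segment
        else:
            fields.setdefault(path, []).append(value)

    walk(doc, "")
    return fields


doc = json.loads('{"prices": [{"state": "CA", "price": 101.0},'
                 ' {"state": "NJ", "price": 102.0},'
                 ' {"state": "CO", "price": 102.0}]}')
print(flatten(doc))
# {'prices.state': ['CA', 'NJ', 'CO'], 'prices.price': [101.0, 102.0, 102.0]}
```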


[1] http://siren.solutions/siren/downloads/
[2] http://siren.solutions/manual/solr-configuration-update-handler.html

--
Renaud Delbru



Index complex JSON data in SOLR

2014-11-15 Thread David Lee
Hi All,

How do I index complex JSON data in Solr? For example,

{"prices": [{"state": "CA", "price": 101.0}, {"state": "NJ",
"price": 102.0}, {"state": "CO", "price": 102.0}]}


It's simple in Elasticsearch, but Solr always reports the following
error:
Error parsing JSON field value. Unexpected OBJECT_START


Thanks,
DL


Re: Index complex JSON data in SOLR

2014-11-15 Thread Alexandre Rafalovitch
It's simple in Elasticsearch, but what you actually get is a single
document, and all its child data entries ({state, price}) are joined
together behind the scenes into multivalued fields. That may or may
not be an issue for you.
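A small sketch of why that joining can matter: once the child entries are merged into independent multivalued fields, a search for CA at 102.0 matches even though no single entry has that combination. This only simulates the matching logic in plain Python; it does not call Elasticsearch or Solr, and the field names are illustrative.

```python
from typing import Any, Dict, List

# The example document after merging into multivalued fields
# (the dotted field names are illustrative).
flat: Dict[str, List[Any]] = {
    "prices.state": ["CA", "NJ", "CO"],
    "prices.price": [101.0, 102.0, 102.0],
}

# The same data kept as separate child records.
children: List[Dict[str, Any]] = [
    {"state": "CA", "price": 101.0},
    {"state": "NJ", "price": 102.0},
    {"state": "CO", "price": 102.0},
]


def matches_flat(state: str, price: float) -> bool:
    # Each condition is checked against its own field independently.
    return state in flat["prices.state"] and price in flat["prices.price"]


def matches_children(state: str, price: float) -> bool:
    # Both conditions must hold within the same child record.
    return any(c["state"] == state and c["price"] == price for c in children)


print(matches_flat("CA", 102.0))      # True: a false match
print(matches_children("CA", 102.0))  # False: CA costs 101.0
```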

For Solr, nested documents need to be separate parent/child documents,
and the syntax is a bit more explicit. So you can either provide more
explicit JSON:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-NestedChildDocuments

or transform the JSON document before mapping it to the Solr schema:
http://lucidworks.com/blog/indexing-custom-json-data/ (requires the latest Solr, 4.10).
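For the first option, a minimal Python sketch of a nested payload that uses the "_childDocuments_" key might look like this. The collection name `products`, the URL, and the dynamic-field suffixes (`_s`, `_f`, `_t`, as in the Solr 4.x example schema) are assumptions. The request is built but not sent.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Hypothetical Solr URL and collection name; adjust for your installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/products/update?commit=true"


def to_solr_block(product_id: str, name: str,
                  prices: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Build one parent document with a child document per state price."""
    children = [
        {
            "id": f"{product_id}-{p['state']}",  # children need unique ids
            "type_s": "price",                   # marks child records
            "state_s": p["state"],
            "price_f": p["price"],
        }
        for p in prices
    ]
    parent = {
        "id": product_id,
        "type_s": "product",                     # marks parent records
        "name_t": name,
        "_childDocuments_": children,
    }
    return [parent]


def build_request(docs: List[Dict[str, Any]]) -> urllib.request.Request:
    """Wrap the documents in a JSON POST to the update handler."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        SOLR_UPDATE_URL, data=body,
        headers={"Content-Type": "application/json"})


prices = [{"state": "CA", "price": 101.0},
          {"state": "NJ", "price": 102.0},
          {"state": "CO", "price": 102.0}]
docs = to_solr_block("p1", "Example product", prices)
req = build_request(docs)
# urllib.request.urlopen(req) would send it to a running Solr instance.
```

The explicit `type_s` marker is what later lets a query tell parent records from child records.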

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853




Re: Index complex JSON data in SOLR

2014-11-15 Thread David Lee
Thanks Alex. I took a look at the approach of transforming the JSON document
before mapping it to the Solr schema at
http://lucidworks.com/blog/indexing-custom-json-data/ .

It's a workaround. But in my case, if every state has its own price,
the number of documents that need to be indexed will increase 50 times, which
may have a negative impact on performance, etc.

{"prices": [{"state": "CA", "price": 101.0}, {"state": "NJ",
"price": 102.0}, {"state": "CO", "price": 102.0}]}

Is there any other better solution?
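To make the 50x concern concrete, the sketch below emulates in plain Python what splitting each product record on its "prices" array produces: one document per state, each repeating the base product fields. It does not call Solr. The per-document id scheme and the field names are hypothetical, chosen so the split documents would not overwrite each other.

```python
from typing import Any, Dict, List


def split_on_prices(product: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Emulate splitting one product record on its "prices" array.

    Every array element becomes its own document and the remaining
    top-level fields are copied into each one.
    """
    base = {k: v for k, v in product.items() if k not in ("id", "prices")}
    return [
        {**base, **entry,
         "id": f"{product['id']}-{entry['state']}",  # hypothetical id scheme
         "product_id": product["id"]}
        for entry in product["prices"]
    ]


# Hypothetical product with one price per state (50 entries).
states = [f"S{i:02d}" for i in range(50)]
product = {
    "id": "p1",
    "name": "Example product",
    "description": "Long text that is repeated in every split document",
    "prices": [{"state": s, "price": 100.0 + i}
               for i, s in enumerate(states)],
}

docs = split_on_prices(product)
print(len(docs))  # 50 documents for a single product
```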

Thanks,
DL


-- 
SeekWWW: the Search Engine of Choice
www.seekwww.com


Re: Index complex JSON data in SOLR

2014-11-15 Thread Alexandre Rafalovitch
The first link shows how to create children with specific content, but
you need to use "_childDocuments_": ... explicitly instead of
"prices":, and perhaps add "type": "price" or some such to differentiate
record types.

But I am not quite following why you say it will increase 50 times. By
comparison to what? How did you want the children documents to be
stored/found (in Elasticsearch or Solr)?

One way to think through this problem is to be explicit about what the
_search_ would look like and then adjust indexing accordingly.
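Following that advice, one way to write the search for a parent/child layout is a block join query with the `{!parent}` query parser: find products that have a price child for the user's state. The field names (`type_s`, `state_s`, `price_f`) are hypothetical, and the sketch only builds the query string.

```python
from urllib.parse import parse_qs, urlencode


def parent_query(state: str, max_price: float) -> str:
    """Return products that have a price child for `state` at or below
    `max_price`, using the block join parent query parser."""
    # `which` must match all parent documents; the clause after the local
    # params is the child query.
    child_clause = f"+state_s:{state} +price_f:[* TO {max_price}]"
    params = {
        "q": '{!parent which="type_s:product"}' + child_clause,
        "wt": "json",
    }
    return urlencode(params)


query = parent_query("CA", 101.5)
print(parse_qs(query)["q"][0])
# {!parent which="type_s:product"}+state_s:CA +price_f:[* TO 101.5]
```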


Regards,
Alex.


Re: Index complex JSON data in SOLR

2014-11-15 Thread David Lee
Assume that we are selling a product online to 50 states in the USA, but
each state has its own price. Although the base product information is the
same, the index size will increase 50 times if we index that way.

The usage is similar to searching for a product, but based on the location of
the user (e.g., which state the user is from), we may show a different
price.



Re: Index complex JSON data in SOLR

2014-11-15 Thread Alexandre Rafalovitch
It sounds to me like you are not actually searching on the state or
price. So, does it make sense to store it in Solr at all? Maybe it should
stay in an external database and you merge it in. Or store (not index) that
JSON as a plain text field and parse what you need out of it manually, as
you would with Elasticsearch.
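For the store-but-don't-index option, the per-state lookup moves to the client. A minimal Python sketch, assuming a hypothetical stored-only field `prices_json` that holds the raw array as a string:

```python
import json
from typing import Any, Dict, Optional


def price_for_state(doc: Dict[str, Any], state: str) -> Optional[float]:
    """Pick the user's state price out of a stored (not indexed) JSON string.

    `prices_json` is a hypothetical stored-only field holding the raw array.
    """
    raw = doc.get("prices_json")
    if raw is None:
        return None
    for entry in json.loads(raw):
        if entry.get("state") == state:
            return entry.get("price")
    return None


# A search hit as it might come back from Solr (field names hypothetical).
hit = {
    "id": "p1",
    "name_t": "Example product",
    "prices_json": '[{"state": "CA", "price": 101.0},'
                   ' {"state": "NJ", "price": 102.0}]',
}
print(price_for_state(hit, "NJ"))  # 102.0
print(price_for_state(hit, "TX"))  # None
```

Only the product fields are indexed here, so the index holds one document per product; the trade-off is that state and price cannot be searched or filtered.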

But if you want to store states/prices separately in Solr, then you do
have to pay the price somehow, right? And 50 times more documents may
not actually have any impact on your performance. Solr scales really
well, especially if you don't need to display some fields, because
tokens in stored=false/indexed=true fields are only stored once.

Regards,
Alex.


Re: Index complex JSON data in SOLR

2014-11-15 Thread William Bell
You can take any Solr 4.x release and just apply my fix.

Store the stringified JSON in a string field (make sure the field name ends
in _json). Then you can output it with: wt=json&json.fsuffix=_json

OK?

Use SOLR-4685.
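A rough Python sketch of the indexing side and the request parameters for this approach. The field name `prices_json` is hypothetical (it only follows the `_json` suffix rule above). `json.fsuffix` presumably only has an effect with the SOLR-4685 patch applied, as described above; on a stock Solr, a stored string field comes back as an escaped JSON string.

```python
import json
from urllib.parse import urlencode

prices = [{"state": "CA", "price": 101.0},
          {"state": "NJ", "price": 102.0},
          {"state": "CO", "price": 102.0}]

# Store the stringified JSON in a string field whose name ends in "_json".
# "prices_json" is a hypothetical name; it must map to a string field type.
doc = {"id": "p1", "prices_json": json.dumps(prices)}

# Request parameters from the message above; json.fsuffix assumes the
# SOLR-4685 patch is applied to the Solr instance.
params = urlencode({"q": "id:p1", "wt": "json", "json.fsuffix": "_json"})
print(params)  # q=id%3Ap1&wt=json&json.fsuffix=_json
```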




-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Index complex JSON data in SOLR

2014-11-15 Thread David Lee
Thanks Alex and William for the suggestions. I'll try out the approach of
storing the JSON string.
