Best approach for flattening data or using nested lists.

2018-02-04 Thread David Lee
I have a project where I am working with nested data (not that deep, but 
multiple lists) and would love to get some advice from other experienced 
developers. I've read most of the books on Solr (including Solr in 
Action), and though they provide good (if dated) information on the 
actual indexing mechanism, few of them address this issue in much depth.


If there are other resources that aren't necessarily Solr specific that 
can help here, please feel free to point those out.


Here is the structure I'm working with. I've made it generic to simplify 
things, but it is representative of the real data.


{
    "id": 1,
    "_type": "book",
    "name": "My Martian",
    "genre": "Science Fiction",
    "edits": [
        {
            "_type": "book_action",
            "action": "Modify",
            "chapter": 3,
            "description": "Corrected spelling for interstellar"
        },
        {
            "_type": "book_action",
            "action": "Removal",
            "chapter": 24,
            "description": "Removed chapter as it adds no value to the story"
        }
    ],
    "chapters": [
        {
            "_type": "book_chapter",
            "chapter_number": 1,
            "chapter_title": "The Test"
        },
        {
            "_type": "book_chapter",
            "chapter_number": 2,
            "chapter_title": "The Next Test"
        }
    ]
}

My first attempt was to just add both lists through SolrJ (this can't be 
done with the JSON interface, since it doesn't allow multiple 
_childDocuments_ arrays at the same level). That works, and I'm able to 
use the _type value to distinguish between them. However, my problem is 
that the users want to be able to search on any field at the top level 
of the data as well as within the lists. For example (using SQL for 
clarity only):


select * from book_index where genre = "Science Fiction" and action = 
"Removal" and chapter_number = 2;


The problem I'm having with this sort of search is that, based on what I 
know, the {!child ...} and {!parent ...} query parsers won't give me 
access to all fields like this.
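For what it's worth, the combined search described above can be approximated with one block-join filter per child clause. Here is a rough sketch that only builds the request parameters; the field names come from the generic book example, and the URL is an assumption that has not been run against a live server:

```python
from urllib.parse import urlencode

# Sketch: combine a parent-level constraint with child-level constraints by
# using one block-join filter query per child clause.  Each fq matches parent
# docs that have at least one child satisfying its clause; note the two fq
# clauses may be satisfied by *different* children, which is the usual caveat.
params = [
    ("q", 'genre:"Science Fiction"'),
    ("fq", '{!parent which="_type:book"}action:Removal'),
    ("fq", '{!parent which="_type:book"}chapter_number:2'),
    ("fl", "*,[child parentFilter=_type:book]"),  # return children with parents
]
query_string = urlencode(params)
url = "http://localhost:8983/solr/book_index/select?" + query_string
```

If the "same child must satisfy all clauses" semantics are needed, the child clauses have to be combined inside a single block-join filter instead.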


I've looked at flattening the data similar to the following:

{
    "id": 1,
    "name": "My Martian",
    "genre": "Science Fiction",
    "edit_action_3": {
        "action": "Modify",
        "chapter": 3,
        "description": "Corrected spelling for interstellar"
    },
    "edit_action_24": {
        "action": "Removal",
        "chapter": 24,
        "description": "Removed chapter as it adds no value to the story"
    },
    "chapter_1": {
        "chapter_number": 1,
        "chapter_title": "The Test"
    },
    "chapter_2": {
        "chapter_number": 2,
        "chapter_title": "The Next Test"
    }
}

This does flatten things out so that the above query would be able to 
search on any field, but it's a real kludge and makes it nearly 
impossible to get just a list of chapters or actions.
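An alternative flattening that avoids per-item field names is to collapse each list into parallel multivalued fields. A rough sketch, with a field-naming convention invented purely for illustration:

```python
# Sketch: collapse each nested list into parallel multivalued fields
# (edits_action, chapters_chapter_title, ...) instead of numbered objects,
# so "all chapters" is a single field access.  The naming convention here
# is invented for illustration, not a Solr requirement.
def flatten(doc, list_fields):
    flat = {k: v for k, v in doc.items() if k not in list_fields}
    for list_name in list_fields:
        for child in doc.get(list_name, []):
            for key, value in child.items():
                if key == "_type":  # discriminator not needed once flattened
                    continue
                flat.setdefault(f"{list_name}_{key}", []).append(value)
    return flat

book = {
    "id": 1, "name": "My Martian", "genre": "Science Fiction",
    "edits": [{"_type": "book_action", "action": "Modify", "chapter": 3}],
    "chapters": [{"_type": "book_chapter", "chapter_number": 1,
                  "chapter_title": "The Test"}],
}
flat = flatten(book, ["edits", "chapters"])
```

The trade-off is that correlation between, say, edits_action and edits_chapter is only positional, so a query like action:Removal AND chapter:24 can cross-match values from different edits.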


So, does anyone have any thoughts? (FYI, this is my first Solr project, so 
I'm really starting from scratch here.)


Thanks




---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus



Re: Having trouble indexing nested docs using "split" feature.

2017-12-02 Thread David Lee

Sorry about the formatting for the first part, hope this is clearer:

{
    "book_id": "1234",
    "book_title": "The Martian Chronicles",
    "author": "Ray Bradbury",
    "reviews": [
        {
            "reviewer": "John Smith",
            "reviewer_background": {
                "highest_rank": "Excellent",
                "latest_review": "10/15/2017 10:15:00.000 CST"
            }
        },
        {
            "reviewer": "Adam Smith",
            "reviewer_background": {
                "highest_rank": "Good",
                "latest_review": "10/10/2017 16:18:00.000 CST"
            }
        }
    ],
    "checkouts": [
        {
            "member_id": "aaabbbccc",
            "member_name": "Sam Jackson"
        },
        {
            "member_id": "bbbcccddd",
            "member_name": "Buddy Jones"
        }
    ]
}


On 12/2/2017 1:55 PM, David Lee wrote:

Hi all,

I've been trying for some time now to find a suitable way to deal with 
json documents that have nested data. By suitable, I mean being able 
to index them and retrieve them so that they are in the same structure 
as when indexed.


I'm using version 7.1 under linux Mint 18.3 with Oracle Java 
1.8.0_151. After untarring the distribution, I ran through the 
"getting started" tutorial from the reference manual where it had me 
create the techproducts index. I then created another collection 
called my_collection so I could run the examples more easily. It used 
the _default schema.


Here is a sample:

{
    "book_id": "1234",
    "book_title": "The Martian Chronicles",
    "author": "Ray Bradbury",
    "reviews": [
        {
            "reviewer": "John Smith",
            "reviewer_background": {
                "highest_rank": "Excellent",
                "latest_review": "10/15/2017 10:15:00.000 CST"
            }
        },
        {
            "reviewer": "Adam Smith",
            "reviewer_background": {
                "highest_rank": "Good",
                "latest_review": "10/10/2017 16:18:00.000 CST"
            }
        }
    ],
    "checkouts": [
        {
            "member_id": "aaabbbccc",
            "member_name": "Sam Jackson"
        },
        {
            "member_id": "bbbcccddd",
            "member_name": "Buddy Jones"
        }
    ]
}


Obviously, I'll need to search at the parent level and child level. I 
started experimenting and tried to use one of the examples from 
"Transforming and Indexing Solr JSON". However, when I tried the first 
example as follows:


curl 'http://localhost:8983/solr/my_collection/update/json/docs'\
'?split=/exams'\
'&f=first:/first'\
'&f=last:/last'\
'&f=grade:/grade'\
'&f=subject:/exams/subject'\
'&f=test:/exams/test'\
'&f=marks:/exams/marks'\
  -H 'Content-type:application/json' -d '
{
   "first": "John",
   "last": "Doe",
   "grade": 8,
   "exams": [
 {
   "subject": "Maths",
   "test"   : "term1",
   "marks"  : 90},
 {
   "subject": "Biology",
   "test"   : "term1",
   "marks"  : 86}
   ]
}'

{
  "responseHeader":{
    "status":0,
    "QTime":798}}

Though the status indicates there was no error, when I try to query the 
data using *:*, I get this:


curl 'http://localhost:8983/solr/my_collection/select?q=*:*'
{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":6,
    "params":{
  "q":"*:*"}},
  "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
  }}

So it looks like no documents were actually indexed from above. I'm 
trying to determine if this is due to an error in the reference 
manual, or if I haven't set up Solr correctly.


I've tried other techniques (not using the split option) like from 
Yonik's site, but those are slightly dated and I was hoping there was 
a more practical approach with the release of Solr 7.
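For reference, the mangled curl command above maps to the parameters below. Building the query string programmatically sidesteps shell quoting, and adding commit=true rules out the most common cause of numFound:0 after a status-0 update. This is a sketch only; it has not been run against this setup:

```python
from urllib.parse import urlencode

# Sketch: the split/f parameters from the ref-guide example, built
# programmatically.  commit=true is added because an uncommitted update
# returns status 0 but leaves numFound at 0 -- a plausible cause of the
# symptom described above, though not verified here.
params = [
    ("split", "/exams"),
    ("f", "first:/first"),
    ("f", "last:/last"),
    ("f", "grade:/grade"),
    ("f", "subject:/exams/subject"),
    ("f", "test:/exams/test"),
    ("f", "marks:/exams/marks"),
    ("commit", "true"),
]
url = ("http://localhost:8983/solr/my_collection/update/json/docs?"
       + urlencode(params))
```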


Any assistance would be appreciated.

Thank you.









Having trouble indexing nested docs using "split" feature.

2017-12-02 Thread David Lee

Hi all,

I've been trying for some time now to find a suitable way to deal with 
json documents that have nested data. By suitable, I mean being able to 
index them and retrieve them so that they are in the same structure as 
when indexed.


I'm using version 7.1 under linux Mint 18.3 with Oracle Java 1.8.0_151. 
After untarring the distribution, I ran through the "getting started" 
tutorial from the reference manual where it had me create the 
techproducts index. I then created another collection called 
my_collection so I could run the examples more easily. It used the 
_default schema.


Here is a sample:

{
    "book_id": "1234",
    "book_title": "The Martian Chronicles",
    "author": "Ray Bradbury",
    "reviews": [
        {
            "reviewer": "John Smith",
            "reviewer_background": {
                "highest_rank": "Excellent",
                "latest_review": "10/15/2017 10:15:00.000 CST"
            }
        },
        {
            "reviewer": "Adam Smith",
            "reviewer_background": {
                "highest_rank": "Good",
                "latest_review": "10/10/2017 16:18:00.000 CST"
            }
        }
    ],
    "checkouts": [
        {
            "member_id": "aaabbbccc",
            "member_name": "Sam Jackson"
        },
        {
            "member_id": "bbbcccddd",
            "member_name": "Buddy Jones"
        }
    ]
}


Obviously, I'll need to search at the parent level and child level. I 
started experimenting and tried to use one of the examples from 
"Transforming and Indexing Solr JSON". However, when I tried the first 
example as follows:


curl 'http://localhost:8983/solr/my_collection/update/json/docs'\
'?split=/exams'\
'&f=first:/first'\
'&f=last:/last'\
'&f=grade:/grade'\
'&f=subject:/exams/subject'\
'&f=test:/exams/test'\
'&f=marks:/exams/marks'\
  -H 'Content-type:application/json' -d '
{
   "first": "John",
   "last": "Doe",
   "grade": 8,
   "exams": [
 {
   "subject": "Maths",
   "test"   : "term1",
   "marks"  : 90},
 {
   "subject": "Biology",
   "test"   : "term1",
   "marks"  : 86}
   ]
}'

{
  "responseHeader":{
    "status":0,
    "QTime":798}}

Though the status indicates there was no error, when I try to query the 
data using *:*, I get this:


curl 'http://localhost:8983/solr/my_collection/select?q=*:*'
{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":6,
    "params":{
  "q":"*:*"}},
  "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
  }}

So it looks like no documents were actually indexed from above. I'm 
trying to determine if this is due to an error in the reference manual, 
or if I haven't set up Solr correctly.


I've tried other techniques (not using the split option) like from 
Yonik's site, but those are slightly dated and I was hoping there was a 
more practical approach with the release of Solr 7.


Any assistance would be appreciated.

Thank you.







Re: How to handle nested documents in solr (SolrJ)

2017-05-24 Thread David Lee

Hi Rick,

Adding to this subject, I do appreciate you pointing us to these 
articles, but I'm curious about how much of these take into account the 
latest versions of Solr (ie: +6.5 and 7) given the JSON split 
capabilities, etc. I know that is just on the indexing side so the 
searches may be the same but things are changing quickly these days (not 
a bad thing).


Thanks,

David


On 5/24/2017 4:26 AM, Rick Leir wrote:

Prasad,

Gee, you get confusion from a google search for:

nested documents 
site:mail-archives.apache.org/mod_mbox/lucene-solr-user/





But my recent posting might help: " Yonick has some good blogs on this."

And Mikhail has an excellent blog:

https://blog.griddynamics.com/how-to-use-block-join-to-improve-search-efficiency-with-nested-documents-in-solr 



cheers -- Rick

On 2017-05-24 02:53 AM, prasad chowdary wrote:

Dear All,

I have a requirement that I need to index the documents in solr using 
Java

code.

Each document contains sub-documents like below (it's just for
understanding my question).


student id : 123
student name : john
marks :
maths: 90
English :95

student id : 124
student name : rack
marks :
maths: 80
English :96

etc...

So, as shown above, each document contains one child document, i.e. marks.

Actually, I don't need any joins or anything. My requirement is:

if I query "English:95", it should return the complete document, i.e. the
child along with the parent, like below

student id : 123
student name : john
marks :
maths: 90
English :95

and also if I query "student id:123", it should return the whole document,
same as above.

Currently I am able to get the child along with the parent for a child
match by using the extendedResults option.

But I am not able to get the child for a parent match.
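For the parent-match direction, the [child] document transformer is the usual tool: it attaches matching child documents to each returned parent. A sketch of the request parameters; the doc_type:parent marker field is an assumption, since the schema described above has no such field yet and one would need to be added at index time:

```python
from urllib.parse import urlencode

# Sketch: return child documents along with a matched parent using the
# [child] transformer.  The doc_type:parent marker field is an assumption,
# not something present in the question's data.
params = [
    ("q", "student_id:123"),
    ("fl", "*,[child parentFilter=doc_type:parent]"),
]
query_string = urlencode(params)
```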









Re: Reload an unloaded core

2017-05-02 Thread David Lee

I have similar needs but for a slightly different use-case.

In my case, I am breaking up cores / indexes based on the month and year 
so that I can add an alias that always points to the last few months, 
but beyond that I want to simply unload the other indexes once they get 
past a few months old. The indexes will remain on disk but I simply 
don't want my queries to have to go through the older "archived" documents.


However, users will occasionally need to have those indexes reloaded for 
research reasons so what I was doing in ES was simply re-loading all of 
the indexes that fit within the range being searched for and added those 
to an alias (let's call it "archived", for example). Once they are 
finished querying on that older data, I again unload those indexes and 
remove the alias.


From what I'm reading in this thread, this isn't quite as 
straightforward in Solr, so I'm looking for other options.


Thanks,
David

On 5/2/2017 5:04 PM, Shashank Pedamallu wrote:

Thank you Simon, Erick and Shawn for your replies. Unfortunately, restarting 
Solr is not a option for me. So, I’ll try to follow the steps given by Shawn to 
see where I’m standing. Btw, I’m using Solr 6.4.2.

Shawn, once again thank you very much for the detailed reply.

Thanks,
Shashank Pedamallu







On 5/2/17, 2:51 PM, "Shawn Heisey"  wrote:


On 5/2/2017 10:53 AM, Shashank Pedamallu wrote:

I want to unload a core from Solr without deleting data-dir or instance-dir. 
I’m performing some operations on the data-dir after this and then I would like 
to reload the core from the same data-dir. These are the things I tried:

   1.  Reload api – throws an exception saying no such core exists.
   2.  Create api – throws an exception saying a core with given name already 
exists.

Can someone point me what api I could use to achieve this. Please note that, 
I’m working with Solr in Non-Cloud mode without Zookeeper, Collections, etc.

The RELOAD command isn't going to work at all because the core has been
unloaded -- Solr doesn't know about the core, so it can't reload it.
This is a case where the language used is somewhat confusing, even
though it's completely correct.

I am about 90 percent certain that the reason the CREATE command gave
you an error message is because you tried to make a new core.properties
file before you did the CREATE.  When things are working correctly, the
CREATE command itself is what will create core.properties.  If it
already exists, CoreAdmin will give you an error.  This is the exact
text of the error I encountered when trying to use CREATE after building
a core.properties file manually:

Error CREATEing SolrCore 'foo': Could not create a new core in
C:\Users\sheisey\Downloads\solr-6.5.1\server\solr\foo as another core is
already defined there

That error message is confusing, so I will be fixing it:

https://issues.apache.org/jira/browse/SOLR-10599

To verify what you need to do, I fired up Solr 6.5.1 from an extracted
download directory.  I created two cores, "foo" and "bar", using the
commandline "bin\solr create" command.  Then I went to the admin UI and
unloaded foo. The foo directory was still there, but the core was gone
from Solr's list.

By clicking on the "Add Core" button in the Core Admin tab, typing "foo"
into name and instanceDir, and clearing the other text boxes, the core
was recreated exactly as it was before it was unloaded.

This is the log from the CREATE command that the admin UI sent:

2017-05-02 18:02:49.232 INFO  (qtp1543727556-18) [   x:foo]
o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
params={schema=&dataDir=&name=foo&action=CREATE&config=&instanceDir=foo&wt=json&_=1493747904891}
status=0 QTime=396

To double-check this and show how it can be done without the admin UI, I
accessed these two URLs (in a browser), and accomplished the exact same
thing again.  The first URL unloads the core, the second asks Solr to
find the core and re-add it with default settings.

http://localhost:8983/solr/admin/cores?action=UNLOAD&core=foo
http://localhost:8983/solr/admin/cores?action=CREATE&name=foo&instanceDir=foo
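Shawn's two steps are easy to script. A sketch that only builds the two URLs; actually issuing the requests (e.g. with urllib.request.urlopen) is left out and has not been tested against a server:

```python
from urllib.parse import urlencode

# Sketch of scripting the UNLOAD-then-CREATE sequence described above.
# Host and path match the example URLs; only the URLs are built here.
SOLR = "http://localhost:8983/solr/admin/cores"

def unload_url(core):
    return SOLR + "?" + urlencode([("action", "UNLOAD"), ("core", core)])

def create_url(core):
    # name and instanceDir both set to the core name, per the example
    return SOLR + "?" + urlencode([("action", "CREATE"),
                                   ("name", core), ("instanceDir", core)])
```

As noted above, extra options such as configset would need to be appended to the CREATE parameters.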

If you are using additional options with your cores, such as the
configset parameter, you would need to include those options on your
CREATE call, similar to what you might have done when you initially
created the core.  With some of the options you can 

Re: Poll: Master-Slave or SolrCloud?

2017-04-27 Thread David Lee
As someone who moved from ES to Solr, I can say that one of the things 
that makes ES so much easier to configure is that the majority of things 
that need to be set for a specific environment are all in pretty much 
one config file. Also, I didn't have to deal with the "magic stuff" that 
many people have talked about where SolrCloud is concerned.


Part of the problem is also due to the documentation and user blogs that 
discuss how to use SolrCloud. They all tell you how to create a config 
to run SolrCloud on one system using the -e cloud flag, but then that's 
it. They all seem to avoid discussing what to do from there in terms of 
best practices for distributing to other nodes. The information is out 
there, but in many cases the guides refer to older versions of Solr, so 
it is hard to know which version people are writing about until you try 
their solutions, nothing works, and you finally figure out they are 
talking about a much older version.


I moved away from ES to Solr because I prefer the openness of Solr and 
the community participation but I really haven't been very successful in 
deploying this in a production environment at this point.


I'd say the two things I find that I'm battling with the most are the 
cloud configuration and the work I'm having to do to get even the most 
basic JSON documents indexed correctly (specifically where I need block 
joins, etc.).


I'm hopeful that the V2 Api will help with the JSON issue, but it would 
be nice to have some documentation that goes more in-depth on how to set 
up additional nodes. Also, even though I use ZK for other parts of my 
application, I have no problem with a version running specifically for 
Solr if it makes this process more straight-forward.


David



On 4/27/2017 2:51 AM, Emir Arnautovic wrote:
I think creating a poll for ES people with the question "How do you run 
master nodes? A) on some data nodes B) dedicated node C) dedicated server" 
would give some insight into how big an issue having ZK is, and whether 
hiding ZK behind Solr would do any good.


Emir


On 25.04.2017 23:13, Otis Gospodnetić wrote:

Hi Erick,

Could one run *only* embedded ZK on some SolrCloud nodes, sans any data?
It would be equivalent of dedicated Elasticsearch nodes, which is the
current ES best practice/recommendation.  I've never heard of anyone 
being
scared of running 3 dedicated master ES nodes, so if SolrCloud 
offered the
same, perhaps even completely hiding ZK from users, that would 
present the
same level of complexity (err, simplicity) ES users love about ES.  
Don't

want to talk about SolrCloud vs. ES here at all, just trying to share
observations since we work a lot with both Elasticsearch and 
Solr(Cloud) at

Sematext.

Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/


On Tue, Apr 25, 2017 at 4:03 PM, Erick Erickson 


wrote:


bq: I read somewhere that you should run your own ZK externally, and
turn off SolrCloud

this is a bit confused. "turn off SolrCloud" has nothing to do with
running ZK internally or externally. SolrCloud requires ZK, whether
internal or external is irrelevant to the term SolrCloud.

On to running an external ZK ensemble. Mostly, that's administratively
by far the safest. If you're running the embedded ZK, then the ZK
instances are tied to your Solr instance. Now if, for any reason, your
Solr nodes hosting ZK go down, you lose ZK quorum, can't index, etc.

Now consider a cluster with, say, 100 Solr nodes. Not talking replicas
in a collection here, I'm talking 100 physical machines. BTW, this is
not even close to the largest ones I'm aware of. Which three (for
example) are running ZK? If I want to upgrade Solr, I'd better make
really sure not to upgrade two of the Solr instances running ZK at once
if I want my cluster to keep going

And, ZK is sensitive to system resources. So putting ZK on a Solr node
then hosing, say, updates to my Solr cluster can cause ZK to be
starved for resources.

This is one of those deals where _functionally_, it's OK to run
embedded ZK, but administratively it's suspect.

Best,
Erick

On Tue, Apr 25, 2017 at 10:49 AM, Rick Leir  wrote:

All,
I read somewhere that you should run your own ZK externally, and turn

off SolrCloud. Comments please!

Rick

On April 25, 2017 1:33:31 PM EDT, "Otis Gospodnetić" <

otis.gospodne...@gmail.com> wrote:
This is interesting - that ZK is seen as adding so much complexity 
that

it
turns people off!

If you think about it, Elasticsearch users have no choice -- except
their
"ZK" is built-in, hidden, so one doesn't have to think about it, at
least
not initially.

I think I saw mentions (maybe on user or dev MLs or JIRA) about
potentially, in the future, there only being SolrCloud mode (and
dropping
SolrCloud name in favour of Solr).  If the above comment from Charlie
about
complexity is really true for Solr users, and if that's the reason 
why

analyzer of user queries in SOLR 4.10?

2014-11-23 Thread David Lee
This could be a dumb question. At index time, we can specify the
fieldType of different fields; thus, we know the analyzers for those
fields.

In schema.xml, I do not see the configuration for how to specify the
fieldType (and thus the analyzer) for runtime user queries.

Can anyone help explain this?

Thanks,
DL


Re: analyzer of user queries in SOLR 4.10?

2014-11-23 Thread David Lee
Thanks Erik.  I am actually using edismax query parser in SOLR.   I can
explicitly specify the fieldType (e.g., text_general or text_en) for
different fields (e.g., title or description) .   But I do not see how to
specify the fieldType (thus analyzer) for runtime queries.


Thanks,
DL

On Sun, Nov 23, 2014 at 10:21 AM, Erik Hatcher erik.hatc...@gmail.com
wrote:

 Query time analysis depends on the query parser in play.  If a query
 parser chooses to analyze some or all of the query it will use the same
 analysis as index time unless specified separately (in the field type
 definition itself too)

Erik


  On Nov 23, 2014, at 13:08, David Lee seek...@gmail.com wrote:
 
  This could be a dummy question.  At index time, we can specify the
  fieldType of different fields; thus,  we know  the analyzers for those
  fields.
 
  In schema.xml,   I do not see the configuration how to specify the
  fieldType (thus analyzer) for runtime user queries.
 
  Can anyone help explain this ?
 
  Thanks,
  DL




-- 
SeekWWW: the Search Engine of Choice
www.seekwww.com


Re: analyzer of user queries in SOLR 4.10?

2014-11-23 Thread David Lee
Yes, my edismax parser is configured to query multiple fields, including
qf, pf, pf2 and pf3.
Is there any online documentation on how multiple analysis chains get
used -- with each field using its own analysis chain?


Thanks,
DL

On Sun, Nov 23, 2014 at 1:34 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 11/23/2014 2:13 PM, David Lee wrote:
  Thanks Erik.  I am actually using edismax query parser in SOLR.   I can
  explicitly specify the fieldType (e.g., text_general or text_en) for
  different fields (e.g., title or description) .   But I do not see how to
  specify the fieldType (thus analyzer) for runtime queries.

 The query analysis is chosen by the field that you are querying.  If the
 request sent to your edismax parser is configured to query multiple
 fields (qf, pf, etc), then multiple analysis chains might get used --
 each field uses its own analysis chain.  Setting the debugQuery
 parameter to true will show you exactly how a query was analyzed.

 The same thing can happen when you use multiple field:value clauses in
 your query.

 Thanks,
 Shawn




-- 
SeekWWW: the Search Engine of Choice
www.seekwww.com
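Shawn's debugQuery suggestion can be tried as a concrete request. In this sketch the field names (title, description) come from the thread, while the collection name and boosts are made up, and the URL has not been run anywhere:

```python
from urllib.parse import urlencode

# Sketch: an edismax query across two fields with debugQuery enabled.
# debugQuery=true makes Solr report how the query was parsed, showing
# which analysis chain each field applied.
params = [
    ("q", "interstellar"),
    ("defType", "edismax"),
    ("qf", "title^2 description"),  # each field analyzed with its own chain
    ("debugQuery", "true"),
]
url = "http://localhost:8983/solr/books/select?" + urlencode(params)
```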


SOLR bf SyntaxError

2014-11-18 Thread David Lee
Hi,

I tried to use bf for boosting,  and got the following error:

org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError:
Unexpected text after function: )


Here's the bf boosting:

<str name="bf">sum(div(product(log(map(reviews,0,0,1)),rating),2.5),div(log(map(sales,0,0,1)),10))</str>


What's the syntax issue here?


Thanks,
DL
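Written on one line, the bf expression is syntactically balanced, so a stray newline or character inside the <str> element in solrconfig.xml is a plausible culprit for "Unexpected text after function: )". To sanity-check the arithmetic itself, here is a sketch replicating the function-query semantics in Python (Solr's log() is base 10, and map(x,0,0,1) turns a 0 into 1 so log(0) never happens); this only checks the math, not Solr's parser:

```python
import math

# Sketch replicating Solr function-query semantics for the bf expression:
# sum(div(product(log(map(reviews,0,0,1)),rating),2.5),
#     div(log(map(sales,0,0,1)),10))
def solr_map(x, lo, hi, target):
    """Solr's map(x,lo,hi,target): replace x with target when lo <= x <= hi."""
    return target if lo <= x <= hi else x

def bf(reviews, rating, sales):
    review_part = math.log10(solr_map(reviews, 0, 0, 1)) * rating / 2.5
    sales_part = math.log10(solr_map(sales, 0, 0, 1)) / 10
    return review_part + sales_part

score = bf(reviews=100, rating=5.0, sales=1000)  # 2*5/2.5 + 3/10 = 4.3
```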


Index complex JSON data in SOLR

2014-11-15 Thread David Lee
Hi All,

How do I index complex JSON data in SOLR? For example,

{"prices":[{"state":"CA", "price":101.0}, {"state":"NJ",
"price":102.0},{"state":"CO", "price":102.0}]}


It's simple in ElasticSearch, but in SOLR it always reports the following
error:
Error parsing JSON field value. Unexpected OBJECT_START


Thanks,
DL


Re: Index complex JSON data in SOLR

2014-11-15 Thread David Lee
Thanks Alex. I took a look at the approach of transforming the JSON document
before mapping it to the Solr schema at
http://lucidworks.com/blog/indexing-custom-json-data/ .

It's a workaround, but in my case, if every state has its own price,
the number of documents that need to be indexed will increase 50 times, which
may have a negative impact on performance, etc.

{"prices":[{"state":"CA", "price":101.0}, {"state":"NJ",
"price":102.0},{"state":"CO", "price":102.0}]}

Is there any other better solution?

Thanks,
DL
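One way to avoid 50 near-duplicate documents is to fold the prices list into per-state dynamic fields on a single document. A sketch; the *_f suffix assumes a standard float dynamic-field rule (e.g. a `*_f` dynamicField) in the schema, which would need checking:

```python
# Sketch: fold the prices list into dynamic fields keyed by state
# (price_CA_f, price_NJ_f, ...) so each product stays one document.
# The *_f suffix is an assumption about the schema's dynamic-field rules.
def fold_prices(doc):
    flat = {k: v for k, v in doc.items() if k != "prices"}
    for entry in doc.get("prices", []):
        flat[f"price_{entry['state']}_f"] = entry["price"]
    return flat

product = {
    "id": "prod-1",
    "prices": [{"state": "CA", "price": 101.0},
               {"state": "NJ", "price": 102.0},
               {"state": "CO", "price": 102.0}],
}
flat = fold_prices(product)
```

On the query side, the application then reads price_CA_f (or whichever state the user is from) directly, so the index holds one document per product.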

On Sat, Nov 15, 2014 at 2:17 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 It's simple in Elasticsearch, but what you actually get is a single
 document and all its children's data ({state, price}) entries are
 joined together behind the scenes into the multivalued fields. Which
 may or may not be an issue for you.

 For Solr, nested documents need to be parent/child separate documents.
 And the syntax is a bit more explicit. So, you can either provide more
 explicit JSON:

 https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-NestedChildDocuments

 or transform JSON document before mapping it to the Solr schema:
 http://lucidworks.com/blog/indexing-custom-json-data/ (latest 4.10 Solr).

 Regards,
Alex.
 Personal: http://www.outerthoughts.com/ and @arafalov
 Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
 Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


 On 15 November 2014 17:05, David Lee seek...@gmail.com wrote:
  Hi All,
 
  How do I index complex JSON data in SOLR? For example,
 
  {"prices":[{"state":"CA", "price":101.0}, {"state":"NJ",
  "price":102.0},{"state":"CO", "price":102.0}]}
 
 
  It's simple in ElasticSearch, but in SOLR it always reports the following
  error:
  Error parsing JSON field value. Unexpected OBJECT_START
 
 
  Thanks,
  DL




-- 
SeekWWW: the Search Engine of Choice
www.seekwww.com


Re: Index complex JSON data in SOLR

2014-11-15 Thread David Lee
Assume that we are selling a product online in all 50 states in the USA, but
each state has its own price. Although the base product information is the
same, the index size will increase 50 times if we index that way.

The usage is similar as searching a product; but based on the location of
the user (e.g., which state the user is from), we may show a different
price.

On Sat, Nov 15, 2014 at 3:40 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 The first link shows how to create children with specific content, but
 you need to use _childDocuments_:... explicitly instead of the
 prices:  and perhaps add type: price or some such to differentiate
 record types.

 But I am not quite following why you say it will increase 50 times. By
 comparison to what? How did you want the children documents to be
 stored/found (in Elasticsearch or Solr)?

 One way to think through this problem is to be explicit about what the
 _search_ would look like and then adjust indexing accordingly.


 Regards,
 Alex.
 Personal: http://www.outerthoughts.com/ and @arafalov
 Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
 Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


 On 15 November 2014 18:24, David Lee seek...@gmail.com wrote:
  Thanks Alex.   I  take a look at the approach of transforming JSON
 document
  before mapping it to the Solr schema at
  http://lucidworks.com/blog/indexing-custom-json-data/ .
 
  It's  a walk-around.  But in my case,  if every state has its own price,
   the number of documents needs to be indexed will increase 50 times,
 which
  may have negative impact on performance,etc.
 
  {"prices":[{"state":"CA", "price":101.0}, {"state":"NJ",
  "price":102.0},{"state":"CO", "price":102.0}]}
 
  Is there any other better solution?
 
  Thanks,
  DL
 
  On Sat, Nov 15, 2014 at 2:17 PM, Alexandre Rafalovitch 
 arafa...@gmail.com
  wrote:
 
  It's simple in Elasticsearch, but what you actually get is a single
  document and all it's children data ({state, price}) entries are
  joined together behind the scenes into the multivalued fields. Which
  may or may not be an issue for you.
 
  For Solr, nested documents need to be parent/child separate documents.
  And the syntax is a bit more explicit. So, you can either provide more
  explicit JSON:
 
 
 https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-NestedChildDocuments
 
  or transform JSON document before mapping it to the Solr schema:
  http://lucidworks.com/blog/indexing-custom-json-data/ (latest 4.10
 Solr).
 
  Regards,
 Alex.
  Personal: http://www.outerthoughts.com/ and @arafalov
  Solr resources and newsletter: http://www.solr-start.com/ and
 @solrstart
  Solr popularizers community:
 https://www.linkedin.com/groups?gid=6713853
 
 
  On 15 November 2014 17:05, David Lee seek...@gmail.com wrote:
   Hi All,
  
   How do I index complex JSON data in SOLR? For example,
  
   {"prices":[{"state":"CA", "price":101.0}, {"state":"NJ",
   "price":102.0},{"state":"CO", "price":102.0}]}
  
  
   It's simple in ElasticSearch, but in SOLR it always reports the
 following
   error:
   Error parsing JSON field value. Unexpected OBJECT_START
  
  
   Thanks,
   DL
 
 
 
 
  --
  SeekWWW: the Search Engine of Choice
  www.seekwww.com




-- 
SeekWWW: the Search Engine of Choice
www.seekwww.com


Re: Index complex JSON data in SOLR

2014-11-15 Thread David Lee
Thanks Alex and William for the suggestions. I'll try out the approach of
storing the JSON string.

On Sat, Nov 15, 2014 at 5:27 PM, William Bell billnb...@gmail.com wrote:

 You can take 4.* of Solr and just apply my fix.

 Store JSON stringified into a string field (make sure the field name ends
 in _json). Then you can output with: wt=json&json.fsuffix=_json

 OK?

 Use SOLR-4685.
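The stringified-JSON approach looks like the sketch below on the indexing and retrieval sides. The prices_json field name is made up to fit the _json suffix convention William describes; Solr treats the field as an opaque stored string:

```python
import json

# Sketch: stringify the nested structure into a stored string field whose
# name ends in _json, then parse it back client-side after retrieval.
prices = [{"state": "CA", "price": 101.0},
          {"state": "NJ", "price": 102.0},
          {"state": "CO", "price": 102.0}]

doc = {
    "id": "prod-1",
    "prices_json": json.dumps(prices),  # opaque to Solr: stored, not searched
}

# Client side: recover the structure from the stored string.
restored = json.loads(doc["prices_json"])
```

The obvious limitation, as Alexandre notes, is that nothing inside the string is searchable; it only works for data that is displayed, not queried.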



 On Sat, Nov 15, 2014 at 5:07 PM, Alexandre Rafalovitch arafa...@gmail.com
 
 wrote:

  It sounds to me that you are not actually searching on the state or
  price. So, does it make sense to store it in Solr? Maybe it should
  stay in an external database and you merge it. Or store (not index) that
  json as a pure text field and parse what you need out of it manually, as
  you would with Elasticsearch.
 
  But if you want to store states/prices separately in Solr, then you do
  have to pay the price somehow, right? And 50 times more documents may
  not actually have any impact on your performance. Solr scales really
  well. Especially, if you don't need to display some fields, because
  tokens in store=false/index=true fields are only stored once.
 
  Regards,
  Alex.
  Personal: http://www.outerthoughts.com/ and @arafalov
  Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
  Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
 
 
  On 15 November 2014 18:53, David Lee seek...@gmail.com wrote:
   Assume that we are selling a product online to  50 states in the USA.
  But
   each state has its own price.  ALthough the base product information is
  the
   same,  the index size will increase 50 times if we index that way.
  
   The usage is similar as searching a product; but based on the location
 of
   the user (e.g., which state the user is from), we may show a different
   price.
  
   On Sat, Nov 15, 2014 at 3:40 PM, Alexandre Rafalovitch 
  arafa...@gmail.com
   wrote:
  
   The first link shows how to create children with specific content, but
   you need to use _childDocuments_:... explicitly instead of the
   prices:  and perhaps add type: price or some such to differentiate
   record types.
  
   But I am not quite following why you say it will increase 50 times. By
   comparison to what? How did you want the children documents to be
   stored/found (in Elasticsearch or Solr)?
  
   One way to think through this problem is to be explicit about what the
   _search_ would look like and then adjust indexing accordingly.
  
  
   Regards,
   Alex.
   Personal: http://www.outerthoughts.com/ and @arafalov
   Solr resources and newsletter: http://www.solr-start.com/ and
  @solrstart
   Solr popularizers community:
  https://www.linkedin.com/groups?gid=6713853
  
  
   On 15 November 2014 18:24, David Lee seek...@gmail.com wrote:
Thanks Alex.   I  take a look at the approach of transforming JSON
   document
before mapping it to the Solr schema at
http://lucidworks.com/blog/indexing-custom-json-data/ .
   
It's  a walk-around.  But in my case,  if every state has its own
  price,
 the number of documents needs to be indexed will increase 50 times,
   which
may have negative impact on performance,etc.
   
{"prices":[{"state":"CA", "price":101.0}, {"state":"NJ",
"price":102.0},{"state":"CO", "price":102.0}]}
   
Is there any other better solution?
   
Thanks,
DL
   
On Sat, Nov 15, 2014 at 2:17 PM, Alexandre Rafalovitch 
   arafa...@gmail.com
wrote:
   
It's simple in Elasticsearch, but what you actually get is a single
document and all it's children data ({state, price}) entries are
joined together behind the scenes into the multivalued fields.
 Which
may or may not be an issue for you.
   
For Solr, nested documents need to be parent/child separate
  documents.
And the syntax is a bit more explicit. So, you can either provide
  more
explicit JSON:
   
   
  
 
 https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-NestedChildDocuments
   
or transform JSON document before mapping it to the Solr schema:
http://lucidworks.com/blog/indexing-custom-json-data/ (latest 4.10
   Solr).
   
Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and
   @solrstart
Solr popularizers community:
   https://www.linkedin.com/groups?gid=6713853
   
   
On 15 November 2014 17:05, David Lee seek...@gmail.com wrote:
 Hi All,

 How do I index complex JSON data in SOLR? For example,

 {"prices":[{"state":"CA", "price":101.0}, {"state":"NJ",
 "price":102.0},{"state":"CO", "price":102.0}]}


 It's simple in ElasticSearch, but in SOLR it always reports the
   following
 error:
 Error parsing JSON field value. Unexpected OBJECT_START


 Thanks,
 DL
   
   
   
   
--
SeekWWW: the Search Engine of Choice
www.seekwww.com