Best approach for flattening data or using nested lists.
I have a project where I am working with nested data (not that deep, but multiple lists) and would love to get some advice from other experienced developers. I've read most of the books on Solr (including Solr in Action), and though they provide good (if dated) information on the actual indexing mechanism, not many deal with this issue in much depth. If there are other resources, not necessarily Solr-specific, that can help here, please feel free to point those out.

Here is the structure I'm working with. I've made it generic to simplify things, but the intent is there:

{
  "id": 1,
  "_type": "book",
  "name": "My Martian",
  "genre": "Science Fiction",
  "edits": [
    { "_type": "book_action", "action": "Modify", "chapter": 3,
      "description": "Corrected spelling for interstellar" },
    { "_type": "book_action", "action": "Removal", "chapter": 24,
      "description": "Removed chapter as it adds no value to the story" }
  ],
  "chapters": [
    { "_type": "book_chapter", "chapter_number": 1, "chapter_title": "The Test" },
    { "_type": "book_chapter", "chapter_number": 2, "chapter_title": "The Next Test" }
  ]
}

My first attempt was to just add both lists through SolrJ (you can't do this with the JSON interface, since it doesn't allow multiple _childDocuments_ at the same level). That works, and I'm able to use the _type value to distinguish between them. However, my problem is that the users want to be able to search on any field at the top level of the data as well as within the lists. For example (using SQL for clarity only):

select * from book_index where genre = "Science Fiction" and action = "Removal" and chapter_number = 2;

The problem I'm having with this sort of search is that, based on what I know, the {!child} and {!parent} parsers won't give me access to all fields like this.
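For what it's worth, that SQL can be approximated with block-join syntax, with one important caveat. A sketch (assuming `_type:book` reliably identifies the parent documents; field names are taken from the example above):

```
q=genre:"Science Fiction"
  AND {!parent which="_type:book" v="action:Removal"}
  AND {!parent which="_type:book" v="chapter_number:2"}
```

Each {!parent} clause independently matches books that have *some* child satisfying it; it does not require both child conditions to hold on the same child document, which matches the flattened semantics of the SQL above.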
I've looked at flattening the data, similar to the following:

{
  "id": 1,
  "name": "My Martian",
  "genre": "Science Fiction",
  "edit_action_3": { "action": "Modify", "chapter": 3,
    "description": "Corrected spelling for interstellar" },
  "edit_action_24": { "action": "Removal", "chapter": 24,
    "description": "Removed chapter as it adds no value to the story" },
  "chapter_1": { "chapter_number": 1, "chapter_title": "The Test" },
  "chapter_2": { "chapter_number": 2, "chapter_title": "The Next Test" }
}

This does flatten things out so that the above query would be able to search on any field, but it's a real kludge and makes it nearly impossible to get just a list of chapters or actions. So, anyone have any thoughts? (FYI, this is my first Solr project, so I'm really starting from scratch here.) Thanks
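If the goal is simply "every field searchable from one flat document", an alternative to the `edit_action_3`-style keyed fields is prefixed multivalued fields, which keeps field names predictable. A rough sketch of the transformation (plain Python, illustrative only; the `edits`/`chapters` names come from the example above):

```python
def flatten(doc, lists=("edits", "chapters")):
    """Collapse each nested list into prefixed multivalued fields,
    e.g. edits[*].action -> edits_action: [...]."""
    flat = {k: v for k, v in doc.items() if k not in lists}
    for name in lists:
        for child in doc.get(name, []):
            for key, value in child.items():
                if key == "_type":
                    continue  # discriminator is implied by the prefix
                flat.setdefault(f"{name}_{key}", []).append(value)
    return flat
```

The trade-off is the classic cross-matching problem: once action and chapter live in separate multivalued fields, a query can no longer insist that "Removal" and chapter 24 came from the same edit. That is exactly the relationship block joins preserve.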
Re: Having trouble indexing nested docs using "split" feature.
Sorry about the formatting for the first part; hope this is clearer:

{
  "book_id": "1234",
  "book_title": "The Martian Chronicles",
  "author": "Ray Bradbury",
  "reviews": [
    { "reviewer": "John Smith",
      "reviewer_background": {
        "highest_rank": "Excellent",
        "latest_review": "10/15/2017 10:15:00.000 CST" } },
    { "reviewer": "Adam Smith",
      "reviewer_background": {
        "highest_rank": "Good",
        "latest_review": "10/10/2017 16:18:00.000 CST" } }
  ],
  "checkouts": [
    { "member_id": "aaabbbccc", "member_name": "Sam Jackson" },
    { "member_id": "bbbcccddd", "member_name": "Buddy Jones" }
  ]
}
Having trouble indexing nested docs using "split" feature.
Hi all, I've been trying for some time now to find a suitable way to deal with JSON documents that have nested data. By suitable, I mean being able to index them and retrieve them so that they come back in the same structure as when indexed. I'm using version 7.1 under Linux Mint 18.3 with Oracle Java 1.8.0_151. After untarring the distribution, I ran through the "getting started" tutorial from the reference manual, where it had me create the techproducts index. I then created another collection called my_collection so I could run the examples more easily. It uses the _default schema. Here is a sample:

{
  "book_id": "1234",
  "book_title": "The Martian Chronicles",
  "author": "Ray Bradbury",
  "reviews": [
    { "reviewer": "John Smith",
      "reviewer_background": {
        "highest_rank": "Excellent",
        "latest_review": "10/15/2017 10:15:00.000 CST" } },
    { "reviewer": "Adam Smith",
      "reviewer_background": {
        "highest_rank": "Good",
        "latest_review": "10/10/2017 16:18:00.000 CST" } }
  ],
  "checkouts": [
    { "member_id": "aaabbbccc", "member_name": "Sam Jackson" },
    { "member_id": "bbbcccddd", "member_name": "Buddy Jones" }
  ]
}

Obviously, I'll need to search at both the parent level and the child level. I started experimenting and tried to use one of the examples from "Transforming and Indexing Custom JSON".
However, when I tried the first example as follows:

curl 'http://localhost:8983/solr/my_collection/update/json/docs'\
'?split=/exams'\
'&f=first:/first'\
'&f=last:/last'\
'&f=grade:/grade'\
'&f=subject:/exams/subject'\
'&f=test:/exams/test'\
'&f=marks:/exams/marks'\
 -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
    { "subject": "Maths", "test": "term1", "marks": 90 },
    { "subject": "Biology", "test": "term1", "marks": 86 }
  ]
}'

{"responseHeader":{"status":0,"QTime":798}}

Though the status indicates there was no error, when I try to query the data using *:*, I get this:

curl 'http://localhost:8983/solr/my_collection/select?q=*:*'

{"responseHeader":{"zkConnected":true,"status":0,"QTime":6,"params":{"q":"*:*"}},
 "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]}}

So it looks like no documents were actually indexed from the above. I'm trying to determine whether this is due to an error in the reference manual, or whether I haven't set up Solr correctly. I've tried other techniques (not using the split option), like those from Yonik's site, but those are slightly dated, and I was hoping there was a more practical approach with the release of Solr 7. Any assistance would be appreciated. Thank you.
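For what it's worth, what `split=/exams` with those `f=` mappings is meant to produce can be modeled in a few lines, which makes it easier to check query results against expectations. A sketch (plain Python, illustrative only; it mirrors the example request above, not Solr's actual implementation):

```python
def split_docs(doc, split_field="exams"):
    """Model the split=/exams mapping: one indexed document per array
    element, each combining the top-level fields with its own."""
    parent = {k: v for k, v in doc.items() if k != split_field}
    return [{**parent, **child} for child in doc.get(split_field, [])]
```

With the request body above, this yields two flat documents, each carrying first/last/grade plus one exam's subject/test/marks. Separately, a status-0 update followed by numFound:0 is often just a missing commit; appending &commit=true to the update URL is worth ruling out before suspecting the manual.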
Re: How to handle nested documents in solr (SolrJ)
Hi Rick, Adding to this subject, I do appreciate you pointing us to these articles, but I'm curious about how much of these take into account the latest versions of Solr (i.e., 6.5+ and 7), given the JSON split capabilities, etc. I know that is just on the indexing side, so the searches may be the same, but things are changing quickly these days (not a bad thing). Thanks, David

On 5/24/2017 4:26 AM, Rick Leir wrote: Prasad, Gee, you get confusion from a Google search for: nested documents site:mail-archives.apache.org/mod_mbox/lucene-solr-user/ But my recent posting might help: "Yonik has some good blogs on this." And Mikhail has an excellent blog: https://blog.griddynamics.com/how-to-use-block-join-to-improve-search-efficiency-with-nested-documents-in-solr cheers -- Rick

On 2017-05-24 02:53 AM, prasad chowdary wrote: Dear All, I have a requirement that I need to index documents in Solr using Java code. Each document contains a sub-document, like below (it's just for understanding my question):

student id: 123
student name: john
marks:
  maths: 90
  English: 95

student id: 124
student name: rack
marks:
  maths: 80
  English: 96

etc... So, as shown above, each document contains one child document, i.e. marks. Actually I don't need any joins or anything. My requirement is: if I query "English:95", it should return the complete document, i.e. the child along with the parent, like below:

student id: 123
student name: john
marks:
  maths: 90
  English: 95

and also if I query "student id: 123", it should return the whole document, same as above. Currently I am able to get the child along with the parent for a child match by using the extendedResults option.
But I am not able to get the children for a parent match.
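For a requirement like this, the [child] document transformer is the usual route: match the parent (directly, or via a block-join query for child matches) and ask Solr to attach the children to each result. A sketch (the doc_type discriminator field and the field names here are illustrative assumptions, not from the original post):

```
# Parent match, children attached to the result:
q=student_id:123&fl=*,[child parentFilter=doc_type:student]

# Child match, returning the parent with its children:
q={!parent which="doc_type:student"}English:95&fl=*,[child parentFilter=doc_type:student]
```

The parentFilter must match all parents and no children, which is why a dedicated type field on every document is the standard pattern.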
Re: Reload an unloaded core
I have similar needs but for a slightly different use-case. In my case, I am breaking up cores/indexes based on month and year so that I can add an alias that always points to the last few months; beyond that, I want to simply unload the other indexes once they get past a few months old. The indexes will remain on disk, but I simply don't want my queries to have to go through the older "archived" documents. However, users will occasionally need to have those indexes reloaded for research reasons, so what I was doing in ES was simply re-loading all of the indexes that fit within the range being searched and adding those to an alias (let's call it "archived", for example). Once they are finished querying that older data, I again unload those indexes and remove the alias. From what I'm reading in this thread, this isn't quite as straightforward in Solr, so I'm looking for other options. Thanks, David

On 5/2/2017 5:04 PM, Shashank Pedamallu wrote: Thank you Simon, Erick and Shawn for your replies. Unfortunately, restarting Solr is not an option for me. So, I'll try to follow the steps given by Shawn to see where I'm standing. Btw, I'm using Solr 6.4.2. Shawn, once again thank you very much for the detailed reply. Thanks, Shashank Pedamallu

On 5/2/17, 2:51 PM, "Shawn Heisey" wrote: On 5/2/2017 10:53 AM, Shashank Pedamallu wrote: I want to unload a core from Solr without deleting the data-dir or instance-dir. I'm performing some operations on the data-dir after this, and then I would like to reload the core from the same data-dir. These are the things I tried: 1. Reload API – throws an exception saying no such core exists. 2. Create API – throws an exception saying a core with the given name already exists. Can someone point me to an API I could use to achieve this? Please note that I'm working with Solr in non-cloud mode, without ZooKeeper, collections, etc.
The RELOAD command isn't going to work at all because the core has been unloaded -- Solr doesn't know about the core, so it can't reload it. This is a case where the language used is somewhat confusing, even though it's completely correct.

I am about 90 percent certain that the reason the CREATE command gave you an error message is because you tried to make a new core.properties file before you did the CREATE. When things are working correctly, the CREATE command itself is what will create core.properties. If it already exists, CoreAdmin will give you an error. This is the exact text of the error I encountered when trying to use CREATE after building a core.properties file manually:

Error CREATEing SolrCore 'foo': Could not create a new core in C:\Users\sheisey\Downloads\solr-6.5.1\server\solr\foo as another core is already defined there

That error message is confusing, so I will be fixing it: https://issues.apache.org/jira/browse/SOLR-10599

To verify what you need to do, I fired up Solr 6.5.1 from an extracted download directory. I created two cores, "foo" and "bar", using the command-line "bin\solr create" command. Then I went to the admin UI and unloaded foo. The foo directory was still there, but the core was gone from Solr's list. By clicking on the "Add Core" button in the Core Admin tab, typing "foo" into name and instanceDir, and clearing the other text boxes, the core was recreated exactly as it was before it was unloaded.
This is the log from the CREATE command that the admin UI sent:

2017-05-02 18:02:49.232 INFO (qtp1543727556-18) [ x:foo] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores params={schema=&dataDir=&name=foo&action=CREATE&config=&instanceDir=foo&wt=json&_=1493747904891} status=0 QTime=396

To double-check this and show how it can be done without the admin UI, I accessed these two URLs (in a browser) and accomplished the exact same thing again. The first URL unloads the core; the second asks Solr to find the core and re-add it with default settings.

http://localhost:8983/solr/admin/cores?action=UNLOAD&core=foo
http://localhost:8983/solr/admin/cores?action=CREATE&name=foo&instanceDir=foo

If you are using additional options with your cores, such as the configset parameter, you would need to include those options on your CREATE call, similar to what you might have done when you initially created the core. With some of the options you can
Re: Poll: Master-Slave or SolrCloud?
As someone who moved from ES to Solr, I can say that one of the things that makes ES so much easier to configure is that the majority of things that need to be set for a specific environment are all in pretty much one config file. Also, I didn't have to deal with the "magic stuff" that many people have talked about where SolrCloud is concerned.

Part of the problem is also due to the documentation and user blogs that discuss how to use SolrCloud. They all tell you how to create a config to run SolrCloud on one system using the -e cloud flag, but then that's it. They all seem to avoid discussing what to do from there in terms of best practices for distributing to other nodes. The information is out there, but in many cases the guides refer to older versions of Solr, so sometimes it is hard to know what versions people are writing about until you try their solutions and nothing works, and you finally figure out they are talking about a much older version.

I moved away from ES to Solr because I prefer the openness of Solr and the community participation, but I really haven't been very successful in deploying this in a production environment at this point. I'd say the two things I find I'm battling with the most are the cloud configuration and the work I'm having to do to get even the most basic JSON documents indexed correctly (specifically where I need block joins, etc.). I'm hopeful that the V2 API will help with the JSON issue, but it would be nice to have some documentation that goes more in-depth on how to set up additional nodes. Also, even though I use ZK for other parts of my application, I have no problem with a version running specifically for Solr if it makes the process more straightforward. David

On 4/27/2017 2:51 AM, Emir Arnautovic wrote: I think creating a poll for ES people with the question: "How do you run master nodes?
A) on some data nodes B) dedicated node C) dedicated server" would give some insight into how big an issue having ZK is, and whether hiding ZK behind Solr would do any good. Emir

On 25.04.2017 23:13, Otis Gospodnetić wrote: Hi Erick, Could one run *only* embedded ZK on some SolrCloud nodes, sans any data? It would be the equivalent of dedicated Elasticsearch master nodes, which is the current ES best practice/recommendation. I've never heard of anyone being scared of running 3 dedicated master ES nodes, so if SolrCloud offered the same, perhaps even completely hiding ZK from users, that would present the same level of complexity (err, simplicity) ES users love about ES. Don't want to talk about SolrCloud vs. ES here at all, just trying to share observations, since we work a lot with both Elasticsearch and Solr(Cloud) at Sematext. Otis -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/

On Tue, Apr 25, 2017 at 4:03 PM, Erick Erickson wrote: bq: I read somewhere that you should run your own ZK externally, and turn off SolrCloud

This is a bit confused. "Turn off SolrCloud" has nothing to do with running ZK internally or externally. SolrCloud requires ZK; whether internal or external is irrelevant to the term SolrCloud. On to running an external ZK ensemble: mostly, that's administratively by far the safest. If you're running the embedded ZK, then the ZK instances are tied to your Solr instances. Now if, for any reason, your Solr nodes hosting ZK go down, you lose ZK quorum, can't index, etc. Now consider a cluster with, say, 100 Solr nodes. Not talking replicas in a collection here; I'm talking 100 physical machines. BTW, this is not even close to the largest ones I'm aware of. Which three (for example) are running ZK? If I want to upgrade Solr, I'd better make really sure not to upgrade two of the Solr instances running ZK at once if I want my cluster to keep going. And ZK is sensitive to system resources.
So putting ZK on a Solr node then hosing, say, updates to my Solr cluster can cause ZK to be starved for resources. This is one of those deals where _functionally_, it's OK to run embedded ZK, but administratively it's suspect. Best, Erick On Tue, Apr 25, 2017 at 10:49 AM, Rick Leir wrote: All, I read somewhere that you should run your own ZK externally, and turn off SolrCloud. Comments please! Rick On April 25, 2017 1:33:31 PM EDT, "Otis Gospodnetić" < otis.gospodne...@gmail.com> wrote: This is interesting - that ZK is seen as adding so much complexity that it turns people off! If you think about it, Elasticsearch users have no choice -- except their "ZK" is built-in, hidden, so one doesn't have to think about it, at least not initially. I think I saw mentions (maybe on user or dev MLs or JIRA) about potentially, in the future, there only being SolrCloud mode (and dropping SolrCloud name in favour of Solr). If the above comment from Charlie about complexity is really true for Solr users, and if that's the reason why
analyzer of user queries in SOLR 4.10?
This may be a dumb question. At index time, we can specify the fieldType of different fields; thus, we know the analyzers for those fields. In schema.xml, I do not see how to configure the fieldType (and thus the analyzer) for runtime user queries. Can anyone help explain this? Thanks, DL
Re: analyzer of user queries in SOLR 4.10?
Thanks, Erik. I am actually using the edismax query parser in Solr. I can explicitly specify the fieldType (e.g., text_general or text_en) for different fields (e.g., title or description), but I do not see how to specify the fieldType (and thus the analyzer) for runtime queries. Thanks, DL

On Sun, Nov 23, 2014 at 10:21 AM, Erik Hatcher erik.hatc...@gmail.com wrote: Query-time analysis depends on the query parser in play. If a query parser chooses to analyze some or all of the query, it will use the same analysis as index time unless specified separately (in the field type definition itself, too). Erik

-- SeekWWW: the Search Engine of Choice www.seekwww.com
Re: analyzer of user queries in SOLR 4.10?
Yes, my edismax parser is configured to query multiple fields, including qf, pf, pf2, and pf3. Is there any online documentation on how multiple analysis chains get used, with each field using its own analysis chain? Thanks, DL

On Sun, Nov 23, 2014 at 1:34 PM, Shawn Heisey apa...@elyograg.org wrote: On 11/23/2014 2:13 PM, David Lee wrote: Thanks Erik. I am actually using edismax query parser in SOLR. I can explicitly specify the fieldType (e.g., text_general or text_en) for different fields (e.g., title or description). But I do not see how to specify the fieldType (thus analyzer) for runtime queries.

The query analysis is chosen by the field that you are querying. If the request sent to your edismax parser is configured to query multiple fields (qf, pf, etc.), then multiple analysis chains might get used -- each field uses its own analysis chain. Setting the debugQuery parameter to true will show you exactly how a query was analyzed. The same thing can happen when you use multiple field:value clauses in your query. Thanks, Shawn
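As a concrete illustration of Shawn's point, a request along these lines (the core and field names here are made up for the example) exposes the per-field analysis:

```
http://localhost:8983/solr/mycore/select?defType=edismax&q=Martian+Chronicles&qf=title+description&debugQuery=true
```

The debug section's parsedquery then shows something like +((title:martian | description:martian) ...), so you can see directly which analysis chain each field applied to the query terms.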
SOLR bf SyntaxError
Hi, I tried to use bf for boosting and got the following error:

org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: Unexpected text after function: )

Here's the bf boosting:

<str name="bf">sum(div(product(log(map(reviews,0,0,1)),rating),2.5),div(log(map(sales,0,0,1)),10))</str>

What's the syntax issue here? Thanks, DL
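As an aside on what the function computes: map(x,0,0,1) replaces a 0 with 1 so log never sees zero, and Solr's log() is base 10. The arithmetic can be checked in plain Python (a sketch of the formula only, not of Solr internals):

```python
import math

def map_fn(value, mn, mx, target):
    """Solr map(x,min,max,target): substitute target when min <= x <= max."""
    return target if mn <= value <= mx else value

def bf(reviews, rating, sales):
    """sum(div(product(log(map(reviews,0,0,1)),rating),2.5),
           div(log(map(sales,0,0,1)),10)) with base-10 log."""
    return (math.log10(map_fn(reviews, 0, 0, 1)) * rating) / 2.5 \
           + math.log10(map_fn(sales, 0, 0, 1)) / 10
```

On the Solr side, one common cause of "Unexpected text after function" is whitespace inside the expression: the bf parameter treats whitespace as a separator between multiple boost functions, so the whole expression must be a single unbroken token (or have its spaces URL-encoded).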
Index complex JSON data in SOLR
Hi All, How do I index complex JSON data in Solr? For example:

{"prices":[{"state":"CA","price":101.0},{"state":"NJ","price":102.0},{"state":"CO","price":102.0}]}

It's simple in Elasticsearch, but in Solr it always reports the following error:

Error parsing JSON field value. Unexpected OBJECT_START

Thanks, DL
Re: Index complex JSON data in SOLR
Thanks, Alex. I took a look at the approach of transforming the JSON document before mapping it to the Solr schema at http://lucidworks.com/blog/indexing-custom-json-data/. It's a workaround, but in my case, if every state has its own price, the number of documents that need to be indexed will increase 50 times, which may have a negative impact on performance, etc.

{"prices":[{"state":"CA","price":101.0},{"state":"NJ","price":102.0},{"state":"CO","price":102.0}]}

Is there any other better solution? Thanks, DL

On Sat, Nov 15, 2014 at 2:17 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: It's simple in Elasticsearch, but what you actually get is a single document, and all its children data ({state, price} entries) are joined together behind the scenes into multivalued fields. Which may or may not be an issue for you. For Solr, nested documents need to be separate parent/child documents, and the syntax is a bit more explicit. So, you can either provide more explicit JSON: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-NestedChildDocuments or transform the JSON document before mapping it to the Solr schema: http://lucidworks.com/blog/indexing-custom-json-data/ (latest 4.10 Solr). Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
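For concreteness, the "more explicit JSON" Alex mentions looks roughly like this (the id values and the type field are illustrative additions; each child document needs its own unique id):

```
{
  "id": "prod-1",
  "type": "product",
  "name": "Some Product",
  "_childDocuments_": [
    { "id": "prod-1-CA", "type": "price", "state": "CA", "price": 101.0 },
    { "id": "prod-1-NJ", "type": "price", "state": "NJ", "price": 102.0 },
    { "id": "prod-1-CO", "type": "price", "state": "CO", "price": 102.0 }
  ]
}
```

The type field then gives block-join queries (and the parentFilter of the [child] transformer) an unambiguous way to tell parents from children.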
Re: Index complex JSON data in SOLR
Assume that we are selling a product online in 50 states in the USA, but each state has its own price. Although the base product information is the same, the index size will increase 50 times if we index it that way. The usage is similar to searching for a product, but based on the location of the user (e.g., which state the user is from), we may show a different price.

On Sat, Nov 15, 2014 at 3:40 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: The first link shows how to create children with specific content, but you need to use "_childDocuments_": ... explicitly instead of "prices": and perhaps add "type": "price" or some such to differentiate record types. But I am not quite following why you say it will increase 50 times. By comparison to what? How did you want the child documents to be stored/found (in Elasticsearch or Solr)? One way to think through this problem is to be explicit about what the _search_ would look like and then adjust indexing accordingly. Regards, Alex.
Re: Index complex JSON data in SOLR
Thanks, Alex and William, for the suggestions. I'll try out the approach of storing the JSON string.

On Sat, Nov 15, 2014 at 5:27 PM, William Bell billnb...@gmail.com wrote: You can take 4.* of Solr and just apply my fix. Store the JSON stringified into a string field (make sure the field name ends in _json). Then you can output with: wt=json&json.fsuffix=_json OK? Use SOLR-4685.

On Sat, Nov 15, 2014 at 5:07 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: It sounds to me that you are not actually searching on the state or price. So, does it make sense to store it in Solr? Maybe it should stay in an external database and you merge it. Or store (not index) that JSON as a pure text field and parse what you need out of it manually, as you would with Elasticsearch. But if you want to store states/prices separately in Solr, then you do have to pay the price somehow, right? And 50 times more documents may not actually have any impact on your performance. Solr scales really well. Especially if you don't need to display some fields, because tokens in stored=false/indexed=true fields are only stored once. Regards, Alex.