Re: Solr training
Is there a friends-on-the-mailing-list discount? I had a bit of sticker shock! On Wed, Sep 16, 2020 at 9:38 AM Charlie Hull wrote: > > I do of course mean 'Group Discounts': you don't get a discount for > being in a 'froup' sadly (I wasn't even aware that was a thing!) > > Charlie > > On 16/09/2020 13:26, Charlie Hull wrote: > > > > Hi all, > > > > We're running our Solr Think Like a Relevance Engineer training 6-9 Oct > > - you can find out more & book tickets at > > https://opensourceconnections.com/training/solr-think-like-a-relevance-engineer-tlre/ > > > > The course is delivered over 4 half-days from 9am EST / 2pm BST / 3pm > > CET and is led by Eric Pugh who co-wrote the first book on Solr and is > > a Solr Committer. It's suitable for all members of the search team - > > search engineers, data scientists, even product owners who want to > > know how Solr search can be measured & tuned. Delivered by working > > relevance engineers the course features practical exercises and will > > give you a great foundation in how to use Solr to build great search. > > > > The early bird discount expires at the end of this week so do book soon if > > you're interested! Froup discounts also available. We're also running > > a more advanced course on Learning to Rank a couple of weeks later - > > you can find all our training courses and dates at > > https://opensourceconnections.com/training/ > > > > Cheers > > > > Charlie > > > > -- > > Charlie Hull > > OpenSource Connections, previously Flax > > > > tel/fax: +44 (0)8700 118334 > > mobile: +44 (0)7767 825828 > > web: www.o19s.com > > > -- > Charlie Hull > OpenSource Connections, previously Flax > > tel/fax: +44 (0)8700 118334 > mobile: +44 (0)7767 825828 > web: www.o19s.com >
Re: How to remove duplicate tokens from solr
But I am not sure why this type of search string causes high CPU utilization. On Fri, 18 Sep, 2020, 12:49 am Rahul Goswami, wrote: > Is this for a phrase search? If yes then the position of the token would > matter too and not sure which token would you want to remove. "eg > "tshirt hat tshirt". > Also, are you looking to save space and want this at index time? Or just > want to remove duplicates from the search string? > > If this is at search time AND is not a phrase search, there are a couple > approaches I could think of : > > 1) You could either handle this in the application layer to only pass the > deduplicated string before it hits solr > 2) You can write a custom search component and configure it in the > list to process the search string and remove duplicates > before it hits the default search components. See here ( > > https://lucene.apache.org/solr/guide/7_7/requesthandlers-and-searchcomponents-in-solrconfig.html#first-components-and-last-components > ). > > However if for search, I would still evaluate if writing those extra lines > of code is worth the investment. I say so since my assumption is that for > duplicated tokens in search string, lucene would have the intelligence to > not fetch the doc ids again, so you should not be worried about spending > computation resources to reevaluate the same tokens (Someone correct me if > I am wrong!) > > -Rahul > > On Thu, Sep 17, 2020 at 2:56 PM Rajdeep Sahoo > wrote: > > > If someone is searching with " tshirt tshirt tshirt tshirt tshirt tshirt" > > we need to remove the duplicates and search with tshirt. > > > > > > On Fri, 18 Sep, 2020, 12:19 am Alexandre Rafalovitch, < > arafa...@gmail.com> > > wrote: > > > > > This is not quite enough information. > > > There is > > > > > > https://lucene.apache.org/solr/guide/8_6/filter-descriptions.html#remove-duplicates-token-filter > > > but it has specific limitations. 
> > > > > > What is the problem that you are trying to solve that you feel is due > > > to duplicate tokens? Why are they duplicates? Is it about storage or > > > relevancy? > > > > > > Regards, > > >Alex. > > > > > > On Thu, 17 Sep 2020 at 14:35, Rajdeep Sahoo < > rajdeepsahoo2...@gmail.com> > > > wrote: > > > > > > > > Hi team, > > > > Is there any way to remove duplicate tokens from solr. Is there any > > > filter > > > > for this. > > > > > >
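Rahul's first suggestion above (deduplicate in the application layer before the query string ever reaches Solr) can be sketched in a few lines of plain Java; the class and method names here are illustrative, not part of SolrJ, and this is only safe for non-phrase queries, as noted in the thread:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class QueryDedup {
    // Remove repeated whitespace-separated terms while preserving the
    // order of first occurrence, so "tshirt tshirt hat tshirt" becomes
    // "tshirt hat" before the query string is sent to Solr.
    public static String dedupe(String query) {
        Set<String> seen =
                new LinkedHashSet<>(Arrays.asList(query.trim().split("\\s+")));
        return String.join(" ", seen);
    }
}
```

The deduplicated string can then be passed as the q parameter as usual.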
Re: How to remove duplicate tokens from solr
Is this for a phrase search? If yes then the position of the token would matter too and not sure which token would you want to remove. "eg "tshirt hat tshirt". Also, are you looking to save space and want this at index time? Or just want to remove duplicates from the search string? If this is at search time AND is not a phrase search, there are a couple approaches I could think of : 1) You could either handle this in the application layer to only pass the deduplicated string before it hits solr 2) You can write a custom search component and configure it in the list to process the search string and remove duplicates before it hits the default search components. See here ( https://lucene.apache.org/solr/guide/7_7/requesthandlers-and-searchcomponents-in-solrconfig.html#first-components-and-last-components ). However if for search, I would still evaluate if writing those extra lines of code is worth the investment. I say so since my assumption is that for duplicated tokens in search string, lucene would have the intelligence to not fetch the doc ids again, so you should not be worried about spending computation resources to reevaluate the same tokens (Someone correct me if I am wrong!) -Rahul On Thu, Sep 17, 2020 at 2:56 PM Rajdeep Sahoo wrote: > If someone is searching with " tshirt tshirt tshirt tshirt tshirt tshirt" > we need to remove the duplicates and search with tshirt. > > > On Fri, 18 Sep, 2020, 12:19 am Alexandre Rafalovitch, > wrote: > > > This is not quite enough information. > > There is > > > https://lucene.apache.org/solr/guide/8_6/filter-descriptions.html#remove-duplicates-token-filter > > but it has specific limitations. > > > > What is the problem that you are trying to solve that you feel is due > > to duplicate tokens? Why are they duplicates? Is it about storage or > > relevancy? > > > > Regards, > >Alex. > > > > On Thu, 17 Sep 2020 at 14:35, Rajdeep Sahoo > > wrote: > > > > > > Hi team, > > > Is there any way to remove duplicate tokens from solr. 
Is there any > > filter > > > for this. > > >
RE: Need to update SOLR_HOME in the solr service script and getting errors
Hi Mark. Thanks for taking the time to explain it so clearly. It makes perfect sense to me now and using chown solved the problem. Thanks again and have a great day. Victor -Original Message- From: Mark H. Wood Sent: Thursday, September 17, 2020 9:59 AM To: solr-user@lucene.apache.org Subject: Re: Need to update SOLR_HOME in the solr service script and getting errors On Wed, Sep 16, 2020 at 02:59:32PM +, Victor Kretzer wrote: > My setup is two solr nodes running on separate Azure Ubuntu 18.04 LTS vms > using an external zookeeper assembly. > I installed Solr 6.6.6 using the install file and then followed the steps for > enabling ssl. I am able to start solr, add collections and the like using > bin/solr script. > > Example: > /opt/solr$ sudo bin/solr start -cloud -s cloud/test2 -force > > However, if I restart the machine or attempt to start solr using the > installed service, it naturally goes back to the default SOLR_HOME in the > /etc/default/solr.in.sh script: "/var/solr/data" > > I've tried updating SOLR_HOME to "/opt/solr/cloud/test2" That is what I would do. > but then when I start the service I see the following error on the Admin > Dashboard: > SolrCore Initialization Failures > mycollection_shard1_replica1: > org.apache.solr.common.SolrException:org.apache.solr.common.SolrExcept > ion: > /opt/solr-6.6.6/cloud/test2/mycollection_shard1_replica1/data/index/wr > ite.lock Please check your logs for more information > > I'm including what I believe to be the pertinent information from the logs > below: You did well. > I suspect this is a permission issue because the solr user created by the > install script isn't allowed access to /opt/solr but I'm new to Linux and > haven't completely wrapped my head around the way permissions work with it. > Am I correct in guessing the cause of the error and, if so, how do I correct > this so that the service can be used to run my instances? 
Yes, the stack trace actually tells you explicitly that the problem is permissions on that file. Follow the chain of "Caused by:" and you'll see: Caused by: java.nio.file.AccessDeniedException: /opt/solr-6.6.6/cloud/test2/mycollection_shard1_replica1/data/index/write.lock Since, in the past, you have started Solr using 'sudo', this probably means that write.lock is owned by 'root'. Solr creates this file with permissions that allow only the owner to write it. If the service script runs Solr as any other user (and it should!) then Solr won't be able to open this file for writing, and because of this it won't complete the loading of that core. You should find out what user account is used by the service script, and 'chown' Solr's entire working directories tree to be owned by that user. Then, refrain from ever running Solr as 'root' or the problem may recur. Use the normal service start/stop mechanism for controlling your Solr instances. -- Mark H. Wood Lead Technology Analyst University Library Indiana University - Purdue University Indianapolis 755 W. Michigan Street Indianapolis, IN 46202 317-274-0749 www.ulib.iupui.edu
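Mark's fix amounts to two commands. The paths below are the ones from this thread, and the 'solr' account and the RUNAS variable are what the stock Solr install script sets up, so verify both against your own installation:

```shell
# See which account the init script runs Solr as (the install script
# defaults to a user named 'solr'; look for RUNAS in the script):
grep RUNAS /etc/init.d/solr

# Hand the whole custom SOLR_HOME tree to that account so the service
# can open write.lock and the rest of the index files:
sudo chown -R solr:solr /opt/solr/cloud/test2
```

After the chown, start Solr only via the service mechanism so the files stay owned by the service account.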
Re: NPE Issue with atomic update to nested document or child document through SolrJ
The missing underscore is a documentation bug, because it was not escaped the second time and the asciidoc chewed it up as a bold/italic indicator. The declaration and references should match. I am not sure about the code. I hope somebody else will step in on that part. Regards, Alex. On Thu, 17 Sep 2020 at 14:48, Pratik Patel wrote: > > I am running this in a unit test which deletes the collection after the > test is over. So every new test run gets a fresh collection. > > It is a very simple test where I am first indexing a couple of parent > documents with few children and then testing an atomic update on one parent > as I have posted in my previous message. (using UpdateRequest) > > I am not sure if I am triggering the atomic update correctly, do you see > any potential issue in that code? > > I noticed something in the documentation here. > https://lucene.apache.org/solr/guide/8_5/indexing-nested-documents.html#indexing-nested-documents > > name="_nest_path_" type="*nest_path*" /> > > field_type is declared with name *"_nest_path_"* whereas field is declared > with type *"nest_path". * > > Is this intentional? or should it be as follows? > > name="_nest_path_" type="* _nest_path_ *" /> > > Also, should we explicitly set index=true and store=true on _nest_path_ > and _nest_parent_ fields? > > > > On Thu, Sep 17, 2020 at 1:17 PM Alexandre Rafalovitch > wrote: > > > Did you reindex the original document after you added a new field? If > > not, then the previously indexed content is missing it and your code > > paths will get out of sync. > > > > Regards, > >Alex. > > P.s. I haven't done what you are doing before, so there may be > > something I am missing myself. > > > > > > On Thu, 17 Sep 2020 at 12:46, Pratik Patel wrote: > > > > > > Thanks for your reply Alexandre. > > > > > > I have "_root_" and "_nest_path_" fields in my schema but not > > > "_nest_parent_". 
> > > > > > > > > > > > > > > > > docValues="false" /> > > > > > > > > name="_nest_path_" class="solr.NestPathField" /> > > > > > > I ran my test after adding the "_nest_parent_" field and I am not getting > > > NPE any more which is good. Thanks! > > > > > > But looking at the documents in the index, I see that after the atomic > > > update, now there are two children documents with the same id. One > > document > > > has old values and another one has new values. Shouldn't they be merged > > > based on the "id"? Do we need to specify anything else in the request to > > > ensure that documents are merged/updated and not duplicated? > > > > > > For your reference, below is the test I am running now. > > > > > > // update field of one child doc > > > SolrInputDocument sdoc = new SolrInputDocument( ); > > > sdoc.addField( "id", testChildPOJO.id() ); > > > sdoc.addField( "conceptid", testChildPOJO.conceptid() ); > > > sdoc.addField( "storeid", "foo" ); > > > sdoc.setField( "fieldName", > > > java.util.Collections.singletonMap("set", Collections.list("bar" ) )); > > > > > > final UpdateRequest req = new UpdateRequest(); > > > req.withRoute( pojo1.id() );// parent id > > > req.add(sdoc); > > > > > > collection.client.request( req, > > collection.getCollectionName() > > > ); > > > collection.client.commit(); > > > > > > > > > Resulting documents : > > > > > > {id=c1_child1, conceptid=c1, storeid=s1, > > fieldName=c1_child1_field_value1, > > > startTime=Mon Sep 07 12:40:37 EDT 2020, integerField_iDF=10, > > > booleanField_bDF=true, _root_=abcd, _version_=1678099970090074112} > > > {id=c1_child1, conceptid=c1, storeid=foo, fieldName=bar, startTime=Mon > > Sep > > > 07 12:40:37 EDT 2020, integerField_iDF=10, booleanField_bDF=true, > > > _root_=abcd, _version_=1678099970405695488} > > > > > > > > > > > > > > > > > > > > > On Thu, Sep 17, 2020 at 12:01 PM Alexandre Rafalovitch < > > arafa...@gmail.com> > > > wrote: > > > > > > > Can you double-check your schema to see if you 
have all the fields > > > > required to support nested documents. You are supposed to get away > > > > with just _root_, but really you should also include _nest_path and > > > > _nest_parent_. Your particular exception seems to be triggering > > > > something (maybe a bug) related to - possibly - missing _nest_path_ > > > > field. > > > > > > > > See: > > > > > > https://lucene.apache.org/solr/guide/8_5/indexing-nested-documents.html#indexing-nested-documents > > > > > > > > Regards, > > > >Alex. > > > > > > > > On Wed, 16 Sep 2020 at 13:28, Pratik Patel > > wrote: > > > > > > > > > > Hello Everyone, > > > > > > > > > > I am trying to update a field of a child document using atomic > > updates > > > > > feature. I am using solr and solrJ version 8.5.0 > > > > > > > > > > I have ensured that my schema satisfies the conditions for atomic > > updates > > > > > and I am able to do atomic updates on normal documents but with >
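For reference, the ref-guide snippet this thread is discussing, with the underscores that asciidoc swallowed restored so the fieldType name and the field's type attribute match, would read roughly as follows (a reconstruction of the guide's intent; verify against your Solr version):

```xml
<fieldType name="_nest_path_" class="solr.NestPathField" />
<field name="_nest_path_" type="_nest_path_" />
```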
Re: How to remove duplicate tokens from solr
If someone is searching with " tshirt tshirt tshirt tshirt tshirt tshirt" we need to remove the duplicates and search with tshirt. On Fri, 18 Sep, 2020, 12:19 am Alexandre Rafalovitch, wrote: > This is not quite enough information. > There is > https://lucene.apache.org/solr/guide/8_6/filter-descriptions.html#remove-duplicates-token-filter > but it has specific limitations. > > What is the problem that you are trying to solve that you feel is due > to duplicate tokens? Why are they duplicates? Is it about storage or > relevancy? > > Regards, >Alex. > > On Thu, 17 Sep 2020 at 14:35, Rajdeep Sahoo > wrote: > > > > Hi team, > > Is there any way to remove duplicate tokens from solr. Is there any > filter > > for this. >
Re: How to remove duplicate tokens from solr
This is not quite enough information. There is https://lucene.apache.org/solr/guide/8_6/filter-descriptions.html#remove-duplicates-token-filter but it has specific limitations. What is the problem that you are trying to solve that you feel is due to duplicate tokens? Why are they duplicates? Is it about storage or relevancy? Regards, Alex. On Thu, 17 Sep 2020 at 14:35, Rajdeep Sahoo wrote: > > Hi team, > Is there any way to remove duplicate tokens from solr. Is there any filter > for this.
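For completeness, the filter behind that link is wired into an analyzer like this (the field type name here is illustrative). Its key limitation, alluded to above, is that it only drops a token that duplicates another token at the same position, so it will not collapse repeated words typed at different positions in a query like "tshirt tshirt":

```xml
<fieldType name="text_dedup" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Removes a token only when an identical token already sits at
         the same position, e.g. duplicate synonyms or stems. -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```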
Re: NPE Issue with atomic update to nested document or child document through SolrJ
I am running this in a unit test which deletes the collection after the test is over. So every new test run gets a fresh collection. It is a very simple test where I am first indexing a couple of parent documents with few children and then testing an atomic update on one parent as I have posted in my previous message. (using UpdateRequest) I am not sure if I am triggering the atomic update correctly, do you see any potential issue in that code? I noticed something in the documentation here. https://lucene.apache.org/solr/guide/8_5/indexing-nested-documents.html#indexing-nested-documents field_type is declared with name *"_nest_path_"* whereas field is declared with type *"nest_path". * Is this intentional? or should it be as follows? Also, should we explicitly set index=true and store=true on _nest_path_ and _nest_parent_ fields? On Thu, Sep 17, 2020 at 1:17 PM Alexandre Rafalovitch wrote: > Did you reindex the original document after you added a new field? If > not, then the previously indexed content is missing it and your code > paths will get out of sync. > > Regards, >Alex. > P.s. I haven't done what you are doing before, so there may be > something I am missing myself. > > > On Thu, 17 Sep 2020 at 12:46, Pratik Patel wrote: > > > > Thanks for your reply Alexandre. > > > > I have "_root_" and "_nest_path_" fields in my schema but not > > "_nest_parent_". > > > > > > > > > > > docValues="false" /> > > > > > name="_nest_path_" class="solr.NestPathField" /> > > > > I ran my test after adding the "_nest_parent_" field and I am not getting > > NPE any more which is good. Thanks! > > > > But looking at the documents in the index, I see that after the atomic > > update, now there are two children documents with the same id. One > document > > has old values and another one has new values. Shouldn't they be merged > > based on the "id"? Do we need to specify anything else in the request to > > ensure that documents are merged/updated and not duplicated? 
> > > > For your reference, below is the test I am running now. > > > > // update field of one child doc > > SolrInputDocument sdoc = new SolrInputDocument( ); > > sdoc.addField( "id", testChildPOJO.id() ); > > sdoc.addField( "conceptid", testChildPOJO.conceptid() ); > > sdoc.addField( "storeid", "foo" ); > > sdoc.setField( "fieldName", > > java.util.Collections.singletonMap("set", Collections.list("bar" ) )); > > > > final UpdateRequest req = new UpdateRequest(); > > req.withRoute( pojo1.id() );// parent id > > req.add(sdoc); > > > > collection.client.request( req, > collection.getCollectionName() > > ); > > collection.client.commit(); > > > > > > Resulting documents : > > > > {id=c1_child1, conceptid=c1, storeid=s1, > fieldName=c1_child1_field_value1, > > startTime=Mon Sep 07 12:40:37 EDT 2020, integerField_iDF=10, > > booleanField_bDF=true, _root_=abcd, _version_=1678099970090074112} > > {id=c1_child1, conceptid=c1, storeid=foo, fieldName=bar, startTime=Mon > Sep > > 07 12:40:37 EDT 2020, integerField_iDF=10, booleanField_bDF=true, > > _root_=abcd, _version_=1678099970405695488} > > > > > > > > > > > > > > On Thu, Sep 17, 2020 at 12:01 PM Alexandre Rafalovitch < > arafa...@gmail.com> > > wrote: > > > > > Can you double-check your schema to see if you have all the fields > > > required to support nested documents. You are supposed to get away > > > with just _root_, but really you should also include _nest_path and > > > _nest_parent_. Your particular exception seems to be triggering > > > something (maybe a bug) related to - possibly - missing _nest_path_ > > > field. > > > > > > See: > > > > https://lucene.apache.org/solr/guide/8_5/indexing-nested-documents.html#indexing-nested-documents > > > > > > Regards, > > >Alex. > > > > > > On Wed, 16 Sep 2020 at 13:28, Pratik Patel > wrote: > > > > > > > > Hello Everyone, > > > > > > > > I am trying to update a field of a child document using atomic > updates > > > > feature. 
I am using solr and solrJ version 8.5.0 > > > > > > > > I have ensured that my schema satisfies the conditions for atomic > updates > > > > and I am able to do atomic updates on normal documents but with > nested > > > > child documents, I am getting a Null Pointer Exception. Following is > the > > > > simple test which I am trying. > > > > > > > > TestPojo pojo1 = new TestPojo().cId( "abcd" ) > > > > > .conceptid( "c1" ) > > > > > .storeid( storeId > ) > > > > > .testChildPojos( > > > > > Collections.list( testChildPOJO, testChildPOJO2, > > > > > > > > testChildPOJO3 ) > > > > > ); > > > > > TestChildPOJOtestChildPOJO = new > TestChildPOJO().cId( > > > > > "c1_child1" ) > > > > >
How to remove duplicate tokens from solr
Hi team, Is there any way to remove duplicate tokens from solr. Is there any filter for this.
Re: Doing what copyField does using SolrJ API
Thank you all for your feedback. They are very helpful. @Walther, out of the 1000 fields in Solr's schema, only 5 are set as "required" fields and the Solr doc that I create and then send to Solr for indexing, contains only those fields that have data to be indexed. So some docs will have 10 fields, some 50, etc. Steven On Thu, Sep 17, 2020 at 1:55 PM Erick Erickson wrote: > The script can actually be written an any number of scripting languages, > python, groovy, > javascript etc. but Alexandre’s comments about javascript are well taken. > > It all depends here on whether you every want to search the fields > individually. If you do, > you need to have them in your index as well as the copyField. > > > On Sep 17, 2020, at 1:37 PM, Walter Underwood > wrote: > > > > If you want to ignore a field being sent to Solr, you can set > indexed=false and > > stored=false for that field in schema.xml. It will take up room in > schema.xml but > > zero room on disk. > > > > wunder > > Walter Underwood > > wun...@wunderwood.org > > http://observer.wunderwood.org/ (my blog) > > > >> On Sep 17, 2020, at 10:23 AM, Alexandre Rafalovitch > wrote: > >> > >> Solr has a whole pipeline that you can run during document ingesting > before > >> the actual indexing happens. It is called Update Request Processor (URP) > >> and is defined in solrconfig.xml or in an override file. Obviously, > since > >> you are indexing from SolrJ client, you have even more flexibility, but > it > >> is good to know about anyway. > >> > >> You can read all about it at: > >> https://lucene.apache.org/solr/guide/8_6/update-request-processors.html > and > >> see the extensive list of processors you can leverage. 
The specific > >> mentioned one is this one: > >> > https://lucene.apache.org/solr/8_6_0//solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html > >> > >> Just a word of warning that Stateless URP is using Javascript, which is > >> getting a bit of a complicated story as underlying JVM is upgraded > (Oracle > >> dropped their javascript engine in JDK 14). So if one of the simpler > URPs > >> will do the job or a chain of them, that may be a better path to take. > >> > >> Regards, > >> Alex. > >> > >> > >> On Thu, 17 Sep 2020 at 13:13, Steven White > wrote: > >> > >>> Thanks Erick. Where can I learn more about "stateless script update > >>> processor factory". I don't know what you mean by this. > >>> > >>> Steven > >>> > >>> On Thu, Sep 17, 2020 at 1:08 PM Erick Erickson < > erickerick...@gmail.com> > >>> wrote: > >>> > 1000 fields is fine, you'll waste some cycles on bookkeeping, but I > >>> really > doubt you'll notice. That said, are these fields used for searching? > Because you do have control over what gous into the index if you can > put > >>> a > "stateless script update processor factory" in your update chain. > There > >>> you > can do whatever you want, including combine all the fields into one > and > delete the original fields. There's no point in having your index > >>> cluttered > with unused fields, OTOH, it may not be worth the effort just to > satisfy > >>> my > sense of aesthetics 😉 > > On Thu, Sep 17, 2020, 12:59 Steven White > wrote: > > > Hi Eric, > > > > Yes, this is coming from a DB. Unfortunately I have no control over > >>> the > > list of fields. Out of the 1000 fields that there maybe, no > document, > that > > gets indexed into Solr will use more then about 50 and since i'm > >>> copying > > the values of those fields to the catch-all field and the catch-all > >>> field > > is my default search field, I don't expect any problem for having > 1000 > > fields in Solr's schema, or should I? 
> > > > Thanks > > > > Steven > > > > > > On Thu, Sep 17, 2020 at 8:23 AM Erick Erickson < > >>> erickerick...@gmail.com> > > wrote: > > > >> “there over 1000 of them[fields]” > >> > >> This is often a red flag in my experience. Solr will handle that > many > >> fields, I’ve seen many more. But this is often a result of > >> “database thinking”, i.e. your mental model of how all this data > >> is from a DB perspective rather than a search perspective. > >> > >> It’s unwieldy to have that many fields. Obviously I don’t know the > >> particulars of > >> your app, and maybe that’s the best design. Particularly if many of > >>> the > >> fields > >> are sparsely populated, i.e. only a small percentage of the > documents > in > >> your > >> corpus have any value for that field then taking a step back and > looking > >> at the design might save you some grief down the line. > >> > >> For instance, I’ve seen designs where instead of > >> field1:some_value > >> field2:other_value…. > >> > >> you use a single field with _tokens_ like: > >> field:field1_some_value > >> fie
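Erick's single-field token trick quoted above (field:field1_some_value instead of field1:some_value) can be sketched like this; the class and field names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SparseFieldTokens {
    // Collapse many sparse field/value pairs into one multi-valued
    // field of "fieldname_value" tokens, as suggested in the thread,
    // e.g. {field1=some_value} -> ["field1_some_value"].
    public static List<String> toTokens(Map<String, String> sparseFields) {
        List<String> tokens = new ArrayList<>();
        sparseFields.forEach((name, value) -> tokens.add(name + "_" + value));
        return tokens;
    }
}
```

The resulting list can be added as one multi-valued field via SolrInputDocument.addField(), and queries use field:field1_some_value in place of field1:some_value.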
Re: Handling failure when adding docs to Solr using SolrJ
I recommend _against_ issuing explicit commits from the client; let your solrconfig.xml autocommit settings take care of it. Make sure either your soft or hard commits open a new searcher for the docs to be searchable. I’ll bend a little bit if you can _guarantee_ that you only ever have one indexing client running and basically only ever issue the commit at the end. There’s another strategy, do the solrClient.add() command with the commitWithin parameter. As far as failures, look at https://lucene.apache.org/solr/7_3_0/solr-core/org/apache/solr/update/processor/TolerantUpdateProcessor.html that’ll give you a better clue about _which_ docs failed. From there, though, it’s a bit of debugging to figure out why that particular doc failed, usually people record the docs that failed for later analysis, and/or look at the Solr logs which usually give a more detailed reason of _why_ a document failed... Best, Erick > On Sep 17, 2020, at 1:09 PM, Steven White wrote: > > Hi everyone, > > I'm trying to figure out when and how I should handle failures that may > occur during indexing. In the sample code below, look at my comment and > let me know what state my index is in when things fail: > > SolrClient solrClient = new HttpSolrClient.Builder(url).build(); > > solrClient.add(solrDocs); > > // #1: What to do if add() fails? And how do I know if all or some of > my docs in 'solrDocs' made it to the index or not ('solrDocs' is a list of > 1 or more doc), should I retry add() again? Retry with a smaller chunk? > Etc. > > if (doCommit == true) > { > solrClient.commit(); > > // #2: What to do if commit() fails? Re-issue commit() again? > } > > Thanks > > Steven
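Erick's advice to rely on autocommit rather than client-side commit() looks like this in solrconfig.xml; the intervals below are illustrative placeholders, not recommendations:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flushes to disk for durability without opening
       a new searcher. -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit: opens a new searcher so documents become visible. -->
  <autoSoftCommit>
    <maxTime>15000</maxTime>
  </autoSoftCommit>
</updateHandler>
```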
Re: Doing what copyField does using SolrJ API
The script can actually be written in any number of scripting languages, python, groovy, javascript etc. but Alexandre’s comments about javascript are well taken. It all depends here on whether you ever want to search the fields individually. If you do, you need to have them in your index as well as the copyField. > On Sep 17, 2020, at 1:37 PM, Walter Underwood wrote: > > If you want to ignore a field being sent to Solr, you can set indexed=false > and > stored=false for that field in schema.xml. It will take up room in schema.xml > but > zero room on disk. > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > >> On Sep 17, 2020, at 10:23 AM, Alexandre Rafalovitch >> wrote: >> >> Solr has a whole pipeline that you can run during document ingesting before >> the actual indexing happens. It is called Update Request Processor (URP) >> and is defined in solrconfig.xml or in an override file. Obviously, since >> you are indexing from SolrJ client, you have even more flexibility, but it >> is good to know about anyway. >> >> You can read all about it at: >> https://lucene.apache.org/solr/guide/8_6/update-request-processors.html and >> see the extensive list of processors you can leverage. The specific >> mentioned one is this one: >> https://lucene.apache.org/solr/8_6_0//solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html >> >> Just a word of warning that Stateless URP is using Javascript, which is >> getting a bit of a complicated story as underlying JVM is upgraded (Oracle >> dropped their javascript engine in JDK 14). So if one of the simpler URPs >> will do the job or a chain of them, that may be a better path to take. >> >> Regards, >> Alex. >> >> >> On Thu, 17 Sep 2020 at 13:13, Steven White wrote: >> >>> Thanks Erick. Where can I learn more about "stateless script update >>> processor factory". I don't know what you mean by this. 
>>> >>> Steven >>> >>> On Thu, Sep 17, 2020 at 1:08 PM Erick Erickson >>> wrote: >>> 1000 fields is fine, you'll waste some cycles on bookkeeping, but I >>> really doubt you'll notice. That said, are these fields used for searching? Because you do have control over what gous into the index if you can put >>> a "stateless script update processor factory" in your update chain. There >>> you can do whatever you want, including combine all the fields into one and delete the original fields. There's no point in having your index >>> cluttered with unused fields, OTOH, it may not be worth the effort just to satisfy >>> my sense of aesthetics 😉 On Thu, Sep 17, 2020, 12:59 Steven White wrote: > Hi Eric, > > Yes, this is coming from a DB. Unfortunately I have no control over >>> the > list of fields. Out of the 1000 fields that there maybe, no document, that > gets indexed into Solr will use more then about 50 and since i'm >>> copying > the values of those fields to the catch-all field and the catch-all >>> field > is my default search field, I don't expect any problem for having 1000 > fields in Solr's schema, or should I? > > Thanks > > Steven > > > On Thu, Sep 17, 2020 at 8:23 AM Erick Erickson < >>> erickerick...@gmail.com> > wrote: > >> “there over 1000 of them[fields]” >> >> This is often a red flag in my experience. Solr will handle that many >> fields, I’ve seen many more. But this is often a result of >> “database thinking”, i.e. your mental model of how all this data >> is from a DB perspective rather than a search perspective. >> >> It’s unwieldy to have that many fields. Obviously I don’t know the >> particulars of >> your app, and maybe that’s the best design. Particularly if many of >>> the >> fields >> are sparsely populated, i.e. only a small percentage of the documents in >> your >> corpus have any value for that field then taking a step back and looking >> at the design might save you some grief down the line. 
>> >> For instance, I’ve seen designs where instead of >> field1:some_value >> field2:other_value…. >> >> you use a single field with _tokens_ like: >> field:field1_some_value >> field:field2_other_value >> >> that drops the complexity and increases performance. >> >> Anyway, just a thought you might want to consider. >> >> Best, >> Erick >> >>> On Sep 16, 2020, at 9:31 PM, Steven White > wrote: >>> >>> Hi everyone, >>> >>> I figured it out. It is as simple as creating a List and using >>> that as the value part for SolrInputDocument.addField() API. >>> >>> Thanks, >>> >>> Steven >>> >>> >>> On Wed, Sep 16, 2020 at 9:13 PM Steven White >>> >> wrote: >>> Hi everyone, I want to
Re: Doing what copyField does using SolrJ API
If you want to ignore a field being sent to Solr, you can set indexed=false and stored=false for that field in schema.xml. It will take up room in schema.xml but zero room on disk. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 17, 2020, at 10:23 AM, Alexandre Rafalovitch > wrote: > > Solr has a whole pipeline that you can run during document ingesting before > the actual indexing happens. It is called Update Request Processor (URP) > and is defined in solrconfig.xml or in an override file. Obviously, since > you are indexing from SolrJ client, you have even more flexibility, but it > is good to know about anyway. > > You can read all about it at: > https://lucene.apache.org/solr/guide/8_6/update-request-processors.html and > see the extensive list of processors you can leverage. The specific > mentioned one is this one: > https://lucene.apache.org/solr/8_6_0//solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html > > Just a word of warning that Stateless URP is using Javascript, which is > getting a bit of a complicated story as underlying JVM is upgraded (Oracle > dropped their javascript engine in JDK 14). So if one of the simpler URPs > will do the job or a chain of them, that may be a better path to take. > > Regards, > Alex. > > > On Thu, 17 Sep 2020 at 13:13, Steven White wrote: > >> Thanks Erick. Where can I learn more about "stateless script update >> processor factory". I don't know what you mean by this. >> >> Steven >> >> On Thu, Sep 17, 2020 at 1:08 PM Erick Erickson >> wrote: >> >>> 1000 fields is fine, you'll waste some cycles on bookkeeping, but I >> really >>> doubt you'll notice. That said, are these fields used for searching? >>> Because you do have control over what gous into the index if you can put >> a >>> "stateless script update processor factory" in your update chain. 
There >> you >>> can do whatever you want, including combine all the fields into one and >>> delete the original fields. There's no point in having your index >> cluttered >>> with unused fields, OTOH, it may not be worth the effort just to satisfy >> my >>> sense of aesthetics 😉 >>> >>> On Thu, Sep 17, 2020, 12:59 Steven White wrote: >>> Hi Eric, Yes, this is coming from a DB. Unfortunately I have no control over >> the list of fields. Out of the 1000 fields that there maybe, no document, >>> that gets indexed into Solr will use more then about 50 and since i'm >> copying the values of those fields to the catch-all field and the catch-all >> field is my default search field, I don't expect any problem for having 1000 fields in Solr's schema, or should I? Thanks Steven On Thu, Sep 17, 2020 at 8:23 AM Erick Erickson < >> erickerick...@gmail.com> wrote: > “there over 1000 of them[fields]” > > This is often a red flag in my experience. Solr will handle that many > fields, I’ve seen many more. But this is often a result of > “database thinking”, i.e. your mental model of how all this data > is from a DB perspective rather than a search perspective. > > It’s unwieldy to have that many fields. Obviously I don’t know the > particulars of > your app, and maybe that’s the best design. Particularly if many of >> the > fields > are sparsely populated, i.e. only a small percentage of the documents >>> in > your > corpus have any value for that field then taking a step back and >>> looking > at the design might save you some grief down the line. > > For instance, I’ve seen designs where instead of > field1:some_value > field2:other_value…. > > you use a single field with _tokens_ like: > field:field1_some_value > field:field2_other_value > > that drops the complexity and increases performance. > > Anyway, just a thought you might want to consider. > > Best, > Erick > >> On Sep 16, 2020, at 9:31 PM, Steven White wrote: >> >> Hi everyone, >> >> I figured it out. 
It is as simple as creating a List and >>> using >> that as the value part for SolrInputDocument.addField() API. >> >> Thanks, >> >> Steven >> >> >> On Wed, Sep 16, 2020 at 9:13 PM Steven White >> > wrote: >> >>> Hi everyone, >>> >>> I want to avoid creating a >> source="OneFieldOfMany"/> in my schema (there will be over 1000 of them > and >>> maybe more so managing it will be a pain). Instead, I want to use SolrJ >>> API to do what does. Any example of how I can do >> this? If >>> there is an example online, that would be great. >>> >>> Thanks in advance. >>> >>> Steven >>> > > >>> >>
Re: Doing what <copyField> does using SolrJ API
Solr has a whole pipeline that you can run during document ingesting before the actual indexing happens. It is called Update Request Processor (URP) and is defined in solrconfig.xml or in an override file. Obviously, since you are indexing from SolrJ client, you have even more flexibility, but it is good to know about anyway. You can read all about it at: https://lucene.apache.org/solr/guide/8_6/update-request-processors.html and see the extensive list of processors you can leverage. The specific mentioned one is this one: https://lucene.apache.org/solr/8_6_0//solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html Just a word of warning that Stateless URP is using Javascript, which is getting a bit of a complicated story as underlying JVM is upgraded (Oracle dropped their javascript engine in JDK 14). So if one of the simpler URPs will do the job or a chain of them, that may be a better path to take. Regards, Alex. On Thu, 17 Sep 2020 at 13:13, Steven White wrote: > Thanks Erick. Where can I learn more about "stateless script update > processor factory". I don't know what you mean by this. > > Steven > > On Thu, Sep 17, 2020 at 1:08 PM Erick Erickson > wrote: > > > 1000 fields is fine, you'll waste some cycles on bookkeeping, but I > really > > doubt you'll notice. That said, are these fields used for searching? > > Because you do have control over what gous into the index if you can put > a > > "stateless script update processor factory" in your update chain. There > you > > can do whatever you want, including combine all the fields into one and > > delete the original fields. There's no point in having your index > cluttered > > with unused fields, OTOH, it may not be worth the effort just to satisfy > my > > sense of aesthetics 😉 > > > > On Thu, Sep 17, 2020, 12:59 Steven White wrote: > > > > > Hi Eric, > > > > > > Yes, this is coming from a DB. Unfortunately I have no control over > the > > > list of fields. 
Out of the 1000 fields that there maybe, no document, > > that > > > gets indexed into Solr will use more then about 50 and since i'm > copying > > > the values of those fields to the catch-all field and the catch-all > field > > > is my default search field, I don't expect any problem for having 1000 > > > fields in Solr's schema, or should I? > > > > > > Thanks > > > > > > Steven > > > > > > > > > On Thu, Sep 17, 2020 at 8:23 AM Erick Erickson < > erickerick...@gmail.com> > > > wrote: > > > > > > > “there over 1000 of them[fields]” > > > > > > > > This is often a red flag in my experience. Solr will handle that many > > > > fields, I’ve seen many more. But this is often a result of > > > > “database thinking”, i.e. your mental model of how all this data > > > > is from a DB perspective rather than a search perspective. > > > > > > > > It’s unwieldy to have that many fields. Obviously I don’t know the > > > > particulars of > > > > your app, and maybe that’s the best design. Particularly if many of > the > > > > fields > > > > are sparsely populated, i.e. only a small percentage of the documents > > in > > > > your > > > > corpus have any value for that field then taking a step back and > > looking > > > > at the design might save you some grief down the line. > > > > > > > > For instance, I’ve seen designs where instead of > > > > field1:some_value > > > > field2:other_value…. > > > > > > > > you use a single field with _tokens_ like: > > > > field:field1_some_value > > > > field:field2_other_value > > > > > > > > that drops the complexity and increases performance. > > > > > > > > Anyway, just a thought you might want to consider. > > > > > > > > Best, > > > > Erick > > > > > > > > > On Sep 16, 2020, at 9:31 PM, Steven White > > > wrote: > > > > > > > > > > Hi everyone, > > > > > > > > > > I figured it out. It is as simple as creating a List and > > using > > > > > that as the value part for SolrInputDocument.addField() API. 
> > > > > > > > > > Thanks, > > > > > > > > > > Steven > > > > > > > > > > > > > > > On Wed, Sep 16, 2020 at 9:13 PM Steven White > > > > > wrote: > > > > > > > > > >> Hi everyone, > > > > >> > > > > >> I want to avoid creating a > > > >> source="OneFieldOfMany"/> in my schema (there will be over 1000 of > > > them > > > > and > > > > >> maybe more so managing it will be a pain). Instead, I want to use > > > SolrJ > > > > >> API to do what does. Any example of how I can do > this? > > > If > > > > >> there is an example online, that would be great. > > > > >> > > > > >> Thanks in advance. > > > > >> > > > > >> Steven > > > > >> > > > > > > > > > > > > > >
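For reference, Alexandre's URP suggestion would be wired into solrconfig.xml along these lines. This is only a sketch: the chain name "combine-fields" and the script file name are invented for illustration, and the script itself would hold the field-merging logic.

```xml
<!-- Sketch only: chain and script names are hypothetical. The script merges
     the source fields into one and deletes the originals before
     RunUpdateProcessorFactory actually indexes the document. -->
<updateRequestProcessorChain name="combine-fields">
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <str name="script">combine-fields.js</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain is selected per request with the update.chain parameter, or made the default on the update request handler.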
Re: NPE Issue with atomic update to nested document or child document through SolrJ
Did you reindex the original document after you added a new field? If not, then the previously indexed content is missing it and your code paths will get out of sync. Regards, Alex. P.s. I haven't done what you are doing before, so there may be something I am missing myself. On Thu, 17 Sep 2020 at 12:46, Pratik Patel wrote: > > Thanks for your reply Alexandre. > > I have "_root_" and "_nest_path_" fields in my schema but not > "_nest_parent_". > > > <field name="_root_" type="string" indexed="true" stored="false" > docValues="false" /> > <fieldType > name="_nest_path_" class="solr.NestPathField" /> > > I ran my test after adding the "_nest_parent_" field and I am not getting > NPE any more which is good. Thanks! > > But looking at the documents in the index, I see that after the atomic > update, now there are two children documents with the same id. One document > has old values and another one has new values. Shouldn't they be merged > based on the "id"? Do we need to specify anything else in the request to > ensure that documents are merged/updated and not duplicated? > > For your reference, below is the test I am running now. 
> > // update field of one child doc > SolrInputDocument sdoc = new SolrInputDocument( ); > sdoc.addField( "id", testChildPOJO.id() ); > sdoc.addField( "conceptid", testChildPOJO.conceptid() ); > sdoc.addField( "storeid", "foo" ); > sdoc.setField( "fieldName", > java.util.Collections.singletonMap("set", Collections.list("bar" ) )); > > final UpdateRequest req = new UpdateRequest(); > req.withRoute( pojo1.id() );// parent id > req.add(sdoc); > > collection.client.request( req, collection.getCollectionName() > ); > collection.client.commit(); > > > Resulting documents : > > {id=c1_child1, conceptid=c1, storeid=s1, fieldName=c1_child1_field_value1, > startTime=Mon Sep 07 12:40:37 EDT 2020, integerField_iDF=10, > booleanField_bDF=true, _root_=abcd, _version_=1678099970090074112} > {id=c1_child1, conceptid=c1, storeid=foo, fieldName=bar, startTime=Mon Sep > 07 12:40:37 EDT 2020, integerField_iDF=10, booleanField_bDF=true, > _root_=abcd, _version_=1678099970405695488} > > > > > > > On Thu, Sep 17, 2020 at 12:01 PM Alexandre Rafalovitch > wrote: > > > Can you double-check your schema to see if you have all the fields > > required to support nested documents. You are supposed to get away > > with just _root_, but really you should also include _nest_path and > > _nest_parent_. Your particular exception seems to be triggering > > something (maybe a bug) related to - possibly - missing _nest_path_ > > field. > > > > See: > > https://lucene.apache.org/solr/guide/8_5/indexing-nested-documents.html#indexing-nested-documents > > > > Regards, > >Alex. > > > > On Wed, 16 Sep 2020 at 13:28, Pratik Patel wrote: > > > > > > Hello Everyone, > > > > > > I am trying to update a field of a child document using atomic updates > > > feature. 
I am using solr and solrJ version 8.5.0 > > > > > > I have ensured that my schema satisfies the conditions for atomic updates > > > and I am able to do atomic updates on normal documents but with nested > > > child documents, I am getting a Null Pointer Exception. Following is the > > > simple test which I am trying. > > > > > > TestPojo pojo1 = new TestPojo().cId( "abcd" ) > > > > .conceptid( "c1" ) > > > > .storeid( storeId ) > > > > .testChildPojos( > > > > Collections.list( testChildPOJO, testChildPOJO2, > > > > > > testChildPOJO3 ) > > > > ); > > > > TestChildPOJOtestChildPOJO = new TestChildPOJO().cId( > > > > "c1_child1" ) > > > > .conceptid( "c1" > > ) > > > > .storeid( > > storeId ) > > > > .fieldName( > > > > "c1_child1_field_value1" ) > > > > .startTime( > > > > Date.from( now.minus( 10, ChronoUnit.DAYS ) ) ) > > > > > > .integerField_iDF( > > > > 10 ) > > > > > > > > .booleanField_bDF(true); > > > > // index pojo1 with child testChildPOJO > > > > SolrInputDocument sdoc = new SolrInputDocument(); > > > > sdoc.addField( "_route_", pojo1.cId() ); > > > > sdoc.addField( "id", testChildPOJO.cId() ); > > > > sdoc.addField( "conceptid", testChildPOJO.conceptid() ); > > > > sdoc.addField( "storeid", testChildPOJO.cId() ); > > > > sdoc.setField( "fieldName", java.util.Collections.singletonMap("set", > > > > Collections.list(testChildPOJO.fieldName() + postfix) ) ); // modify > > field > > > > "fieldName" > > > > collection.client.add( sdoc ); // results in NPE! > > > > > > > > > Stack Trace: > > > > > > ERROR org.apache.solr.client.solrj.impl.BaseCloudSolrClient - Request to > > > > collection [col
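Based on the fields Alexandre lists, the nested-document bookkeeping entries in the schema would look roughly like this. The exact types should be checked against the default managed-schema of your Solr version; this is a sketch, not a verified configuration.

```xml
<!-- Sketch of the nested-document bookkeeping fields; verify types against
     your Solr version's default managed-schema before using. -->
<field name="_root_" type="string" indexed="true" stored="false" docValues="false"/>
<fieldType name="_nest_path_" class="solr.NestPathField"/>
<field name="_nest_path_" type="_nest_path_"/>
<field name="_nest_parent_" type="string" indexed="true" stored="true"/>
```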
Re: Doing what <copyField> does using SolrJ API
Thanks Erick. Where can I learn more about "stateless script update processor factory". I don't know what you mean by this. Steven On Thu, Sep 17, 2020 at 1:08 PM Erick Erickson wrote: > 1000 fields is fine, you'll waste some cycles on bookkeeping, but I really > doubt you'll notice. That said, are these fields used for searching? > Because you do have control over what gous into the index if you can put a > "stateless script update processor factory" in your update chain. There you > can do whatever you want, including combine all the fields into one and > delete the original fields. There's no point in having your index cluttered > with unused fields, OTOH, it may not be worth the effort just to satisfy my > sense of aesthetics 😉 > > On Thu, Sep 17, 2020, 12:59 Steven White wrote: > > > Hi Eric, > > > > Yes, this is coming from a DB. Unfortunately I have no control over the > > list of fields. Out of the 1000 fields that there maybe, no document, > that > > gets indexed into Solr will use more then about 50 and since i'm copying > > the values of those fields to the catch-all field and the catch-all field > > is my default search field, I don't expect any problem for having 1000 > > fields in Solr's schema, or should I? > > > > Thanks > > > > Steven > > > > > > On Thu, Sep 17, 2020 at 8:23 AM Erick Erickson > > wrote: > > > > > “there over 1000 of them[fields]” > > > > > > This is often a red flag in my experience. Solr will handle that many > > > fields, I’ve seen many more. But this is often a result of > > > “database thinking”, i.e. your mental model of how all this data > > > is from a DB perspective rather than a search perspective. > > > > > > It’s unwieldy to have that many fields. Obviously I don’t know the > > > particulars of > > > your app, and maybe that’s the best design. Particularly if many of the > > > fields > > > are sparsely populated, i.e. 
only a small percentage of the documents > in > > > your > > > corpus have any value for that field then taking a step back and > looking > > > at the design might save you some grief down the line. > > > > > > For instance, I’ve seen designs where instead of > > > field1:some_value > > > field2:other_value…. > > > > > > you use a single field with _tokens_ like: > > > field:field1_some_value > > > field:field2_other_value > > > > > > that drops the complexity and increases performance. > > > > > > Anyway, just a thought you might want to consider. > > > > > > Best, > > > Erick > > > > > > > On Sep 16, 2020, at 9:31 PM, Steven White > > wrote: > > > > > > > > Hi everyone, > > > > > > > > I figured it out. It is as simple as creating a List and > using > > > > that as the value part for SolrInputDocument.addField() API. > > > > > > > > Thanks, > > > > > > > > Steven > > > > > > > > > > > > On Wed, Sep 16, 2020 at 9:13 PM Steven White > > > wrote: > > > > > > > >> Hi everyone, > > > >> > > > >> I want to avoid creating a > > >> source="OneFieldOfMany"/> in my schema (there will be over 1000 of > > them > > > and > > > >> maybe more so managing it will be a pain). Instead, I want to use > > SolrJ > > > >> API to do what does. Any example of how I can do this? > > If > > > >> there is an example online, that would be great. > > > >> > > > >> Thanks in advance. > > > >> > > > >> Steven > > > >> > > > > > > > > >
Handling failure when adding docs to Solr using SolrJ
Hi everyone, I'm trying to figure out when and how I should handle failures that may occur during indexing. In the sample code below, look at my comment and let me know what state my index is in when things fail: SolrClient solrClient = new HttpSolrClient.Builder(url).build(); solrClient.add(solrDocs); // #1: What to do if add() fails? And how do I know if all or some of my docs in 'solrDocs' made it to the index or not ('solrDocs' is a list of 1 or more doc), should I retry add() again? Retry with a smaller chunk? Etc. if (doCommit == true) { solrClient.commit(); // #2: What to do if commit() fails? Re-issue commit() again? } Thanks Steven
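One common pattern for #1 (not from this thread, just a hedged sketch) is to retry a failed batch in halves until the offending document(s) are isolated: if a batch add fails, split it and re-send each half, recursing down to single documents. Re-sending docs that were already indexed is harmless, since adding a document with the same id overwrites it. Here the actual solrClient.add() call is abstracted behind a Consumer so the logic is self-contained:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BisectRetry {
    // On a batch failure, retry in halves until single bad docs are isolated.
    // Returns the docs that could not be indexed even individually.
    public static <T> List<T> addWithBisect(List<T> batch, Consumer<List<T>> add) {
        List<T> failed = new ArrayList<>();
        if (batch.isEmpty()) return failed;
        try {
            add.accept(batch);                 // in real code: solrClient.add(batch)
        } catch (Exception e) {
            if (batch.size() == 1) {
                failed.add(batch.get(0));      // single doc Solr rejects: log and skip
            } else {
                int mid = batch.size() / 2;
                failed.addAll(addWithBisect(new ArrayList<>(batch.subList(0, mid)), add));
                failed.addAll(addWithBisect(new ArrayList<>(batch.subList(mid, batch.size())), add));
            }
        }
        return failed;
    }
}
```

For #2, retrying commit() once is usually reasonable; if it still fails, the docs sit uncommitted until the next successful (or auto) commit rather than being lost.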
Re: Doing what <copyField> does using SolrJ API
1000 fields is fine, you'll waste some cycles on bookkeeping, but I really doubt you'll notice. That said, are these fields used for searching? Because you do have control over what goes into the index if you can put a "stateless script update processor factory" in your update chain. There you can do whatever you want, including combine all the fields into one and delete the original fields. There's no point in having your index cluttered with unused fields, OTOH, it may not be worth the effort just to satisfy my sense of aesthetics 😉 On Thu, Sep 17, 2020, 12:59 Steven White wrote: > Hi Eric, > > Yes, this is coming from a DB. Unfortunately I have no control over the > list of fields. Out of the 1000 fields that there maybe, no document, > that > gets indexed into Solr will use more then about 50 and since i'm copying > > the values of those fields to the catch-all field and the catch-all field > is my default search field, I don't expect any problem for having 1000 > fields in Solr's schema, or should I? > > Thanks > > Steven > > > On Thu, Sep 17, 2020 at 8:23 AM Erick Erickson > wrote: > > > “there over 1000 of them[fields]” > > > > This is often a red flag in my experience. Solr will handle that many > > fields, I’ve seen many more. But this is often a result of > > “database thinking”, i.e. your mental model of how all this data > > is from a DB perspective rather than a search perspective. > > > > It’s unwieldy to have that many fields. Obviously I don’t know the > > particulars of > > your app, and maybe that’s the best design. Particularly if many of the > > fields > > are sparsely populated, i.e. only a small percentage of the documents in > > your > > corpus have any value for that field then taking a step back and looking > > at the design might save you some grief down the line. > > > > For instance, I’ve seen designs where instead of > > field1:some_value > > field2:other_value…. 
> > > > you use a single field with _tokens_ like: > > field:field1_some_value > > field:field2_other_value > > > > that drops the complexity and increases performance. > > > > Anyway, just a thought you might want to consider. > > > > Best, > > Erick > > > > > On Sep 16, 2020, at 9:31 PM, Steven White > wrote: > > > > > > Hi everyone, > > > > > > I figured it out. It is as simple as creating a List and using > > > that as the value part for SolrInputDocument.addField() API. > > > > > > Thanks, > > > > > > Steven > > > > > > > > > On Wed, Sep 16, 2020 at 9:13 PM Steven White > > wrote: > > > > > >> Hi everyone, > > >> > > >> I want to avoid creating a > >> source="OneFieldOfMany"/> in my schema (there will be over 1000 of > them > > and > > >> maybe more so managing it will be a pain). Instead, I want to use > SolrJ > > >> API to do what does. Any example of how I can do this? > If > > >> there is an example online, that would be great. > > >> > > >> Thanks in advance. > > >> > > >> Steven > > >> > > > > >
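Erick's single-field trick is easy to build on the client side: instead of many sparse fields, index one multi-valued field whose tokens encode "fieldname_value". A minimal sketch (the helper name and value shapes are illustrative, assuming one string value per field):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class FieldTokens {
    // Collapse sparse field/value pairs into tokens like "field1_some_value",
    // all destined for a single multi-valued Solr field.
    public static List<String> toTokens(Map<String, String> sparseFields) {
        List<String> tokens = new ArrayList<>();
        for (Map.Entry<String, String> e : sparseFields.entrySet()) {
            tokens.add(e.getKey() + "_" + e.getValue());
        }
        return tokens;
    }
}
```

Queries then become field:field1_some_value instead of field1:some_value, trading schema size for a naming convention inside one field.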
Re: Doing what <copyField> does using SolrJ API
Hi Eric, Yes, this is coming from a DB. Unfortunately I have no control over the list of fields. Out of the 1000 fields that there may be, no document that gets indexed into Solr will use more than about 50, and since I'm copying the values of those fields to the catch-all field and the catch-all field is my default search field, I don't expect any problem for having 1000 fields in Solr's schema, or should I? Thanks Steven On Thu, Sep 17, 2020 at 8:23 AM Erick Erickson wrote: > “there over 1000 of them[fields]” > > This is often a red flag in my experience. Solr will handle that many > fields, I’ve seen many more. But this is often a result of > “database thinking”, i.e. your mental model of how all this data > is from a DB perspective rather than a search perspective. > > It’s unwieldy to have that many fields. Obviously I don’t know the > particulars of > your app, and maybe that’s the best design. Particularly if many of the > fields > are sparsely populated, i.e. only a small percentage of the documents in > your > corpus have any value for that field then taking a step back and looking > at the design might save you some grief down the line. > > For instance, I’ve seen designs where instead of > field1:some_value > field2:other_value…. > > you use a single field with _tokens_ like: > field:field1_some_value > field:field2_other_value > > that drops the complexity and increases performance. > > Anyway, just a thought you might want to consider. > > Best, > Erick > > > On Sep 16, 2020, at 9:31 PM, Steven White wrote: > > > > Hi everyone, > > > > I figured it out. It is as simple as creating a List<String> and using > > that as the value part for SolrInputDocument.addField() API. > > > > Thanks, > > > > Steven > > > > > > On Wed, Sep 16, 2020 at 9:13 PM Steven White > wrote: > > > >> Hi everyone, > >> > >> I want to avoid creating a <copyField dest="..." >> source="OneFieldOfMany"/> in my schema (there will be over 1000 of them > and > >> maybe more so managing it will be a pain). 
Instead, I want to use SolrJ > >> API to do what <copyField> does. Any example of how I can do this? If > >> there is an example online, that would be great. > >> > >> Thanks in advance. > >> > >> Steven > >> > >
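Steven's client-side replacement for <copyField> can be sketched as follows: gather every source field's values into one list, then pass that list as the value of the catch-all field via SolrInputDocument.addField(). This is a hedged sketch; the "catch_all" field name is hypothetical, and the SolrJ call is shown only in a comment so the helper stays self-contained:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

public class CatchAllBuilder {
    // Collect all source-field values into one list, flattening any
    // multi-valued fields. In SolrJ you would then do something like:
    //   sdoc.addField("catch_all", CatchAllBuilder.collectValues(fields));
    public static List<Object> collectValues(Map<String, Object> sourceFields) {
        List<Object> values = new ArrayList<>();
        for (Object v : sourceFields.values()) {
            if (v instanceof Collection) {
                values.addAll((Collection<?>) v);  // flatten multi-valued fields
            } else {
                values.add(v);
            }
        }
        return values;
    }
}
```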
Re: NPE Issue with atomic update to nested document or child document through SolrJ
Thanks for your reply Alexandre. I have "_root_" and "_nest_path_" fields in my schema but not "_nest_parent_". I ran my test after adding the "_nest_parent_" field and I am not getting NPE any more which is good. Thanks! But looking at the documents in the index, I see that after the atomic update, now there are two children documents with the same id. One document has old values and another one has new values. Shouldn't they be merged based on the "id"? Do we need to specify anything else in the request to ensure that documents are merged/updated and not duplicated? For your reference, below is the test I am running now. // update field of one child doc SolrInputDocument sdoc = new SolrInputDocument( ); sdoc.addField( "id", testChildPOJO.id() ); sdoc.addField( "conceptid", testChildPOJO.conceptid() ); sdoc.addField( "storeid", "foo" ); sdoc.setField( "fieldName", java.util.Collections.singletonMap("set", Collections.list("bar" ) )); final UpdateRequest req = new UpdateRequest(); req.withRoute( pojo1.id() );// parent id req.add(sdoc); collection.client.request( req, collection.getCollectionName() ); collection.client.commit(); Resulting documents : {id=c1_child1, conceptid=c1, storeid=s1, fieldName=c1_child1_field_value1, startTime=Mon Sep 07 12:40:37 EDT 2020, integerField_iDF=10, booleanField_bDF=true, _root_=abcd, _version_=1678099970090074112} {id=c1_child1, conceptid=c1, storeid=foo, fieldName=bar, startTime=Mon Sep 07 12:40:37 EDT 2020, integerField_iDF=10, booleanField_bDF=true, _root_=abcd, _version_=1678099970405695488} On Thu, Sep 17, 2020 at 12:01 PM Alexandre Rafalovitch wrote: > Can you double-check your schema to see if you have all the fields > required to support nested documents. You are supposed to get away > with just _root_, but really you should also include _nest_path and > _nest_parent_. Your particular exception seems to be triggering > something (maybe a bug) related to - possibly - missing _nest_path_ > field. 
> > See: > https://lucene.apache.org/solr/guide/8_5/indexing-nested-documents.html#indexing-nested-documents > > Regards, >Alex. > > On Wed, 16 Sep 2020 at 13:28, Pratik Patel wrote: > > > > Hello Everyone, > > > > I am trying to update a field of a child document using atomic updates > > feature. I am using solr and solrJ version 8.5.0 > > > > I have ensured that my schema satisfies the conditions for atomic updates > > and I am able to do atomic updates on normal documents but with nested > > child documents, I am getting a Null Pointer Exception. Following is the > > simple test which I am trying. > > > > TestPojo pojo1 = new TestPojo().cId( "abcd" ) > > > .conceptid( "c1" ) > > > .storeid( storeId ) > > > .testChildPojos( > > > Collections.list( testChildPOJO, testChildPOJO2, > > > > testChildPOJO3 ) > > > ); > > > TestChildPOJOtestChildPOJO = new TestChildPOJO().cId( > > > "c1_child1" ) > > > .conceptid( "c1" > ) > > > .storeid( > storeId ) > > > .fieldName( > > > "c1_child1_field_value1" ) > > > .startTime( > > > Date.from( now.minus( 10, ChronoUnit.DAYS ) ) ) > > > > .integerField_iDF( > > > 10 ) > > > > > > .booleanField_bDF(true); > > > // index pojo1 with child testChildPOJO > > > SolrInputDocument sdoc = new SolrInputDocument(); > > > sdoc.addField( "_route_", pojo1.cId() ); > > > sdoc.addField( "id", testChildPOJO.cId() ); > > > sdoc.addField( "conceptid", testChildPOJO.conceptid() ); > > > sdoc.addField( "storeid", testChildPOJO.cId() ); > > > sdoc.setField( "fieldName", java.util.Collections.singletonMap("set", > > > Collections.list(testChildPOJO.fieldName() + postfix) ) ); // modify > field > > > "fieldName" > > > collection.client.add( sdoc ); // results in NPE! 
> > > > > > Stack Trace: > > > > ERROR org.apache.solr.client.solrj.impl.BaseCloudSolrClient - Request to > > > collection [collectionTest2] failed due to (500) > > > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: > Error > > > from server at > > > http://172.15.1.100:8081/solr/collectionTest2_shard1_replica_n1: > > > java.lang.NullPointerException > > > at > > > > org.apache.solr.update.processor.AtomicUpdateDocumentMerger.getFieldFromHierarchy(AtomicUpdateDocumentMerger.java:308) > > > at > > > > org.apache.solr.update.processor.AtomicUpdateDocumentMerger.mergeChildDoc(AtomicUpdateDocumentMerger.java:405) > > > at > > > > org.apache.solr.update.processor.DistributedUpdateProcessor.getUpdatedDocument(DistributedUpdateProces
Re: Help using Noggit for streaming JSON data
See this method: /** Reads a JSON string into the output, decoding any escaped characters. */ public void getString(CharArr output) throws IOException And then the idea is to create a subclass of CharArr to incrementally handle the string that is written to it. You could overload write methods, or perhaps reserve() to flush/handle the buffer when it reaches a certain size. -Yonik On Thu, Sep 17, 2020 at 11:48 AM Christopher Schultz < ch...@christopherschultz.net> wrote: > All, > > Is this an appropriate forum for asking questions about how to use > Noggit? The Github doesn't have any discussions available and filing an > "issue" to ask a question is kinda silly. I'm happy to be redirected to > the right place if this isn't appropriate. > > I've been able to figure out most things in Noggit by reading the code, > but I have a new use-case where I expect that I'll have very large > values (base64-encoded binary) and I'd like to stream those rather than > calling parser.getString() and getting a potentially huge string coming > back. I'm streaming into a database so I never need the whole string in > one place at one time. > > I was thinking something like this: > > JSONParser p = ...; > > int evt = p.nextEvent(); > if(JSONParser.STRING == evt) { > // Start streaming > boolean eos = false; > while(!eos) { > char c = p.getChar(); > if(c == '"') { > eos = true; > } else { > append to stream > } > } > } > > But getChar() is not public. The only "documentation" I've really been > able to find for Noggit is this post from Yonic back in 2014: > > http://yonik.com/noggit-json-parser/ > > It mostly says "Noggit is great!" and specifically mentions huge, long > strings but does not actually show any Java code to consume the JSON > data in any kind of streaming way. > > The ObjectBuilder class is a great user of JSONParser, but it just > builds standard objects and would consume tons of memory in my case. 
> > I know for sure that Solr consumes huge JSON documents and I'm assuming > that Noggit is being used in that situation, though I have not looked at > the code used to do that. > > Any suggestions? > > -chris >
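Noggit itself isn't shown here, so as a rough, self-contained analogy to Yonik's suggestion (a CharArr subclass that handles the string incrementally as it is written), here is the same pattern expressed with java.io.Writer; the Consumer sink stands in for the database blob stream, and the threshold plays the role of the buffer-size check Yonik suggests putting in write() or reserve():

```java
import java.io.Writer;
import java.util.function.Consumer;

// Analogy only: flush fixed-size chunks to a sink as data arrives, so the
// full (potentially huge) string is never held in memory at once.
public class ChunkingWriter extends Writer {
    private final StringBuilder buf = new StringBuilder();
    private final int threshold;
    private final Consumer<String> sink;   // e.g. a JDBC blob/clob stream

    public ChunkingWriter(int threshold, Consumer<String> sink) {
        this.threshold = threshold;
        this.sink = sink;
    }

    @Override public void write(char[] cbuf, int off, int len) {
        buf.append(cbuf, off, len);
        if (buf.length() >= threshold) flush();   // hand off a chunk early
    }

    @Override public void flush() {
        if (buf.length() > 0) {
            sink.accept(buf.toString());
            buf.setLength(0);
        }
    }

    @Override public void close() { flush(); }
}
```

With Noggit, the same idea would live in the CharArr subclass passed to getString(CharArr), rather than in a Writer.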
Re: NPE Issue with atomic update to nested document or child document through SolrJ
Can you double-check your schema to see if you have all the fields required to support nested documents. You are supposed to get away with just _root_, but really you should also include _nest_path_ and _nest_parent_. Your particular exception seems to be triggering something (maybe a bug) related to - possibly - missing _nest_path_ field. See: https://lucene.apache.org/solr/guide/8_5/indexing-nested-documents.html#indexing-nested-documents Regards, Alex. On Wed, 16 Sep 2020 at 13:28, Pratik Patel wrote: > > Hello Everyone, > > I am trying to update a field of a child document using atomic updates > feature. I am using Solr and SolrJ version 8.5.0 > > I have ensured that my schema satisfies the conditions for atomic updates > and I am able to do atomic updates on normal documents but with nested > child documents, I am getting a Null Pointer Exception. Following is the > simple test which I am trying. > > TestPojo pojo1 = new TestPojo().cId( "abcd" ) > > .conceptid( "c1" ) > > .storeid( storeId ) > > .testChildPojos( > > Collections.list( testChildPOJO, testChildPOJO2, > > testChildPOJO3 ) > > ); > > TestChildPOJO testChildPOJO = new TestChildPOJO().cId( > > "c1_child1" ) > > .conceptid( "c1" ) > > .storeid( storeId ) > > .fieldName( > > "c1_child1_field_value1" ) > > .startTime( > > Date.from( now.minus( 10, ChronoUnit.DAYS ) ) ) > > .integerField_iDF( > > 10 ) > > > > .booleanField_bDF(true); > > // index pojo1 with child testChildPOJO > > SolrInputDocument sdoc = new SolrInputDocument(); > > sdoc.addField( "_route_", pojo1.cId() ); > > sdoc.addField( "id", testChildPOJO.cId() ); > > sdoc.addField( "conceptid", testChildPOJO.conceptid() ); > > sdoc.addField( "storeid", testChildPOJO.cId() ); > > sdoc.setField( "fieldName", java.util.Collections.singletonMap("set", > > Collections.list(testChildPOJO.fieldName() + postfix) ) ); // modify field > > "fieldName" > > collection.client.add( sdoc ); // results in NPE! 
> > > Stack Trace: > > ERROR org.apache.solr.client.solrj.impl.BaseCloudSolrClient - Request to > > collection [collectionTest2] failed due to (500) > > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error > > from server at > > http://172.15.1.100:8081/solr/collectionTest2_shard1_replica_n1: > > java.lang.NullPointerException > > at > > org.apache.solr.update.processor.AtomicUpdateDocumentMerger.getFieldFromHierarchy(AtomicUpdateDocumentMerger.java:308) > > at > > org.apache.solr.update.processor.AtomicUpdateDocumentMerger.mergeChildDoc(AtomicUpdateDocumentMerger.java:405) > > at > > org.apache.solr.update.processor.DistributedUpdateProcessor.getUpdatedDocument(DistributedUpdateProcessor.java:711) > > at > > org.apache.solr.update.processor.DistributedUpdateProcessor.doVersionAdd(DistributedUpdateProcessor.java:374) > > at > > org.apache.solr.update.processor.DistributedUpdateProcessor.lambda$versionAdd$0(DistributedUpdateProcessor.java:339) > > at org.apache.solr.update.VersionBucket.runWithLock(VersionBucket.java:50) > > at > > org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:339) > > at > > org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:225) > > at > > org.apache.solr.update.processor.DistributedZkUpdateProcessor.processAdd(DistributedZkUpdateProcessor.java:245) > > at > > org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103) > > at > > org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:110) > > at > > org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:332) > > at > > org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readIterator(JavaBinUpdateRequestCodec.java:281) > > at > > 
org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:338) > > at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:283) > > at > > org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readNamedList(JavaBinUpdateRequestCodec.java:236) > > at > > org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:303) > > at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:283) > > at > > org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:196) > > at > > org.apache.solr.client.solrj.request.JavaBi
Help using Noggit for streaming JSON data
All,

Is this an appropriate forum for asking questions about how to use Noggit? The GitHub repo doesn't have discussions enabled, and filing an "issue" just to ask a question seems silly. I'm happy to be redirected to the right place if this isn't appropriate.

I've been able to figure out most things in Noggit by reading the code, but I have a new use-case where I expect very large values (base64-encoded binary), and I'd like to stream those rather than calling parser.getString() and getting a potentially huge string back. I'm streaming into a database, so I never need the whole string in one place at one time. I was thinking something like this:

    JSONParser p = ...;
    int evt = p.nextEvent();
    if (JSONParser.STRING == evt) {
      // Start streaming
      boolean eos = false;
      while (!eos) {
        char c = p.getChar();
        if (c == '"') {
          eos = true;
        } else {
          // append to stream
        }
      }
    }

But getChar() is not public. The only "documentation" I've really been able to find for Noggit is this post from Yonik back in 2014: http://yonik.com/noggit-json-parser/ It mostly says "Noggit is great!" and specifically mentions huge, long strings, but it does not actually show any Java code to consume the JSON data in a streaming way. The ObjectBuilder class is a great user of JSONParser, but it just builds standard objects and would consume tons of memory in my case.

I know for sure that Solr consumes huge JSON documents, and I'm assuming Noggit is being used in that situation, though I have not looked at the code used to do that.

Any suggestions?

-chris
Re: NPE Issue with atomic update to nested document or child document through SolrJ
Following are the approaches I have tried so far; both result in NPE.

*approach 1

    TestChildPOJO testChildPOJO = new TestChildPOJO().cId( "c1_child1" )
        .conceptid( "c1" )
        .storeid( storeId )
        .fieldName( "c1_child1_field_value1" )
        .startTime( Date.from( now.minus( 10, ChronoUnit.DAYS ) ) )
        .integerField_iDF( 10 )
        .booleanField_bDF( true );

    TestPojo pojo1 = new TestPojo().cId( "abcd" )
        .conceptid( "c1" )
        .storeid( storeId )
        .testChildPojos( Collections.list( testChildPOJO, testChildPOJO2, testChildPOJO3 ) );

    // index pojo1 with child testChildPOJO

    SolrInputDocument sdoc = new SolrInputDocument();
    sdoc.addField( "_route_", pojo1.cId() );
    sdoc.addField( "id", testChildPOJO.cId() );
    sdoc.addField( "conceptid", testChildPOJO.conceptid() );
    sdoc.addField( "storeid", testChildPOJO.cId() );
    // modify field "fieldName"
    sdoc.setField( "fieldName", java.util.Collections.singletonMap( "set",
        Collections.list( testChildPOJO.fieldName() + postfix ) ) );

    collection.client.add( sdoc ); // results in NPE!

*approach 2

    SolrInputDocument sdoc = new SolrInputDocument();
    sdoc.addField( "id", testChildPOJO.id() );
    sdoc.setField( "fieldName", java.util.Collections.singletonMap( "set",
        testChildPOJO.fieldName() + postfix ) );

    final UpdateRequest req = new UpdateRequest();
    req.withRoute( pojo1.id() );
    req.add( sdoc );
    collection.client.request( req, collection.getCollectionName() );
    req.commit( collection.client, collection.getCollectionName() );

--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Need to update SOLR_HOME in the solr service script and getting errors
On Wed, Sep 16, 2020 at 02:59:32PM +, Victor Kretzer wrote:

> My setup is two Solr nodes running on separate Azure Ubuntu 18.04 LTS VMs using an external ZooKeeper ensemble.
> I installed Solr 6.6.6 using the install file and then followed the steps for enabling SSL. I am able to start Solr, add collections and the like using the bin/solr script.
>
> Example:
> /opt/solr$ sudo bin/solr start -cloud -s cloud/test2 -force
>
> However, if I restart the machine or attempt to start Solr using the installed service, it naturally goes back to the default SOLR_HOME in the /etc/default/solr.in.sh script: "/var/solr/data"
>
> I've tried updating SOLR_HOME to "/opt/solr/cloud/test2"

That is what I would do.

> but then when I start the service I see the following error on the Admin Dashboard:
>
> SolrCore Initialization Failures
> mycollection_shard1_replica1: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: /opt/solr-6.6.6/cloud/test2/mycollection_shard1_replica1/data/index/write.lock
> Please check your logs for more information
>
> I'm including what I believe to be the pertinent information from the logs below:

You did well.

> I suspect this is a permission issue because the solr user created by the install script isn't allowed access to /opt/solr, but I'm new to Linux and haven't completely wrapped my head around the way permissions work with it.
> Am I correct in guessing the cause of the error and, if so, how do I correct this so that the service can be used to run my instances?

Yes, the stack trace actually tells you explicitly that the problem is permissions on that file. Follow the chain of "Caused by:" and you'll see:

    Caused by: java.nio.file.AccessDeniedException: /opt/solr-6.6.6/cloud/test2/mycollection_shard1_replica1/data/index/write.lock

Since, in the past, you have started Solr using 'sudo', this probably means that write.lock is owned by 'root'.
Solr creates this file with permissions that allow only the owner to write it. If the service script runs Solr as any other user (and it should!) then Solr won't be able to open this file for writing, and because of this it won't complete the loading of that core.

You should find out what user account is used by the service script, and 'chown' Solr's entire working directory tree to be owned by that user. Then refrain from ever running Solr as 'root', or the problem may recur. Use the normal service start/stop mechanism for controlling your Solr instances.

--
Mark H. Wood
Lead Technology Analyst
University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu
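[Editor's sketch] Mark's advice can be condensed into a few commands. This is a hedged sketch, not an exact recipe: the service account name 'solr', the service name 'solr', and the SOLR_HOME path are taken from this thread and may differ on your install.

```shell
# 1. Confirm who owns the lock file Solr cannot open
#    (path taken from the stack trace in this thread):
stat -c '%U %n' /opt/solr-6.6.6/cloud/test2/mycollection_shard1_replica1/data/index/write.lock

# 2. Hand the whole working tree to the service account
#    ('solr' is assumed; check your service script). Needs root:
sudo chown -R solr:solr /opt/solr/cloud/test2

# 3. From now on, start and stop only via the service,
#    never 'sudo bin/solr', so root never re-creates the lock file:
sudo service solr restart
```

If step 1 prints 'root', that confirms the diagnosis: a past 'sudo bin/solr start' left root-owned files behind.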
Re: Unable to create core Solr 8.6.2
Look in your solr log, there’s usually a more detailed message.

> On Sep 17, 2020, at 9:35 AM, Anuj Bhargava wrote:
>
> Getting the following error message while trying to create core
>
>     # sudo su - solr -c "/opt/solr/bin/solr create_core -c 9lives"
>     WARNING: Using _default configset with data driven schema functionality.
>     NOT RECOMMENDED for production use.
>     To turn off: bin/solr config -c 9lives -p 8984 -action
>     set-user-property -property update.autoCreateFields -value false
>
>     ERROR: Parse error :
>
>     Error 401 Unauthorized
>     HTTP ERROR 401
>     Problem accessing /solr/admin/info/system. Reason:
>     Unauthorized
Unable to create core Solr 8.6.2
Getting the following error message while trying to create core

    # sudo su - solr -c "/opt/solr/bin/solr create_core -c 9lives"
    WARNING: Using _default configset with data driven schema functionality.
    NOT RECOMMENDED for production use.
    To turn off: bin/solr config -c 9lives -p 8984 -action
    set-user-property -property update.autoCreateFields -value false

    ERROR: Parse error :

    Error 401 Unauthorized
    HTTP ERROR 401
    Problem accessing /solr/admin/info/system. Reason:
    Unauthorized
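[Editor's note] A 401 from /solr/admin/info/system usually means the cluster has authentication (for example the Basic Authentication plugin) enabled and the bin/solr script is not sending credentials. One hedged possibility, for Solr 8.x with Basic Auth: give the control script credentials via its environment. The 'solr:SolrRocks' pair below is the stock documentation example, a placeholder for your real username and password.

```shell
# Fragment for /etc/default/solr.in.sh (or export before running bin/solr).
# 'solr:SolrRocks' is a placeholder credential -- substitute your own.
SOLR_AUTH_TYPE="basic"
SOLR_AUTHENTICATION_OPTS="-Dbasicauth=solr:SolrRocks"
```

With those set, bin/solr commands such as create_core should be able to reach /solr/admin/info/system without the 401. (This is a config fragment, not a complete fix; the log message the reply asks for would confirm which auth plugin is actually in play.)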
Re: Doing what <copyField> does using SolrJ API
“there [will be] over 1000 of them [fields]”

This is often a red flag in my experience. Solr will handle that many fields; I’ve seen many more. But this is often a result of “database thinking”, i.e. your mental model of how all this data fits together is from a DB perspective rather than a search perspective. It’s unwieldy to have that many fields.

Obviously I don’t know the particulars of your app, and maybe that’s the best design. But particularly if many of the fields are sparsely populated, i.e. only a small percentage of the documents in your corpus have any value for that field, then taking a step back and looking at the design might save you some grief down the line.

For instance, I’ve seen designs where instead of

    field1:some_value field2:other_value ...

you use a single field with _tokens_ like:

    field:field1_some_value field:field2_other_value

That drops the complexity and increases performance. Anyway, just a thought you might want to consider.

Best,
Erick

> On Sep 16, 2020, at 9:31 PM, Steven White wrote:
>
> Hi everyone,
>
> I figured it out. It is as simple as creating a List and using that as
> the value part for the SolrInputDocument.addField() API.
>
> Thanks,
>
> Steven
>
> On Wed, Sep 16, 2020 at 9:13 PM Steven White wrote:
>
>> Hi everyone,
>>
>> I want to avoid creating a <copyField ... source="OneFieldOfMany"/> in my
>> schema (there will be over 1000 of them and maybe more, so managing it
>> will be a pain). Instead, I want to use the SolrJ API to do what
>> <copyField> does. Any example of how I can do this? If there is an
>> example online, that would be great.
>>
>> Thanks in advance.
>>
>> Steven
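[Editor's sketch] Steven's resolution, gathering many source-field values into one List and passing it to SolrInputDocument.addField(), can be sketched without a running Solr. This is a hedged illustration: CopyFieldSketch and the "all_text" field name are hypothetical, and a plain Map stands in for the source document so the idea runs stand-alone; a real client would finish with something like doc.addField("all_text", values).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Emulate <copyField> client-side: instead of 1000+ copyField rules in the
// schema, collect every source field's value into one catch-all list and
// send that list as the value of a single destination field.
public class CopyFieldSketch {

    // Gather the values of all source fields into one list, preserving order.
    public static List<Object> gatherCopyField(Map<String, Object> sourceFields) {
        List<Object> values = new ArrayList<>();
        for (Object v : sourceFields.values()) {
            values.add(v); // each source field's value lands in the catch-all
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", "Solr in Action");
        doc.put("author", "T. Grainger");
        // In SolrJ this list would become:
        //   sdoc.addField("all_text", gatherCopyField(doc));
        System.out.println(gatherCopyField(doc)); // prints [Solr in Action, T. Grainger]
    }
}
```

The trade-off Erick raises still applies: this moves the copyField bookkeeping from the schema into client code, so it only pays off if maintaining one loop beats maintaining a thousand schema rules.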
Re: Fetched but not Added Solr 8.6.2
SolrWriter Error creating document :

On Thu, 17 Sep 2020 at 15:53, Jörn Franke wrote:

> Log file will tell you the issue.
>
>> Am 17.09.2020 um 10:54 schrieb Anuj Bhargava:
>>
>> We just installed Solr 8.6.2. It is fetching the data but not adding.
>>
>> Indexing completed. *Added/Updated: 0* documents. Deleted 0 documents. (Duration: 06s)
>> Requests: 1, *Fetched: 100* 17/s, Skipped: 0, Processed: 0
>>
>> The *data-config.xml*
>>
>>   <dataSource driver="com.mysql.jdbc.Driver"
>>               batchSize="-1"
>>               autoReconnect="true"
>>               socketTimeout="0"
>>               connectTimeout="0"
>>               encoding="UTF-8"
>>               url="jdbc:mysql://zeroDateTimeBehavior=convertToNull"
>>               user="xxx"
>>               password="xxx"/>
>>
>>   <entity ... deltaQuery="select posting_id from countries where
>>               last_modified > '${dataimporter.last_index_time}'">
Re: Fetched but not Added Solr 8.6.2
Log file will tell you the issue.

> Am 17.09.2020 um 10:54 schrieb Anuj Bhargava:
>
> We just installed Solr 8.6.2. It is fetching the data but not adding.
>
> Indexing completed. *Added/Updated: 0* documents. Deleted 0 documents. (Duration: 06s)
> Requests: 1, *Fetched: 100* 17/s, Skipped: 0, Processed: 0
>
> The *data-config.xml*
>
>   <dataSource driver="com.mysql.jdbc.Driver"
>               batchSize="-1"
>               autoReconnect="true"
>               socketTimeout="0"
>               connectTimeout="0"
>               encoding="UTF-8"
>               url="jdbc:mysql://zeroDateTimeBehavior=convertToNull"
>               user="xxx"
>               password="xxx"/>
>
>   <entity ... deltaQuery="select posting_id from countries where
>               last_modified > '${dataimporter.last_index_time}'">
Fetched but not Added Solr 8.6.2
We just installed Solr 8.6.2. It is fetching the data but not adding.

Indexing completed. *Added/Updated: 0* documents. Deleted 0 documents. (Duration: 06s)
Requests: 1, *Fetched: 100* 17/s, Skipped: 0, Processed: 0

The *data-config.xml*