Re: Solr training

2020-09-17 Thread matthew sporleder
Is there a friends-on-the-mailing-list discount?  I had a bit of sticker shock!

On Wed, Sep 16, 2020 at 9:38 AM Charlie Hull  wrote:
>
> I do of course mean 'Group Discounts': you don't get a discount for
> being in a 'froup' sadly (I wasn't even aware that was a thing!)
>
> Charlie
>
> On 16/09/2020 13:26, Charlie Hull wrote:
> >
> > Hi all,
> >
> > We're running our Solr Think Like a Relevance Engineer training 6-9 Oct
> > - you can find out more & book tickets at
> > https://opensourceconnections.com/training/solr-think-like-a-relevance-engineer-tlre/
> >
> > The course is delivered over 4 half-days from 9am EST / 2pm BST / 3pm
> > CET and is led by Eric Pugh who co-wrote the first book on Solr and is
> > a Solr Committer. It's suitable for all members of the search team -
> > search engineers, data scientists, even product owners who want to
> > know how Solr search can be measured & tuned. Delivered by working
> > relevance engineers, the course features practical exercises and will
> > give you a great foundation in how to use Solr to build great search.
> >
> > The early bird discount expires at the end of this week so do book soon if
> > you're interested! Froup discounts also available. We're also running
> > a more advanced course on Learning to Rank a couple of weeks later -
> > you can find all our training courses and dates at
> > https://opensourceconnections.com/training/
> >
> > Cheers
> >
> > Charlie
> >
> > --
> > Charlie Hull
> > OpenSource Connections, previously Flax
> >
> > tel/fax: +44 (0)8700 118334
> > mobile:  +44 (0)7767 825828
> > web: www.o19s.com
>
>
> --
> Charlie Hull
> OpenSource Connections, previously Flax
>
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.o19s.com
>


Re: How to remove duplicate tokens from solr

2020-09-17 Thread Rajdeep Sahoo
But I'm not sure why this type of search string is causing high CPU
utilization.

On Fri, 18 Sep, 2020, 12:49 am Rahul Goswami,  wrote:

> Is this for a phrase search? If yes then the position of the token would
> matter too and not sure which token would you want to remove. "eg
> "tshirt hat tshirt".
> Also, are you looking to save space and want this at index time? Or just
> want to remove duplicates from the search string?
>
> If this is at search time AND is not a phrase search, there are a couple
> approaches I could think of :
>
> 1) You could either handle this in the application layer to only pass the
> deduplicated string before it hits solr
> 2) You can write a custom search component and configure it in the
> first-components list to process the search string and remove duplicates
> before it hits the default search components. See here (
>
> https://lucene.apache.org/solr/guide/7_7/requesthandlers-and-searchcomponents-in-solrconfig.html#first-components-and-last-components
> ).
>
> However if for search, I would still evaluate if writing those extra lines
> of code is worth the investment. I say so since my assumption is that for
> duplicated tokens in search string, lucene would have the intelligence to
> not fetch the doc ids again, so you should not be worried about spending
> computation resources to reevaluate the same tokens (Someone correct me if
> I am wrong!)
>
> -Rahul
>
> On Thu, Sep 17, 2020 at 2:56 PM Rajdeep Sahoo 
> wrote:
>
> > If someone is searching with " tshirt tshirt tshirt tshirt tshirt tshirt"
> > we need to remove the duplicates and search with tshirt.
> >
> >
> > On Fri, 18 Sep, 2020, 12:19 am Alexandre Rafalovitch, <
> arafa...@gmail.com>
> > wrote:
> >
> > > This is not quite enough information.
> > > There is
> > >
> >
> https://lucene.apache.org/solr/guide/8_6/filter-descriptions.html#remove-duplicates-token-filter
> > > but it has specific limitations.
> > >
> > > What is the problem that you are trying to solve that you feel is due
> > > to duplicate tokens? Why are they duplicates? Is it about storage or
> > > relevancy?
> > >
> > > Regards,
> > >Alex.
> > >
> > > On Thu, 17 Sep 2020 at 14:35, Rajdeep Sahoo <
> rajdeepsahoo2...@gmail.com>
> > > wrote:
> > > >
> > > > Hi team,
> > > >  Is there any way to remove duplicate tokens from solr. Is there any
> > > filter
> > > > for this.
> > >
> >
>


Re: How to remove duplicate tokens from solr

2020-09-17 Thread Rahul Goswami
Is this for a phrase search? If yes, then the position of the token would
matter too, and it's not clear which token you would want to remove, e.g.
"tshirt hat tshirt".
Also, are you looking to save space and want this at index time? Or just
want to remove duplicates from the search string?

If this is at search time AND is not a phrase search, there are a couple
approaches I could think of :

1) You could either handle this in the application layer to only pass the
deduplicated string before it hits solr
2) You can write a custom search component and configure it in the
first-components list to process the search string and remove duplicates
before it hits the default search components. See here (
https://lucene.apache.org/solr/guide/7_7/requesthandlers-and-searchcomponents-in-solrconfig.html#first-components-and-last-components
).

However, if this is only for search, I would still evaluate whether writing those extra lines
of code is worth the investment. I say so since my assumption is that for
duplicated tokens in search string, lucene would have the intelligence to
not fetch the doc ids again, so you should not be worried about spending
computation resources to reevaluate the same tokens (Someone correct me if
I am wrong!)

-Rahul

On Thu, Sep 17, 2020 at 2:56 PM Rajdeep Sahoo 
wrote:

> If someone is searching with " tshirt tshirt tshirt tshirt tshirt tshirt"
> we need to remove the duplicates and search with tshirt.
>
>
> On Fri, 18 Sep, 2020, 12:19 am Alexandre Rafalovitch, 
> wrote:
>
> > This is not quite enough information.
> > There is
> >
> https://lucene.apache.org/solr/guide/8_6/filter-descriptions.html#remove-duplicates-token-filter
> > but it has specific limitations.
> >
> > What is the problem that you are trying to solve that you feel is due
> > to duplicate tokens? Why are they duplicates? Is it about storage or
> > relevancy?
> >
> > Regards,
> >Alex.
> >
> > On Thu, 17 Sep 2020 at 14:35, Rajdeep Sahoo 
> > wrote:
> > >
> > > Hi team,
> > >  Is there any way to remove duplicate tokens from solr. Is there any
> > filter
> > > for this.
> >
>
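
A rough SolrJ sketch of the application-layer deduplication suggested above
(option 1), assuming a simple whitespace-separated, non-phrase query; the URL,
collection name and query string are placeholders, not anything from the thread:

import java.util.LinkedHashSet;
import java.util.Set;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DedupedQuery {
    public static void main(String[] args) throws Exception {
        String rawQuery = "tshirt tshirt tshirt tshirt tshirt tshirt";

        // Keep the first occurrence of each term, preserving order.
        Set<String> unique = new LinkedHashSet<>();
        for (String term : rawQuery.trim().split("\\s+")) {
            unique.add(term.toLowerCase());
        }
        String deduped = String.join(" ", unique);  // -> "tshirt"

        // Placeholder base URL and collection; adjust for your setup.
        try (SolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
            QueryResponse rsp = client.query(new SolrQuery(deduped));
            System.out.println("numFound=" + rsp.getResults().getNumFound());
        }
    }
}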


RE: Need to update SOLR_HOME in the solr service script and getting errors

2020-09-17 Thread Victor Kretzer
Hi Mark. 

Thanks for taking the time to explain it so clearly. It makes perfect sense to 
me now and using chown solved the problem. Thanks again and have a great day.

Victor


-Original Message-
From: Mark H. Wood  
Sent: Thursday, September 17, 2020 9:59 AM
To: solr-user@lucene.apache.org
Subject: Re: Need to update SOLR_HOME in the solr service script and getting 
errors

On Wed, Sep 16, 2020 at 02:59:32PM +, Victor Kretzer wrote:
> My setup is two solr nodes running on separate Azure Ubuntu 18.04 LTS vms 
> using an external zookeeper assembly.
> I installed Solr 6.6.6 using the install file and then followed the steps for 
> enabling ssl. I am able to start solr, add collections and the like using 
> bin/solr script.
> 
> Example:
> /opt/solr$ sudo bin/solr start -cloud -s cloud/test2 -force
> 
> However, if I restart the machine or attempt to start solr using the 
> installed service, it naturally goes back to the default SOLR_HOME in the 
> /etc/default/solr.in.sh script: "/var/solr/data"
> 
> I've tried updating SOLR_HOME to "/opt/solr/cloud/test2"

That is what I would do.

> but then when I start the service I see the following error on the Admin 
> Dashboard:
> SolrCore Initialization Failures
> mycollection_shard1_replica1: 
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrExcept
> ion: 
> /opt/solr-6.6.6/cloud/test2/mycollection_shard1_replica1/data/index/wr
> ite.lock Please check your logs for more information
> 
> I'm including what I believe to be the pertinent information from the logs 
> below:

You did well.

> I suspect this is a permission issue because the solr user created by the 
> install script isn't allowed access to  /opt/solr but I'm new to Linux and 
> haven't completely wrapped my head around the way permissions work with it. 
> Am I correct in guessing the cause of the error and, if so, how do I correct 
> this so that the service can be used to run my instances?

Yes, the stack trace actually tells you explicitly that the problem is 
permissions on that file.  Follow the chain of "Caused by:" and you'll see:

  Caused by: java.nio.file.AccessDeniedException: 
/opt/solr-6.6.6/cloud/test2/mycollection_shard1_replica1/data/index/write.lock

Since, in the past, you have started Solr using 'sudo', this probably means 
that write.lock is owned by 'root'.  Solr creates this file with permissions 
that allow only the owner to write it.  If the service script runs Solr as any 
other user (and it should!) then Solr won't be able to open this file for 
writing, and because of this it won't complete the loading of that core.

You should find out what user account is used by the service script, and 
'chown' Solr's entire working directories tree to be owned by that user.  Then, 
refrain from ever running Solr as 'root' or the problem may recur.  Use the 
normal service start/stop mechanism for controlling your Solr instances.

--
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu


Re: NPE Issue with atomic update to nested document or child document through SolrJ

2020-09-17 Thread Alexandre Rafalovitch
The missing underscore is a documentation bug, because it was not
escaped the second time and the asciidoc chewed it up as a
bold/italic indicator. The declaration and references should match.

I am not sure about the code. I hope somebody else will step in on that part.

Regards,
   Alex.

On Thu, 17 Sep 2020 at 14:48, Pratik Patel  wrote:
>
> I am running this in a unit test which deletes the collection after the
> test is over. So every new test run gets a fresh collection.
>
> It is a very simple test where I am first indexing a couple of parent
> documents with few children and then testing an atomic update on one parent
> as I have posted in my previous message. (using UpdateRequest)
>
> I am not sure if I am triggering the atomic update correctly, do you see
> any potential issue in that code?
>
> I noticed something in the documentation here.
> https://lucene.apache.org/solr/guide/8_5/indexing-nested-documents.html#indexing-nested-documents
>
> > <field name="_nest_path_" type="nest_path" />
>
> field_type is declared with name *"_nest_path_"* whereas field is declared
> with type *"nest_path". *
>
> Is this intentional? or should it be as follows?
>
> > <field name="_nest_path_" type="_nest_path_" />
>
> Also, should we explicitly set index=true and store=true on _nest_path_
> and _nest_parent_ fields?
>
>
>
> On Thu, Sep 17, 2020 at 1:17 PM Alexandre Rafalovitch 
> wrote:
>
> > Did you reindex the original document after you added a new field? If
> > not, then the previously indexed content is missing it and your code
> > paths will get out of sync.
> >
> > Regards,
> >Alex.
> > P.s. I haven't done what you are doing before, so there may be
> > something I am missing myself.
> >
> >
> > On Thu, 17 Sep 2020 at 12:46, Pratik Patel  wrote:
> > >
> > > Thanks for your reply Alexandre.
> > >
> > > I have "_root_" and "_nest_path_" fields in my schema but not
> > > "_nest_parent_".
> > >
> > >
> > > 
> > > 
> > > <field name="_root_" type="string" indexed="true" stored="false" docValues="false" />
> > > 
> > > <fieldType name="_nest_path_" class="solr.NestPathField" />
> > >
> > > I ran my test after adding the "_nest_parent_" field and I am not getting
> > > NPE any more which is good. Thanks!
> > >
> > > But looking at the documents in the index, I see that after the atomic
> > > update, now there are two children documents with the same id. One
> > document
> > > has old values and another one has new values. Shouldn't they be merged
> > > based on the "id"? Do we need to specify anything else in the request to
> > > ensure that documents are merged/updated and not duplicated?
> > >
> > > For your reference, below is the test I am running now.
> > >
> > > // update field of one child doc
> > > SolrInputDocument sdoc = new SolrInputDocument(  );
> > > sdoc.addField( "id", testChildPOJO.id() );
> > > sdoc.addField( "conceptid", testChildPOJO.conceptid() );
> > > sdoc.addField( "storeid", "foo" );
> > > sdoc.setField( "fieldName",
> > > java.util.Collections.singletonMap("set", Collections.list("bar" ) ));
> > >
> > > final UpdateRequest req = new UpdateRequest();
> > > req.withRoute( pojo1.id() );// parent id
> > > req.add(sdoc);
> > >
> > > collection.client.request( req,
> > collection.getCollectionName()
> > > );
> > > collection.client.commit();
> > >
> > >
> > > Resulting documents :
> > >
> > > {id=c1_child1, conceptid=c1, storeid=s1,
> > fieldName=c1_child1_field_value1,
> > > startTime=Mon Sep 07 12:40:37 EDT 2020, integerField_iDF=10,
> > > booleanField_bDF=true, _root_=abcd, _version_=1678099970090074112}
> > > {id=c1_child1, conceptid=c1, storeid=foo, fieldName=bar, startTime=Mon
> > Sep
> > > 07 12:40:37 EDT 2020, integerField_iDF=10, booleanField_bDF=true,
> > > _root_=abcd, _version_=1678099970405695488}
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Thu, Sep 17, 2020 at 12:01 PM Alexandre Rafalovitch <
> > arafa...@gmail.com>
> > > wrote:
> > >
> > > > Can you double-check your schema to see if you have all the fields
> > > > required to support nested documents. You are supposed to get away
> > > > with just _root_, but really you should also include _nest_path and
> > > > _nest_parent_. Your particular exception seems to be triggering
> > > > something (maybe a bug) related to - possibly - missing _nest_path_
> > > > field.
> > > >
> > > > See:
> > > >
> > https://lucene.apache.org/solr/guide/8_5/indexing-nested-documents.html#indexing-nested-documents
> > > >
> > > > Regards,
> > > >Alex.
> > > >
> > > > On Wed, 16 Sep 2020 at 13:28, Pratik Patel 
> > wrote:
> > > > >
> > > > > Hello Everyone,
> > > > >
> > > > > I am trying to update a field of a child document using atomic
> > updates
> > > > > feature. I am using solr and solrJ version 8.5.0
> > > > >
> > > > > I have ensured that my schema satisfies the conditions for atomic
> > updates
> > > > > and I am able to do atomic updates on normal documents but with
> 

Re: How to remove duplicate tokens from solr

2020-09-17 Thread Rajdeep Sahoo
If someone is searching with " tshirt tshirt tshirt tshirt tshirt tshirt"
we need to remove the duplicates and search with tshirt.


On Fri, 18 Sep, 2020, 12:19 am Alexandre Rafalovitch, 
wrote:

> This is not quite enough information.
> There is
> https://lucene.apache.org/solr/guide/8_6/filter-descriptions.html#remove-duplicates-token-filter
> but it has specific limitations.
>
> What is the problem that you are trying to solve that you feel is due
> to duplicate tokens? Why are they duplicates? Is it about storage or
> relevancy?
>
> Regards,
>Alex.
>
> On Thu, 17 Sep 2020 at 14:35, Rajdeep Sahoo 
> wrote:
> >
> > Hi team,
> >  Is there any way to remove duplicate tokens from solr. Is there any
> filter
> > for this.
>


Re: How to remove duplicate tokens from solr

2020-09-17 Thread Alexandre Rafalovitch
This is not quite enough information.
There is 
https://lucene.apache.org/solr/guide/8_6/filter-descriptions.html#remove-duplicates-token-filter
but it has specific limitations.

What is the problem that you are trying to solve that you feel is due
to duplicate tokens? Why are they duplicates? Is it about storage or
relevancy?

Regards,
   Alex.

On Thu, 17 Sep 2020 at 14:35, Rajdeep Sahoo  wrote:
>
> Hi team,
>  Is there any way to remove duplicate tokens from solr. Is there any filter
> for this.
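
To illustrate the limitation mentioned above: RemoveDuplicatesTokenFilter only
drops a token whose term text and position match the previous token, so plainly
repeated words (which advance the position) pass through unchanged. A small
Lucene-level sketch, assuming the Lucene analysis modules are on the classpath;
the input string is just an example:

import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.miscellaneous.RemoveDuplicatesTokenFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class RemoveDuplicatesDemo {
    public static void main(String[] args) throws Exception {
        WhitespaceTokenizer tokenizer = new WhitespaceTokenizer();
        tokenizer.setReader(new StringReader("tshirt tshirt tshirt"));

        // Only removes a token with the same text AND same position as the
        // previous token, so simple repeats are not collapsed by this filter.
        TokenStream ts = new RemoveDuplicatesTokenFilter(tokenizer);
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);

        ts.reset();
        while (ts.incrementToken()) {
            System.out.println(term.toString());  // prints "tshirt" three times
        }
        ts.end();
        ts.close();
    }
}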


Re: NPE Issue with atomic update to nested document or child document through SolrJ

2020-09-17 Thread Pratik Patel
I am running this in a unit test which deletes the collection after the
test is over. So every new test run gets a fresh collection.

It is a very simple test where I am first indexing a couple of parent
documents with few children and then testing an atomic update on one parent
as I have posted in my previous message. (using UpdateRequest)

I am not sure if I am triggering the atomic update correctly, do you see
any potential issue in that code?

I noticed something in the documentation here.
https://lucene.apache.org/solr/guide/8_5/indexing-nested-documents.html#indexing-nested-documents

<field name="_nest_path_" type="nest_path" />

The field type is declared with name "_nest_path_" whereas the field is declared
with type "nest_path".

Is this intentional? Or should it be as follows?

<field name="_nest_path_" type="_nest_path_" />

Also, should we explicitly set index=true and store=true on _nest_path_
and _nest_parent_ fields?



On Thu, Sep 17, 2020 at 1:17 PM Alexandre Rafalovitch 
wrote:

> Did you reindex the original document after you added a new field? If
> not, then the previously indexed content is missing it and your code
> paths will get out of sync.
>
> Regards,
>Alex.
> P.s. I haven't done what you are doing before, so there may be
> something I am missing myself.
>
>
> On Thu, 17 Sep 2020 at 12:46, Pratik Patel  wrote:
> >
> > Thanks for your reply Alexandre.
> >
> > I have "_root_" and "_nest_path_" fields in my schema but not
> > "_nest_parent_".
> >
> >
> > 
> > 
> > <field name="_root_" type="string" indexed="true" stored="false" docValues="false" />
> > 
> > <fieldType name="_nest_path_" class="solr.NestPathField" />
> >
> > I ran my test after adding the "_nest_parent_" field and I am not getting
> > NPE any more which is good. Thanks!
> >
> > But looking at the documents in the index, I see that after the atomic
> > update, now there are two children documents with the same id. One
> document
> > has old values and another one has new values. Shouldn't they be merged
> > based on the "id"? Do we need to specify anything else in the request to
> > ensure that documents are merged/updated and not duplicated?
> >
> > For your reference, below is the test I am running now.
> >
> > // update field of one child doc
> > SolrInputDocument sdoc = new SolrInputDocument(  );
> > sdoc.addField( "id", testChildPOJO.id() );
> > sdoc.addField( "conceptid", testChildPOJO.conceptid() );
> > sdoc.addField( "storeid", "foo" );
> > sdoc.setField( "fieldName",
> > java.util.Collections.singletonMap("set", Collections.list("bar" ) ));
> >
> > final UpdateRequest req = new UpdateRequest();
> > req.withRoute( pojo1.id() );// parent id
> > req.add(sdoc);
> >
> > collection.client.request( req,
> collection.getCollectionName()
> > );
> > collection.client.commit();
> >
> >
> > Resulting documents :
> >
> > {id=c1_child1, conceptid=c1, storeid=s1,
> fieldName=c1_child1_field_value1,
> > startTime=Mon Sep 07 12:40:37 EDT 2020, integerField_iDF=10,
> > booleanField_bDF=true, _root_=abcd, _version_=1678099970090074112}
> > {id=c1_child1, conceptid=c1, storeid=foo, fieldName=bar, startTime=Mon
> Sep
> > 07 12:40:37 EDT 2020, integerField_iDF=10, booleanField_bDF=true,
> > _root_=abcd, _version_=1678099970405695488}
> >
> >
> >
> >
> >
> >
> > On Thu, Sep 17, 2020 at 12:01 PM Alexandre Rafalovitch <
> arafa...@gmail.com>
> > wrote:
> >
> > > Can you double-check your schema to see if you have all the fields
> > > required to support nested documents. You are supposed to get away
> > > with just _root_, but really you should also include _nest_path and
> > > _nest_parent_. Your particular exception seems to be triggering
> > > something (maybe a bug) related to - possibly - missing _nest_path_
> > > field.
> > >
> > > See:
> > >
> https://lucene.apache.org/solr/guide/8_5/indexing-nested-documents.html#indexing-nested-documents
> > >
> > > Regards,
> > >Alex.
> > >
> > > On Wed, 16 Sep 2020 at 13:28, Pratik Patel 
> wrote:
> > > >
> > > > Hello Everyone,
> > > >
> > > > I am trying to update a field of a child document using atomic
> updates
> > > > feature. I am using solr and solrJ version 8.5.0
> > > >
> > > > I have ensured that my schema satisfies the conditions for atomic
> updates
> > > > and I am able to do atomic updates on normal documents but with
> nested
> > > > child documents, I am getting a Null Pointer Exception. Following is
> the
> > > > simple test which I am trying.
> > > >
> > > > TestPojo  pojo1  = new TestPojo().cId( "abcd" )
> > > > >  .conceptid( "c1" )
> > > > >  .storeid( storeId
> )
> > > > >  .testChildPojos(
> > > > > Collections.list( testChildPOJO, testChildPOJO2,
> > > > >
> > > testChildPOJO3 )
> > > > > );
> > > > > TestChildPOJOtestChildPOJO = new
> TestChildPOJO().cId(
> > > > > "c1_child1" )
> > > > >

How to remove duplicate tokens from solr

2020-09-17 Thread Rajdeep Sahoo
Hi team,
 Is there any way to remove duplicate tokens from solr. Is there any filter
for this.


Re: Doing what copyField does using SolrJ API

2020-09-17 Thread Steven White
Thank you all for your feedback.  They are very helpful.

@Walther, out of the 1000 fields in Solr's schema, only 5 are set as
"required" fields and the Solr doc that I create and then send to Solr for
indexing, contains only those fields that have data to be indexed.  So some
docs will have 10 fields, some 50, etc.

Steven

On Thu, Sep 17, 2020 at 1:55 PM Erick Erickson 
wrote:

> The script can actually be written in any number of scripting languages,
> python, groovy,
> javascript etc. but Alexandre’s comments about javascript are well taken.
>
> It all depends here on whether you ever want to search the fields
> individually. If you do,
> you need to have them in your index as well as the copyField.
>
> > On Sep 17, 2020, at 1:37 PM, Walter Underwood 
> wrote:
> >
> > If you want to ignore a field being sent to Solr, you can set
> indexed=false and
> > stored=false for that field in schema.xml. It will take up room in
> schema.xml but
> > zero room on disk.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >> On Sep 17, 2020, at 10:23 AM, Alexandre Rafalovitch 
> wrote:
> >>
> >> Solr has a whole pipeline that you can run during document ingesting
> before
> >> the actual indexing happens. It is called Update Request Processor (URP)
> >> and is defined in solrconfig.xml or in an override file. Obviously,
> since
> >> you are indexing from SolrJ client, you have even more flexibility, but
> it
> >> is good to know about anyway.
> >>
> >> You can read all about it at:
> >> https://lucene.apache.org/solr/guide/8_6/update-request-processors.html
> and
> >> see the extensive list of processors you can leverage. The specific
> >> mentioned one is this one:
> >>
> https://lucene.apache.org/solr/8_6_0//solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html
> >>
> >> Just a word of warning that Stateless URP is using Javascript, which is
> >> getting a bit of a complicated story as underlying JVM is upgraded
> (Oracle
> >> dropped their javascript engine in JDK 14). So if one of the simpler
> URPs
> >> will do the job or a chain of them, that may be a better path to take.
> >>
> >> Regards,
> >>  Alex.
> >>
> >>
> >> On Thu, 17 Sep 2020 at 13:13, Steven White 
> wrote:
> >>
> >>> Thanks Erick.  Where can I learn more about "stateless script update
> >>> processor factory".  I don't know what you mean by this.
> >>>
> >>> Steven
> >>>
> >>> On Thu, Sep 17, 2020 at 1:08 PM Erick Erickson <
> erickerick...@gmail.com>
> >>> wrote:
> >>>
>  1000 fields is fine, you'll waste some cycles on bookkeeping, but I
> >>> really
>  doubt you'll notice. That said, are these fields used for searching?
>  Because you do have control over what gous into the index if you can
> put
> >>> a
>  "stateless script update processor factory" in your update chain.
> There
> >>> you
>  can do whatever you want, including combine all the fields into one
> and
>  delete the original fields. There's no point in having your index
> >>> cluttered
>  with unused fields, OTOH, it may not be worth the effort just to
> satisfy
> >>> my
>  sense of aesthetics 😉
> 
>  On Thu, Sep 17, 2020, 12:59 Steven White 
> wrote:
> 
> > Hi Eric,
> >
> > Yes, this is coming from a DB.  Unfortunately I have no control over
> >>> the
> > list of fields.  Out of the 1000 fields that there maybe, no
> document,
>  that
> > gets indexed into Solr will use more then about 50 and since i'm
> >>> copying
> > the values of those fields to the catch-all field and the catch-all
> >>> field
> > is my default search field, I don't expect any problem for having
> 1000
> > fields in Solr's schema, or should I?
> >
> > Thanks
> >
> > Steven
> >
> >
> > On Thu, Sep 17, 2020 at 8:23 AM Erick Erickson <
> >>> erickerick...@gmail.com>
> > wrote:
> >
> >> “there over 1000 of them[fields]”
> >>
> >> This is often a red flag in my experience. Solr will handle that
> many
> >> fields, I’ve seen many more. But this is often a result of
> >> “database thinking”, i.e. your mental model of how all this data
> >> is from a DB perspective rather than a search perspective.
> >>
> >> It’s unwieldy to have that many fields. Obviously I don’t know the
> >> particulars of
> >> your app, and maybe that’s the best design. Particularly if many of
> >>> the
> >> fields
> >> are sparsely populated, i.e. only a small percentage of the
> documents
>  in
> >> your
> >> corpus have any value for that field then taking a step back and
>  looking
> >> at the design might save you some grief down the line.
> >>
> >> For instance, I’ve seen designs where instead of
> >> field1:some_value
> >> field2:other_value….
> >>
> >> you use a single field with _tokens_ like:
> >> field:field1_some_value
> >> fie

Re: Handling failure when adding docs to Solr using SolrJ

2020-09-17 Thread Erick Erickson
I recommend _against_ issuing explicit commits from the client, let
your solrconfig.xml autocommit settings take care of it. Make sure
either your soft or hard commits open a new searcher for the docs
to be searchable.

I’ll bend a little bit if you can _guarantee_ that you only ever have one
indexing client running and basically only ever issue the commit at the
end.

There’s another strategy, do the solrClient.add() command with the
commitWithin parameter.

As far as failures, look at 
https://lucene.apache.org/solr/7_3_0/solr-core/org/apache/solr/update/processor/TolerantUpdateProcessor.html
that’ll give you a better clue about _which_ docs failed. From there, though,
it’s a bit of debugging to figure out why that particular doc failed; usually people
record the docs that failed for later analysis and/or look at the Solr logs which
usually give a more detailed reason of _why_ a document failed...

Best,
Erick

> On Sep 17, 2020, at 1:09 PM, Steven White  wrote:
> 
> Hi everyone,
> 
> I'm trying to figure out when and how I should handle failures that may
> occur during indexing.  In the sample code below, look at my comment and
> let me know what state my index is in when things fail:
> 
>   SolrClient solrClient = new HttpSolrClient.Builder(url).build();
> 
>   solrClient.add(solrDocs);
> 
>   // #1: What to do if add() fails?  And how do I know if all or some of
> my docs in 'solrDocs' made it to the index or not ('solrDocs' is a list of
> 1 or more doc), should I retry add() again?  Retry with a smaller chunk?
> Etc.
> 
>   if (doCommit == true)
>   {
>  solrClient.commit();
> 
>   // #2: What to do if commit() fails?  Re-issue commit() again?
>   }
> 
> Thanks
> 
> Steven
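
A rough SolrJ sketch of the commitWithin-plus-retry idea described above; the
30-second window, the halving retry policy, and the URL and collection name are
illustrative assumptions, not anything prescribed in the thread:

import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class IndexWithRetry {

    // Add a batch with commitWithin (ms); on failure, split the batch and
    // retry the halves so one bad document doesn't sink the whole request.
    static void addWithRetry(SolrClient client, List<SolrInputDocument> docs) {
        if (docs.isEmpty()) {
            return;
        }
        try {
            client.add(docs, 30_000);  // let Solr commit within 30 seconds
        } catch (Exception e) {
            if (docs.size() == 1) {
                // Log and skip (or persist) the single failing document for later analysis.
                System.err.println("Failed doc " + docs.get(0).getFieldValue("id") + ": " + e);
                return;
            }
            int mid = docs.size() / 2;
            addWithRetry(client, docs.subList(0, mid));
            addWithRetry(client, docs.subList(mid, docs.size()));
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder base URL and collection; adjust for your setup.
        try (SolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
            // addWithRetry(client, someDocs);
        }
    }
}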



Re: Doing what copyField does using SolrJ API

2020-09-17 Thread Erick Erickson
The script can actually be written in any number of scripting languages,
python, groovy,
javascript etc. but Alexandre’s comments about javascript are well taken.

It all depends here on whether you ever want to search the fields
individually. If you do,
you need to have them in your index as well as the copyField.

> On Sep 17, 2020, at 1:37 PM, Walter Underwood  wrote:
> 
> If you want to ignore a field being sent to Solr, you can set indexed=false 
> and 
> stored=false for that field in schema.xml. It will take up room in schema.xml 
> but
> zero room on disk.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Sep 17, 2020, at 10:23 AM, Alexandre Rafalovitch  
>> wrote:
>> 
>> Solr has a whole pipeline that you can run during document ingesting before
>> the actual indexing happens. It is called Update Request Processor (URP)
>> and is defined in solrconfig.xml or in an override file. Obviously, since
>> you are indexing from SolrJ client, you have even more flexibility, but it
>> is good to know about anyway.
>> 
>> You can read all about it at:
>> https://lucene.apache.org/solr/guide/8_6/update-request-processors.html and
>> see the extensive list of processors you can leverage. The specific
>> mentioned one is this one:
>> https://lucene.apache.org/solr/8_6_0//solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html
>> 
>> Just a word of warning that Stateless URP is using Javascript, which is
>> getting a bit of a complicated story as underlying JVM is upgraded (Oracle
>> dropped their javascript engine in JDK 14). So if one of the simpler URPs
>> will do the job or a chain of them, that may be a better path to take.
>> 
>> Regards,
>>  Alex.
>> 
>> 
>> On Thu, 17 Sep 2020 at 13:13, Steven White  wrote:
>> 
>>> Thanks Erick.  Where can I learn more about "stateless script update
>>> processor factory".  I don't know what you mean by this.
>>> 
>>> Steven
>>> 
>>> On Thu, Sep 17, 2020 at 1:08 PM Erick Erickson 
>>> wrote:
>>> 
 1000 fields is fine, you'll waste some cycles on bookkeeping, but I
>>> really
 doubt you'll notice. That said, are these fields used for searching?
 Because you do have control over what goes into the index if you can put
>>> a
 "stateless script update processor factory" in your update chain. There
>>> you
 can do whatever you want, including combine all the fields into one and
 delete the original fields. There's no point in having your index
>>> cluttered
 with unused fields, OTOH, it may not be worth the effort just to satisfy
>>> my
 sense of aesthetics 😉
 
 On Thu, Sep 17, 2020, 12:59 Steven White  wrote:
 
> Hi Eric,
> 
> Yes, this is coming from a DB.  Unfortunately I have no control over
>>> the
> list of fields.  Out of the 1000 fields that there maybe, no document,
 that
> gets indexed into Solr will use more then about 50 and since i'm
>>> copying
> the values of those fields to the catch-all field and the catch-all
>>> field
> is my default search field, I don't expect any problem for having 1000
> fields in Solr's schema, or should I?
> 
> Thanks
> 
> Steven
> 
> 
> On Thu, Sep 17, 2020 at 8:23 AM Erick Erickson <
>>> erickerick...@gmail.com>
> wrote:
> 
>> “there over 1000 of them[fields]”
>> 
>> This is often a red flag in my experience. Solr will handle that many
>> fields, I’ve seen many more. But this is often a result of
>> “database thinking”, i.e. your mental model of how all this data
>> is from a DB perspective rather than a search perspective.
>> 
>> It’s unwieldy to have that many fields. Obviously I don’t know the
>> particulars of
>> your app, and maybe that’s the best design. Particularly if many of
>>> the
>> fields
>> are sparsely populated, i.e. only a small percentage of the documents
 in
>> your
>> corpus have any value for that field then taking a step back and
 looking
>> at the design might save you some grief down the line.
>> 
>> For instance, I’ve seen designs where instead of
>> field1:some_value
>> field2:other_value….
>> 
>> you use a single field with _tokens_ like:
>> field:field1_some_value
>> field:field2_other_value
>> 
>> that drops the complexity and increases performance.
>> 
>> Anyway, just a thought you might want to consider.
>> 
>> Best,
>> Erick
>> 
>>> On Sep 16, 2020, at 9:31 PM, Steven White 
> wrote:
>>> 
>>> Hi everyone,
>>> 
>>> I figured it out.  It is as simple as creating a List and
 using
>>> that as the value part for SolrInputDocument.addField() API.
>>> 
>>> Thanks,
>>> 
>>> Steven
>>> 
>>> 
>>> On Wed, Sep 16, 2020 at 9:13 PM Steven White >>> 
>> wrote:
>>> 
 Hi everyone,
 
 I want to

Re: Doing what copyField does using SolrJ API

2020-09-17 Thread Walter Underwood
If you want to ignore a field being sent to Solr, you can set indexed=false and 
stored=false for that field in schema.xml. It will take up room in schema.xml 
but
zero room on disk.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Sep 17, 2020, at 10:23 AM, Alexandre Rafalovitch  
> wrote:
> 
> Solr has a whole pipeline that you can run during document ingesting before
> the actual indexing happens. It is called Update Request Processor (URP)
> and is defined in solrconfig.xml or in an override file. Obviously, since
> you are indexing from SolrJ client, you have even more flexibility, but it
> is good to know about anyway.
> 
> You can read all about it at:
> https://lucene.apache.org/solr/guide/8_6/update-request-processors.html and
> see the extensive list of processors you can leverage. The specific
> mentioned one is this one:
> https://lucene.apache.org/solr/8_6_0//solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html
> 
> Just a word of warning that Stateless URP is using Javascript, which is
> getting a bit of a complicated story as underlying JVM is upgraded (Oracle
> dropped their javascript engine in JDK 14). So if one of the simpler URPs
> will do the job or a chain of them, that may be a better path to take.
> 
> Regards,
>   Alex.
> 
> 
> On Thu, 17 Sep 2020 at 13:13, Steven White  wrote:
> 
>> Thanks Erick.  Where can I learn more about "stateless script update
>> processor factory".  I don't know what you mean by this.
>> 
>> Steven
>> 
>> On Thu, Sep 17, 2020 at 1:08 PM Erick Erickson 
>> wrote:
>> 
>>> 1000 fields is fine, you'll waste some cycles on bookkeeping, but I
>> really
>>> doubt you'll notice. That said, are these fields used for searching?
>>> Because you do have control over what goes into the index if you can put
>> a
>>> "stateless script update processor factory" in your update chain. There
>> you
>>> can do whatever you want, including combine all the fields into one and
>>> delete the original fields. There's no point in having your index
>> cluttered
>>> with unused fields, OTOH, it may not be worth the effort just to satisfy
>> my
>>> sense of aesthetics 😉
>>> 
>>> On Thu, Sep 17, 2020, 12:59 Steven White  wrote:
>>> 
 Hi Eric,
 
 Yes, this is coming from a DB.  Unfortunately I have no control over
>> the
 list of fields.  Out of the 1000 fields that there maybe, no document,
>>> that
 gets indexed into Solr will use more then about 50 and since i'm
>> copying
 the values of those fields to the catch-all field and the catch-all
>> field
 is my default search field, I don't expect any problem for having 1000
 fields in Solr's schema, or should I?
 
 Thanks
 
 Steven
 
 
 On Thu, Sep 17, 2020 at 8:23 AM Erick Erickson <
>> erickerick...@gmail.com>
 wrote:
 
> “there over 1000 of them[fields]”
> 
> This is often a red flag in my experience. Solr will handle that many
> fields, I’ve seen many more. But this is often a result of
> “database thinking”, i.e. your mental model of how all this data
> is from a DB perspective rather than a search perspective.
> 
> It’s unwieldy to have that many fields. Obviously I don’t know the
> particulars of
> your app, and maybe that’s the best design. Particularly if many of
>> the
> fields
> are sparsely populated, i.e. only a small percentage of the documents
>>> in
> your
> corpus have any value for that field then taking a step back and
>>> looking
> at the design might save you some grief down the line.
> 
> For instance, I’ve seen designs where instead of
> field1:some_value
> field2:other_value….
> 
> you use a single field with _tokens_ like:
> field:field1_some_value
> field:field2_other_value
> 
> that drops the complexity and increases performance.
> 
> Anyway, just a thought you might want to consider.
> 
> Best,
> Erick
> 
>> On Sep 16, 2020, at 9:31 PM, Steven White 
 wrote:
>> 
>> Hi everyone,
>> 
>> I figured it out.  It is as simple as creating a List and
>>> using
>> that as the value part for SolrInputDocument.addField() API.
>> 
>> Thanks,
>> 
>> Steven
>> 
>> 
>> On Wed, Sep 16, 2020 at 9:13 PM Steven White >> 
> wrote:
>> 
>>> Hi everyone,
>>> 
>>> I want to avoid creating a >> source="OneFieldOfMany"/> in my schema (there will be over 1000 of
 them
> and
>>> maybe more so managing it will be a pain).  Instead, I want to use
 SolrJ
>>> API to do what  does.  Any example of how I can do
>> this?
 If
>>> there is an example online, that would be great.
>>> 
>>> Thanks in advance.
>>> 
>>> Steven
>>> 
> 
> 
 
>>> 
>> 



Re: Doing what copyField does using SolrJ API

2020-09-17 Thread Alexandre Rafalovitch
Solr has a whole pipeline that you can run during document ingesting before
the actual indexing happens. It is called Update Request Processor (URP)
and is defined in solrconfig.xml or in an override file. Obviously, since
you are indexing from SolrJ client, you have even more flexibility, but it
is good to know about anyway.

You can read all about it at:
https://lucene.apache.org/solr/guide/8_6/update-request-processors.html and
see the extensive list of processors you can leverage. The specific
mentioned one is this one:
https://lucene.apache.org/solr/8_6_0//solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html

Just a word of warning that Stateless URP is using Javascript, which is
getting a bit of a complicated story as underlying JVM is upgraded (Oracle
dropped their javascript engine in JDK 14). So if one of the simpler URPs
will do the job or a chain of them, that may be a better path to take.

Regards,
   Alex.


On Thu, 17 Sep 2020 at 13:13, Steven White  wrote:

> Thanks Erick.  Where can I learn more about "stateless script update
> processor factory".  I don't know what you mean by this.
>
> Steven
>
> On Thu, Sep 17, 2020 at 1:08 PM Erick Erickson 
> wrote:
>
> > 1000 fields is fine, you'll waste some cycles on bookkeeping, but I
> really
> > doubt you'll notice. That said, are these fields used for searching?
> > Because you do have control over what goes into the index if you can put
> a
> > "stateless script update processor factory" in your update chain. There
> you
> > can do whatever you want, including combine all the fields into one and
> > delete the original fields. There's no point in having your index
> cluttered
> > with unused fields, OTOH, it may not be worth the effort just to satisfy
> my
> > sense of aesthetics 😉
> >
> > On Thu, Sep 17, 2020, 12:59 Steven White  wrote:
> >
> > > Hi Eric,
> > >
> > > Yes, this is coming from a DB.  Unfortunately I have no control over
> the
> > > list of fields.  Out of the 1000 fields that there maybe, no document,
> > that
> > > gets indexed into Solr will use more then about 50 and since i'm
> copying
> > > the values of those fields to the catch-all field and the catch-all
> field
> > > is my default search field, I don't expect any problem for having 1000
> > > fields in Solr's schema, or should I?
> > >
> > > Thanks
> > >
> > > Steven
> > >
> > >
> > > On Thu, Sep 17, 2020 at 8:23 AM Erick Erickson <
> erickerick...@gmail.com>
> > > wrote:
> > >
> > > > “there over 1000 of them[fields]”
> > > >
> > > > This is often a red flag in my experience. Solr will handle that many
> > > > fields, I’ve seen many more. But this is often a result of
> > > > “database thinking”, i.e. your mental model of how all this data
> > > > is from a DB perspective rather than a search perspective.
> > > >
> > > > It’s unwieldy to have that many fields. Obviously I don’t know the
> > > > particulars of
> > > > your app, and maybe that’s the best design. Particularly if many of
> the
> > > > fields
> > > > are sparsely populated, i.e. only a small percentage of the documents
> > in
> > > > your
> > > > corpus have any value for that field then taking a step back and
> > looking
> > > > at the design might save you some grief down the line.
> > > >
> > > > For instance, I’ve seen designs where instead of
> > > > field1:some_value
> > > > field2:other_value….
> > > >
> > > > you use a single field with _tokens_ like:
> > > > field:field1_some_value
> > > > field:field2_other_value
> > > >
> > > > that drops the complexity and increases performance.
> > > >
> > > > Anyway, just a thought you might want to consider.
> > > >
> > > > Best,
> > > > Erick
> > > >
> > > > > On Sep 16, 2020, at 9:31 PM, Steven White 
> > > wrote:
> > > > >
> > > > > Hi everyone,
> > > > >
> > > > > I figured it out.  It is as simple as creating a List and
> > using
> > > > > that as the value part for SolrInputDocument.addField() API.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Steven
> > > > >
> > > > >
> > > > > On Wed, Sep 16, 2020 at 9:13 PM Steven White  >
> > > > wrote:
> > > > >
> > > > >> Hi everyone,
> > > > >>
> > > > >> I want to avoid creating a  > > > >> source="OneFieldOfMany"/> in my schema (there will be over 1000 of
> > > them
> > > > and
> > > > >> maybe more so managing it will be a pain).  Instead, I want to use
> > > SolrJ
> > > > >> API to do what  does.  Any example of how I can do
> this?
> > > If
> > > > >> there is an example online, that would be great.
> > > > >>
> > > > >> Thanks in advance.
> > > > >>
> > > > >> Steven
> > > > >>
> > > >
> > > >
> > >
> >
>


Re: NPE Issue with atomic update to nested document or child document through SolrJ

2020-09-17 Thread Alexandre Rafalovitch
Did you reindex the original document after you added a new field? If
not, then the previously indexed content is missing it and your code
paths will get out of sync.

Regards,
   Alex.
P.s. I haven't done what you are doing before, so there may be
something I am missing myself.


On Thu, 17 Sep 2020 at 12:46, Pratik Patel  wrote:
>
> Thanks for your reply Alexandre.
>
> I have "_root_" and "_nest_path_" fields in my schema but not
> "_nest_parent_".
>
>
> 
> 
> <field name="_root_" type="string" indexed="true" stored="false" docValues="false" />
> 
> <fieldType name="_nest_path_" class="solr.NestPathField" />
>
> I ran my test after adding the "_nest_parent_" field and I am not getting
> NPE any more which is good. Thanks!
>
> But looking at the documents in the index, I see that after the atomic
> update, now there are two children documents with the same id. One document
> has old values and another one has new values. Shouldn't they be merged
> based on the "id"? Do we need to specify anything else in the request to
> ensure that documents are merged/updated and not duplicated?
>
> For your reference, below is the test I am running now.
>
> // update field of one child doc
> SolrInputDocument sdoc = new SolrInputDocument(  );
> sdoc.addField( "id", testChildPOJO.id() );
> sdoc.addField( "conceptid", testChildPOJO.conceptid() );
> sdoc.addField( "storeid", "foo" );
> sdoc.setField( "fieldName",
> java.util.Collections.singletonMap("set", Collections.list("bar" ) ));
>
> final UpdateRequest req = new UpdateRequest();
> req.withRoute( pojo1.id() );// parent id
> req.add(sdoc);
>
> collection.client.request( req, collection.getCollectionName()
> );
> collection.client.commit();
>
>
> Resulting documents :
>
> {id=c1_child1, conceptid=c1, storeid=s1, fieldName=c1_child1_field_value1,
> startTime=Mon Sep 07 12:40:37 EDT 2020, integerField_iDF=10,
> booleanField_bDF=true, _root_=abcd, _version_=1678099970090074112}
> {id=c1_child1, conceptid=c1, storeid=foo, fieldName=bar, startTime=Mon Sep
> 07 12:40:37 EDT 2020, integerField_iDF=10, booleanField_bDF=true,
> _root_=abcd, _version_=1678099970405695488}
>
>
>
>
>
>
> On Thu, Sep 17, 2020 at 12:01 PM Alexandre Rafalovitch 
> wrote:
>
> > Can you double-check your schema to see if you have all the fields
> > required to support nested documents. You are supposed to get away
> > with just _root_, but really you should also include _nest_path and
> > _nest_parent_. Your particular exception seems to be triggering
> > something (maybe a bug) related to - possibly - missing _nest_path_
> > field.
> >
> > See:
> > https://lucene.apache.org/solr/guide/8_5/indexing-nested-documents.html#indexing-nested-documents
> >
> > Regards,
> >Alex.
> >
> > On Wed, 16 Sep 2020 at 13:28, Pratik Patel  wrote:
> > >
> > > Hello Everyone,
> > >
> > > I am trying to update a field of a child document using atomic updates
> > > feature. I am using solr and solrJ version 8.5.0
> > >
> > > I have ensured that my schema satisfies the conditions for atomic updates
> > > and I am able to do atomic updates on normal documents but with nested
> > > child documents, I am getting a Null Pointer Exception. Following is the
> > > simple test which I am trying.
> > >
> > > TestPojo  pojo1  = new TestPojo().cId( "abcd" )
> > > >  .conceptid( "c1" )
> > > >  .storeid( storeId )
> > > >  .testChildPojos(
> > > > Collections.list( testChildPOJO, testChildPOJO2,
> > > >
> > testChildPOJO3 )
> > > > );
> > > > TestChildPOJOtestChildPOJO = new TestChildPOJO().cId(
> > > > "c1_child1" )
> > > >   .conceptid( "c1"
> > )
> > > >   .storeid(
> > storeId )
> > > >   .fieldName(
> > > > "c1_child1_field_value1" )
> > > >   .startTime(
> > > > Date.from( now.minus( 10, ChronoUnit.DAYS ) ) )
> > > >
> >  .integerField_iDF(
> > > > 10 )
> > > >
> > > > .booleanField_bDF(true);
> > > > // index pojo1 with child testChildPOJO
> > > > SolrInputDocument sdoc = new SolrInputDocument();
> > > > sdoc.addField( "_route_", pojo1.cId() );
> > > > sdoc.addField( "id", testChildPOJO.cId() );
> > > > sdoc.addField( "conceptid", testChildPOJO.conceptid() );
> > > > sdoc.addField( "storeid", testChildPOJO.cId() );
> > > > sdoc.setField( "fieldName", java.util.Collections.singletonMap("set",
> > > > Collections.list(testChildPOJO.fieldName() + postfix) ) ); // modify
> > field
> > > > "fieldName"
> > > > collection.client.add( sdoc );   // results in NPE!
> > >
> > >
> > > Stack Trace:
> > >
> > > ERROR org.apache.solr.client.solrj.impl.BaseCloudSolrClient - Request to
> > > > collection [col

Re: Doing what copyField does using SolrJ API

2020-09-17 Thread Steven White
Thanks Erick.  Where can I learn more about "stateless script update
processor factory".  I don't know what you mean by this.

Steven

On Thu, Sep 17, 2020 at 1:08 PM Erick Erickson 
wrote:

> 1000 fields is fine, you'll waste some cycles on bookkeeping, but I really
> doubt you'll notice. That said, are these fields used for searching?
> Because you do have control over what goes into the index if you can put a
> "stateless script update processor factory" in your update chain. There you
> can do whatever you want, including combine all the fields into one and
> delete the original fields. There's no point in having your index cluttered
> with unused fields, OTOH, it may not be worth the effort just to satisfy my
> sense of aesthetics 😉
>
> On Thu, Sep 17, 2020, 12:59 Steven White  wrote:
>
> > Hi Eric,
> >
> > Yes, this is coming from a DB.  Unfortunately I have no control over the
> > list of fields.  Out of the 1000 fields that there maybe, no document,
> that
> > gets indexed into Solr will use more then about 50 and since i'm copying
> > the values of those fields to the catch-all field and the catch-all field
> > is my default search field, I don't expect any problem for having 1000
> > fields in Solr's schema, or should I?
> >
> > Thanks
> >
> > Steven
> >
> >
> > On Thu, Sep 17, 2020 at 8:23 AM Erick Erickson 
> > wrote:
> >
> > > “there over 1000 of them[fields]”
> > >
> > > This is often a red flag in my experience. Solr will handle that many
> > > fields, I’ve seen many more. But this is often a result of
> > > “database thinking”, i.e. your mental model of how all this data
> > > is from a DB perspective rather than a search perspective.
> > >
> > > It’s unwieldy to have that many fields. Obviously I don’t know the
> > > particulars of
> > > your app, and maybe that’s the best design. Particularly if many of the
> > > fields
> > > are sparsely populated, i.e. only a small percentage of the documents
> in
> > > your
> > > corpus have any value for that field then taking a step back and
> looking
> > > at the design might save you some grief down the line.
> > >
> > > For instance, I’ve seen designs where instead of
> > > field1:some_value
> > > field2:other_value….
> > >
> > > you use a single field with _tokens_ like:
> > > field:field1_some_value
> > > field:field2_other_value
> > >
> > > that drops the complexity and increases performance.
> > >
> > > Anyway, just a thought you might want to consider.
> > >
> > > Best,
> > > Erick
> > >
> > > > On Sep 16, 2020, at 9:31 PM, Steven White 
> > wrote:
> > > >
> > > > Hi everyone,
> > > >
> > > > I figured it out.  It is as simple as creating a List and
> using
> > > > that as the value part for SolrInputDocument.addField() API.
> > > >
> > > > Thanks,
> > > >
> > > > Steven
> > > >
> > > >
> > > > On Wed, Sep 16, 2020 at 9:13 PM Steven White 
> > > wrote:
> > > >
> > > >> Hi everyone,
> > > >>
> > > >> I want to avoid creating a  > > >> source="OneFieldOfMany"/> in my schema (there will be over 1000 of
> > them
> > > and
> > > >> maybe more so managing it will be a pain).  Instead, I want to use
> > SolrJ
> > > >> API to do what  does.  Any example of how I can do this?
> > If
> > > >> there is an example online, that would be great.
> > > >>
> > > >> Thanks in advance.
> > > >>
> > > >> Steven
> > > >>
> > >
> > >
> >
>


Handling failure when adding docs to Solr using SolrJ

2020-09-17 Thread Steven White
Hi everyone,

I'm trying to figure out when and how I should handle failures that may
occur during indexing.  In the sample code below, look at my comment and
let me know what state my index is in when things fail:

   SolrClient solrClient = new HttpSolrClient.Builder(url).build();

   solrClient.add(solrDocs);

   // #1: What to do if add() fails?  And how do I know if all or some of
my docs in 'solrDocs' made it to the index or not ('solrDocs' is a list of
1 or more doc), should I retry add() again?  Retry with a smaller chunk?
Etc.

   if (doCommit == true)
   {
  solrClient.commit();

   // #2: What to do if commit() fails?  Re-issue commit() again?
   }

Thanks

Steven


Re: Doing what copyField does using SolrJ API

2020-09-17 Thread Erick Erickson
1000 fields is fine, you'll waste some cycles on bookkeeping, but I really
doubt you'll notice. That said, are these fields used for searching?
Because you do have control over what goes into the index if you can put a
"stateless script update processor factory" in your update chain. There you
can do whatever you want, including combine all the fields into one and
delete the original fields. There's no point in having your index cluttered
with unused fields, OTOH, it may not be worth the effort just to satisfy my
sense of aesthetics 😉

On Thu, Sep 17, 2020, 12:59 Steven White  wrote:

> Hi Eric,
>
> Yes, this is coming from a DB.  Unfortunately I have no control over the
> list of fields.  Out of the 1000 fields that there maybe, no document, that
> gets indexed into Solr will use more then about 50 and since i'm copying
> the values of those fields to the catch-all field and the catch-all field
> is my default search field, I don't expect any problem for having 1000
> fields in Solr's schema, or should I?
>
> Thanks
>
> Steven
>
>
> On Thu, Sep 17, 2020 at 8:23 AM Erick Erickson 
> wrote:
>
> > “there over 1000 of them[fields]”
> >
> > This is often a red flag in my experience. Solr will handle that many
> > fields, I’ve seen many more. But this is often a result of
> > “database thinking”, i.e. your mental model of how all this data
> > is from a DB perspective rather than a search perspective.
> >
> > It’s unwieldy to have that many fields. Obviously I don’t know the
> > particulars of
> > your app, and maybe that’s the best design. Particularly if many of the
> > fields
> > are sparsely populated, i.e. only a small percentage of the documents in
> > your
> > corpus have any value for that field then taking a step back and looking
> > at the design might save you some grief down the line.
> >
> > For instance, I’ve seen designs where instead of
> > field1:some_value
> > field2:other_value….
> >
> > you use a single field with _tokens_ like:
> > field:field1_some_value
> > field:field2_other_value
> >
> > that drops the complexity and increases performance.
> >
> > Anyway, just a thought you might want to consider.
> >
> > Best,
> > Erick
> >
> > > On Sep 16, 2020, at 9:31 PM, Steven White 
> wrote:
> > >
> > > Hi everyone,
> > >
> > > I figured it out.  It is as simple as creating a List and using
> > > that as the value part for SolrInputDocument.addField() API.
> > >
> > > Thanks,
> > >
> > > Steven
> > >
> > >
> > > On Wed, Sep 16, 2020 at 9:13 PM Steven White 
> > wrote:
> > >
> > >> Hi everyone,
> > >>
> > >> I want to avoid creating a  > >> source="OneFieldOfMany"/> in my schema (there will be over 1000 of
> them
> > and
> > >> maybe more so managing it will be a pain).  Instead, I want to use
> SolrJ
> > >> API to do what  does.  Any example of how I can do this?
> If
> > >> there is an example online, that would be great.
> > >>
> > >> Thanks in advance.
> > >>
> > >> Steven
> > >>
> >
> >
>
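
A small sketch of the single-field token pattern Erick describes above, which
collapses many sparse attributes into one multiValued field of
fieldname_value tokens. The attribute names and the attrs_ss field are made-up
examples and assume a matching multiValued string field (or dynamic field)
exists in the schema:

import java.util.Map;

import org.apache.solr.common.SolrInputDocument;

public class TokenFieldExample {
    public static void main(String[] args) {
        // Sparse attributes that would otherwise each need their own Solr field.
        Map<String, String> attrs = Map.of(
                "color", "red",
                "size", "xl",
                "brand", "acme");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "prod-42");

        // One multiValued string field holds fieldname_value tokens,
        // e.g. queried later as attrs_ss:color_red instead of color:red.
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            doc.addField("attrs_ss", e.getKey() + "_" + e.getValue());
        }
        System.out.println(doc);
    }
}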


Re: Doing what copyField does using SolrJ API

2020-09-17 Thread Steven White
Hi Eric,

Yes, this is coming from a DB.  Unfortunately I have no control over the
list of fields.  Out of the 1000 fields that there may be, no document that
gets indexed into Solr will use more than about 50, and since I'm copying
the values of those fields to the catch-all field and the catch-all field
is my default search field, I don't expect any problem for having 1000
fields in Solr's schema, or should I?

Thanks

Steven


On Thu, Sep 17, 2020 at 8:23 AM Erick Erickson 
wrote:

> “there over 1000 of them[fields]”
>
> This is often a red flag in my experience. Solr will handle that many
> fields, I’ve seen many more. But this is often a result of
> “database thinking”, i.e. your mental model of how all this data
> is from a DB perspective rather than a search perspective.
>
> It’s unwieldy to have that many fields. Obviously I don’t know the
> particulars of
> your app, and maybe that’s the best design. Particularly if many of the
> fields
> are sparsely populated, i.e. only a small percentage of the documents in
> your
> corpus have any value for that field then taking a step back and looking
> at the design might save you some grief down the line.
>
> For instance, I’ve seen designs where instead of
> field1:some_value
> field2:other_value….
>
> you use a single field with _tokens_ like:
> field:field1_some_value
> field:field2_other_value
>
> that drops the complexity and increases performance.
>
> Anyway, just a thought you might want to consider.
>
> Best,
> Erick
>
> > On Sep 16, 2020, at 9:31 PM, Steven White  wrote:
> >
> > Hi everyone,
> >
> > I figured it out.  It is as simple as creating a List and using
> > that as the value part for SolrInputDocument.addField() API.
> >
> > Thanks,
> >
> > Steven
> >
> >
> > On Wed, Sep 16, 2020 at 9:13 PM Steven White 
> wrote:
> >
> >> Hi everyone,
> >>
> >> I want to avoid creating a <copyField dest="..." source="OneFieldOfMany"/> in my schema (there will be over 1000 of them
> and
> >> maybe more so managing it will be a pain).  Instead, I want to use SolrJ
> >> API to do what copyField does.  Any example of how I can do this?  If
> >> there is an example online, that would be great.
> >>
> >> Thanks in advance.
> >>
> >> Steven
> >>
>
>
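
A minimal sketch of the client-side catch-all approach discussed in this
thread, doing in SolrJ what a copyField rule would do at index time by
collecting every populated source value into one default-search field. The row
contents and the catchall field name are assumptions for illustration, and the
catchall field is assumed to be multiValued in the schema:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.solr.common.SolrInputDocument;

public class CatchAllFieldExample {
    public static void main(String[] args) {
        // Values coming from the database row; only populated columns are present.
        Map<String, Object> row = Map.of(
                "title", "Blue widget",
                "description", "A small blue widget",
                "sku", "W-1001");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "W-1001");

        List<Object> catchAll = new ArrayList<>();
        for (Map.Entry<String, Object> e : row.entrySet()) {
            doc.addField(e.getKey(), e.getValue());  // keep the individual field
            catchAll.add(e.getValue());              // and copy its value
        }
        // A multiValued "catchall" field serves as the default search field,
        // which is roughly what a <copyField dest="catchall" source="*"/> rule
        // would produce on the server side.
        doc.addField("catchall", catchAll);
        System.out.println(doc);
    }
}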


Re: NPE Issue with atomic update to nested document or child document through SolrJ

2020-09-17 Thread Pratik Patel
Thanks for your reply Alexandre.

I have "_root_" and "_nest_path_" fields in my schema but not
"_nest_parent_".








I ran my test after adding the "_nest_parent_" field and I am not getting
NPE any more which is good. Thanks!

But looking at the documents in the index, I see that after the atomic
update, now there are two children documents with the same id. One document
has old values and another one has new values. Shouldn't they be merged
based on the "id"? Do we need to specify anything else in the request to
ensure that documents are merged/updated and not duplicated?

For your reference, below is the test I am running now.

// update field of one child doc
SolrInputDocument sdoc = new SolrInputDocument(  );
sdoc.addField( "id", testChildPOJO.id() );
sdoc.addField( "conceptid", testChildPOJO.conceptid() );
sdoc.addField( "storeid", "foo" );
sdoc.setField( "fieldName",
java.util.Collections.singletonMap("set", Collections.list("bar" ) ));

final UpdateRequest req = new UpdateRequest();
req.withRoute( pojo1.id() );// parent id
req.add(sdoc);

collection.client.request( req, collection.getCollectionName()
);
collection.client.commit();


Resulting documents :

{id=c1_child1, conceptid=c1, storeid=s1, fieldName=c1_child1_field_value1,
startTime=Mon Sep 07 12:40:37 EDT 2020, integerField_iDF=10,
booleanField_bDF=true, _root_=abcd, _version_=1678099970090074112}
{id=c1_child1, conceptid=c1, storeid=foo, fieldName=bar, startTime=Mon Sep
07 12:40:37 EDT 2020, integerField_iDF=10, booleanField_bDF=true,
_root_=abcd, _version_=1678099970405695488}






On Thu, Sep 17, 2020 at 12:01 PM Alexandre Rafalovitch 
wrote:

> Can you double-check your schema to see if you have all the fields
> required to support nested documents. You are supposed to get away
> with just _root_, but really you should also include _nest_path and
> _nest_parent_. Your particular exception seems to be triggering
> something (maybe a bug) related to - possibly - missing _nest_path_
> field.
>
> See:
> https://lucene.apache.org/solr/guide/8_5/indexing-nested-documents.html#indexing-nested-documents
>
> Regards,
>Alex.
>
> On Wed, 16 Sep 2020 at 13:28, Pratik Patel  wrote:
> >
> > Hello Everyone,
> >
> > I am trying to update a field of a child document using atomic updates
> > feature. I am using solr and solrJ version 8.5.0
> >
> > I have ensured that my schema satisfies the conditions for atomic updates
> > and I am able to do atomic updates on normal documents but with nested
> > child documents, I am getting a Null Pointer Exception. Following is the
> > simple test which I am trying.
> >
> > TestPojo  pojo1  = new TestPojo().cId( "abcd" )
> > >  .conceptid( "c1" )
> > >  .storeid( storeId )
> > >  .testChildPojos(
> > > Collections.list( testChildPOJO, testChildPOJO2,
> > >
> testChildPOJO3 )
> > > );
> > > TestChildPOJOtestChildPOJO = new TestChildPOJO().cId(
> > > "c1_child1" )
> > >   .conceptid( "c1"
> )
> > >   .storeid(
> storeId )
> > >   .fieldName(
> > > "c1_child1_field_value1" )
> > >   .startTime(
> > > Date.from( now.minus( 10, ChronoUnit.DAYS ) ) )
> > >
>  .integerField_iDF(
> > > 10 )
> > >
> > > .booleanField_bDF(true);
> > > // index pojo1 with child testChildPOJO
> > > SolrInputDocument sdoc = new SolrInputDocument();
> > > sdoc.addField( "_route_", pojo1.cId() );
> > > sdoc.addField( "id", testChildPOJO.cId() );
> > > sdoc.addField( "conceptid", testChildPOJO.conceptid() );
> > > sdoc.addField( "storeid", testChildPOJO.cId() );
> > > sdoc.setField( "fieldName", java.util.Collections.singletonMap("set",
> > > Collections.list(testChildPOJO.fieldName() + postfix) ) ); // modify
> field
> > > "fieldName"
> > > collection.client.add( sdoc );   // results in NPE!
> >
> >
> > Stack Trace:
> >
> > ERROR org.apache.solr.client.solrj.impl.BaseCloudSolrClient - Request to
> > > collection [collectionTest2] failed due to (500)
> > > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
> Error
> > > from server at
> > > http://172.15.1.100:8081/solr/collectionTest2_shard1_replica_n1:
> > > java.lang.NullPointerException
> > > at
> > >
> org.apache.solr.update.processor.AtomicUpdateDocumentMerger.getFieldFromHierarchy(AtomicUpdateDocumentMerger.java:308)
> > > at
> > >
> org.apache.solr.update.processor.AtomicUpdateDocumentMerger.mergeChildDoc(AtomicUpdateDocumentMerger.java:405)
> > > at
> > >
> org.apache.solr.update.processor.DistributedUpdateProcessor.getUpdatedDocument(DistributedUpdateProces

Re: Help using Noggit for streaming JSON data

2020-09-17 Thread Yonik Seeley
See this method:

  /** Reads a JSON string into the output, decoding any escaped characters.
*/
  public void getString(CharArr output) throws IOException

And then the idea is to create a subclass of CharArr to incrementally
handle the string that is written to it.
You could overload write methods, or perhaps reserve() to flush/handle the
buffer when it reaches a certain size.
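
Something like this untested sketch, for example (note: depending on the
noggit version the parser may append via write(), unsafeWrite() or
reserve(), so check which methods actually need overriding):

import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import org.noggit.CharArr;
import org.noggit.JSONParser;

// Forwards whatever the parser appends straight to a Writer (e.g. a stream
// feeding the database) instead of accumulating the whole value in memory.
class StreamingCharArr extends CharArr {
  private final Writer out;
  StreamingCharArr(Writer out) { this.out = out; }

  @Override
  public void write(char c) {
    try { out.write(c); } catch (IOException e) { throw new UncheckedIOException(e); }
  }

  @Override
  public void write(char[] src, int off, int len) {
    try { out.write(src, off, len); } catch (IOException e) { throw new UncheckedIOException(e); }
  }
}

// Usage:
// JSONParser p = new JSONParser(reader);
// if (p.nextEvent() == JSONParser.STRING) {
//   p.getString(new StreamingCharArr(databaseWriter));  // streamed as it is decoded
// }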

-Yonik


On Thu, Sep 17, 2020 at 11:48 AM Christopher Schultz <
ch...@christopherschultz.net> wrote:

> All,
>
> Is this an appropriate forum for asking questions about how to use
> Noggit? The Github doesn't have any discussions available and filing an
> "issue" to ask a question is kinda silly. I'm happy to be redirected to
> the right place if this isn't appropriate.
>
> I've been able to figure out most things in Noggit by reading the code,
> but I have a new use-case where I expect that I'll have very large
> values (base64-encoded binary) and I'd like to stream those rather than
> calling parser.getString() and getting a potentially huge string coming
> back. I'm streaming into a database so I never need the whole string in
> one place at one time.
>
> I was thinking something like this:
>
> JSONParser p = ...;
>
> int evt = p.nextEvent();
> if(JSONParser.STRING == evt) {
>   // Start streaming
>   boolean eos = false;
>   while(!eos) {
> char c = p.getChar();
> if(c == '"') {
>   eos = true;
> } else {
>   append to stream
> }
>   }
> }
>
> But getChar() is not public. The only "documentation" I've really been
> able to find for Noggit is this post from Yonic back in 2014:
>
> http://yonik.com/noggit-json-parser/
>
> It mostly says "Noggit is great!" and specifically mentions huge, long
> strings but does not actually show any Java code to consume the JSON
> data in any kind of streaming way.
>
> The ObjectBuilder class is a great user of JSONParser, but it just
> builds standard objects and would consume tons of memory in my case.
>
> I know for sure that Solr consumes huge JSON documents and I'm assuming
> that Noggit is being used in that situation, though I have not looked at
> the code used to do that.
>
> Any suggestions?
>
> -chris
>


Re: NPE Issue with atomic update to nested document or child document through SolrJ

2020-09-17 Thread Alexandre Rafalovitch
Can you double-check your schema to see if you have all the fields
required to support nested documents. You are supposed to get away
with just _root_, but really you should also include _nest_path and
_nest_parent_. Your particular exception seems to be triggering
something (maybe a bug) related to - possibly - missing _nest_path_
field.

See: 
https://lucene.apache.org/solr/guide/8_5/indexing-nested-documents.html#indexing-nested-documents
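
For reference, the definitions in that section look roughly like this
(double-check the exact attributes against your Solr version):

<fieldType name="_nest_path_" class="solr.NestPathField" />
<field name="_root_" type="string" indexed="true" stored="false" docValues="false" />
<field name="_nest_path_" type="_nest_path_" />
<field name="_nest_parent_" type="string" indexed="true" stored="true" />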

Regards,
   Alex.

On Wed, 16 Sep 2020 at 13:28, Pratik Patel  wrote:
>
> Hello Everyone,
>
> I am trying to update a field of a child document using atomic updates
> feature. I am using solr and solrJ version 8.5.0
>
> I have ensured that my schema satisfies the conditions for atomic updates
> and I am able to do atomic updates on normal documents but with nested
> child documents, I am getting a Null Pointer Exception. Following is the
> simple test which I am trying.
>
> TestPojo  pojo1  = new TestPojo().cId( "abcd" )
> >  .conceptid( "c1" )
> >  .storeid( storeId )
> >  .testChildPojos(
> > Collections.list( testChildPOJO, testChildPOJO2,
> >  testChildPOJO3 )
> > );
> > TestChildPOJOtestChildPOJO = new TestChildPOJO().cId(
> > "c1_child1" )
> >   .conceptid( "c1" )
> >   .storeid( storeId )
> >   .fieldName(
> > "c1_child1_field_value1" )
> >   .startTime(
> > Date.from( now.minus( 10, ChronoUnit.DAYS ) ) )
> >   .integerField_iDF(
> > 10 )
> >
> > .booleanField_bDF(true);
> > // index pojo1 with child testChildPOJO
> > SolrInputDocument sdoc = new SolrInputDocument();
> > sdoc.addField( "_route_", pojo1.cId() );
> > sdoc.addField( "id", testChildPOJO.cId() );
> > sdoc.addField( "conceptid", testChildPOJO.conceptid() );
> > sdoc.addField( "storeid", testChildPOJO.cId() );
> > sdoc.setField( "fieldName", java.util.Collections.singletonMap("set",
> > Collections.list(testChildPOJO.fieldName() + postfix) ) ); // modify field
> > "fieldName"
> > collection.client.add( sdoc );   // results in NPE!
>
>
> Stack Trace:
>
> ERROR org.apache.solr.client.solrj.impl.BaseCloudSolrClient - Request to
> > collection [collectionTest2] failed due to (500)
> > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
> > from server at
> > http://172.15.1.100:8081/solr/collectionTest2_shard1_replica_n1:
> > java.lang.NullPointerException
> > at
> > org.apache.solr.update.processor.AtomicUpdateDocumentMerger.getFieldFromHierarchy(AtomicUpdateDocumentMerger.java:308)
> > at
> > org.apache.solr.update.processor.AtomicUpdateDocumentMerger.mergeChildDoc(AtomicUpdateDocumentMerger.java:405)
> > at
> > org.apache.solr.update.processor.DistributedUpdateProcessor.getUpdatedDocument(DistributedUpdateProcessor.java:711)
> > at
> > org.apache.solr.update.processor.DistributedUpdateProcessor.doVersionAdd(DistributedUpdateProcessor.java:374)
> > at
> > org.apache.solr.update.processor.DistributedUpdateProcessor.lambda$versionAdd$0(DistributedUpdateProcessor.java:339)
> > at org.apache.solr.update.VersionBucket.runWithLock(VersionBucket.java:50)
> > at
> > org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:339)
> > at
> > org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:225)
> > at
> > org.apache.solr.update.processor.DistributedZkUpdateProcessor.processAdd(DistributedZkUpdateProcessor.java:245)
> > at
> > org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
> > at
> > org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:110)
> > at
> > org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:332)
> > at
> > org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readIterator(JavaBinUpdateRequestCodec.java:281)
> > at
> > org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:338)
> > at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:283)
> > at
> > org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readNamedList(JavaBinUpdateRequestCodec.java:236)
> > at
> > org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:303)
> > at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:283)
> > at
> > org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:196)
> > at
> > org.apache.solr.client.solrj.request.JavaBi

Help using Noggit for streaming JSON data

2020-09-17 Thread Christopher Schultz
All,

Is this an appropriate forum for asking questions about how to use
Noggit? The Github doesn't have any discussions available and filing an
"issue" to ask a question is kinda silly. I'm happy to be redirected to
the right place if this isn't appropriate.

I've been able to figure out most things in Noggit by reading the code,
but I have a new use-case where I expect that I'll have very large
values (base64-encoded binary) and I'd like to stream those rather than
calling parser.getString() and getting a potentially huge string coming
back. I'm streaming into a database so I never need the whole string in
one place at one time.

I was thinking something like this:

JSONParser p = ...;

int evt = p.nextEvent();
if(JSONParser.STRING == evt) {
  // Start streaming
  boolean eos = false;
  while(!eos) {
char c = p.getChar();
if(c == '"') {
  eos = true;
} else {
  append to stream
}
  }
}

But getChar() is not public. The only "documentation" I've really been
able to find for Noggit is this post from Yonic back in 2014:

http://yonik.com/noggit-json-parser/

It mostly says "Noggit is great!" and specifically mentions huge, long
strings but does not actually show any Java code to consume the JSON
data in any kind of streaming way.

The ObjectBuilder class is a great user of JSONParser, but it just
builds standard objects and would consume tons of memory in my case.

I know for sure that Solr consumes huge JSON documents and I'm assuming
that Noggit is being used in that situation, though I have not looked at
the code used to do that.

Any suggestions?

-chris


Re: NPE Issue with atomic update to nested document or child document through SolrJ

2020-09-17 Thread pratik@semandex
Following are the approaches I have tried so far and both results in NPE.



*approach 1

TestChildPOJO testChildPOJO = new TestChildPOJO().cId( "c1_child1" )
                                                 .conceptid( "c1" )
                                                 .storeid( storeId )
                                                 .fieldName( "c1_child1_field_value1" )
                                                 .startTime( Date.from( now.minus( 10, ChronoUnit.DAYS ) ) )
                                                 .integerField_iDF( 10 )
                                                 .booleanField_bDF( true );

TestPojo pojo1 = new TestPojo().cId( "abcd" )
                               .conceptid( "c1" )
                               .storeid( storeId )
                               .testChildPojos( Collections.list( testChildPOJO, testChildPOJO2, testChildPOJO3 ) );

 

// index pojo1 with child testChildPOJO

SolrInputDocument sdoc = new SolrInputDocument();
sdoc.addField( "_route_", pojo1.cId() );
sdoc.addField( "id", testChildPOJO.cId() );
sdoc.addField( "conceptid", testChildPOJO.conceptid() );
sdoc.addField( "storeid", testChildPOJO.cId() );
sdoc.setField( "fieldName", java.util.Collections.singletonMap("set",
Collections.list(testChildPOJO.fieldName() + postfix) ) );  // modify field
"fieldName"

collection.client.add( sdoc );  
// results in NPE!

*approach 1


*approach 2

SolrInputDocument sdoc = new SolrInputDocument();
sdoc.addField( "id", testChildPOJO.id() );
sdoc.setField( "fieldName",
    java.util.Collections.singletonMap("set", testChildPOJO.fieldName() + postfix) );

final UpdateRequest req = new UpdateRequest();
req.withRoute( pojo1.id() );
req.add(sdoc);

collection.client.request( req, collection.getCollectionName() );
req.commit( collection.client, collection.getCollectionName() );


*approach 2




--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Need to update SOLR_HOME in the solr service script and getting errors

2020-09-17 Thread Mark H. Wood
On Wed, Sep 16, 2020 at 02:59:32PM +, Victor Kretzer wrote:
> My setup is two solr nodes running on separate Azure Ubuntu 18.04 LTS vms 
> using an external zookeeper assembly.
> I installed Solr 6.6.6 using the install file and then followed the steps for 
> enabling ssl. I am able to start solr, add collections and the like using 
> bin/solr script.
> 
> Example:
> /opt/solr$ sudo bin/solr start -cloud -s cloud/test2 -force
> 
> However, if I restart the machine or attempt to start solr using the 
> installed service, it naturally goes back to the default SOLR_HOME in the 
> /etc/default/solr.in.sh script: "/var/solr/data"
> 
> I've tried updating SOLR_HOME to "/opt/solr/cloud/test2"

That is what I would do.

> but then when I start the service I see the following error on the Admin 
> Dashboard:
> SolrCore Initialization Failures
> mycollection_shard1_replica1: 
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: 
> /opt/solr-6.6.6/cloud/test2/mycollection_shard1_replica1/data/index/write.lock
> Please check your logs for more information
> 
> I'm including what I believe to be the pertinent information from the logs 
> below:

You did well.

> I suspect this is a permission issue because the solr user created by the 
> install script isn't allowed access to  /opt/solr but I'm new to Linux and 
> haven't completely wrapped my head around the way permissions work with it. 
> Am I correct in guessing the cause of the error and, if so, how do I correct 
> this so that the service can be used to run my instances?

Yes, the stack trace actually tells you explicitly that the problem is
permissions on that file.  Follow the chain of "Caused by:" and you'll see:

  Caused by: java.nio.file.AccessDeniedException: 
/opt/solr-6.6.6/cloud/test2/mycollection_shard1_replica1/data/index/write.lock

Since, in the past, you have started Solr using 'sudo', this probably
means that write.lock is owned by 'root'.  Solr creates this file with
permissions that allow only the owner to write it.  If the service
script runs Solr as any other user (and it should!) then Solr won't be
able to open this file for writing, and because of this it won't
complete the loading of that core.

You should find out what user account is used by the service script,
and 'chown' Solr's entire working directories tree to be owned by that
user.  Then, refrain from ever running Solr as 'root' or the problem
may recur.  Use the normal service start/stop mechanism for
controlling your Solr instances.
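
For example, assuming the service script runs Solr as the 'solr' account
created by the installer and your cores live under /opt/solr-6.6.6/cloud
(adjust the paths to your layout):

  sudo chown -R solr:solr /opt/solr-6.6.6/cloud
  sudo service solr restart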

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: Unable to create core Solr 8.6.2

2020-09-17 Thread Erick Erickson
Look in your solr log, there’s usually a more detailed message

> On Sep 17, 2020, at 9:35 AM, Anuj Bhargava  wrote:
> 
> Getting the following error message while trying to create core
> 
> # sudo su - solr -c "/opt/solr/bin/solr create_core -c 9lives"
> WARNING: Using _default configset with data driven schema functionality.
> NOT RECOMMENDED for production use.
> To turn off: bin/solr config -c 9lives -p 8984 -action
> set-user-property -property update.autoCreateFields -value false
> 
> ERROR: Parse error : 
> 
> 
> Error 401 Unauthorized
> 
> HTTP ERROR 401
> Problem accessing /solr/admin/info/system. Reason:
> Unauthorized
> 
> 



Unable to create core Solr 8.6.2

2020-09-17 Thread Anuj Bhargava
Getting the following error message while trying to create core

# sudo su - solr -c "/opt/solr/bin/solr create_core -c 9lives"
WARNING: Using _default configset with data driven schema functionality.
NOT RECOMMENDED for production use.
 To turn off: bin/solr config -c 9lives -p 8984 -action
set-user-property -property update.autoCreateFields -value false

ERROR: Parse error : 


Error 401 Unauthorized

HTTP ERROR 401
Problem accessing /solr/admin/info/system. Reason:
Unauthorized




Re: Doing what <copyField> does using SolrJ API

2020-09-17 Thread Erick Erickson
“there will be over 1000 of them [fields]”

This is often a red flag in my experience. Solr will handle that many 
fields, I’ve seen many more. But this is often a result of 
“database thinking”, i.e. your mental model of the data comes from
a DB perspective rather than a search perspective.

It’s unwieldy to have that many fields. Obviously I don’t know the particulars 
of
your app, and maybe that’s the best design. Particularly if many of the fields
are sparsely populated, i.e. only a small percentage of the documents in your
corpus have any value for that field, then taking a step back and looking
at the design might save you some grief down the line.

For instance, I’ve seen designs where instead of
field1:some_value
field2:other_value….

you use a single field with _tokens_ like:
field:field1_some_value
field:field2_other_value

that drops the complexity and increases performance.
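
A SolrJ sketch of that pattern, just for illustration (the "attr_tokens"
field name is made up):

import org.apache.solr.common.SolrInputDocument;

// One multiValued field holding fieldname_value tokens instead of 1000 sparse fields.
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "doc-1");
doc.addField("attr_tokens", "field1_some_value");
doc.addField("attr_tokens", "field2_other_value");
// query side: fq=attr_tokens:field1_some_value rather than fq=field1:some_value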

Anyway, just a thought you might want to consider.

Best,
Erick

> On Sep 16, 2020, at 9:31 PM, Steven White  wrote:
> 
> Hi everyone,
> 
> I figured it out.  It is as simple as creating a List and using
> that as the value part for SolrInputDocument.addField() API.
> 
> Thanks,
> 
> Steven
> 
> 
> On Wed, Sep 16, 2020 at 9:13 PM Steven White  wrote:
> 
>> Hi everyone,
>> 
>> I want to avoid creating a <copyField dest="..." source="OneFieldOfMany"/>
>> in my schema (there will be over 1000 of them and
>> maybe more so managing it will be a pain).  Instead, I want to use SolrJ
>> API to do what <copyField> does.  Any example of how I can do this?  If
>> there is an example online, that would be great.
>> 
>> Thanks in advance.
>> 
>> Steven
>> 



Re: Fetched but not Added Solr 8.6.2

2020-09-17 Thread Anuj Bhargava
SolrWriter
Error creating document :

On Thu, 17 Sep 2020 at 15:53, Jörn Franke  wrote:

> Log file will tell you the issue.
>
> > Am 17.09.2020 um 10:54 schrieb Anuj Bhargava :
> >
> > We just installed Solr 8.6.2
> > It is fetching the data but not adding
> >
> > Indexing completed. *Added/Updated: 0 *documents. Deleted 0 documents.
> > (Duration: 06s)
> > Requests: 1 ,* Fetched: 100* 17/s, Skipped: 0 , Processed: 0
> >
> > The *data-config.xml*
> >
> > 
> > >driver="com.mysql.jdbc.Driver"
> >batchSize="-1"
> >autoReconnect="true"
> >socketTimeout="0"
> >connectTimeout="0"
> >encoding="UTF-8"
> >url="jdbc:mysql://zeroDateTimeBehavior=convertToNull"
> >user="xxx"
> >password="xxx"/>
> >
> > >deltaQuery="select posting_id from countries where
> > last_modified > '${dataimporter.last_index_time}'">
> >
> >
> > 
>


Re: Fetched but not Added Solr 8.6.2

2020-09-17 Thread Jörn Franke
Log file will tell you the issue.

> Am 17.09.2020 um 10:54 schrieb Anuj Bhargava :
> 
> We just installed Solr 8.6.2
> It is fetching the data but not adding
> 
> Indexing completed. *Added/Updated: 0 *documents. Deleted 0 documents.
> (Duration: 06s)
> Requests: 1 ,* Fetched: 100* 17/s, Skipped: 0 , Processed: 0
> 
> The *data-config.xml*
> 
> 
>driver="com.mysql.jdbc.Driver"
>batchSize="-1"
>autoReconnect="true"
>socketTimeout="0"
>connectTimeout="0"
>encoding="UTF-8"
>url="jdbc:mysql://zeroDateTimeBehavior=convertToNull"
>user="xxx"
>password="xxx"/>
>
>deltaQuery="select posting_id from countries where
> last_modified > '${dataimporter.last_index_time}'">
>
>
> 


Fetched but not Added Solr 8.6.2

2020-09-17 Thread Anuj Bhargava
We just installed Solr 8.6.2
It is fetching the data but not adding

Indexing completed. *Added/Updated: 0 *documents. Deleted 0 documents.
(Duration: 06s)
Requests: 1 ,* Fetched: 100* 17/s, Skipped: 0 , Processed: 0

The *data-config.xml*