Re: Lucene optimization to disable hit count

2019-11-20 Thread Wei
Thanks! Looking forward to having this feature in Solr.

On Wed, Nov 20, 2019 at 5:30 PM Tomás Fernández Löbbe 
wrote:

> Not yet:
> https://issues.apache.org/jira/browse/SOLR-13289
>
> On Wed, Nov 20, 2019 at 4:57 PM Wei  wrote:
>
> > Hi,
> >
> > I see this lucene optimization to disable hit counts for better query
> > performance:
> >
> > https://issues.apache.org/jira/browse/LUCENE-8060
> >
> > Is the feature available in Solr 8.3?
> >
> > Thanks,
> > Wei
> >
>


Re: Lucene optimization to disable hit count

2019-11-20 Thread Tomás Fernández Löbbe
Not yet:
https://issues.apache.org/jira/browse/SOLR-13289
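For context, at the Lucene level LUCENE-8060 works by giving the collector a
total-hit-count threshold beyond which it stops counting. A minimal sketch of
that API (Lucene 8.x; the index path is hypothetical):

    import java.nio.file.Paths;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MatchAllDocsQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.search.TopScoreDocCollector;
    import org.apache.lucene.store.FSDirectory;

    public class HitCountSketch {
      public static void main(String[] args) throws Exception {
        try (DirectoryReader reader = DirectoryReader.open(
            FSDirectory.open(Paths.get("/path/to/index")))) { // hypothetical path
          IndexSearcher searcher = new IndexSearcher(reader);
          // Collect the top 10 docs, but only count total hits exactly up to 100.
          TopScoreDocCollector collector = TopScoreDocCollector.create(10, 100);
          searcher.search(new MatchAllDocsQuery(), collector);
          TopDocs top = collector.topDocs();
          // relation is GREATER_THAN_OR_EQUAL_TO when counting was cut off early.
          System.out.println(top.totalHits.value + " " + top.totalHits.relation);
        }
      }
    }

SOLR-13289 tracks exposing this through Solr itself.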

On Wed, Nov 20, 2019 at 4:57 PM Wei  wrote:

> Hi,
>
> I see this lucene optimization to disable hit counts for better query
> performance:
>
> https://issues.apache.org/jira/browse/LUCENE-8060
>
> Is the feature available in Solr 8.3?
>
> Thanks,
> Wei
>


Lucene optimization to disable hit count

2019-11-20 Thread Wei
Hi,

I see this lucene optimization to disable hit counts for better query
performance:

https://issues.apache.org/jira/browse/LUCENE-8060

Is the feature available in Solr 8.3?

Thanks,
Wei


Re: Active directory integration in Solr

2019-11-20 Thread Dave
I guess I don't understand why one wouldn't simply make a basic front end for
Solr; it's literally the easiest thing to throw together, and then you control
all authentication and filters per user. Even a basic one would be some W3Schools
tutorials with PHP + JSON + whatever authentication mechanism you want to use.
Access to the UI right away lets you just drop entire cores or collections;
there's no way anyone not familiar with what they're doing should be allowed to
touch it.

> On Nov 20, 2019, at 6:22 PM, Jörn Franke  wrote:
> 
> Well, I propose Kerberos authentication on HTTPS for Solr (2) for the web UI
> backend. Then the web UI backend does any type of authentication /
> authorization of users you need.
> I would not let users directly access Solr in any environment.
> 
> 
> 
>> Am 20.11.2019 um 20:19 schrieb Kevin Risden :
>> 
>> So I wrote the blog above more as an experiment. I don't know if it is
>> fully operational other than on a single node. That being said, the Hadoop
>> authentication plugin doesn't require running on HDFS. It just uses the
>> Hadoop code to do authentication.
>> 
>> I will echo what Jorn said though - I wouldn't expose Solr to the internet
>> or directly without some sort of API. Whether you do
>> authentication/authorization at the API is a separate question.
>> 
>> Kevin Risden
>> 
>> 
>>> On Wed, Nov 20, 2019 at 1:54 PM Jörn Franke  wrote:
>>> 
>>> I would not give users directly access to Solr - even with LDAP plugin.
>>> Build a rest interface or web interface that does the authentication and
>>> authorization and security sanitization. Then you can also manage better
>>> excessive queries or explicitly forbid certain type of queries (eg specific
>>> streaming expressions - I would not expose all of them to users).
>>> 
> Am 19.11.2019 um 11:02 schrieb Kommu, Vinodh K. :
 
 Thanks Charlie.
 
We are already using Basic authentication in our existing clusters;
>>> however, it's getting difficult to maintain the number of users, as we are
>>> getting too many requests for read-only access from support teams. So we
>>> are desperately looking for an Active Directory solution. Just wondering
>>> if someone might have the same requirement.
 
 
 Regards,
 Vinodh
 
 -Original Message-
 From: Charlie Hull 
 Sent: Tuesday, November 19, 2019 2:55 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Active directory integration in Solr
 
 
 Not out of the box, there are a few authentication plugins bundled but
>>> not for AD
 
>>> https://lucene.apache.org/solr/guide/7_2/authentication-and-authorization-plugins.html
 - there's also some useful stuff in Apache ManifoldCF
 
>>> https://www.francelabs.com/blog/tutorial-on-authorizations-for-manifold-cf-and-solr/
 
 
 Best
 
 Charlie
 
> On 18/11/2019 15:08, Kommu, Vinodh K. wrote:
> Hi,
> 
> Does anyone know whether Solr has any out-of-the-box capability to
>>> integrate Active Directory (using LDAP) when security is enabled? Instead
>>> of creating users in the security.json file, the plan is to use users who
>>> already exist in Active Directory so they can use their individual
>>> credentials rather than defining them in Solr. Did anyone come across a
>>> similar requirement? If so, was there any working solution?
> 
> 
> Thanks,
> Vinodh
> 
> 
 
 --
 Charlie Hull
 Flax - Open Source Enterprise Search
 
 tel/fax: +44 (0)8700 118334
 mobile:  +44 (0)7767 825828
 web:
>>> www.flax.co.uk
 

Re: Active directory integration in Solr

2019-11-20 Thread Jörn Franke
Well, I propose Kerberos authentication on HTTPS for Solr (2) for the web UI
backend. Then the web UI backend does any type of authentication /
authorization of users you need.
I would not let users directly access Solr in any environment.



> Am 20.11.2019 um 20:19 schrieb Kevin Risden :
> 
> So I wrote the blog above more as an experiment. I don't know if it is
> fully operational other than on a single node. That being said, the Hadoop
> authentication plugin doesn't require running on HDFS. It just uses the
> Hadoop code to do authentication.
> 
> I will echo what Jorn said though - I wouldn't expose Solr to the internet
> or directly without some sort of API. Whether you do
> authentication/authorization at the API is a separate question.
> 
> Kevin Risden
> 
> 
>> On Wed, Nov 20, 2019 at 1:54 PM Jörn Franke  wrote:
>> 
>> I would not give users directly access to Solr - even with LDAP plugin.
>> Build a rest interface or web interface that does the authentication and
>> authorization and security sanitization. Then you can also manage better
>> excessive queries or explicitly forbid certain type of queries (eg specific
>> streaming expressions - I would not expose all of them to users).
>> 
 Am 19.11.2019 um 11:02 schrieb Kommu, Vinodh K. :
>>> 
>>> Thanks Charlie.
>>> 
>>> We are already using Basic authentication in our existing clusters;
>> however, it's getting difficult to maintain the number of users, as we are
>> getting too many requests for read-only access from support teams. So we
>> are desperately looking for an Active Directory solution. Just wondering
>> if someone might have the same requirement.
>>> 
>>> 
>>> Regards,
>>> Vinodh
>>> 
>>> -Original Message-
>>> From: Charlie Hull 
>>> Sent: Tuesday, November 19, 2019 2:55 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Active directory integration in Solr
>>> 
>>> 
>>> Not out of the box, there are a few authentication plugins bundled but
>> not for AD
>>> 
>> https://lucene.apache.org/solr/guide/7_2/authentication-and-authorization-plugins.html
>>> - there's also some useful stuff in Apache ManifoldCF
>>> 
>> https://www.francelabs.com/blog/tutorial-on-authorizations-for-manifold-cf-and-solr/
>>> 
>>> 
>>> Best
>>> 
>>> Charlie
>>> 
 On 18/11/2019 15:08, Kommu, Vinodh K. wrote:
 Hi,
 
Does anyone know whether Solr has any out-of-the-box capability to
>> integrate Active Directory (using LDAP) when security is enabled? Instead
>> of creating users in the security.json file, the plan is to use users who
>> already exist in Active Directory so they can use their individual
>> credentials rather than defining them in Solr. Did anyone come across a
>> similar requirement? If so, was there any working solution?
 
 
 Thanks,
 Vinodh
 
 
>>> 
>>> --
>>> Charlie Hull
>>> Flax - Open Source Enterprise Search
>>> 
>>> tel/fax: +44 (0)8700 118334
>>> mobile:  +44 (0)7767 825828
>>> web:
>> www.flax.co.uk
>>> 
>>> 
>> 


Re: Possible data corruption in JavaBinCodec in Solr 8.3 during distributed update?

2019-11-20 Thread Noble Paul
Sure, looking forward to that.
Thanks

On Thu, Nov 21, 2019 at 8:06 AM Colvin Cowie  wrote:
>
> Hi, I'll share it when I'm back at work tomorrow.
>
> I've found that the issue appears to not be reproducible using the
> solrconfig.xml in the _default example, so it must be something in ours (or
> something missing from it) that is making the problem appear, in
> combination with the code change.
>
> Thanks
>
>
> On Wednesday, 20 November 2019, Noble Paul  wrote:
>
> > Can you share the test please
> >
> > On Thu, Nov 21, 2019 at 7:02 AM Noble Paul  wrote:
> > >
> > > Thanks Colvin, I'll take a look
> > >
> > > On Thu, Nov 21, 2019 at 4:24 AM Colvin Cowie 
> > wrote:
> > > >
> > > > I've identified the change which has caused the problem to
> > > > materialize, but it shouldn't itself cause a problem.
> > > >
> > > > https://github.com/apache/lucene-solr/commit/e45e8127d5c17af4e4b87a0a4eaf0afaf4f9ff4b#diff-7f7f485122d8257bd5d3210c092b967fR52
> > > > for https://issues.apache.org/jira/browse/SOLR-13682
> > > >
> > > > In writeMap, the new BiConsumer unwraps the SolrInputField using
> > > > getValue rather than getRawValue (which the JavaBinCodec calls):
> > > >
> > > >   if (o instanceof SolrInputField) {
> > > >     o = ((SolrInputField) o).getValue();
> > > >   }
> > > >
> > > > As a result the JavaBinCodec will now be hitting different writer
> > > > methods based on the value retrieved from the SolrInputField, rather
> > > > than just writing the
> > > > org.apache.solr.common.util.JavaBinCodec.writeKnownType(Object)
> > > >
> > > >   if (val instanceof SolrInputField) {
> > > >     return writeKnownType(((SolrInputField) val).getRawValue());
> > > >   }
> > > > https://github.com/apache/lucene-solr/blob/branch_8_3/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L362
> > > >
> > > > SolrInputField getValue uses
> > > > org.apache.solr.common.util.ByteArrayUtf8CharSequence.convertCharSeq(Object)
> > > > while getRawValue just returns whatever value the SolrInputField has,
> > > > so the EntryWriter in the JavaBinCodec hits different paths from the
> > > > ones which must non-deterministically produce garbage data when
> > > > getValue() is used.
> > > >
> > > > Changing getValue() to getRawValue() in the SolrInputDocument's
> > > > writeMap() appears to "fix" the problem. (With getValue() the test I
> > > > have reliably fails within 50 iterations of indexing 2500 documents,
> > > > with getRawValue() it succeeds for the 500 iterations I'm running it
> > > > for)
> > > >
> > > > I'll see about providing a test that can be shared that demonstrates
> > > > the problem, and see if we can find what is going wrong in the
> > > > codec...
> > > >
> > > >
> > > > On Tue, 19 Nov 2019 at 13:48, Colvin Cowie  > >
> > > > wrote:
> > > >
> > > > > Hello
> > > > >
> > > > > Apologies for the lack of actual detail in this, we're still digging
> > > > > into it ourselves. I will provide more detail, and maybe some logs,
> > > > > once I have a better idea of what is actually happening.
> > > > > But I thought I might as well ask if anyone knows of changes that
> > > > > were made in the Solr 8.3 release that are likely to have caused an
> > > > > issue like this?
> > > > >
> > > > > We were on Solr 8.1.1 for several months and moved to 8.2.0 for
> > > > > about 2 weeks before moving to 8.3.0 last week.
> > > > > We didn't see this issue at all on the previous releases. Since
> > > > > moving to 8.3 we have had a consistent (but non-deterministic) set
> > > > > of failing tests, on Windows and Linux.
> > > > >
> > > > > The issue we are seeing is that during updates, the data we have
> > > > > sent is *sometimes* corrupted, as though a buffer has been used
> > > > > incorrectly. For example, if the well-formed data sent was
> > > > >   'fieldName':"this is a long string"
> > > > > the error we see from Solr might be that
> > > > >   unknown field 'fieldNamis a long string"
> > > > >
> > > > > And variations of that kind of behaviour, where part of the data is
> > > > > missing or corrupted. The data we are indexing does include fields
> > > > > which store (escaped) serialized JSON strings - if that might have
> > > > > any bearing - but the error isn't always on those fields.
> > > > > For example, given a valid document that looks like this (I've
> > > > > replaced the values by hand, so if the json is messed up here,
> > > > > that's not relevant:) when returned with the json response writer:
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > {"id": "abcd","testField": "blah","jsonField":
> > > > > "{\"thing\":{\"abcd\":\"value\",\"xyz\":[\"abc\",\"def\",\"ghi\"],\"nnn\":\"xyz\"},\"stuff\":[{\"qqq\":\"rrr\"}],\"ttt\":0,\"mmm\":\"Some
> > > > > string\",\"someBool\":true}"}
> > > > > We've had errors during indexing like:
> > > > > *unknown field
> > > > > 

Re: fq pfloat_field:* returns no documents, tfloat:* does

2019-11-20 Thread Tomás Fernández Löbbe
Hi Webster,
> The fq  facet_melting_point:*
"Point" numeric fields don't support that syntax currently, and the way to
retrieve "docs with any value in field foo" is "foo:[* TO *]". See
https://issues.apache.org/jira/browse/SOLR-11746
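For example, with the field from this thread, the two forms would be (a sketch):

    fq=facet_melting_point:*          <- matches nothing on a Point field
    fq=facet_melting_point:[* TO *]   <- matches every doc with a value in the field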


On Wed, Nov 20, 2019 at 2:21 PM Webster Homer <
webster.ho...@milliporesigma.com> wrote:

> The fq facet_melting_point:*
> returns 0 rows. However, the field clearly has data in it, so why doesn't
> this query return the rows where there is data?
>
> I am trying to update our solr schemas to use the point fields instead of
> the trie fields.
>
> We have a number of pfloat fields. These fields are indexed and I can
> facet on them
>
> This is a typical definition
> <field name="facet_melting_point" type="pfloat" indexed="true" stored="true" required="false" multiValued="true" docValues="true"/>
>
> Another odd behavior is that when I use the Schema Browser the "Load Term
> Info" loads no data.
>
> I am using Solr 7.2
>


fq pfloat_field:* returns no documents, tfloat:* does

2019-11-20 Thread Webster Homer
The fq facet_melting_point:*
returns 0 rows. However, the field clearly has data in it, so why doesn't this
query return the rows where there is data?

I am trying to update our solr schemas to use the point fields instead of the 
trie fields.

We have a number of pfloat fields. These fields are indexed and I can facet on 
them

This is a typical definition:
<field name="facet_melting_point" type="pfloat" indexed="true" stored="true" required="false" multiValued="true" docValues="true"/>

Another odd behavior is that when I use the Schema Browser the "Load Term Info" 
loads no data.

I am using Solr 7.2
This message and any attachment are confidential and may be privileged or 
otherwise protected from disclosure. If you are not the intended recipient, you 
must not copy this message or attachment or disclose the contents to any other 
person. If you have received this transmission in error, please notify the 
sender immediately and delete the message and any attachment from your system. 
Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not accept 
liability for any omissions or errors in this message which may arise as a 
result of E-Mail-transmission or for damages resulting from any unauthorized 
changes of the content of this message and any attachment thereto. Merck KGaA, 
Darmstadt, Germany and any of its subsidiaries do not guarantee that this 
message is free of viruses and does not accept liability for any damages caused 
by any virus transmitted therewith. Click http://www.merckgroup.com/disclaimer 
to access the German, French, Spanish and Portuguese versions of this 
disclaimer.


Re: How to change config set for some collection

2019-11-20 Thread Pratik Patel
Thanks Shawn! This is what I needed.

On Wed, Nov 20, 2019 at 3:59 PM Shawn Heisey  wrote:

> On 11/20/2019 1:34 PM, Pratik Patel wrote:
> > Let's say I have a collection called "collection1" which uses config set
> > "config_set_1".
> > Now, using "upconfig" command, I upload a new configuration called
> > "config_set_2". How can I make "collection1" use "config_set_2" instead
> of
> > "config_set_1"?
> >
> > I know that if I upload new configuration with the same name
> "config_set_1"
> > and reload the collection then it will have new configuration but I want
> to
> > keep the old config set, add a new one and make changes so that
> collection1
> > starts using new config set.
> >
> > Is it possible?
>
> There is an action, available in the zkcli script and possibly
> elsewhere, called "linkconfig".
>
> It looks like the config can also be changed with the collections API,
> using the MODIFYCOLLECTION action.
>
>
> https://lucene.apache.org/solr/guide/8_2/collection-management.html#modifycollection
>
> To make the change effective after linking to a new config, you'll need
> to reload the collection.
>
> Thanks,
> Shawn
>


Re: Possible data corruption in JavaBinCodec in Solr 8.3 during distributed update?

2019-11-20 Thread Colvin Cowie
Hi, I'll share it when I'm back at work tomorrow.

I've found that the issue appears to not be reproducible using the
solrconfig.xml in the _default example, so it must be something in ours (or
something missing from it) that is making the problem appear, in
combination with the code change.

Thanks


On Wednesday, 20 November 2019, Noble Paul  wrote:

> Can you share the test please
>
> On Thu, Nov 21, 2019 at 7:02 AM Noble Paul  wrote:
> >
> > Thanks Colvin, I'll take a look
> >
> > On Thu, Nov 21, 2019 at 4:24 AM Colvin Cowie 
> wrote:
> > >
> > > I've identified the change which has caused the problem to
> > > materialize, but it shouldn't itself cause a problem.
> > >
> > > https://github.com/apache/lucene-solr/commit/e45e8127d5c17af4e4b87a0a4eaf0afaf4f9ff4b#diff-7f7f485122d8257bd5d3210c092b967fR52
> > > for https://issues.apache.org/jira/browse/SOLR-13682
> > >
> > > In writeMap, the new BiConsumer unwraps the SolrInputField using
> > > getValue rather than getRawValue (which the JavaBinCodec calls):
> > >
> > >   if (o instanceof SolrInputField) {
> > >     o = ((SolrInputField) o).getValue();
> > >   }
> > >
> > > As a result the JavaBinCodec will now be hitting different writer
> > > methods based on the value retrieved from the SolrInputField, rather
> > > than just writing the
> > > org.apache.solr.common.util.JavaBinCodec.writeKnownType(Object)
> > >
> > >   if (val instanceof SolrInputField) {
> > >     return writeKnownType(((SolrInputField) val).getRawValue());
> > >   }
> > > https://github.com/apache/lucene-solr/blob/branch_8_3/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L362
> > >
> > > SolrInputField getValue uses
> > > org.apache.solr.common.util.ByteArrayUtf8CharSequence.convertCharSeq(Object)
> > > while getRawValue just returns whatever value the SolrInputField has,
> > > so the EntryWriter in the JavaBinCodec hits different paths from the
> > > ones which must non-deterministically produce garbage data when
> > > getValue() is used.
> > >
> > > Changing getValue() to getRawValue() in the SolrInputDocument's
> > > writeMap() appears to "fix" the problem. (With getValue() the test I
> > > have reliably fails within 50 iterations of indexing 2500 documents,
> > > with getRawValue() it succeeds for the 500 iterations I'm running it
> > > for)
> > >
> > > I'll see about providing a test that can be shared that demonstrates
> > > the problem, and see if we can find what is going wrong in the codec...
> > >
> > >
> > > On Tue, 19 Nov 2019 at 13:48, Colvin Cowie  >
> > > wrote:
> > >
> > > > Hello
> > > >
> > > > Apologies for the lack of actual detail in this, we're still digging
> > > > into it ourselves. I will provide more detail, and maybe some logs,
> > > > once I have a better idea of what is actually happening.
> > > > But I thought I might as well ask if anyone knows of changes that
> > > > were made in the Solr 8.3 release that are likely to have caused an
> > > > issue like this?
> > > >
> > > > We were on Solr 8.1.1 for several months and moved to 8.2.0 for
> > > > about 2 weeks before moving to 8.3.0 last week.
> > > > We didn't see this issue at all on the previous releases. Since
> > > > moving to 8.3 we have had a consistent (but non-deterministic) set of
> > > > failing tests, on Windows and Linux.
> > > >
> > > > The issue we are seeing is that during updates, the data we have
> > > > sent is *sometimes* corrupted, as though a buffer has been used
> > > > incorrectly. For example, if the well-formed data sent was
> > > >   'fieldName':"this is a long string"
> > > > the error we see from Solr might be that
> > > >   unknown field 'fieldNamis a long string"
> > > >
> > > > And variations of that kind of behaviour, where part of the data is
> > > > missing or corrupted. The data we are indexing does include fields
> > > > which store (escaped) serialized JSON strings - if that might have
> > > > any bearing - but the error isn't always on those fields.
> > > > For example, given a valid document that looks like this (I've
> > > > replaced the values by hand, so if the json is messed up here,
> > > > that's not relevant:) when returned with the json response writer:
> > > >
> > > >
> > > >
> > > >
> > > > {"id": "abcd","testField": "blah","jsonField":
> > > > "{\"thing\":{\"abcd\":\"value\",\"xyz\":[\"abc\",\"def\",\"ghi\"],\"nnn\":\"xyz\"},\"stuff\":[{\"qqq\":\"rrr\"}],\"ttt\":0,\"mmm\":\"Some
> > > > string\",\"someBool\":true}"}
> > > > We've had errors during indexing like:
> > > >   unknown field
> > > >   'testField:"value","xyz":["abc","def","ghi"],"nnn":"xyz"},"stuff":[{"qqq":"rrr"}],"ttt":0,"mmm":"Some
> > > >   string","someBool":true}���'
> > > > (those � unprintable characters are part of it)
> > > >
> > > > So far we've not been able to reproduce the problem on a collection
> > > > with a single shard, so it does seem like the problem 

Re: How to change config set for some collection

2019-11-20 Thread Shawn Heisey

On 11/20/2019 1:34 PM, Pratik Patel wrote:

Let's say I have a collection called "collection1" which uses config set
"config_set_1".
Now, using "upconfig" command, I upload a new configuration called
"config_set_2". How can I make "collection1" use "config_set_2" instead of
"config_set_1"?

I know that if I upload new configuration with the same name "config_set_1"
and reload the collection then it will have new configuration but I want to
keep the old config set, add a new one and make changes so that collection1
starts using new config set.

Is it possible?


There is an action, available in the zkcli script and possibly 
elsewhere, called "linkconfig".


It looks like the config can also be changed with the collections API, 
using the MODIFYCOLLECTION action.


https://lucene.apache.org/solr/guide/8_2/collection-management.html#modifycollection

To make the change effective after linking to a new config, you'll need 
to reload the collection.
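For example, something like this should do it (an untested sketch, assuming a
ZooKeeper at localhost:2181 and the names from your mail):

    # link the collection to the new configset, then reload it
    server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 \
      -cmd linkconfig -collection collection1 -confname config_set_2
    curl 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=collection1'

or, via the collections API route:

    curl 'http://localhost:8983/solr/admin/collections?action=MODIFYCOLLECTION&collection=collection1&collection.configName=config_set_2'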


Thanks,
Shawn


How to change config set for some collection

2019-11-20 Thread Pratik Patel
Hello Everyone,

Let's say I have a collection called "collection1" which uses config set
"config_set_1".
Now, using "upconfig" command, I upload a new configuration called
"config_set_2". How can I make "collection1" use "config_set_2" instead of
"config_set_1"?

I know that if I upload new configuration with the same name "config_set_1"
and reload the collection then it will have new configuration but I want to
keep the old config set, add a new one and make changes so that collection1
starts using new config set.

Is it possible?


Thanks and Regards
Pratik


Re: CloudSolrClient - basic auth - multi shard collection

2019-11-20 Thread Nicolas Paris
>  you can fix the issue by upgrading to 8.2 - both of those

Thanks, I will try ASAP

On Wed, Nov 20, 2019 at 01:58:31PM -0500, Jason Gerlowski wrote:
> Hi Nicholas,
> 
> I'm not really familiar with spring-data-solr, so I can't speak to
> that detail, but it sounds like you might be running into either
> https://issues.apache.org/jira/browse/SOLR-13510 or
> https://issues.apache.org/jira/browse/SOLR-13472.  There are partial
> workarounds on those issues that might help you.  If those aren't
> sufficient, you can fix the issue by upgrading to 8.2 - both of those
> bugs are fixed in that version.
> 
> Hope that helps,
> 
> Jason
> 
> 
> On Mon, Nov 18, 2019 at 8:26 AM Nicolas Paris  
> wrote:
> >
> > Hello,
> >
> > I am having trouble with basic auth on a SolrCloud instance. When the
> > collection has only one shard, there is no problem. When the collection
> > has multiple shards, there is no problem until I issue multiple queries
> > concurrently: I get a 401 error asking for credentials on concurrent
> > queries.
> >
> > I have created a Preemptive Auth Interceptor which should add the
> > credential information for every HTTP call.
> >
> > Thanks for any pointer,
> >
> > solr:8.1
> > spring-data-solr:4.1.0
> > --
> > nicolas
> 

-- 
nicolas
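For anyone following this thread, one way to attach the credentials to each
request in SolrJ (a minimal sketch with hypothetical zkHost, collection and
credentials; setBasicAuthCredentials applies to the individual request rather
than relying on an HttpClient interceptor):

    import java.util.Collections;
    import java.util.Optional;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.common.params.ModifiableSolrParams;

    public class BasicAuthQuery {
      public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient.Builder(
            Collections.singletonList("zk1:2181"), Optional.empty()).build()) {
          ModifiableSolrParams params = new ModifiableSolrParams();
          params.set("q", "*:*");
          QueryRequest req = new QueryRequest(params);
          // Credentials travel with the request itself, so every concurrent
          // call carries the Authorization header.
          req.setBasicAuthCredentials("solr", "SolrRocks");
          System.out.println(
              req.process(client, "collection1").getResults().getNumFound());
        }
      }
    }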


Re: Possible data corruption in JavaBinCodec in Solr 8.3 during distributed update?

2019-11-20 Thread Noble Paul
Can you share the test please

On Thu, Nov 21, 2019 at 7:02 AM Noble Paul  wrote:
>
> Thanks Colvin, I'll take a look
>
> On Thu, Nov 21, 2019 at 4:24 AM Colvin Cowie  
> wrote:
> >
> > I've identified the change which has caused the problem to materialize, but
> > it shouldn't itself cause a problem.
> >
> > https://github.com/apache/lucene-solr/commit/e45e8127d5c17af4e4b87a0a4eaf0afaf4f9ff4b#diff-7f7f485122d8257bd5d3210c092b967fR52
> > for https://issues.apache.org/jira/browse/SOLR-13682
> >
> > In writeMap, the new BiConsumer unwraps the SolrInputField using getValue
> > rather than getRawValue (which the JavaBinCodec calls):
> >
> >
> >   if (o instanceof SolrInputField) {
> >     o = ((SolrInputField) o).getValue();
> >   }
> >
> > As a result the JavaBinCodec will now be hitting different writer methods
> > based on the value retrieved from the SolrInputField, rather than just
> > writing the org.apache.solr.common.util.JavaBinCodec.writeKnownType(Object)
> >
> >   if (val instanceof SolrInputField) {
> >     return writeKnownType(((SolrInputField) val).getRawValue());
> >   }
> > https://github.com/apache/lucene-solr/blob/branch_8_3/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L362
> >
> > SolrInputField getValue uses
> > org.apache.solr.common.util.ByteArrayUtf8CharSequence.convertCharSeq(Object)
> > while getRawValue just returns whatever value the SolrInputField has, so
> > the EntryWriter in the JavaBinCodec hits different paths from the ones
> > which must non-deterministically produce garbage data when getValue() is
> > used.
> >
> > Changing getValue() to getRawValue() in the SolrInputDocument's
> > writeMap() appears to "fix" the problem. (With getValue() the test I have
> > reliably fails within 50 iterations of indexing 2500 documents, with
> > getRawValue() it succeeds for the 500 iterations I'm running it for)
> >
> > I'll see about providing a test that can be shared that demonstrates the
> > problem, and see if we can find what is going wrong in the codec...
> >
> >
> > On Tue, 19 Nov 2019 at 13:48, Colvin Cowie 
> > wrote:
> >
> > > Hello
> > >
> > > Apologies for the lack of actual detail in this, we're still digging into
> > > it ourselves. I will provide more detail, and maybe some logs, once I have
> > > a better idea of what is actually happening.
> > > But I thought I might as well ask if anyone knows of changes that were
> > > made in the Solr 8.3 release that are likely to have caused an issue like
> > > this?
> > >
> > > We were on Solr 8.1.1 for several months and moved to 8.2.0 for about 2
> > > weeks before moving to 8.3.0 last week.
> > > We didn't see this issue at all on the previous releases. Since moving to
> > > 8.3 we have had a consistent (but non-deterministic) set of failing tests,
> > > on Windows and Linux.
> > >
> > > The issue we are seeing is that during updates, the data we have sent is
> > > *sometimes* corrupted, as though a buffer has been used incorrectly. For
> > > example, if the well-formed data sent was
> > >   'fieldName':"this is a long string"
> > > the error we see from Solr might be that
> > >   unknown field 'fieldNamis a long string"
> > >
> > > And variations of that kind of behaviour, where part of the data is missing
> > > or corrupted. The data we are indexing does include fields which store
> > > (escaped) serialized JSON strings - if that might have any bearing - but
> > > the error isn't always on those fields.
> > > For example, given a valid document that looks like this (I've replaced
> > > the values by hand, so if the json is messed up here, that's not 
> > > relevant:)
> > > when returned with the json response writer:
> > >
> > >
> > >
> > >
> > > *{"id": "abcd","testField": "blah","jsonField":
> > > "{\"thing\":{\"abcd\":\"value\",\"xyz\":[\"abc\",\"def\",\"ghi\"],\"nnn\":\"xyz\"},\"stuff\":[{\"qqq\":\"rrr\"}],\"ttt\":0,\"mmm\":\"Some
> > > string\",\"someBool\":true}"}*
> > > We've had errors during indexing like:
> > > *unknown field
> > > 'testField:"value","xyz":["abc","def","ghi"],"nnn":"xyz"},"stuff":[{"qqq":"rrr"}],"ttt":0,"mmm":"Some
> > > string","someBool":true}���'*
> > > (those � unprintable characters are part of it)
> > >
> > > So far we've not been able to reproduce the problem on a collection with a
> > > single shard, so it does seem like the problem is only happening 
> > > internally
> > > when updates are distributed to the other shards... But that's not been
> > > totally verified.
> > >
> > > We've also only encountered the problem on one of the collections we build
> > > (the data within each collection is generally the same though. The ids are
> > > slightly different - but still strings. The main difference is that this
> > > problematic index is built using an Iterator to *solrj
> > > org.apache.solr.client.solrj.SolrClient.add(String,
> > > Iterator)* - the *SolrInputDocument*s are not being
> > > reused in the client, I checked 

Re: Possible data corruption in JavaBinCodec in Solr 8.3 during distributed update?

2019-11-20 Thread Noble Paul
Thanks Colvin, I'll take a look

On Thu, Nov 21, 2019 at 4:24 AM Colvin Cowie  wrote:
>
> I've identified the change which has caused the problem to materialize, but
> it shouldn't itself cause a problem.
>
> https://github.com/apache/lucene-solr/commit/e45e8127d5c17af4e4b87a0a4eaf0afaf4f9ff4b#diff-7f7f485122d8257bd5d3210c092b967fR52
> for https://issues.apache.org/jira/browse/SOLR-13682
>
> In writeMap, the new BiConsumer unwraps the SolrInputField using getValue
> rather than getRawValue (which the JavaBinCodec calls):
>
>
>   if (o instanceof SolrInputField) {
>     o = ((SolrInputField) o).getValue();
>   }
>
> As a result the JavaBinCodec will now be hitting different writer methods
> based on the value retrieved from the SolrInputField, rather than just
> writing the org.apache.solr.common.util.JavaBinCodec.writeKnownType(Object)
>
>   if (val instanceof SolrInputField) {
>     return writeKnownType(((SolrInputField) val).getRawValue());
>   }
> https://github.com/apache/lucene-solr/blob/branch_8_3/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L362
>
> SolrInputField getValue uses
> org.apache.solr.common.util.ByteArrayUtf8CharSequence.convertCharSeq(Object)
> while getRawValue just returns whatever value the SolrInputField has, so
> the EntryWriter in the JavaBinCodec hits different paths from the ones
> which must non-deterministically produce garbage data when getValue() is
> used.
>
> Changing getValue() to getRawValue() in the SolrInputDocument's
> writeMap() appears to "fix" the problem. (With getValue() the test I have
> reliably fails within 50 iterations of indexing 2500 documents, with
> getRawValue() it succeeds for the 500 iterations I'm running it for)
>
> I'll see about providing a test that can be shared that demonstrates the
> problem, and see if we can find what is going wrong in the codec...
>
>
> On Tue, 19 Nov 2019 at 13:48, Colvin Cowie 
> wrote:
>
> > Hello
> >
> > Apologies for the lack of actual detail in this, we're still digging into
> > it ourselves. I will provide more detail, and maybe some logs, once I have
> > a better idea of what is actually happening.
> > But I thought I might as well ask if anyone knows of changes that were
> > made in the Solr 8.3 release that are likely to have caused an issue like
> > this?
> >
> > We were on Solr 8.1.1 for several months and moved to 8.2.0 for about 2
> > weeks before moving to 8.3.0 last week.
> > We didn't see this issue at all on the previous releases. Since moving to
> > 8.3 we have had a consistent (but non-deterministic) set of failing tests,
> > on Windows and Linux.
> >
> > The issue we are seeing is that during updates, the data we have sent is
> > *sometimes* corrupted, as though a buffer has been used incorrectly. For
> > example, if the well-formed data sent was
> >   'fieldName':"this is a long string"
> > the error we see from Solr might be that
> >   unknown field 'fieldNamis a long string"
> >
> > And variations of that kind of behaviour, where part of the data is missing
> > or corrupted. The data we are indexing does include fields which store
> > (escaped) serialized JSON strings - if that might have any bearing - but
> > the error isn't always on those fields.
> > For example, given a valid document that looks like this (I've replaced
> > the values by hand, so if the json is messed up here, that's not relevant:)
> > when returned with the json response writer:
> >
> >
> >
> >
> > *{"id": "abcd","testField": "blah","jsonField":
> > "{\"thing\":{\"abcd\":\"value\",\"xyz\":[\"abc\",\"def\",\"ghi\"],\"nnn\":\"xyz\"},\"stuff\":[{\"qqq\":\"rrr\"}],\"ttt\":0,\"mmm\":\"Some
> > string\",\"someBool\":true}"}*
> > We've had errors during indexing like:
> > *unknown field
> > 'testField:"value","xyz":["abc","def","ghi"],"nnn":"xyz"},"stuff":[{"qqq":"rrr"}],"ttt":0,"mmm":"Some
> > string","someBool":true}���'*
> > (those � unprintable characters are part of it)
> >
> > So far we've not been able to reproduce the problem on a collection with a
> > single shard, so it does seem like the problem is only happening internally
> > when updates are distributed to the other shards... But that's not been
> > totally verified.
> >
> > We've also only encountered the problem on one of the collections we build
> > (the data within each collection is generally the same though. The ids are
> > slightly different - but still strings. The main difference is that this
> > problematic index is built using an Iterator to *solrj
> > org.apache.solr.client.solrj.SolrClient.add(String,
> > Iterator)* - the *SolrInputDocument*s are not being
> > reused in the client, I checked that -, while the other index is built by
> > streaming CSVs to Solr.)
> >
> >
> > We will look into it further, but if anyone has any ideas of what might
> > have changed in 8.3 from 8.1 / 8.2 that could cause this, that would be
> > helpful.
> >
> > Cheers
> > Colvin
> >
> >




Re: Active directory integration in Solr

2019-11-20 Thread Kevin Risden
So I wrote the blog above more as an experiment. I don't know if it is
fully operational other than on a single node. That being said, the Hadoop
authentication plugin doesn't require running on HDFS. It just uses the
Hadoop code to do authentication.

I will echo what Jorn said though - I wouldn't expose Solr to the internet
or directly without some sort of API. Whether you do
authentication/authorization at the API is a separate question.

Kevin Risden


On Wed, Nov 20, 2019 at 1:54 PM Jörn Franke  wrote:

> I would not give users directly access to Solr - even with LDAP plugin.
> Build a rest interface or web interface that does the authentication and
> authorization and security sanitization. Then you can also manage better
> excessive queries or explicitly forbid certain type of queries (eg specific
> streaming expressions - I would not expose all of them to users).
>
> > Am 19.11.2019 um 11:02 schrieb Kommu, Vinodh K. :
> >
> > Thanks Charlie.
> >
> > We are already using Basic authentication in our existing clusters;
> however, it's getting difficult to maintain the number of users, as we are
> getting too many requests for read-only access from support teams. So we
> are desperately looking for an Active Directory solution. Just wondering
> if someone might have the same requirement.
> >
> >
> > Regards,
> > Vinodh
> >
> > -Original Message-
> > From: Charlie Hull 
> > Sent: Tuesday, November 19, 2019 2:55 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Active directory integration in Solr
> >
> >
> > Not out of the box, there are a few authentication plugins bundled but
> not for AD
> >
> https://lucene.apache.org/solr/guide/7_2/authentication-and-authorization-plugins.html
> > - there's also some useful stuff in Apache ManifoldCF
> >
> https://www.francelabs.com/blog/tutorial-on-authorizations-for-manifold-cf-and-solr/
> >
> >
> > Best
> >
> > Charlie
> >
> >> On 18/11/2019 15:08, Kommu, Vinodh K. wrote:
> >> Hi,
> >>
> >> Does anyone know whether Solr has any out-of-the-box capability to
> integrate Active Directory (using LDAP) when security is enabled? Instead
> of creating users in the security.json file, the plan is to use users who
> already exist in Active Directory so they can use their individual
> credentials rather than defining them in Solr. Did anyone come across a
> similar requirement? If so, was there any working solution?
> >>
> >>
> >> Thanks,
> >> Vinodh
> >>
> >>
> >
> > --
> > Charlie Hull
> > Flax - Open Source Enterprise Search
> >
> > tel/fax: +44 (0)8700 118334
> > mobile:  +44 (0)7767 825828
> > web:
> www.flax.co.uk
> >
> >
>


Re: Zk upconfig command is appending local directory to default confdir

2019-11-20 Thread Michael Becker
I was able to resolve this. I had left out the trailing "/" from the filepath, 
so Solr wasn't seeing it as a local directory.
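For anyone hitting the same thing, the difference is just the trailing slash
(a sketch with a hypothetical path and configset name):

    bin/solr zk upconfig -z localhost:2181 -n myconfig -d /opt/configs/myconfig    # failed as above
    bin/solr zk upconfig -z localhost:2181 -n myconfig -d /opt/configs/myconfig/   # treated as a local directory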

On 11/19/19, 2:45 AM, "Dominique Bejean"  wrote:

Hi Michael,

It seems Solr really can't find any solrconfig.xml file or a
conf/solrconfig.xml file in the local path you specified. The last try is
to look in "/opt/solr-6.5.1/server/solr/configsets/<configset name>/", but
obviously that doesn't work as you didn't specify a configset name.

The code is here -

https://github.com/apache/lucene-solr/blob/8cde1277ec7151bd6ab62950ac93cbdd6ff04d9f/solr/solrj/src/java/org/apache/solr/common/cloud/ZkConfigManager.java#L181
 


Any error in read access rights to your config directory ?

Regards

Dominique



Le lun. 18 nov. 2019 à 15:48, Michael Becker  a écrit :

> I’ve run into an issue when attempting to configure Zookeeper. When
> running the zk upconfig -d command specifying a local directory where the
> solrconfig.xml files are located, I get the following error:
> “Could not complete upconfig operation for reason: Could not find
> solrconfig.xml at /opt/solr-6.5.1/server/solr/configsets/solrconfig.xml,
> /opt/solr-6.5.1/server/solr/configsets/conf/solrconfig.xml or
> /opt/solr-6.5.1/server/solr/configsets/ 
> /solrconfig.xml”
>
> I’m trying to determine why the solr zk upconfig command is appending my
> local directory to the default confdir, rather than looking for the XML
> files in that directory,
> I have two other environments with Solr where this does not occur. It’s
> just this one environment that is having this issue.
> I am using Solr version 6.5.1.
> Any suggestions on how to troubleshoot this would be appreciated.
>
> Mike
>




Re: CloudSolrClient - basic auth - multi shard collection

2019-11-20 Thread Jason Gerlowski
Hi Nicolas,

I'm not really familiar with spring-data-solr, so I can't speak to
that detail, but it sounds like you might be running into either
https://issues.apache.org/jira/browse/SOLR-13510 or
https://issues.apache.org/jira/browse/SOLR-13472.  There are partial
workarounds on those issues that might help you.  If those aren't
sufficient, you can fix the issue by upgrading to 8.2 - both of those
bugs are fixed in that version.

Hope that helps,

Jason


On Mon, Nov 18, 2019 at 8:26 AM Nicolas Paris  wrote:
>
> Hello,
>
> I am having trouble with basic auth on a SolrCloud instance. When the
> collection has only one shard, there is no problem. When the collection
> has multiple shards, there is no problem until I issue multiple queries
> concurrently: I get a 401 error asking for credentials on concurrent
> queries.
>
> I have created a Preemptive Auth Interceptor which should add the
> credential information for every HTTP call.
>
> Thanks for any pointer,
>
> solr:8.1
> spring-data-solr:4.1.0
> --
> nicolas


Re: Active directory integration in Solr

2019-11-20 Thread Jörn Franke
I would not give users direct access to Solr - even with an LDAP plugin. Build a
REST interface or web interface that does the authentication and authorization
and security sanitization. Then you can also better manage excessive queries or
explicitly forbid certain types of queries (e.g. specific streaming expressions -
I would not expose all of them to users).

> Am 19.11.2019 um 11:02 schrieb Kommu, Vinodh K. :
> 
> Thanks Charlie.
> 
> We are already using Basic authentication in our existing clusters; however,
> it's getting difficult to maintain the number of users, as we are getting too
> many requests for read-only access from support teams. So we are desperately
> looking for an Active Directory solution. Just wondering if someone might
> have the same requirement.
> 
> 
> Regards,
> Vinodh 
> 
> -Original Message-
> From: Charlie Hull  
> Sent: Tuesday, November 19, 2019 2:55 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Active directory integration in Solr
> 
> 
> Not out of the box, there are a few authentication plugins bundled but not 
> for AD
> https://lucene.apache.org/solr/guide/7_2/authentication-and-authorization-plugins.html
> - there's also some useful stuff in Apache ManifoldCF
> https://www.francelabs.com/blog/tutorial-on-authorizations-for-manifold-cf-and-solr/
> 
> 
> Best
> 
> Charlie
> 
>> On 18/11/2019 15:08, Kommu, Vinodh K. wrote:
>> Hi,
>> 
>> Does anyone know whether Solr has any out-of-the-box capability to integrate
>> Active Directory (using LDAP) when security is enabled? Instead of creating
>> users in the security.json file, the plan is to use users who already exist
>> in Active Directory so they can use their individual credentials rather than
>> defining them in Solr. Did anyone come across a similar requirement? If so,
>> was there any working solution?
>> 
>> 
>> Thanks,
>> Vinodh
>> 
>> 
> 
> --
> Charlie Hull
> Flax - Open Source Enterprise Search
> 
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: 
> www.flax.co.uk
> 
> 


Re: How do I add my own Streaming Expressions?

2019-11-20 Thread Joel Bernstein
Yeah, this is not documented. Here are two links that will be helpful:

https://issues.apache.org/jira/browse/SOLR-9103

Slide 40 Shows the solrconfig.xml approach to registering new streams:
https://www.slideshare.net/lucidworks/creating-new-streams-presented-by-dennis-gove-bloomberg-lp
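For reference, the registration shape those sources describe looks like this in
solrconfig.xml (a sketch only - as discussed below, it doesn't appear to work
in StreamHandler on current releases):

    <requestHandler name="/stream" class="solr.StreamHandler">
      <lst name="streamFunctions">
        <str name="group">org.apache.solr.client.solrj.io.stream.ReducerStream</str>
        <str name="count">org.apache.solr.client.solrj.io.stream.RecordCountStream</str>
      </lst>
    </requestHandler>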



Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Nov 19, 2019 at 3:04 PM Eric Pugh 
wrote:

> The documentation in the StreamHandler suggests adding into Solrconfig
> some streamFunctions:
>
>  * <lst name="streamFunctions">
>  *   <str name="group">org.apache.solr.client.solrj.io.stream.ReducerStream</str>
>  *   <str name="count">org.apache.solr.client.solrj.io.stream.RecordCountStream</str>
>  * </lst>
>
>
>
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/StreamHandler.java#L114
>
> What is happening in StreamHandler doesn’t seem to be working, however in
> the similar GraphHandler, there is a call to “streamFunctions”:
>
>
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/GraphHandler.java#L90
>
> I’m still debugging this…
>
> Eric
>
>
>
> > On Nov 15, 2019, at 9:43 PM, Eric Pugh 
> wrote:
> >
> > What is the process for adding new Streaming Expressions?
> >
> > It appears that the org.apache.solr.client.solrj.io.Lang method
> statically loads all the streaming expressions?
> >
> > Eric
> >
> > ___
> > Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com <
> http://www.opensourceconnections.com/> | My Free/Busy <
> http://tinyurl.com/eric-cal>
> > Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <
> https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
>
> > This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless of
> whether attachments are marked as such.
> >
>
> ___
> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com <
> http://www.opensourceconnections.com/> | My Free/Busy <
> http://tinyurl.com/eric-cal>
> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <
> https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
>
> This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless of
> whether attachments are marked as such.
>
>


Re: Possible data corruption in JavaBinCodec in Solr 8.3 during distributed update?

2019-11-20 Thread Colvin Cowie
I've identified the change which has caused the problem to materialize, but
it shouldn't itself cause a problem.

https://github.com/apache/lucene-solr/commit/e45e8127d5c17af4e4b87a0a4eaf0afaf4f9ff4b#diff-7f7f485122d8257bd5d3210c092b967fR52
for https://issues.apache.org/jira/browse/SOLR-13682

In writeMap, the new BiConsumer unwraps the SolrInputField using getValue
rather than getRawValue (which the JavaBinCodec calls):


  if (o instanceof SolrInputField) {
    o = ((SolrInputField) o).getValue();
  }

As a result the JavaBinCodec will now be hitting different writer methods
based on the value retrieved from the SolrInputField, rather than just
writing via org.apache.solr.common.util.JavaBinCodec.writeKnownType(Object):

  if (val instanceof SolrInputField) {
    return writeKnownType(((SolrInputField) val).getRawValue());
  }
https://github.com/apache/lucene-solr/blob/branch_8_3/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L362

SolrInputField.getValue() uses
org.apache.solr.common.util.ByteArrayUtf8CharSequence.convertCharSeq(Object),
while getRawValue() just returns whatever value the SolrInputField has, so
with getValue() the EntryWriter in the JavaBinCodec hits different paths
from the usual ones - paths which must non-deterministically produce garbage
data.

Changing getValue() to getRawValue() in the SolrInputDocument's
writeMap() appears to "fix" the problem. (With getValue() the test I have
reliably fails within 50 iterations of indexing 2500 documents, with
getRawValue() it succeeds for the 500 iterations I'm running it for)
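
To make the difference concrete, a minimal sketch of the two accessors (the
field name and value are made up; assumes solr-solrj on the classpath):

    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.common.SolrInputField;

    public class UnwrapSketch {
      public static void main(String[] args) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("jsonField", "{\"thing\":\"value\"}");
        SolrInputField field = doc.getField("jsonField");
        // getRawValue() hands back the stored object untouched, which is what
        // JavaBinCodec.writeKnownType(Object) receives on the old path...
        Object raw = field.getRawValue();
        // ...while getValue() applies ByteArrayUtf8CharSequence conversion,
        // so the codec's EntryWriter can take a different write path.
        Object converted = field.getValue();
        System.out.println(raw.getClass() + " / " + converted.getClass());
      }
    }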

I'll see about providing a test that can be shared that demonstrates the
problem, and see if we can find what is going wrong in the codec...


On Tue, 19 Nov 2019 at 13:48, Colvin Cowie 
wrote:

> Hello
>
> Apologies for the lack of actual detail in this, we're still digging into
> it ourselves. I will provide more detail, and maybe some logs, once I have
> a better idea of what is actually happening.
> But I thought I might as well ask if anyone knows of changes that were
> made in the Solr 8.3 release that are likely to have caused an issue like
> this?
>
> We were on Solr 8.1.1 for several months and moved to 8.2.0 for about 2
> weeks before moving to 8.3.0 last week.
> We didn't see this issue at all on the previous releases. Since moving to
> 8.3 we have had a consistent (but non-deterministic) set of failing tests,
> on Windows and Linux.
>
> The issue we are seeing is that during updates, the data we have sent is
> *sometimes* corrupted, as though a buffer has been used incorrectly. For
> example, if the well-formed data sent was
>   'fieldName':"this is a long string"
> the error we see from Solr might be that
>   unknown field 'fieldNamis a long string"
>
> And variations of that kind of behaviour, where part of the data is missing
> or corrupted. The data we are indexing does include fields which store
> (escaped) serialized JSON strings - if that might have any bearing - but
> the error isn't always on those fields.
> For example, given a valid document that looks like this (I've replaced
> the values by hand, so if the json is messed up here, that's not relevant:)
> when returned with the json response writer:
>
>
>
>
> {"id": "abcd","testField": "blah","jsonField":
> "{\"thing\":{\"abcd\":\"value\",\"xyz\":[\"abc\",\"def\",\"ghi\"],\"nnn\":\"xyz\"},\"stuff\":[{\"qqq\":\"rrr\"}],\"ttt\":0,\"mmm\":\"Some
> string\",\"someBool\":true}"}
> We've had errors during indexing like:
>   unknown field
>   'testField:"value","xyz":["abc","def","ghi"],"nnn":"xyz"},"stuff":[{"qqq":"rrr"}],"ttt":0,"mmm":"Some
>   string","someBool":true}���'
> (those � unprintable characters are part of it)
>
> So far we've not been able to reproduce the problem on a collection with a
> single shard, so it does seem like the problem is only happening internally
> when updates are distributed to the other shards... But that's not been
> totally verified.
>
> We've also only encountered the problem on one of the collections we build.
> (The data within each collection is generally the same, though; the ids are
> slightly different, but still strings. The main difference is that this
> problematic index is built using an Iterator passed to solrj's
> org.apache.solr.client.solrj.SolrClient.add(String, Iterator<SolrInputDocument>)
> - the SolrInputDocuments are not being reused in the client, I checked that -
> while the other index is built by streaming CSVs to Solr.)
>
>
> We will look into it further, but if anyone has any ideas of what might
> have changed in 8.3 from 8.1 / 8.2 that could cause this, that would be
> helpful.
>
> Cheers
> Colvin
>
>


RE: Active directory integration in Solr

2019-11-20 Thread Kommu, Vinodh K.
Thanks for responding, Wood.

Since we are running SolrCloud rather than on HDFS, perhaps there is no
viable solution for SolrCloud, I think.


Regards,
Vinodh

-Original Message-
From: Mark H. Wood  
Sent: Wednesday, November 20, 2019 9:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Active directory integration in Solr




Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-20 Thread Guilherme Viteri
Hi,

Alright, after trying and trying, I have managed to isolate the fields that
are causing the search to fail: all of the fields that use the id fieldType
are breaking up my search.

I changed the id StrField to a fieldType that runs the stopword filter
(roughly as sketched below).
And finally now it works. However, I am just scared this is not correct or
bad practice, as I am dealing with IDs and they shouldn't really be analyzed
at all.
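To make that concrete, the change has roughly this shape (an illustrative
sketch only; the tokenizer choice and the stopwords file name here are
placeholders, not my exact schema):

    <fieldType name="id" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      </analyzer>
    </fieldType>

i.e. the id type is no longer a plain solr.StrField, so stopwords are now
stripped consistently on these fields too, at the cost of the values being
tokenized.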

What is your opinion?

Thanks
Guilherme

> On 18 Nov 2019, at 15:42, Guilherme Viteri  wrote:
> 
> Hi,
> 
>> Have you tried reindexing the documents and compare the results? No issues
>> if you cannot do that - let's try something else. I was going through the
>> whole mail and your files. You had said:
> Yes, but since reindexing didn't change the results, I kept going as you suggested.
> 
>> As soon as I add dbId or stId (regardless the boost, 1.0 or 100.0), then I
>>> don't get anything (which make sense).
>> 
>> Why did you think that not getting anything when you add dbId made sense?
>> Asking because I may be missing something here.
> I am searching for a text and I was searching on an ID field, which wouldn't 
> make sense.
> (I will come back to this soon.)
> 
> Ok, I've been adding and removing fields in the qf and I could isolate half
> of the problem. First, I have a field type called keyword_field; I added the
> stopwords filter to it and it worked. Second, the search breaks again when I
> add the fields that use the id fieldType.
> 
> Do you think I should also add the stopwords filter for the fieldType id?
> (I tried, and it worked, but I am not sure if this is conceptually correct;
> an id should remain intact, from my understanding.)
> 
> Thanks
> Guilherme
> 
>> On 18 Nov 2019, at 05:37, Paras Lehana wrote:
>> 
>> Hi Guilherme,
>> 
>> Have you tried reindexing the documents and compare the results? No issues
>> if you cannot do that - let's try something else. I was going through the
>> whole mail and your files. You had said:
>> 
>> As soon as I add dbId or stId (regardless the boost, 1.0 or 100.0), then I
>>> don't get anything (which make sense).
>> 
>> 
>> Why did you think that not getting anything when you add dbId made sense?
>> Asking because I may be missing something here.
>> 
>> Also, what is the purpose of so many qf's? Going through your documents and
>> config files, I found that your dbIds are strings of numbers and I don't
>> think you want to find your query terms in dbId, right?
>> Do you want to boost the score by the values in dbId?
>> 
>> Your qf of dbId^100 boosts documents containing terms in q by 100x. Since
>> your terms don't match with the values in dbId for any document, the score
>> produced by this scoring is 0. 100x or 1x of 0 is still 0.
>> I still need to see how this scoring gets added up in edismax parser but do
>> reevaluate the usage of these qfs. Same goes for other qf boosts. :)
>> 
>> 
>> On Fri, 15 Nov 2019 at 12:23, Guilherme Viteri > > wrote:
>> 
>>> Hi Paras
>>> No worries.
>>> No I didn’t find anything. This is annoying now...
>>> Yes! They do contain dbId. Absolutely all my docs contains dbId and it is
>>> actually my key, if you check again the schema.xml
>>> 
>>> Cheers
>>> Guilherme
>>> 
>>> On 15 Nov 2019, at 05:37, Paras Lehana wrote:
>>> 
>>> 
>>> Hey Guilherme,
>>> 
>>> I was a bit busy for the past few days and couldn't read your mail. So,
>>> did you find anything? Anyways, as I had expected, the culprit is
>>> definitely among the qfs. Do the documents in concern contain dbId? I
>>> suggest you to cross check the fields in your document with those impacting
>>> the result in qf.
>>> 
>>> On Tue, 12 Nov 2019 at 16:14, Guilherme Viteri wrote:
>>> 
 What I can't understand is: if I search for the exact term
 "Immunoregulatory interactions between a Lymphoid *and a* non-Lymphoid
 cell" it doesn't work, but if I search "Immunoregulatory interactions
 between a Lymphoid *and* non-Lymphoid cell" (without the "a") then it
 works.
 
 On 11 Nov 2019, at 12:24, Guilherme Viteri wrote:
 
 Thanks
 
 Removing stopwords is another story. I'm curious to find the reason
 assuming that you keep on using stopwords. In some cases, stopwords are
 really necessary.
 
 Yes. It always makes sense the way we've been using it.
 
 If q.alt is giving you responses, it's confirmed that your stopwords
 filter
 is working as expected. The problem definitely lies in the configuration
 of
 edismax.
 
 I see.
 
 Let me explain again: In your solrconfig.xml, look at your /search
 
 Ok, using q now, removed all the qf, performed the search and I got 23
 results, with the one I really want at the top.
 As soon as I add dbId or stId (regardless of the boost, 1.0 or 100.0), then
 I don't get anything (which make sense).

Re: Possible synchronization bug in Solr reader

2019-11-20 Thread Bram Biesbrouck
Please allow me to answer my own question.
I was using the ThreadLocalCleaner class from the Apache Sling project,
which is a very useful (but dangerous) tool.
Bottom line: it doesn't like weak references in ThreadLocals, like the one in
Lucene's CloseableThreadLocal class.
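For anyone hitting the same combination: the pattern the cleaner trips over
looks roughly like this (a simplified sketch of the CloseableThreadLocal
idea, not Lucene's actual source):

    import java.io.Closeable;
    import java.lang.ref.WeakReference;
    import java.util.HashMap;
    import java.util.Map;

    // Sketch: the ThreadLocal only holds WeakReferences, while the hard
    // references live in a separate map so close() can drop them all at
    // once. An external cleaner that clears ThreadLocals behind its back
    // breaks this invariant, which matches the NPE I was seeing.
    public class CloseableThreadLocalSketch<T> implements Closeable {
      private final ThreadLocal<WeakReference<T>> local = new ThreadLocal<>();
      private final Map<Thread, T> hardRefs = new HashMap<>();

      public synchronized T get() {
        WeakReference<T> ref = local.get();
        return ref == null ? null : ref.get();
      }

      public synchronized void set(T value) {
        local.set(new WeakReference<>(value));
        hardRefs.put(Thread.currentThread(), value);
      }

      @Override
      public synchronized void close() {
        hardRefs.clear(); // once the hard refs are gone, GC may reclaim the values
        local.remove();
      }
    }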

b.


On Tue, Nov 19, 2019 at 4:34 PM Bram Biesbrouck <
bram.biesbro...@reinvention.be> wrote:

> Hi all,
>
> I think I might have discovered a synchronization bug when ingesting a lot
> of data into Solr, but want to check with the specialists first ;-)
>
> I'm using a little custom-written map/reduce framework that boots
> 20-something threads to do some heavy processing for data preparation. When
> this processing is done, the results of these threads are gathered in a
> reduce step, where they are ingested into an (embedded) Solr instance. To
> maximize throughput, I'm ingesting the data in parallel in a couple of
> threads of their own, and this is where I run into a synchronization error.
>
> As with all synchronization bugs, it happens "some" of the time and
> they're hard to debug, but I think I managed to get my finger on the root
> (I'm using Solr 8.3):
>
> the class org.apache.lucene.index.CodecReader throws an NPE on line 84:
>
>     getFieldsReader().visitDocument(docID, visitor);
>
> The issue is that the getFieldsReader() getter is mapped to a ThreadLocal
> (more explicitly,
> org.apache.lucene.index.SegmentCoreReaders.fieldsReaderLocal) that seems to
> be released (set to null) somewhere automatically, and read afterwards,
> without synchronizing the two.
>
> I don't think I should set any resource locks of my own, since I'm only
> using the SolrJ API and the /update endpoint.
>
> I know this is quite a low-level question, but could anyone point me in
> the right direction to further investigate this issue? Ie, what could be
> the reason the reader is released out-of-sync?
>
> best,
>
> b.
>


Re: Active directory integration in Solr

2019-11-20 Thread Mark H. Wood
On Mon, Nov 18, 2019 at 03:08:51PM +, Kommu, Vinodh K. wrote:
> Does anyone know whether Solr has any out-of-the-box capability to integrate
> Active Directory (using LDAP) when security is enabled? Instead of creating
> users in the security.json file, we are planning to use users who already
> exist in Active Directory so they can use their individual credentials
> rather than having them defined in Solr. Did anyone come across a similar
> requirement? If so, was there a working solution?

Searching for "solr authentication ldap" turned up this:

https://risdenk.github.io/2018/11/20/apache-solr-hadoop-authentication-plugin-ldap.html

ADS also uses Kerberos, and Solr has a Kerberos authN plugin.  Would
that help?
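If so, enabling it is mostly a security.json matter (the class name is from
the ref guide; the JAAS config and the solr.kerberos.* system properties
still have to be set up separately):

    {
      "authentication": {
        "class": "org.apache.solr.security.KerberosPlugin"
      }
    }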

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: Metrics avgRequestsPerSecond and avgRequestsPerSecond from documentation gone?

2019-11-20 Thread Andrzej Białecki
Hi,

Yes, the documentation needs to be fixed, these attributes have been removed or 
replaced.

* avgRequestsPerSecond -> requestTimes:meanRate. Please note that this is a 
non-decaying simple average based on the total wall-clock time elapsed from 
when the handler was started until NOW, and the total number of requests the 
handler processed in that time.

* avgTimePerRequest = totalTime / requests (in nanoseconds). Please note that 
the “totalTime” metric represents the aggregated elapsed time spent actually 
processing requests (i.e. not including the time when the handler was just 
sitting idle). Perhaps a better name for this metric would be 
“totalProcessingTime”.
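In code, deriving the old numbers is then just this (a sketch; you still
have to read the raw attribute values off the JMX beans yourself, and the
exact bean paths depend on your reporter configuration):

    // Sketch: rebuild the removed stats from the new metrics.
    class LegacyStatsSketch {
      // totalTimeNanos: the handler's "totalTime" metric, in nanoseconds;
      // requests: the handler's total request count.
      static double avgTimePerRequestMillis(long totalTimeNanos, long requests) {
        return requests == 0 ? 0.0 : (totalTimeNanos / (double) requests) / 1_000_000.0;
      }

      // MeanRate of the requestTimes timer, with the non-decaying caveat above.
      static double avgRequestsPerSecond(double requestTimesMeanRate) {
        return requestTimesMeanRate;
      }
    }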

> On 19 Nov 2019, at 17:35, Koen De Groote  wrote:
> 
> Greetings,
> 
> I'm using Solr 7.6 and have enabled JMX metrics.
> 
> I ran into this page:
> https://lucene.apache.org/solr/guide/7_6/performance-statistics-reference.html#commonly-used-stats-for-request-handlers
> 
> Which mentions "avgRequestsPerSecond" and "avgTimePerRequest" and some
> other attributes, which do not exist anymore in this version. I have an
> older version(4) I spun up to have a look and they do exist in that version.
> 
> When getting info on a QUERY or UPDATE bean with name `requestTimes`, I get
> this:
> 
> # attributes
>  %0   - 50thPercentile (double, r)
>  %1   - 75thPercentile (double, r)
>  %2   - 95thPercentile (double, r)
>  %3   - 98thPercentile (double, r)
>  %4   - 999thPercentile (double, r)
>  %5   - 99thPercentile (double, r)
>  %6   - Count (long, r)
>  %7   - DurationUnit (java.lang.String, r)
>  %8   - FifteenMinuteRate (double, r)
>  %9   - FiveMinuteRate (double, r)
>  %10  - Max (double, r)
>  %11  - Mean (double, r)
>  %12  - MeanRate (double, r)
>  %13  - Min (double, r)
>  %14  - OneMinuteRate (double, r)
>  %15  - RateUnit (java.lang.String, r)
>  %16  - StdDev (double, r)
>  %17  - _instanceTag (java.lang.String, r)
> # operations
>  %0   - javax.management.ObjectName objectName()
>  %1   - [J values()
> #there's no notifications
> 
> And it seems that none of the current values are actually a proper
> replacement for the functionality these values used to offer.
> 
> How shall I go about getting this info now? Do I need to combine several
> other metrics?
> 
> For completeness' sake, my solr.xml, where I enabled JMX, is just the
> default example from the documentation, with JMX added:
>
>     <solr>
>       <solrcloud>
>         <str name="host">${host:}</str>
>         <int name="hostPort">${jetty.port:8983}</int>
>         <str name="hostContext">${hostContext:solr}</str>
>         <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
>         <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
>       </solrcloud>
>       <shardHandlerFactory name="shardHandlerFactory"
>                            class="HttpShardHandlerFactory">
>         <int name="socketTimeout">${socketTimeout:0}</int>
>         <int name="connTimeout">${connTimeout:0}</int>
>       </shardHandlerFactory>
>       <metrics>
>         <hiddenSysProps>
>           <str>javax.net.ssl.keyStorePassword</str>
>           <str>javax.net.ssl.trustStorePassword</str>
>           <str>basicauth</str>
>           <str>zkDigestPassword</str>
>           <str>zkDigestReadonlyPassword</str>
>         </hiddenSysProps>
>         <reporter name="jmx"
>                   class="org.apache.solr.metrics.reporters.SolrJmxReporter">
>           <str name="rootName">very_obvious_name_for_easy_reading_${jetty.port:8983}</str>
>         </reporter>
>       </metrics>
>     </solr>
> 
> Kind regards,
> Koen De Groote



Re: Question about Luke

2019-11-20 Thread Tomoko Uchida
Hello,

> Is it different from the CheckIndex -exorcise option?
> (As far as I recently learned, CheckIndex -exorcise will delete unreadable
> segments.)

If you mean desktop app Luke, "Repair" is just a wrapper of
CheckIndex.exorciseIndex(). There is no difference between doing
"Repair" from Luke GUI and calling "CheckIndex -exorcise" from CLI.


On Mon, 11 Nov 2019 at 20:36, Kayak28 wrote:
>
> Hello, Community:
>
> I am using Solr 7.4.0 currently, and I was testing how Solr actually behaves
> when it has a corrupted index.
> I used Luke to fix the broken index from the GUI.
> I just came up with the following questions.
> Is it possible to use the repair index tool from the CLI? (in case Solr is
> running on AWS, for example)
> Is it different from the CheckIndex -exorcise option?
> (As far as I recently learned, CheckIndex -exorcise will delete unreadable
> segments.)
>
> If anyone gives me a reply, I would be very thankful.
>
> Sincerely,
> Kaya Ota


Re: Solr process takes several minutes before accepting commands after restart

2019-11-20 Thread Jörn Franke
Have you checked the log files of Solr?


Do you have a service mesh in between? Could it be something at the network
layer or container orchestration that is blocking requests for some minutes?

> Am 20.11.2019 um 10:32 schrieb Koen De Groote :
> 
> Hello
> 
> I was testing some backup/restore scenarios.
> 
> One of them is Solr 7.6 in a Docker container (7.6.0-slim), set up as
> SolrCloud, with ZooKeeper.
> 
> The steps are as follows:
> 
> 1. Manually delete the data folder.
> 2. Restart the container. The process is now in error mode, complaining
> that it cannot find the cores.
> 3. Fix the install, meaning create new data folders, which are empty at
> this point.
> 4. Restart the container again, to pick up the empty folders and not be in
> error anymore.
> 5. Perform the restore
> 6. Check if everything is available again
> 
> The problem is between steps 4 and 5. After step 4, it takes several minutes
> before Solr actually responds to curl commands.
> 
> Once responsive, the restore happened just fine. But it's very stressful in
> a situation where you have to restore a production environment and the
> process just doesn't respond for 5-10 minutes.
> 
> We're talking about 20GB of data here, so not very much, but not little
> either.
> 
> Is it normal that it takes so long before solr responds? If not, what
> should I look at in order to find the cause?
> 
> I have asked this before recently, though the wording was confusing. This
> should be clearer.
> 
> Kind regards,
> Koen De Groote


Solr process takes several minutes before accepting commands after restart

2019-11-20 Thread Koen De Groote
Hello

I was testing some backup/restore scenarios.

One of them is Solr 7.6 in a Docker container (7.6.0-slim), set up as
SolrCloud, with ZooKeeper.

The steps are as follows:

1. Manually delete the data folder.
2. Restart the container. The process is now in error mode, complaining
that it cannot find the cores.
3. Fix the install, meaning create new data folders, which are empty at
this point.
4. Restart the container again, to pick up the empty folders and not be in
error anymore.
5. Perform the restore
6. Check if everything is available again

The problem is between steps 4 and 5. After step 4, it takes several minutes
before Solr actually responds to curl commands.

Once responsive, the restore happened just fine. But it's very stressful in
a situation where you have to restore a production environment and the
process just doesn't respond for 5-10 minutes.
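A crude workaround is to poll a cheap endpoint before kicking off the
restore, roughly like this (host and port are examples):

    until curl -sf "http://localhost:8983/solr/admin/info/system" > /dev/null; do
      echo "Solr not responding yet, waiting..."
      sleep 5
    done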

We're talking about 20GB of data here, so not very much, but not little
either.

Is it normal that it takes so long before solr responds? If not, what
should I look at in order to find the cause?

I have asked this before recently, though the wording was confusing. This
should be clearer.

Kind regards,
Koen De Groote