Re: updating documents via csv

2019-12-17 Thread Paras Lehana
Oh lol. How could I miss that! This is actually true for any bash command.
Glad that it worked.

On Wed, 18 Dec, 2019, 00:29 rhys J,  wrote:

> On Mon, Dec 16, 2019 at 11:58 PM Paras Lehana 
> wrote:
>
> > Hi Rhys,
> >
> > I use CDATA for XMLs:
> >
> >
> >  
> >
> > There should be a similar solution for JSON though I couldn't find the
> > specific one on the internet. If you are okay to use XMLs for indexing,
> > you can use this.
> >
> >
> We are set on using json, but I figured out how to handle the single quote.
>
> If I wrap the curl payload in double quotes and use single quotes inside,
> I can escape the single quote in the field with no problem.
>
> Thanks for the help!
>
> Rhys
>


 


Synonym expansions w/ phrase slop exhausting memory after upgrading to SOLR 7

2019-12-17 Thread Nick D
Hello All,

We recently upgraded from Solr 6.6 to Solr 7.7.2 and have since had spikes in
memory that eventually caused either an OOM or almost 100% utilization of
the available memory. After trying a few things (increasing the JVM heap,
making sure docValues were set for all sort and facet fields, in case the
fieldCache was blowing up), I was able to isolate a single query that
would fully exhaust the available memory and effectively render the
instance dead. After applying a timeAllowed value to the query and
shortening the query phrase (on longer queries containing synonyms the
system would crash without logging the warning), I was able to identify the
following warning in the logs:

o.a.s.s.SolrIndexSearcher Query: 

the request took too long to iterate over terms. Timeout: timeoutAt:
812182664173653 (System.nanoTime(): 812182715745553),
TermsEnum=org.apache.lucene.codecs.blocktree.SegmentTermsEnum@7a0db441

I have narrowed the problem down to the following:
the way synonyms are being expanded along with phrase slop.

With ps=5 I get 4096 possible permutations of the phrase being searched
because of synonyms, looking similar to:
ngs_title:"bereavement leave type build bereavement leave type data p"~5
ngs_title:"bereavement leave type build bereavement bereavement type data p"~5
ngs_title:"bereavement leave type build bereavement jury duty type data p"~5
ngs_title:"bereavement leave type build bereavement maternity leave type data p"~5
ngs_title:"bereavement leave type build bereavement paternity type data p"~5
ngs_title:"bereavement leave type build bereavement paternity leave type data p"~5
ngs_title:"bereavement leave type build bereavement adoption leave type data p"~5
ngs_title:"bereavement leave type build jury duty maternity leave type data p"~5
ngs_title:"bereavement leave type build jury duty paternity type data p"~5
ngs_title:"bereavement leave type build jury duty paternity leave type data p"~5
ngs_title:"bereavement leave type build jury duty adoption leave type data p"~5
ngs_title:"bereavement leave type build jury duty absence type data p"~5
ngs_title:"bereavement leave type build maternity leave leave type data p"~5
ngs_title:"bereavement leave type build maternity leave bereavement type data p"~5
ngs_title:"bereavement leave type build maternity leave jury duty type data p"~5
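
To illustrate how a count like 4096 can arise, here is a minimal sketch of the
arithmetic. The synonym list and the number of expanded positions are
assumptions made up for illustration (8 alternatives at each of 4 positions);
the real synonym file and analysis chain are not shown in this message, so the
actual breakdown may differ.

```python
from itertools import product
from math import prod

# Hypothetical alternatives for one synonym-expanded position (illustrative only).
LEAVE_TYPES = [
    "leave", "bereavement", "jury duty", "maternity leave",
    "paternity", "paternity leave", "adoption leave", "absence",
]

# Assumption: four positions in the phrase each expand to the same 8 alternatives.
slots = [LEAVE_TYPES] * 4


def count_expansions(slots):
    """Count the combinations obtained by picking one alternative per slot."""
    return sum(1 for _ in product(*slots))


expansions = count_expansions(slots)
# The count is simply the product of the slot sizes: 8 * 8 * 8 * 8.
print(expansions, prod(len(s) for s in slots))
```

The point is only that the number of phrase queries grows multiplicatively
with each expanded position, so a few more synonyms or a longer phrase can
push the count up very quickly.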



Previously in Solr 6 that same query, with the same synonyms (and query
analysis chain), would produce a parsedQuery like the following when using
ps=5:
DisjunctionMaxQuery(((ngs_field_description:\"leave leave type build leave
leave type data ? p leave leave type type.enabled\"~5)^3.0 |
(ngs_title:\"leave leave type build leave leave type data ? p leave leave
type type.enabled\"~5)^10.0)

The expansion wasn't being applied to the added DisjunctionMaxQuery when
adjusting rankings with phrase slop.

In general the parsed queries differ between 6 and 7, with some new
`spanNears` showing up, but those don't create the memory consumption issues
that I have seen when a large synonym expansion is happening along with a
PS parameter.

I didn't see much in the release notes about synonym changes
(outside of SOW=false being the default as of version 7).

The field being operated on has the following query analysis chain:

 




  

Not sure if there is a change in phrase slop that now takes synonyms into
account, and if so whether there is a way to disable that kind of expansion.
I am not sure whether it is related to SOLR-10980; it does seem related, but
that issue references Solr 6, which does not do the expansion.

Any help would be greatly appreciated.

Nick


"No value present" when set cluster policy for autoscaling in solr cloud mode

2019-12-17 Thread Cao, Li
Hi!

I am trying to add a cluster policy to a freshly built 8.3.0 cluster (no
collection added). I got an error when adding this cluster policy:

{"set-cluster-policy":[{"cores":"<3","nodeset":{"sysprop.rex.node.type":"tlog"}}]}

Basically I want to limit the number of cores for certain machines with a 
special environmental variable value.

But I got this error response:

{
  "responseHeader":{
    "status":400,
    "QTime":144},
  "result":"failure",
  "WARNING":"This response format is experimental.  It is likely to change in the future.",
  "error":{
    "metadata":[
      "error-class","org.apache.solr.api.ApiBag$ExceptionWithErrObject",
      "root-error-class","org.apache.solr.api.ApiBag$ExceptionWithErrObject"],
    "details":[{
        "set-cluster-policy":[{
            "cores":"<3",
            "nodeset":{"sysprop.rex.node.type":"tlog"}}],
        "errorMessages":["No value present"]}],
    "msg":"Error in command payload",
    "code":400}}

However, this works:

{ "set-cluster-policy":[{"cores":"<3","node":"#ANY"}]}
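
For anyone wanting to reproduce the comparison, here is a minimal sketch that
builds both payloads and wraps them in POST requests. The endpoint URL
(host, port, and the v2 path /api/cluster/autoscaling) is an assumption and
should be adjusted for the cluster at hand; the helper names are hypothetical.
Nothing is sent unless send() is called against a live cluster.

```python
import json
import urllib.error
import urllib.request

# Assumed endpoint; adjust host, port, and path for your cluster.
AUTOSCALING_URL = "http://localhost:8983/api/cluster/autoscaling"

# The policy that was rejected with "No value present".
FAILING_POLICY = {"set-cluster-policy": [
    {"cores": "<3", "nodeset": {"sysprop.rex.node.type": "tlog"}}]}

# The policy that was accepted.
WORKING_POLICY = {"set-cluster-policy": [{"cores": "<3", "node": "#ANY"}]}


def build_policy_request(payload, url=AUTOSCALING_URL):
    """Build (but do not send) a JSON POST request for the given payload."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=body,
        headers={"Content-Type": "application/json"}, method="POST")


def send(request):
    """Send the request and return (status, parsed JSON body)."""
    try:
        with urllib.request.urlopen(request) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        # Error responses such as the 400 above also carry a JSON body.
        return err.code, json.loads(err.read())


failing_request = build_policy_request(FAILING_POLICY)
working_request = build_policy_request(WORKING_POLICY)
```

Sending the two requests back to back makes it easy to confirm that the only
difference is the "nodeset" clause versus the "node" clause.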

I read the autoscaling policy documentation and cannot figure out why. Could
someone help me with this?

Thanks!

Li


Re: updating documents via csv

2019-12-17 Thread rhys J
On Mon, Dec 16, 2019 at 11:58 PM Paras Lehana 
wrote:

> Hi Rhys,
>
> I use CDATA for XMLs:
>
>
>  
>
> There should be a similar solution for JSON though I couldn't find the
> specific one on the internet. If you are okay to use XMLs for indexing, you
> can use this.
>
>
We are set on using json, but I figured out how to handle the single quote.

If I wrap the curl payload in double quotes and use single quotes inside,
I can escape the single quote in the field with no problem.
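
A different route to the same end, for anyone scripting this: let a library do
the quoting instead of hand-escaping. This is only a sketch; the document, the
collection name, and the update URL are hypothetical. It uses json.dumps to
produce the payload (a single quote needs no escaping in JSON itself) and
shlex.quote to make each argument safe for a POSIX shell.

```python
import json
import shlex

# Hypothetical document whose field value contains a single quote.
doc = [{"id": "1", "name": "O'Brien"}]

# Hypothetical update URL; adjust host and collection name.
url = "http://localhost:8983/solr/mycollection/update?commit=true"

# JSON does not require escaping of single quotes.
payload = json.dumps(doc)

# shlex.quote wraps each argument so the shell passes it through unchanged.
command = " ".join([
    "curl", "-X", "POST",
    "-H", shlex.quote("Content-Type: application/json"),
    "--data-binary", shlex.quote(payload),
    shlex.quote(url),
])
print(command)
```

The printed command can be pasted into bash as-is; the quoting problem lives
entirely in the shell layer, which is why it applies to any bash command and
not only to curl.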

Thanks for the help!

Rhys


Re: Query Containing Multiple Parsers

2019-12-17 Thread Chris Hostetter


: Is there a way to construct a query that needs two different parsers?
: Example:
: q={!xmlparser}Hello
: AND
: q={!edismax}text_en:"foo bar"~4

The easiest way to do what you're asking about would be to choose one of 
those queries for "scoring" purposes, and put the other one in an "fq" 
simply for filtering.

But you can build a single complex query using multiple parsers by 
leveraging the "lucene" parser's support for nesting queries -- ie: in a 
larger boolean query -- and then use local param variables to reference 
your other param names...


q=({!edismax qf=text_en v=$my_main_query} OR {!xmlparser v=$my_span_query})
my_main_query="foo bar"~4
my_span_query=Hello

...the important bits that tend to trip people up are to make sure you 
don't start your query string with the local param syntax of another 
parser, and that you don't pass the input of your nested parsers "inline" 
if they contain white space .. hence the parens above and the use of the 
'v' local param.

If you tried to do the same thing like either of these queries below, it 
wouldn't work because it would confuse the parsing logic...

bad_q1={!edismax qf=text_en v=$my_main_query} OR {!xmlparser v=$my_span_query}

bad_q2=({!edismax qf=text_en}"foo bar"~4 OR {!xmlparser v=$my_span_query})

in "bad_q1" solr would think you wanted the *entire* param value 
(including the "OR {!xmlparser...") passed to the "edismax" parser

in "bad_q2" the nested edismax parser would only be given the input '"foo' 
.. and not the ' bar"~4' bit, because the outermost (implicit) lucene 
parser doesn't understand how much of the input you intended for the 
nested parser.
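
When sending the working form of the query from a client, URL-encoding is easy
to get wrong by hand, so here is a minimal sketch using only the standard
library. The parameter names my_main_query and my_span_query are the ones used
above; everything else (how the query string is then sent) is left out.

```python
from urllib.parse import parse_qs, urlencode

# The outer q starts with a paren, not with another parser's local params,
# and each nested parser gets its input through the 'v' local param.
params = {
    "q": "({!edismax qf=text_en v=$my_main_query} "
         "OR {!xmlparser v=$my_span_query})",
    "my_main_query": '"foo bar"~4',
    "my_span_query": "Hello",
}

# urlencode percent-encodes braces, quotes, '$' and spaces for us.
query_string = urlencode(params)
print(query_string)
```

The resulting string can be appended to a select URL; the nested inputs travel
as separate request params, so their whitespace never reaches the outer lucene
parser.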


-Hoss
http://www.lucidworks.com/


Re: Using Deep Paging with Graph Query Parser

2019-12-17 Thread Chris Hostetter


: Is there a way to combine paging's cursor feature with graph query
: parser?

it should work just fine -- the cursorMark logic doesn't care what query 
parser you use.

Is there a particular problem you are running into when you send requests 
using both?
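
For reference, a minimal sketch of the cursor loop with a graph query as the
main query. The field names (parent_id, id), the root document, and the
fake_fetch stand-in are assumptions for illustration; in real use fetch_page
would issue an HTTP request to Solr. The loop itself relies only on the
documented cursor behaviour: start with cursorMark=*, sort on something that
includes the uniqueKey, and stop when nextCursorMark equals the cursorMark
that was sent.

```python
def iterate_cursor(fetch_page, params):
    """Yield documents page by page until the cursor stops advancing."""
    cursor = "*"
    while True:
        response = fetch_page({**params, "cursorMark": cursor})
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:
            return  # same cursor returned: no more results
        cursor = next_cursor


# Hypothetical graph query; the sort must include the uniqueKey field.
PARAMS = {"q": "{!graph from=parent_id to=id}id:root", "sort": "id asc", "rows": 2}

# In-memory stand-in for Solr so the sketch runs without a server.
_DOCS = [{"id": c} for c in "abcde"]


def fake_fetch(params):
    """Mimic a Solr response: a page of docs plus a nextCursorMark."""
    start = 0 if params["cursorMark"] == "*" else int(params["cursorMark"])
    docs = _DOCS[start:start + params["rows"]]
    next_cursor = str(start + len(docs)) if docs else params["cursorMark"]
    return {"response": {"docs": docs}, "nextCursorMark": next_cursor}


ids = [doc["id"] for doc in iterate_cursor(fake_fetch, PARAMS)]
print(ids)
```

Nothing in the loop depends on the query parser, which matches the point
above.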


-Hoss
http://www.lucidworks.com/


Re: Facing jwt authentication problem using solr 8.1.1

2019-12-17 Thread Jason Gerlowski
Hey Jan,

Is this a case of something that'd be fixed by
https://issues.apache.org/jira/browse/SOLR-13071 ?

Just wondering

Best,
Jason

On Thu, Dec 12, 2019 at 5:43 PM Jan Høydahl  wrote:
>
> Try something like this 
> https://gist.github.com/b330e1bea7842bcdc1e5fa3940b4a4f7 
> 
>
> The trick is to «whitelist» certain paths that will not require auth, but
> then further down add rules that restrict all other paths, either to the
> admin role or to the special role «*», which means «any authenticated user».
>
> Jan
>
> > On 12 Dec 2019, at 07:47, Lakhan Gupta wrote:
> >
> > Hi,
> >
> > I am using Solr 8.1.1 and am facing a problem while enabling JWT
> > authentication. JWT authentication itself works fine after configuring
> > the security.json file. Below is the configuration I am using to enable
> > it.
> >
> > Security.json
> >
> > {
> >   "authentication":{
> >     "blockUnknown": false,
> >     "class":"solr.JWTAuthPlugin",
> >     "jwk":{
> >       "kty":"oct",
> >       "use":"sig",
> >       "kid":"k1",
> >       "k":"7A02618BE6943C22FD81CAB9F6FCF063B6E1732C3614BC3ACA6032B6B3215CAF0D28A34FD423423CA3AC34BEA27D3F79",
> >       "alg":"HS256"},
> >     "aud":"solr"},
> >   "authorization":{
> >     "class":"solr.RuleBasedAuthorizationPlugin",
> >     "permissions":[
> >       {
> >         "name":"all",
> >         "path":"/*",
> >         "role":"admin"
> >       }
> >     ],
> >     "user-role":{
> >       "solr":"admin"
> >     }
> >   }
> > }
> >
> > Using secret key
> > 7A02618BE6943C22FD81CAB9F6FCF063B6E1732C3614BC3ACA6032B6B3215CAF0D28A34FD423423CA3AC34BEA27D3F79
> >
> > Jwt token is generated:
> > eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJhZCIsImF1ZCI6InNvbHIiLCJleHAiOjk5MTYyMzkwMjJ9.M4PksJTJ9gFjOlvvFmG1eDSyXDtKIRSGIYicIW9hwT4
> >
> > Below are the header and payload I'm using to generate the JWT token:
> >
> > The header is
> > {
> >  "alg": "HS256",
> >  "typ": "JWT"
> > }
> >
> > and the payload is
> >
> > {
> >  "sub": "admin",
> >  "aud": "Solr",
> >  "exp": 9916239022
> > }
> >
> > With the above configuration, JWT authentication works fine. But there
> > is a problem: when a request is sent without an authentication header,
> > the API still returns data. I want to reject requests that come without
> > an authentication header.
> >
> > For that, I enabled the blockUnknown parameter in security.json. That
> > works fine, and authentication is now required. But after enabling
> > blockUnknown, I get the exception below when starting Solr with the
> > solr start command.
> >
> >
> > ERROR: Solr requires authentication for 
> > http://localhost:8983/solr/admin/info/system. Please supply valid 
> > credentials. HTTP code=401
> >
> > I've googled a lot and found out that the solr/admin/info/system
> > endpoint requires authentication.
> >
> > How do I authenticate against the solr/admin/info/system endpoint while
> > starting Solr?
> >
> > I need urgent help and would appreciate it if someone can help me.
> >
> > Thanks
> > Lakhan Gupta
> >
>