Concurrent Updates
We have a SolrCloud cluster (of 3 nodes) running Solr 6.4.2. Every night, we delete and recreate our whole catalog. In this process, we're simultaneously running a query which recreates the product catalog (which includes child documents of a different type) and a query that creates a third document type that we use for joining. When we issue a search against one shard, we see the response we expect. But when we issue the same search against another shard, instead of the prescribed child documents, we'll have children that are of this third type of document. This seems to affect only the occasional document. We're wondering if anybody out there has experience with this, and might have some ideas as to why it is happening. Thanks so much. -- *This message is intended only for the use of the individual or entity to which it is addressed and may contain information that is privileged, confidential and exempt from disclosure under applicable law. If you have received this message in error, you are hereby notified that any use, dissemination, distribution or copying of this message is prohibited. If you have received this communication in error, please notify the sender immediately and destroy the transmitted information.*
Retrieve DocIdSet from Query in lucene 5.x
I am trying to migrate some old code that used to retrieve DocIdSets from filters, but with Filters being deprecated in Lucene 5.x I am trying to move away from those classes but I'm not sure the right way to do this now. Are there any examples of doing this?
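The common replacement in 5.x is to express the old Filter as a Query (for example as a BooleanQuery FILTER clause) and, where you really need the raw doc ids, pull them through the Query's Weight per segment: in 5.x a Scorer is itself a DocIdSetIterator. A hedged sketch follows, using method signatures from later 5.x releases (early 5.x's Weight.scorer also took an acceptDocs argument); it needs Lucene 5.x on the classpath and is not tested here:

```java
import java.io.IOException;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.Weight;

public class MatchIterationSketch {
    // Sketch: walk the matching doc ids of a Query per segment,
    // the way a Filter's DocIdSet used to be consumed.
    static void collectMatches(IndexSearcher searcher, Query query) throws IOException {
        // needsScores=false mirrors the old non-scoring filter behavior
        Weight weight = searcher.createNormalizedWeight(query, false);
        for (LeafReaderContext leaf : searcher.getIndexReader().leaves()) {
            Scorer scorer = weight.scorer(leaf);
            if (scorer == null) continue; // no matches in this segment
            DocIdSetIterator it = scorer; // Scorer extends DocIdSetIterator in 5.x
            for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
                int globalDoc = leaf.docBase + doc; // segment-relative -> index-wide id
            }
        }
    }
}
```
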
JAR errors with Solr 6.6.1, HttpClient, and HttpCore
Hi, I have the following code:

System.out.println("Initializing server");
SystemDefaultHttpClient cl = new SystemDefaultHttpClient();
client = new HttpSolrClient("http://localhost:8983/solr/#/prosp_poc_collection", cl);
System.out.println("Completed initializing the server");
client.deleteByQuery("*:*");

Solr is 6.6.1, HttpClient is 4.5.3, and HttpCore is 4.4.8. I get the following exception; please advise.

Exception Details: Location: org/apache/solr/client/solrj/impl/HttpClientUtil.createClient(Lorg/apache/solr/common/params/SolrParams;)Lorg/apache/http/impl/client/CloseableHttpClient; @57: areturn Reason: Type 'org/apache/http/impl/client/SystemDefaultHttpClient' (current frame, stack[0]) is not assignable to 'org/apache/http/impl/client/CloseableHttpClient' (from method signature) Current Frame: bci: @57 flags: { } locals: { 'org/apache/solr/common/params/SolrParams', 'org/apache/solr/common/params/ModifiableSolrParams', 'org/apache/http/impl/client/SystemDefaultHttpClient' } stack: { 'org/apache/http/impl/client/SystemDefaultHttpClient' } Bytecode: 0x000: bb00 0959 2ab7 000a 4cb2 000b b900 0c01 0x010: 0099 001e b200 0bbb 000d 59b7 000e 120f 0x020: b600 102b b600 11b6 0012 b900 1302 00b8 0x030: 0014 4d2c 2bb8 0015 2cb0 Stackmap Table: append_frame(@47,Object[#172])
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:514)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:279)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:268)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:160)
at org.apache.solr.client.solrj.SolrClient.deleteByQuery(SolrClient.java:895)
at org.apache.solr.client.solrj.SolrClient.deleteByQuery(SolrClient.java:858)
at org.apache.solr.client.solrj.SolrClient.deleteByQuery(SolrClient.java:873)
at com.moodys.poc.tika.test.PocApacheTikaSecFilings.initSolrServer(PocApacheTikaSecFilings.java:35) at
com.moodys.poc.tika.test.PocApacheTikaSecFilings.main(PocApacheTikaSecFilings.java:41) - Moody's monitors email communications through its networks for regulatory compliance purposes and to protect its customers, employees and business and where allowed to do so by applicable law. The information contained in this e-mail message, and any attachment thereto, is confidential and may not be disclosed without our express permission. If you are not the intended recipient or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution or copying of this message, or any attachment thereto, in whole or in part, is strictly prohibited. If you have received this message in error, please immediately notify us by telephone, fax or e-mail and delete the message and all of its attachments. Every effort is made to keep our network free from viruses. You should, however, review this e-mail message, as well as any attachment thereto, for viruses. We take no responsibility and have no liability for any computer virus which may be transferred via this e-mail message. -
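A VerifyError like the one above usually means an older httpclient jar (one predating 4.3, where SystemDefaultHttpClient did not yet extend CloseableHttpClient) is shadowing 4.5.3 on the classpath, so the first thing to check is for duplicate httpclient/httpcore jars. A hedged sketch of a simpler setup, assuming SolrJ 6.6.x (which can construct its own compatible client via HttpSolrClient.Builder); note the base URL should be the collection path, not the admin-UI "#" URL shown in the browser:

```java
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class SolrInit {
    public static void main(String[] args) throws SolrServerException, java.io.IOException {
        // Point at the collection itself, not the admin UI fragment ("/solr/#/...").
        HttpSolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/prosp_poc_collection").build();
        client.deleteByQuery("*:*"); // SolrJ builds a compatible HttpClient internally
        client.commit();
        client.close();
    }
}
```

This sidesteps the version clash entirely by not passing in a hand-built SystemDefaultHttpClient.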
Solr Beginner!!
Hi: I am trying to ingest a few memos. They do not have any standard format (JSON, XML, etc.), just plain text; however, the memos all follow some template. What I would like to do post-ingestion is to extract keywords and some values around them. Say, for instance, the text contains the keyword "Outstanding Amount: 1000". I can search for "Outstanding Amount" using the query interface, but how do I extract the entire string "Outstanding Amount" plus the 3 or 4 words that follow it from Solr? I am really new to Solr, so any documentation etc. would be super helpful. Also, is Solr the right tool for this use case? Thanks.
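For what it's worth, the "keyword plus the next few words" part can also be done client-side with a plain regex once Solr has found the matching memos (Solr's highlighter can return snippets around matches too). A minimal sketch; the class and method names here are made up for illustration, not Solr API:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordExtractor {
    // Hypothetical helper: returns "<keyword>: <next n words>" from a memo,
    // or null if the keyword is absent.
    static String extractAround(String text, String keyword, int followingWords) {
        // Match the keyword, an optional colon, then up to n whitespace-separated tokens.
        Pattern p = Pattern.compile(
                Pattern.quote(keyword) + "\\s*:?\\s*((?:\\S+\\s*){1," + followingWords + "})");
        Matcher m = p.matcher(text);
        return m.find() ? keyword + ": " + m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String memo = "Borrower: Acme Corp. Outstanding Amount: 1000 USD as of 2017-09-01.";
        System.out.println(extractAround(memo, "Outstanding Amount", 3));
    }
}
```

Whether Solr or a plain pipeline is the right tool depends on whether you need search over the memos or just the extraction; for extraction alone, a regex pass at ingestion time may be all you need.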
RE: Solr 7.0.0 -- can it use a 6.5.0 data repository (index)
First, thanks for the quick response. Yes, it sounds like the same problem!! I did a bunch of searching before reporting the issue; I didn't come across that JIRA or I wouldn't have reported it. My apologies for the duplication (although it is a new JIRA). Is there a good place to start searching in the future? I'm a fairly experienced Solr user, and I don't mind slogging through Java code. Meanwhile I'll follow the JIRA so I know when it gets fixed. Thanks!! -Original Message- From: Stefan Matheis [mailto:matheis.ste...@gmail.com] Sent: Wednesday, September 27, 2017 12:32 PM To: solr-user@lucene.apache.org Subject: Re: Solr 7.0.0 -- can it use a 6.5.0 data repository (index) That sounds like https://issues.apache.org/jira/browse/SOLR-11406 if i'm not mistaken? -Stefan On Sep 27, 2017 8:20 PM, "Wayne L. Johnson" <wjohn...@familysearch.org> wrote: > I’m testing Solr 7.0.0. When I start with an empty index, Solr comes > up just fine, I can add documents and query documents. However when I > start with an already-populated set of documents (from 6.5.0), Solr > will not start. The relevant portion of the traceback seems to be: > > Caused by: java.lang.NullPointerException > > at java.util.Objects.requireNonNull(Objects.java:203) > > … > > at java.util.stream.ReferencePipeline.reduce( > ReferencePipeline.java:479) > > at org.apache.solr.index.SlowCompositeReaderWrapper.<init>( > SlowCompositeReaderWrapper.java:76) > > at org.apache.solr.index.SlowCompositeReaderWrapper.wrap( > SlowCompositeReaderWrapper.java:57) > > at org.apache.solr.search.SolrIndexSearcher.<init>( > SolrIndexSearcher.java:252) > > at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java: > 2034) > > ... 
12 more > > > > In looking at the de-compiled code (SlowCompositeReaderWrapper), lines > 72-77, and it appears that one or more “leaf” files doesn’t have a > “min-version” set. That’s a guess. If so, does this mean Solr 7.0.0 > can’t read a 6.5.0 index? > > > > Thanks > > > > Wayne Johnson > > 801-240-4024 > > wjohnson...@ldschurch.org > > >
Solr 7.0.0 -- can it use a 6.5.0 data repository (index)
I'm testing Solr 7.0.0. When I start with an empty index, Solr comes up just fine; I can add documents and query documents. However, when I start with an already-populated set of documents (from 6.5.0), Solr will not start. The relevant portion of the traceback seems to be:

Caused by: java.lang.NullPointerException
at java.util.Objects.requireNonNull(Objects.java:203)
...
at java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:479)
at org.apache.solr.index.SlowCompositeReaderWrapper.<init>(SlowCompositeReaderWrapper.java:76)
at org.apache.solr.index.SlowCompositeReaderWrapper.wrap(SlowCompositeReaderWrapper.java:57)
at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:252)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2034)
... 12 more

Looking at the decompiled code (SlowCompositeReaderWrapper), lines 72-77, it appears that one or more "leaf" files don't have a "min-version" set. That's a guess. If so, does this mean Solr 7.0.0 can't read a 6.5.0 index?

Thanks

Wayne Johnson
801-240-4024
wjohnson...@ldschurch.org
Re: Custom StoredFieldVisitor in Solr
Hi Rick. The use case is that we use payloads to determine whether a particular user can or can't see a field. Right now we have the query piece working, so fields the user can't see don't contribute to the score, but we wanted to use a custom stored field visitor as well so that we can remove fields that a particular user shouldn't be able to see. On Thu, Aug 24, 2017 at 11:08 AM, Rick Leir <rl...@leirtech.com> wrote: > Jamie, what is the use case? Cheers -- Rick > > On August 23, 2017 11:30:38 AM MDT, Jamie Johnson <jej2...@gmail.com> > wrote: > >I thought I had asked this previously, but I can't find reference to it > >now. I am interested in using a custom StoredFieldVisitor in Solr and > >after spelunking through the code for a little it seems that there is > >no > >easy extension point that supports me doing so. I am currently on Solr > >4.x > >(moving forward is a long term option, but can't be done in the short > >term). The only option I see at this point is creating a forking Solr > >and > >changing the way SolrIndexSearcher currently works to provide another > >option to enable my custom StoredFieldVisitor. While I'd prefer not to > >do > >so, if it is my only option I am ok with it. > > > >Are there any suggestions for how to go about supporting this besides > >the > >above? > > > >Jamie > > -- > Sorry for being brief. Alternate email is rickleir at yahoo dot com
Custom StoredFieldVisitor in Solr
I thought I had asked this previously, but I can't find reference to it now. I am interested in using a custom StoredFieldVisitor in Solr, and after spelunking through the code for a little while it seems that there is no easy extension point that supports doing so. I am currently on Solr 4.x (moving forward is a long-term option, but can't be done in the short term). The only option I see at this point is creating a fork of Solr and changing the way SolrIndexSearcher currently works to provide another option to enable my custom StoredFieldVisitor. While I'd prefer not to do so, if it is my only option I am OK with it. Are there any suggestions for how to go about supporting this besides the above? Jamie
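For what it's worth, the visitor itself is straightforward; the hard part on 4.x is, as noted, that SolrIndexSearcher offers no hook to substitute it. A hedged sketch of the visitor piece against the Lucene 4.x API (the payload-based ACL lookup that populates allowedFields is assumed, not shown):

```java
import java.util.Set;

import org.apache.lucene.document.DocumentStoredFieldVisitor;
import org.apache.lucene.index.FieldInfo;

// Sketch: a visitor that silently skips stored fields the current user
// may not see. Wiring it into SolrIndexSearcher.doc() is the part that
// requires the fork (or a SolrIndexSearcher subclass) on 4.x.
public class AclStoredFieldVisitor extends DocumentStoredFieldVisitor {
    private final Set<String> allowedFields;

    public AclStoredFieldVisitor(Set<String> allowedFields) {
        this.allowedFields = allowedFields;
    }

    @Override
    public Status needsField(FieldInfo fieldInfo) {
        // NO skips this field but keeps visiting the rest of the document.
        return allowedFields.contains(fieldInfo.name) ? Status.YES : Status.NO;
    }
}
```
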
Re: Regex Phrases
So I managed to get the tokenizing to work with both PatternTokenizerFactory and WordDelimiterFilterFactory (used in combination with WhitespaceTokenizerFactory). For PT I used a regex that matches the various permutations of the phrases, and for WDF/WT I used protected words with every permutation (there are only 40 or 50). In both cases, via the admin/analysis screen, the Index and Query values were tokenized correctly (for example, "Super Vitamin C" was tokenized as "Super" and "Vitamin C"). However, when I do a query like "DisplayName:(Super Vitamin C)" with "debug=query", I see that the parsed query is "DisplayName:Super DisplayName:Vitamin DisplayName:C" ("DisplayName" is the field I'm working on here). Shouldn't that instead be parsed as something like "DisplayName:Super DisplayName:"Vitamin C"" or something similar? Or am I not understanding how query parsing works? In either case, I'm seeing results where DisplayName contains things like "Vitamin B 90 Caps" or "Super Orange 30 pkts", neither of which contains the phrase "Vitamin C", so I suspect something is wrong. On Thu, Mar 23, 2017 at 8:08 AM, Joel Bernstein <joels...@gmail.com> wrote: > You can also checkout > https://cwiki.apache.org/confluence/display/solr/Tokenizers#Tokenizers-RegularExpressionPatternTokenizer > . > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Wed, Mar 22, 2017 at 7:52 PM, Erick Erickson <erickerick...@gmail.com> > wrote: > > > Susheel: > > > > That'll work, but the options you've specified for > > WordDelimiterFilterFactory pretty much make it so it's doing nothing. > > I realize it's commented out... > > > > That said, it's true that if you have a very specific pattern you want > > to recognize a Regex can do the trick. WDFF is a bit more generic > > though when you have less specific requirements. 
> > > > Best, > > Erick > > On Wed, Mar 22, 2017 at 12:56 PM, Susheel Kumar <susheel2...@gmail.com> > > wrote: > > > I have used PatternReplaceFilterFactory in some of these situations. > e.g. > > > below > > > > > > <filter > > > class="solr.PatternReplaceFilterFactory" pattern="(\d+)-(\d+)-?(\d+)$" > > > replacement="$1$2$3"/> > > > > > > On Wed, Mar 22, 2017 at 2:54 PM, Mark Johnson < > > mjohn...@emersonecologics.com > > >> wrote: > > > > > >> Awesome, thank you much! > > >> > > >> On Wed, Mar 22, 2017 at 2:38 PM, Erick Erickson < > > erickerick...@gmail.com> > > >> wrote: > > >> > > >> > Take a close look at WordDelimiterFilterFactory, it's designed to > deal > > >> > with things like part numbers, phone numbers and the like, and the > > >> > example you gave is in the same class of problem I think. It'll take > > >> > a bit to get your head around what it does, but it'll perform better > > >> > than regexes, assuming you can get what you need out of it. > > >> > > > >> > And the admin/analysis page will help you _greatly_ in understanding > > >> > what the effects of the various parameters are. > > >> > > > >> > Best, > > >> > Erick > > >> > > > >> > On Wed, Mar 22, 2017 at 11:06 AM, Mark Johnson > > >> > <mjohn...@emersonecologics.com> wrote: > > >> > > Is it possible to configure Solr to treat text that matches a > regex > > as > > >> a > > >> > > phrase? > > >> > > > > >> > > I have a database full of products, and the Title and Description > > >> fields > > >> > > are text_en, tokenized via the StandardTokenizerFactory. 
This > works > > in > > >> > most > > >> > > cases, but a number of products have names like: > > >> > > > > >> > > - Vitamin A > > >> > > - Vitamin-A > > >> > > - Vitamin B12 > > >> > > - Vitamin B-12 > > >> > > ...and so on > > >> > > > > >> > > I have a regex that will match all of the permutations and would > > like > > >> to > > >> > > configure the field type so that anything that matches the regex > > >> pattern > > >> > is > > >> > > treated as a single token, instead of being broken up by spaces, >
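A side note on the parsed query observed earlier in this message: the classic query parser splits the query string on whitespace before the field analyzer ever runs, so "Super Vitamin C" reaches the analysis chain as three separate strings, and multi-word tokens only survive inside explicit quotes ("Vitamin C"). Newer Solr releases expose a sow (split-on-whitespace) parameter for this. For reference, a hedged sketch of a PatternTokenizerFactory field type along the lines discussed; the pattern shown is a stand-in for illustration, not the poster's actual permutation regex:

```xml
<fieldType name="text_phrases" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- group="0" emits each whole regex match as one token; the first
         alternative is a placeholder for the real permutation regex -->
    <tokenizer class="solr.PatternTokenizerFactory"
               pattern="Vitamin[ -]?[A-Z][0-9]*|\S+" group="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```
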
Re: Regex Phrases
Awesome, thank you much! On Wed, Mar 22, 2017 at 2:38 PM, Erick Erickson <erickerick...@gmail.com> wrote: > Take a close look at WordDelimiterFilterFactory, it's designed to deal > with things like part numbers, phone numbers and the like, and the > example you gave is in the same class of problem I think. It'll take > a bit to get your head around what it does, but it'll perform better > than regexes, assuming you can get what you need out of it. > > And the admin/analysis page will help you _greatly_ in understanding > what the effects of the various parameters are. > > Best, > Erick > > On Wed, Mar 22, 2017 at 11:06 AM, Mark Johnson > <mjohn...@emersonecologics.com> wrote: > > Is it possible to configure Solr to treat text that matches a regex as a > > phrase? > > > > I have a database full of products, and the Title and Description fields > > are text_en, tokenized via the StandardTokenizerFactory. This works in > most > > cases, but a number of products have names like: > > > > - Vitamin A > > - Vitamin-A > > - Vitamin B12 > > - Vitamin B-12 > > ...and so on > > > > I have a regex that will match all of the permutations and would like to > > configure the field type so that anything that matches the regex pattern > is > > treated as a single token, instead of being broken up by spaces, etc. Is > > that possible? > > > > -- > > *This message is intended only for the use of the individual or entity to > > which it is addressed and may contain information that is privileged, > > confidential and exempt from disclosure under applicable law. If you have > > received this message in error, you are hereby notified that any use, > > dissemination, distribution or copying of this message is prohibited. 
If > > you have received this communication in error, please notify the sender > > immediately and destroy the transmitted information.* > -- Best Regards, *Mark Johnson* | .NET Software Engineer Office: 603-392-7017 Emerson Ecologics, LLC | 1230 Elm Street | Suite 301 | Manchester NH | 03101
Regex Phrases
Is it possible to configure Solr to treat text that matches a regex as a phrase? I have a database full of products, and the Title and Description fields are text_en, tokenized via the StandardTokenizerFactory. This works in most cases, but a number of products have names like: - Vitamin A - Vitamin-A - Vitamin B12 - Vitamin B-12 ...and so on I have a regex that will match all of the permutations and would like to configure the field type so that anything that matches the regex pattern is treated as a single token, instead of being broken up by spaces, etc. Is that possible?
Re: Partial Match with DF
Thank you for the heads up! I think in some cases we will want to strip out punctuation but in others we might need it (for example, "liquid courage." should tokenize to "liquid" and "courage", while "1.5 oz liquid courage" should tokenize to "1.5", "oz", "liquid" and "courage"). I'll have to do some experimenting to see which one will work best for us. On Thu, Mar 16, 2017 at 11:09 AM, Erick Erickson <erickerick...@gmail.com> wrote: > Yeah, they've saved me on numerous occasions, glad to see they helped. > > One caution BTW when you start changing fieldTypes is you have to > watch punctuation. StandardTokenizerFactory won't pass through most > punctuation. > > WordDelimiterFilterFactory breaks on non alpha-num, including > punctuation, effectively throwing it out. > > But WhitespaceTokenizer does just that and spits out punctuation as > part of tokens, i.e. > "my words." (note period) is broken up as "my" "words." and wouldn't > match a search on "words". > > One other note, there's a tokenizer/filter for a zillion different > cases, you can go wild. Here's a partial > list: https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers%2C+Tokenizers%2C+and+Filters > see the "Tokenizer", "Filters" and "CharFilters" links. There are 12 > tokenizers listed and 40 or so filters... and the list is not > guaranteed to be complete. > > On Thu, Mar 16, 2017 at 7:39 AM, Mark Johnson > <mjohn...@emersonecologics.com> wrote: > > You're right! The fields I'm searching are all "string" type. I switched > to > > "text_en" and now it's working exactly as I need it to! I'll do some > > research to see if "text_en" or another "text" type field is best for our > > needs. > > > > Also, those debug options are amazing! They'll help tremendously in the > > future. > > > > Thank you much! > > > > On Thu, Mar 16, 2017 at 10:02 AM, Erick Erickson < > erickerick...@gmail.com> > > wrote: > > > >> My guess: Your analysis chain for the fields is different, i.e. 
they > >> have a different fieldType. In particular, watch out for the "string" > >> type, people are often confused about it. It does _not_ break input > >> into tokens, you need a text-based field type, text_en is one example > >> that is usually in the configs by default. > >> > >> Two tools that'll help you enormously: > >> > >> admin UI>>select core (or collection) from the drop-down>>analysis > >> That shows you exactly how Solr/Lucene break up text at query and index > >> time > >> > >> add =query to the URL. That'll show you how the query was parsed. > >> > >> Best, > >> Erick > >> > >> On Thu, Mar 16, 2017 at 6:52 AM, Mark Johnson > >> <mjohn...@emersonecologics.com> wrote: > >> > Oh, great! Thank you! > >> > > >> > So if I switch over to eDisMax I'd specify the fields to query via the > >> "qf" > >> > parameter, right? That seems to have the same result (only matches > when I > >> > specify the exact phrase in the field, not just certain words from > it). > >> > > >> > On Thu, Mar 16, 2017 at 9:33 AM, Alexandre Rafalovitch < > >> arafa...@gmail.com> > >> > wrote: > >> > > >> >> df is default field - you can only give one. To search over multiple > >> >> fields, you switch to eDisMax query parser and fl parameter. > >> >> > >> >> Then, the question will be what type definition your fields have. > When > >> you > >> >> search text field, you are using its definition because of copyField. > >> Your > >> >> original fields may be strings. > >> >> > >> >> Remember to reload core and reminded when you change definitions. > >> >> > >> >> Regards, > >> >>Alex > >> >> > >> >> > >> >> On 16 Mar 2017 9:15 AM, "Mark Johnson" < > mjohn...@emersonecologics.com> > >> >> wrote: > >> >> > >> >> > Forgive me if I'm missing something obvious -- I'm new to Solr, > but I > >> >> can't > >> >> > seem to find an explanation for the behavior I'm seeing. > >> >> > > >> >> > If I have a document that look
Re: Partial Match with DF
Wow, that's really powerful! Thank you! On Thu, Mar 16, 2017 at 11:19 AM, Charlie Hull <char...@flax.co.uk> wrote: > Hi Mark, > > Open Source Connection's excellent www.splainer.io might also be useful to > help you break down exactly what your query is doing. > > Cheers > > Charlie > > P.S. planning a blog soon listing 'useful Solr tools' > > On 16 March 2017 at 14:39, Mark Johnson <mjohn...@emersonecologics.com> > wrote: > > > You're right! The fields I'm searching are all "string" type. I switched > to > > "text_en" and now it's working exactly as I need it to! I'll do some > > research to see if "text_en" or another "text" type field is best for our > > needs. > > > > Also, those debug options are amazing! They'll help tremendously in the > > future. > > > > Thank you much! > > > > On Thu, Mar 16, 2017 at 10:02 AM, Erick Erickson < > erickerick...@gmail.com> > > wrote: > > > > > My guess: Your analysis chain for the fields is different, i.e. they > > > have a different fieldType. In particular, watch out for the "string" > > > type, people are often confused about it. It does _not_ break input > > > into tokens, you need a text-based field type, text_en is one example > > > that is usually in the configs by default. > > > > > > Two tools that'll help you enormously: > > > > > > admin UI>>select core (or collection) from the drop-down>>analysis > > > That shows you exactly how Solr/Lucene break up text at query and index > > > time > > > > > > add =query to the URL. That'll show you how the query was parsed. > > > > > > Best, > > > Erick > > > > > > On Thu, Mar 16, 2017 at 6:52 AM, Mark Johnson > > > <mjohn...@emersonecologics.com> wrote: > > > > Oh, great! Thank you! > > > > > > > > So if I switch over to eDisMax I'd specify the fields to query via > the > > > "qf" > > > > parameter, right? That seems to have the same result (only matches > > when I > > > > specify the exact phrase in the field, not just certain words from > it). 
> > > > > > > > On Thu, Mar 16, 2017 at 9:33 AM, Alexandre Rafalovitch < > > > arafa...@gmail.com> > > > > wrote: > > > > > > > >> df is default field - you can only give one. To search over multiple > > > >> fields, you switch to eDisMax query parser and fl parameter. > > > >> > > > >> Then, the question will be what type definition your fields have. > When > > > you > > > >> search text field, you are using its definition because of > copyField. > > > Your > > > >> original fields may be strings. > > > >> > > > >> Remember to reload core and reminded when you change definitions. > > > >> > > > >> Regards, > > > >>Alex > > > >> > > > >> > > > >> On 16 Mar 2017 9:15 AM, "Mark Johnson" < > mjohn...@emersonecologics.com > > > > > > >> wrote: > > > >> > > > >> > Forgive me if I'm missing something obvious -- I'm new to Solr, > but > > I > > > >> can't > > > >> > seem to find an explanation for the behavior I'm seeing. > > > >> > > > > >> > If I have a document that looks like this: > > > >> > { > > > >> > field1: "aaa bbb", > > > >> > field2: "ccc ddd", > > > >> > field3: "eee fff" > > > >> > } > > > >> > > > > >> > And I do a search where "q" is "aaa ccc", I get the document in > the > > > >> > results. This is because (please correct me if I'm wrong) the > > default > > > >> "df" > > > >> > is set to the "_text_" field, which contains the text values from > > all > > > >> > fields. > > > >> > > > > >> > However, if I do a search where "df" is "field1" and "field2" and > > "q" > > > is > > > >> > "aaa ccc" (words from field1 and field2) I get no results. > > > >> > > > > >> > In a simpler example, if I do a searc
Re: Partial Match with DF
You're right! The fields I'm searching are all "string" type. I switched to "text_en" and now it's working exactly as I need it to! I'll do some research to see if "text_en" or another "text" type field is best for our needs. Also, those debug options are amazing! They'll help tremendously in the future. Thank you much! On Thu, Mar 16, 2017 at 10:02 AM, Erick Erickson <erickerick...@gmail.com> wrote: > My guess: Your analysis chain for the fields is different, i.e. they > have a different fieldType. In particular, watch out for the "string" > type, people are often confused about it. It does _not_ break input > into tokens, you need a text-based field type, text_en is one example > that is usually in the configs by default. > > Two tools that'll help you enormously: > > admin UI>>select core (or collection) from the drop-down>>analysis > That shows you exactly how Solr/Lucene break up text at query and index > time > > add =query to the URL. That'll show you how the query was parsed. > > Best, > Erick > > On Thu, Mar 16, 2017 at 6:52 AM, Mark Johnson > <mjohn...@emersonecologics.com> wrote: > > Oh, great! Thank you! > > > > So if I switch over to eDisMax I'd specify the fields to query via the > "qf" > > parameter, right? That seems to have the same result (only matches when I > > specify the exact phrase in the field, not just certain words from it). > > > > On Thu, Mar 16, 2017 at 9:33 AM, Alexandre Rafalovitch < > arafa...@gmail.com> > > wrote: > > > >> df is default field - you can only give one. To search over multiple > >> fields, you switch to eDisMax query parser and fl parameter. > >> > >> Then, the question will be what type definition your fields have. When > you > >> search text field, you are using its definition because of copyField. > Your > >> original fields may be strings. > >> > >> Remember to reload core and reminded when you change definitions. 
> >> > >> Regards, > >>Alex > >> > >> > >> On 16 Mar 2017 9:15 AM, "Mark Johnson" <mjohn...@emersonecologics.com> > >> wrote: > >> > >> > Forgive me if I'm missing something obvious -- I'm new to Solr, but I > >> can't > >> > seem to find an explanation for the behavior I'm seeing. > >> > > >> > If I have a document that looks like this: > >> > { > >> > field1: "aaa bbb", > >> > field2: "ccc ddd", > >> > field3: "eee fff" > >> > } > >> > > >> > And I do a search where "q" is "aaa ccc", I get the document in the > >> > results. This is because (please correct me if I'm wrong) the default > >> "df" > >> > is set to the "_text_" field, which contains the text values from all > >> > fields. > >> > > >> > However, if I do a search where "df" is "field1" and "field2" and "q" > is > >> > "aaa ccc" (words from field1 and field2) I get no results. > >> > > >> > In a simpler example, if I do a search where "df" is "field1" and "q" > is > >> > "aaa" (a word from field1) I still get no results. > >> > > >> > If I do a search where "df" is "field1" and "q" is "aaa bbb" (the full > >> > value of field1) then I get the document in the results. > >> > > >> > So I'm concluding that when using "df" to specify which fields to > search > >> > then only an exact match on the full field value will return a > document. > >> > > >> > Is that a correct conclusion? Is there another way to specify which > >> fields > >> > to search without requiring an exact match? The results I'd like to > >> achieve > >> > are: > >> > > >> > Would Match: > >> > q=aaa > >> > q=aaa bbb > >> > q=aaa ccc > >> > q=aaa fff > >> > > >> > Would Not Match: > >> > q=eee > >> > q=fff > >> > q=eee fff > >> > > >> > -- > >> > *This message is intended only for the use of the individual or > entity to > >> > which it is addressed and may contain information that is privileged, > >> > confidential and exempt from di
Re: Partial Match with DF
Oh, great! Thank you! So if I switch over to eDisMax I'd specify the fields to query via the "qf" parameter, right? That seems to have the same result (only matches when I specify the exact phrase in the field, not just certain words from it). On Thu, Mar 16, 2017 at 9:33 AM, Alexandre Rafalovitch <arafa...@gmail.com> wrote: > df is default field - you can only give one. To search over multiple > fields, you switch to eDisMax query parser and fl parameter. > > Then, the question will be what type definition your fields have. When you > search text field, you are using its definition because of copyField. Your > original fields may be strings. > > Remember to reload core and re-index when you change definitions. > > Regards, > Alex > > > On 16 Mar 2017 9:15 AM, "Mark Johnson" <mjohn...@emersonecologics.com> > wrote: > > Forgive me if I'm missing something obvious -- I'm new to Solr, but I > can't > > seem to find an explanation for the behavior I'm seeing. > > > > If I have a document that looks like this: > > { > > field1: "aaa bbb", > > field2: "ccc ddd", > > field3: "eee fff" > > } > > > > And I do a search where "q" is "aaa ccc", I get the document in the > > results. This is because (please correct me if I'm wrong) the default > "df" > > is set to the "_text_" field, which contains the text values from all > > fields. > > > > However, if I do a search where "df" is "field1" and "field2" and "q" is > > "aaa ccc" (words from field1 and field2) I get no results. > > > > In a simpler example, if I do a search where "df" is "field1" and "q" is > > "aaa" (a word from field1) I still get no results. > > > > If I do a search where "df" is "field1" and "q" is "aaa bbb" (the full > > value of field1) then I get the document in the results. > > > > So I'm concluding that when using "df" to specify which fields to search > > then only an exact match on the full field value will return a document. > > > > Is that a correct conclusion? 
Is there another way to specify which > fields > > to search without requiring an exact match? The results I'd like to > achieve > > are: > > > > Would Match: > > q=aaa > > q=aaa bbb > > q=aaa ccc > > q=aaa fff > > > > Would Not Match: > > q=eee > > q=fff > > q=eee fff > > > > -- > > *This message is intended only for the use of the individual or entity to > > which it is addressed and may contain information that is privileged, > > confidential and exempt from disclosure under applicable law. If you have > > received this message in error, you are hereby notified that any use, > > dissemination, distribution or copying of this message is prohibited. If > > you have received this communication in error, please notify the sender > > immediately and destroy the transmitted information.* > > > -- Best Regards, *Mark Johnson* | .NET Software Engineer Office: 603-392-7017 Emerson Ecologics, LLC | 1230 Elm Street | Suite 301 | Manchester NH | 03101 <http://www.emersonecologics.com/> <https://wellevate.me/#/> *Supporting The Practice Of Healthy Living* <http://blog.emersonecologics.com/> <https://www.linkedin.com/company/emerson-ecologics> <https://www.facebook.com/emersonecologics/> <https://twitter.com/EmersonEcologic> <https://www.instagram.com/emerson_ecologics/> <https://www.pinterest.com/emersonecologic/> <https://www.glassdoor.com/Overview/Working-at-Emerson-Ecologics-EI_IE388367.11,28.htm> -- *This message is intended only for the use of the individual or entity to which it is addressed and may contain information that is privileged, confidential and exempt from disclosure under applicable law. If you have received this message in error, you are hereby notified that any use, dissemination, distribution or copying of this message is prohibited. If you have received this communication in error, please notify the sender immediately and destroy the transmitted information.*
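For reference, the eDisMax approach suggested in this thread boils down to a request like the following — a minimal sketch using the field names from the example document (the host, core, and `q.op` choice are assumptions). Note that whether a single word like "aaa" matches depends on each field's analyzer: plain string fields only match the full exact value, which is consistent with the behavior Mark observed.

```python
from urllib.parse import urlencode

# Build an eDisMax request that searches several fields via qf.
# Each field is analyzed with its own type, so "aaa ccc" can match
# individual tokens only if field1/field2 are tokenized text fields,
# not plain strings.
params = {
    "defType": "edismax",
    "q": "aaa ccc",
    "qf": "field1 field2",
    "q.op": "OR",
}
query_string = urlencode(params)
print(query_string)
# send as http://localhost:8983/solr/<core>/select?<query_string>
```

With `q.op=OR`, any single matching term is enough to return the document; with `q.op=AND`, all terms must match somewhere in the qf fields.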
Partial Match with DF
Forgive me if I'm missing something obvious -- I'm new to Solr, but I can't seem to find an explanation for the behavior I'm seeing. If I have a document that looks like this: { field1: "aaa bbb", field2: "ccc ddd", field3: "eee fff" } And I do a search where "q" is "aaa ccc", I get the document in the results. This is because (please correct me if I'm wrong) the default "df" is set to the "_text_" field, which contains the text values from all fields. However, if I do a search where "df" is "field1" and "field2" and "q" is "aaa ccc" (words from field1 and field2) I get no results. In a simpler example, if I do a search where "df" is "field1" and "q" is "aaa" (a word from field1) I still get no results. If I do a search where "df" is "field1" and "q" is "aaa bbb" (the full value of field1) then I get the document in the results. So I'm concluding that when using "df" to specify which fields to search then only an exact match on the full field value will return a document. Is that a correct conclusion? Is there another way to specify which fields to search without requiring an exact match? The results I'd like to achieve are: Would Match: q=aaa q=aaa bbb q=aaa ccc q=aaa fff Would Not Match: q=eee q=fff q=eee fff -- *This message is intended only for the use of the individual or entity to which it is addressed and may contain information that is privileged, confidential and exempt from disclosure under applicable law. If you have received this message in error, you are hereby notified that any use, dissemination, distribution or copying of this message is prohibited. If you have received this communication in error, please notify the sender immediately and destroy the transmitted information.*
Re: Custom handler/content stream loader
I did start with this, but it's a limited approach since it only works with text fields. Right now I'm using this with a bunch of custom fields extended to support payloads, but that is expensive to maintain between versions, especially when APIs change, so I'm looking for a less invasive way of supporting the same capability. I believe I have a Lucene solution for handling any field type, though I will obviously need to test it, if I can figure out the best way to do the custom building of the document and how to marshal it to the server and from the client. On Aug 23, 2016 11:05 PM, "Alexandre Rafalovitch" <arafa...@gmail.com> wrote: > Have you tried starting with the DelimitedPayloadTokenFilterFactory? > There is a sample configuration in the shipped examples: > https://github.com/apache/lucene-solr/blob/releases/ > lucene-solr/6.1.0/solr/example/example-DIH/solr/db/ > conf/managed-schema#L625 > > Regards, > Alex. > > Newsletter and resources for Solr beginners and intermediates: > http://www.solr-start.com/ > > > On 24 August 2016 at 04:22, Jamie Johnson <jej2...@gmail.com> wrote: > > I have a need to build custom field types that store additional metadata > at > > the field level in a payload. I was thinking that I could satisfy this > by > > building a custom UpdateRequest that captured this additional information > > in XML, but I am not really sure how to get at this additional > information > > on the server side. Would I need to implement a custom RequestHandler to > > handle the update, could I add a custom ContentStreamLoader to parse the > > XML, how do I customize the creation of the lucene document once I have > the > > XML? Any help/direction would really be appreciated. > > > > -Jamie >
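For anyone following along, the delimited-payload setup Alexandre points to looks roughly like this — a hedged sketch, since the field type name and delimiter here are illustrative rather than copied from the linked example schema:

```xml
<fieldType name="text_payload" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- tokens arrive as "term|payload"; the filter strips the suffix
         and stores it as a per-term payload in the index -->
    <filter class="solr.DelimitedPayloadTokenFilterFactory" delimiter="|" encoder="float"/>
  </analyzer>
</fieldType>
```

Indexing a value like `important|2.0` then attaches a payload of 2.0 to the token `important`. As noted, this only helps for text fields; other field types need a different mechanism.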
Re: Custom handler/content stream loader
Ok, did a bit more digging. It looks like if I build a custom ContentStreamLoader I can create a custom AddUpdateCommand that is ultimately responsible for building the Lucene document. So it looks like if I build a custom UpdateRequestHandler I can register my custom ContentStreamLoader and I'll be set. Is this the appropriate course of action? Lastly, I always want to use my custom UpdateRequest when adding data to Solr from SolrJ, but I don't see an easy way of doing this. Really what I need is to control the XML generated and sent to the server, and this looks like the best way, but I wonder about the inability to plug in a custom request writer (or something similar). Am I barking up the wrong tree? On Aug 23, 2016 5:22 PM, "Jamie Johnson" <jej2...@gmail.com> wrote: > I have a need to build custom field types that store additional metadata > at the field level in a payload. I was thinking that I could satisfy this > by building a custom UpdateRequest that captured this additional > information in XML, but I am not really sure how to get at this additional > information on the server side. Would I need to implement a custom > RequestHandler to handle the update, could I add a custom > ContentStreamLoader to parse the XML, how do I customize the creation of > the lucene document once I have the XML? Any help/direction would really > be appreciated. > > -Jamie >
Custom handler/content stream loader
I have a need to build custom field types that store additional metadata at the field level in a payload. I was thinking that I could satisfy this by building a custom UpdateRequest that captured this additional information in XML, but I am not really sure how to get at this additional information on the server side. Would I need to implement a custom RequestHandler to handle the update, could I add a custom ContentStreamLoader to parse the XML, how do I customize the creation of the lucene document once I have the XML? Any help/direction would really be appreciated. -Jamie
RE: How to Add New Fields and Fields Types Programmatically Using Solrj
Thanks a lot Steve. It worked out. Regards, Jeniba Johnson -Original Message- From: Steve Rowe [mailto:sar...@gmail.com] Sent: Monday, July 18, 2016 7:57 PM To: solr-user@lucene.apache.org Subject: Re: How to Add New Fields and Fields Types Programmatically Using Solrj Hi Jeniba, You can add fields and field types using Solrj with SchemaRequest.Update subclasses - see here for a list: <http://lucene.apache.org/solr/6_1_0/solr-solrj/org/apache/solr/client/solrj/request/schema/SchemaRequest.Update.html> There are quite a few examples of doing both in the tests: <https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;a=blob;f=solr/solrj/src/test/org/apache/solr/client/solrj/request/SchemaTest.java;h=72051b123aadb2df57f4bf19abfedb0ac0deb6cd;hb=refs/heads/branch_6_1> -- Steve www.lucidworks.com > On Jul 18, 2016, at 1:59 AM, Jeniba Johnson <jeniba.john...@lntinfotech.com> > wrote: > > > Hi, > > I have configured solr5.3.1 and started Solr in schema less mode. Using > SolrInputDocument, Iam able to add new fields in solrconfig.xml using Solrj. > How to specify the field type of a field using Solrj. > > Eg required="true" multivalued="false" /> > > How can I add field type properties using SolrInputDocument programmatically > using Solrj? Can anyone help with it? > > > > Regards, > Jeniba Johnson > > > > > The contents of this e-mail and any attachment(s) may contain confidential or > privileged information for the intended recipient(s). Unintended recipients > are prohibited from taking action on the basis of information in this e-mail > and using or disseminating the information, and must notify the sender and > delete it from their system. L Infotech will not accept responsibility or > liability for the accuracy or completeness of, or the presence of any virus > or disabling code in this e-mail"
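The SolrJ SchemaRequest.Update classes Steve mentions wrap Solr's Schema REST API; the JSON that an add-field request produces looks roughly like this (a sketch — the field name and property values are made up for illustration, and the core name in the URL is an assumption):

```python
import json

# Payload for the Schema API's add-field command, POSTed to
# http://localhost:8983/solr/<core>/schema
command = {
    "add-field": {
        "name": "description",
        "type": "string",
        "stored": True,
        "indexed": True,
        "multiValued": False,
    }
}
body = json.dumps(command)
print(body)
```

This is how field properties like `stored` and `multiValued` get specified programmatically — they belong to the schema definition, not to the SolrInputDocument being indexed.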
FW: How to Add New Fields and Fields Types Programmatically Using Solrj
Hi, I have configured Solr 5.3.1 and started Solr in schemaless mode. Using SolrInputDocument, I am able to add new fields using SolrJ. How do I specify the field type of a field using SolrJ? How can I add field-type properties programmatically using SolrJ? Can anyone help with it? Regards, Jeniba Johnson
Re: Escaping characters in a nested query
Thanks Mikhail, I'll give this a try. On Feb 27, 2016 5:27 AM, "Mikhail Khludnev" <mkhlud...@griddynamics.com> wrote: > Hello, > > I suggest q=_query_:{!lucene v=$subq}=my_awesome:less%20pain&... > or even starting for some version (I don't remember) _query_ pseudo field > is not necessary ie q=foo +bar {!lucene > v=$subq}=my_awesome:less%20pain& > > > On Fri, Feb 26, 2016 at 10:38 PM, Jamie Johnson <jej2...@gmail.com> wrote: > > > When using nested queries of the form q=_query_:"my_awesome:query", what > > needs to be escaped in the query portion? Just using the admin UI the > > following works > > > > _query_:"+field\\:with\\:special" > > _query_:"+field\\:with\\~special" > > _query_:"+field\\:with\\" > > > > but the same doesn't work for quotes, i.e. > > > > _query_:"+field\\:with\\"special" > > > > throws a org.apache.solr.search.SyntaxError. If I do > > > > _query_:"+field\\:with\\\"special" it executes, though I am not sure why > > quotes require different escaping. > > > > I am currently running solr 4.10.4, any thoughts? > > > > > > -- > Sincerely yours > Mikhail Khludnev > Principal Engineer, > Grid Dynamics > > <http://www.griddynamics.com> > <mkhlud...@griddynamics.com> >
Escaping characters in a nested query
When using nested queries of the form q=_query_:"my_awesome:query", what needs to be escaped in the query portion? Just using the admin UI the following works _query_:"+field\\:with\\:special" _query_:"+field\\:with\\~special" _query_:"+field\\:with\\" but the same doesn't work for quotes, i.e. _query_:"+field\\:with\\"special" throws a org.apache.solr.search.SyntaxError. If I do _query_:"+field\\:with\\\"special" it executes, though I am not sure why quotes require different escaping. I am currently running solr 4.10.4, any thoughts?
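The two levels of escaping this thread stumbles on can be made mechanical: escape Lucene's special characters once for the inner query, then escape backslashes and quotes a second time when embedding the result in the quoted `_query_` string. A sketch of that idea (the special-character list follows the classic Lucene query syntax; `&` and `|` are really the two-character operators `&&`/`||`, but escaping the single characters is harmless):

```python
# characters with special meaning in the classic Lucene query syntax
LUCENE_SPECIALS = set('+-&|!(){}[]^"~*?:\\/')

def escape_lucene(term):
    # first level: escape specials for the inner query parser
    return ''.join('\\' + c if c in LUCENE_SPECIALS else c for c in term)

def nested_query(inner):
    escaped = escape_lucene(inner)
    # second level: inside the outer quoted string, backslashes and
    # quotes must themselves be escaped
    embedded = escaped.replace('\\', '\\\\').replace('"', '\\"')
    return '_query_:"%s"' % embedded

print(nested_query('field:with"special'))
```

This reproduces the form found to work by trial in the thread (`_query_:"field\\:with\\\"special"`): a colon needs two backslashes, but a quote needs three, because the quote also terminates the outer string unless escaped at the second level.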
Solr InputFormat Exist?
Is there an equivalent of the ESInputFormat ( https://github.com/elastic/elasticsearch-hadoop/blob/03c056142a5ab7422b81bb1f519fd67a9581405f/mr/src/main/java/org/elasticsearch/hadoop/mr/EsInputFormat.java) in Solr or is there any work that is planned in this regard? -Jamie
Re: Add support in FacetsComponent for facet.method=uif
The patch adds facet.method=uif and then delegates all of the work to the JSON Faceting API to do the work. I had originally added a facet.method=dv and made the original facet.method=fc work using the UnInvertedField but wanted to avoid making a change that would introduce unexpected behavior. While I think it's strange that facet.method=dv does not exist and fc defaults to dv I think if we wanted to change that it should be done in another ticket. On Sun, Jan 3, 2016 at 4:18 PM, William Bell <billnb...@gmail.com> wrote: > Interesting that facet.method=dv or facet.method=uif. What is the > difference? > > On Sun, Jan 3, 2016 at 6:44 AM, Jamie Johnson <jej2...@gmail.com> wrote: > > > For those interested I created a separate jira issue for this but forgot > to > > attach earlier. > > > > https://issues.apache.org/jira/browse/SOLR-8466 > > On Jan 2, 2016 8:45 PM, "William Bell" <billnb...@gmail.com> wrote: > > > > > Yes we would like backward compatibility. We cannot switch all the > facet > > > fields to DocValues and our faceting is slow. > > > > > > Please... > > > > > > On Fri, Jan 1, 2016 at 7:41 AM, Jamie Johnson <jej2...@gmail.com> > wrote: > > > > > > > Is there any interest in this? While i think it's important and > inline > > > > with faceting available in the new json facet api, I've seen no > > > discussion > > > > on it so I'm wondering if it's best I add support for this using a > > custom > > > > facet component even though the majority of the component will be a > > copy > > > > which is prefer to not need to maintain separately. > > > > > > > > Jamie > > > > On Dec 22, 2015 12:37 PM, "Jamie Johnson" <jej2...@gmail.com> wrote: > > > > > > > > > I had previously piggybacked on another post, but I think it may > have > > > > been > > > > > lost there. 
I had a need to do UnInvertedField based faceting in > the > > > > > FacetsComponent and as such started looking at what would be > required > > > to > > > > > implement something similar to what the JSON Facets based API does > in > > > > this > > > > > regard. The patch that I have in this regard works and is attached > > to > > > > > https://issues.apache.org/jira/browse/SOLR-8096, is that > appropriate > > > or > > > > > should I create a new ticket to specifically add this support? > > > > > > > > > > -Jamie > > > > > > > > > > > > > > > > > > > > > -- > > > Bill Bell > > > billnb...@gmail.com > > > cell 720-256-8076 > > > > > > > > > -- > Bill Bell > billnb...@gmail.com > cell 720-256-8076 >
Re: Add support in FacetsComponent for facet.method=uif
Are you looking at 8466 or 8096? The patch on 8466 is the one I'm referencing. I should remove the other as it is more change than I think should be done for this ticket. Jamie On Jan 3, 2016 8:47 PM, "William Bell" <billnb...@gmail.com> wrote: > Ok the path appears to have dv and uif in there.? > > On Sun, Jan 3, 2016 at 4:40 PM, Jamie Johnson <jej2...@gmail.com> wrote: > > > The patch adds facet.method=uif and then delegates all of the work to the > > JSON Faceting API to do the work. I had originally added a > facet.method=dv > > and made the original facet.method=fc work using the UnInvertedField but > > wanted to avoid making a change that would introduce unexpected behavior. > > While I think it's strange that facet.method=dv does not exist and fc > > defaults to dv I think if we wanted to change that it should be done in > > another ticket. > > > > On Sun, Jan 3, 2016 at 4:18 PM, William Bell <billnb...@gmail.com> > wrote: > > > > > Interesting that facet.method=dv or facet.method=uif. What is the > > > difference? > > > > > > On Sun, Jan 3, 2016 at 6:44 AM, Jamie Johnson <jej2...@gmail.com> > wrote: > > > > > > > For those interested I created a separate jira issue for this but > > forgot > > > to > > > > attach earlier. > > > > > > > > https://issues.apache.org/jira/browse/SOLR-8466 > > > > On Jan 2, 2016 8:45 PM, "William Bell" <billnb...@gmail.com> wrote: > > > > > > > > > Yes we would like backward compatibility. We cannot switch all the > > > facet > > > > > fields to DocValues and our faceting is slow. > > > > > > > > > > Please... > > > > > > > > > > On Fri, Jan 1, 2016 at 7:41 AM, Jamie Johnson <jej2...@gmail.com> > > > wrote: > > > > > > > > > > > Is there any interest in this? 
While i think it's important and > > > inline > > > > > > with faceting available in the new json facet api, I've seen no > > > > > discussion > > > > > > on it so I'm wondering if it's best I add support for this using > a > > > > custom > > > > > > facet component even though the majority of the component will > be a > > > > copy > > > > > > which is prefer to not need to maintain separately. > > > > > > > > > > > > Jamie > > > > > > On Dec 22, 2015 12:37 PM, "Jamie Johnson" <jej2...@gmail.com> > > wrote: > > > > > > > > > > > > > I had previously piggybacked on another post, but I think it > may > > > have > > > > > > been > > > > > > > lost there. I had a need to do UnInvertedField based faceting > in > > > the > > > > > > > FacetsComponent and as such started looking at what would be > > > required > > > > > to > > > > > > > implement something similar to what the JSON Facets based API > > does > > > in > > > > > > this > > > > > > > regard. The patch that I have in this regard works and is > > attached > > > > to > > > > > > > https://issues.apache.org/jira/browse/SOLR-8096, is that > > > appropriate > > > > > or > > > > > > > should I create a new ticket to specifically add this support? > > > > > > > > > > > > > > -Jamie > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > Bill Bell > > > > > billnb...@gmail.com > > > > > cell 720-256-8076 > > > > > > > > > > > > > > > > > > > > > -- > > > Bill Bell > > > billnb...@gmail.com > > > cell 720-256-8076 > > > > > > > > > -- > Bill Bell > billnb...@gmail.com > cell 720-256-8076 >
Re: Add support in FacetsComponent for facet.method=uif
For those interested I created a separate jira issue for this but forgot to attach earlier. https://issues.apache.org/jira/browse/SOLR-8466 On Jan 2, 2016 8:45 PM, "William Bell" <billnb...@gmail.com> wrote: > Yes we would like backward compatibility. We cannot switch all the facet > fields to DocValues and our faceting is slow. > > Please... > > On Fri, Jan 1, 2016 at 7:41 AM, Jamie Johnson <jej2...@gmail.com> wrote: > > > Is there any interest in this? While i think it's important and inline > > with faceting available in the new json facet api, I've seen no > discussion > > on it so I'm wondering if it's best I add support for this using a custom > > facet component even though the majority of the component will be a copy > > which is prefer to not need to maintain separately. > > > > Jamie > > On Dec 22, 2015 12:37 PM, "Jamie Johnson" <jej2...@gmail.com> wrote: > > > > > I had previously piggybacked on another post, but I think it may have > > been > > > lost there. I had a need to do UnInvertedField based faceting in the > > > FacetsComponent and as such started looking at what would be required > to > > > implement something similar to what the JSON Facets based API does in > > this > > > regard. The patch that I have in this regard works and is attached to > > > https://issues.apache.org/jira/browse/SOLR-8096, is that appropriate > or > > > should I create a new ticket to specifically add this support? > > > > > > -Jamie > > > > > > > > > -- > Bill Bell > billnb...@gmail.com > cell 720-256-8076 >
Re: Add support in FacetsComponent for facet.method=uif
Is there any interest in this? While I think it's important and in line with the faceting available in the new JSON Facet API, I've seen no discussion on it, so I'm wondering if it's best I add support for this using a custom facet component, even though the majority of the component will be a copy, which I'd prefer not to have to maintain separately. Jamie On Dec 22, 2015 12:37 PM, "Jamie Johnson" <jej2...@gmail.com> wrote: > I had previously piggybacked on another post, but I think it may have been > lost there. I had a need to do UnInvertedField based faceting in the > FacetsComponent and as such started looking at what would be required to > implement something similar to what the JSON Facets based API does in this > regard. The patch that I have in this regard works and is attached to > https://issues.apache.org/jira/browse/SOLR-8096, is that appropriate or > should I create a new ticket to specifically add this support? > > -Jamie >
Re: Adding the same field value question
Yes, the field is multivalued. On Dec 28, 2015 3:48 PM, "Jack Krupansky" <jack.krupan...@gmail.com> wrote: > Is the field multivalued? > > -- Jack Krupansky > > On Sun, Dec 27, 2015 at 11:16 PM, Jamie Johnson <jej2...@gmail.com> wrote: > > > What is the difference of adding a field with the same value twice or > > adding it once and boosting the field on add? Is there a situation where > > one approach is preferred? > > > > Jamie > > >
Re: Adding the same field value question
Thanks, I wasn't sure if adding twice and boosting results in a similar thing happening under the hood or not. Appreciate the response. Jamie On Dec 28, 2015 9:08 AM, "Binoy Dalal" <binoydala...@gmail.com> wrote: > There's no benefit in adding the same field twice because that'll just > increase the size of your index without providing any real benefits at > query time. > For increasing the scores, boosting is definitely the way to go. > > On Mon, 28 Dec 2015, 09:46 Jamie Johnson <jej2...@gmail.com> wrote: > > > What is the difference of adding a field with the same value twice or > > adding it once and boosting the field on add? Is there a situation where > > one approach is preferred? > > > > Jamie > > > -- > Regards, > Binoy Dalal >
Re: Solr - facet fields that contain other facet fields
Can you do the opposite? Index into an unanalyzed field and copy into the analyzed? If I remember correctly facets are based off of indexed values so if you tokenize the field then the facets will be as you are seeing now. On Dec 28, 2015 9:45 AM, "Kevin Lopez"wrote: > *What I am trying to accomplish: * > Generate a facet based on the documents uploaded and a text file containing > terms from a domain/ontology such that a facet is shown if a term is in the > text file and in a document (key phrase extraction). > > *The problem:* > When I select the facet for the term "*not necessarily*" (we see there is a > space) and I get the results for the term "*not*". The field is tokenized > and multivalued. This leads me to believe that I can not use a tokenized > field as a facet field. I tried to copy the values of the field to a text > field with a keywordtokenizer. I am told when checking the schema browser: > "Sorry, no Term Info available :(" This is after I delete the old index and > upload the documents again. The facet is coming from a field that is > already copied from another field, so I cannot copy this field to a text > field with a keywordtokenizer or strfield. What can I do to fix this? Is > there an alternate way to accomplish this? > > *Here is my configuration:* > > > > multiValued="true" type="Cytokine_Pass"/> > > > > > > >stored="true" multiValued="true" >termPositions="true" >termVectors="true" >termOffsets="true"/> > sortMissingLast="true" omitNorms="true"> > > minShingleSize="2" maxShingleSize="5" > outputUnigramsIfNoShingles="true" > /> > > > synonyms="synonyms_ColonCancer.txt" ignoreCase="true" expand="true" > tokenizerFactory="solr.KeywordTokenizerFactory"/> > words="prefLabels_ColonCancer.txt" ignoreCase="true"/> > > > > > Regards, > > Kevin >
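The suggestion above — facet on an unanalyzed field and search an analyzed copy — would look roughly like this in the schema (a sketch; the field and type names are illustrative, loosely adapted from the thread). This sidesteps the problem that a copyField destination cannot itself be copied again:

```xml
<!-- raw values for faceting: facets are built from indexed terms,
     so a string field keeps "not necessarily" as a single term -->
<field name="cytokine_facet" type="string" indexed="true" stored="true" multiValued="true"/>
<!-- analyzed copy for full-text matching -->
<field name="cytokine_text" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="cytokine_facet" dest="cytokine_text"/>
```

At query time you would then facet with `facet.field=cytokine_facet` while searching against `cytokine_text`.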
Adding the same field value question
What is the difference of adding a field with the same value twice or adding it once and boosting the field on add? Is there a situation where one approach is preferred? Jamie
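For intuition on the replies in this thread: under Lucene's classic TF-IDF similarity, adding the same value twice doubles the term frequency, but the tf factor grows only as the square root of the frequency, while an index-time boost scales the score multiplicatively. A toy sketch of just that formula (not SolrJ — and "roughly" because classic Lucene quantizes boosts and norms into a single byte):

```python
import math

def classic_tf(freq):
    # classic Lucene TFIDFSimilarity: tf(freq) = sqrt(freq)
    return math.sqrt(freq)

# duplicating the field value: freq 1 -> 2, only a ~1.41x tf gain
gain_from_duplicate = classic_tf(2) / classic_tf(1)

# an index-time boost of 2.0 multiplies the field norm: roughly a 2x gain,
# without doubling the stored/indexed data
gain_from_boost = 2.0
print(gain_from_duplicate, gain_from_boost)
```

So the two approaches are not equivalent under the hood: boosting gives direct, predictable control, while duplicating the value grows the index for a weaker, sublinear effect.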
Limit fields returned in solr based on content
I have what I believe is a unique requirement discussed here in the past to limit data sent to users based on some marking in the field.
Re: Limit fields returned in solr based on content
Sorry hit send too early Is there a mechanism in solr/lucene that allows customization of the fields returned that would have access to the field content and payload? On Dec 24, 2015 4:15 PM, "Jamie Johnson" <jej2...@gmail.com> wrote: > I have what I believe is a unique requirement discussed here in the past > to limit data sent to users based on some marking in the field. >
Re: Limit fields returned in solr based on content
I'm currently doing it in a middle tier, but it means I can't return results from the index to users, instead it needs to always hit the store, not the end of the world but was hoping I could use the fields in the index as a quick first view and then get the full result when the user selected an entry. Jamie On Dec 24, 2015 4:26 PM, "Walter Underwood" <wun...@wunderwood.org> wrote: > I would do that in a middle tier. You can’t do every single thing in Solr. > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > > > On Dec 24, 2015, at 1:21 PM, Upayavira <u...@odoko.co.uk> wrote: > > > > You could create a custom DocTransformer. They can enhance the fields > > included in the search results. So, instead of fl=somefield you could > > have fl=[my-filter:somefield], and your MyFieldDocTransformer makes the > > decision as to whether or not to include somefield in the output. > > > > This would of course, require some Java coding. > > > > Upayavira > > > > On Thu, Dec 24, 2015, at 09:17 PM, Jamie Johnson wrote: > >> Sorry hit send too early > >> > >> Is there a mechanism in solr/lucene that allows customization of the > >> fields > >> returned that would have access to the field content and payload? > >> On Dec 24, 2015 4:15 PM, "Jamie Johnson" <jej2...@gmail.com> wrote: > >> > >>> I have what I believe is a unique requirement discussed here in the > past > >>> to limit data sent to users based on some marking in the field. > >>> > >
Re: Limit fields returned in solr based on content
Would the doc transformer have access to payloads? On Dec 24, 2015 4:21 PM, "Upayavira" <u...@odoko.co.uk> wrote: > You could create a custom DocTransformer. They can enhance the fields > included in the search results. So, instead of fl=somefield you could > have fl=[my-filter:somefield], and your MyFieldDocTransformer makes the > decision as to whether or not to include somefield in the output. > > This would of course, require some Java coding. > > Upayavira > > On Thu, Dec 24, 2015, at 09:17 PM, Jamie Johnson wrote: > > Sorry hit send too early > > > > Is there a mechanism in solr/lucene that allows customization of the > > fields > > returned that would have access to the field content and payload? > > On Dec 24, 2015 4:15 PM, "Jamie Johnson" <jej2...@gmail.com> wrote: > > > > > I have what I believe is a unique requirement discussed here in the > past > > > to limit data sent to users based on some marking in the field. > > > >
Add support in FacetsComponent for facet.method=uif
I had previously piggybacked on another post, but I think it may have been lost there. I had a need to do UnInvertedField based faceting in the FacetsComponent and as such started looking at what would be required to implement something similar to what the JSON Facets based API does in this regard. The patch that I have in this regard works and is attached to https://issues.apache.org/jira/browse/SOLR-8096, is that appropriate or should I create a new ticket to specifically add this support? -Jamie
Re: facet component and uninverted field
Thanks, the issue I'm having is that there is no equivalent to method uif for the standard facet component. We'll see how SOLR-8096 shakes out. On Sun, Dec 20, 2015 at 11:29 PM, Upayavira <u...@odoko.co.uk> wrote: > > > On Sun, Dec 20, 2015, at 01:32 PM, Jamie Johnson wrote: > > For those interested I've attached an initial patch to > > https://issues.apache.org/jira/browse/SOLR-8096 to start supporting uif > > in > > FacetComponent via JSON facet api. > > On Dec 18, 2015 9:22 PM, "Jamie Johnson" <jej2...@gmail.com> wrote: > > > > > I recently saw that the new JSON Facet API supports controlling the > facet > > > method that is used and was wondering if there was any support for > doing > > > the same thing in the original facet component? > > > > > > Also is there a plan to deprecate one of these components over the > other > > > or is there an expectation that both will continue to live on? > Curious if > > > I should bite the bullet and transition to the new JSON Facet API or > not. > > facet.method specifies the method for faceting! But I suspect you've > found that already. > > As to deprecation, these sort of things in my experience don't get > deprecated as such, we just find that one gets better than the other - > the better it gets, the more adoption it sees. > > Upayavira >
Re: facet component and uninverted field
For those interested I've attached an initial patch to https://issues.apache.org/jira/browse/SOLR-8096 to start supporting uif in FacetComponent via JSON facet api. On Dec 18, 2015 9:22 PM, "Jamie Johnson" <jej2...@gmail.com> wrote: > I recently saw that the new JSON Facet API supports controlling the facet > method that is used and was wondering if there was any support for doing > the same thing in the original facet component? > > Also is there a plan to deprecate one of these components over the other > or is there an expectation that both will continue to live on? Curious if > I should bite the bullet and transition to the new JSON Facet API or not. >
Re: faceting is unusable slow since upgrade to 5.3.0
Bill, Check out the patch attached to https://issues.apache.org/jira/browse/SOLR-8096. I had considered making the method uif after I had done most of the work, it would be trivial to change and would probably be more aligned with not adding unexpected changes to people that are currently using fc. -Jamie On Sat, Dec 19, 2015 at 11:03 PM, William Bellwrote: > Can we add method=uif back when not using the JSON Facet API too? > > That would help a lot of people. > > On Thu, Dec 17, 2015 at 7:17 AM, Yonik Seeley wrote: > > > On Wed, Dec 16, 2015 at 4:57 AM, Vincenzo D'Amore > > wrote: > > > Hi all, > > > > > > given that solr 5.4 is finally released, is this what's more stable and > > > efficient version of solrcloud ? > > > > > > I have a website which receives many search requests. It serve normally > > > about 2000 concurrent requests, but sometime there are peak from 4000 > to > > > 1 requests in few seconds. > > > > > > On January I'll have a chance to upgrade my old SolrCloud 4.8.1 cluster > > to > > > a new brand version, but following this thread I read about the > problems > > > that can occur upgrading to latest version. > > > > > > I have seen that issue SOLR-7730 "speed-up faceting on doc values > fields" > > > is fixed in 5.4. > > > > > > I'm using standard faceting without docValues. Should I add docValues > in > > > order to benefit of such fix? > > > > You'll have to try it I think... > > DocValues have a lot of advantages (much less heap consumption, and > > much smaller overhead when opening a new searcher), but they can often > > be slower as well. > > > > Comparing 4x to 5x non-docvalues, top-level field caches were removed > > by lucene, and while that benefits certain things like NRT (opening a > > new searcher very often), it will hurt performance for other > > configurations. 
> > > > The JSON Facet API currently allows you to pick your strategy via the > > "method" param for multi-valued string fields without docvalues: > > "uif" (UninvertedField) gets you the top-level strategy from Solr 4, > > while "dv" (DocValues built on-the-fly) gets you the NRT-friendly > > "per-segment" strategy. > > > > -Yonik > > > > > > -- > Bill Bell > billnb...@gmail.com > cell 720-256-8076 >
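The `method` param Yonik describes lives in the JSON Facet API request body; a minimal sketch of such a request (the facet name, field name, and collection in the URL are assumptions):

```python
import json

request = {
    "query": "*:*",
    "facet": {
        "top_cats": {
            "type": "terms",
            "field": "cat",
            # "uif" = Solr-4-style top-level UnInvertedField strategy;
            # "dv"  = NRT-friendly per-segment docvalues built on the fly
            "method": "uif",
        }
    }
}
body = json.dumps(request)
print(body)
# POST body to http://localhost:8983/solr/<collection>/query
```

This is the gap the thread is discussing: the JSON Facet API exposes this choice, while the classic facet component (facet.field/facet.method) did not at the time.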
Re: faceting is unusable slow since upgrade to 5.3.0
Can we still specify the cache implementation for the field cache? When this change occurred to faceting (uninverting reader vs field ) it prevented us from moving to 5.x but if we can get the 4.x functionality using that api we could look to port to the latest. Jamie On Dec 17, 2015 9:18 AM, "Yonik Seeley"wrote: > On Wed, Dec 16, 2015 at 4:57 AM, Vincenzo D'Amore > wrote: > > Hi all, > > > > given that solr 5.4 is finally released, is this what's more stable and > > efficient version of solrcloud ? > > > > I have a website which receives many search requests. It serve normally > > about 2000 concurrent requests, but sometime there are peak from 4000 to > > 1 requests in few seconds. > > > > On January I'll have a chance to upgrade my old SolrCloud 4.8.1 cluster > to > > a new brand version, but following this thread I read about the problems > > that can occur upgrading to latest version. > > > > I have seen that issue SOLR-7730 "speed-up faceting on doc values fields" > > is fixed in 5.4. > > > > I'm using standard faceting without docValues. Should I add docValues in > > order to benefit of such fix? > > You'll have to try it I think... > DocValues have a lot of advantages (much less heap consumption, and > much smaller overhead when opening a new searcher), but they can often > be slower as well. > > Comparing 4x to 5x non-docvalues, top-level field caches were removed > by lucene, and while that benefits certain things like NRT (opening a > new searcher very often), it will hurt performance for other > configurations. > > The JSON Facet API currently allows you to pick your strategy via the > "method" param for multi-valued string fields without docvalues: > "uif" (UninvertedField) gets you the top-level strategy from Solr 4, > while "dv" (DocValues built on-the-fly) gets you the NRT-friendly > "per-segment" strategy. > > -Yonik >
Re: faceting is unusable slow since upgrade to 5.3.0
Also can we get the capability to choose the method of faceting in the older faceting component? I'm not looking for complete feature parity just the ability to specify the method. As always thanks. On Fri, Dec 18, 2015 at 8:04 AM, Jamie Johnson <jej2...@gmail.com> wrote: > Can we still specify the cache implementation for the field cache? When > this change occurred to faceting (uninverting reader vs field ) it > prevented us from moving to 5.x but if we can get the 4.x functionality > using that api we could look to port to the latest. > > Jamie > On Dec 17, 2015 9:18 AM, "Yonik Seeley" <ysee...@gmail.com> wrote: > >> On Wed, Dec 16, 2015 at 4:57 AM, Vincenzo D'Amore <v.dam...@gmail.com> >> wrote: >> > Hi all, >> > >> > given that solr 5.4 is finally released, is this what's more stable and >> > efficient version of solrcloud ? >> > >> > I have a website which receives many search requests. It serve normally >> > about 2000 concurrent requests, but sometime there are peak from 4000 to >> > 1 requests in few seconds. >> > >> > On January I'll have a chance to upgrade my old SolrCloud 4.8.1 cluster >> to >> > a new brand version, but following this thread I read about the problems >> > that can occur upgrading to latest version. >> > >> > I have seen that issue SOLR-7730 "speed-up faceting on doc values >> fields" >> > is fixed in 5.4. >> > >> > I'm using standard faceting without docValues. Should I add docValues in >> > order to benefit of such fix? >> >> You'll have to try it I think... >> DocValues have a lot of advantages (much less heap consumption, and >> much smaller overhead when opening a new searcher), but they can often >> be slower as well. >> >> Comparing 4x to 5x non-docvalues, top-level field caches were removed >> by lucene, and while that benefits certain things like NRT (opening a >> new searcher very often), it will hurt performance for other >> configurations. 
>> >> The JSON Facet API currently allows you to pick your strategy via the >> "method" param for multi-valued string fields without docvalues: >> "uif" (UninvertedField) gets you the top-level strategy from Solr 4, >> while "dv" (DocValues built on-the-fly) gets you the NRT-friendly >> "per-segment" strategy. >> >> -Yonik >> >
facet component and uninverted field
I recently saw that the new JSON Facet API supports controlling the facet method that is used, and I was wondering whether there is any support for doing the same thing in the original facet component. Also, is there a plan to deprecate one of these components in favor of the other, or is the expectation that both will continue to live on? I'm curious whether I should bite the bullet and transition to the new JSON Facet API or not.
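For reference, the per-request "method" hint Yonik describes earlier in the thread goes inside a JSON Facet API request body. A minimal sketch of such a body (the field name "category_s" is a made-up example, not from the thread):

```python
import json

# "method" is a per-facet hint: "uif" picks the top-level UninvertedField
# strategy from Solr 4, "dv" builds docValues on the fly (NRT-friendly).
# It applies to multi-valued string fields without docValues, per the thread.
facet_request = {
    "query": "*:*",
    "facet": {
        "categories": {
            "type": "terms",
            "field": "category_s",  # hypothetical field name
            "method": "uif",
        }
    },
}

body = json.dumps(facet_request)
print(body)
```

This body would be POSTed to a collection's search handler; the sketch only shows its shape.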
Re: Append fields to a document
The expense is in gathering the pieces to do the indexing. There isn't much that I can do in that regard unfortunately. I need to investigate storing the fields, if they aren't returned is the expense just size on disk or is there a memory cost as well? On Dec 16, 2015 7:43 PM, "Alexandre Rafalovitch" <arafa...@gmail.com> wrote: > ExternalFileField might be useful in some situations. > > But also, is it possible that your Solr schema configuration is not > best suited for your domain? Is it - for example - possible that the > additional data should be in child records? > > Pure guesswork here, not enough information. But, as described, Solr > will not be able to fulfill your needs easily. Something will need to > change. > > Regards, >Alex. > > > Newsletter and resources for Solr beginners and intermediates: > http://www.solr-start.com/ > > > On 16 December 2015 at 22:09, Jamie Johnson <jej2...@gmail.com> wrote: > > I have a use case where we only need to append some fields to a document. > > To retrieve the full representation is very expensive but I can easily > get > > the deltas. Is it possible to just add fields to an existing Solr > > document? I experimented with using overwrite=false, but that resulted > in > > two documents with the same uniqueKey in the index (which makes sense). > Is > > there a way to accomplish what I'm looking to do in Solr? My fields > aren't > > all stored and think it will be too expensive for me to make that change. > > Any thoughts would be really appreciated. >
Append fields to a document
I have a use case where we only need to append some fields to a document. Retrieving the full representation is very expensive, but I can easily get the deltas. Is it possible to just add fields to an existing Solr document? I experimented with using overwrite=false, but that resulted in two documents with the same uniqueKey in the index (which makes sense). Is there a way to accomplish what I'm looking to do in Solr? My fields aren't all stored, and I think it will be too expensive for me to make that change. Any thoughts would be really appreciated.
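For context, Solr's usual answer to "append fields without reindexing" is an atomic update, which uses modifier objects instead of a full document. A sketch of what that payload looks like (field and id values here are invented for illustration); note that atomic updates require all fields to be stored (or have docValues), which is exactly the constraint raised in this thread:

```python
import json

# Atomic-update document: "set"/"add" modifiers instead of a full reindex.
delta = {
    "id": "doc-1",                     # hypothetical uniqueKey value
    "new_field_s": {"set": "value"},   # set (or create) a single-valued field
    "tags_ss": {"add": "extra-tag"},   # append to a multi-valued field
}

payload = json.dumps([delta])
print(payload)
```

This payload would be sent to the collection's /update handler; without fully stored fields, Solr cannot reconstruct the rest of the document and the update degrades to the two-document situation described above.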
SOLR-7996
Has anyone looked at this issue? I'd be willing to take a stab at it if someone could provide some high-level design guidance. This is a critical piece preventing us from moving to version 5. Jamie
Re: Child document and parent document with same key
Thanks that's what I suspected given what I'm seeing but wanted to make sure. Again thanks On Nov 5, 2015 1:08 PM, "Mikhail Khludnev" <mkhlud...@griddynamics.com> wrote: > On Fri, Oct 16, 2015 at 10:41 PM, Jamie Johnson <jej2...@gmail.com> wrote: > > > Is this expected to work? > > > I think it is. I'm still not sure I understand the question. But let me > bring some details from SOLR-3076: > - Solr's backs on Lucene's "deleteTerm" which is supplied into > indexWriter.updateDocument(); > - when parent document has children, is not a deleteTerm but > its' value is used for "deleteTerm" for field "_root_" see > > https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java#L251 > - thus for block updates uniqueKey is (almost) meaningless. > It lacks of elegance, but that's it. > > -- > Sincerely yours > Mikhail Khludnev > Principal Engineer, > Grid Dynamics > > <http://www.griddynamics.com> > <mkhlud...@griddynamics.com> >
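The mechanics Mikhail describes can be illustrated with the shape of a block update. In a nested document, the parent's uniqueKey value is written into the _root_ field of every child, and block updates delete by _root_ rather than by uniqueKey, which is why a child sharing the parent's key can coexist with it. A sketch using the thread's "key" uniqueKey field (values are invented):

```python
import json

# Block (nested) update: the child deliberately reuses the parent's key to
# mirror the situation in the thread -- for block updates the uniqueKey is
# (almost) meaningless; deletion happens via the _root_ field instead.
parent = {
    "key": "p-1",
    "type_s": "parent",
    "_childDocuments_": [
        {"key": "p-1", "type_s": "child"},  # same key as the parent
    ],
}
print(json.dumps(parent))
```

Both documents index fine and are retrievable separately, matching the observed behavior; re-adding the parent replaces the whole block via _root_.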
Re: Child document and parent document with same key
The field is "key" and this is the value of unique key in schema.xml On Oct 17, 2015 3:23 AM, "Mikhail Khludnev" <mkhlud...@griddynamics.com> wrote: > Hello, > > What are the field names for parent and child docs exactly? > Whats' in schema.xml? > What you've got if you actually try to do this? > > On Fri, Oct 16, 2015 at 12:41 PM, Jamie Johnson <jej2...@gmail.com> wrote: > > > I am looking at using child documents and noticed that if I specify a > child > > and parent with the same key solr indexes this fine and I can retrieve > both > > documents separately. Is this expected to work? > > > > -Jamie > > > > > > -- > Sincerely yours > Mikhail Khludnev > Principal Engineer, > Grid Dynamics > > <http://www.griddynamics.com> > <mkhlud...@griddynamics.com> >
Payload doesn't apply to WordDelimiterFilterFactory-generated tokens
I came across this post ( http://lucene.472066.n3.nabble.com/Payload-doesn-t-apply-to-WordDelimiterFilterFactory-generated-tokens-td3136748.html) and tried to find a JIRA for this task. Was one ever created? If not I'd be happy to create it if this is still something that makes sense or if instead there is another recommended approach for supporting cloning attributes like payload from the source token stream in the WordDelimiterFilterFactory.
Re: Order of actions in Update request
Yes if they are in separate requests I imagine it would work though I haven't tested. I was wondering if there was a way to execute these actions in a single request and maintain order. On Oct 24, 2015 3:25 PM, "Shawn Heisey" <apa...@elyograg.org> wrote: > On 10/24/2015 5:21 AM, Jamie Johnson wrote: > > Looking at the code and jira I see that ordering actions in solrj update > > request is currently not supported but I'd like to know if there is any > > other way to get this capability. I took a quick look at the XML loader > > and it appears to process actions as it sees them so if the order was > > changed to order the actions as > > > > Add > > Delete > > Add > > > > Vs > > Add > > Add > > Delete > > > > Would this cause any issues with the update? Would it achieve the > desired > > result? Are there any other options for ordering actions as they were > > provided to the update request? > > If those three actions are in separate update requests using > HttpSolrClient or CloudSolrClient in a single thread, I would expect > them to be executed in the order you make the requests. If you're using > multiple threads, then you probably cannot guarantee the order of the > requests. > > Are you using one of those clients in a single thread and seeing > something other than what I have described? If so, I think that might > be a bug. > > If you're using ConcurrentUpdateSolrClient, I don't think you can > guarantee order. That client has multiple threads pulling the requests > out of an internal queue. If some requests complete substantially > faster than others, they could happen out of order. The concurrent > client is a poor choice for anything but bulk inserts, and because of > the fact that it ignores almost every error that happens while it runs, > it often is not a good choice for that either. > > Thanks, > Shawn > >
Order of actions in Update request
Looking at the code and JIRA, I see that ordering actions in a SolrJ update request is currently not supported, but I'd like to know if there is any other way to get this capability. I took a quick look at the XML loader and it appears to process actions as it sees them, so if the order was changed to order the actions as

Add
Delete
Add

vs.

Add
Add
Delete

would this cause any issues with the update? Would it achieve the desired result? Are there any other options for ordering actions as they were provided to the update request? Jamie
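Since the XML loader processes commands in document order, one way to express an ordered add/delete/add in a single request is to build the XML update message directly. A sketch (document ids are invented; this assumes the `<update>` wrapper for multiple commands, and per Shawn's caveat the ordering guarantee only holds with a single-threaded client such as HttpSolrClient, not ConcurrentUpdateSolrClient):

```python
import xml.etree.ElementTree as ET

# Build <update><add>...<delete>...<add>...</update>; the XML loader
# executes these commands in the order they appear in the document.
update = ET.Element("update")

add1 = ET.SubElement(update, "add")
doc1 = ET.SubElement(add1, "doc")
f1 = ET.SubElement(doc1, "field", name="id")
f1.text = "1"

delete = ET.SubElement(update, "delete")
did = ET.SubElement(delete, "id")
did.text = "1"

add2 = ET.SubElement(update, "add")
doc2 = ET.SubElement(add2, "doc")
f2 = ET.SubElement(doc2, "field", name="id")
f2.text = "1"

xml = ET.tostring(update, encoding="unicode")
print(xml)
```

The resulting string would be POSTed to /update with Content-Type text/xml; SolrJ's UpdateRequest does not preserve interleaved ordering, which is what motivated the question.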
Child document and parent document with same key
I am looking at using child documents and noticed that if I specify a child and a parent with the same key, Solr indexes this fine and I can retrieve both documents separately. Is this expected to work? -Jamie
SolrCloud NoAuth for /unrelatednode error
I am getting an error that essentially says solr does not have auth for /unrelatednode/... I would be ok with the error being displayed, but I think this may be what is causing my solr instances to be shown as down. Currently I'm issuing the following command http://localhost:8983/solr/admin/collections?action=CREATE=collection=2=2=config=2 I see the collection and shards being created, but they appear as down in the clusterstate.json. The only exception I see when trying to show the Cloud graph is shown below. Could this be the cause for the shards showing up as down? WARN ZookeeperInfoServlet - Keeper Exception org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /unrelatednode/foo/bar at org.apache.zookeeper.KeeperException.create(KeeperException.java:113) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155) at org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:308) at org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:305) at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:74) at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:305) at org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:279) at org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:322) at org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:322) at org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:322) at org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:322) at org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.print(ZookeeperInfoServlet.java:226) at org.apache.solr.servlet.ZookeeperInfoServlet.doGet(ZookeeperInfoServlet.java:104) at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at 
javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1667) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:466) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1650) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.eclipse.jetty.server.Server.handle(Server.java:497) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248) at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539) at java.lang.Thread.run(Thread.java:745)
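The CREATE URL above was mangled by the archive (the parameter names after each `&` were stripped, leaving only `=value` fragments). A typical Collections API CREATE call looks like the following; the parameter-to-value mapping here is an illustrative guess, not recovered from the original post:

```python
from urllib.parse import urlencode

# Hypothetical reconstruction of a Collections API CREATE request.
params = {
    "action": "CREATE",
    "name": "collection",
    "numShards": 2,
    "replicationFactor": 2,
    "collection.configName": "config",
}
url = "http://localhost:8983/solr/admin/collections?" + urlencode(params)
print(url)
```

As the follow-up message notes, the NoAuth warning from the ZooKeeper browser turned out to be unrelated; the shards showing as down came from a configuration problem.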
Re: SolrCloud NoAuth for /unrelatednode error
Ah please ignore, it looks like this was totally unrelated and my issue was configuration related On Fri, Oct 9, 2015 at 11:18 AM, Jamie Johnson <jej2...@gmail.com> wrote: > I am getting an error that essentially says solr does not have auth for > /unrelatednode/... I would be ok with the error being displayed, but I > think this may be what is causing my solr instances to be shown as down. > Currently I'm issuing the following command > > > http://localhost:8983/solr/admin/collections?action=CREATE=collection=2=2=config=2 > > I see the collection and shards being created, but they appear as down in > the clusterstate.json. The only exception I see when trying to show the > Cloud graph is shown below. Could this be the cause for the shards showing > up as down? > > WARN ZookeeperInfoServlet - Keeper Exception > org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = > NoAuth for /unrelatednode/foo/bar > at org.apache.zookeeper.KeeperException.create(KeeperException.java:113) > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155) > at > org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:308) > at > org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:305) > at > org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:74) > at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:305) > at > org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:279) > at > org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:322) > at > org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:322) > at > org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:322) > at > org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:322) > at > 
org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.print(ZookeeperInfoServlet.java:226) > at > org.apache.solr.servlet.ZookeeperInfoServlet.doGet(ZookeeperInfoServlet.java:104) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1667) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:466) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1650) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215) > at > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) > at org.eclipse.jetty.server.Server.handle(Server.java:497) > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311) > at > 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248) > at > org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539) > at java.lang.Thread.run(Thread.java:745) > >
Re: Lucene/Solr 5.0 and custom FieldCache implementation
No worries, thanks again I'll begin teaching this On Mon, Aug 31, 2015, 5:16 PM Tomás Fernández Löbbe <tomasflo...@gmail.com> wrote: > Sorry Jamie, I totally missed this email. There was no Jira that I could > find. I created SOLR-7996 > > On Sat, Aug 29, 2015 at 5:26 AM, Jamie Johnson <jej2...@gmail.com> wrote: > > > This sounds like a good idea, I'm assuming I'd need to make my own > > UnInvertingReader (or subclass) to do this right? Is there a way to do > > this on the 5.x codebase or would I still need the solrindexer factory > work > > that Tomás mentioned previously? > > > > Tomás, is there a ticket for the SolrIndexer factory? I'd like to follow > > it's work to know what version of 5.x (or later) I should be looking for > > this in. > > > > On Thu, Aug 27, 2015 at 1:06 PM, Yonik Seeley <ysee...@gmail.com> wrote: > > > > > UnInvertingReader makes indexed fields look like docvalues fields. > > > The caching itself is still done in FieldCache/FieldCacheImpl > > > but you could perhaps wrap what is cached there to either screen out > > > stuff or construct a new entry based on the user. > > > > > > -Yonik > > > > > > > > > On Thu, Aug 27, 2015 at 12:55 PM, Jamie Johnson <jej2...@gmail.com> > > wrote: > > > > I think a custom UnInvertingReader would work as I could skip the > > process > > > > of putting things in the cache. Right now in Solr 4.x though I am > > > caching > > > > based but including the users authorities in the key of the cache so > > > we're > > > > not rebuilding the UnivertedField on every request. Where in 5.x is > > the > > > > object actually cached? Will this be possible in 5.x? > > > > > > > > On Thu, Aug 27, 2015 at 12:32 PM, Yonik Seeley <ysee...@gmail.com> > > > wrote: > > > > > > > >> The FieldCache has become implementation rather than interface, so I > > > >> don't think you're going to see plugins at that level (it's all > > > >> package protected now). 
> > > >> > > > >> One could either subclass or re-implement UnInvertingReader though. > > > >> > > > >> -Yonik > > > >> > > > >> > > > >> On Thu, Aug 27, 2015 at 12:09 PM, Jamie Johnson <jej2...@gmail.com> > > > wrote: > > > >> > Also in this vein I think that Lucene should support factories for > > the > > > >> > cache creation as described @ > > > >> > https://issues.apache.org/jira/browse/LUCENE-2394. I'm not > > endorsing > > > >> the > > > >> > patch that is provided (I haven't even looked at it) just the > > concept > > > in > > > >> > general. > > > >> > > > > >> > On Thu, Aug 27, 2015 at 12:01 PM, Jamie Johnson < > jej2...@gmail.com> > > > >> wrote: > > > >> > > > > >> >> That makes sense, then I could extend the SolrIndexSearcher by > > > creating > > > >> a > > > >> >> different factory class that did whatever magic I needed. If you > > > >> create a > > > >> >> Jira ticket for this please link it here so I can track it! > Again > > > >> thanks > > > >> >> > > > >> >> On Thu, Aug 27, 2015 at 11:59 AM, Tomás Fernández Löbbe < > > > >> >> tomasflo...@gmail.com> wrote: > > > >> >> > > > >> >>> I don't think there is a way to do this now. Maybe we should > > > separate > > > >> the > > > >> >>> logic of creating the SolrIndexSearcher to a factory. Moving > this > > > logic > > > >> >>> away from SolrCore is already a win, plus it will make it easier > > to > > > >> unit > > > >> >>> test and extend for advanced use cases. > > > >> >>> > > > >> >>> Tomás > > > >> >>> > > > >> >>> On Wed, Aug 26, 2015 at 8:10 PM, Jamie Johnson < > jej2...@gmail.com > > > > > > >> wrote: > > > >> >>> > > > >> >>> > Sorry to poke this again but I'm not following the last > comment > > of > > > >> how I > > > >> >>> > could go about extending the so
Re: Lucene/Solr 5.0 and custom FieldCache implementation
Tracking not teaching... Auto complete is fun... On Tue, Sep 1, 2015, 6:34 AM Jamie Johnson <jej2...@gmail.com> wrote: > No worries, thanks again I'll begin teaching this > > On Mon, Aug 31, 2015, 5:16 PM Tomás Fernández Löbbe <tomasflo...@gmail.com> > wrote: > >> Sorry Jamie, I totally missed this email. There was no Jira that I could >> find. I created SOLR-7996 >> >> On Sat, Aug 29, 2015 at 5:26 AM, Jamie Johnson <jej2...@gmail.com> wrote: >> >> > This sounds like a good idea, I'm assuming I'd need to make my own >> > UnInvertingReader (or subclass) to do this right? Is there a way to do >> > this on the 5.x codebase or would I still need the solrindexer factory >> work >> > that Tomás mentioned previously? >> > >> > Tomás, is there a ticket for the SolrIndexer factory? I'd like to >> follow >> > it's work to know what version of 5.x (or later) I should be looking for >> > this in. >> > >> > On Thu, Aug 27, 2015 at 1:06 PM, Yonik Seeley <ysee...@gmail.com> >> wrote: >> > >> > > UnInvertingReader makes indexed fields look like docvalues fields. >> > > The caching itself is still done in FieldCache/FieldCacheImpl >> > > but you could perhaps wrap what is cached there to either screen out >> > > stuff or construct a new entry based on the user. >> > > >> > > -Yonik >> > > >> > > >> > > On Thu, Aug 27, 2015 at 12:55 PM, Jamie Johnson <jej2...@gmail.com> >> > wrote: >> > > > I think a custom UnInvertingReader would work as I could skip the >> > process >> > > > of putting things in the cache. Right now in Solr 4.x though I am >> > > caching >> > > > based but including the users authorities in the key of the cache so >> > > we're >> > > > not rebuilding the UnivertedField on every request. Where in 5.x is >> > the >> > > > object actually cached? Will this be possible in 5.x? 
>> > > > >> > > > On Thu, Aug 27, 2015 at 12:32 PM, Yonik Seeley <ysee...@gmail.com> >> > > wrote: >> > > > >> > > >> The FieldCache has become implementation rather than interface, so >> I >> > > >> don't think you're going to see plugins at that level (it's all >> > > >> package protected now). >> > > >> >> > > >> One could either subclass or re-implement UnInvertingReader though. >> > > >> >> > > >> -Yonik >> > > >> >> > > >> >> > > >> On Thu, Aug 27, 2015 at 12:09 PM, Jamie Johnson <jej2...@gmail.com >> > >> > > wrote: >> > > >> > Also in this vein I think that Lucene should support factories >> for >> > the >> > > >> > cache creation as described @ >> > > >> > https://issues.apache.org/jira/browse/LUCENE-2394. I'm not >> > endorsing >> > > >> the >> > > >> > patch that is provided (I haven't even looked at it) just the >> > concept >> > > in >> > > >> > general. >> > > >> > >> > > >> > On Thu, Aug 27, 2015 at 12:01 PM, Jamie Johnson < >> jej2...@gmail.com> >> > > >> wrote: >> > > >> > >> > > >> >> That makes sense, then I could extend the SolrIndexSearcher by >> > > creating >> > > >> a >> > > >> >> different factory class that did whatever magic I needed. If >> you >> > > >> create a >> > > >> >> Jira ticket for this please link it here so I can track it! >> Again >> > > >> thanks >> > > >> >> >> > > >> >> On Thu, Aug 27, 2015 at 11:59 AM, Tomás Fernández Löbbe < >> > > >> >> tomasflo...@gmail.com> wrote: >> > > >> >> >> > > >> >>> I don't think there is a way to do this now. Maybe we should >> > > separate >> > > >> the >> > > >> >>> logic of creating the SolrIndexSearcher to a factory. Moving >> this >> > > logic >> > > >> >>> away from SolrCore is already a win, plus it will make it >> easier >> > to >> > > >> unit >> > >
Re: Lucene/Solr 5.0 and custom FieldCache implementation
This sounds like a good idea, I'm assuming I'd need to make my own UnInvertingReader (or subclass) to do this right? Is there a way to do this on the 5.x codebase or would I still need the solrindexer factory work that Tomás mentioned previously? Tomás, is there a ticket for the SolrIndexer factory? I'd like to follow it's work to know what version of 5.x (or later) I should be looking for this in. On Thu, Aug 27, 2015 at 1:06 PM, Yonik Seeley ysee...@gmail.com wrote: UnInvertingReader makes indexed fields look like docvalues fields. The caching itself is still done in FieldCache/FieldCacheImpl but you could perhaps wrap what is cached there to either screen out stuff or construct a new entry based on the user. -Yonik On Thu, Aug 27, 2015 at 12:55 PM, Jamie Johnson jej2...@gmail.com wrote: I think a custom UnInvertingReader would work as I could skip the process of putting things in the cache. Right now in Solr 4.x though I am caching based but including the users authorities in the key of the cache so we're not rebuilding the UnivertedField on every request. Where in 5.x is the object actually cached? Will this be possible in 5.x? On Thu, Aug 27, 2015 at 12:32 PM, Yonik Seeley ysee...@gmail.com wrote: The FieldCache has become implementation rather than interface, so I don't think you're going to see plugins at that level (it's all package protected now). One could either subclass or re-implement UnInvertingReader though. -Yonik On Thu, Aug 27, 2015 at 12:09 PM, Jamie Johnson jej2...@gmail.com wrote: Also in this vein I think that Lucene should support factories for the cache creation as described @ https://issues.apache.org/jira/browse/LUCENE-2394. I'm not endorsing the patch that is provided (I haven't even looked at it) just the concept in general. On Thu, Aug 27, 2015 at 12:01 PM, Jamie Johnson jej2...@gmail.com wrote: That makes sense, then I could extend the SolrIndexSearcher by creating a different factory class that did whatever magic I needed. 
If you create a Jira ticket for this please link it here so I can track it! Again thanks On Thu, Aug 27, 2015 at 11:59 AM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote: I don't think there is a way to do this now. Maybe we should separate the logic of creating the SolrIndexSearcher to a factory. Moving this logic away from SolrCore is already a win, plus it will make it easier to unit test and extend for advanced use cases. Tomás On Wed, Aug 26, 2015 at 8:10 PM, Jamie Johnson jej2...@gmail.com wrote: Sorry to poke this again but I'm not following the last comment of how I could go about extending the solr index searcher and have the extension used. Is there an example of this? Again thanks Jamie On Aug 25, 2015 7:18 AM, Jamie Johnson jej2...@gmail.com wrote: I had seen this as well, if I over wrote this by extending SolrIndexSearcher how do I have my extension used? I didn't see a way that could be plugged in. On Aug 25, 2015 7:15 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: On Tue, Aug 25, 2015 at 2:03 PM, Jamie Johnson jej2...@gmail.com wrote: Thanks Mikhail. If I'm reading the SimpleFacets class correctly, out delegates to DocValuesFacets when facet method is FC, what used to be FieldCache I believe. DocValuesFacets either uses DocValues or builds then using the UninvertingReader. Ah.. got it. Thanks for reminding this details.It seems like even docValues=true doesn't help with your custom implementation. I am not seeing a clean extension point to add a custom UninvertingReader to Solr, would the only way be to copy the FacetComponent and SimpleFacets and modify as needed? Sadly, yes. There is no proper extension point. Also, consider overriding SolrIndexSearcher.wrapReader(SolrCore, DirectoryReader) where the particular UninvertingReader is created, there you can pass the own one, which refers to custom FieldCache. 
On Aug 25, 2015 12:42 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello Jamie, I don't understand how it could choose DocValuesFacets (it occurs on docValues=true) field, but then switches to UninvertingReader/FieldCache which means docValues=false. If you can provide more details it would be great. Beside of that, I suppose you can only implement and inject your own UninvertingReader, I don't think there is an extension point for this. It's too specific requirement. On Tue, Aug 25, 2015 at 3:50 AM, Jamie Johnson jej2...@gmail.com wrote: as mentioned
Re: Lucene/Solr 5.0 and custom FieldCache implementation
Also since DocValues seems to be the future of faceting, is there another mechanism that I should be looking at to do authorization-based filtering like this? I know that I can do this filtering at a document level and get the desired result, but am wondering about the Term level. As always thanks. On Sat, Aug 29, 2015 at 8:26 AM, Jamie Johnson jej2...@gmail.com wrote: This sounds like a good idea, I'm assuming I'd need to make my own UnInvertingReader (or subclass) to do this right? Is there a way to do this on the 5.x codebase or would I still need the SolrIndexSearcher factory work that Tomás mentioned previously? Tomás, is there a ticket for the SolrIndexSearcher factory? I'd like to follow its work to know what version of 5.x (or later) I should be looking for this in. On Thu, Aug 27, 2015 at 1:06 PM, Yonik Seeley ysee...@gmail.com wrote: UnInvertingReader makes indexed fields look like docvalues fields. The caching itself is still done in FieldCache/FieldCacheImpl but you could perhaps wrap what is cached there to either screen out stuff or construct a new entry based on the user. -Yonik
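The cache keying discussed in this thread (scoping cache entries by the requesting user's authorities so differently-authorized users never share an entry) can be modeled without any Solr/Lucene dependencies. A minimal sketch; all names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class AuthScopedCacheSketch {
    // Hypothetical cache key: the field name plus the requesting user's
    // authorities, mirroring "include the authorities in the key of the cache".
    static final class Key {
        final String field;
        final String authorities;
        Key(String field, String authorities) {
            this.field = field;
            this.authorities = authorities;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return field.equals(k.field) && authorities.equals(k.authorities);
        }
        @Override public int hashCode() { return Objects.hash(field, authorities); }
    }

    final Map<Key, int[]> cache = new HashMap<>();

    // Returns the cached (stand-in) uninverted data for (field, authorities),
    // computing it once per distinct key so it is not rebuilt per request.
    int[] getOrBuild(String field, String authorities) {
        return cache.computeIfAbsent(new Key(field, authorities),
                k -> buildFiltered(k.field, k.authorities));
    }

    // Stand-in for uninverting the field with term-level filtering applied.
    private int[] buildFiltered(String field, String authorities) {
        return new int[] { field.hashCode(), authorities.hashCode() };
    }

    public static void main(String[] args) {
        AuthScopedCacheSketch c = new AuthScopedCacheSketch();
        int[] a = c.getOrBuild("category", "role:admin");
        int[] b = c.getOrBuild("category", "role:user");
        System.out.println(a == c.getOrBuild("category", "role:admin")); // same key reuses the entry
        System.out.println(a == b); // different authorities get a separate entry
    }
}
```

This is only a model of the keying scheme; in real Solr the equivalent hook would be wrapping the reader (as discussed above), not a plain map.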
Re: Lucene/Solr 5.0 and custom FieldCache implementation
I think a custom UnInvertingReader would work as I could skip the process of putting things in the cache. Right now in Solr 4.x, though, I am caching, but am including the user's authorities in the key of the cache so we're not rebuilding the UninvertedField on every request. Where in 5.x is the object actually cached? Will this be possible in 5.x? On Thu, Aug 27, 2015 at 12:32 PM, Yonik Seeley ysee...@gmail.com wrote: The FieldCache has become implementation rather than interface, so I don't think you're going to see plugins at that level (it's all package protected now). One could either subclass or re-implement UnInvertingReader though. -Yonik
Re: StrDocValues
Actually I should have just tried this before asking, but I'll say what I'm seeing and maybe someone can confirm. Faceting looks like it took this into account, i.e. the counts were 0 for values that were in documents that I removed using my AnalyticsQuery. I had expected that the AnalyticsQuery might be done after everything was completed, but it looks like it was executed before faceting, which is great.
Re: StrDocValues
Thanks Yonik. I currently am using this to negate the score of a document given the value of a particular field within the document, then using a custom AnalyticsQuery to only collect documents with a score > 0. Will this also impact the faceting counts? On Wed, Aug 26, 2015 at 8:32 PM, Yonik Seeley ysee...@gmail.com wrote: On Wed, Aug 26, 2015 at 6:20 PM, Jamie Johnson jej2...@gmail.com wrote: I don't see it explicitly mentioned, but does the boost only get applied to the final documents/score that matched the provided query or is it called for each field that matched? I'm assuming only once per document that matched the main query, is that right? Correct. -Yonik
Re: Lucene/Solr 5.0 and custom FieldCache implementation
Also in this vein I think that Lucene should support factories for the cache creation as described @ https://issues.apache.org/jira/browse/LUCENE-2394. I'm not endorsing the patch that is provided (I haven't even looked at it) just the concept in general.
Re: Lucene/Solr 5.0 and custom FieldCache implementation
That makes sense, then I could extend the SolrIndexSearcher by creating a different factory class that did whatever magic I needed. If you create a Jira ticket for this please link it here so I can track it! Again thanks On Thu, Aug 27, 2015 at 11:59 AM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote: I don't think there is a way to do this now. Maybe we should separate the logic of creating the SolrIndexSearcher to a factory. Moving this logic away from SolrCore is already a win, plus it will make it easier to unit test and extend for advanced use cases. Tomás
Re: StrDocValues
Right, I am removing them myself. Another feature which would be great would be the ability to specify a custom collector, like the positive-score-only collector in this case, to avoid having to do an extra pass over all of the scores, but I don't believe there is a way to do that now, right? On Thu, Aug 27, 2015 at 3:16 PM, Yonik Seeley ysee...@gmail.com wrote: On Thu, Aug 27, 2015 at 2:43 PM, Erick Erickson erickerick...@gmail.com wrote: Right, when scoring, any document that scores 0 is removed from the results Just to clarify, I think Jamie removed 0-scoring documents himself. Solr has never done this itself. Lucene used to, a long time ago, and then stopped IIRC. -Yonik
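The single-pass, positive-score-only collection being discussed can be modeled without Solr. A simplified sketch with hypothetical names, using a plain array in place of a Lucene collector:

```java
import java.util.ArrayList;
import java.util.List;

public class PositiveScoreCollectorSketch {
    // Models a collector that keeps only documents whose (possibly
    // boost-negated) score is positive, filtering at collect time rather
    // than in a second pass over the results.
    static List<Integer> collectPositive(float[] scoreByDoc) {
        List<Integer> hits = new ArrayList<>();
        for (int docId = 0; docId < scoreByDoc.length; docId++) {
            if (scoreByDoc[docId] > 0f) {
                hits.add(docId);  // collected: positive score
            }
            // zero or negative scores are dropped here, in the same pass
        }
        return hits;
    }

    public static void main(String[] args) {
        float[] scores = { 1.2f, 0f, -3.0f, 0.4f };
        System.out.println(collectPositive(scores)); // [0, 3]
    }
}
```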
StrDocValues
Are there any example implementations showing how StrDocValues works? I am not sure if this is the right place or not, but I was thinking about having some document-level docValue that I'd like to read in a function query to decide whether the document is returned or not. Am I barking up the right tree looking at this, or is there another method of supporting this?
Re: StrDocValues
I think I found it. {!boost..} gave me what I was looking for, and then a custom collector filtered out anything that I didn't want to show.
Re: StrDocValues
I don't see it explicitly mentioned, but does the boost only get applied to the final documents/score that matched the provided query, or is it called for each field that matched? I'm assuming only once per document that matched the main query, is that right? On Wed, Aug 26, 2015 at 5:35 PM, Jamie Johnson jej2...@gmail.com wrote: I think I found it. {!boost..} gave me what I was looking for and then a custom collector filtered out anything that I didn't want to show.
Re: Tokenizers and DelimitedPayloadTokenFilterFactory
Thanks again Erick, I created https://issues.apache.org/jira/browse/SOLR-7975, though I didn't attach a patch because my current implementation is not useful generally right now; it meets my use case but likely would not meet others. I will try to look at generalizing this to allow something custom to be plugged in. On Aug 26, 2015 2:46 AM, Erick Erickson erickerick...@gmail.com wrote: Sure, I think it's fine to raise a JIRA, especially if you can include a patch, even a preliminary one to solicit feedback... which I'll leave to people who are more familiar with that code... I'm not sure how generally useful this would be, and if it comes at a cost to normal searching there's sure to be lively discussion. Best Erick On Tue, Aug 25, 2015 at 6:29 PM, Erick Erickson erickerick...@gmail.com wrote: Well, you're going down a path that hasn't been trodden before ;). If you can treat your primitive types as text types you might get some traction, but that makes a lot of operations like numeric comparison difficult. Hmmm, another idea from left field: for single-valued types, what about a sidecar field that has the auth token? And even for a multiValued field, two parallel fields are guaranteed to maintain order, so perhaps you could do something here. Yes, I'm waving my hands a LOT here. I suspect that trying to have a custom type that incorporates payloads for, say, trie fields will be interesting to say the least. Numeric types are packed to save storage etc. so it'll be an adventure.. Best, Erick On Tue, Aug 25, 2015 at 2:43 PM, Jamie Johnson jej2...@gmail.com wrote: We were originally using this approach, i.e. run things through the KeywordTokenizer - DelimitedPayloadFilter - WordDelimiterFilter. Again this works fine for text, though I had wanted to use the StandardTokenizer in the chain. Is there an equivalent filter that does what the StandardTokenizer does? All of this said, this doesn't address the issue of the primitive field types, which at this point is the bigger issue. Given this use case should there be another way to provide payloads? My current thinking is that I will need to provide custom implementations for all of the field types I would like to support payloads on, which will essentially be copies of the standard versions with some extra sugar to read/write the payloads (I don't see a way to wrap/delegate these at this point because AttributeSource has the attribute retrieval related methods as final, so I can't simply wrap another tokenizer and return my added attributes + the wrapped attributes).
I know my use case is a bit strange, but I had not expected to need to do this given that Lucene/Solr supports payloads on these field types, they just aren't exposed. As always I appreciate any ideas if I'm barking up the wrong tree here. On Tue, Aug 25, 2015 at 2:52 PM, Markus Jelsma markus.jel...@openindex.io wrote: Well, if I remember correctly (I have no testing facility at hand) WordDelimiterFilter maintains payloads on emitted sub-terms. So if you use a KeywordTokenizer, input 'some text^PAYLOAD', and have a DelimitedPayloadFilter, the entire string gets a payload. You can then split that string up again into individual tokens. It is possible to abuse WordDelimiterFilter for it because it has a types parameter that you can use to split it on whitespace if its input is not trimmed. Otherwise you can use any other character instead of a space as your input. This is a crazy idea
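Markus's whole-field-payload-then-split idea can be modeled without Lucene: treat the field as one keyword token, peel off the delimited payload, then re-split into word tokens that all inherit it. A minimal sketch (hypothetical names, '\' standing in for the payload delimiter):

```java
import java.util.ArrayList;
import java.util.List;

public class WholeFieldPayloadSketch {
    // One (term, payload) pair, standing in for a Lucene token with a payload.
    static final class Token {
        final String term;
        final String payload;
        Token(String term, String payload) { this.term = term; this.payload = payload; }
    }

    // Keyword-tokenize the whole field, strip the delimited payload from the
    // end, then split into word tokens that all carry that payload.
    static List<Token> analyze(String field) {
        int d = field.lastIndexOf('\\');
        String payload = d >= 0 ? field.substring(d + 1) : null;
        String text = d >= 0 ? field.substring(0, d) : field;
        List<Token> out = new ArrayList<>();
        for (String term : text.split("\\s+")) {
            out.add(new Token(term, payload));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Token t : analyze("this is a test\\Foo")) {
            System.out.println(t.term + " -> " + t.payload); // every token carries Foo
        }
    }
}
```

The real chain would use KeywordTokenizer plus DelimitedPayloadTokenFilter plus a splitting filter; this just shows why delimiting before splitting puts the payload on every token.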
Re: Lucene/Solr 5.0 and custom FieldCache implementation
Sorry to poke this again but I'm not following the last comment of how I could go about extending the solr index searcher and have the extension used. Is there an example of this? Again thanks Jamie On Aug 25, 2015 7:18 AM, Jamie Johnson jej2...@gmail.com wrote: I had seen this as well; if I overrode this by extending SolrIndexSearcher, how do I have my extension used? I didn't see a way that could be plugged in. On Aug 25, 2015 7:15 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: On Tue, Aug 25, 2015 at 2:03 PM, Jamie Johnson jej2...@gmail.com wrote: Thanks Mikhail. If I'm reading the SimpleFacets class correctly, it delegates to DocValuesFacets when the facet method is FC, what used to be FieldCache I believe. DocValuesFacets either uses DocValues or builds them using the UninvertingReader. Ah.. got it. Thanks for reminding me of these details. It seems like even docValues=true doesn't help with your custom implementation. I am not seeing a clean extension point to add a custom UninvertingReader to Solr; would the only way be to copy the FacetComponent and SimpleFacets and modify as needed? Sadly, yes. There is no proper extension point. Also, consider overriding SolrIndexSearcher.wrapReader(SolrCore, DirectoryReader) where the particular UninvertingReader is created; there you can pass in your own one, which refers to a custom FieldCache. On Aug 25, 2015 12:42 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello Jamie, I don't understand how it could choose DocValuesFacets (it occurs on docValues=true) field, but then switches to UninvertingReader/FieldCache, which means docValues=false. If you can provide more details it would be great. Besides that, I suppose you can only implement and inject your own UninvertingReader; I don't think there is an extension point for this. It's too specific a requirement.
On Tue, Aug 25, 2015 at 3:50 AM, Jamie Johnson jej2...@gmail.com wrote: as mentioned in a previous email I have a need to provide security controls at the term level. I know that Lucene/Solr doesn't support this so I had baked something onto a 4.x baseline that was sufficient for my use cases. I am now looking to move that implementation to 5.x and am running into an issue around faceting. Previously we were able to provide a custom cache implementation that would create separate cache entries given a particular set of security controls, but in Solr 5 some faceting is delegated to DocValuesFacets which delegates to UninvertingReader in my case (we are not storing DocValues). The issue I am running into is that before 5.x I had the ability to influence the FieldCache that was used at the Solr level to also include a security token into the key so each cache entry was scoped to a particular level. With the current implementation the FieldCache seems to be an internal detail that I can't influence in any way. Is this correct? I had noticed this Jira ticket https://issues.apache.org/jira/browse/LUCENE-5427, is there any movement on this? Is there another way to influence the information that is put into these caches? As always thanks in advance for any suggestions. -Jamie -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Tokenizers and DelimitedPayloadTokenFilterFactory
To be clear, we are using payloads as a way to attach authorizations to individual tokens within Solr. The payloads are normal Solr Payloads, though we are not using floats; we are using the identity payload encoder (org.apache.lucene.analysis.payloads.IdentityEncoder), which allows for storing a byte[] of our choosing in the payload field. This works great for text, but now that I'm indexing more than just text I need a way to specify the payload on the other field types. Does that make more sense? On Tue, Aug 25, 2015 at 12:52 PM, Erick Erickson erickerick...@gmail.com wrote: This really sounds like an XY problem. Or when you use payload it's not the Solr payload. So Solr Payloads are a float value that you can attach to individual terms to influence the scoring. Attaching the _same_ payload to all terms in a field is much the same thing as boosting on any matches in the field at query time or boosting on the field at index time (this latter assuming that different docs would have different boosts). So can you back up a bit and tell us what you're trying to accomplish? Maybe we can be sure we're both talking about the same thing ;) Best, Erick
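What the identity encoder buys here can be shown without Lucene: the payload is just the raw bytes of the authorization string, carried through verbatim rather than parsed as a float. A small sketch (hypothetical names and marking; the real IdentityEncoder works on a char[] region of the token):

```java
import java.nio.charset.StandardCharsets;

public class IdentityPayloadSketch {
    // Mimics org.apache.lucene.analysis.payloads.IdentityEncoder: the token's
    // delimited suffix is stored as raw bytes rather than decoded as a float.
    static byte[] encode(String authorizations) {
        return authorizations.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] payload) {
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] payload = encode("U//FOUO");   // arbitrary auth marking (hypothetical)
        System.out.println(decode(payload));  // round-trips unchanged: U//FOUO
    }
}
```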
Tokenizers and DelimitedPayloadTokenFilterFactory
I would like to specify a particular payload for all tokens emitted from a tokenizer, but don't see a clear way to do this. Ideally I could specify that something like the DelimitedPayloadTokenFilter be run on the entire field and then standard analysis be done on the rest of the field, so in the case that I had the following text this is a test\Foo I would like to create tokens this, is, a, test each with a payload of Foo. From what I'm seeing though only test gets the payload. Is there any way to accomplish this or will I need to implement a custom tokenizer?
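The behavior Jamie is after — one payload, delimited at the end of the field, shared by every token — can be sketched outside of Lucene's analysis API in plain Java. This is an illustrative sketch only (the class and method names are hypothetical, not a Lucene TokenFilter):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: split off the delimited payload first, then tokenize the
// remaining text and attach that same payload to every token, so
// "this is a test\Foo" yields tokens this, is, a, test, each mapped to "Foo".
public class WholeFieldPayload {
    public static Map<String, String> tokenize(String field, char delimiter) {
        int at = field.lastIndexOf(delimiter);
        String payload = at >= 0 ? field.substring(at + 1) : null;
        String text = at >= 0 ? field.substring(0, at) : field;
        Map<String, String> tokenToPayload = new LinkedHashMap<>();
        for (String token : text.split("\\s+")) {
            tokenToPayload.put(token, payload); // same payload on every token
        }
        return tokenToPayload;
    }
}
```

In a real analysis chain this logic would live in a custom TokenFilter that captures the payload once and sets the PayloadAttribute on each emitted token.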
Re: Tokenizers and DelimitedPayloadTokenFilterFactory
Looks like I have something basic working for Trie fields. I am doing exactly what I said in my previous email, so good news there. I think this is a big step as there are only a few field types left that I need to support, those being date (should be similar to Trie) and Spatial fields, which at a glance looked like they provide a way to supply the token stream through an extension. Definitely need to look more though. All of this said though, is this really the right way to get payloads into these types of fields? Should a jira feature request be added for this? On Aug 25, 2015 8:13 PM, Jamie Johnson jej2...@gmail.com wrote: Right, I had assumed (obviously here is my problem) that I'd be able to specify payloads for the field regardless of the field type. Looking at TrieField that is certainly non-trivial. After a bit of digging it appears that if I wanted to do something here I'd need to build a new TrieField, override createField and provide a Field that would return something like NumericTokenStream but also provide the payloads. Like you said sounds interesting to say the least... Were payloads not really intended to be used for these types of fields from a Lucene perspective? On Tue, Aug 25, 2015 at 6:29 PM, Erick Erickson erickerick...@gmail.com wrote: Well, you're going down a path that hasn't been trodden before ;). If you can treat your primitive types as text types you might get some traction, but that makes a lot of operations like numeric comparison difficult. Hmmm, another idea from left field. For single-valued types, what about a sidecar field that has the auth token? And even for a multiValued field, two parallel fields are guaranteed to maintain order so perhaps you could do something here. Yes, I'm waving my hands a LOT here. I suspect that trying to have a custom type that incorporates payloads for, say, trie fields will be interesting to say the least. Numeric types are packed to save storage etc. so it'll be an adventure...
Best, Erick On Tue, Aug 25, 2015 at 2:43 PM, Jamie Johnson jej2...@gmail.com wrote: We were originally using this approach, i.e. run things through the KeywordTokenizer -> DelimitedPayloadFilter -> WordDelimiterFilter. Again this works fine for text, though I had wanted to use the StandardTokenizer in the chain. Is there an equivalent filter that does what the StandardTokenizer does? All of this said this doesn't address the issue of the primitive field types, which at this point is the bigger issue. Given this use case should there be another way to provide payloads? My current thinking is that I will need to provide custom implementations for all of the field types I would like to support payloads on which will essentially be copies of the standard versions with some extra sugar to read/write the payloads (I don't see a way to wrap/delegate these at this point because AttributeSource has the attribute retrieval related methods as final so I can't simply wrap another tokenizer and return my added attributes + the wrapped attributes). I know my use case is a bit strange, but I had not expected to need to do this given that Lucene/Solr supports payloads on these field types, they just aren't exposed. As always I appreciate any ideas if I'm barking up the wrong tree here. On Tue, Aug 25, 2015 at 2:52 PM, Markus Jelsma markus.jel...@openindex.io wrote: Well, if i remember correctly (i have no testing facility at hand) WordDelimiterFilter maintains payloads on emitted sub terms. So if you use a KeywordTokenizer, input 'some text^PAYLOAD', and have a DelimitedPayloadFilter, the entire string gets a payload. You can then split that string up again in individual tokens. It is possible to abuse WordDelimiterFilter for it because it has a types parameter that you can use to split it on whitespace if its input is not trimmed. Otherwise you can use any other character instead of a space as your input. This is a crazy idea, but it might work. 
-Original message- From:Jamie Johnson jej2...@gmail.com Sent: Tuesday 25th August 2015 19:37 To: solr-user@lucene.apache.org Subject: Re: Tokenizers and DelimitedPayloadTokenFilterFactory To be clear, we are using payloads as a way to attach authorizations to individual tokens within Solr. The payloads are normal Solr Payloads though we are not using floats, we are using the identity payload encoder (org.apache.lucene.analysis.payloads.IdentityEncoder) which allows for storing a byte[] of our choosing into the payload field. This works great for text, but now that I'm indexing more than just text I need a way to specify the payload on the other field types. Does that make more sense? On Tue, Aug 25, 2015 at 12:52 PM, Erick Erickson erickerick...@gmail.com wrote: This really sounds like an XY problem. Or when you use
Re: Lucene/Solr 5.0 and custom FieldCache implementation
I had seen this as well; if I overrode this by extending SolrIndexSearcher how do I have my extension used? I didn't see a way that could be plugged in. On Aug 25, 2015 7:15 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: On Tue, Aug 25, 2015 at 2:03 PM, Jamie Johnson jej2...@gmail.com wrote: Thanks Mikhail. If I'm reading the SimpleFacets class correctly, it delegates to DocValuesFacets when the facet method is FC, what used to be FieldCache I believe. DocValuesFacets either uses DocValues or builds them using the UninvertingReader. Ah, got it. Thanks for the reminder on those details. It seems like even docValues=true doesn't help with your custom implementation. I am not seeing a clean extension point to add a custom UninvertingReader to Solr, would the only way be to copy the FacetComponent and SimpleFacets and modify as needed? Sadly, yes. There is no proper extension point. Also, consider overriding SolrIndexSearcher.wrapReader(SolrCore, DirectoryReader) where the particular UninvertingReader is created; there you can pass your own one, which refers to a custom FieldCache. On Aug 25, 2015 12:42 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello Jamie, I don't understand how it could choose DocValuesFacets (it occurs on docValues=true) field, but then switches to UninvertingReader/FieldCache which means docValues=false. If you can provide more details it would be great. Besides that, I suppose you can only implement and inject your own UninvertingReader, I don't think there is an extension point for this. It's too specific a requirement. On Tue, Aug 25, 2015 at 3:50 AM, Jamie Johnson jej2...@gmail.com wrote: as mentioned in a previous email I have a need to provide security controls at the term level. I know that Lucene/Solr doesn't support this so I had baked something onto a 4.x baseline that was sufficient for my use cases. I am now looking to move that implementation to 5.x and am running into an issue around faceting. 
Previously we were able to provide a custom cache implementation that would create separate cache entries given a particular set of security controls, but in Solr 5 some faceting is delegated to DocValuesFacets which delegates to UninvertingReader in my case (we are not storing DocValues). The issue I am running into is that before 5.x I had the ability to influence the FieldCache that was used at the Solr level to also include a security token into the key so each cache entry was scoped to a particular level. With the current implementation the FieldCache seems to be an internal detail that I can't influence in any way. Is this correct? I had noticed this Jira ticket https://issues.apache.org/jira/browse/LUCENE-5427, is there any movement on this? Is there another way to influence the information that is put into these caches? As always thanks in advance for any suggestions. -Jamie -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: how to prevent uuid-field changing in /update query?
It sounds like you need to control when the uuid is and is not created; it just feels like you'd get better mileage doing this outside of Solr On Aug 25, 2015 7:49 AM, CrazyDiamond crazy_diam...@mail.ru wrote: Why not generate the uuid client side on the initial save and reuse this on updates? I can't do this because I have delta-import queries which also should be able to assign the uuid when it is needed -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-prevent-uuid-field-changing-in-update-query-tp4225113p4225137.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to prevent uuid-field changing in /update query?
Why not generate the uuid client side on the initial save and reuse this on updates? On Aug 25, 2015 4:22 AM, CrazyDiamond crazy_diam...@mail.ru wrote: I have a uuid field. It is not set as unique, but nevertheless I want it not to be changed every time I call /update. It might be because I added a request handler named /update which contains a uuid update chain. But if I don't do this I have no uuid at all. Maybe I can configure the uuid update-chain to set the uuid only if it is blank? -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-prevent-uuid-field-changing-in-update-query-tp4225113.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to prevent uuid-field changing in /update query?
I am honestly not familiar enough to say. Best to try it. On Aug 25, 2015 7:59 AM, CrazyDiamond crazy_diam...@mail.ru wrote: It sounds like you need to control when the uuid is and is not created; it just feels like you'd get better mileage doing this outside of Solr Can I simply insert a condition (blank or not) in the uuid update-chain? -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-prevent-uuid-field-changing-in-update-query-tp4225113p4225141.html Sent from the Solr - User mailing list archive at Nabble.com.
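The "only when blank" rule being discussed in this thread boils down to a few lines of logic. The sketch below is illustrative only — in Solr the conditional would live in an update request processor in the chain, not in a standalone class like this:

```java
import java.util.UUID;

// Sketch of the rule: keep an existing uuid untouched across /update calls,
// and mint a new one only when the incoming document has a blank value.
public class UuidIfBlank {
    public static String ensureUuid(String existing) {
        if (existing == null || existing.trim().isEmpty()) {
            return UUID.randomUUID().toString(); // blank: generate a fresh id
        }
        return existing; // non-blank: never overwrite
    }
}
```

Note that with a full-document replace the old uuid never arrives in the new document, so the conditional alone does not preserve it; the old value has to be carried forward (e.g. by the client or an atomic update) for this check to help.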
Re: Lucene/Solr 5.0 and custom FieldCache implementation
Thanks Mikhail. If I'm reading the SimpleFacets class correctly, it delegates to DocValuesFacets when the facet method is FC, what used to be FieldCache I believe. DocValuesFacets either uses DocValues or builds them using the UninvertingReader. I am not seeing a clean extension point to add a custom UninvertingReader to Solr, would the only way be to copy the FacetComponent and SimpleFacets and modify as needed? On Aug 25, 2015 12:42 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello Jamie, I don't understand how it could choose DocValuesFacets (it occurs on docValues=true) field, but then switches to UninvertingReader/FieldCache which means docValues=false. If you can provide more details it would be great. Besides that, I suppose you can only implement and inject your own UninvertingReader, I don't think there is an extension point for this. It's too specific a requirement. On Tue, Aug 25, 2015 at 3:50 AM, Jamie Johnson jej2...@gmail.com wrote: as mentioned in a previous email I have a need to provide security controls at the term level. I know that Lucene/Solr doesn't support this so I had baked something onto a 4.x baseline that was sufficient for my use cases. I am now looking to move that implementation to 5.x and am running into an issue around faceting. Previously we were able to provide a custom cache implementation that would create separate cache entries given a particular set of security controls, but in Solr 5 some faceting is delegated to DocValuesFacets which delegates to UninvertingReader in my case (we are not storing DocValues). The issue I am running into is that before 5.x I had the ability to influence the FieldCache that was used at the Solr level to also include a security token into the key so each cache entry was scoped to a particular level. With the current implementation the FieldCache seems to be an internal detail that I can't influence in any way. Is this correct? 
I had noticed this Jira ticket https://issues.apache.org/jira/browse/LUCENE-5427, is there any movement on this? Is there another way to influence the information that is put into these caches? As always thanks in advance for any suggestions. -Jamie -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Tokenizers and DelimitedPayloadTokenFilterFactory
Right, I had assumed (obviously here is my problem) that I'd be able to specify payloads for the field regardless of the field type. Looking at TrieField that is certainly non-trivial. After a bit of digging it appears that if I wanted to do something here I'd need to build a new TrieField, override createField and provide a Field that would return something like NumericTokenStream but also provide the payloads. Like you said sounds interesting to say the least... Were payloads not really intended to be used for these types of fields from a Lucene perspective? On Tue, Aug 25, 2015 at 6:29 PM, Erick Erickson erickerick...@gmail.com wrote: Well, you're going down a path that hasn't been trodden before ;). If you can treat your primitive types as text types you might get some traction, but that makes a lot of operations like numeric comparison difficult. Hmmm, another idea from left field. For single-valued types, what about a sidecar field that has the auth token? And even for a multiValued field, two parallel fields are guaranteed to maintain order so perhaps you could do something here. Yes, I'm waving my hands a LOT here. I suspect that trying to have a custom type that incorporates payloads for, say, trie fields will be interesting to say the least. Numeric types are packed to save storage etc. so it'll be an adventure... Best, Erick On Tue, Aug 25, 2015 at 2:43 PM, Jamie Johnson jej2...@gmail.com wrote: We were originally using this approach, i.e. run things through the KeywordTokenizer -> DelimitedPayloadFilter -> WordDelimiterFilter. Again this works fine for text, though I had wanted to use the StandardTokenizer in the chain. Is there an equivalent filter that does what the StandardTokenizer does? All of this said this doesn't address the issue of the primitive field types, which at this point is the bigger issue. Given this use case should there be another way to provide payloads? 
My current thinking is that I will need to provide custom implementations for all of the field types I would like to support payloads on which will essentially be copies of the standard versions with some extra sugar to read/write the payloads (I don't see a way to wrap/delegate these at this point because AttributeSource has the attribute retrieval related methods as final so I can't simply wrap another tokenizer and return my added attributes + the wrapped attributes). I know my use case is a bit strange, but I had not expected to need to do this given that Lucene/Solr supports payloads on these field types, they just aren't exposed. As always I appreciate any ideas if I'm barking up the wrong tree here. On Tue, Aug 25, 2015 at 2:52 PM, Markus Jelsma markus.jel...@openindex.io wrote: Well, if i remember correctly (i have no testing facility at hand) WordDelimiterFilter maintains payloads on emitted sub terms. So if you use a KeywordTokenizer, input 'some text^PAYLOAD', and have a DelimitedPayloadFilter, the entire string gets a payload. You can then split that string up again in individual tokens. It is possible to abuse WordDelimiterFilter for it because it has a types parameter that you can use to split it on whitespace if its input is not trimmed. Otherwise you can use any other character instead of a space as your input. This is a crazy idea, but it might work. -Original message- From:Jamie Johnson jej2...@gmail.com Sent: Tuesday 25th August 2015 19:37 To: solr-user@lucene.apache.org Subject: Re: Tokenizers and DelimitedPayloadTokenFilterFactory To be clear, we are using payloads as a way to attach authorizations to individual tokens within Solr. The payloads are normal Solr Payloads though we are not using floats, we are using the identity payload encoder (org.apache.lucene.analysis.payloads.IdentityEncoder) which allows for storing a byte[] of our choosing into the payload field. 
This works great for text, but now that I'm indexing more than just text I need a way to specify the payload on the other field types. Does that make more sense? On Tue, Aug 25, 2015 at 12:52 PM, Erick Erickson erickerick...@gmail.com wrote: This really sounds like an XY problem. Or when you use payload it's not the Solr payload. So Solr Payloads are a float value that you can attach to individual terms to influence the scoring. Attaching the _same_ payload to all terms in a field is much the same thing as boosting on any matches in the field at query time or boosting on the field at index time (this latter assuming that different docs would have different boosts). So can you back up a bit and tell us what you're trying to accomplish maybe we can be sure we're both talking about the same thing ;) Best, Erick On Tue, Aug 25, 2015 at 9:09 AM, Jamie Johnson jej2...@gmail.com wrote
Re: Tokenizers and DelimitedPayloadTokenFilterFactory
We were originally using this approach, i.e. run things through the KeywordTokenizer -> DelimitedPayloadFilter -> WordDelimiterFilter. Again this works fine for text, though I had wanted to use the StandardTokenizer in the chain. Is there an equivalent filter that does what the StandardTokenizer does? All of this said this doesn't address the issue of the primitive field types, which at this point is the bigger issue. Given this use case should there be another way to provide payloads? My current thinking is that I will need to provide custom implementations for all of the field types I would like to support payloads on which will essentially be copies of the standard versions with some extra sugar to read/write the payloads (I don't see a way to wrap/delegate these at this point because AttributeSource has the attribute retrieval related methods as final so I can't simply wrap another tokenizer and return my added attributes + the wrapped attributes). I know my use case is a bit strange, but I had not expected to need to do this given that Lucene/Solr supports payloads on these field types, they just aren't exposed. As always I appreciate any ideas if I'm barking up the wrong tree here. On Tue, Aug 25, 2015 at 2:52 PM, Markus Jelsma markus.jel...@openindex.io wrote: Well, if i remember correctly (i have no testing facility at hand) WordDelimiterFilter maintains payloads on emitted sub terms. So if you use a KeywordTokenizer, input 'some text^PAYLOAD', and have a DelimitedPayloadFilter, the entire string gets a payload. You can then split that string up again in individual tokens. It is possible to abuse WordDelimiterFilter for it because it has a types parameter that you can use to split it on whitespace if its input is not trimmed. Otherwise you can use any other character instead of a space as your input. This is a crazy idea, but it might work. 
-Original message- From: Jamie Johnson jej2...@gmail.com Sent: Tuesday 25th August 2015 19:37 To: solr-user@lucene.apache.org Subject: Re: Tokenizers and DelimitedPayloadTokenFilterFactory To be clear, we are using payloads as a way to attach authorizations to individual tokens within Solr. The payloads are normal Solr Payloads though we are not using floats, we are using the identity payload encoder (org.apache.lucene.analysis.payloads.IdentityEncoder) which allows for storing a byte[] of our choosing into the payload field. This works great for text, but now that I'm indexing more than just text I need a way to specify the payload on the other field types. Does that make more sense? On Tue, Aug 25, 2015 at 12:52 PM, Erick Erickson erickerick...@gmail.com wrote: This really sounds like an XY problem. Or when you use payload it's not the Solr payload. So Solr Payloads are a float value that you can attach to individual terms to influence the scoring. Attaching the _same_ payload to all terms in a field is much the same thing as boosting on any matches in the field at query time or boosting on the field at index time (this latter assuming that different docs would have different boosts). So can you back up a bit and tell us what you're trying to accomplish so maybe we can be sure we're both talking about the same thing ;) Best, Erick On Tue, Aug 25, 2015 at 9:09 AM, Jamie Johnson jej2...@gmail.com wrote: I would like to specify a particular payload for all tokens emitted from a tokenizer, but don't see a clear way to do this. Ideally I could specify that something like the DelimitedPayloadTokenFilter be run on the entire field and then standard analysis be done on the rest of the field, so in the case that I had the following text this is a test\Foo I would like to create tokens this, is, a, test each with a payload of Foo. From what I'm seeing though only test gets the payload. Is there any way to accomplish this or will I need to implement a custom tokenizer?
Re: Disable caching
I ran into another issue that I am having trouble running to ground. My implementation on Solr 4.x worked as I expected but trying to migrate this to Solr 5.x it looks like some of the faceting is delegated to DocValuesFacets which ultimately caches things at a field level in the FieldCache.DEFAULT cache. I don't see any way to override this cache or augment the key; am I missing an extension point here? Is there another approach I should be taking in this case? On Wed, Aug 19, 2015 at 9:08 AM, Jamie Johnson jej2...@gmail.com wrote: This was my original thought. We already have the thread local so should be straight fwd to just wrap the Field name and use that as the key. Again thanks, I really appreciate the feedback On Aug 19, 2015 8:12 AM, Yonik Seeley ysee...@gmail.com wrote: On Tue, Aug 18, 2015 at 10:58 PM, Jamie Johnson jej2...@gmail.com wrote: Hmm...so I think I have things setup correctly, I have a custom QParserPlugin building a custom query that wraps the query built from the base parser and stores the user who is executing the query. I've added the username to the hashCode and equals checks so I think everything is setup properly. I ran a quick test and it definitely looks like my items are being cached now per user, which is really great. The issue that I'm running into now is that the FieldValueCache doesn't take the query into account, so the FieldValueCache is built for user a and then reused for user b, which is an issue for me. In short I'm back to my NoOpCache for FieldValues. It's great that I'm in a better spot for the others, but is there anything that can be done with FieldValues to take into account the requesting user? I guess a cache implementation that gets the user through a thread local and either wraps the original key with an object containing the user, or delegates to a per-user cache underneath. -Yonik
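Yonik's suggestion — a cache that reads the requesting user from a thread local and silently wraps the caller's key — can be sketched in plain Java. The class and method names below are hypothetical (this is not Solr's cache SPI), purely to show how per-user entries fall out of the composite key:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Illustrative sketch: the cache wraps every key with the current user, so
// user A and user B never see each other's cached values for the same field.
public class PerUserCache<V> {
    public static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    private static final class Key {
        final String user; final Object inner;
        Key(String user, Object inner) { this.user = user; this.inner = inner; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return Objects.equals(user, k.user) && Objects.equals(inner, k.inner);
        }
        @Override public int hashCode() { return Objects.hash(user, inner); }
    }

    private final Map<Key, V> map = new HashMap<>();

    public V get(Object key) { return map.get(new Key(CURRENT_USER.get(), key)); }
    public void put(Object key, V value) { map.put(new Key(CURRENT_USER.get(), key), value); }
}
```

The trade-off is memory: every distinct user gets its own copy of each uninverted field, which is exactly why the thread quickly turns to wrapping queries instead of caches where possible.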
Lucene/Solr 5.0 and custom FieldCache implementation
as mentioned in a previous email I have a need to provide security controls at the term level. I know that Lucene/Solr doesn't support this so I had baked something onto a 4.x baseline that was sufficient for my use cases. I am now looking to move that implementation to 5.x and am running into an issue around faceting. Previously we were able to provide a custom cache implementation that would create separate cache entries given a particular set of security controls, but in Solr 5 some faceting is delegated to DocValuesFacets which delegates to UninvertingReader in my case (we are not storing DocValues). The issue I am running into is that before 5.x I had the ability to influence the FieldCache that was used at the Solr level to also include a security token into the key so each cache entry was scoped to a particular level. With the current implementation the FieldCache seems to be an internal detail that I can't influence in any way. Is this correct? I had noticed this Jira ticket https://issues.apache.org/jira/browse/LUCENE-5427, is there any movement on this? Is there another way to influence the information that is put into these caches? As always thanks in advance for any suggestions. -Jamie
Re: Geospatial Predicate Question
Thanks for the clarification! On Aug 19, 2015 3:05 PM, david.w.smi...@gmail.com david.w.smi...@gmail.com wrote: Hi Jamie, Your understanding is inverted. The predicates can be read as: indexed shape <predicate> query shape. For indexed point data, there is almost no semantic difference between the Within and Intersects predicates. There is if the field is multi-valued and you want to ensure that all of the points for a document are within the query shape (Within predicate) versus any of them being okay (Intersects predicate). Intersects is pretty fast. The Contains predicate only makes sense for non-point indexed data. ~ David On Wed, Aug 12, 2015 at 6:02 PM Jamie Johnson jej2...@gmail.com wrote: Can someone clarify the difference between isWithin and Contains in regards to Solr's spatial support? From https://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4 I see that if you are using point data you should use Intersects, but it is not clear when to use isWithin and contains. My guess is that you use isWithin when you want to know if the query shape is within the shape that is indexed and you use contains to know if the query shape contains the indexed shape. Is that right? -- Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com
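The multi-valued point semantics David describes can be sketched with toy types (the Point/Rect classes below are illustrative, not Lucene spatial classes): Intersects matches when ANY indexed point falls inside the query shape, IsWithin only when ALL of them do.

```java
import java.util.List;

// Illustrative sketch of "indexed shape <predicate> query shape" for a
// multi-valued point field and a rectangular query shape.
public class SpatialPredicates {
    public static class Point {
        final double x, y;
        public Point(double x, double y) { this.x = x; this.y = y; }
    }
    public static class Rect {
        final double minX, minY, maxX, maxY;
        public Rect(double minX, double minY, double maxX, double maxY) {
            this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
        }
        boolean contains(Point p) {
            return p.x >= minX && p.x <= maxX && p.y >= minY && p.y <= maxY;
        }
    }
    // Intersects: at least one of the document's points is in the query shape.
    public static boolean intersects(List<Point> indexed, Rect query) {
        return indexed.stream().anyMatch(query::contains);
    }
    // IsWithin: every one of the document's points is in the query shape.
    public static boolean isWithin(List<Point> indexed, Rect query) {
        return indexed.stream().allMatch(query::contains);
    }
}
```

For a single-valued point field the two predicates collapse to the same test, which is why Intersects (the fast one) is the usual recommendation for point data.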
Re: Disable caching
This was my original thought. We already have the thread local so should be straight fwd to just wrap the Field name and use that as the key. Again thanks, I really appreciate the feedback On Aug 19, 2015 8:12 AM, Yonik Seeley ysee...@gmail.com wrote: On Tue, Aug 18, 2015 at 10:58 PM, Jamie Johnson jej2...@gmail.com wrote: Hmm...so I think I have things setup correctly, I have a custom QParserPlugin building a custom query that wraps the query built from the base parser and stores the user who is executing the query. I've added the username to the hashCode and equals checks so I think everything is setup properly. I ran a quick test and it definitely looks like my items are being cached now per user, which is really great. The issue that I'm running into now is that the FieldValueCache doesn't take the query into account, so the FieldValueCache is built for user a and then reused for user b, which is an issue for me. In short I'm back to my NoOpCache for FieldValues. It's great that I'm in a better spot for the others, but is there anything that can be done with FieldValues to take into account the requesting user? I guess a cache implementation that gets the user through a thread local and either wraps the original key with an object containing the user, or delegates to a per-user cache underneath. -Yonik
Re: Disable caching
Hmm...so I think I have things setup correctly, I have a custom QParserPlugin building a custom query that wraps the query built from the base parser and stores the user who is executing the query. I've added the username to the hashCode and equals checks so I think everything is setup properly. I ran a quick test and it definitely looks like my items are being cached now per user, which is really great. The issue that I'm running into now is that the FieldValueCache doesn't take the query into account, so the FieldValueCache is built for user a and then reused for user b, which is an issue for me. In short I'm back to my NoOpCache for FieldValues. It's great that I'm in a better spot for the others, but is there anything that can be done with FieldValues to take into account the requesting user? On Tue, Aug 18, 2015 at 9:59 PM, Yonik Seeley ysee...@gmail.com wrote: On Tue, Aug 18, 2015 at 9:51 PM, Jamie Johnson jej2...@gmail.com wrote: Thanks, I'll try to delve into this. We are currently using the parent query parser, within which we could use {!secure}, I think. Ultimately I would want the solr qparser to actually do the work of parsing and I'd just wrap that. Right... look at something like BoostQParserPlugin; it should be trivial to wrap any other type of query. baseParser = subQuery(localParams.get(QueryParsing.V), null); Query q = baseParser.getQuery(); q={!secure}my_normal_query OR q={!secure v=$qq}&qq=my_normal_query OR q={!secure}{!parent ...} OR q={!secure v=$qq}&qq={!parent ...} -Yonik Are there any examples that I could look at for this? It's not clear to me what to do in the qparser once I have the user auths though. Again thanks, this is really good stuff. On Aug 18, 2015 8:54 PM, Yonik Seeley ysee...@gmail.com wrote: On Tue, Aug 18, 2015 at 8:38 PM, Jamie Johnson jej2...@gmail.com wrote: I really like this idea in concept. My query would literally be just a wrapper at that point, what would be the appropriate place to do this? 
It depends on how much you are trying to make everything transparent (that there is security) or not. First approach is explicitly changing the query types (you obviously need to make sure that only trusted code can run queries against solr for this method): q=foo:bar&fq=inStock:true q={!secure id=user}foo:bar&fq={!secure id=user}inStock:true you could even make the {!secure} qparser look for global security params so you don't need to repeat them. q={!secure}foo:bar&fq={!secure}inStock:true&security_id=user Second approach would prob involve a search component, probably one that runs after the query component, that would handle wrapping any queries or filters in the prepare() phase. This would be slightly more difficult since it would require ensuring that none of the solr code / features you use re-grab the q or fq parameters and re-parse them without the opportunity for you to wrap them again. What would I need to do to the query to make it behave with the cache? Probably not much... record the credentials in the wrapper and use in the hashCode / equals. -Yonik Again thanks for the idea, I think this could be a simple way to use the caches. On Tue, Aug 18, 2015 at 8:31 PM, Yonik Seeley ysee...@gmail.com wrote: On Tue, Aug 18, 2015 at 8:19 PM, Jamie Johnson jej2...@gmail.com wrote: when you say a security filter, are you asking if I can express my security constraint as a query? If that is the case then the answer is no. At this point I have a requirement to secure Terms (a nightmare I know). Heh - ok, I figured as much. So... you could also wrap the main query and any filter queries in a custom security query that would contain the user, and thus still be able to use filter and query caches unmodified. I know... that's only a small part of the problem though. -Yonik
Re: Disable caching
when you say a security filter, are you asking if I can express my security constraint as a query? If that is the case then the answer is no. At this point I have a requirement to secure Terms (a nightmare I know). Our fallback is to aggregate the authorizations to a document level and secure the document, in which case I think we wouldn't have to do anything to the caches, but our customer has pushed back on this in the past. On Tue, Aug 18, 2015 at 7:46 PM, Yonik Seeley ysee...@gmail.com wrote: On Tue, Aug 18, 2015 at 7:11 PM, Jamie Johnson jej2...@gmail.com wrote: Yes, my use case is security. Basically I am executing queries with certain auths and when they are executed multiple times with differing auths I'm getting cached results. If it's just simple stuff like top N docs returned, can't you just use a security filter? The queryResult cache uses both the main query and a list of filters (and the sort order) for the cache key. -Yonik
Re: Disable caching
Thanks, I'll try to delve into this. We are currently using the parent query parser, within which we could use {!secure} I think. Ultimately I would want the solr qparser to actually do the work of parsing and I'd just wrap that. Are there any examples that I could look at for this? It's not clear to me what to do in the qparser once I have the user auths though. Again thanks, this is really good stuff. On Aug 18, 2015 8:54 PM, Yonik Seeley ysee...@gmail.com wrote: On Tue, Aug 18, 2015 at 8:38 PM, Jamie Johnson jej2...@gmail.com wrote: I really like this idea in concept. My query would literally be just a wrapper at that point, what would be the appropriate place to do this? It depends on how much you are trying to make everything transparent (that there is security) or not. First approach is explicitly changing the query types (you obviously need to make sure that only trusted code can run queries against solr for this method): q=foo:bar&fq=inStock:true q={!secure id=user}foo:bar&fq={!secure id=user}inStock:true you could even make the {!secure} qparser look for global security params so you don't need to repeat them: q={!secure}foo:bar&fq={!secure}inStock:true&security_id=user Second approach would prob involve a search component, probably one that runs after the query component, that would handle wrapping any queries or filters in the prepare() phase. This would be slightly more difficult since it would require ensuring that none of the solr code / features you use re-grab the q or fq parameters and re-parse them without the opportunity for you to wrap them again. What would I need to do to the query to make it behave with the cache? Probably not much... record the credentials in the wrapper and use them in the hashCode / equals. -Yonik Again thanks for the idea, I think this could be a simple way to use the caches. 
On Tue, Aug 18, 2015 at 8:31 PM, Yonik Seeley ysee...@gmail.com wrote: On Tue, Aug 18, 2015 at 8:19 PM, Jamie Johnson jej2...@gmail.com wrote: when you say a security filter, are you asking if I can express my security constraint as a query? If that is the case then the answer is no. At this point I have a requirement to secure Terms (a nightmare I know). Heh - ok, I figured as much. So... you could also wrap the main query and any filter queries in a custom security query that would contain the user, and thus still be able to use filter and query caches unmodified. I know... that's only a small part of the problem though. -Yonik
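Yonik's "record the credentials in the wrapper and use them in the hashCode / equals" suggestion can be sketched in plain Java. This is not Solr code: SecureQueryKey and its fields are illustrative names standing in for a wrapper Query, but the principle is the same -- Solr's query and filter caches key on the query object, so an equals()/hashCode() that folds in the credentials keeps per-user entries distinct:

```java
import java.util.Objects;

// Hypothetical sketch: a wrapper carrying the user's credentials, folding
// them into equals()/hashCode() so a cache keyed on the query object
// naturally keeps per-user entries separate.
public class SecureQueryKey {
    private final String wrappedQuery;  // stands in for the wrapped Lucene Query
    private final String credentials;   // the user's auths

    public SecureQueryKey(String wrappedQuery, String credentials) {
        this.wrappedQuery = wrappedQuery;
        this.credentials = credentials;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SecureQueryKey)) return false;
        SecureQueryKey other = (SecureQueryKey) o;
        // Two cache keys match only when BOTH the query and the credentials match.
        return wrappedQuery.equals(other.wrappedQuery)
            && credentials.equals(other.credentials);
    }

    @Override
    public int hashCode() {
        return Objects.hash(wrappedQuery, credentials);
    }

    public static void main(String[] args) {
        SecureQueryKey a = new SecureQueryKey("foo:bar", "user-alice");
        SecureQueryKey b = new SecureQueryKey("foo:bar", "user-bob");
        SecureQueryKey c = new SecureQueryKey("foo:bar", "user-alice");
        System.out.println(a.equals(b)); // false: same query, different auths
        System.out.println(a.equals(c)); // true: same query, same auths
    }
}
```

In a real wrapper Query the delegate's equals()/hashCode() would take the place of the string comparison here.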
Disable caching
I see that if Solr is in realtime mode that caching is disabled within the SolrIndexSearcher that is created in SolrCore, but is there any way to disable caching without being in realtime mode? Currently I'm implementing a NoOp cache that implements SolrCache but returns null for everything and doesn't return anything on get requests, but it would be nice to not need to do this by being able to disable caching in general. Is this possible? -Jamie
Re: Disable caching
Yes, my use case is security. Basically I am executing queries with certain auths and when they are executed multiple times with differing auths I'm getting cached results. One option is to have another implementation that has a number of caches based on the auths, something that I suspect we will at some point do (unless there is a better solution ;). I'd be happy to look at other options so all suggestions are appreciated. On Tue, Aug 18, 2015 at 6:56 PM, Yonik Seeley ysee...@gmail.com wrote: You can comment out (some) of the caches. There are some caches like field caches that are more at the lucene level and can't be disabled. Can I ask what you are trying to prevent from being cached and why? Different caches are for different things, so it would seem to be an odd use case to disable them all. Security? -Yonik On Tue, Aug 18, 2015 at 6:52 PM, Jamie Johnson jej2...@gmail.com wrote: I see that if Solr is in realtime mode that caching is disabled within the SolrIndexSearcher that is created in SolrCore, but is there any way to disable caching without being in realtime mode? Currently I'm implementing a NoOp cache that implements SolrCache but returns null for everything and doesn't return anything on get requests, but it would be nice to not need to do this by being able to disable caching in general. Is this possible? -Jamie
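The stale-results symptom described here is easy to reproduce with a plain HashMap standing in for Solr's result cache (illustrative only, no Solr APIs): keyed on the query string alone, the second user is handed the entry computed under the first user's auths; folding the auths into the key keeps them separate.

```java
import java.util.HashMap;
import java.util.Map;

public class AuthCacheDemo {
    // Pretend "search" filters docs by the user's auth before returning results.
    public static String search(String query, String auth) {
        return "results-for-" + auth;
    }

    public static void main(String[] args) {
        // Cache keyed on query alone: the second user gets a leaked, stale entry.
        Map<String, String> cache = new HashMap<>();
        cache.computeIfAbsent("foo:bar", q -> search(q, "alice"));
        String second = cache.computeIfAbsent("foo:bar", q -> search(q, "bob"));
        System.out.println(second); // results-for-alice -- wrong for bob

        // Cache keyed on (query, auth): each user sees their own entry.
        Map<String, String> perUser = new HashMap<>();
        String a = perUser.computeIfAbsent("foo:bar|alice", k -> search("foo:bar", "alice"));
        String b = perUser.computeIfAbsent("foo:bar|bob", k -> search("foo:bar", "bob"));
        System.out.println(a.equals(b)); // false: no cross-user sharing
    }
}
```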
Re: Disable caching
I really like this idea in concept. My query would literally be just a wrapper at that point; what would be the appropriate place to do this? What would I need to do to the query to make it behave with the cache? Again thanks for the idea, I think this could be a simple way to use the caches. On Tue, Aug 18, 2015 at 8:31 PM, Yonik Seeley ysee...@gmail.com wrote: On Tue, Aug 18, 2015 at 8:19 PM, Jamie Johnson jej2...@gmail.com wrote: when you say a security filter, are you asking if I can express my security constraint as a query? If that is the case then the answer is no. At this point I have a requirement to secure Terms (a nightmare I know). Heh - ok, I figured as much. So... you could also wrap the main query and any filter queries in a custom security query that would contain the user, and thus still be able to use filter and query caches unmodified. I know... that's only a small part of the problem though. -Yonik
Re: phonetic filter factory question
Thanks, i didn't know you could do this, I'll check this out. On Aug 15, 2015 12:54 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: From the teaching to fish category of advice (since I don't know the actual answer). Did you try Analysis screen in the Admin UI? If you check Verbose output mark, you will see all the offsets and can easily confirm the detailed behavior for yourself. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 15 August 2015 at 12:22, Jamie Johnson jej2...@gmail.com wrote: The JavaDoc says that the PhoneticFilterFactory will inject tokens with an offset of 0 into the stream. I'm assuming this means an offset of 0 from the token that it is analyzing, is that right? I am trying to collapse some of my schema, I currently have a text field that I use for general purpose text and another field with the PhoneticFilterFactory applied for finding things that are similar phonetically, but if this does inject at the current position then I could likely collapse these into a single field. As always thanks in advance! -Jamie
phonetic filter factory question
The JavaDoc says that the PhoneticFilterFactory will inject tokens with an offset of 0 into the stream. I'm assuming this means an offset of 0 from the token that it is analyzing, is that right? I am trying to collapse some of my schema: I currently have a text field that I use for general purpose text and another field with the PhoneticFilterFactory applied for finding things that are similar phonetically, but if this does inject at the current position then I could likely collapse these into a single field. As always thanks in advance! -Jamie
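On the posInc question being asked here: a "position increment of 0" means the injected token does not advance the position, so it stacks on top of the token it encodes. A plain-Java model of how position increments resolve to absolute positions (no Lucene APIs; the token "SM0" is a made-up phonetic code, not the output of any real encoder):

```java
import java.util.ArrayList;
import java.util.List;

public class PosIncDemo {
    public static class Token {
        public final String term;
        public final int posInc;
        public Token(String term, int posInc) { this.term = term; this.posInc = posInc; }
    }

    // Resolve each token's position increment into an absolute position.
    public static int[] positions(List<Token> stream) {
        int[] out = new int[stream.size()];
        int pos = -1;
        for (int i = 0; i < stream.size(); i++) {
            pos += stream.get(i).posInc;
            out[i] = pos;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> stream = new ArrayList<>();
        stream.add(new Token("john", 1));  // position 0
        stream.add(new Token("smith", 1)); // position 1
        stream.add(new Token("SM0", 0));   // injected phonetic token, also position 1
        int[] pos = positions(stream);
        System.out.println(pos[1] == pos[2]); // true: original and phonetic share a position
    }
}
```

Because the original and its phonetic form share a position, phrase queries over either form line up the same way -- which is what makes collapsing the two fields plausible.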
Geospatial Predicate Question
Can someone clarify the difference between IsWithin and Contains with regard to Solr's spatial support? From https://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4 I see that if you are using point data you should use Intersects, but it is not clear when to use IsWithin and Contains. My guess is that you use IsWithin when you want to know if the query shape is within the shape that is indexed, and you use Contains to know if the query shape contains the indexed shape. Is that right?
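For what it's worth, the Lucene-spatial predicates are usually described from the indexed shape's point of view: IsWithin matches documents whose indexed shape sits inside the query shape, and Contains matches documents whose indexed shape contains the query shape. A plain-Java sketch using axis-aligned rectangles (not Solr code; every name here is illustrative):

```java
public class SpatialPredicateDemo {
    public static class Rect {
        final double minX, minY, maxX, maxY;
        public Rect(double minX, double minY, double maxX, double maxY) {
            this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
        }
        public boolean contains(Rect o) {
            return minX <= o.minX && minY <= o.minY && maxX >= o.maxX && maxY >= o.maxY;
        }
    }

    // A doc matches IsWithin(query) when the query shape contains the indexed shape.
    public static boolean isWithin(Rect indexed, Rect query) {
        return query.contains(indexed);
    }

    // A doc matches Contains(query) when the indexed shape contains the query shape.
    public static boolean containsPred(Rect indexed, Rect query) {
        return indexed.contains(query);
    }

    public static void main(String[] args) {
        Rect indexed = new Rect(2, 2, 4, 4);  // shape stored in the index
        Rect query = new Rect(0, 0, 10, 10);  // shape in the spatial query
        System.out.println(isWithin(indexed, query));     // true: doc lies inside the query box
        System.out.println(containsPred(indexed, query)); // false: doc does not enclose it
    }
}
```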
Re: Filtering documents using payloads
Looks like my issue is that my nextDoc call is consuming the first position, and then on the call to nextPosition it's moving past where I want it to be. I believe that I have this working properly now by checking if the current position should or shouldn't be incremented. On Thu, Aug 6, 2015 at 7:35 PM, Jamie Johnson jej2...@gmail.com wrote: I am attempting to put together a DocsAndPositionsEnum that can hide terms given the payload on the term. The idea is that if a term has a particular access control and the user does not I don't want it to be visible. I have based this off of https://github.com/roshanp/lucure-core/blob/master/src/main/java/com/lucure/core/codec/AccessFilteredDocsAndPositionsEnum.java with some modifications to try and preserve the position information that is consumed as part of the hasAccess method. The current iteration that I am working with seems to be providing wrong positions to the ExactPhraseScorer.phraseFreq() method in the ChunkState's that are calculated. Below is my current iteration on this, but I can't narrow down why exactly the position information isn't what I expect. Does anything jump out? 
package com.lucure.core.codec;

import com.lucure.core.AuthorizationsHolder;
import com.lucure.core.security.Authorizations;
import com.lucure.core.security.FieldVisibility;
import com.lucure.core.security.VisibilityParseException;
import org.apache.lucene.index.DocsAndPositionsEnum;
import org.apache.lucene.util.AttributeSource;
import org.apache.lucene.util.BytesRef;

import java.io.IOException;
import java.util.Arrays;

import static com.lucure.core.codec.AccessFilteredDocsAndPositionsEnum.AllAuthorizationsHolder.ALLAUTHSHOLDER;

/**
 * Enum to read and restrict access to a document based on the payload which
 * is expected to store the visibility
 */
public class AccessFilteredDocsAndPositionsEnum extends DocsAndPositionsEnum {

    /**
     * This placeholder allows for lucene specific operations such as
     * merge to read data with all authorizations enabled. This should never
     * be used outside of the Codec.
     */
    static class AllAuthorizationsHolder extends AuthorizationsHolder {
        static final AllAuthorizationsHolder ALLAUTHSHOLDER = new AllAuthorizationsHolder();

        private AllAuthorizationsHolder() {
            super(Authorizations.EMPTY);
        }
    }

    static void enableMergeAuthorizations() {
        AuthorizationsHolder.threadAuthorizations.set(ALLAUTHSHOLDER);
    }

    static void disableMergeAuthorizations() {
        AuthorizationsHolder.threadAuthorizations.remove();
    }

    private final DocsAndPositionsEnum docsAndPositionsEnum;
    private final AuthorizationsHolder authorizationsHolder;

    public AccessFilteredDocsAndPositionsEnum(DocsAndPositionsEnum docsAndPositionsEnum) {
        this(docsAndPositionsEnum, AuthorizationsHolder.threadAuthorizations.get());
    }

    public AccessFilteredDocsAndPositionsEnum(DocsAndPositionsEnum docsAndPositionsEnum,
                                              AuthorizationsHolder authorizationsHolder) {
        this.docsAndPositionsEnum = docsAndPositionsEnum;
        this.authorizationsHolder = authorizationsHolder;
    }

    long cost;
    int endOffset, startOffset, currentPosition, freq, docId;
    BytesRef payload;

    @Override
    public int nextPosition() throws IOException {
        while (!hasAccess()) {
        }
        return currentPosition + 1;
    }

    @Override
    public int startOffset() throws IOException {
        return startOffset;
    }

    @Override
    public int endOffset() throws IOException {
        return endOffset;
    }

    @Override
    public BytesRef getPayload() throws IOException {
        return payload;
    }

    @Override
    public int freq() throws IOException {
        return docsAndPositionsEnum.freq();
    }

    @Override
    public int docID() {
        return docsAndPositionsEnum.docID();
    }

    @Override
    public int nextDoc() throws IOException {
        while (docsAndPositionsEnum.nextDoc() != NO_MORE_DOCS) {
            if (hasAccess()) {
                return docID();
            }
        }
        return NO_MORE_DOCS;
    }

    @Override
    public int advance(int target) throws IOException {
        int advance = docsAndPositionsEnum.advance(target);
        if (advance != NO_MORE_DOCS) {
            if (hasAccess()) {
                return docID();
            } else {
                // seek to next available
                int doc;
                while ((doc = nextDoc()) < target) {
                }
                return doc;
            }
        }
        return NO_MORE_DOCS;
    }

    @Override
    public long cost() {
        return docsAndPositionsEnum.cost();
    }

    protected boolean hasAccess() throws IOException
Filtering documents using payloads
I am attempting to put together a DocsAndPositionsEnum that can hide terms given the payload on the term. The idea is that if a term has a particular access control and the user does not I don't want it to be visible. I have based this off of https://github.com/roshanp/lucure-core/blob/master/src/main/java/com/lucure/core/codec/AccessFilteredDocsAndPositionsEnum.java with some modifications to try and preserve the position information that is consumed as part of the hasAccess method. The current iteration that I am working with seems to be providing wrong positions to the ExactPhraseScorer.phraseFreq() method in the ChunkStates that are calculated. Below is my current iteration on this, but I can't narrow down why exactly the position information isn't what I expect. Does anything jump out?

package com.lucure.core.codec;

import com.lucure.core.AuthorizationsHolder;
import com.lucure.core.security.Authorizations;
import com.lucure.core.security.FieldVisibility;
import com.lucure.core.security.VisibilityParseException;
import org.apache.lucene.index.DocsAndPositionsEnum;
import org.apache.lucene.util.AttributeSource;
import org.apache.lucene.util.BytesRef;

import java.io.IOException;
import java.util.Arrays;

import static com.lucure.core.codec.AccessFilteredDocsAndPositionsEnum.AllAuthorizationsHolder.ALLAUTHSHOLDER;

/**
 * Enum to read and restrict access to a document based on the payload which
 * is expected to store the visibility
 */
public class AccessFilteredDocsAndPositionsEnum extends DocsAndPositionsEnum {

    /**
     * This placeholder allows for lucene specific operations such as
     * merge to read data with all authorizations enabled. This should never
     * be used outside of the Codec.
     */
    static class AllAuthorizationsHolder extends AuthorizationsHolder {
        static final AllAuthorizationsHolder ALLAUTHSHOLDER = new AllAuthorizationsHolder();

        private AllAuthorizationsHolder() {
            super(Authorizations.EMPTY);
        }
    }

    static void enableMergeAuthorizations() {
        AuthorizationsHolder.threadAuthorizations.set(ALLAUTHSHOLDER);
    }

    static void disableMergeAuthorizations() {
        AuthorizationsHolder.threadAuthorizations.remove();
    }

    private final DocsAndPositionsEnum docsAndPositionsEnum;
    private final AuthorizationsHolder authorizationsHolder;

    public AccessFilteredDocsAndPositionsEnum(DocsAndPositionsEnum docsAndPositionsEnum) {
        this(docsAndPositionsEnum, AuthorizationsHolder.threadAuthorizations.get());
    }

    public AccessFilteredDocsAndPositionsEnum(DocsAndPositionsEnum docsAndPositionsEnum,
                                              AuthorizationsHolder authorizationsHolder) {
        this.docsAndPositionsEnum = docsAndPositionsEnum;
        this.authorizationsHolder = authorizationsHolder;
    }

    long cost;
    int endOffset, startOffset, currentPosition, freq, docId;
    BytesRef payload;

    @Override
    public int nextPosition() throws IOException {
        while (!hasAccess()) {
        }
        return currentPosition + 1;
    }

    @Override
    public int startOffset() throws IOException {
        return startOffset;
    }

    @Override
    public int endOffset() throws IOException {
        return endOffset;
    }

    @Override
    public BytesRef getPayload() throws IOException {
        return payload;
    }

    @Override
    public int freq() throws IOException {
        return docsAndPositionsEnum.freq();
    }

    @Override
    public int docID() {
        return docsAndPositionsEnum.docID();
    }

    @Override
    public int nextDoc() throws IOException {
        while (docsAndPositionsEnum.nextDoc() != NO_MORE_DOCS) {
            if (hasAccess()) {
                return docID();
            }
        }
        return NO_MORE_DOCS;
    }

    @Override
    public int advance(int target) throws IOException {
        int advance = docsAndPositionsEnum.advance(target);
        if (advance != NO_MORE_DOCS) {
            if (hasAccess()) {
                return docID();
            } else {
                // seek to next available
                int doc;
                while ((doc = nextDoc()) < target) {
                }
                return doc;
            }
        }
        return NO_MORE_DOCS;
    }

    @Override
    public long cost() {
        return docsAndPositionsEnum.cost();
    }

    protected boolean hasAccess() throws IOException {
        payload = docsAndPositionsEnum.getPayload();
        endOffset = docsAndPositionsEnum.endOffset();
        startOffset = docsAndPositionsEnum.startOffset();
        currentPosition = docsAndPositionsEnum.nextPosition() - 1;
        BytesRef payload = docsAndPositionsEnum.getPayload();
        try {
            if (payload == null || ALLAUTHSHOLDER.equals(authorizationsHolder)
                || this.authorizationsHolder.getVisibilityEvaluator().evaluate(
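The root cause reported in the follow-up to this thread -- nextDoc() consuming the first position while checking the payload -- can be modeled in a few lines of plain Java (no Lucene APIs; this class is illustrative only). The fix is to remember that a position was already consumed by the access check and replay it on the first nextPosition() call:

```java
import java.util.Arrays;
import java.util.List;

public class ConsumedPositionDemo {
    private final List<Integer> positions;
    private int cursor = -1;
    private boolean buffered = false; // true when the check already consumed a position

    public ConsumedPositionDemo(List<Integer> positions) {
        this.positions = positions;
    }

    // Models nextDoc() calling an access check that must advance to the
    // first position in order to read its payload.
    public boolean nextDocWithCheck() {
        cursor = 0;      // the check consumed positions.get(0)
        buffered = true; // remember it has not been handed to the caller yet
        return true;
    }

    public int nextPosition() {
        if (buffered) {  // replay the position the access check consumed
            buffered = false;
            return positions.get(cursor);
        }
        return positions.get(++cursor);
    }

    public static void main(String[] args) {
        ConsumedPositionDemo e = new ConsumedPositionDemo(Arrays.asList(3, 7, 12));
        e.nextDocWithCheck();
        System.out.println(e.nextPosition()); // 3 (without the buffer flag this would be 7)
        System.out.println(e.nextPosition()); // 7
    }
}
```

Without the buffered flag, the first nextPosition() would return 7 and a phrase scorer would see every position shifted by one, which matches the wrong-position symptom described above.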
PayloadSpanOrQuery
I have a need for using payloads in a SpanOrQuery to influence the score. I noticed that there is no PayloadSpanOrQuery, so I'd like to implement one. I couldn't find a ticket in JIRA for this so I created https://issues.apache.org/jira/browse/LUCENE-6706; if this feature already exists I will gladly close the JIRA if someone could point me in the right direction.
Specifying multiple query parsers
I have a use case where I want to use the block join query parser for the top-level query and a custom query parser for the nested portion. I was originally doing this, which worked: {!parent which='type:parent'}_query_:{!myqp df='child_pay' v='value foo'} but switched to this, which also worked: {!parent which='type:parent'}{!myqp}child_pay:value foo I have never seen this type of syntax where you can specify multiple query parsers inline; is this supposed to work or am I taking advantage of some oversight in the local params implementation?
Re: Specifying multiple query parsers
Sorry, answered my own question. For those who are interested, this is related to how BlockJoinParentQParser handles sub queries, and it looks like it's working as it should. On Wed, Jul 29, 2015 at 3:31 PM, Jamie Johnson jej2...@gmail.com wrote: I have a use case where I want to use the block join query parser for the top level query and for the nested portion a custom query parser. I was originally doing this, which worked {!parent which='type:parent'}_query_:{!myqp df='child_pay' v='value foo'} but switched to this which also worked {!parent which='type:parent'}{!myqp}child_pay:value foo I have never seen this type of syntax where you can specify multiple query parsers inline, is this supposed to work or am I taking advantage of some oversight in the local params implementation?
Re: Scoring, payloads and phrase queries
Thanks Mikhail! I had seen this but had originally thought it wouldn't be usable. That said, I think I was wrong. I have an example that rewrites a phrase query as a SpanQuery and then uses the PayloadNearQuery, which seems to work correctly. I have done something similar for MultiPhraseQuery (though I am not sure it is right, as I don't understand the usage of the positions in the class at this point). My first cut is shown below (PF is just a PayloadFunction and not of much interest). Does this look correct?

MultiPhraseQuery phrase = (MultiPhraseQuery) query;
List<Term[]> terms = phrase.getTermArrays();
SpanQuery[] topLevelSpans = new SpanQuery[terms.size()];
for (int j = 0; j < terms.size(); j++) {
    Term[] internalTerms = terms.get(j);
    SpanQuery[] sq = new SpanQuery[internalTerms.length];
    for (int i = 0; i < internalTerms.length; i++) {
        sq[i] = new SpanTermQuery(internalTerms[i]);
    }
    topLevelSpans[j] = new SpanOrQuery(sq);
}
PayloadNearQuery pnq = new PayloadNearQuery(topLevelSpans, phrase.getSlop(), true, new PF());
pnq.setBoost(phrase.getBoost());

It looks like to support payloads in all the query types I would like to support, I'll need to rewrite the queries (or their pieces) to a PayloadNearQuery or a PayloadTermQuery. Is there a PayloadMultiTermQuery that Fuzzy, Range, Wildcard, etc. types of queries could be rewritten to? Again thanks, I really appreciate the pointer. On Jul 25, 2015 5:22 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Does PayloadNearQuery suit for it? On Fri, Jul 24, 2015 at 5:41 PM, Jamie Johnson jej2...@gmail.com wrote: Is there a way to consider payloads for scoring in phrase queries like exists in PayloadTermQuery? -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Scoring, payloads and phrase queries
Is there a way to consider payloads for scoring in phrase queries like exists in PayloadTermQuery?
Re: Scoring, payloads and phrase queries
Looks like there is nothing that exists in this regard and there is no JIRA ticket that I could find. Is this something that there is any other interest in? Is this something that a ticket should be created for? On Fri, Jul 24, 2015 at 10:41 AM, Jamie Johnson jej2...@gmail.com wrote: Is there a way to consider payloads for scoring in phrase queries like exists in PayloadTermQuery?