Concurrent Updates
We have a SolrCloud cluster (of 3 nodes) running Solr 6.4.2. Every night, we delete and recreate our whole catalog. In this process, we're simultaneously running a query which recreates the product catalog (which includes child documents of a different type) and a query that creates a third document type that we use for joining. When we issue a search against one shard, we see the response we expect. But when we issue the same search against another shard, instead of the prescribed child documents, we'll have children that are of this third type of document. This seems to affect only the occasional document. We're wondering if anybody out there has experience with this, and might have some ideas as to why it is happening. Thanks so much. -- *This message is intended only for the use of the individual or entity to which it is addressed and may contain information that is privileged, confidential and exempt from disclosure under applicable law. If you have received this message in error, you are hereby notified that any use, dissemination, distribution or copying of this message is prohibited. If you have received this communication in error, please notify the sender immediately and destroy the transmitted information.*
Retrieve DocIdSet from Query in lucene 5.x
I am trying to migrate some old code that used to retrieve DocIdSets from filters, but with Filters being deprecated in Lucene 5.x I am trying to move away from those classes but I'm not sure the right way to do this now. Are there any examples of doing this?
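The common replacement in 5.x is to express the old Filter as a Query (for example as a BooleanQuery FILTER clause) and, where you really need the raw doc ids, pull them through the Query's Weight per segment: in 5.x a Scorer is itself a DocIdSetIterator. A hedged sketch follows, using method signatures from later 5.x releases (early 5.x's Weight.scorer also took an acceptDocs argument); it needs Lucene 5.x on the classpath and is not tested here:

```java
import java.io.IOException;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.Weight;

public class MatchIterationSketch {
    // Sketch: walk the matching doc ids of a Query per segment,
    // the way a Filter's DocIdSet used to be consumed.
    static void collectMatches(IndexSearcher searcher, Query query) throws IOException {
        // needsScores=false mirrors the old non-scoring filter behavior
        Weight weight = searcher.createNormalizedWeight(query, false);
        for (LeafReaderContext leaf : searcher.getIndexReader().leaves()) {
            Scorer scorer = weight.scorer(leaf);
            if (scorer == null) continue; // no matches in this segment
            DocIdSetIterator it = scorer; // Scorer extends DocIdSetIterator in 5.x
            for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
                int globalDoc = leaf.docBase + doc; // segment-relative -> index-wide id
            }
        }
    }
}
```
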
JAR errors with Solr 6.6.1, HttpClient, and HttpCore
Hi, I have the following code:

System.out.println("Initializing server");
SystemDefaultHttpClient cl = new SystemDefaultHttpClient();
client = new HttpSolrClient("http://localhost:8983/solr/#/prosp_poc_collection", cl);
System.out.println("Completed initializing the server");
client.deleteByQuery("*:*");

Solr is 6.6.1, HttpClient is 4.5.3, and HttpCore is 4.4.8. I get the following exception; please advise.

Exception Details: Location: org/apache/solr/client/solrj/impl/HttpClientUtil.createClient(Lorg/apache/solr/common/params/SolrParams;)Lorg/apache/http/impl/client/CloseableHttpClient; @57: areturn Reason: Type 'org/apache/http/impl/client/SystemDefaultHttpClient' (current frame, stack[0]) is not assignable to 'org/apache/http/impl/client/CloseableHttpClient' (from method signature) Current Frame: bci: @57 flags: { } locals: { 'org/apache/solr/common/params/SolrParams', 'org/apache/solr/common/params/ModifiableSolrParams', 'org/apache/http/impl/client/SystemDefaultHttpClient' } stack: { 'org/apache/http/impl/client/SystemDefaultHttpClient' } Bytecode: 0x000: bb00 0959 2ab7 000a 4cb2 000b b900 0c01 0x010: 0099 001e b200 0bbb 000d 59b7 000e 120f 0x020: b600 102b b600 11b6 0012 b900 1302 00b8 0x030: 0014 4d2c 2bb8 0015 2cb0 Stackmap Table: append_frame(@47,Object[#172])
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:514)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:279)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:268)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:160)
at org.apache.solr.client.solrj.SolrClient.deleteByQuery(SolrClient.java:895)
at org.apache.solr.client.solrj.SolrClient.deleteByQuery(SolrClient.java:858)
at org.apache.solr.client.solrj.SolrClient.deleteByQuery(SolrClient.java:873)
at com.moodys.poc.tika.test.PocApacheTikaSecFilings.initSolrServer(PocApacheTikaSecFilings.java:35) at
com.moodys.poc.tika.test.PocApacheTikaSecFilings.main(PocApacheTikaSecFilings.java:41) - Moody's monitors email communications through its networks for regulatory compliance purposes and to protect its customers, employees and business and where allowed to do so by applicable law. The information contained in this e-mail message, and any attachment thereto, is confidential and may not be disclosed without our express permission. If you are not the intended recipient or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution or copying of this message, or any attachment thereto, in whole or in part, is strictly prohibited. If you have received this message in error, please immediately notify us by telephone, fax or e-mail and delete the message and all of its attachments. Every effort is made to keep our network free from viruses. You should, however, review this e-mail message, as well as any attachment thereto, for viruses. We take no responsibility and have no liability for any computer virus which may be transferred via this e-mail message. -
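A VerifyError like the one above usually means an older httpclient jar (one predating 4.3, where SystemDefaultHttpClient did not yet extend CloseableHttpClient) is shadowing 4.5.3 on the classpath, so the first thing to check is for duplicate httpclient/httpcore jars. A hedged sketch of a simpler setup, assuming SolrJ 6.6.x (which can construct its own compatible client via HttpSolrClient.Builder); note the base URL should be the collection path, not the admin-UI "#" URL shown in the browser:

```java
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class SolrInit {
    public static void main(String[] args) throws SolrServerException, java.io.IOException {
        // Point at the collection itself, not the admin UI fragment ("/solr/#/...").
        HttpSolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/prosp_poc_collection").build();
        client.deleteByQuery("*:*"); // SolrJ builds a compatible HttpClient internally
        client.commit();
        client.close();
    }
}
```

This sidesteps the version clash entirely by not passing in a hand-built SystemDefaultHttpClient.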
Solr Beginner!!
Hi: I am trying to ingest a few memos. They do not have any standard format (JSON, XML, etc.), just plain text; however, the memos all follow some template. What I would like to do post-ingestion is to extract keywords and some values around them. Say, for instance, the text contains the keyword "Outstanding Amount: 1000". I can search for "Outstanding Amount" using the query interface, but how do I extract the entire string "Outstanding Amount" plus the 3 or 4 words that follow it from Solr? I am really new to Solr, so any documentation etc. would be super helpful. Also, is Solr the right tool for this use case? Thanks.
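For what it's worth, the "keyword plus the next few words" part can also be done client-side with a plain regex once Solr has found the matching memos (Solr's highlighter can return snippets around matches too). A minimal sketch; the class and method names here are made up for illustration, not Solr API:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordExtractor {
    // Hypothetical helper: returns "<keyword>: <next n words>" from a memo,
    // or null if the keyword is absent.
    static String extractAround(String text, String keyword, int followingWords) {
        // Match the keyword, an optional colon, then up to n whitespace-separated tokens.
        Pattern p = Pattern.compile(
                Pattern.quote(keyword) + "\\s*:?\\s*((?:\\S+\\s*){1," + followingWords + "})");
        Matcher m = p.matcher(text);
        return m.find() ? keyword + ": " + m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String memo = "Borrower: Acme Corp. Outstanding Amount: 1000 USD as of 2017-09-01.";
        System.out.println(extractAround(memo, "Outstanding Amount", 3));
    }
}
```

Whether Solr or a plain pipeline is the right tool depends on whether you need search over the memos or just the extraction; for extraction alone, a regex pass at ingestion time may be all you need.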
RE: Solr 7.0.0 -- can it use a 6.5.0 data repository (index)
First, thanks for the quick response. Yes, it sounds like the same problem!! I did a bunch of searching before reporting the issue; I didn't come across that JIRA or I wouldn't have reported it. My apologies for the duplication (although it is a new JIRA). Is there a good place to start searching in the future? I'm a fairly experienced Solr user, and I don't mind slogging through Java code. Meanwhile I'll follow the JIRA so I know when it gets fixed. Thanks!! -Original Message- From: Stefan Matheis [mailto:matheis.ste...@gmail.com] Sent: Wednesday, September 27, 2017 12:32 PM To: solr-user@lucene.apache.org Subject: Re: Solr 7.0.0 -- can it use a 6.5.0 data repository (index) That sounds like https://issues.apache.org/jira/browse/SOLR-11406 if i'm not mistaken? -Stefan On Sep 27, 2017 8:20 PM, "Wayne L. Johnson" <wjohn...@familysearch.org> wrote: > I’m testing Solr 7.0.0. When I start with an empty index, Solr comes > up just fine, I can add documents and query documents. However when I > start with an already-populated set of documents (from 6.5.0), Solr > will not start. The relevant portion of the traceback seems to be: > > Caused by: java.lang.NullPointerException > > at java.util.Objects.requireNonNull(Objects.java:203) > > … > > at java.util.stream.ReferencePipeline.reduce( > ReferencePipeline.java:479) > > at org.apache.solr.index.SlowCompositeReaderWrapper.<init>( > SlowCompositeReaderWrapper.java:76) > > at org.apache.solr.index.SlowCompositeReaderWrapper.wrap( > SlowCompositeReaderWrapper.java:57) > > at org.apache.solr.search.SolrIndexSearcher.<init>( > SolrIndexSearcher.java:252) > > at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java: > 2034) > > ... 
12 more > > > > In looking at the de-compiled code (SlowCompositeReaderWrapper), lines > 72-77, and it appears that one or more “leaf” files doesn’t have a > “min-version” set. That’s a guess. If so, does this mean Solr 7.0.0 > can’t read a 6.5.0 index? > > > > Thanks > > > > Wayne Johnson > > 801-240-4024 > > wjohnson...@ldschurch.org > > >
Solr 7.0.0 -- can it use a 6.5.0 data repository (index)
I'm testing Solr 7.0.0. When I start with an empty index, Solr comes up just fine; I can add documents and query documents. However, when I start with an already-populated set of documents (from 6.5.0), Solr will not start. The relevant portion of the traceback seems to be:

Caused by: java.lang.NullPointerException
at java.util.Objects.requireNonNull(Objects.java:203)
...
at java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:479)
at org.apache.solr.index.SlowCompositeReaderWrapper.<init>(SlowCompositeReaderWrapper.java:76)
at org.apache.solr.index.SlowCompositeReaderWrapper.wrap(SlowCompositeReaderWrapper.java:57)
at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:252)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2034)
... 12 more

Looking at the decompiled code (SlowCompositeReaderWrapper), lines 72-77, it appears that one or more "leaf" files don't have a "min-version" set. That's a guess. If so, does this mean Solr 7.0.0 can't read a 6.5.0 index?

Thanks

Wayne Johnson
801-240-4024
wjohnson...@ldschurch.org
Re: Custom StoredFieldVisitor in Solr
Hi Rick. The use case is that we use payloads to determine whether a particular user can or can't see a field. Right now we have the query piece working, so fields the user can't see don't contribute to the score, but we wanted to use a custom stored field visitor as well so that we can remove fields that a particular user shouldn't be able to see. On Thu, Aug 24, 2017 at 11:08 AM, Rick Leir <rl...@leirtech.com> wrote: > Jamie, what is the use case? Cheers -- Rick > > On August 23, 2017 11:30:38 AM MDT, Jamie Johnson <jej2...@gmail.com> > wrote: > >I thought I had asked this previously, but I can't find reference to it > >now. I am interested in using a custom StoredFieldVisitor in Solr and > >after spelunking through the code for a little it seems that there is > >no > >easy extension point that supports me doing so. I am currently on Solr > >4.x > >(moving forward is a long term option, but can't be done in the short > >term). The only option I see at this point is creating a forking Solr > >and > >changing the way SolrIndexSearcher currently works to provide another > >option to enable my custom StoredFieldVisitor. While I'd prefer not to > >do > >so, if it is my only option I am ok with it. > > > >Are there any suggestions for how to go about supporting this besides > >the > >above? > > > >Jamie > > -- > Sorry for being brief. Alternate email is rickleir at yahoo dot com
Custom StoredFieldVisitor in Solr
I thought I had asked this previously, but I can't find reference to it now. I am interested in using a custom StoredFieldVisitor in Solr, and after spelunking through the code for a little while it seems that there is no easy extension point that supports doing so. I am currently on Solr 4.x (moving forward is a long-term option, but can't be done in the short term). The only option I see at this point is creating a fork of Solr and changing the way SolrIndexSearcher currently works to provide another option to enable my custom StoredFieldVisitor. While I'd prefer not to do so, if it is my only option I am OK with it. Are there any suggestions for how to go about supporting this besides the above? Jamie
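For what it's worth, the visitor itself is straightforward; the hard part on 4.x is, as noted, that SolrIndexSearcher offers no hook to substitute it. A hedged sketch of the visitor piece against the Lucene 4.x API (the payload-based ACL lookup that populates allowedFields is assumed, not shown):

```java
import java.util.Set;

import org.apache.lucene.document.DocumentStoredFieldVisitor;
import org.apache.lucene.index.FieldInfo;

// Sketch: a visitor that silently skips stored fields the current user
// may not see. Wiring it into SolrIndexSearcher.doc() is the part that
// requires the fork (or a SolrIndexSearcher subclass) on 4.x.
public class AclStoredFieldVisitor extends DocumentStoredFieldVisitor {
    private final Set<String> allowedFields;

    public AclStoredFieldVisitor(Set<String> allowedFields) {
        this.allowedFields = allowedFields;
    }

    @Override
    public Status needsField(FieldInfo fieldInfo) {
        // NO skips this field but keeps visiting the rest of the document.
        return allowedFields.contains(fieldInfo.name) ? Status.YES : Status.NO;
    }
}
```
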
Re: Regex Phrases
So I managed to get the tokenizing to work with both PatternTokenizerFactory and WordDelimiterFilterFactory (used in combination with WhitespaceTokenizerFactory). For PT I used a regex that matches the various permutations of the phrases, and for WDF/WT I used protected words with every permutation (there are only 40 or 50). In both cases, via the admin/analysis screen, the Index and Query values were tokenized correctly (for example, "Super Vitamin C" was tokenized as "Super" and "Vitamin C"). However, when I do a query like "DisplayName:(Super Vitamin C)" with "debug=query", I see that the parsed query is "DisplayName:Super DisplayName:Vitamin DisplayName:C" ("DisplayName" is the field I'm working on here). Shouldn't that instead be parsed as something like "DisplayName:Super DisplayName:"Vitamin C"" or something similar? Or am I not understanding how query parsing works? In either case, I'm seeing results where DisplayName contains things like "Vitamin B 90 Caps" or "Super Orange 30 pkts", neither of which contains the phrase "Vitamin C", so I suspect something is wrong. On Thu, Mar 23, 2017 at 8:08 AM, Joel Bernstein <joels...@gmail.com> wrote: > You can also checkout > https://cwiki.apache.org/confluence/display/solr/Tokenizers#Tokenizers-RegularExpressionPatternTokenizer > . > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Wed, Mar 22, 2017 at 7:52 PM, Erick Erickson <erickerick...@gmail.com> > wrote: > > > Susheel: > > > > That'll work, but the options you've specified for > > WordDelimiterFilterFactory pretty much make it so it's doing nothing. > > I realize it's commented out... > > > > That said, it's true that if you have a very specific pattern you want > > to recognize a Regex can do the trick. WDFF is a bit more generic > > though when you have less specific requirements. 
> > > > Best, > > Erick > > On Wed, Mar 22, 2017 at 12:56 PM, Susheel Kumar <susheel2...@gmail.com> > > wrote: > > > I have used PatternReplaceFilterFactory in some of these situations. > e.g. > > > below > > > > > > <filter > > > class="solr.PatternReplaceFilterFactory" pattern="(\d+)-(\d+)-?(\d+)$" > > > replacement="$1$2$3"/> > > > > > > On Wed, Mar 22, 2017 at 2:54 PM, Mark Johnson < > > mjohn...@emersonecologics.com > > >> wrote: > > > > > >> Awesome, thank you much! > > >> > > >> On Wed, Mar 22, 2017 at 2:38 PM, Erick Erickson < > > erickerick...@gmail.com> > > >> wrote: > > >> > > >> > Take a close look at WordDelimiterFilterFactory, it's designed to > deal > > >> > with things like part numbers, phone numbers and the like, and the > > >> > example you gave is in the same class of problem I think. It'll take > > >> > a bit to get your head around what it does, but it'll perform better > > >> > than regexes, assuming you can get what you need out of it. > > >> > > > >> > And the admin/analysis page will help you _greatly_ in understanding > > >> > what the effects of the various parameters are. > > >> > > > >> > Best, > > >> > Erick > > >> > > > >> > On Wed, Mar 22, 2017 at 11:06 AM, Mark Johnson > > >> > <mjohn...@emersonecologics.com> wrote: > > >> > > Is it possible to configure Solr to treat text that matches a > regex > > as > > >> a > > >> > > phrase? > > >> > > > > >> > > I have a database full of products, and the Title and Description > > >> fields > > >> > > are text_en, tokenized via the StandardTokenizerFactory. 
This > works > > in > > >> > most > > >> > > cases, but a number of products have names like: > > >> > > > > >> > > - Vitamin A > > >> > > - Vitamin-A > > >> > > - Vitamin B12 > > >> > > - Vitamin B-12 > > >> > > ...and so on > > >> > > > > >> > > I have a regex that will match all of the permutations and would > > like > > >> to > > >> > > configure the field type so that anything that matches the regex > > >> pattern > > >> > is > > >> > > treated as a single token, instead of being broken up by spaces, >
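A side note on the parsed query observed earlier in this message: the classic query parser splits the query string on whitespace before the field analyzer ever runs, so "Super Vitamin C" reaches the analysis chain as three separate strings, and multi-word tokens only survive inside explicit quotes ("Vitamin C"). Newer Solr releases expose a sow (split-on-whitespace) parameter for this. For reference, a hedged sketch of a PatternTokenizerFactory field type along the lines discussed; the pattern shown is a stand-in for illustration, not the poster's actual permutation regex:

```xml
<fieldType name="text_phrases" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- group="0" emits each whole regex match as one token; the first
         alternative is a placeholder for the real permutation regex -->
    <tokenizer class="solr.PatternTokenizerFactory"
               pattern="Vitamin[ -]?[A-Z][0-9]*|\S+" group="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```
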
Re: Regex Phrases
Awesome, thank you much! On Wed, Mar 22, 2017 at 2:38 PM, Erick Erickson <erickerick...@gmail.com> wrote: > Take a close look at WordDelimiterFilterFactory, it's designed to deal > with things like part numbers, phone numbers and the like, and the > example you gave is in the same class of problem I think. It'll take > a bit to get your head around what it does, but it'll perform better > than regexes, assuming you can get what you need out of it. > > And the admin/analysis page will help you _greatly_ in understanding > what the effects of the various parameters are. > > Best, > Erick > > On Wed, Mar 22, 2017 at 11:06 AM, Mark Johnson > <mjohn...@emersonecologics.com> wrote: > > Is it possible to configure Solr to treat text that matches a regex as a > > phrase? > > > > I have a database full of products, and the Title and Description fields > > are text_en, tokenized via the StandardTokenizerFactory. This works in > most > > cases, but a number of products have names like: > > > > - Vitamin A > > - Vitamin-A > > - Vitamin B12 > > - Vitamin B-12 > > ...and so on > > > > I have a regex that will match all of the permutations and would like to > > configure the field type so that anything that matches the regex pattern > is > > treated as a single token, instead of being broken up by spaces, etc. Is > > that possible? > > > > -- > > *This message is intended only for the use of the individual or entity to > > which it is addressed and may contain information that is privileged, > > confidential and exempt from disclosure under applicable law. If you have > > received this message in error, you are hereby notified that any use, > > dissemination, distribution or copying of this message is prohibited. 
If > > you have received this communication in error, please notify the sender > > immediately and destroy the transmitted information.* > -- Best Regards, *Mark Johnson* | .NET Software Engineer Office: 603-392-7017 Emerson Ecologics, LLC | 1230 Elm Street | Suite 301 | Manchester NH | 03101
Regex Phrases
Is it possible to configure Solr to treat text that matches a regex as a phrase? I have a database full of products, and the Title and Description fields are text_en, tokenized via the StandardTokenizerFactory. This works in most cases, but a number of products have names like: - Vitamin A - Vitamin-A - Vitamin B12 - Vitamin B-12 ...and so on I have a regex that will match all of the permutations and would like to configure the field type so that anything that matches the regex pattern is treated as a single token, instead of being broken up by spaces, etc. Is that possible?
Re: Partial Match with DF
Thank you for the heads up! I think in some cases we will want to strip out punctuation but in others we might need it (for example, "liquid courage." should tokenize to "liquid" and "courage", while "1.5 oz liquid courage" should tokenize to "1.5", "oz", "liquid" and "courage"). I'll have to do some experimenting to see which one will work best for us. On Thu, Mar 16, 2017 at 11:09 AM, Erick Erickson <erickerick...@gmail.com> wrote: > Yeah, they've saved me on numerous occasions, glad to see they helped. > > One caution BTW when you start changing fieldTypes is you have to > watch punctuation. StandardTokenizerFactory won't pass through most > punctuation. > > WordDelimiterFilterFactory breaks on non alpha-num, including > punctuation, effectively throwing it out. > > But WhitespaceTokenizer does just that and spits out punctuation as > part of tokens, i.e. > "my words." (note period) is broken up as "my" "words." and wouldn't > match a search on "words". > > One other note, there's a tokenizer/filter for a zillion different > cases, you can go wild. Here's a partial > list: https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers%2C+Tokenizers%2C+and+Filters > see the "Tokenizer", "Filters" and "CharFilters" links. There are 12 > tokenizers listed and 40 or so filters... and the list is not > guaranteed to be complete. > > On Thu, Mar 16, 2017 at 7:39 AM, Mark Johnson > <mjohn...@emersonecologics.com> wrote: > > You're right! The fields I'm searching are all "string" type. I switched > to > > "text_en" and now it's working exactly as I need it to! I'll do some > > research to see if "text_en" or another "text" type field is best for our > > needs. > > > > Also, those debug options are amazing! They'll help tremendously in the > > future. > > > > Thank you much! > > > > On Thu, Mar 16, 2017 at 10:02 AM, Erick Erickson < > erickerick...@gmail.com> > > wrote: > > > >> My guess: Your analysis chain for the fields is different, i.e. 
they > >> have a different fieldType. In particular, watch out for the "string" > >> type, people are often confused about it. It does _not_ break input > >> into tokens, you need a text-based field type, text_en is one example > >> that is usually in the configs by default. > >> > >> Two tools that'll help you enormously: > >> > >> admin UI>>select core (or collection) from the drop-down>>analysis > >> That shows you exactly how Solr/Lucene break up text at query and index > >> time > >> > >> add =query to the URL. That'll show you how the query was parsed. > >> > >> Best, > >> Erick > >> > >> On Thu, Mar 16, 2017 at 6:52 AM, Mark Johnson > >> <mjohn...@emersonecologics.com> wrote: > >> > Oh, great! Thank you! > >> > > >> > So if I switch over to eDisMax I'd specify the fields to query via the > >> "qf" > >> > parameter, right? That seems to have the same result (only matches > when I > >> > specify the exact phrase in the field, not just certain words from > it). > >> > > >> > On Thu, Mar 16, 2017 at 9:33 AM, Alexandre Rafalovitch < > >> arafa...@gmail.com> > >> > wrote: > >> > > >> >> df is default field - you can only give one. To search over multiple > >> >> fields, you switch to eDisMax query parser and fl parameter. > >> >> > >> >> Then, the question will be what type definition your fields have. > When > >> you > >> >> search text field, you are using its definition because of copyField. > >> Your > >> >> original fields may be strings. > >> >> > >> >> Remember to reload core and reminded when you change definitions. > >> >> > >> >> Regards, > >> >>Alex > >> >> > >> >> > >> >> On 16 Mar 2017 9:15 AM, "Mark Johnson" < > mjohn...@emersonecologics.com> > >> >> wrote: > >> >> > >> >> > Forgive me if I'm missing something obvious -- I'm new to Solr, > but I > >> >> can't > >> >> > seem to find an explanation for the behavior I'm seeing. > >> >> > > >> >> > If I have a document that look
Re: Partial Match with DF
Wow, that's really powerful! Thank you! On Thu, Mar 16, 2017 at 11:19 AM, Charlie Hull <char...@flax.co.uk> wrote: > Hi Mark, > > Open Source Connection's excellent www.splainer.io might also be useful to > help you break down exactly what your query is doing. > > Cheers > > Charlie > > P.S. planning a blog soon listing 'useful Solr tools' > > On 16 March 2017 at 14:39, Mark Johnson <mjohn...@emersonecologics.com> > wrote: > > > You're right! The fields I'm searching are all "string" type. I switched > to > > "text_en" and now it's working exactly as I need it to! I'll do some > > research to see if "text_en" or another "text" type field is best for our > > needs. > > > > Also, those debug options are amazing! They'll help tremendously in the > > future. > > > > Thank you much! > > > > On Thu, Mar 16, 2017 at 10:02 AM, Erick Erickson < > erickerick...@gmail.com> > > wrote: > > > > > My guess: Your analysis chain for the fields is different, i.e. they > > > have a different fieldType. In particular, watch out for the "string" > > > type, people are often confused about it. It does _not_ break input > > > into tokens, you need a text-based field type, text_en is one example > > > that is usually in the configs by default. > > > > > > Two tools that'll help you enormously: > > > > > > admin UI>>select core (or collection) from the drop-down>>analysis > > > That shows you exactly how Solr/Lucene break up text at query and index > > > time > > > > > > add =query to the URL. That'll show you how the query was parsed. > > > > > > Best, > > > Erick > > > > > > On Thu, Mar 16, 2017 at 6:52 AM, Mark Johnson > > > <mjohn...@emersonecologics.com> wrote: > > > > Oh, great! Thank you! > > > > > > > > So if I switch over to eDisMax I'd specify the fields to query via > the > > > "qf" > > > > parameter, right? That seems to have the same result (only matches > > when I > > > > specify the exact phrase in the field, not just certain words from > it). 
> > > > > > > > On Thu, Mar 16, 2017 at 9:33 AM, Alexandre Rafalovitch < > > > arafa...@gmail.com> > > > > wrote: > > > > > > > >> df is default field - you can only give one. To search over multiple > > > >> fields, you switch to eDisMax query parser and fl parameter. > > > >> > > > >> Then, the question will be what type definition your fields have. > When > > > you > > > >> search text field, you are using its definition because of > copyField. > > > Your > > > >> original fields may be strings. > > > >> > > > >> Remember to reload core and reminded when you change definitions. > > > >> > > > >> Regards, > > > >>Alex > > > >> > > > >> > > > >> On 16 Mar 2017 9:15 AM, "Mark Johnson" < > mjohn...@emersonecologics.com > > > > > > >> wrote: > > > >> > > > >> > Forgive me if I'm missing something obvious -- I'm new to Solr, > but > > I > > > >> can't > > > >> > seem to find an explanation for the behavior I'm seeing. > > > >> > > > > >> > If I have a document that looks like this: > > > >> > { > > > >> > field1: "aaa bbb", > > > >> > field2: "ccc ddd", > > > >> > field3: "eee fff" > > > >> > } > > > >> > > > > >> > And I do a search where "q" is "aaa ccc", I get the document in > the > > > >> > results. This is because (please correct me if I'm wrong) the > > default > > > >> "df" > > > >> > is set to the "_text_" field, which contains the text values from > > all > > > >> > fields. > > > >> > > > > >> > However, if I do a search where "df" is "field1" and "field2" and > > "q" > > > is > > > >> > "aaa ccc" (words from field1 and field2) I get no results. > > > >> > > > > >> > In a simpler example, if I do a searc
Re: Partial Match with DF
You're right! The fields I'm searching are all "string" type. I switched to "text_en" and now it's working exactly as I need it to! I'll do some research to see if "text_en" or another "text" type field is best for our needs. Also, those debug options are amazing! They'll help tremendously in the future. Thank you much! On Thu, Mar 16, 2017 at 10:02 AM, Erick Erickson <erickerick...@gmail.com> wrote: > My guess: Your analysis chain for the fields is different, i.e. they > have a different fieldType. In particular, watch out for the "string" > type, people are often confused about it. It does _not_ break input > into tokens, you need a text-based field type, text_en is one example > that is usually in the configs by default. > > Two tools that'll help you enormously: > > admin UI>>select core (or collection) from the drop-down>>analysis > That shows you exactly how Solr/Lucene break up text at query and index > time > > add =query to the URL. That'll show you how the query was parsed. > > Best, > Erick > > On Thu, Mar 16, 2017 at 6:52 AM, Mark Johnson > <mjohn...@emersonecologics.com> wrote: > > Oh, great! Thank you! > > > > So if I switch over to eDisMax I'd specify the fields to query via the > "qf" > > parameter, right? That seems to have the same result (only matches when I > > specify the exact phrase in the field, not just certain words from it). > > > > On Thu, Mar 16, 2017 at 9:33 AM, Alexandre Rafalovitch < > arafa...@gmail.com> > > wrote: > > > >> df is default field - you can only give one. To search over multiple > >> fields, you switch to eDisMax query parser and fl parameter. > >> > >> Then, the question will be what type definition your fields have. When > you > >> search text field, you are using its definition because of copyField. > Your > >> original fields may be strings. > >> > >> Remember to reload core and reminded when you change definitions. 
> >> > >> Regards, > >>Alex > >> > >> > >> On 16 Mar 2017 9:15 AM, "Mark Johnson" <mjohn...@emersonecologics.com> > >> wrote: > >> > >> > Forgive me if I'm missing something obvious -- I'm new to Solr, but I > >> can't > >> > seem to find an explanation for the behavior I'm seeing. > >> > > >> > If I have a document that looks like this: > >> > { > >> > field1: "aaa bbb", > >> > field2: "ccc ddd", > >> > field3: "eee fff" > >> > } > >> > > >> > And I do a search where "q" is "aaa ccc", I get the document in the > >> > results. This is because (please correct me if I'm wrong) the default > >> "df" > >> > is set to the "_text_" field, which contains the text values from all > >> > fields. > >> > > >> > However, if I do a search where "df" is "field1" and "field2" and "q" > is > >> > "aaa ccc" (words from field1 and field2) I get no results. > >> > > >> > In a simpler example, if I do a search where "df" is "field1" and "q" > is > >> > "aaa" (a word from field1) I still get no results. > >> > > >> > If I do a search where "df" is "field1" and "q" is "aaa bbb" (the full > >> > value of field1) then I get the document in the results. > >> > > >> > So I'm concluding that when using "df" to specify which fields to > search > >> > then only an exact match on the full field value will return a > document. > >> > > >> > Is that a correct conclusion? Is there another way to specify which > >> fields > >> > to search without requiring an exact match? The results I'd like to > >> achieve > >> > are: > >> > > >> > Would Match: > >> > q=aaa > >> > q=aaa bbb > >> > q=aaa ccc > >> > q=aaa fff > >> > > >> > Would Not Match: > >> > q=eee > >> > q=fff > >> > q=eee fff > >> > > >> > -- > >> > *This message is intended only for the use of the individual or > entity to > >> > which it is addressed and may contain information that is privileged, > >> > confidential and exempt from di
Re: Partial Match with DF
Oh, great! Thank you! So if I switch over to eDisMax I'd specify the fields to query via the "qf" parameter, right? That seems to have the same result (only matches when I specify the exact phrase in the field, not just certain words from it). On Thu, Mar 16, 2017 at 9:33 AM, Alexandre Rafalovitch <arafa...@gmail.com> wrote: > df is default field - you can only give one. To search over multiple > fields, you switch to eDisMax query parser and fl parameter. > > Then, the question will be what type definition your fields have. When you > search text field, you are using its definition because of copyField. Your > original fields may be strings. > > Remember to reload core and re-index when you change definitions. > > Regards, > Alex > > > On 16 Mar 2017 9:15 AM, "Mark Johnson" <mjohn...@emersonecologics.com> > wrote: > > Forgive me if I'm missing something obvious -- I'm new to Solr, but I > can't > > seem to find an explanation for the behavior I'm seeing. > > > > If I have a document that looks like this: > > { > > field1: "aaa bbb", > > field2: "ccc ddd", > > field3: "eee fff" > > } > > > > And I do a search where "q" is "aaa ccc", I get the document in the > > results. This is because (please correct me if I'm wrong) the default > "df" > > is set to the "_text_" field, which contains the text values from all > > fields. > > > > However, if I do a search where "df" is "field1" and "field2" and "q" is > > "aaa ccc" (words from field1 and field2) I get no results. > > > > In a simpler example, if I do a search where "df" is "field1" and "q" is > > "aaa" (a word from field1) I still get no results. > > > > If I do a search where "df" is "field1" and "q" is "aaa bbb" (the full > > value of field1) then I get the document in the results. > > > > So I'm concluding that when using "df" to specify which fields to search > > then only an exact match on the full field value will return a document. > > > > Is that a correct conclusion? 
Is there another way to specify which > fields > > to search without requiring an exact match? The results I'd like to > achieve > > are: > > > > Would Match: > > q=aaa > > q=aaa bbb > > q=aaa ccc > > q=aaa fff > > > > Would Not Match: > > q=eee > > q=fff > > q=eee fff > > > > -- > > *This message is intended only for the use of the individual or entity to > > which it is addressed and may contain information that is privileged, > > confidential and exempt from disclosure under applicable law. If you have > > received this message in error, you are hereby notified that any use, > > dissemination, distribution or copying of this message is prohibited. If > > you have received this communication in error, please notify the sender > > immediately and destroy the transmitted information.* > > > -- Best Regards, *Mark Johnson* | .NET Software Engineer Office: 603-392-7017 Emerson Ecologics, LLC | 1230 Elm Street | Suite 301 | Manchester NH | 03101 <http://www.emersonecologics.com/> <https://wellevate.me/#/> *Supporting The Practice Of Healthy Living* <http://blog.emersonecologics.com/> <https://www.linkedin.com/company/emerson-ecologics> <https://www.facebook.com/emersonecologics/> <https://twitter.com/EmersonEcologic> <https://www.instagram.com/emerson_ecologics/> <https://www.pinterest.com/emersonecologic/> <https://www.glassdoor.com/Overview/Working-at-Emerson-Ecologics-EI_IE388367.11,28.htm> -- *This message is intended only for the use of the individual or entity to which it is addressed and may contain information that is privileged, confidential and exempt from disclosure under applicable law. If you have received this message in error, you are hereby notified that any use, dissemination, distribution or copying of this message is prohibited. If you have received this communication in error, please notify the sender immediately and destroy the transmitted information.*
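For reference, the eDisMax approach suggested in this thread boils down to a request like the following — a minimal sketch using the field names from the example document (the host, core, and `q.op` choice are assumptions). Note that whether a single word like "aaa" matches depends on each field's analyzer: plain string fields only match the full exact value, which is consistent with the behavior Mark observed.

```python
from urllib.parse import urlencode

# Build an eDisMax request that searches several fields via qf.
# Each field is analyzed with its own type, so "aaa ccc" can match
# individual tokens only if field1/field2 are tokenized text fields,
# not plain strings.
params = {
    "defType": "edismax",
    "q": "aaa ccc",
    "qf": "field1 field2",
    "q.op": "OR",
}
query_string = urlencode(params)
print(query_string)
# send as http://localhost:8983/solr/<core>/select?<query_string>
```

With `q.op=OR`, any single matching term is enough to return the document; with `q.op=AND`, all terms must match somewhere in the qf fields.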
Partial Match with DF
Forgive me if I'm missing something obvious -- I'm new to Solr, but I can't seem to find an explanation for the behavior I'm seeing. If I have a document that looks like this: { field1: "aaa bbb", field2: "ccc ddd", field3: "eee fff" } And I do a search where "q" is "aaa ccc", I get the document in the results. This is because (please correct me if I'm wrong) the default "df" is set to the "_text_" field, which contains the text values from all fields. However, if I do a search where "df" is "field1" and "field2" and "q" is "aaa ccc" (words from field1 and field2) I get no results. In a simpler example, if I do a search where "df" is "field1" and "q" is "aaa" (a word from field1) I still get no results. If I do a search where "df" is "field1" and "q" is "aaa bbb" (the full value of field1) then I get the document in the results. So I'm concluding that when using "df" to specify which fields to search then only an exact match on the full field value will return a document. Is that a correct conclusion? Is there another way to specify which fields to search without requiring an exact match? The results I'd like to achieve are: Would Match: q=aaa q=aaa bbb q=aaa ccc q=aaa fff Would Not Match: q=eee q=fff q=eee fff -- *This message is intended only for the use of the individual or entity to which it is addressed and may contain information that is privileged, confidential and exempt from disclosure under applicable law. If you have received this message in error, you are hereby notified that any use, dissemination, distribution or copying of this message is prohibited. If you have received this communication in error, please notify the sender immediately and destroy the transmitted information.*
Re: Custom handler/content stream loader
I did start with this, but it's a limited approach since it only works with text fields. Right now I'm using this with a bunch of custom fields extended to support payloads, but that is expensive to maintain between versions, especially when APIs change, so I'm looking for a less invasive way of supporting the same capability. I believe I have a Lucene solution for handling any field type, though I will obviously need to test it, if I can figure out the best way to do the custom building of the document and how to marshal it to the server and from the client. On Aug 23, 2016 11:05 PM, "Alexandre Rafalovitch" <arafa...@gmail.com> wrote: > Have you tried starting with the DelimitedPayloadTokenFilterFactory? > There is a sample configuration in the shipped examples: > https://github.com/apache/lucene-solr/blob/releases/ > lucene-solr/6.1.0/solr/example/example-DIH/solr/db/ > conf/managed-schema#L625 > > Regards, > Alex. > > Newsletter and resources for Solr beginners and intermediates: > http://www.solr-start.com/ > > > On 24 August 2016 at 04:22, Jamie Johnson <jej2...@gmail.com> wrote: > > I have a need to build custom field types that store additional metadata > at > > the field level in a payload. I was thinking that I could satisfy this > by > > building a custom UpdateRequest that captured this additional information > > in XML, but I am not really sure how to get at this additional > information > > on the server side. Would I need to implement a custom RequestHandler to > > handle the update, could I add a custom ContentStreamLoader to parse the > > XML, how do I customize the creation of the lucene document once I have > the > > XML? Any help/direction would really be appreciated. > > > > -Jamie >
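For anyone following along, the delimited-payload setup Alexandre points to looks roughly like this — a hedged sketch, since the field type name and delimiter here are illustrative rather than copied from the linked example schema:

```xml
<fieldType name="text_payload" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- tokens arrive as "term|payload"; the filter strips the suffix
         and stores it as a per-term payload in the index -->
    <filter class="solr.DelimitedPayloadTokenFilterFactory" delimiter="|" encoder="float"/>
  </analyzer>
</fieldType>
```

Indexing a value like `important|2.0` then attaches a payload of 2.0 to the token `important`. As noted, this only helps for text fields; other field types need a different mechanism.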
Re: Custom handler/content stream loader
Ok, did a bit more digging. It looks like if I build a custom ContentStreamLoader I can create a custom AddUpdateCommand that is ultimately responsible for building the Lucene document. So it looks like if I build a custom UpdateRequestHandler I can register my custom ContentStreamLoader and I'll be set. Is this the appropriate course of action? Lastly, I always want to use my custom UpdateRequest when adding data to Solr from SolrJ, but I don't see an easy way of doing this. Really what I need is to control the XML generated and sent to the server, and this looks like the best way, but I wonder about the inability to plug in a custom request writer (or something similar). Am I barking up the wrong tree? On Aug 23, 2016 5:22 PM, "Jamie Johnson" <jej2...@gmail.com> wrote: > I have a need to build custom field types that store additional metadata > at the field level in a payload. I was thinking that I could satisfy this > by building a custom UpdateRequest that captured this additional > information in XML, but I am not really sure how to get at this additional > information on the server side. Would I need to implement a custom > RequestHandler to handle the update, could I add a custom > ContentStreamLoader to parse the XML, how do I customize the creation of > the lucene document once I have the XML? Any help/direction would really > be appreciated. > > -Jamie >
Custom handler/content stream loader
I have a need to build custom field types that store additional metadata at the field level in a payload. I was thinking that I could satisfy this by building a custom UpdateRequest that captured this additional information in XML, but I am not really sure how to get at this additional information on the server side. Would I need to implement a custom RequestHandler to handle the update, could I add a custom ContentStreamLoader to parse the XML, how do I customize the creation of the lucene document once I have the XML? Any help/direction would really be appreciated. -Jamie
RE: How to Add New Fields and Fields Types Programmatically Using Solrj
Thanks a lot Steve. It worked out. Regards, Jeniba Johnson -Original Message- From: Steve Rowe [mailto:sar...@gmail.com] Sent: Monday, July 18, 2016 7:57 PM To: solr-user@lucene.apache.org Subject: Re: How to Add New Fields and Fields Types Programmatically Using Solrj Hi Jeniba, You can add fields and field types using Solrj with SchemaRequest.Update subclasses - see here for a list: <http://lucene.apache.org/solr/6_1_0/solr-solrj/org/apache/solr/client/solrj/request/schema/SchemaRequest.Update.html> There are quite a few examples of doing both in the tests: <https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;a=blob;f=solr/solrj/src/test/org/apache/solr/client/solrj/request/SchemaTest.java;h=72051b123aadb2df57f4bf19abfedb0ac0deb6cd;hb=refs/heads/branch_6_1> -- Steve www.lucidworks.com > On Jul 18, 2016, at 1:59 AM, Jeniba Johnson <jeniba.john...@lntinfotech.com> > wrote: > > > Hi, > > I have configured solr5.3.1 and started Solr in schema less mode. Using > SolrInputDocument, Iam able to add new fields in solrconfig.xml using Solrj. > How to specify the field type of a field using Solrj. > > Eg required="true" multivalued="false" /> > > How can I add field type properties using SolrInputDocument programmatically > using Solrj? Can anyone help with it? > > > > Regards, > Jeniba Johnson > > > > > The contents of this e-mail and any attachment(s) may contain confidential or > privileged information for the intended recipient(s). Unintended recipients > are prohibited from taking action on the basis of information in this e-mail > and using or disseminating the information, and must notify the sender and > delete it from their system. L Infotech will not accept responsibility or > liability for the accuracy or completeness of, or the presence of any virus > or disabling code in this e-mail"
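The SolrJ SchemaRequest.Update classes Steve mentions wrap Solr's Schema REST API; the JSON that an add-field request produces looks roughly like this (a sketch — the field name and property values are made up for illustration, and the core name in the URL is an assumption):

```python
import json

# Payload for the Schema API's add-field command, POSTed to
# http://localhost:8983/solr/<core>/schema
command = {
    "add-field": {
        "name": "description",
        "type": "string",
        "stored": True,
        "indexed": True,
        "multiValued": False,
    }
}
body = json.dumps(command)
print(body)
```

This is how field properties like `stored` and `multiValued` get specified programmatically — they belong to the schema definition, not to the SolrInputDocument being indexed.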
FW: How to Add New Fields and Fields Types Programmatically Using Solrj
Hi, I have configured Solr 5.3.1 and started Solr in schemaless mode. Using SolrInputDocument, I am able to add new fields using SolrJ. How do I specify the field type of a field using SolrJ? How can I add field-type properties programmatically using SolrJ? Can anyone help with it? Regards, Jeniba Johnson
Re: Escaping characters in a nested query
Thanks Mikhail, I'll give this a try. On Feb 27, 2016 5:27 AM, "Mikhail Khludnev" <mkhlud...@griddynamics.com> wrote: > Hello, > > I suggest q=_query_:{!lucene v=$subq}=my_awesome:less%20pain&... > or even starting for some version (I don't remember) _query_ pseudo field > is not necessary ie q=foo +bar {!lucene > v=$subq}=my_awesome:less%20pain& > > > On Fri, Feb 26, 2016 at 10:38 PM, Jamie Johnson <jej2...@gmail.com> wrote: > > > When using nested queries of the form q=_query_:"my_awesome:query", what > > needs to be escaped in the query portion? Just using the admin UI the > > following works > > > > _query_:"+field\\:with\\:special" > > _query_:"+field\\:with\\~special" > > _query_:"+field\\:with\\" > > > > but the same doesn't work for quotes, i.e. > > > > _query_:"+field\\:with\\"special" > > > > throws a org.apache.solr.search.SyntaxError. If I do > > > > _query_:"+field\\:with\\\"special" it executes, though I am not sure why > > quotes require different escaping. > > > > I am currently running solr 4.10.4, any thoughts? > > > > > > -- > Sincerely yours > Mikhail Khludnev > Principal Engineer, > Grid Dynamics > > <http://www.griddynamics.com> > <mkhlud...@griddynamics.com> >
Escaping characters in a nested query
When using nested queries of the form q=_query_:"my_awesome:query", what needs to be escaped in the query portion? Just using the admin UI the following works _query_:"+field\\:with\\:special" _query_:"+field\\:with\\~special" _query_:"+field\\:with\\" but the same doesn't work for quotes, i.e. _query_:"+field\\:with\\"special" throws a org.apache.solr.search.SyntaxError. If I do _query_:"+field\\:with\\\"special" it executes, though I am not sure why quotes require different escaping. I am currently running solr 4.10.4, any thoughts?
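The two levels of escaping this thread stumbles on can be made mechanical: escape Lucene's special characters once for the inner query, then escape backslashes and quotes a second time when embedding the result in the quoted `_query_` string. A sketch of that idea (the special-character list follows the classic Lucene query syntax; `&` and `|` are really the two-character operators `&&`/`||`, but escaping the single characters is harmless):

```python
# characters with special meaning in the classic Lucene query syntax
LUCENE_SPECIALS = set('+-&|!(){}[]^"~*?:\\/')

def escape_lucene(term):
    # first level: escape specials for the inner query parser
    return ''.join('\\' + c if c in LUCENE_SPECIALS else c for c in term)

def nested_query(inner):
    escaped = escape_lucene(inner)
    # second level: inside the outer quoted string, backslashes and
    # quotes must themselves be escaped
    embedded = escaped.replace('\\', '\\\\').replace('"', '\\"')
    return '_query_:"%s"' % embedded

print(nested_query('field:with"special'))
```

This reproduces the form found to work by trial in the thread (`_query_:"field\\:with\\\"special"`): a colon needs two backslashes, but a quote needs three, because the quote also terminates the outer string unless escaped at the second level.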
Solr InputFormat Exist?
Is there an equivalent of the ESInputFormat ( https://github.com/elastic/elasticsearch-hadoop/blob/03c056142a5ab7422b81bb1f519fd67a9581405f/mr/src/main/java/org/elasticsearch/hadoop/mr/EsInputFormat.java) in Solr or is there any work that is planned in this regard? -Jamie
Re: Add support in FacetsComponent for facet.method=uif
The patch adds facet.method=uif and then delegates all of the work to the JSON Faceting API to do the work. I had originally added a facet.method=dv and made the original facet.method=fc work using the UnInvertedField but wanted to avoid making a change that would introduce unexpected behavior. While I think it's strange that facet.method=dv does not exist and fc defaults to dv I think if we wanted to change that it should be done in another ticket. On Sun, Jan 3, 2016 at 4:18 PM, William Bell <billnb...@gmail.com> wrote: > Interesting that facet.method=dv or facet.method=uif. What is the > difference? > > On Sun, Jan 3, 2016 at 6:44 AM, Jamie Johnson <jej2...@gmail.com> wrote: > > > For those interested I created a separate jira issue for this but forgot > to > > attach earlier. > > > > https://issues.apache.org/jira/browse/SOLR-8466 > > On Jan 2, 2016 8:45 PM, "William Bell" <billnb...@gmail.com> wrote: > > > > > Yes we would like backward compatibility. We cannot switch all the > facet > > > fields to DocValues and our faceting is slow. > > > > > > Please... > > > > > > On Fri, Jan 1, 2016 at 7:41 AM, Jamie Johnson <jej2...@gmail.com> > wrote: > > > > > > > Is there any interest in this? While i think it's important and > inline > > > > with faceting available in the new json facet api, I've seen no > > > discussion > > > > on it so I'm wondering if it's best I add support for this using a > > custom > > > > facet component even though the majority of the component will be a > > copy > > > > which is prefer to not need to maintain separately. > > > > > > > > Jamie > > > > On Dec 22, 2015 12:37 PM, "Jamie Johnson" <jej2...@gmail.com> wrote: > > > > > > > > > I had previously piggybacked on another post, but I think it may > have > > > > been > > > > > lost there. 
I had a need to do UnInvertedField based faceting in > the > > > > > FacetsComponent and as such started looking at what would be > required > > > to > > > > > implement something similar to what the JSON Facets based API does > in > > > > this > > > > > regard. The patch that I have in this regard works and is attached > > to > > > > > https://issues.apache.org/jira/browse/SOLR-8096, is that > appropriate > > > or > > > > > should I create a new ticket to specifically add this support? > > > > > > > > > > -Jamie > > > > > > > > > > > > > > > > > > > > > -- > > > Bill Bell > > > billnb...@gmail.com > > > cell 720-256-8076 > > > > > > > > > -- > Bill Bell > billnb...@gmail.com > cell 720-256-8076 >
Re: Add support in FacetsComponent for facet.method=uif
Are you looking at 8466 or 8096? The patch on 8466 is the one I'm referencing. I should remove the other as it is more change than I think should be done for this ticket. Jamie On Jan 3, 2016 8:47 PM, "William Bell" <billnb...@gmail.com> wrote: > Ok the path appears to have dv and uif in there.? > > On Sun, Jan 3, 2016 at 4:40 PM, Jamie Johnson <jej2...@gmail.com> wrote: > > > The patch adds facet.method=uif and then delegates all of the work to the > > JSON Faceting API to do the work. I had originally added a > facet.method=dv > > and made the original facet.method=fc work using the UnInvertedField but > > wanted to avoid making a change that would introduce unexpected behavior. > > While I think it's strange that facet.method=dv does not exist and fc > > defaults to dv I think if we wanted to change that it should be done in > > another ticket. > > > > On Sun, Jan 3, 2016 at 4:18 PM, William Bell <billnb...@gmail.com> > wrote: > > > > > Interesting that facet.method=dv or facet.method=uif. What is the > > > difference? > > > > > > On Sun, Jan 3, 2016 at 6:44 AM, Jamie Johnson <jej2...@gmail.com> > wrote: > > > > > > > For those interested I created a separate jira issue for this but > > forgot > > > to > > > > attach earlier. > > > > > > > > https://issues.apache.org/jira/browse/SOLR-8466 > > > > On Jan 2, 2016 8:45 PM, "William Bell" <billnb...@gmail.com> wrote: > > > > > > > > > Yes we would like backward compatibility. We cannot switch all the > > > facet > > > > > fields to DocValues and our faceting is slow. > > > > > > > > > > Please... > > > > > > > > > > On Fri, Jan 1, 2016 at 7:41 AM, Jamie Johnson <jej2...@gmail.com> > > > wrote: > > > > > > > > > > > Is there any interest in this? 
While i think it's important and > > > inline > > > > > > with faceting available in the new json facet api, I've seen no > > > > > discussion > > > > > > on it so I'm wondering if it's best I add support for this using > a > > > > custom > > > > > > facet component even though the majority of the component will > be a > > > > copy > > > > > > which is prefer to not need to maintain separately. > > > > > > > > > > > > Jamie > > > > > > On Dec 22, 2015 12:37 PM, "Jamie Johnson" <jej2...@gmail.com> > > wrote: > > > > > > > > > > > > > I had previously piggybacked on another post, but I think it > may > > > have > > > > > > been > > > > > > > lost there. I had a need to do UnInvertedField based faceting > in > > > the > > > > > > > FacetsComponent and as such started looking at what would be > > > required > > > > > to > > > > > > > implement something similar to what the JSON Facets based API > > does > > > in > > > > > > this > > > > > > > regard. The patch that I have in this regard works and is > > attached > > > > to > > > > > > > https://issues.apache.org/jira/browse/SOLR-8096, is that > > > appropriate > > > > > or > > > > > > > should I create a new ticket to specifically add this support? > > > > > > > > > > > > > > -Jamie > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > Bill Bell > > > > > billnb...@gmail.com > > > > > cell 720-256-8076 > > > > > > > > > > > > > > > > > > > > > -- > > > Bill Bell > > > billnb...@gmail.com > > > cell 720-256-8076 > > > > > > > > > -- > Bill Bell > billnb...@gmail.com > cell 720-256-8076 >
Re: Add support in FacetsComponent for facet.method=uif
For those interested I created a separate jira issue for this but forgot to attach earlier. https://issues.apache.org/jira/browse/SOLR-8466 On Jan 2, 2016 8:45 PM, "William Bell" <billnb...@gmail.com> wrote: > Yes we would like backward compatibility. We cannot switch all the facet > fields to DocValues and our faceting is slow. > > Please... > > On Fri, Jan 1, 2016 at 7:41 AM, Jamie Johnson <jej2...@gmail.com> wrote: > > > Is there any interest in this? While i think it's important and inline > > with faceting available in the new json facet api, I've seen no > discussion > > on it so I'm wondering if it's best I add support for this using a custom > > facet component even though the majority of the component will be a copy > > which is prefer to not need to maintain separately. > > > > Jamie > > On Dec 22, 2015 12:37 PM, "Jamie Johnson" <jej2...@gmail.com> wrote: > > > > > I had previously piggybacked on another post, but I think it may have > > been > > > lost there. I had a need to do UnInvertedField based faceting in the > > > FacetsComponent and as such started looking at what would be required > to > > > implement something similar to what the JSON Facets based API does in > > this > > > regard. The patch that I have in this regard works and is attached to > > > https://issues.apache.org/jira/browse/SOLR-8096, is that appropriate > or > > > should I create a new ticket to specifically add this support? > > > > > > -Jamie > > > > > > > > > -- > Bill Bell > billnb...@gmail.com > cell 720-256-8076 >
Re: Add support in FacetsComponent for facet.method=uif
Is there any interest in this? While I think it's important and in line with the faceting available in the new JSON Facet API, I've seen no discussion on it, so I'm wondering if it's best I add support for this using a custom facet component, even though the majority of the component will be a copy, which I'd prefer not to have to maintain separately. Jamie On Dec 22, 2015 12:37 PM, "Jamie Johnson" <jej2...@gmail.com> wrote: > I had previously piggybacked on another post, but I think it may have been > lost there. I had a need to do UnInvertedField based faceting in the > FacetsComponent and as such started looking at what would be required to > implement something similar to what the JSON Facets based API does in this > regard. The patch that I have in this regard works and is attached to > https://issues.apache.org/jira/browse/SOLR-8096, is that appropriate or > should I create a new ticket to specifically add this support? > > -Jamie >
Re: Adding the same field value question
Yes, the field is multivalued. On Dec 28, 2015 3:48 PM, "Jack Krupansky" <jack.krupan...@gmail.com> wrote: > Is the field multivalued? > > -- Jack Krupansky > > On Sun, Dec 27, 2015 at 11:16 PM, Jamie Johnson <jej2...@gmail.com> wrote: > > > What is the difference of adding a field with the same value twice or > > adding it once and boosting the field on add? Is there a situation where > > one approach is preferred? > > > > Jamie > > >
Re: Adding the same field value question
Thanks, I wasn't sure if adding twice and boosting results in a similar thing happening under the hood or not. Appreciate the response. Jamie On Dec 28, 2015 9:08 AM, "Binoy Dalal" <binoydala...@gmail.com> wrote: > There's no benefit in adding the same field twice because that'll just > increase the size of your index without providing any real benefits at > query time. > For increasing the scores, boosting is definitely the way to go. > > On Mon, 28 Dec 2015, 09:46 Jamie Johnson <jej2...@gmail.com> wrote: > > > What is the difference of adding a field with the same value twice or > > adding it once and boosting the field on add? Is there a situation where > > one approach is preferred? > > > > Jamie > > > -- > Regards, > Binoy Dalal >
Re: Solr - facet fields that contain other facet fields
Can you do the opposite? Index into an unanalyzed field and copy into the analyzed? If I remember correctly facets are based off of indexed values so if you tokenize the field then the facets will be as you are seeing now. On Dec 28, 2015 9:45 AM, "Kevin Lopez"wrote: > *What I am trying to accomplish: * > Generate a facet based on the documents uploaded and a text file containing > terms from a domain/ontology such that a facet is shown if a term is in the > text file and in a document (key phrase extraction). > > *The problem:* > When I select the facet for the term "*not necessarily*" (we see there is a > space) and I get the results for the term "*not*". The field is tokenized > and multivalued. This leads me to believe that I can not use a tokenized > field as a facet field. I tried to copy the values of the field to a text > field with a keywordtokenizer. I am told when checking the schema browser: > "Sorry, no Term Info available :(" This is after I delete the old index and > upload the documents again. The facet is coming from a field that is > already copied from another field, so I cannot copy this field to a text > field with a keywordtokenizer or strfield. What can I do to fix this? Is > there an alternate way to accomplish this? > > *Here is my configuration:* > > > > multiValued="true" type="Cytokine_Pass"/> > > > > > > >stored="true" multiValued="true" >termPositions="true" >termVectors="true" >termOffsets="true"/> > sortMissingLast="true" omitNorms="true"> > > minShingleSize="2" maxShingleSize="5" > outputUnigramsIfNoShingles="true" > /> > > > synonyms="synonyms_ColonCancer.txt" ignoreCase="true" expand="true" > tokenizerFactory="solr.KeywordTokenizerFactory"/> > words="prefLabels_ColonCancer.txt" ignoreCase="true"/> > > > > > Regards, > > Kevin >
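The suggestion above — facet on an unanalyzed field and search an analyzed copy — would look roughly like this in the schema (a sketch; the field and type names are illustrative, loosely adapted from the thread). This sidesteps the problem that a copyField destination cannot itself be copied again:

```xml
<!-- raw values for faceting: facets are built from indexed terms,
     so a string field keeps "not necessarily" as a single term -->
<field name="cytokine_facet" type="string" indexed="true" stored="true" multiValued="true"/>
<!-- analyzed copy for full-text matching -->
<field name="cytokine_text" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="cytokine_facet" dest="cytokine_text"/>
```

At query time you would then facet with `facet.field=cytokine_facet` while searching against `cytokine_text`.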
Adding the same field value question
What is the difference of adding a field with the same value twice or adding it once and boosting the field on add? Is there a situation where one approach is preferred? Jamie
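For intuition on the replies in this thread: under Lucene's classic TF-IDF similarity, adding the same value twice doubles the term frequency, but the tf factor grows only as the square root of the frequency, while an index-time boost scales the score multiplicatively. A toy sketch of just that formula (not SolrJ — and "roughly" because classic Lucene quantizes boosts and norms into a single byte):

```python
import math

def classic_tf(freq):
    # classic Lucene TFIDFSimilarity: tf(freq) = sqrt(freq)
    return math.sqrt(freq)

# duplicating the field value: freq 1 -> 2, only a ~1.41x tf gain
gain_from_duplicate = classic_tf(2) / classic_tf(1)

# an index-time boost of 2.0 multiplies the field norm: roughly a 2x gain,
# without doubling the stored/indexed data
gain_from_boost = 2.0
print(gain_from_duplicate, gain_from_boost)
```

So the two approaches are not equivalent under the hood: boosting gives direct, predictable control, while duplicating the value grows the index for a weaker, sublinear effect.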
Limit fields returned in solr based on content
I have what I believe is a unique requirement discussed here in the past to limit data sent to users based on some marking in the field.
Re: Limit fields returned in solr based on content
Sorry hit send too early Is there a mechanism in solr/lucene that allows customization of the fields returned that would have access to the field content and payload? On Dec 24, 2015 4:15 PM, "Jamie Johnson" <jej2...@gmail.com> wrote: > I have what I believe is a unique requirement discussed here in the past > to limit data sent to users based on some marking in the field. >
Re: Limit fields returned in solr based on content
I'm currently doing it in a middle tier, but it means I can't return results from the index to users, instead it needs to always hit the store, not the end of the world but was hoping I could use the fields in the index as a quick first view and then get the full result when the user selected an entry. Jamie On Dec 24, 2015 4:26 PM, "Walter Underwood" <wun...@wunderwood.org> wrote: > I would do that in a middle tier. You can’t do every single thing in Solr. > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > > > On Dec 24, 2015, at 1:21 PM, Upayavira <u...@odoko.co.uk> wrote: > > > > You could create a custom DocTransformer. They can enhance the fields > > included in the search results. So, instead of fl=somefield you could > > have fl=[my-filter:somefield], and your MyFieldDocTransformer makes the > > decision as to whether or not to include somefield in the output. > > > > This would of course, require some Java coding. > > > > Upayavira > > > > On Thu, Dec 24, 2015, at 09:17 PM, Jamie Johnson wrote: > >> Sorry hit send too early > >> > >> Is there a mechanism in solr/lucene that allows customization of the > >> fields > >> returned that would have access to the field content and payload? > >> On Dec 24, 2015 4:15 PM, "Jamie Johnson" <jej2...@gmail.com> wrote: > >> > >>> I have what I believe is a unique requirement discussed here in the > past > >>> to limit data sent to users based on some marking in the field. > >>> > >
Re: Limit fields returned in solr based on content
Would the doc transformer have access to payloads? On Dec 24, 2015 4:21 PM, "Upayavira" <u...@odoko.co.uk> wrote: > You could create a custom DocTransformer. They can enhance the fields > included in the search results. So, instead of fl=somefield you could > have fl=[my-filter:somefield], and your MyFieldDocTransformer makes the > decision as to whether or not to include somefield in the output. > > This would of course, require some Java coding. > > Upayavira > > On Thu, Dec 24, 2015, at 09:17 PM, Jamie Johnson wrote: > > Sorry hit send too early > > > > Is there a mechanism in solr/lucene that allows customization of the > > fields > > returned that would have access to the field content and payload? > > On Dec 24, 2015 4:15 PM, "Jamie Johnson" <jej2...@gmail.com> wrote: > > > > > I have what I believe is a unique requirement discussed here in the > past > > > to limit data sent to users based on some marking in the field. > > > >
Add support in FacetsComponent for facet.method=uif
I had previously piggybacked on another post, but I think it may have been lost there. I had a need to do UnInvertedField based faceting in the FacetsComponent and as such started looking at what would be required to implement something similar to what the JSON Facets based API does in this regard. The patch that I have in this regard works and is attached to https://issues.apache.org/jira/browse/SOLR-8096, is that appropriate or should I create a new ticket to specifically add this support? -Jamie
Re: facet component and uninverted field
Thanks, the issue I'm having is that there is no equivalent to method uif for the standard facet component. We'll see how SOLR-8096 shakes out. On Sun, Dec 20, 2015 at 11:29 PM, Upayavira <u...@odoko.co.uk> wrote: > > > On Sun, Dec 20, 2015, at 01:32 PM, Jamie Johnson wrote: > > For those interested I've attached an initial patch to > > https://issues.apache.org/jira/browse/SOLR-8096 to start supporting uif > > in > > FacetComponent via JSON facet api. > > On Dec 18, 2015 9:22 PM, "Jamie Johnson" <jej2...@gmail.com> wrote: > > > > > I recently saw that the new JSON Facet API supports controlling the > facet > > > method that is used and was wondering if there was any support for > doing > > > the same thing in the original facet component? > > > > > > Also is there a plan to deprecate one of these components over the > other > > > or is there an expectation that both will continue to live on? > Curious if > > > I should bite the bullet and transition to the new JSON Facet API or > not. > > facet.method specifies the method for faceting! But I suspect you've > found that already. > > As to deprecation, these sort of things in my experience don't get > deprecated as such, we just find that one gets better than the other - > the better it gets, the more adoption it sees. > > Upayavira >
Re: facet component and uninverted field
For those interested I've attached an initial patch to https://issues.apache.org/jira/browse/SOLR-8096 to start supporting uif in FacetComponent via JSON facet api. On Dec 18, 2015 9:22 PM, "Jamie Johnson" <jej2...@gmail.com> wrote: > I recently saw that the new JSON Facet API supports controlling the facet > method that is used and was wondering if there was any support for doing > the same thing in the original facet component? > > Also is there a plan to deprecate one of these components over the other > or is there an expectation that both will continue to live on? Curious if > I should bite the bullet and transition to the new JSON Facet API or not. >
Re: faceting is unusable slow since upgrade to 5.3.0
Bill, Check out the patch attached to https://issues.apache.org/jira/browse/SOLR-8096. I had considered making the method uif after I had done most of the work, it would be trivial to change and would probably be more aligned with not adding unexpected changes to people that are currently using fc. -Jamie On Sat, Dec 19, 2015 at 11:03 PM, William Bellwrote: > Can we add method=uif back when not using the JSON Facet API too? > > That would help a lot of people. > > On Thu, Dec 17, 2015 at 7:17 AM, Yonik Seeley wrote: > > > On Wed, Dec 16, 2015 at 4:57 AM, Vincenzo D'Amore > > wrote: > > > Hi all, > > > > > > given that solr 5.4 is finally released, is this what's more stable and > > > efficient version of solrcloud ? > > > > > > I have a website which receives many search requests. It serve normally > > > about 2000 concurrent requests, but sometime there are peak from 4000 > to > > > 1 requests in few seconds. > > > > > > On January I'll have a chance to upgrade my old SolrCloud 4.8.1 cluster > > to > > > a new brand version, but following this thread I read about the > problems > > > that can occur upgrading to latest version. > > > > > > I have seen that issue SOLR-7730 "speed-up faceting on doc values > fields" > > > is fixed in 5.4. > > > > > > I'm using standard faceting without docValues. Should I add docValues > in > > > order to benefit of such fix? > > > > You'll have to try it I think... > > DocValues have a lot of advantages (much less heap consumption, and > > much smaller overhead when opening a new searcher), but they can often > > be slower as well. > > > > Comparing 4x to 5x non-docvalues, top-level field caches were removed > > by lucene, and while that benefits certain things like NRT (opening a > > new searcher very often), it will hurt performance for other > > configurations. 
> > > > The JSON Facet API currently allows you to pick your strategy via the > > "method" param for multi-valued string fields without docvalues: > > "uif" (UninvertedField) gets you the top-level strategy from Solr 4, > > while "dv" (DocValues built on-the-fly) gets you the NRT-friendly > > "per-segment" strategy. > > > > -Yonik > > > > > > -- > Bill Bell > billnb...@gmail.com > cell 720-256-8076 >
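The `method` param Yonik describes lives in the JSON Facet API request body; a minimal sketch of such a request (the facet name, field name, and collection in the URL are assumptions):

```python
import json

request = {
    "query": "*:*",
    "facet": {
        "top_cats": {
            "type": "terms",
            "field": "cat",
            # "uif" = Solr-4-style top-level UnInvertedField strategy;
            # "dv"  = NRT-friendly per-segment docvalues built on the fly
            "method": "uif",
        }
    }
}
body = json.dumps(request)
print(body)
# POST body to http://localhost:8983/solr/<collection>/query
```

This is the gap the thread is discussing: the JSON Facet API exposes this choice, while the classic facet component (facet.field/facet.method) did not at the time.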
Re: faceting is unusable slow since upgrade to 5.3.0
Can we still specify the cache implementation for the field cache? When this change occurred to faceting (uninverting reader vs field ) it prevented us from moving to 5.x but if we can get the 4.x functionality using that api we could look to port to the latest. Jamie On Dec 17, 2015 9:18 AM, "Yonik Seeley"wrote: > On Wed, Dec 16, 2015 at 4:57 AM, Vincenzo D'Amore > wrote: > > Hi all, > > > > given that solr 5.4 is finally released, is this what's more stable and > > efficient version of solrcloud ? > > > > I have a website which receives many search requests. It serve normally > > about 2000 concurrent requests, but sometime there are peak from 4000 to > > 1 requests in few seconds. > > > > On January I'll have a chance to upgrade my old SolrCloud 4.8.1 cluster > to > > a new brand version, but following this thread I read about the problems > > that can occur upgrading to latest version. > > > > I have seen that issue SOLR-7730 "speed-up faceting on doc values fields" > > is fixed in 5.4. > > > > I'm using standard faceting without docValues. Should I add docValues in > > order to benefit of such fix? > > You'll have to try it I think... > DocValues have a lot of advantages (much less heap consumption, and > much smaller overhead when opening a new searcher), but they can often > be slower as well. > > Comparing 4x to 5x non-docvalues, top-level field caches were removed > by lucene, and while that benefits certain things like NRT (opening a > new searcher very often), it will hurt performance for other > configurations. > > The JSON Facet API currently allows you to pick your strategy via the > "method" param for multi-valued string fields without docvalues: > "uif" (UninvertedField) gets you the top-level strategy from Solr 4, > while "dv" (DocValues built on-the-fly) gets you the NRT-friendly > "per-segment" strategy. > > -Yonik >
Re: faceting is unusable slow since upgrade to 5.3.0
Also can we get the capability to choose the method of faceting in the older faceting component? I'm not looking for complete feature parity just the ability to specify the method. As always thanks. On Fri, Dec 18, 2015 at 8:04 AM, Jamie Johnson <jej2...@gmail.com> wrote: > Can we still specify the cache implementation for the field cache? When > this change occurred to faceting (uninverting reader vs field ) it > prevented us from moving to 5.x but if we can get the 4.x functionality > using that api we could look to port to the latest. > > Jamie > On Dec 17, 2015 9:18 AM, "Yonik Seeley" <ysee...@gmail.com> wrote: > >> On Wed, Dec 16, 2015 at 4:57 AM, Vincenzo D'Amore <v.dam...@gmail.com> >> wrote: >> > Hi all, >> > >> > given that solr 5.4 is finally released, is this what's more stable and >> > efficient version of solrcloud ? >> > >> > I have a website which receives many search requests. It serve normally >> > about 2000 concurrent requests, but sometime there are peak from 4000 to >> > 1 requests in few seconds. >> > >> > On January I'll have a chance to upgrade my old SolrCloud 4.8.1 cluster >> to >> > a new brand version, but following this thread I read about the problems >> > that can occur upgrading to latest version. >> > >> > I have seen that issue SOLR-7730 "speed-up faceting on doc values >> fields" >> > is fixed in 5.4. >> > >> > I'm using standard faceting without docValues. Should I add docValues in >> > order to benefit of such fix? >> >> You'll have to try it I think... >> DocValues have a lot of advantages (much less heap consumption, and >> much smaller overhead when opening a new searcher), but they can often >> be slower as well. >> >> Comparing 4x to 5x non-docvalues, top-level field caches were removed >> by lucene, and while that benefits certain things like NRT (opening a >> new searcher very often), it will hurt performance for other >> configurations. 
>> >> The JSON Facet API currently allows you to pick your strategy via the >> "method" param for multi-valued string fields without docvalues: >> "uif" (UninvertedField) gets you the top-level strategy from Solr 4, >> while "dv" (DocValues built on-the-fly) gets you the NRT-friendly >> "per-segment" strategy. >> >> -Yonik >> >
facet component and uninverted field
I recently saw that the new JSON Facet API supports controlling the facet method that is used, and I was wondering whether there is any support for doing the same thing in the original facet component. Also, is there a plan to deprecate one of these components in favor of the other, or is the expectation that both will continue to live on? I'm curious whether I should bite the bullet and transition to the new JSON Facet API or not.
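For reference, the per-request "method" hint Yonik describes earlier in the thread goes inside a JSON Facet API request body. A minimal sketch of such a body (the field name "category_s" is a made-up example, not from the thread):

```python
import json

# "method" is a per-facet hint: "uif" picks the top-level UninvertedField
# strategy from Solr 4, "dv" builds docValues on the fly (NRT-friendly).
# It applies to multi-valued string fields without docValues, per the thread.
facet_request = {
    "query": "*:*",
    "facet": {
        "categories": {
            "type": "terms",
            "field": "category_s",  # hypothetical field name
            "method": "uif",
        }
    },
}

body = json.dumps(facet_request)
print(body)
```

This body would be POSTed to a collection's search handler; the sketch only shows its shape.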
Re: Append fields to a document
The expense is in gathering the pieces to do the indexing. There isn't much that I can do in that regard unfortunately. I need to investigate storing the fields, if they aren't returned is the expense just size on disk or is there a memory cost as well? On Dec 16, 2015 7:43 PM, "Alexandre Rafalovitch" <arafa...@gmail.com> wrote: > ExternalFileField might be useful in some situations. > > But also, is it possible that your Solr schema configuration is not > best suited for your domain? Is it - for example - possible that the > additional data should be in child records? > > Pure guesswork here, not enough information. But, as described, Solr > will not be able to fulfill your needs easily. Something will need to > change. > > Regards, >Alex. > > > Newsletter and resources for Solr beginners and intermediates: > http://www.solr-start.com/ > > > On 16 December 2015 at 22:09, Jamie Johnson <jej2...@gmail.com> wrote: > > I have a use case where we only need to append some fields to a document. > > To retrieve the full representation is very expensive but I can easily > get > > the deltas. Is it possible to just add fields to an existing Solr > > document? I experimented with using overwrite=false, but that resulted > in > > two documents with the same uniqueKey in the index (which makes sense). > Is > > there a way to accomplish what I'm looking to do in Solr? My fields > aren't > > all stored and think it will be too expensive for me to make that change. > > Any thoughts would be really appreciated. >
Append fields to a document
I have a use case where we only need to append some fields to a document. Retrieving the full representation is very expensive, but I can easily get the deltas. Is it possible to just add fields to an existing Solr document? I experimented with using overwrite=false, but that resulted in two documents with the same uniqueKey in the index (which makes sense). Is there a way to accomplish what I'm looking to do in Solr? My fields aren't all stored, and I think it will be too expensive for me to make that change. Any thoughts would be really appreciated.
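For context, Solr's usual answer to "append fields without reindexing" is an atomic update, which uses modifier objects instead of a full document. A sketch of what that payload looks like (field and id values here are invented for illustration); note that atomic updates require all fields to be stored (or have docValues), which is exactly the constraint raised in this thread:

```python
import json

# Atomic-update document: "set"/"add" modifiers instead of a full reindex.
delta = {
    "id": "doc-1",                     # hypothetical uniqueKey value
    "new_field_s": {"set": "value"},   # set (or create) a single-valued field
    "tags_ss": {"add": "extra-tag"},   # append to a multi-valued field
}

payload = json.dumps([delta])
print(payload)
```

This payload would be sent to the collection's /update handler; without fully stored fields, Solr cannot reconstruct the rest of the document and the update degrades to the two-document situation described above.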
SOLR-7996
Has anyone looked at this issue? I'd be willing to take a stab at it if someone could provide some high-level design guidance. This is a critical piece preventing us from moving to version 5. Jamie
Re: Child document and parent document with same key
Thanks that's what I suspected given what I'm seeing but wanted to make sure. Again thanks On Nov 5, 2015 1:08 PM, "Mikhail Khludnev" <mkhlud...@griddynamics.com> wrote: > On Fri, Oct 16, 2015 at 10:41 PM, Jamie Johnson <jej2...@gmail.com> wrote: > > > Is this expected to work? > > > I think it is. I'm still not sure I understand the question. But let me > bring some details from SOLR-3076: > - Solr's backs on Lucene's "deleteTerm" which is supplied into > indexWriter.updateDocument(); > - when parent document has children, is not a deleteTerm but > its' value is used for "deleteTerm" for field "_root_" see > > https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java#L251 > - thus for block updates uniqueKey is (almost) meaningless. > It lacks of elegance, but that's it. > > -- > Sincerely yours > Mikhail Khludnev > Principal Engineer, > Grid Dynamics > > <http://www.griddynamics.com> > <mkhlud...@griddynamics.com> >
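The mechanics Mikhail describes can be illustrated with the shape of a block update. In a nested document, the parent's uniqueKey value is written into the _root_ field of every child, and block updates delete by _root_ rather than by uniqueKey, which is why a child sharing the parent's key can coexist with it. A sketch using the thread's "key" uniqueKey field (values are invented):

```python
import json

# Block (nested) update: the child deliberately reuses the parent's key to
# mirror the situation in the thread -- for block updates the uniqueKey is
# (almost) meaningless; deletion happens via the _root_ field instead.
parent = {
    "key": "p-1",
    "type_s": "parent",
    "_childDocuments_": [
        {"key": "p-1", "type_s": "child"},  # same key as the parent
    ],
}
print(json.dumps(parent))
```

Both documents index fine and are retrievable separately, matching the observed behavior; re-adding the parent replaces the whole block via _root_.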
Re: Child document and parent document with same key
The field is "key" and this is the value of unique key in schema.xml On Oct 17, 2015 3:23 AM, "Mikhail Khludnev" <mkhlud...@griddynamics.com> wrote: > Hello, > > What are the field names for parent and child docs exactly? > Whats' in schema.xml? > What you've got if you actually try to do this? > > On Fri, Oct 16, 2015 at 12:41 PM, Jamie Johnson <jej2...@gmail.com> wrote: > > > I am looking at using child documents and noticed that if I specify a > child > > and parent with the same key solr indexes this fine and I can retrieve > both > > documents separately. Is this expected to work? > > > > -Jamie > > > > > > -- > Sincerely yours > Mikhail Khludnev > Principal Engineer, > Grid Dynamics > > <http://www.griddynamics.com> > <mkhlud...@griddynamics.com> >
Payload doesn't apply to WordDelimiterFilterFactory-generated tokens
I came across this post ( http://lucene.472066.n3.nabble.com/Payload-doesn-t-apply-to-WordDelimiterFilterFactory-generated-tokens-td3136748.html) and tried to find a JIRA for this task. Was one ever created? If not I'd be happy to create it if this is still something that makes sense or if instead there is another recommended approach for supporting cloning attributes like payload from the source token stream in the WordDelimiterFilterFactory.
Re: Order of actions in Update request
Yes if they are in separate requests I imagine it would work though I haven't tested. I was wondering if there was a way to execute these actions in a single request and maintain order. On Oct 24, 2015 3:25 PM, "Shawn Heisey" <apa...@elyograg.org> wrote: > On 10/24/2015 5:21 AM, Jamie Johnson wrote: > > Looking at the code and jira I see that ordering actions in solrj update > > request is currently not supported but I'd like to know if there is any > > other way to get this capability. I took a quick look at the XML loader > > and it appears to process actions as it sees them so if the order was > > changed to order the actions as > > > > Add > > Delete > > Add > > > > Vs > > Add > > Add > > Delete > > > > Would this cause any issues with the update? Would it achieve the > desired > > result? Are there any other options for ordering actions as they were > > provided to the update request? > > If those three actions are in separate update requests using > HttpSolrClient or CloudSolrClient in a single thread, I would expect > them to be executed in the order you make the requests. If you're using > multiple threads, then you probably cannot guarantee the order of the > requests. > > Are you using one of those clients in a single thread and seeing > something other than what I have described? If so, I think that might > be a bug. > > If you're using ConcurrentUpdateSolrClient, I don't think you can > guarantee order. That client has multiple threads pulling the requests > out of an internal queue. If some requests complete substantially > faster than others, they could happen out of order. The concurrent > client is a poor choice for anything but bulk inserts, and because of > the fact that it ignores almost every error that happens while it runs, > it often is not a good choice for that either. > > Thanks, > Shawn > >
Order of actions in Update request
Looking at the code and JIRA, I see that ordering actions in a SolrJ update request is currently not supported, but I'd like to know if there is any other way to get this capability. I took a quick look at the XML loader and it appears to process actions as it sees them, so if the order was changed to order the actions as

Add
Delete
Add

vs.

Add
Add
Delete

would this cause any issues with the update? Would it achieve the desired result? Are there any other options for ordering actions as they were provided to the update request? Jamie
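Since the XML loader processes commands in document order, one way to express an ordered add/delete/add in a single request is to build the XML update message directly. A sketch (document ids are invented; this assumes the `<update>` wrapper for multiple commands, and per Shawn's caveat the ordering guarantee only holds with a single-threaded client such as HttpSolrClient, not ConcurrentUpdateSolrClient):

```python
import xml.etree.ElementTree as ET

# Build <update><add>...<delete>...<add>...</update>; the XML loader
# executes these commands in the order they appear in the document.
update = ET.Element("update")

add1 = ET.SubElement(update, "add")
doc1 = ET.SubElement(add1, "doc")
f1 = ET.SubElement(doc1, "field", name="id")
f1.text = "1"

delete = ET.SubElement(update, "delete")
did = ET.SubElement(delete, "id")
did.text = "1"

add2 = ET.SubElement(update, "add")
doc2 = ET.SubElement(add2, "doc")
f2 = ET.SubElement(doc2, "field", name="id")
f2.text = "1"

xml = ET.tostring(update, encoding="unicode")
print(xml)
```

The resulting string would be POSTed to /update with Content-Type text/xml; SolrJ's UpdateRequest does not preserve interleaved ordering, which is what motivated the question.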
Child document and parent document with same key
I am looking at using child documents and noticed that if I specify a child and a parent with the same key, Solr indexes this fine and I can retrieve both documents separately. Is this expected to work? -Jamie
SolrCloud NoAuth for /unrelatednode error
I am getting an error that essentially says solr does not have auth for /unrelatednode/... I would be ok with the error being displayed, but I think this may be what is causing my solr instances to be shown as down. Currently I'm issuing the following command http://localhost:8983/solr/admin/collections?action=CREATE=collection=2=2=config=2 I see the collection and shards being created, but they appear as down in the clusterstate.json. The only exception I see when trying to show the Cloud graph is shown below. Could this be the cause for the shards showing up as down? WARN ZookeeperInfoServlet - Keeper Exception org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /unrelatednode/foo/bar at org.apache.zookeeper.KeeperException.create(KeeperException.java:113) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155) at org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:308) at org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:305) at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:74) at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:305) at org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:279) at org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:322) at org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:322) at org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:322) at org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:322) at org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.print(ZookeeperInfoServlet.java:226) at org.apache.solr.servlet.ZookeeperInfoServlet.doGet(ZookeeperInfoServlet.java:104) at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at 
javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1667) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:466) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1650) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.eclipse.jetty.server.Server.handle(Server.java:497) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248) at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539) at java.lang.Thread.run(Thread.java:745)
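The CREATE URL above was mangled by the archive (the parameter names after each `&` were stripped, leaving only `=value` fragments). A typical Collections API CREATE call looks like the following; the parameter-to-value mapping here is an illustrative guess, not recovered from the original post:

```python
from urllib.parse import urlencode

# Hypothetical reconstruction of a Collections API CREATE request.
params = {
    "action": "CREATE",
    "name": "collection",
    "numShards": 2,
    "replicationFactor": 2,
    "collection.configName": "config",
}
url = "http://localhost:8983/solr/admin/collections?" + urlencode(params)
print(url)
```

As the follow-up message notes, the NoAuth warning from the ZooKeeper browser turned out to be unrelated; the shards showing as down came from a configuration problem.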
Re: SolrCloud NoAuth for /unrelatednode error
Ah please ignore, it looks like this was totally unrelated and my issue was configuration related On Fri, Oct 9, 2015 at 11:18 AM, Jamie Johnson <jej2...@gmail.com> wrote: > I am getting an error that essentially says solr does not have auth for > /unrelatednode/... I would be ok with the error being displayed, but I > think this may be what is causing my solr instances to be shown as down. > Currently I'm issuing the following command > > > http://localhost:8983/solr/admin/collections?action=CREATE=collection=2=2=config=2 > > I see the collection and shards being created, but they appear as down in > the clusterstate.json. The only exception I see when trying to show the > Cloud graph is shown below. Could this be the cause for the shards showing > up as down? > > WARN ZookeeperInfoServlet - Keeper Exception > org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = > NoAuth for /unrelatednode/foo/bar > at org.apache.zookeeper.KeeperException.create(KeeperException.java:113) > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155) > at > org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:308) > at > org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:305) > at > org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:74) > at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:305) > at > org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:279) > at > org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:322) > at > org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:322) > at > org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:322) > at > org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:322) > at > 
org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.print(ZookeeperInfoServlet.java:226) > at > org.apache.solr.servlet.ZookeeperInfoServlet.doGet(ZookeeperInfoServlet.java:104) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1667) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:466) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1650) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215) > at > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) > at org.eclipse.jetty.server.Server.handle(Server.java:497) > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311) > at > 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248) > at > org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539) > at java.lang.Thread.run(Thread.java:745) > >
Re: Lucene/Solr 5.0 and custom FieldCache implementation
No worries, thanks again I'll begin teaching this On Mon, Aug 31, 2015, 5:16 PM Tomás Fernández Löbbe <tomasflo...@gmail.com> wrote: > Sorry Jamie, I totally missed this email. There was no Jira that I could > find. I created SOLR-7996 > > On Sat, Aug 29, 2015 at 5:26 AM, Jamie Johnson <jej2...@gmail.com> wrote: > > > This sounds like a good idea, I'm assuming I'd need to make my own > > UnInvertingReader (or subclass) to do this right? Is there a way to do > > this on the 5.x codebase or would I still need the solrindexer factory > work > > that Tomás mentioned previously? > > > > Tomás, is there a ticket for the SolrIndexer factory? I'd like to follow > > it's work to know what version of 5.x (or later) I should be looking for > > this in. > > > > On Thu, Aug 27, 2015 at 1:06 PM, Yonik Seeley <ysee...@gmail.com> wrote: > > > > > UnInvertingReader makes indexed fields look like docvalues fields. > > > The caching itself is still done in FieldCache/FieldCacheImpl > > > but you could perhaps wrap what is cached there to either screen out > > > stuff or construct a new entry based on the user. > > > > > > -Yonik > > > > > > > > > On Thu, Aug 27, 2015 at 12:55 PM, Jamie Johnson <jej2...@gmail.com> > > wrote: > > > > I think a custom UnInvertingReader would work as I could skip the > > process > > > > of putting things in the cache. Right now in Solr 4.x though I am > > > caching > > > > based but including the users authorities in the key of the cache so > > > we're > > > > not rebuilding the UnivertedField on every request. Where in 5.x is > > the > > > > object actually cached? Will this be possible in 5.x? > > > > > > > > On Thu, Aug 27, 2015 at 12:32 PM, Yonik Seeley <ysee...@gmail.com> > > > wrote: > > > > > > > >> The FieldCache has become implementation rather than interface, so I > > > >> don't think you're going to see plugins at that level (it's all > > > >> package protected now). 
> > > >> > > > >> One could either subclass or re-implement UnInvertingReader though. > > > >> > > > >> -Yonik > > > >> > > > >> > > > >> On Thu, Aug 27, 2015 at 12:09 PM, Jamie Johnson <jej2...@gmail.com> > > > wrote: > > > >> > Also in this vein I think that Lucene should support factories for > > the > > > >> > cache creation as described @ > > > >> > https://issues.apache.org/jira/browse/LUCENE-2394. I'm not > > endorsing > > > >> the > > > >> > patch that is provided (I haven't even looked at it) just the > > concept > > > in > > > >> > general. > > > >> > > > > >> > On Thu, Aug 27, 2015 at 12:01 PM, Jamie Johnson < > jej2...@gmail.com> > > > >> wrote: > > > >> > > > > >> >> That makes sense, then I could extend the SolrIndexSearcher by > > > creating > > > >> a > > > >> >> different factory class that did whatever magic I needed. If you > > > >> create a > > > >> >> Jira ticket for this please link it here so I can track it! > Again > > > >> thanks > > > >> >> > > > >> >> On Thu, Aug 27, 2015 at 11:59 AM, Tomás Fernández Löbbe < > > > >> >> tomasflo...@gmail.com> wrote: > > > >> >> > > > >> >>> I don't think there is a way to do this now. Maybe we should > > > separate > > > >> the > > > >> >>> logic of creating the SolrIndexSearcher to a factory. Moving > this > > > logic > > > >> >>> away from SolrCore is already a win, plus it will make it easier > > to > > > >> unit > > > >> >>> test and extend for advanced use cases. > > > >> >>> > > > >> >>> Tomás > > > >> >>> > > > >> >>> On Wed, Aug 26, 2015 at 8:10 PM, Jamie Johnson < > jej2...@gmail.com > > > > > > >> wrote: > > > >> >>> > > > >> >>> > Sorry to poke this again but I'm not following the last > comment > > of > > > >> how I > > > >> >>> > could go about extending the so
Re: Lucene/Solr 5.0 and custom FieldCache implementation
Tracking not teaching... Auto complete is fun... On Tue, Sep 1, 2015, 6:34 AM Jamie Johnson <jej2...@gmail.com> wrote: > No worries, thanks again I'll begin teaching this > > On Mon, Aug 31, 2015, 5:16 PM Tomás Fernández Löbbe <tomasflo...@gmail.com> > wrote: > >> Sorry Jamie, I totally missed this email. There was no Jira that I could >> find. I created SOLR-7996 >> >> On Sat, Aug 29, 2015 at 5:26 AM, Jamie Johnson <jej2...@gmail.com> wrote: >> >> > This sounds like a good idea, I'm assuming I'd need to make my own >> > UnInvertingReader (or subclass) to do this right? Is there a way to do >> > this on the 5.x codebase or would I still need the solrindexer factory >> work >> > that Tomás mentioned previously? >> > >> > Tomás, is there a ticket for the SolrIndexer factory? I'd like to >> follow >> > it's work to know what version of 5.x (or later) I should be looking for >> > this in. >> > >> > On Thu, Aug 27, 2015 at 1:06 PM, Yonik Seeley <ysee...@gmail.com> >> wrote: >> > >> > > UnInvertingReader makes indexed fields look like docvalues fields. >> > > The caching itself is still done in FieldCache/FieldCacheImpl >> > > but you could perhaps wrap what is cached there to either screen out >> > > stuff or construct a new entry based on the user. >> > > >> > > -Yonik >> > > >> > > >> > > On Thu, Aug 27, 2015 at 12:55 PM, Jamie Johnson <jej2...@gmail.com> >> > wrote: >> > > > I think a custom UnInvertingReader would work as I could skip the >> > process >> > > > of putting things in the cache. Right now in Solr 4.x though I am >> > > caching >> > > > based but including the users authorities in the key of the cache so >> > > we're >> > > > not rebuilding the UnivertedField on every request. Where in 5.x is >> > the >> > > > object actually cached? Will this be possible in 5.x? 
>> > > > >> > > > On Thu, Aug 27, 2015 at 12:32 PM, Yonik Seeley <ysee...@gmail.com> >> > > wrote: >> > > > >> > > >> The FieldCache has become implementation rather than interface, so >> I >> > > >> don't think you're going to see plugins at that level (it's all >> > > >> package protected now). >> > > >> >> > > >> One could either subclass or re-implement UnInvertingReader though. >> > > >> >> > > >> -Yonik >> > > >> >> > > >> >> > > >> On Thu, Aug 27, 2015 at 12:09 PM, Jamie Johnson <jej2...@gmail.com >> > >> > > wrote: >> > > >> > Also in this vein I think that Lucene should support factories >> for >> > the >> > > >> > cache creation as described @ >> > > >> > https://issues.apache.org/jira/browse/LUCENE-2394. I'm not >> > endorsing >> > > >> the >> > > >> > patch that is provided (I haven't even looked at it) just the >> > concept >> > > in >> > > >> > general. >> > > >> > >> > > >> > On Thu, Aug 27, 2015 at 12:01 PM, Jamie Johnson < >> jej2...@gmail.com> >> > > >> wrote: >> > > >> > >> > > >> >> That makes sense, then I could extend the SolrIndexSearcher by >> > > creating >> > > >> a >> > > >> >> different factory class that did whatever magic I needed. If >> you >> > > >> create a >> > > >> >> Jira ticket for this please link it here so I can track it! >> Again >> > > >> thanks >> > > >> >> >> > > >> >> On Thu, Aug 27, 2015 at 11:59 AM, Tomás Fernández Löbbe < >> > > >> >> tomasflo...@gmail.com> wrote: >> > > >> >> >> > > >> >>> I don't think there is a way to do this now. Maybe we should >> > > separate >> > > >> the >> > > >> >>> logic of creating the SolrIndexSearcher to a factory. Moving >> this >> > > logic >> > > >> >>> away from SolrCore is already a win, plus it will make it >> easier >> > to >> > > >> unit >> > >
Re: Lucene/Solr 5.0 and custom FieldCache implementation
This sounds like a good idea, I'm assuming I'd need to make my own UnInvertingReader (or subclass) to do this right? Is there a way to do this on the 5.x codebase or would I still need the solrindexer factory work that Tomás mentioned previously? Tomás, is there a ticket for the SolrIndexer factory? I'd like to follow it's work to know what version of 5.x (or later) I should be looking for this in. On Thu, Aug 27, 2015 at 1:06 PM, Yonik Seeley ysee...@gmail.com wrote: UnInvertingReader makes indexed fields look like docvalues fields. The caching itself is still done in FieldCache/FieldCacheImpl but you could perhaps wrap what is cached there to either screen out stuff or construct a new entry based on the user. -Yonik On Thu, Aug 27, 2015 at 12:55 PM, Jamie Johnson jej2...@gmail.com wrote: I think a custom UnInvertingReader would work as I could skip the process of putting things in the cache. Right now in Solr 4.x though I am caching based but including the users authorities in the key of the cache so we're not rebuilding the UnivertedField on every request. Where in 5.x is the object actually cached? Will this be possible in 5.x? On Thu, Aug 27, 2015 at 12:32 PM, Yonik Seeley ysee...@gmail.com wrote: The FieldCache has become implementation rather than interface, so I don't think you're going to see plugins at that level (it's all package protected now). One could either subclass or re-implement UnInvertingReader though. -Yonik On Thu, Aug 27, 2015 at 12:09 PM, Jamie Johnson jej2...@gmail.com wrote: Also in this vein I think that Lucene should support factories for the cache creation as described @ https://issues.apache.org/jira/browse/LUCENE-2394. I'm not endorsing the patch that is provided (I haven't even looked at it) just the concept in general. On Thu, Aug 27, 2015 at 12:01 PM, Jamie Johnson jej2...@gmail.com wrote: That makes sense, then I could extend the SolrIndexSearcher by creating a different factory class that did whatever magic I needed. 
If you create a Jira ticket for this please link it here so I can track it! Again thanks On Thu, Aug 27, 2015 at 11:59 AM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote: I don't think there is a way to do this now. Maybe we should separate the logic of creating the SolrIndexSearcher to a factory. Moving this logic away from SolrCore is already a win, plus it will make it easier to unit test and extend for advanced use cases. Tomás On Wed, Aug 26, 2015 at 8:10 PM, Jamie Johnson jej2...@gmail.com wrote: Sorry to poke this again but I'm not following the last comment of how I could go about extending the solr index searcher and have the extension used. Is there an example of this? Again thanks Jamie On Aug 25, 2015 7:18 AM, Jamie Johnson jej2...@gmail.com wrote: I had seen this as well, if I over wrote this by extending SolrIndexSearcher how do I have my extension used? I didn't see a way that could be plugged in. On Aug 25, 2015 7:15 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: On Tue, Aug 25, 2015 at 2:03 PM, Jamie Johnson jej2...@gmail.com wrote: Thanks Mikhail. If I'm reading the SimpleFacets class correctly, out delegates to DocValuesFacets when facet method is FC, what used to be FieldCache I believe. DocValuesFacets either uses DocValues or builds then using the UninvertingReader. Ah.. got it. Thanks for reminding this details.It seems like even docValues=true doesn't help with your custom implementation. I am not seeing a clean extension point to add a custom UninvertingReader to Solr, would the only way be to copy the FacetComponent and SimpleFacets and modify as needed? Sadly, yes. There is no proper extension point. Also, consider overriding SolrIndexSearcher.wrapReader(SolrCore, DirectoryReader) where the particular UninvertingReader is created, there you can pass the own one, which refers to custom FieldCache. 
On Aug 25, 2015 12:42 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello Jamie, I don't understand how it could choose DocValuesFacets (it occurs on docValues=true) field, but then switches to UninvertingReader/FieldCache which means docValues=false. If you can provide more details it would be great. Beside of that, I suppose you can only implement and inject your own UninvertingReader, I don't think there is an extension point for this. It's too specific requirement. On Tue, Aug 25, 2015 at 3:50 AM, Jamie Johnson jej2...@gmail.com wrote: as mentioned
Re: Lucene/Solr 5.0 and custom FieldCache implementation
Also since DocValues seems to be the future of faceting, is there another mechanism that I should be looking at to do authorization-based filtering like this? I know that I can do this filtering at a document level and get the desired result, but am wondering about the Term level. As always thanks. On Sat, Aug 29, 2015 at 8:26 AM, Jamie Johnson jej2...@gmail.com wrote: This sounds like a good idea, I'm assuming I'd need to make my own UnInvertingReader (or subclass) to do this right? Is there a way to do this on the 5.x codebase or would I still need the SolrIndexSearcher factory work that Tomás mentioned previously? Tomás, is there a ticket for the SolrIndexSearcher factory? I'd like to follow its work to know what version of 5.x (or later) I should be looking for this in. On Thu, Aug 27, 2015 at 1:06 PM, Yonik Seeley ysee...@gmail.com wrote: UnInvertingReader makes indexed fields look like docvalues fields. The caching itself is still done in FieldCache/FieldCacheImpl but you could perhaps wrap what is cached there to either screen out stuff or construct a new entry based on the user. -Yonik
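The cache keying discussed in this thread (scoping cache entries by the requesting user's authorities so differently-authorized users never share an entry) can be modeled without any Solr/Lucene dependencies. A minimal sketch; all names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class AuthScopedCacheSketch {
    // Hypothetical cache key: the field name plus the requesting user's
    // authorities, mirroring "include the authorities in the key of the cache".
    static final class Key {
        final String field;
        final String authorities;
        Key(String field, String authorities) {
            this.field = field;
            this.authorities = authorities;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return field.equals(k.field) && authorities.equals(k.authorities);
        }
        @Override public int hashCode() { return Objects.hash(field, authorities); }
    }

    final Map<Key, int[]> cache = new HashMap<>();

    // Returns the cached (stand-in) uninverted data for (field, authorities),
    // computing it once per distinct key so it is not rebuilt per request.
    int[] getOrBuild(String field, String authorities) {
        return cache.computeIfAbsent(new Key(field, authorities),
                k -> buildFiltered(k.field, k.authorities));
    }

    // Stand-in for uninverting the field with term-level filtering applied.
    private int[] buildFiltered(String field, String authorities) {
        return new int[] { field.hashCode(), authorities.hashCode() };
    }

    public static void main(String[] args) {
        AuthScopedCacheSketch c = new AuthScopedCacheSketch();
        int[] a = c.getOrBuild("category", "role:admin");
        int[] b = c.getOrBuild("category", "role:user");
        System.out.println(a == c.getOrBuild("category", "role:admin")); // same key reuses the entry
        System.out.println(a == b); // different authorities get a separate entry
    }
}
```

This is only a model of the keying scheme; in real Solr the equivalent hook would be wrapping the reader (as discussed above), not a plain map.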
Re: Lucene/Solr 5.0 and custom FieldCache implementation
I think a custom UnInvertingReader would work as I could skip the process of putting things in the cache. Right now in Solr 4.x, though, I am caching, but am including the user's authorities in the key of the cache so we're not rebuilding the UninvertedField on every request. Where in 5.x is the object actually cached? Will this be possible in 5.x? On Thu, Aug 27, 2015 at 12:32 PM, Yonik Seeley ysee...@gmail.com wrote: The FieldCache has become implementation rather than interface, so I don't think you're going to see plugins at that level (it's all package protected now). One could either subclass or re-implement UnInvertingReader though. -Yonik
Re: StrDocValues
Actually I should have just tried this before asking, but I'll say what I'm seeing and maybe someone can confirm. Faceting looks like it took this into account, i.e. the counts were 0 for values that were in documents that I removed using my AnalyticsQuery. I had expected that the AnalyticsQuery might be done after everything was completed, but it looks like it was executed before faceting, which is great.
Re: StrDocValues
Thanks Yonik. I currently am using this to negate the score of a document given the value of a particular field within the document, then using a custom AnalyticsQuery to only collect documents with a score > 0. Will this also impact the faceting counts? On Wed, Aug 26, 2015 at 8:32 PM, Yonik Seeley ysee...@gmail.com wrote: On Wed, Aug 26, 2015 at 6:20 PM, Jamie Johnson jej2...@gmail.com wrote: I don't see it explicitly mentioned, but does the boost only get applied to the final documents/score that matched the provided query or is it called for each field that matched? I'm assuming only once per document that matched the main query, is that right? Correct. -Yonik
Re: Lucene/Solr 5.0 and custom FieldCache implementation
Also in this vein I think that Lucene should support factories for the cache creation as described @ https://issues.apache.org/jira/browse/LUCENE-2394. I'm not endorsing the patch that is provided (I haven't even looked at it) just the concept in general.
Re: Lucene/Solr 5.0 and custom FieldCache implementation
That makes sense, then I could extend the SolrIndexSearcher by creating a different factory class that did whatever magic I needed. If you create a Jira ticket for this please link it here so I can track it! Again thanks On Thu, Aug 27, 2015 at 11:59 AM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote: I don't think there is a way to do this now. Maybe we should separate the logic of creating the SolrIndexSearcher to a factory. Moving this logic away from SolrCore is already a win, plus it will make it easier to unit test and extend for advanced use cases. Tomás
Re: StrDocValues
Right, I am removing them myself. Another feature which would be great would be the ability to specify a custom collector, like the positive-score-only collector in this case, to avoid having to do an extra pass over all of the scores, but I don't believe there is a way to do that now, right? On Thu, Aug 27, 2015 at 3:16 PM, Yonik Seeley ysee...@gmail.com wrote: On Thu, Aug 27, 2015 at 2:43 PM, Erick Erickson erickerick...@gmail.com wrote: Right, when scoring, any document that scores 0 is removed from the results Just to clarify, I think Jamie removed 0-scoring documents himself. Solr has never done this itself. Lucene used to, a long time ago, and then stopped IIRC. -Yonik
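The single-pass, positive-score-only collection being discussed can be modeled without Solr. A simplified sketch with hypothetical names, using a plain array in place of a Lucene collector:

```java
import java.util.ArrayList;
import java.util.List;

public class PositiveScoreCollectorSketch {
    // Models a collector that keeps only documents whose (possibly
    // boost-negated) score is positive, filtering at collect time rather
    // than in a second pass over the results.
    static List<Integer> collectPositive(float[] scoreByDoc) {
        List<Integer> hits = new ArrayList<>();
        for (int docId = 0; docId < scoreByDoc.length; docId++) {
            if (scoreByDoc[docId] > 0f) {
                hits.add(docId);  // collected: positive score
            }
            // zero or negative scores are dropped here, in the same pass
        }
        return hits;
    }

    public static void main(String[] args) {
        float[] scores = { 1.2f, 0f, -3.0f, 0.4f };
        System.out.println(collectPositive(scores)); // [0, 3]
    }
}
```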
StrDocValues
Are there any example implementations showing how StrDocValues works? I am not sure if this is the right place or not, but I was thinking about having some document-level docValue that I'd like to read in a function query to decide whether the document is returned or not. Am I barking up the right tree looking at this, or is there another method of supporting this?
Re: StrDocValues
I think I found it. {!boost..} gave me what I was looking for, and then a custom collector filtered out anything that I didn't want to show.
Re: StrDocValues
I don't see it explicitly mentioned, but does the boost only get applied to the final documents/score that matched the provided query, or is it called for each field that matched? I'm assuming only once per document that matched the main query, is that right? On Wed, Aug 26, 2015 at 5:35 PM, Jamie Johnson jej2...@gmail.com wrote: I think I found it. {!boost..} gave me what I was looking for and then a custom collector filtered out anything that I didn't want to show.
Re: Tokenizers and DelimitedPayloadTokenFilterFactory
Thanks again Erick, I created https://issues.apache.org/jira/browse/SOLR-7975, though I didn't attach a patch because my current implementation is not useful generally right now; it meets my use case but likely would not meet others. I will try to look at generalizing this to allow something custom to be plugged in. On Aug 26, 2015 2:46 AM, Erick Erickson erickerick...@gmail.com wrote: Sure, I think it's fine to raise a JIRA, especially if you can include a patch, even a preliminary one to solicit feedback... which I'll leave to people who are more familiar with that code... I'm not sure how generally useful this would be, and if it comes at a cost to normal searching there's sure to be lively discussion. Best Erick On Tue, Aug 25, 2015 at 6:29 PM, Erick Erickson erickerick...@gmail.com wrote: Well, you're going down a path that hasn't been trodden before ;). If you can treat your primitive types as text types you might get some traction, but that makes a lot of operations like numeric comparison difficult. Hmmm, another idea from left field: for single-valued types, what about a sidecar field that has the auth token? And even for a multiValued field, two parallel fields are guaranteed to maintain order, so perhaps you could do something here. Yes, I'm waving my hands a LOT here. I suspect that trying to have a custom type that incorporates payloads for, say, trie fields will be interesting to say the least. Numeric types are packed to save storage etc. so it'll be an adventure.. Best, Erick On Tue, Aug 25, 2015 at 2:43 PM, Jamie Johnson jej2...@gmail.com wrote: We were originally using this approach, i.e. run things through the KeywordTokenizer - DelimitedPayloadFilter - WordDelimiterFilter. Again this works fine for text, though I had wanted to use the StandardTokenizer in the chain. Is there an equivalent filter that does what the StandardTokenizer does? All of this said, this doesn't address the issue of the primitive field types, which at this point is the bigger issue. Given this use case should there be another way to provide payloads? My current thinking is that I will need to provide custom implementations for all of the field types I would like to support payloads on, which will essentially be copies of the standard versions with some extra sugar to read/write the payloads (I don't see a way to wrap/delegate these at this point because AttributeSource has the attribute retrieval related methods as final, so I can't simply wrap another tokenizer and return my added attributes + the wrapped attributes).
I know my use case is a bit strange, but I had not expected to need to do this given that Lucene/Solr supports payloads on these field types, they just aren't exposed. As always I appreciate any ideas if I'm barking up the wrong tree here. On Tue, Aug 25, 2015 at 2:52 PM, Markus Jelsma markus.jel...@openindex.io wrote: Well, if I remember correctly (I have no testing facility at hand) WordDelimiterFilter maintains payloads on emitted sub-terms. So if you use a KeywordTokenizer, input 'some text^PAYLOAD', and have a DelimitedPayloadFilter, the entire string gets a payload. You can then split that string up again into individual tokens. It is possible to abuse WordDelimiterFilter for it because it has a types parameter that you can use to split it on whitespace if its input is not trimmed. Otherwise you can use any other character instead of a space as your input. This is a crazy idea
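Markus's whole-field-payload-then-split idea can be modeled without Lucene: treat the field as one keyword token, peel off the delimited payload, then re-split into word tokens that all inherit it. A minimal sketch (hypothetical names, '\' standing in for the payload delimiter):

```java
import java.util.ArrayList;
import java.util.List;

public class WholeFieldPayloadSketch {
    // One (term, payload) pair, standing in for a Lucene token with a payload.
    static final class Token {
        final String term;
        final String payload;
        Token(String term, String payload) { this.term = term; this.payload = payload; }
    }

    // Keyword-tokenize the whole field, strip the delimited payload from the
    // end, then split into word tokens that all carry that payload.
    static List<Token> analyze(String field) {
        int d = field.lastIndexOf('\\');
        String payload = d >= 0 ? field.substring(d + 1) : null;
        String text = d >= 0 ? field.substring(0, d) : field;
        List<Token> out = new ArrayList<>();
        for (String term : text.split("\\s+")) {
            out.add(new Token(term, payload));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Token t : analyze("this is a test\\Foo")) {
            System.out.println(t.term + " -> " + t.payload); // every token carries Foo
        }
    }
}
```

The real chain would use KeywordTokenizer plus DelimitedPayloadTokenFilter plus a splitting filter; this just shows why delimiting before splitting puts the payload on every token.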
Re: Lucene/Solr 5.0 and custom FieldCache implementation
Sorry to poke this again but I'm not following the last comment of how I could go about extending the solr index searcher and have the extension used. Is there an example of this? Again thanks Jamie On Aug 25, 2015 7:18 AM, Jamie Johnson jej2...@gmail.com wrote: I had seen this as well; if I overrode this by extending SolrIndexSearcher, how do I have my extension used? I didn't see a way that could be plugged in. On Aug 25, 2015 7:15 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: On Tue, Aug 25, 2015 at 2:03 PM, Jamie Johnson jej2...@gmail.com wrote: Thanks Mikhail. If I'm reading the SimpleFacets class correctly, it delegates to DocValuesFacets when the facet method is FC, what used to be FieldCache I believe. DocValuesFacets either uses DocValues or builds them using the UninvertingReader. Ah.. got it. Thanks for reminding me of these details. It seems like even docValues=true doesn't help with your custom implementation. I am not seeing a clean extension point to add a custom UninvertingReader to Solr; would the only way be to copy the FacetComponent and SimpleFacets and modify as needed? Sadly, yes. There is no proper extension point. Also, consider overriding SolrIndexSearcher.wrapReader(SolrCore, DirectoryReader) where the particular UninvertingReader is created; there you can pass in your own one, which refers to a custom FieldCache. On Aug 25, 2015 12:42 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello Jamie, I don't understand how it could choose DocValuesFacets (it occurs on docValues=true) field, but then switches to UninvertingReader/FieldCache, which means docValues=false. If you can provide more details it would be great. Besides that, I suppose you can only implement and inject your own UninvertingReader; I don't think there is an extension point for this. It's too specific a requirement.
On Tue, Aug 25, 2015 at 3:50 AM, Jamie Johnson jej2...@gmail.com wrote: as mentioned in a previous email I have a need to provide security controls at the term level. I know that Lucene/Solr doesn't support this so I had baked something onto a 4.x baseline that was sufficient for my use cases. I am now looking to move that implementation to 5.x and am running into an issue around faceting. Previously we were able to provide a custom cache implementation that would create separate cache entries given a particular set of security controls, but in Solr 5 some faceting is delegated to DocValuesFacets which delegates to UninvertingReader in my case (we are not storing DocValues). The issue I am running into is that before 5.x I had the ability to influence the FieldCache that was used at the Solr level to also include a security token into the key so each cache entry was scoped to a particular level. With the current implementation the FieldCache seems to be an internal detail that I can't influence in any way. Is this correct? I had noticed this Jira ticket https://issues.apache.org/jira/browse/LUCENE-5427, is there any movement on this? Is there another way to influence the information that is put into these caches? As always thanks in advance for any suggestions. -Jamie -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Tokenizers and DelimitedPayloadTokenFilterFactory
To be clear, we are using payloads as a way to attach authorizations to individual tokens within Solr. The payloads are normal Solr Payloads, though we are not using floats; we are using the identity payload encoder (org.apache.lucene.analysis.payloads.IdentityEncoder), which allows for storing a byte[] of our choosing in the payload field. This works great for text, but now that I'm indexing more than just text I need a way to specify the payload on the other field types. Does that make more sense? On Tue, Aug 25, 2015 at 12:52 PM, Erick Erickson erickerick...@gmail.com wrote: This really sounds like an XY problem. Or when you use payload it's not the Solr payload. So Solr Payloads are a float value that you can attach to individual terms to influence the scoring. Attaching the _same_ payload to all terms in a field is much the same thing as boosting on any matches in the field at query time or boosting on the field at index time (this latter assuming that different docs would have different boosts). So can you back up a bit and tell us what you're trying to accomplish? Maybe we can be sure we're both talking about the same thing ;) Best, Erick
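What the identity encoder buys here can be shown without Lucene: the payload is just the raw bytes of the authorization string, carried through verbatim rather than parsed as a float. A small sketch (hypothetical names and marking; the real IdentityEncoder works on a char[] region of the token):

```java
import java.nio.charset.StandardCharsets;

public class IdentityPayloadSketch {
    // Mimics org.apache.lucene.analysis.payloads.IdentityEncoder: the token's
    // delimited suffix is stored as raw bytes rather than decoded as a float.
    static byte[] encode(String authorizations) {
        return authorizations.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] payload) {
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] payload = encode("U//FOUO");   // arbitrary auth marking (hypothetical)
        System.out.println(decode(payload));  // round-trips unchanged: U//FOUO
    }
}
```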
Tokenizers and DelimitedPayloadTokenFilterFactory
I would like to specify a particular payload for all tokens emitted from a tokenizer, but don't see a clear way to do this. Ideally I could specify that something like the DelimitedPayloadTokenFilter be run on the entire field and then standard analysis be done on the rest of the field, so in the case that I had the following text this is a test\Foo I would like to create tokens this, is, a, test each with a payload of Foo. From what I'm seeing though only test gets the payload. Is there any way to accomplish this or will I need to implement a custom tokenizer?
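The behavior Jamie is after — one payload, delimited at the end of the field, shared by every token — can be sketched outside of Lucene's analysis API in plain Java. This is an illustrative sketch only (the class and method names are hypothetical, not a Lucene TokenFilter):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: split off the delimited payload first, then tokenize the
// remaining text and attach that same payload to every token, so
// "this is a test\Foo" yields tokens this, is, a, test, each mapped to "Foo".
public class WholeFieldPayload {
    public static Map<String, String> tokenize(String field, char delimiter) {
        int at = field.lastIndexOf(delimiter);
        String payload = at >= 0 ? field.substring(at + 1) : null;
        String text = at >= 0 ? field.substring(0, at) : field;
        Map<String, String> tokenToPayload = new LinkedHashMap<>();
        for (String token : text.split("\\s+")) {
            tokenToPayload.put(token, payload); // same payload on every token
        }
        return tokenToPayload;
    }
}
```

In a real analysis chain this logic would live in a custom TokenFilter that captures the payload once and sets the PayloadAttribute on each emitted token.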
Re: Tokenizers and DelimitedPayloadTokenFilterFactory
Looks like I have something basic working for Trie fields. I am doing exactly what I said in my previous email, so good news there. I think this is a big step as there are only a few field types left that I need to support, those being date (should be similar to Trie) and Spatial fields, which at a glance looked like they provide a way to supply the token stream through an extension. Definitely need to look more though. All of this said though, is this really the right way to get payloads into these types of fields? Should a jira feature request be added for this? On Aug 25, 2015 8:13 PM, Jamie Johnson jej2...@gmail.com wrote: Right, I had assumed (obviously here is my problem) that I'd be able to specify payloads for the field regardless of the field type. Looking at TrieField that is certainly non-trivial. After a bit of digging it appears that if I wanted to do something here I'd need to build a new TrieField, override createField and provide a Field that would return something like NumericTokenStream but also provide the payloads. Like you said sounds interesting to say the least... Were payloads not really intended to be used for these types of fields from a Lucene perspective? On Tue, Aug 25, 2015 at 6:29 PM, Erick Erickson erickerick...@gmail.com wrote: Well, you're going down a path that hasn't been trodden before ;). If you can treat your primitive types as text types you might get some traction, but that makes a lot of operations like numeric comparison difficult. Hmmm, another idea from left field. For single-valued types, what about a sidecar field that has the auth token? And even for a multiValued field, two parallel fields are guaranteed to maintain order so perhaps you could do something here. Yes, I'm waving my hands a LOT here. I suspect that trying to have a custom type that incorporates payloads for, say, trie fields will be interesting to say the least. Numeric types are packed to save storage etc. so it'll be an adventure...
Best, Erick On Tue, Aug 25, 2015 at 2:43 PM, Jamie Johnson jej2...@gmail.com wrote: We were originally using this approach, i.e. run things through the KeywordTokenizer -> DelimitedPayloadFilter -> WordDelimiterFilter. Again this works fine for text, though I had wanted to use the StandardTokenizer in the chain. Is there an equivalent filter that does what the StandardTokenizer does? All of this said this doesn't address the issue of the primitive field types, which at this point is the bigger issue. Given this use case should there be another way to provide payloads? My current thinking is that I will need to provide custom implementations for all of the field types I would like to support payloads on which will essentially be copies of the standard versions with some extra sugar to read/write the payloads (I don't see a way to wrap/delegate these at this point because AttributeSource has the attribute retrieval related methods as final so I can't simply wrap another tokenizer and return my added attributes + the wrapped attributes). I know my use case is a bit strange, but I had not expected to need to do this given that Lucene/Solr supports payloads on these field types, they just aren't exposed. As always I appreciate any ideas if I'm barking up the wrong tree here. On Tue, Aug 25, 2015 at 2:52 PM, Markus Jelsma markus.jel...@openindex.io wrote: Well, if i remember correctly (i have no testing facility at hand) WordDelimiterFilter maintains payloads on emitted sub terms. So if you use a KeywordTokenizer, input 'some text^PAYLOAD', and have a DelimitedPayloadFilter, the entire string gets a payload. You can then split that string up again in individual tokens. It is possible to abuse WordDelimiterFilter for it because it has a types parameter that you can use to split it on whitespace if its input is not trimmed. Otherwise you can use any other character instead of a space as your input. This is a crazy idea, but it might work. 
-Original message- From:Jamie Johnson jej2...@gmail.com Sent: Tuesday 25th August 2015 19:37 To: solr-user@lucene.apache.org Subject: Re: Tokenizers and DelimitedPayloadTokenFilterFactory To be clear, we are using payloads as a way to attach authorizations to individual tokens within Solr. The payloads are normal Solr Payloads though we are not using floats, we are using the identity payload encoder (org.apache.lucene.analysis.payloads.IdentityEncoder) which allows for storing a byte[] of our choosing into the payload field. This works great for text, but now that I'm indexing more than just text I need a way to specify the payload on the other field types. Does that make more sense? On Tue, Aug 25, 2015 at 12:52 PM, Erick Erickson erickerick...@gmail.com wrote: This really sounds like an XY problem. Or when you use
Re: Lucene/Solr 5.0 and custom FieldCache implementation
I had seen this as well; if I overrode this by extending SolrIndexSearcher how do I have my extension used? I didn't see a way that could be plugged in. On Aug 25, 2015 7:15 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: On Tue, Aug 25, 2015 at 2:03 PM, Jamie Johnson jej2...@gmail.com wrote: Thanks Mikhail. If I'm reading the SimpleFacets class correctly, it delegates to DocValuesFacets when the facet method is FC, what used to be FieldCache I believe. DocValuesFacets either uses DocValues or builds them using the UninvertingReader. Ah, got it. Thanks for the reminder on those details. It seems like even docValues=true doesn't help with your custom implementation. I am not seeing a clean extension point to add a custom UninvertingReader to Solr, would the only way be to copy the FacetComponent and SimpleFacets and modify as needed? Sadly, yes. There is no proper extension point. Also, consider overriding SolrIndexSearcher.wrapReader(SolrCore, DirectoryReader) where the particular UninvertingReader is created; there you can pass your own one, which refers to a custom FieldCache. On Aug 25, 2015 12:42 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello Jamie, I don't understand how it could choose DocValuesFacets (it occurs on docValues=true) field, but then switches to UninvertingReader/FieldCache which means docValues=false. If you can provide more details it would be great. Besides that, I suppose you can only implement and inject your own UninvertingReader, I don't think there is an extension point for this. It's too specific a requirement. On Tue, Aug 25, 2015 at 3:50 AM, Jamie Johnson jej2...@gmail.com wrote: as mentioned in a previous email I have a need to provide security controls at the term level. I know that Lucene/Solr doesn't support this so I had baked something onto a 4.x baseline that was sufficient for my use cases. I am now looking to move that implementation to 5.x and am running into an issue around faceting. 
Previously we were able to provide a custom cache implementation that would create separate cache entries given a particular set of security controls, but in Solr 5 some faceting is delegated to DocValuesFacets which delegates to UninvertingReader in my case (we are not storing DocValues). The issue I am running into is that before 5.x I had the ability to influence the FieldCache that was used at the Solr level to also include a security token into the key so each cache entry was scoped to a particular level. With the current implementation the FieldCache seems to be an internal detail that I can't influence in any way. Is this correct? I had noticed this Jira ticket https://issues.apache.org/jira/browse/LUCENE-5427, is there any movement on this? Is there another way to influence the information that is put into these caches? As always thanks in advance for any suggestions. -Jamie -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: how to prevent uuid-field changing in /update query?
It sounds like you need to control when the uuid is and is not created; it just feels like you'd get better mileage doing this outside of Solr On Aug 25, 2015 7:49 AM, CrazyDiamond crazy_diam...@mail.ru wrote: Why not generate the uuid client side on the initial save and reuse this on updates? I can't do this because I have delta-import queries which also should be able to assign the uuid when it is needed -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-prevent-uuid-field-changing-in-update-query-tp4225113p4225137.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to prevent uuid-field changing in /update query?
Why not generate the uuid client side on the initial save and reuse this on updates? On Aug 25, 2015 4:22 AM, CrazyDiamond crazy_diam...@mail.ru wrote: I have a uuid field. It is not set as unique, but nevertheless I want it not to be changed every time I call /update. It might be because I added a request handler named /update which contains a uuid update chain. But if I don't do this I have no uuid at all. Maybe I can configure the uuid update-chain to set the uuid only if it is blank? -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-prevent-uuid-field-changing-in-update-query-tp4225113.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to prevent uuid-field changing in /update query?
I am honestly not familiar enough to say. Best to try it. On Aug 25, 2015 7:59 AM, CrazyDiamond crazy_diam...@mail.ru wrote: It sounds like you need to control when the uuid is and is not created; it just feels like you'd get better mileage doing this outside of Solr Can I simply insert a condition (blank or not) in the uuid update-chain? -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-prevent-uuid-field-changing-in-update-query-tp4225113p4225141.html Sent from the Solr - User mailing list archive at Nabble.com.
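The "only when blank" rule being discussed in this thread boils down to a few lines of logic. The sketch below is illustrative only — in Solr the conditional would live in an update request processor in the chain, not in a standalone class like this:

```java
import java.util.UUID;

// Sketch of the rule: keep an existing uuid untouched across /update calls,
// and mint a new one only when the incoming document has a blank value.
public class UuidIfBlank {
    public static String ensureUuid(String existing) {
        if (existing == null || existing.trim().isEmpty()) {
            return UUID.randomUUID().toString(); // blank: generate a fresh id
        }
        return existing; // non-blank: never overwrite
    }
}
```

Note that with a full-document replace the old uuid never arrives in the new document, so the conditional alone does not preserve it; the old value has to be carried forward (e.g. by the client or an atomic update) for this check to help.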
Re: Lucene/Solr 5.0 and custom FieldCache implementation
Thanks Mikhail. If I'm reading the SimpleFacets class correctly, it delegates to DocValuesFacets when the facet method is FC, what used to be FieldCache I believe. DocValuesFacets either uses DocValues or builds them using the UninvertingReader. I am not seeing a clean extension point to add a custom UninvertingReader to Solr, would the only way be to copy the FacetComponent and SimpleFacets and modify as needed? On Aug 25, 2015 12:42 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello Jamie, I don't understand how it could choose DocValuesFacets (it occurs on docValues=true) field, but then switches to UninvertingReader/FieldCache which means docValues=false. If you can provide more details it would be great. Besides that, I suppose you can only implement and inject your own UninvertingReader, I don't think there is an extension point for this. It's too specific a requirement. On Tue, Aug 25, 2015 at 3:50 AM, Jamie Johnson jej2...@gmail.com wrote: as mentioned in a previous email I have a need to provide security controls at the term level. I know that Lucene/Solr doesn't support this so I had baked something onto a 4.x baseline that was sufficient for my use cases. I am now looking to move that implementation to 5.x and am running into an issue around faceting. Previously we were able to provide a custom cache implementation that would create separate cache entries given a particular set of security controls, but in Solr 5 some faceting is delegated to DocValuesFacets which delegates to UninvertingReader in my case (we are not storing DocValues). The issue I am running into is that before 5.x I had the ability to influence the FieldCache that was used at the Solr level to also include a security token into the key so each cache entry was scoped to a particular level. With the current implementation the FieldCache seems to be an internal detail that I can't influence in any way. Is this correct? 
I had noticed this Jira ticket https://issues.apache.org/jira/browse/LUCENE-5427, is there any movement on this? Is there another way to influence the information that is put into these caches? As always thanks in advance for any suggestions. -Jamie -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Tokenizers and DelimitedPayloadTokenFilterFactory
Right, I had assumed (obviously here is my problem) that I'd be able to specify payloads for the field regardless of the field type. Looking at TrieField that is certainly non-trivial. After a bit of digging it appears that if I wanted to do something here I'd need to build a new TrieField, override createField and provide a Field that would return something like NumericTokenStream but also provide the payloads. Like you said sounds interesting to say the least... Were payloads not really intended to be used for these types of fields from a Lucene perspective? On Tue, Aug 25, 2015 at 6:29 PM, Erick Erickson erickerick...@gmail.com wrote: Well, you're going down a path that hasn't been trodden before ;). If you can treat your primitive types as text types you might get some traction, but that makes a lot of operations like numeric comparison difficult. Hmmm, another idea from left field. For single-valued types, what about a sidecar field that has the auth token? And even for a multiValued field, two parallel fields are guaranteed to maintain order so perhaps you could do something here. Yes, I'm waving my hands a LOT here. I suspect that trying to have a custom type that incorporates payloads for, say, trie fields will be interesting to say the least. Numeric types are packed to save storage etc. so it'll be an adventure... Best, Erick On Tue, Aug 25, 2015 at 2:43 PM, Jamie Johnson jej2...@gmail.com wrote: We were originally using this approach, i.e. run things through the KeywordTokenizer -> DelimitedPayloadFilter -> WordDelimiterFilter. Again this works fine for text, though I had wanted to use the StandardTokenizer in the chain. Is there an equivalent filter that does what the StandardTokenizer does? All of this said this doesn't address the issue of the primitive field types, which at this point is the bigger issue. Given this use case should there be another way to provide payloads? 
My current thinking is that I will need to provide custom implementations for all of the field types I would like to support payloads on which will essentially be copies of the standard versions with some extra sugar to read/write the payloads (I don't see a way to wrap/delegate these at this point because AttributeSource has the attribute retrieval related methods as final so I can't simply wrap another tokenizer and return my added attributes + the wrapped attributes). I know my use case is a bit strange, but I had not expected to need to do this given that Lucene/Solr supports payloads on these field types, they just aren't exposed. As always I appreciate any ideas if I'm barking up the wrong tree here. On Tue, Aug 25, 2015 at 2:52 PM, Markus Jelsma markus.jel...@openindex.io wrote: Well, if i remember correctly (i have no testing facility at hand) WordDelimiterFilter maintains payloads on emitted sub terms. So if you use a KeywordTokenizer, input 'some text^PAYLOAD', and have a DelimitedPayloadFilter, the entire string gets a payload. You can then split that string up again in individual tokens. It is possible to abuse WordDelimiterFilter for it because it has a types parameter that you can use to split it on whitespace if its input is not trimmed. Otherwise you can use any other character instead of a space as your input. This is a crazy idea, but it might work. -Original message- From:Jamie Johnson jej2...@gmail.com Sent: Tuesday 25th August 2015 19:37 To: solr-user@lucene.apache.org Subject: Re: Tokenizers and DelimitedPayloadTokenFilterFactory To be clear, we are using payloads as a way to attach authorizations to individual tokens within Solr. The payloads are normal Solr Payloads though we are not using floats, we are using the identity payload encoder (org.apache.lucene.analysis.payloads.IdentityEncoder) which allows for storing a byte[] of our choosing into the payload field. 
This works great for text, but now that I'm indexing more than just text I need a way to specify the payload on the other field types. Does that make more sense? On Tue, Aug 25, 2015 at 12:52 PM, Erick Erickson erickerick...@gmail.com wrote: This really sounds like an XY problem. Or when you use payload it's not the Solr payload. So Solr Payloads are a float value that you can attach to individual terms to influence the scoring. Attaching the _same_ payload to all terms in a field is much the same thing as boosting on any matches in the field at query time or boosting on the field at index time (this latter assuming that different docs would have different boosts). So can you back up a bit and tell us what you're trying to accomplish maybe we can be sure we're both talking about the same thing ;) Best, Erick On Tue, Aug 25, 2015 at 9:09 AM, Jamie Johnson jej2...@gmail.com wrote
Re: Tokenizers and DelimitedPayloadTokenFilterFactory
We were originally using this approach, i.e. run things through the KeywordTokenizer -> DelimitedPayloadFilter -> WordDelimiterFilter. Again this works fine for text, though I had wanted to use the StandardTokenizer in the chain. Is there an equivalent filter that does what the StandardTokenizer does? All of this said this doesn't address the issue of the primitive field types, which at this point is the bigger issue. Given this use case should there be another way to provide payloads? My current thinking is that I will need to provide custom implementations for all of the field types I would like to support payloads on which will essentially be copies of the standard versions with some extra sugar to read/write the payloads (I don't see a way to wrap/delegate these at this point because AttributeSource has the attribute retrieval related methods as final so I can't simply wrap another tokenizer and return my added attributes + the wrapped attributes). I know my use case is a bit strange, but I had not expected to need to do this given that Lucene/Solr supports payloads on these field types, they just aren't exposed. As always I appreciate any ideas if I'm barking up the wrong tree here. On Tue, Aug 25, 2015 at 2:52 PM, Markus Jelsma markus.jel...@openindex.io wrote: Well, if i remember correctly (i have no testing facility at hand) WordDelimiterFilter maintains payloads on emitted sub terms. So if you use a KeywordTokenizer, input 'some text^PAYLOAD', and have a DelimitedPayloadFilter, the entire string gets a payload. You can then split that string up again in individual tokens. It is possible to abuse WordDelimiterFilter for it because it has a types parameter that you can use to split it on whitespace if its input is not trimmed. Otherwise you can use any other character instead of a space as your input. This is a crazy idea, but it might work. 
-Original message- From: Jamie Johnson jej2...@gmail.com Sent: Tuesday 25th August 2015 19:37 To: solr-user@lucene.apache.org Subject: Re: Tokenizers and DelimitedPayloadTokenFilterFactory To be clear, we are using payloads as a way to attach authorizations to individual tokens within Solr. The payloads are normal Solr Payloads though we are not using floats, we are using the identity payload encoder (org.apache.lucene.analysis.payloads.IdentityEncoder) which allows for storing a byte[] of our choosing into the payload field. This works great for text, but now that I'm indexing more than just text I need a way to specify the payload on the other field types. Does that make more sense? On Tue, Aug 25, 2015 at 12:52 PM, Erick Erickson erickerick...@gmail.com wrote: This really sounds like an XY problem. Or when you use payload it's not the Solr payload. So Solr Payloads are a float value that you can attach to individual terms to influence the scoring. Attaching the _same_ payload to all terms in a field is much the same thing as boosting on any matches in the field at query time or boosting on the field at index time (this latter assuming that different docs would have different boosts). So can you back up a bit and tell us what you're trying to accomplish so maybe we can be sure we're both talking about the same thing ;) Best, Erick On Tue, Aug 25, 2015 at 9:09 AM, Jamie Johnson jej2...@gmail.com wrote: I would like to specify a particular payload for all tokens emitted from a tokenizer, but don't see a clear way to do this. Ideally I could specify that something like the DelimitedPayloadTokenFilter be run on the entire field and then standard analysis be done on the rest of the field, so in the case that I had the following text this is a test\Foo I would like to create tokens this, is, a, test each with a payload of Foo. From what I'm seeing though only test gets the payload. Is there any way to accomplish this or will I need to implement a custom tokenizer?
Re: Disable caching
I ran into another issue that I am having trouble running to ground. My implementation on Solr 4.x worked as I expected but trying to migrate this to Solr 5.x it looks like some of the faceting is delegated to DocValuesFacets which ultimately caches things at a field level in the FieldCache.DEFAULT cache. I don't see any way to override this cache or augment the key; am I missing an extension point here? Is there another approach I should be taking in this case? On Wed, Aug 19, 2015 at 9:08 AM, Jamie Johnson jej2...@gmail.com wrote: This was my original thought. We already have the thread local so should be straight fwd to just wrap the Field name and use that as the key. Again thanks, I really appreciate the feedback On Aug 19, 2015 8:12 AM, Yonik Seeley ysee...@gmail.com wrote: On Tue, Aug 18, 2015 at 10:58 PM, Jamie Johnson jej2...@gmail.com wrote: Hmm...so I think I have things setup correctly, I have a custom QParserPlugin building a custom query that wraps the query built from the base parser and stores the user who is executing the query. I've added the username to the hashCode and equals checks so I think everything is setup properly. I ran a quick test and it definitely looks like my items are being cached now per user, which is really great. The issue that I'm running into now is that the FieldValueCache doesn't take the query into account, so the FieldValueCache is built for user a and then reused for user b, which is an issue for me. In short I'm back to my NoOpCache for FieldValues. It's great that I'm in a better spot for the others, but is there anything that can be done with FieldValues to take into account the requesting user? I guess a cache implementation that gets the user through a thread local and either wraps the original key with an object containing the user, or delegates to a per-user cache underneath. -Yonik
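Yonik's suggestion — a cache that reads the requesting user from a thread local and silently wraps the caller's key — can be sketched in plain Java. The class and method names below are hypothetical (this is not Solr's cache SPI), purely to show how per-user entries fall out of the composite key:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Illustrative sketch: the cache wraps every key with the current user, so
// user A and user B never see each other's cached values for the same field.
public class PerUserCache<V> {
    public static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    private static final class Key {
        final String user; final Object inner;
        Key(String user, Object inner) { this.user = user; this.inner = inner; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return Objects.equals(user, k.user) && Objects.equals(inner, k.inner);
        }
        @Override public int hashCode() { return Objects.hash(user, inner); }
    }

    private final Map<Key, V> map = new HashMap<>();

    public V get(Object key) { return map.get(new Key(CURRENT_USER.get(), key)); }
    public void put(Object key, V value) { map.put(new Key(CURRENT_USER.get(), key), value); }
}
```

The trade-off is memory: every distinct user gets its own copy of each uninverted field, which is exactly why the thread quickly turns to wrapping queries instead of caches where possible.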
Lucene/Solr 5.0 and custom FieldCache implementation
as mentioned in a previous email I have a need to provide security controls at the term level. I know that Lucene/Solr doesn't support this so I had baked something onto a 4.x baseline that was sufficient for my use cases. I am now looking to move that implementation to 5.x and am running into an issue around faceting. Previously we were able to provide a custom cache implementation that would create separate cache entries given a particular set of security controls, but in Solr 5 some faceting is delegated to DocValuesFacets which delegates to UninvertingReader in my case (we are not storing DocValues). The issue I am running into is that before 5.x I had the ability to influence the FieldCache that was used at the Solr level to also include a security token into the key so each cache entry was scoped to a particular level. With the current implementation the FieldCache seems to be an internal detail that I can't influence in any way. Is this correct? I had noticed this Jira ticket https://issues.apache.org/jira/browse/LUCENE-5427, is there any movement on this? Is there another way to influence the information that is put into these caches? As always thanks in advance for any suggestions. -Jamie
Re: Geospatial Predicate Question
Thanks for the clarification! On Aug 19, 2015 3:05 PM, david.w.smi...@gmail.com david.w.smi...@gmail.com wrote: Hi Jamie, Your understanding is inverted. The predicates can be read as: indexed shape <predicate> query shape. For indexed point data, there is almost no semantic difference between the Within and Intersects predicates. There is if the field is multi-valued and you want to ensure that all of the points for a document are within the query shape (Within predicate) versus any of them being okay (Intersects predicate). Intersects is pretty fast. The Contains predicate only makes sense for non-point indexed data. ~ David On Wed, Aug 12, 2015 at 6:02 PM Jamie Johnson jej2...@gmail.com wrote: Can someone clarify the difference between isWithin and Contains in regards to Solr's spatial support? From https://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4 I see that if you are using point data you should use Intersects, but it is not clear when to use isWithin and contains. My guess is that you use isWithin when you want to know if the query shape is within the shape that is indexed and you use contains to know if the query shape contains the indexed shape. Is that right? -- Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com
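The multi-valued point semantics David describes can be sketched with toy types (the Point/Rect classes below are illustrative, not Lucene spatial classes): Intersects matches when ANY indexed point falls inside the query shape, IsWithin only when ALL of them do.

```java
import java.util.List;

// Illustrative sketch of "indexed shape <predicate> query shape" for a
// multi-valued point field and a rectangular query shape.
public class SpatialPredicates {
    public static class Point {
        final double x, y;
        public Point(double x, double y) { this.x = x; this.y = y; }
    }
    public static class Rect {
        final double minX, minY, maxX, maxY;
        public Rect(double minX, double minY, double maxX, double maxY) {
            this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
        }
        boolean contains(Point p) {
            return p.x >= minX && p.x <= maxX && p.y >= minY && p.y <= maxY;
        }
    }
    // Intersects: at least one of the document's points is in the query shape.
    public static boolean intersects(List<Point> indexed, Rect query) {
        return indexed.stream().anyMatch(query::contains);
    }
    // IsWithin: every one of the document's points is in the query shape.
    public static boolean isWithin(List<Point> indexed, Rect query) {
        return indexed.stream().allMatch(query::contains);
    }
}
```

For a single-valued point field the two predicates collapse to the same test, which is why Intersects (the fast one) is the usual recommendation for point data.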
Re: Disable caching
This was my original thought. We already have the thread local so should be straight fwd to just wrap the Field name and use that as the key. Again thanks, I really appreciate the feedback On Aug 19, 2015 8:12 AM, Yonik Seeley ysee...@gmail.com wrote: On Tue, Aug 18, 2015 at 10:58 PM, Jamie Johnson jej2...@gmail.com wrote: Hmm...so I think I have things setup correctly, I have a custom QParserPlugin building a custom query that wraps the query built from the base parser and stores the user who is executing the query. I've added the username to the hashCode and equals checks so I think everything is setup properly. I ran a quick test and it definitely looks like my items are being cached now per user, which is really great. The issue that I'm running into now is that the FieldValueCache doesn't take the query into account, so the FieldValueCache is built for user a and then reused for user b, which is an issue for me. In short I'm back to my NoOpCache for FieldValues. It's great that I'm in a better spot for the others, but is there anything that can be done with FieldValues to take into account the requesting user? I guess a cache implementation that gets the user through a thread local and either wraps the original key with an object containing the user, or delegates to a per-user cache underneath. -Yonik
Re: Disable caching
Hmm...so I think I have things setup correctly, I have a custom QParserPlugin building a custom query that wraps the query built from the base parser and stores the user who is executing the query. I've added the username to the hashCode and equals checks so I think everything is setup properly. I ran a quick test and it definitely looks like my items are being cached now per user, which is really great. The issue that I'm running into now is that the FieldValueCache doesn't take the query into account, so the FieldValueCache is built for user a and then reused for user b, which is an issue for me. In short I'm back to my NoOpCache for FieldValues. It's great that I'm in a better spot for the others, but is there anything that can be done with FieldValues to take into account the requesting user? On Tue, Aug 18, 2015 at 9:59 PM, Yonik Seeley ysee...@gmail.com wrote: On Tue, Aug 18, 2015 at 9:51 PM, Jamie Johnson jej2...@gmail.com wrote: Thanks, I'll try to delve into this. We are currently using the parent query parser, within which we could use {!secure}, I think. Ultimately I would want the solr qparser to actually do the work of parsing and I'd just wrap that. Right... look at something like BoostQParserPlugin; it should be trivial to wrap any other type of query. baseParser = subQuery(localParams.get(QueryParsing.V), null); Query q = baseParser.getQuery(); q={!secure}my_normal_query OR q={!secure v=$qq}&qq=my_normal_query OR q={!secure}{!parent ...} OR q={!secure v=$qq}&qq={!parent ...} -Yonik Are there any examples that I could look at for this? It's not clear to me what to do in the qparser once I have the user auths though. Again thanks, this is really good stuff. On Aug 18, 2015 8:54 PM, Yonik Seeley ysee...@gmail.com wrote: On Tue, Aug 18, 2015 at 8:38 PM, Jamie Johnson jej2...@gmail.com wrote: I really like this idea in concept. My query would literally be just a wrapper at that point, what would be the appropriate place to do this? 
It depends on how much you are trying to make everything transparent (that there is security) or not. First approach is explicitly changing the query types (you obviously need to make sure that only trusted code can run queries against solr for this method): q=foo:bar&fq=inStock:true q={!secure id=user}foo:bar&fq={!secure id=user}inStock:true you could even make the {!secure} qparser look for global security params so you don't need to repeat them. q={!secure}foo:bar&fq={!secure}inStock:true&security_id=user Second approach would prob involve a search component, probably one that runs after the query component, that would handle wrapping any queries or filters in the prepare() phase. This would be slightly more difficult since it would require ensuring that none of the solr code / features you use re-grab the q or fq parameters and re-parse them without the opportunity for you to wrap them again. What would I need to do to the query to make it behave with the cache? Probably not much... record the credentials in the wrapper and use in the hashCode / equals. -Yonik Again thanks for the idea, I think this could be a simple way to use the caches. On Tue, Aug 18, 2015 at 8:31 PM, Yonik Seeley ysee...@gmail.com wrote: On Tue, Aug 18, 2015 at 8:19 PM, Jamie Johnson jej2...@gmail.com wrote: when you say a security filter, are you asking if I can express my security constraint as a query? If that is the case then the answer is no. At this point I have a requirement to secure Terms (a nightmare I know). Heh - ok, I figured as much. So... you could also wrap the main query and any filter queries in a custom security query that would contain the user, and thus still be able to use filter and query caches unmodified. I know... that's only a small part of the problem though. -Yonik
Re: Disable caching
when you say a security filter, are you asking if I can express my security constraint as a query? If that is the case then the answer is no. At this point I have a requirement to secure Terms (a nightmare I know). Our fallback is to aggregate the authorizations to a document level and secure the document, in which case I think we wouldn't have to do anything to the caches, but our customer has pushed back on this in the past. On Tue, Aug 18, 2015 at 7:46 PM, Yonik Seeley ysee...@gmail.com wrote: On Tue, Aug 18, 2015 at 7:11 PM, Jamie Johnson jej2...@gmail.com wrote: Yes, my use case is security. Basically I am executing queries with certain auths and when they are executed multiple times with differing auths I'm getting cached results. If it's just simple stuff like top N docs returned, can't you just use a security filter? The queryResult cache uses both the main query and a list of filters (and the sort order) for the cache key. -Yonik
Re: Disable caching
Thanks, I'll try to delve into this. We are currently using the parent query parser, within which we could use {!secure} I think. Ultimately I would want the solr qparser to actually do the work of parsing and I'd just wrap that. Are there any examples that I could look at for this? It's not clear to me what to do in the qparser once I have the user auths though. Again thanks, this is really good stuff. On Aug 18, 2015 8:54 PM, Yonik Seeley ysee...@gmail.com wrote: On Tue, Aug 18, 2015 at 8:38 PM, Jamie Johnson jej2...@gmail.com wrote: I really like this idea in concept. My query would literally be just a wrapper at that point, what would be the appropriate place to do this? It depends on how much you are trying to make everything transparent (that there is security) or not. First approach is explicitly changing the query types (you obviously need to make sure that only trusted code can run queries against solr for this method): q=foo:bar&fq=inStock:true q={!secure id=user}foo:bar&fq={!secure id=user}inStock:true you could even make the {!secure} qparser look for global security params so you don't need to repeat them: q={!secure}foo:bar&fq={!secure}inStock:true&security_id=user Second approach would prob involve a search component, probably one that runs after the query component, that would handle wrapping any queries or filters in the prepare() phase. This would be slightly more difficult since it would require ensuring that none of the solr code / features you use re-grab the q or fq parameters and re-parse them without the opportunity for you to wrap them again. What would I need to do to the query to make it behave with the cache? Probably not much... record the credentials in the wrapper and use them in the hashCode / equals. -Yonik Again thanks for the idea, I think this could be a simple way to use the caches. 
On Tue, Aug 18, 2015 at 8:31 PM, Yonik Seeley ysee...@gmail.com wrote: On Tue, Aug 18, 2015 at 8:19 PM, Jamie Johnson jej2...@gmail.com wrote: when you say a security filter, are you asking if I can express my security constraint as a query? If that is the case then the answer is no. At this point I have a requirement to secure Terms (a nightmare I know). Heh - ok, I figured as much. So... you could also wrap the main query and any filter queries in a custom security query that would contain the user, and thus still be able to use filter and query caches unmodified. I know... that's only a small part of the problem though. -Yonik
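Yonik's "record the credentials in the wrapper and use them in the hashCode / equals" suggestion can be sketched in plain Java. This is not Solr code: SecureQueryKey and its fields are illustrative names standing in for a wrapper Query, but the principle is the same -- Solr's query and filter caches key on the query object, so an equals()/hashCode() that folds in the credentials keeps per-user entries distinct:

```java
import java.util.Objects;

// Hypothetical sketch: a wrapper carrying the user's credentials, folding
// them into equals()/hashCode() so a cache keyed on the query object
// naturally keeps per-user entries separate.
public class SecureQueryKey {
    private final String wrappedQuery;  // stands in for the wrapped Lucene Query
    private final String credentials;   // the user's auths

    public SecureQueryKey(String wrappedQuery, String credentials) {
        this.wrappedQuery = wrappedQuery;
        this.credentials = credentials;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SecureQueryKey)) return false;
        SecureQueryKey other = (SecureQueryKey) o;
        // Two cache keys match only when BOTH the query and the credentials match.
        return wrappedQuery.equals(other.wrappedQuery)
            && credentials.equals(other.credentials);
    }

    @Override
    public int hashCode() {
        return Objects.hash(wrappedQuery, credentials);
    }

    public static void main(String[] args) {
        SecureQueryKey a = new SecureQueryKey("foo:bar", "user-alice");
        SecureQueryKey b = new SecureQueryKey("foo:bar", "user-bob");
        SecureQueryKey c = new SecureQueryKey("foo:bar", "user-alice");
        System.out.println(a.equals(b)); // false: same query, different auths
        System.out.println(a.equals(c)); // true: same query, same auths
    }
}
```

In a real wrapper Query the delegate's equals()/hashCode() would take the place of the string comparison here.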
Disable caching
I see that if Solr is in realtime mode that caching is disabled within the SolrIndexSearcher that is created in SolrCore, but is there any way to disable caching without being in realtime mode? Currently I'm implementing a NoOp cache that implements SolrCache but returns null for everything and doesn't return anything on get requests, but it would be nice to not need to do this by being able to disable caching in general. Is this possible? -Jamie
Re: Disable caching
Yes, my use case is security. Basically I am executing queries with certain auths and when they are executed multiple times with differing auths I'm getting cached results. One option is to have another implementation that has a number of caches based on the auths, something that I suspect we will at some point do (unless there is a better solution ;). I'd be happy to look at other options so all suggestions are appreciated. On Tue, Aug 18, 2015 at 6:56 PM, Yonik Seeley ysee...@gmail.com wrote: You can comment out (some) of the caches. There are some caches like field caches that are more at the lucene level and can't be disabled. Can I ask what you are trying to prevent from being cached and why? Different caches are for different things, so it would seem to be an odd use case to disable them all. Security? -Yonik On Tue, Aug 18, 2015 at 6:52 PM, Jamie Johnson jej2...@gmail.com wrote: I see that if Solr is in realtime mode that caching is disabled within the SolrIndexSearcher that is created in SolrCore, but is there any way to disable caching without being in realtime mode? Currently I'm implementing a NoOp cache that implements SolrCache but returns null for everything and doesn't return anything on get requests, but it would be nice to not need to do this by being able to disable caching in general. Is this possible? -Jamie
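The stale-results symptom described here is easy to reproduce with a plain HashMap standing in for Solr's result cache (illustrative only, no Solr APIs): keyed on the query string alone, the second user is handed the entry computed under the first user's auths; folding the auths into the key keeps them separate.

```java
import java.util.HashMap;
import java.util.Map;

public class AuthCacheDemo {
    // Pretend "search" filters docs by the user's auth before returning results.
    public static String search(String query, String auth) {
        return "results-for-" + auth;
    }

    public static void main(String[] args) {
        // Cache keyed on query alone: the second user gets a leaked, stale entry.
        Map<String, String> cache = new HashMap<>();
        cache.computeIfAbsent("foo:bar", q -> search(q, "alice"));
        String second = cache.computeIfAbsent("foo:bar", q -> search(q, "bob"));
        System.out.println(second); // results-for-alice -- wrong for bob

        // Cache keyed on (query, auth): each user sees their own entry.
        Map<String, String> perUser = new HashMap<>();
        String a = perUser.computeIfAbsent("foo:bar|alice", k -> search("foo:bar", "alice"));
        String b = perUser.computeIfAbsent("foo:bar|bob", k -> search("foo:bar", "bob"));
        System.out.println(a.equals(b)); // false: no cross-user sharing
    }
}
```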
Re: Disable caching
I really like this idea in concept. My query would literally be just a wrapper at that point; what would be the appropriate place to do this? What would I need to do to the query to make it behave with the cache? Again thanks for the idea, I think this could be a simple way to use the caches. On Tue, Aug 18, 2015 at 8:31 PM, Yonik Seeley ysee...@gmail.com wrote: On Tue, Aug 18, 2015 at 8:19 PM, Jamie Johnson jej2...@gmail.com wrote: when you say a security filter, are you asking if I can express my security constraint as a query? If that is the case then the answer is no. At this point I have a requirement to secure Terms (a nightmare I know). Heh - ok, I figured as much. So... you could also wrap the main query and any filter queries in a custom security query that would contain the user, and thus still be able to use filter and query caches unmodified. I know... that's only a small part of the problem though. -Yonik
Re: phonetic filter factory question
Thanks, i didn't know you could do this, I'll check this out. On Aug 15, 2015 12:54 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: From the teaching to fish category of advice (since I don't know the actual answer). Did you try Analysis screen in the Admin UI? If you check Verbose output mark, you will see all the offsets and can easily confirm the detailed behavior for yourself. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 15 August 2015 at 12:22, Jamie Johnson jej2...@gmail.com wrote: The JavaDoc says that the PhoneticFilterFactory will inject tokens with an offset of 0 into the stream. I'm assuming this means an offset of 0 from the token that it is analyzing, is that right? I am trying to collapse some of my schema, I currently have a text field that I use for general purpose text and another field with the PhoneticFilterFactory applied for finding things that are similar phonetically, but if this does inject at the current position then I could likely collapse these into a single field. As always thanks in advance! -Jamie
phonetic filter factory question
The JavaDoc says that the PhoneticFilterFactory will inject tokens with an offset of 0 into the stream. I'm assuming this means an offset of 0 from the token that it is analyzing, is that right? I am trying to collapse some of my schema: I currently have a text field that I use for general purpose text and another field with the PhoneticFilterFactory applied for finding things that are similar phonetically, but if this does inject at the current position then I could likely collapse these into a single field. As always thanks in advance! -Jamie
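On the posInc question being asked here: a "position increment of 0" means the injected token does not advance the position, so it stacks on top of the token it encodes. A plain-Java model of how position increments resolve to absolute positions (no Lucene APIs; the token "SM0" is a made-up phonetic code, not the output of any real encoder):

```java
import java.util.ArrayList;
import java.util.List;

public class PosIncDemo {
    public static class Token {
        public final String term;
        public final int posInc;
        public Token(String term, int posInc) { this.term = term; this.posInc = posInc; }
    }

    // Resolve each token's position increment into an absolute position.
    public static int[] positions(List<Token> stream) {
        int[] out = new int[stream.size()];
        int pos = -1;
        for (int i = 0; i < stream.size(); i++) {
            pos += stream.get(i).posInc;
            out[i] = pos;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> stream = new ArrayList<>();
        stream.add(new Token("john", 1));  // position 0
        stream.add(new Token("smith", 1)); // position 1
        stream.add(new Token("SM0", 0));   // injected phonetic token, also position 1
        int[] pos = positions(stream);
        System.out.println(pos[1] == pos[2]); // true: original and phonetic share a position
    }
}
```

Because the original and its phonetic form share a position, phrase queries over either form line up the same way -- which is what makes collapsing the two fields plausible.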
Geospatial Predicate Question
Can someone clarify the difference between IsWithin and Contains with regard to Solr's spatial support? From https://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4 I see that if you are using point data you should use Intersects, but it is not clear when to use IsWithin and Contains. My guess is that you use IsWithin when you want to know if the query shape is within the shape that is indexed, and you use Contains to know if the query shape contains the indexed shape. Is that right?
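For what it's worth, the Lucene-spatial predicates are usually described from the indexed shape's point of view: IsWithin matches documents whose indexed shape sits inside the query shape, and Contains matches documents whose indexed shape contains the query shape. A plain-Java sketch using axis-aligned rectangles (not Solr code; every name here is illustrative):

```java
public class SpatialPredicateDemo {
    public static class Rect {
        final double minX, minY, maxX, maxY;
        public Rect(double minX, double minY, double maxX, double maxY) {
            this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
        }
        public boolean contains(Rect o) {
            return minX <= o.minX && minY <= o.minY && maxX >= o.maxX && maxY >= o.maxY;
        }
    }

    // A doc matches IsWithin(query) when the query shape contains the indexed shape.
    public static boolean isWithin(Rect indexed, Rect query) {
        return query.contains(indexed);
    }

    // A doc matches Contains(query) when the indexed shape contains the query shape.
    public static boolean containsPred(Rect indexed, Rect query) {
        return indexed.contains(query);
    }

    public static void main(String[] args) {
        Rect indexed = new Rect(2, 2, 4, 4);  // shape stored in the index
        Rect query = new Rect(0, 0, 10, 10);  // shape in the spatial query
        System.out.println(isWithin(indexed, query));     // true: doc lies inside the query box
        System.out.println(containsPred(indexed, query)); // false: doc does not enclose it
    }
}
```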
Re: Filtering documents using payloads
Looks like my issue is that my nextDoc call is consuming the first position, and then on the call to nextPosition it's moving past where I want it to be. I believe that I have this working properly now by checking if the current position should or shouldn't be incremented. On Thu, Aug 6, 2015 at 7:35 PM, Jamie Johnson jej2...@gmail.com wrote: I am attempting to put together a DocsAndPositionsEnum that can hide terms given the payload on the term. The idea is that if a term has a particular access control and the user does not I don't want it to be visible. I have based this off of https://github.com/roshanp/lucure-core/blob/master/src/main/java/com/lucure/core/codec/AccessFilteredDocsAndPositionsEnum.java with some modifications to try and preserve the position information that is consumed as part of the hasAccess method. The current iteration that I am working with seems to be providing wrong positions to the ExactPhraseScorer.phraseFreq() method in the ChunkState's that are calculated. Below is my current iteration on this, but I can't narrow down why exactly the position information isn't what I expect. Does anything jump out? 
package com.lucure.core.codec;

import com.lucure.core.AuthorizationsHolder;
import com.lucure.core.security.Authorizations;
import com.lucure.core.security.FieldVisibility;
import com.lucure.core.security.VisibilityParseException;
import org.apache.lucene.index.DocsAndPositionsEnum;
import org.apache.lucene.util.AttributeSource;
import org.apache.lucene.util.BytesRef;

import java.io.IOException;
import java.util.Arrays;

import static com.lucure.core.codec.AccessFilteredDocsAndPositionsEnum.AllAuthorizationsHolder.ALLAUTHSHOLDER;

/**
 * Enum to read and restrict access to a document based on the payload which
 * is expected to store the visibility
 */
public class AccessFilteredDocsAndPositionsEnum extends DocsAndPositionsEnum {

    /**
     * This placeholder allows for lucene specific operations such as
     * merge to read data with all authorizations enabled. This should never
     * be used outside of the Codec.
     */
    static class AllAuthorizationsHolder extends AuthorizationsHolder {
        static final AllAuthorizationsHolder ALLAUTHSHOLDER = new AllAuthorizationsHolder();

        private AllAuthorizationsHolder() {
            super(Authorizations.EMPTY);
        }
    }

    static void enableMergeAuthorizations() {
        AuthorizationsHolder.threadAuthorizations.set(ALLAUTHSHOLDER);
    }

    static void disableMergeAuthorizations() {
        AuthorizationsHolder.threadAuthorizations.remove();
    }

    private final DocsAndPositionsEnum docsAndPositionsEnum;
    private final AuthorizationsHolder authorizationsHolder;

    public AccessFilteredDocsAndPositionsEnum(DocsAndPositionsEnum docsAndPositionsEnum) {
        this(docsAndPositionsEnum, AuthorizationsHolder.threadAuthorizations.get());
    }

    public AccessFilteredDocsAndPositionsEnum(DocsAndPositionsEnum docsAndPositionsEnum,
                                              AuthorizationsHolder authorizationsHolder) {
        this.docsAndPositionsEnum = docsAndPositionsEnum;
        this.authorizationsHolder = authorizationsHolder;
    }

    long cost;
    int endOffset, startOffset, currentPosition, freq, docId;
    BytesRef payload;

    @Override
    public int nextPosition() throws IOException {
        while (!hasAccess()) {
        }
        return currentPosition + 1;
    }

    @Override
    public int startOffset() throws IOException {
        return startOffset;
    }

    @Override
    public int endOffset() throws IOException {
        return endOffset;
    }

    @Override
    public BytesRef getPayload() throws IOException {
        return payload;
    }

    @Override
    public int freq() throws IOException {
        return docsAndPositionsEnum.freq();
    }

    @Override
    public int docID() {
        return docsAndPositionsEnum.docID();
    }

    @Override
    public int nextDoc() throws IOException {
        while (docsAndPositionsEnum.nextDoc() != NO_MORE_DOCS) {
            if (hasAccess()) {
                return docID();
            }
        }
        return NO_MORE_DOCS;
    }

    @Override
    public int advance(int target) throws IOException {
        int advance = docsAndPositionsEnum.advance(target);
        if (advance != NO_MORE_DOCS) {
            if (hasAccess()) {
                return docID();
            } else {
                // seek to next available
                int doc;
                while ((doc = nextDoc()) < target) {
                }
                return doc;
            }
        }
        return NO_MORE_DOCS;
    }

    @Override
    public long cost() {
        return docsAndPositionsEnum.cost();
    }

    protected boolean hasAccess() throws IOException
Filtering documents using payloads
I am attempting to put together a DocsAndPositionsEnum that can hide terms given the payload on the term. The idea is that if a term has a particular access control and the user does not I don't want it to be visible. I have based this off of https://github.com/roshanp/lucure-core/blob/master/src/main/java/com/lucure/core/codec/AccessFilteredDocsAndPositionsEnum.java with some modifications to try and preserve the position information that is consumed as part of the hasAccess method. The current iteration that I am working with seems to be providing wrong positions to the ExactPhraseScorer.phraseFreq() method in the ChunkStates that are calculated. Below is my current iteration on this, but I can't narrow down why exactly the position information isn't what I expect. Does anything jump out?

package com.lucure.core.codec;

import com.lucure.core.AuthorizationsHolder;
import com.lucure.core.security.Authorizations;
import com.lucure.core.security.FieldVisibility;
import com.lucure.core.security.VisibilityParseException;
import org.apache.lucene.index.DocsAndPositionsEnum;
import org.apache.lucene.util.AttributeSource;
import org.apache.lucene.util.BytesRef;

import java.io.IOException;
import java.util.Arrays;

import static com.lucure.core.codec.AccessFilteredDocsAndPositionsEnum.AllAuthorizationsHolder.ALLAUTHSHOLDER;

/**
 * Enum to read and restrict access to a document based on the payload which
 * is expected to store the visibility
 */
public class AccessFilteredDocsAndPositionsEnum extends DocsAndPositionsEnum {

    /**
     * This placeholder allows for lucene specific operations such as
     * merge to read data with all authorizations enabled. This should never
     * be used outside of the Codec.
     */
    static class AllAuthorizationsHolder extends AuthorizationsHolder {
        static final AllAuthorizationsHolder ALLAUTHSHOLDER = new AllAuthorizationsHolder();

        private AllAuthorizationsHolder() {
            super(Authorizations.EMPTY);
        }
    }

    static void enableMergeAuthorizations() {
        AuthorizationsHolder.threadAuthorizations.set(ALLAUTHSHOLDER);
    }

    static void disableMergeAuthorizations() {
        AuthorizationsHolder.threadAuthorizations.remove();
    }

    private final DocsAndPositionsEnum docsAndPositionsEnum;
    private final AuthorizationsHolder authorizationsHolder;

    public AccessFilteredDocsAndPositionsEnum(DocsAndPositionsEnum docsAndPositionsEnum) {
        this(docsAndPositionsEnum, AuthorizationsHolder.threadAuthorizations.get());
    }

    public AccessFilteredDocsAndPositionsEnum(DocsAndPositionsEnum docsAndPositionsEnum,
                                              AuthorizationsHolder authorizationsHolder) {
        this.docsAndPositionsEnum = docsAndPositionsEnum;
        this.authorizationsHolder = authorizationsHolder;
    }

    long cost;
    int endOffset, startOffset, currentPosition, freq, docId;
    BytesRef payload;

    @Override
    public int nextPosition() throws IOException {
        while (!hasAccess()) {
        }
        return currentPosition + 1;
    }

    @Override
    public int startOffset() throws IOException {
        return startOffset;
    }

    @Override
    public int endOffset() throws IOException {
        return endOffset;
    }

    @Override
    public BytesRef getPayload() throws IOException {
        return payload;
    }

    @Override
    public int freq() throws IOException {
        return docsAndPositionsEnum.freq();
    }

    @Override
    public int docID() {
        return docsAndPositionsEnum.docID();
    }

    @Override
    public int nextDoc() throws IOException {
        while (docsAndPositionsEnum.nextDoc() != NO_MORE_DOCS) {
            if (hasAccess()) {
                return docID();
            }
        }
        return NO_MORE_DOCS;
    }

    @Override
    public int advance(int target) throws IOException {
        int advance = docsAndPositionsEnum.advance(target);
        if (advance != NO_MORE_DOCS) {
            if (hasAccess()) {
                return docID();
            } else {
                // seek to next available
                int doc;
                while ((doc = nextDoc()) < target) {
                }
                return doc;
            }
        }
        return NO_MORE_DOCS;
    }

    @Override
    public long cost() {
        return docsAndPositionsEnum.cost();
    }

    protected boolean hasAccess() throws IOException {
        payload = docsAndPositionsEnum.getPayload();
        endOffset = docsAndPositionsEnum.endOffset();
        startOffset = docsAndPositionsEnum.startOffset();
        currentPosition = docsAndPositionsEnum.nextPosition() - 1;
        BytesRef payload = docsAndPositionsEnum.getPayload();
        try {
            if (payload == null || ALLAUTHSHOLDER.equals(authorizationsHolder)
                || this.authorizationsHolder.getVisibilityEvaluator().evaluate(
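The root cause reported in the follow-up to this thread -- nextDoc() consuming the first position while checking the payload -- can be modeled in a few lines of plain Java (no Lucene APIs; this class is illustrative only). The fix is to remember that a position was already consumed by the access check and replay it on the first nextPosition() call:

```java
import java.util.Arrays;
import java.util.List;

public class ConsumedPositionDemo {
    private final List<Integer> positions;
    private int cursor = -1;
    private boolean buffered = false; // true when the check already consumed a position

    public ConsumedPositionDemo(List<Integer> positions) {
        this.positions = positions;
    }

    // Models nextDoc() calling an access check that must advance to the
    // first position in order to read its payload.
    public boolean nextDocWithCheck() {
        cursor = 0;      // the check consumed positions.get(0)
        buffered = true; // remember it has not been handed to the caller yet
        return true;
    }

    public int nextPosition() {
        if (buffered) {  // replay the position the access check consumed
            buffered = false;
            return positions.get(cursor);
        }
        return positions.get(++cursor);
    }

    public static void main(String[] args) {
        ConsumedPositionDemo e = new ConsumedPositionDemo(Arrays.asList(3, 7, 12));
        e.nextDocWithCheck();
        System.out.println(e.nextPosition()); // 3 (without the buffer flag this would be 7)
        System.out.println(e.nextPosition()); // 7
    }
}
```

Without the buffered flag, the first nextPosition() would return 7 and a phrase scorer would see every position shifted by one, which matches the wrong-position symptom described above.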
PayloadSpanOrQuery
I have a need for using payloads in a SpanOrQuery to influence the score. I noticed that there is no PayloadSpanOrQuery, so I'd like to implement one. I couldn't find a ticket in JIRA for this so I created https://issues.apache.org/jira/browse/LUCENE-6706; if this feature already exists I will gladly close the JIRA if someone could point me in the right direction.
Specifying multiple query parsers
I have a use case where I want to use the block join query parser for the top-level query and a custom query parser for the nested portion. I was originally doing this, which worked: {!parent which='type:parent'}_query_:{!myqp df='child_pay' v='value foo'} but switched to this, which also worked: {!parent which='type:parent'}{!myqp}child_pay:value foo I have never seen this type of syntax where you can specify multiple query parsers inline; is this supposed to work or am I taking advantage of some oversight in the local params implementation?
Re: Specifying multiple query parsers
Sorry, answered my own question. For those who are interested, this is related to how BlockJoinParentQParser handles sub queries, and it looks like it's working as it should. On Wed, Jul 29, 2015 at 3:31 PM, Jamie Johnson jej2...@gmail.com wrote: I have a use case where I want to use the block join query parser for the top level query and for the nested portion a custom query parser. I was originally doing this, which worked {!parent which='type:parent'}_query_:{!myqp df='child_pay' v='value foo'} but switched to this which also worked {!parent which='type:parent'}{!myqp}child_pay:value foo I have never seen this type of syntax where you can specify multiple query parsers inline, is this supposed to work or am I taking advantage of some oversight in the local params implementation?
Re: Scoring, payloads and phrase queries
Thanks Mikhail! I had seen this but had originally thought it wouldn't be usable. That said, I think I was wrong. I have an example that rewrites a phrase query as a SpanQuery and then uses the PayloadNearQuery, which seems to work correctly. I have done something similar for MultiPhraseQuery (though I am not sure it is right, as I don't understand the usage of the positions in the class at this point). My first cut is shown below (PF is just a PayloadFunction and not of much interest). Does this look correct?

MultiPhraseQuery phrase = (MultiPhraseQuery) query;
List<Term[]> terms = phrase.getTermArrays();
SpanQuery[] topLevelSpans = new SpanQuery[terms.size()];
for (int j = 0; j < terms.size(); j++) {
    Term[] internalTerms = terms.get(j);
    SpanQuery[] sq = new SpanQuery[internalTerms.length];
    for (int i = 0; i < internalTerms.length; i++) {
        sq[i] = new SpanTermQuery(internalTerms[i]);
    }
    topLevelSpans[j] = new SpanOrQuery(sq);
}
PayloadNearQuery pnq = new PayloadNearQuery(topLevelSpans, phrase.getSlop(), true, new PF());
pnq.setBoost(phrase.getBoost());

It looks like to support payloads in all the query types I would like to support, I'll need to rewrite the queries (or their pieces) to a PayloadNearQuery or a PayloadTermQuery. Is there a PayloadMultiTermQuery that Fuzzy, Range, Wildcard, etc. types of queries could be rewritten to? Again thanks, I really appreciate the pointer. On Jul 25, 2015 5:22 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Does PayloadNearQuery suit for it? On Fri, Jul 24, 2015 at 5:41 PM, Jamie Johnson jej2...@gmail.com wrote: Is there a way to consider payloads for scoring in phrase queries like exists in PayloadTermQuery? -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Scoring, payloads and phrase queries
Is there a way to consider payloads for scoring in phrase queries like exists in PayloadTermQuery?
Re: Scoring, payloads and phrase queries
Looks like there is nothing that exists in this regard and there is no JIRA ticket that I could find. Is this something that there is any other interest in? Is this something that a ticket should be created for? On Fri, Jul 24, 2015 at 10:41 AM, Jamie Johnson jej2...@gmail.com wrote: Is there a way to consider payloads for scoring in phrase queries like exists in PayloadTermQuery?