Re: Phrase Query only forward direction

2017-06-12 Thread Aman Deep Singh
Thanks Eric

On Mon, Jun 12, 2017 at 10:28 PM Erick Erickson 
wrote:

> Complex phrase also has an inorder flag that I think you're looking for
> here.
>
> Best,
> Erick
>
> On Mon, Jun 12, 2017 at 7:16 AM, Erik Hatcher 
> wrote:
> > Understood.   If you need ordered, “sloppy” (some distance) phrases, you
> could OR in a {!complexphrase} query.
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
> >
> > Something like:
> >
> > q=({!edismax … ps=0 v=$qq}) OR {!complexphrase df=nameSearch v=$qq}
> >
> > where qq=12345 masitha
> >
> > Erik
> >
> >
> >> On Jun 12, 2017, at 9:57 AM, Aman Deep Singh 
> wrote:
> >>
> >> Yes Erik, I can use ps=0, but my problem is that I want phrases that have
> >> the same sequence but may be separated by some distance.
> >> E.g.
> >> If I have the document "masitha xyz 12345",
> >> I want that to be boosted since the sequence is in order. That's why I
> >> have used ps=5.
> >> Thanks,
> >> Aman Deep Singh
> >>
> >> On 12-Jun-2017 5:44 PM, "Erik Hatcher"  wrote:
> >>
> >> Using ps=5 causes the phrase matching to be unordered matching.   You’ll
> >> have to set ps=0, if using edismax, to get exact order phrase matches.
> >>
> >>Erik
> >>
> >>
> >>> On Jun 12, 2017, at 1:09 AM, Aman Deep Singh <
> amandeep.coo...@gmail.com>
> >> wrote:
> >>>
> >>> Hi,
> >>> I'm using a phrase query, but it applies the phrase boost to the query
> >>> even when the terms are in reverse order, which I don't want. Is there
> >>> any way to avoid the phrase boost for reverse order and apply the boost
> >>> only when the terms are in the same sequence?
> >>>
> >>> Solr version 6.5.1
> >>>
> >>> e.g.
> >>> http://localhost:8983/solr/l4_collection/select?debugQuery=on&defType=edismax&fl=score,nameSearch&indent=on&mm=100%25&pf=nameSearch&q=12345%20masitha&qf=nameSearch&wt=xml&ps=5
> >>>
> >>>
> >>> while my document has value
> >>>
> >>> in the debug query it is applying boost as
> >>> 23.28365 = sum of:
> >>> 15.112219 = sum of:
> >>> 9.669338 = weight(nameSearch:12345 in 0) [SchemaSimilarity], result of:
> >>> 9.669338 = score(doc=0,freq=1.0 = termFreq=1.0
> >>> ), product of:
> >>> 7.6397386 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
> (docFreq
> >>> + 0.5)) from:
> >>> 2.0 = docFreq
> >>> 5197.0 = docCount
> >>> 1.2656635 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 -
> b +
> >> b
> >>> * fieldLength / avgFieldLength)) from:
> >>> 1.0 = termFreq=1.0
> >>> 1.2 = parameter k1
> >>> 0.75 = parameter b
> >>> 5.2576485 = avgFieldLength
> >>> 2.56 = fieldLength
> >>> 5.44288 = weight(nameSearch:masitha in 0) [SchemaSimilarity], result
> of:
> >>> 5.44288 = score(doc=0,freq=1.0 = termFreq=1.0
> >>> ), product of:
> >>> 4.3004165 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
> (docFreq
> >>> + 0.5)) from:
> >>> 70.0 = docFreq
> >>> 5197.0 = docCount
> >>> 1.2656635 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 -
> b +
> >> b
> >>> * fieldLength / avgFieldLength)) from:
> >>> 1.0 = termFreq=1.0
> >>> 1.2 = parameter k1
> >>> 0.75 = parameter b
> >>> 5.2576485 = avgFieldLength
> >>> 2.56 = fieldLength
> >>> 8.171431 = weight(nameSearch:"12345 masitha"~5 in 0)
> [SchemaSimilarity],
> >>> result of:
> >>> 8.171431 = score(doc=0,freq=0.3334 = phraseFreq=0.3334
> >>> ), product of:
> >>> 11.940155 = idf(), sum of:
> >>> 7.6397386 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
> (docFreq
> >>> + 0.5)) from:
> >>> 2.0 = docFreq
> >>> 5197.0 = docCount
> >>> 4.3004165 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
> (docFreq
> >>> + 0.5)) from:
> >>> 70.0 = docFreq
> >>> 5197.0 = docCount
> >>> 0.6843655 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 -
> b +
> >> b
> >>> * fieldLength / avgFieldLength)) from:
> >>> 0.3334 = phraseFreq=0.3334
> >>> 1.2 = parameter k1
> >>> 0.75 = parameter b
> >>> 5.2576485 = avgFieldLength
> >>> 2.56 = fieldLength
> >>>
> >>> Thanks,
> >>> Aman Deep Singh
> >
>


Re: Solr Join Failures

2017-06-12 Thread Ray Niu
I am using the following one:
http://lucene.apache.org/solr/5_4_1/solr-core/org/apache/solr/search/join/ScoreJoinQParserPlugin.html
q={!join from=id to=id fromIndex=B}id:*

2017-06-12 20:05 GMT-07:00 Zheng Lin Edwin Yeo :

> What is the query that you used to do the Join?
>
> There are Streaming Expressions, which include various join functions, but
> they require Solr 6 onward.
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
>
>
> Regards,
> Edwin
>
>
> On 13 June 2017 at 05:25, Ray Niu  wrote:
>
> > Hi:
> >    We encounter an issue when using a join query in SolrCloud; our version
> > is 5.5.2. We query collection A and join with collection B at
> > runtime; collections A and B always co-exist on the same node.
> >    Sometimes collection B goes down for some reason while
> > collection A is still active. During that period, if we issue a join query,
> > we see the following error:
> >  SolrCloud join: B has a local replica (B_shard1_replica1) on
> > solrCloud_node1:8080_solr, but it is down
> >   Can anyone provide any suggestions for such a scenario?
> >
>


Re: Solr Join Failures

2017-06-12 Thread Zheng Lin Edwin Yeo
What is the query that you used to do the Join?

There are Streaming Expressions, which include various join functions, but
they require Solr 6 onward.
https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
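
For example, an innerJoin streaming expression looks like this (a sketch only --
the collection and field names are illustrative, and both streams must be
sorted on the join key):

innerJoin(
  search(A, q="*:*", fl="id,fieldA", sort="id asc", qt="/export"),
  search(B, q="*:*", fl="id,fieldB", sort="id asc", qt="/export"),
  on="id"
)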


Regards,
Edwin


On 13 June 2017 at 05:25, Ray Niu  wrote:

> Hi:
>    We encounter an issue when using a join query in SolrCloud; our version
> is 5.5.2. We query collection A and join with collection B at
> runtime; collections A and B always co-exist on the same node.
>    Sometimes collection B goes down for some reason while
> collection A is still active. During that period, if we issue a join query,
> we see the following error:
>  SolrCloud join: B has a local replica (B_shard1_replica1) on
> solrCloud_node1:8080_solr, but it is down
>   Can anyone provide any suggestions for such a scenario?
>


RE: Highlighter not working on some documents

2017-06-12 Thread Phil Scadden
I managed to miss that. Thanks very much. I have some very large documents. I 
will look at index size and look at posting instead.

-Original Message-
From: David Smiley [mailto:david.w.smi...@gmail.com]
Sent: Monday, 12 June 2017 2:40 p.m.
To: solr-user@lucene.apache.org
Subject: Re: Highlighter not working on some documents

Probably the most common reason is the default hl.maxAnalyzedChars -- thus your 
highlightable text might not be in the first 51200 chars of text.  The first 
Solr release with the unified highlighter had an even lower default of 10k 
chars.
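
For example, raising it per-request (the value here is just for illustration):

  hl.maxAnalyzedChars=1000000

lets the highlighter consider the first million characters of each field
instead of the default.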

On Fri, Jun 9, 2017 at 9:58 PM Phil Scadden  wrote:

> Tried hard to find difference between pdfs returning no highlighter
> and ones that do for same search term.  Includes pdfs that have been
> OCRed and ones that were text to begin with. Head scratching to me.
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Saturday, 10 June 2017 6:22 a.m.
> To: solr-user 
> Subject: Re: Highlighter not working on some documents
>
> Need lots more information. I.e. schema definitions, query you use,
> handler configuration and the like. Note that highlighted fields must
> have stored="true" set and likely the _text_ field doesn't. At least
> in the default schemas stored is set to false for the catch-all field.
> And you don't want to store that information anyway since it's usually
> the destination of copyField directives and you'd highlight _those_ fields.
>
> Best,
> Erick
>
> On Thu, Jun 8, 2017 at 8:37 PM, Phil Scadden  wrote:
> > Do a search with:
> > fl=id,title,datasource&hl=true&hl.method=unified&hl.fragsize=50&hl.snippets=1&q=pressure+AND+testing&rows=50&start=0&wt=json
> >
> > and I get back a good list of documents. However, some documents are
> returning empty fields in the highlighter. Eg, in the highlight array have:
> > "W:\\Reports\\OCR\\4272.pdf":{"_text_":[]}
> >
> > Getting this well up the list of results with good highlighted
> > matches
> above and below this entry. Why would the highlighter be failing?
> >
>
--
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com
Notice: This email and any attachments are confidential and may not be used, 
published or redistributed without the prior written consent of the Institute 
of Geological and Nuclear Sciences Limited (GNS Science). If received in error 
please destroy and immediately notify GNS Science. Do not copy or disclose the 
contents.


Solr Join Failures

2017-06-12 Thread Ray Niu
Hi:
   We encounter an issue when using a join query in SolrCloud; our version
is 5.5.2. We query collection A and join with collection B at
runtime; collections A and B always co-exist on the same node.
   Sometimes collection B goes down for some reason while
collection A is still active. During that period, if we issue a join query,
we see the following error:
 SolrCloud join: B has a local replica (B_shard1_replica1) on
solrCloud_node1:8080_solr, but it is down
  Can anyone provide any suggestions for such a scenario?


Re: _version_ as LongPointField returns error

2017-06-12 Thread Shawn Feldman
Should I make stored=false?  Don't I need _version_ for the MVCC semantics?


On Mon, Jun 12, 2017 at 10:41 AM Chris Hostetter 
wrote:

>
> just replying to some comments/discussion in general rather than
> individual msgs/sentences..
>
> * uninversion/FieldCache of *singlevalued* Points fields was fixed in
> SOLR-10472
>
> * currently a bad idea to use indexed="true" Points for _version_ due to
> SOLR-10832
>
> * AFAICT it's a good idea (in general, regardless of type) to use
> indexed="true" docValues="true" for _version_ (once SOLR-10832 is fixed)
> to ensure VersionInfo.getMaxVersionFromIndex doesn't make core
> load/reloads (and CDCR apparently) slow.
>
>
>
> : Date: Mon, 12 Jun 2017 12:32:50 -0400
> : From: Yonik Seeley 
> : Reply-To: solr-user@lucene.apache.org
> : To: "solr-user@lucene.apache.org" 
> : Subject: Re: _version_ as LongPointField returns error
> :
> : On Mon, Jun 12, 2017 at 12:24 PM, Shawn Feldman 
> wrote:
> : > Why do you need doc values though?  i'm never going to sort by version
> :
> : Solr needs a quick lookup from docid->_version_
> : If you don't have docValues, Solr tries to create an in-memory version
> : (via the FieldCache).  That's not yet supported for Point* fields.
> :
> : -Yonik
> :
> : > On Mon, Jun 12, 2017 at 10:13 AM Yonik Seeley 
> wrote:
> : >
> : >> I think the _version_ field should be
> : >>  - indexed="false"
> : >>  - stored="false"
> : >>  - docValues="true"
> : >>
> : >> -Yonik
> : >>
> : >>
> : >> On Mon, Jun 12, 2017 at 12:08 PM, Shawn Feldman <
> shawn.feld...@gmail.com>
> : >> wrote:
> : >> > I changed all my TrieLong Fields to Point fields.  _version_ always
> : >> returns
> : >> > an error unless i turn on docvalues
> : >> >
> : >> >   
> : >> >/>
> : >> >
> : >> > Getting this error when i index.  Any ideas?
> : >> >
> : >> >
> : >> >  Remote error message: Point fields can't use FieldCache. Use
> : >> > docValues=true for field: _version_
> : >> > solr2_1|at
> : >> >
> : >>
> org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:973)
> : >> > solr2_1|at
> : >> >
> : >>
> org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1912)
> : >> > solr2_1|at
> : >> >
> : >>
> org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:182)
> : >> > solr2_1|at
> : >> >
> : >>
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:78)
> : >> > solr2_1|at
> : >> >
> : >>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
> : >> > solr2_1|at
> : >> org.apache.solr.core.SolrCore.execute(SolrCore.java:2440)
> : >> > solr2_1|at
> : >> > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
> : >>
> :
>
> -Hoss
> http://www.lucidworks.com/
>


RE: Parallel API interface into SOLR

2017-06-12 Thread Rohit Jain
Thanks a lot Joel!  No wonder I could not find it :-).  I will try to see if 
this will work for us.

Rohit

-Original Message-
From: Joel Bernstein [mailto:joels...@gmail.com] 
Sent: Monday, June 12, 2017 1:01 PM
To: solr-user@lucene.apache.org
Subject: Re: Parallel API interface into SOLR

You can do what you're trying to do by using the SolrStream but it's
complex and not documented. Here is the basic code for having multiple
clients hitting the same shard:

On client 1:

// imports needed (SolrJ streaming API, Solr 5.x/6.x package layout):
// import org.apache.solr.client.solrj.io.SolrClientCache;
// import org.apache.solr.client.solrj.io.Tuple;
// import org.apache.solr.client.solrj.io.stream.SolrStream;
// import org.apache.solr.client.solrj.io.stream.StreamContext;
// import org.apache.solr.common.params.ModifiableSolrParams;

SolrClientCache cache = new SolrClientCache();

StreamContext context = new StreamContext();
context.setSolrClientCache(cache);
context.numWorkers = 2;
context.workerID = 0;  // client 1 reads partition 0

ModifiableSolrParams params = new ModifiableSolrParams();
params.set("qt", "/export");
params.set("partitionKeys", "field1, field2");
params.set("sort", "field1 asc, field2 asc");
params.set("q", "some query");

// "/shard_endpoint" is a placeholder for the URL of the shard replica to read from
SolrStream solrStream = new SolrStream("/shard_endpoint", params);
solrStream.setStreamContext(context);
solrStream.open();
while (true) {
    Tuple tup = solrStream.read();
    if (tup.EOF) break;  // the stream terminates with an EOF marker tuple
}
solrStream.close();

On client 2:

// same as client 1, except this client takes the other partition:
SolrClientCache cache = new SolrClientCache();

StreamContext context = new StreamContext();
context.setSolrClientCache(cache);
context.numWorkers = 2;
context.workerID = 1;  // client 2 reads partition 1

ModifiableSolrParams params = new ModifiableSolrParams();
params.set("qt", "/export");
params.set("partitionKeys", "field1, field2");
params.set("sort", "field1 asc, field2 asc");
params.set("q", "some query");

SolrStream solrStream = new SolrStream("/shard_endpoint", params);
solrStream.setStreamContext(context);
solrStream.open();
while (true) {
    Tuple tup = solrStream.read();
    if (tup.EOF) break;
}
solrStream.close();



In this scenario client1 and client2 are each getting a partition of the
result set. Notice that the context.workerID attribute is the difference
between the two requests. You can partition a result set as many ways as
you want by setting the context.numWorkers attribute.









Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Jun 12, 2017 at 1:11 PM, Rohit Jain  wrote:

> Erick,
>
> I think so, although I may have overlooked something.  The idea is that we
> would make a request to the API from a single client but expect multiple
> streams of results to be returned in parallel to multiple parallel
> processes that we have set up to receive those results from SOLR.  Do these
> interfaces provide that?  This has always been the issue with interfaces
> like JDBC / ODBC as well, since they don't provide a mechanism to consume
> the results in parallel streams.  There is no protocol set up to do that.
> I was just wondering if there was for SOLR and what would be an example of
> that.
>
> Rohit
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Monday, June 12, 2017 11:56 AM
> To: solr-user 
> Subject: Re: Parallel API interface into SOLR
>
> Have you looked at Streaming Aggregation/Streaming Expressions/Parallel
> SQL etc?
>
> Best,
> Erick
>
> On Mon, Jun 12, 2017 at 9:24 AM, Rohit Jain  wrote:
> > Hi folks,
> >
> > We have a solution where we would like to connect to SOLR via an API,
> submit a query, and then pre-process the results before we return the
> results to our users.  However, in some cases, it is possible that the
> results being returned by SOLR, in a large distributed cluster deployment,
> is very large.  In these cases, we would like to set up parallel streams,
> so that each parallel SOLR worker feeds directly into one of our processes
> distributed across the cluster.  That way, we can pre-process those results
> in parallel, before we consolidate (and potentially reduce / aggregate) the
> results further for the user, who has a single client connection to our
> solution.  Sort of a MapReduce type scenario where our processors are the
> reducers.  We could consume the results as returned by these SOLR Worker
> processes, or perhaps have them shuffled based on a shard key, before our
> processes would receive them.
> >
> > Any ideas on how this could be done?
> >
> > Rohit Jain
>


RE: DIH issue with streaming xml file

2017-06-12 Thread Miller, William K - Norman, OK - Contractor
Thank you for your response.  I will look into this link.  Also, sorry I did 
not specify the file type.   I am working with XML files.




~~~
William Kevin Miller

ECS Federal, Inc.
USPS/MTSC
(405) 573-2158


-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Monday, June 12, 2017 1:26 PM
To: solr-user
Subject: Re: DIH issue with streaming xml file

The Solr 6.5.1 DIH setup has a - somewhat broken - RSS example (redone as an ATOM
example in 6.6) that shows how to get stuff from an https URL. You can see the
atom example here:
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.6.0/solr/example/example-DIH/solr/atom/conf/atom-data-config.xml


The main issue, however, is that you are not saying what format that list of
files on the server is in. Is it a plain list? Is it XML with file entries? Are
you doing a directory listing?

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 12 June 2017 at 14:11, Miller, William K - Norman, OK - Contractor 
 wrote:
> Thank you for your response.  That is the issue that I am having.  I cannot
> figure out how to get the list of files from the remote server.  I have tried
> changing the parent entity processor to the XPathEntityProcessor and the
> baseDir to a URL using https.  This did not work, as it was looking for a
> "forEach" attribute.  Is there an entity processor that can be used to get
> the list of files from an https source, or am I going to have to use SolrJ or
> create a custom entity processor?
>
>
>
>
> ~~~
> William Kevin Miller
>
> ECS Federal, Inc.
> USPS/MTSC
> (405) 573-2158
>
>
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Monday, June 12, 2017 12:57 PM
> To: solr-user
> Subject: Re: DIH issue with streaming xml file
>
> How do you get a list of URLs for the files on the remote server? That's 
> probably the first issue. Once you have the URLs in an outside entity or two, 
> you can feed them one by one into the inner entity.
>
> Regards,
>Alex.
>
> 
> http://www.solr-start.com/ - Resources for Solr users, new and 
> experienced
>
> On 12 June 2017 at 09:39, Miller, William K - Norman, OK - Contractor < 
> william.k.mil...@usps.gov.invalid> wrote:
>
>> I am using Solr 6.5.1 and working on importing xml files using the 
>> DataImportHandler.  I am wanting to get the files from a remote 
>> server, but I am dealing with multiple xml files in multiple folders.
>> I am using a nested entity in my dataConfig.  Below is an example of 
>> how I have my dataConfig set up.  I got most of this from an online 
>> reference.  In this example I am getting the xml files from a folder 
>> on the Solr server, but as I mentioned above I want to get the files 
>> from a remote server.  I have looked at the different Entity 
>> Processors for the DIH, but have not seen anything that seems to work.
>> Is there a way to configure the below code to let me do this?
>>
>>
>>
>>
>>
>> <dataConfig>
>>   <dataSource type="FileDataSource" />
>>   <document>
>>     <entity
>>         name="pickupdir"
>>         processor="FileListEntityProcessor"
>>         rootEntity="false"
>>         dataSource="null"
>>         fileName="^[\w\d-]+\.xml$"
>>         baseDir="/var/solr/data/hbk/data/xml/"
>>         recursive="true">
>>       <entity
>>           name="xml"
>>           pk="itemId"
>>           processor="XPathEntityProcessor"
>>           transformer="RegexTransformer,TemplateTransformer"
>>           datasource="pickupdir"
>>           stream="true"
>>           xsl="/var/solr/data/hbk/data/xsl/solr_timdex.xsl"
>>           url="${pickupdir.fileAbsolutePath}"
>>           forEach="/eflow/section | /eflow/section/item">
>>         <field column="sectionId" xpath="/eflow/section/@id" commonField="true" />
>>         <field column="sectionTitle" xpath="/eflow/section/@title" commonField="true" />
>>         <field column="sectionNo" xpath="/eflow/section/@secno" commonField="true" />
>>         <field column="hbkNo" xpath="/eflow/section/@hbkno" commonField="true" />
>>         <field column="volumeNo" xpath="/eflow/section/@volno" commonField="true" />
>>         <field column="itemId" xpath="/eflow/section/item/@id" />
>>         <field column="itemTitle" xpath="/eflow/section/item/@title" />
>>         <field column="itemNo" xpath="/eflow/section/item/@mit" />
>>         <field column="itemFile" xpath="/eflow/section/item/@file" />
>>         <field column="itemType" xpath="/eflow/section/item/@type" />
>>       </entity>
>>     </entity>
>>   </document>
>> </dataConfig>

Re: Use of blanks in context filter field with AnalyzingInfixLookupFactory

2017-06-12 Thread Alfonso Muñoz-Pomer Fuentes
suggestAnalyzerFieldType and queryAnalyzerFieldType are related to the field 
parameter (in my case property_value), not to the contextField. Moreover, the 
change you suggest makes AnalyzingInfixLookupFactory always return 0 results 
(something that’s not discussed in the reference guide and has confused other 
users previously).

Cheers,
Alfonso


> On 12 Jun 2017, at 19:10, Susheel Kumar  wrote:
> 
> Change below type to string and try...
> 
> <str name="suggestAnalyzerFieldType">text_en</str>
> <str name="queryAnalyzerFieldType">text_en</str>
> 
> Thanks,
> Susheel
> 
> On Mon, Jun 12, 2017 at 1:28 PM, Alfonso Muñoz-Pomer Fuentes <
> amu...@ebi.ac.uk> wrote:
> 
>> Hi all,
>> 
>> I was wondering if anybody has experience setting up a suggester with
>> filtering using a context field that has blanks. Currently this is what I
>> have in solr_config.xml:
>> <searchComponent name="suggest" class="solr.SuggestComponent">
>>   <lst name="suggester">
>>     <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
>>     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>>     <str name="contextField">species</str>
>>     <str name="suggestAnalyzerFieldType">text_en</str>
>>     <str name="queryAnalyzerFieldType">text_en</str>
>>     <str name="buildOnStartup">false</str>
>>   </lst>
>> </searchComponent>
>> 
>> And this is an example record in my index:
>> {
>>  "bioentity_identifier":["ENSG419"],
>>  "bioentity_type":["ensgene"],
>>  "species":"homo sapiens",
>>  "property_value":["R-HSA-162699"],
>>  "property_name":["pathwayid"],
>>  "id":"795aedd9-54aa-44c9-99bf-8d195985b7cc",
>>  "_version_”:1570016930397421568
>> }
>> 
>> When I request for suggestions like this, everything’s fine:
>> http://localhost:8983/solr/bioentities/suggest?wt=json&indent=on&suggest.q=r
>> 
>> But if I try to narrow by species, I get 0 results:
>> http://localhost:8983/solr/bioentities/suggest?wt=json&indent=on&suggest.q=r&suggest.cfq=homo sapiens
>> 
>> I’ve tried escaping the space, URL-encode it (with %20 and +), enclosing
>> it in single quotes, double quotes, square brackets... to no avail (getting
>> 0 results except when I enclose the parameter value with double quotes, in
>> which case I get an exception). In the example record above, species is of
>> type string. In schemaless mode the results are the same.
>> 
>> Using underscores in the species lets me filter properly, so the filtering
>> mechanism per se works fine.
>> 
>> Any help greatly appreciated.
>> 
>> --
>> Alfonso Muñoz-Pomer Fuentes
>> Software Engineer @ Expression Atlas Team
>> European Bioinformatics Institute (EMBL-EBI)
>> European Molecular Biology Laboratory
>> Tel:+ 44 (0) 1223 49 2633
>> Skype: amunozpomer
>> 
>> 

--
Alfonso Muñoz-Pomer Fuentes
Software Engineer @ Expression Atlas Team
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Tel:+ 44 (0) 1223 49 2633
Skype: amunozpomer



Re: DIH issue with streaming xml file

2017-06-12 Thread Alexandre Rafalovitch
The Solr 6.5.1 DIH setup has a - somewhat broken - RSS example (redone as
an ATOM example in 6.6) that shows how to get stuff from an https URL. You
can see the atom example here:
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.6.0/solr/example/example-DIH/solr/atom/conf/atom-data-config.xml


The main issue, however, is that you are not saying what format that
list of files on the server is in. Is it a plain list? Is it XML with
file entries? Are you doing a directory listing?

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 12 June 2017 at 14:11, Miller, William K - Norman, OK - Contractor
 wrote:
> Thank you for your response.  That is the issue that I am having.  I cannot
> figure out how to get the list of files from the remote server.  I have tried
> changing the parent entity processor to the XPathEntityProcessor and the
> baseDir to a URL using https.  This did not work, as it was looking for a
> "forEach" attribute.  Is there an entity processor that can be used to get
> the list of files from an https source, or am I going to have to use SolrJ or
> create a custom entity processor?
>
>
>
>
> ~~~
> William Kevin Miller
>
> ECS Federal, Inc.
> USPS/MTSC
> (405) 573-2158
>
>
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Monday, June 12, 2017 12:57 PM
> To: solr-user
> Subject: Re: DIH issue with streaming xml file
>
> How do you get a list of URLs for the files on the remote server? That's 
> probably the first issue. Once you have the URLs in an outside entity or two, 
> you can feed them one by one into the inner entity.
>
> Regards,
>Alex.
>
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
> On 12 June 2017 at 09:39, Miller, William K - Norman, OK - Contractor < 
> william.k.mil...@usps.gov.invalid> wrote:
>
>> I am using Solr 6.5.1 and working on importing xml files using the
>> DataImportHandler.  I am wanting to get the files from a remote
>> server, but I am dealing with multiple xml files in multiple folders.
>> I am using a nested entity in my dataConfig.  Below is an example of
>> how I have my dataConfig set up.  I got most of this from an online
>> reference.  In this example I am getting the xml files from a folder
>> on the Solr server, but as I mentioned above I want to get the files
>> from a remote server.  I have looked at the different Entity
>> Processors for the DIH, but have not seen anything that seems to work.
>> Is there a way to configure the below code to let me do this?
>>
>>
>>
>>
>>
>> <dataConfig>
>>   <dataSource type="FileDataSource" />
>>   <document>
>>     <entity
>>         name="pickupdir"
>>         processor="FileListEntityProcessor"
>>         rootEntity="false"
>>         dataSource="null"
>>         fileName="^[\w\d-]+\.xml$"
>>         baseDir="/var/solr/data/hbk/data/xml/"
>>         recursive="true">
>>       <entity
>>           name="xml"
>>           pk="itemId"
>>           processor="XPathEntityProcessor"
>>           transformer="RegexTransformer,TemplateTransformer"
>>           datasource="pickupdir"
>>           stream="true"
>>           xsl="/var/solr/data/hbk/data/xsl/solr_timdex.xsl"
>>           url="${pickupdir.fileAbsolutePath}"
>>           forEach="/eflow/section | /eflow/section/item">
>>         <field column="sectionId" xpath="/eflow/section/@id" commonField="true" />
>>         <field column="sectionTitle" xpath="/eflow/section/@title" commonField="true" />
>>         <field column="sectionNo" xpath="/eflow/section/@secno" commonField="true" />
>>         <field column="hbkNo" xpath="/eflow/section/@hbkno" commonField="true" />
>>         <field column="volumeNo" xpath="/eflow/section/@volno" commonField="true" />
>>         <field column="itemId" xpath="/eflow/section/item/@id" />
>>         <field column="itemTitle" xpath="/eflow/section/item/@title" />
>>         <field column="itemNo" xpath="/eflow/section/item/@mit" />
>>         <field column="itemFile" xpath="/eflow/section/item/@file" />
>>         <field column="itemType" xpath="/eflow/section/item/@type" />
>>       </entity>
>>     </entity>
>>   </document>
>> </dataConfig>

RE: DIH issue with streaming xml file

2017-06-12 Thread Miller, William K - Norman, OK - Contractor
Thank you for your response.  That is the issue that I am having.  I cannot
figure out how to get the list of files from the remote server.  I have tried
changing the parent entity processor to the XPathEntityProcessor and the
baseDir to a URL using https.  This did not work, as it was looking for a
"forEach" attribute.  Is there an entity processor that can be used to get the
list of files from an https source, or am I going to have to use SolrJ or create
a custom entity processor?




~~~
William Kevin Miller

ECS Federal, Inc.
USPS/MTSC
(405) 573-2158


-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Monday, June 12, 2017 12:57 PM
To: solr-user
Subject: Re: DIH issue with streaming xml file

How do you get a list of URLs for the files on the remote server? That's 
probably the first issue. Once you have the URLs in an outside entity or two, 
you can feed them one by one into the inner entity.

Regards,
   Alex.


http://www.solr-start.com/ - Resources for Solr users, new and experienced

On 12 June 2017 at 09:39, Miller, William K - Norman, OK - Contractor < 
william.k.mil...@usps.gov.invalid> wrote:

> I am using Solr 6.5.1 and working on importing xml files using the 
> DataImportHandler.  I am wanting to get the files from a remote 
> server, but I am dealing with multiple xml files in multiple folders.  
> I am using a nested entity in my dataConfig.  Below is an example of 
> how I have my dataConfig set up.  I got most of this from an online 
> reference.  In this example I am getting the xml files from a folder 
> on the Solr server, but as I mentioned above I want to get the files 
> from a remote server.  I have looked at the different Entity 
> Processors for the DIH, but have not seen anything that seems to work.  
> Is there a way to configure the below code to let me do this?
>
>
>
>
>
> <dataConfig>
>   <dataSource type="FileDataSource" />
>   <document>
>     <entity
>         name="pickupdir"
>         processor="FileListEntityProcessor"
>         rootEntity="false"
>         dataSource="null"
>         fileName="^[\w\d-]+\.xml$"
>         baseDir="/var/solr/data/hbk/data/xml/"
>         recursive="true">
>       <entity
>           name="xml"
>           pk="itemId"
>           processor="XPathEntityProcessor"
>           transformer="RegexTransformer,TemplateTransformer"
>           datasource="pickupdir"
>           stream="true"
>           xsl="/var/solr/data/hbk/data/xsl/solr_timdex.xsl"
>           url="${pickupdir.fileAbsolutePath}"
>           forEach="/eflow/section | /eflow/section/item">
>         <field column="sectionId" xpath="/eflow/section/@id" commonField="true" />
>         <field column="sectionTitle" xpath="/eflow/section/@title" commonField="true" />
>         <field column="sectionNo" xpath="/eflow/section/@secno" commonField="true" />
>         <field column="hbkNo" xpath="/eflow/section/@hbkno" commonField="true" />
>         <field column="volumeNo" xpath="/eflow/section/@volno" commonField="true" />
>         <field column="itemId" xpath="/eflow/section/item/@id" />
>         <field column="itemTitle" xpath="/eflow/section/item/@title" />
>         <field column="itemNo" xpath="/eflow/section/item/@mit" />
>         <field column="itemFile" xpath="/eflow/section/item/@file" />
>         <field column="itemType" xpath="/eflow/section/item/@type" />
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
>
>
>
>
>
>
>
>
>
>
> ~~~
>
> William Kevin Miller
>
>
> ECS Federal, Inc.
>
> USPS/MTSC
>
> (405) 573-2158
>
>
>


Re: Use of blanks in context filter field with AnalyzingInfixLookupFactory

2017-06-12 Thread Susheel Kumar
Change the type below to string and try...

<str name="suggestAnalyzerFieldType">text_en</str>
<str name="queryAnalyzerFieldType">text_en</str>

Thanks,
Susheel

On Mon, Jun 12, 2017 at 1:28 PM, Alfonso Muñoz-Pomer Fuentes <
amu...@ebi.ac.uk> wrote:

> Hi all,
>
> I was wondering if anybody has experience setting up a suggester with
> filtering using a context field that has blanks. Currently this is what I
> have in solr_config.xml:
> <searchComponent name="suggest" class="solr.SuggestComponent">
>   <lst name="suggester">
>     <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
>     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>     <str name="contextField">species</str>
>     <str name="suggestAnalyzerFieldType">text_en</str>
>     <str name="queryAnalyzerFieldType">text_en</str>
>     <str name="buildOnStartup">false</str>
>   </lst>
> </searchComponent>
>
> And this is an example record in my index:
> {
>   "bioentity_identifier":["ENSG419"],
>   "bioentity_type":["ensgene"],
>   "species":"homo sapiens",
>   "property_value":["R-HSA-162699"],
>   "property_name":["pathwayid"],
>   "id":"795aedd9-54aa-44c9-99bf-8d195985b7cc",
>   "_version_”:1570016930397421568
> }
>
> When I request for suggestions like this, everything’s fine:
> http://localhost:8983/solr/bioentities/suggest?wt=json&indent=on&suggest.q=r
>
> But if I try to narrow by species, I get 0 results:
> http://localhost:8983/solr/bioentities/suggest?wt=json&indent=on&suggest.q=r&suggest.cfq=homo sapiens
>
> I’ve tried escaping the space, URL-encode it (with %20 and +), enclosing
> it in single quotes, double quotes, square brackets... to no avail (getting
> 0 results except when I enclose the parameter value with double quotes, in
> which case I get an exception). In the example record above, species is of
> type string. In schemaless mode the results are the same.
>
> Using underscores in the species lets me filter properly, so the filtering
> mechanism per se works fine.
>
> Any help greatly appreciated.
>
> --
> Alfonso Muñoz-Pomer Fuentes
> Software Engineer @ Expression Atlas Team
> European Bioinformatics Institute (EMBL-EBI)
> European Molecular Biology Laboratory
> Tel:+ 44 (0) 1223 49 2633
> Skype: amunozpomer
>
>


Re: Parallel API interface into SOLR

2017-06-12 Thread Joel Bernstein
You can do what you're trying to do by using the SolrStream but it's
complex and not documented. Here is the basic code for having multiple
clients hitting the same shard:

On client 1:

// imports needed (SolrJ streaming API, Solr 5.x/6.x package layout):
// import org.apache.solr.client.solrj.io.SolrClientCache;
// import org.apache.solr.client.solrj.io.Tuple;
// import org.apache.solr.client.solrj.io.stream.SolrStream;
// import org.apache.solr.client.solrj.io.stream.StreamContext;
// import org.apache.solr.common.params.ModifiableSolrParams;

SolrClientCache cache = new SolrClientCache();

StreamContext context = new StreamContext();
context.setSolrClientCache(cache);
context.numWorkers = 2;
context.workerID = 0;  // client 1 reads partition 0

ModifiableSolrParams params = new ModifiableSolrParams();
params.set("qt", "/export");
params.set("partitionKeys", "field1, field2");
params.set("sort", "field1 asc, field2 asc");
params.set("q", "some query");

// "/shard_endpoint" is a placeholder for the URL of the shard replica to read from
SolrStream solrStream = new SolrStream("/shard_endpoint", params);
solrStream.setStreamContext(context);
solrStream.open();
while (true) {
    Tuple tup = solrStream.read();
    if (tup.EOF) break;  // the stream terminates with an EOF marker tuple
}
solrStream.close();

On client 2:

// same as client 1, except this client takes the other partition:
SolrClientCache cache = new SolrClientCache();

StreamContext context = new StreamContext();
context.setSolrClientCache(cache);
context.numWorkers = 2;
context.workerID = 1;  // client 2 reads partition 1

ModifiableSolrParams params = new ModifiableSolrParams();
params.set("qt", "/export");
params.set("partitionKeys", "field1, field2");
params.set("sort", "field1 asc, field2 asc");
params.set("q", "some query");

SolrStream solrStream = new SolrStream("/shard_endpoint", params);
solrStream.setStreamContext(context);
solrStream.open();
while (true) {
    Tuple tup = solrStream.read();
    if (tup.EOF) break;
}
solrStream.close();



In this scenario client1 and client2 are each getting a partition of the
result set. Notice that the context.workerID attribute is the difference
between the two requests. You can partition a result set as many ways as
you want by setting the context.numWorkers attribute.
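
(The partitioning here appears to be the same hash-on-partitionKeys mechanism
that ParallelStream uses: a hash over the partitionKeys fields filters each
worker's /export results, so the workers' slices are disjoint and together
cover the full sorted result set.)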









Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Jun 12, 2017 at 1:11 PM, Rohit Jain  wrote:

> Erick,
>
> I think so, although I may have overlooked something.  The idea is that we
> would make a request to the API from a single client but expect multiple
> streams of results to be returned in parallel to multiple parallel
> processes that we have set up to receive those results from SOLR.  Do these
> interfaces provide that?  This has always been the issue with interfaces
> like JDBC / ODBC as well, since they don't provide a mechanism to consume
> the results in parallel streams.  There is no protocol set up to do that.
> I was just wondering if there was for SOLR and what would be an example of
> that.
>
> Rohit
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Monday, June 12, 2017 11:56 AM
> To: solr-user 
> Subject: Re: Parallel API interface into SOLR
>
> Have you looked at Streaming Aggregation/Streaming Expressions/Parallel
> SQL etc?
>
> Best,
> Erick
>
> On Mon, Jun 12, 2017 at 9:24 AM, Rohit Jain  wrote:
> > Hi folks,
> >
> > We have a solution where we would like to connect to SOLR via an API,
> submit a query, and then pre-process the results before we return the
> results to our users.  However, in some cases, it is possible that the
> results being returned by SOLR, in a large distributed cluster deployment,
> is very large.  In these cases, we would like to set up parallel streams,
> so that each parallel SOLR worker feeds directly into one of our processes
> distributed across the cluster.  That way, we can pre-process those results
> in parallel, before we consolidate (and potentially reduce / aggregate) the
> results further for the user, who has a single client connection to our
> solution.  Sort of a MapReduce type scenario where our processors are the
> reducers.  We could consume the results as returned by these SOLR Worker
> processes, or perhaps have them shuffled based on a shard key, before our
> processes would receive them.
> >
> > Any ideas on how this could be done?
> >
> > Rohit Jain
>


Re: DIH issue with streaming xml file

2017-06-12 Thread Alexandre Rafalovitch
How do you get a list of URLs for the files on the remote server? That's
probably the first issue. Once you have the URLs in an outside entity or
two, you can feed them one by one into the inner entity.
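
A rough sketch of that shape (hypothetical -- the listing URL, its format, and
the XPaths are made up for illustration):

<dataConfig>
  <dataSource name="remote" type="URLDataSource" />
  <document>
    <entity name="filelist" processor="XPathEntityProcessor" dataSource="remote"
            url="https://example.com/files.xml" forEach="/files/file" rootEntity="false">
      <field column="fileUrl" xpath="/files/file/@url" />
      <entity name="xml" processor="XPathEntityProcessor" dataSource="remote"
              url="${filelist.fileUrl}" forEach="/eflow/section | /eflow/section/item">
        <!-- field mappings as in the original config -->
      </entity>
    </entity>
  </document>
</dataConfig>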

Regards,
   Alex.


http://www.solr-start.com/ - Resources for Solr users, new and experienced

On 12 June 2017 at 09:39, Miller, William K - Norman, OK - Contractor <
william.k.mil...@usps.gov.invalid> wrote:

> I am using Solr 6.5.1 and working on importing xml files using the
> DataImportHandler.  I am wanting to get the files from a remote server, but
> I am dealing with multiple xml files in multiple folders.  I am using a
> nested entity in my dataConfig.  Below is an example of how I have my
> dataConfig set up.  I got most of this from an online reference.  In this
> example I am getting the xml files from a folder on the Solr server, but as
> I mentioned above I want to get the files from a remote server.  I have
> looked at the different Entity Processors for the DIH, but have not seen
> anything that seems to work.  Is there a way to configure the below code to
> let me do this?
>
>
>
>
>
> <dataConfig>
>   <dataSource type="FileDataSource" />
>   <document>
>     <entity
>         name="pickupdir"
>         processor="FileListEntityProcessor"
>         rootEntity="false"
>         dataSource="null"
>         fileName="^[\w\d-]+\.xml$"
>         baseDir="/var/solr/data/hbk/data/xml/"
>         recursive="true">
>       <entity
>           name="xml"
>           pk="itemId"
>           processor="XPathEntityProcessor"
>           transformer="RegexTransformer,TemplateTransformer"
>           datasource="pickupdir"
>           stream="true"
>           xsl="/var/solr/data/hbk/data/xsl/solr_timdex.xsl"
>           url="${pickupdir.fileAbsolutePath}"
>           forEach="/eflow/section | /eflow/section/item">
>         <field column="sectionId" xpath="/eflow/section/@id" commonField="true" />
>         <field column="sectionTitle" xpath="/eflow/section/@title" commonField="true" />
>         <field column="sectionNo" xpath="/eflow/section/@secno" commonField="true" />
>         <field column="hbkNo" xpath="/eflow/section/@hbkno" commonField="true" />
>         <field column="volumeNo" xpath="/eflow/section/@volno" commonField="true" />
>         <field column="itemId" xpath="/eflow/section/item/@id" />
>         <field column="itemTitle" xpath="/eflow/section/item/@title" />
>         <field column="itemNo" xpath="/eflow/section/item/@mit" />
>         <field column="itemFile" xpath="/eflow/section/item/@file" />
>         <field column="itemType" xpath="/eflow/section/item/@type" />
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
>
>
>
>
>
>
>
>
>
>
> ~~~
>
> William Kevin Miller
>
>
> ECS Federal, Inc.
>
> USPS/MTSC
>
> (405) 573-2158
>
>
>


Use of blanks in context filter field with AnalyzingInfixLookupFactory

2017-06-12 Thread Alfonso Muñoz-Pomer Fuentes
Hi all,

I was wondering if anybody has experience setting up a suggester with filtering 
using a context field that has blanks. Currently this is what I have in 
solr_config.xml:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="contextField">species</str>
    <str name="suggestAnalyzerFieldType">text_en</str>
    <str name="queryAnalyzerFieldType">text_en</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>


And this is an example record in my index:
{
  "bioentity_identifier":["ENSG419"],
  "bioentity_type":["ensgene"],
  "species":"homo sapiens",
  "property_value":["R-HSA-162699"],
  "property_name":["pathwayid"],
  "id":"795aedd9-54aa-44c9-99bf-8d195985b7cc",
  "_version_”:1570016930397421568
}

When I request for suggestions like this, everything’s fine:
http://localhost:8983/solr/bioentities/suggest?wt=json&indent=on&suggest.q=r

But if I try to narrow by species, I get 0 results:
http://localhost:8983/solr/bioentities/suggest?wt=json&indent=on&suggest.q=r&suggest.cfq=homo sapiens

I’ve tried escaping the space, URL-encode it (with %20 and +), enclosing it in 
single quotes, double quotes, square brackets... to no avail (getting 0 results 
except when I enclose the parameter value with double quotes, in which case I 
get an exception). In the example record above, species is of type string. In 
schemaless mode the results are the same.

Using underscores in the species lets me filter properly, so the filtering 
mechanism per se works fine.

Any help greatly appreciated.

--
Alfonso Muñoz-Pomer Fuentes
Software Engineer @ Expression Atlas Team
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Tel:+ 44 (0) 1223 49 2633
Skype: amunozpomer



RE: Parallel API interface into SOLR

2017-06-12 Thread Rohit Jain
Erick,

I think so, although I may have overlooked something.  The idea is that we 
would make a request to the API from a single client but expect multiple 
streams of results to be returned in parallel to multiple parallel processes 
that we have set up to receive those results from SOLR.  Do these interfaces 
provide that?  This has always been the issue with interfaces like JDBC / ODBC 
as well, since they don't provide a mechanism to consume the results in 
parallel streams.  There is no protocol set up to do that.  I was just 
wondering if there was for SOLR and what would be an example of that.

Rohit

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Monday, June 12, 2017 11:56 AM
To: solr-user 
Subject: Re: Parallel API interface into SOLR

Have you looked at Streaming Aggregation/Streaming Expressions/Parallel SQL etc?

Best,
Erick

On Mon, Jun 12, 2017 at 9:24 AM, Rohit Jain  wrote:
> Hi folks,
>
> We have a solution where we would like to connect to SOLR via an API, submit 
> a query, and then pre-process the results before we return the results to our 
> users.  However, in some cases, it is possible that the results being 
> returned by SOLR, in a large distributed cluster deployment, is very large.  
> In these cases, we would like to set up parallel streams, so that each 
> parallel SOLR worker feeds directly into one of our processes distributed 
> across the cluster.  That way, we can pre-process those results in parallel, 
> before we consolidate (and potentially reduce / aggregate) the results 
> further for the user, who has a single client connection to our solution.  
> Sort of a MapReduce type scenario where our processors are the reducers.  We 
> could consume the results as returned by these SOLR Worker processes, or 
> perhaps have them shuffled based on a shard key, before our processes would 
> receive them.
>
> Any ideas on how this could be done?
>
> Rohit Jain


Re: Phrase Query only forward direction

2017-06-12 Thread Erick Erickson
Complex phrase also has an inorder flag that I think you're looking for here.

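For example (a sketch; inOrder defaults to true in {!complexphrase}):

q={!complexphrase inOrder=true df=nameSearch}"12345 masitha"~5

matches the two terms within a slop of 5 only when they appear in that order.
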
Best,
Erick

On Mon, Jun 12, 2017 at 7:16 AM, Erik Hatcher  wrote:
> Understood.   If you need ordered, “sloppy” (some distance) phrases, you 
> could OR in a {!complexphrase} query.
>
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
>
> Something like:
>
> q=({!edismax … ps=0 v=$qq}) OR {!complexphrase df=nameSearch v=$qq}
>
> where qq=12345 masitha
>
> Erik
>
>
>> On Jun 12, 2017, at 9:57 AM, Aman Deep Singh  
>> wrote:
>>
>> Yes Erik, I can use ps=0, but my problem is that I want phrases that have
>> the same sequence but may be separated by some distance.
>> E.g.
>> If I have the document "masitha xyz 12345",
>> I want that to be boosted since the sequence is in order. That's why I have
>> used ps=5.
>> Thanks,
>> Aman Deep Singh
>>
>> On 12-Jun-2017 5:44 PM, "Erik Hatcher"  wrote:
>>
>> Using ps=5 causes the phrase matching to be unordered matching.   You’ll
>> have to set ps=0, if using edismax, to get exact order phrase matches.
>>
>>Erik
>>
>>
>>> On Jun 12, 2017, at 1:09 AM, Aman Deep Singh 
>> wrote:
>>>
>>> Hi,
>>> I'm using a phrase query, but it applies the phrase boost to the query
>>> even when the terms are in reverse order, which I don't want. Is there
>>> any way to avoid the phrase boost for reverse order and apply the boost
>>> only when the terms are in the same sequence?
>>>
>>> Solr version 6.5.1
>>>
>>> e.g.
>>> http://localhost:8983/solr/l4_collection/select?debugQuery=on&defType=edismax&fl=score,nameSearch&indent=on&mm=100%25&pf=nameSearch&q=12345%20masitha&qf=nameSearch&wt=xml&ps=5
>>>
>>>
>>> while my document has value
>>>
>>> in the debug query it is applying boost as
>>> 23.28365 = sum of:
>>> 15.112219 = sum of:
>>> 9.669338 = weight(nameSearch:12345 in 0) [SchemaSimilarity], result of:
>>> 9.669338 = score(doc=0,freq=1.0 = termFreq=1.0
>>> ), product of:
>>> 7.6397386 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq
>>> + 0.5)) from:
>>> 2.0 = docFreq
>>> 5197.0 = docCount
>>> 1.2656635 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b +
>> b
>>> * fieldLength / avgFieldLength)) from:
>>> 1.0 = termFreq=1.0
>>> 1.2 = parameter k1
>>> 0.75 = parameter b
>>> 5.2576485 = avgFieldLength
>>> 2.56 = fieldLength
>>> 5.44288 = weight(nameSearch:masitha in 0) [SchemaSimilarity], result of:
>>> 5.44288 = score(doc=0,freq=1.0 = termFreq=1.0
>>> ), product of:
>>> 4.3004165 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq
>>> + 0.5)) from:
>>> 70.0 = docFreq
>>> 5197.0 = docCount
>>> 1.2656635 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b +
>> b
>>> * fieldLength / avgFieldLength)) from:
>>> 1.0 = termFreq=1.0
>>> 1.2 = parameter k1
>>> 0.75 = parameter b
>>> 5.2576485 = avgFieldLength
>>> 2.56 = fieldLength
>>> 8.171431 = weight(nameSearch:"12345 masitha"~5 in 0) [SchemaSimilarity],
>>> result of:
>>> 8.171431 = score(doc=0,freq=0.3334 = phraseFreq=0.3334
>>> ), product of:
>>> 11.940155 = idf(), sum of:
>>> 7.6397386 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq
>>> + 0.5)) from:
>>> 2.0 = docFreq
>>> 5197.0 = docCount
>>> 4.3004165 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq
>>> + 0.5)) from:
>>> 70.0 = docFreq
>>> 5197.0 = docCount
>>> 0.6843655 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b +
>> b
>>> * fieldLength / avgFieldLength)) from:
>>> 0.3334 = phraseFreq=0.3334
>>> 1.2 = parameter k1
>>> 0.75 = parameter b
>>> 5.2576485 = avgFieldLength
>>> 2.56 = fieldLength
>>>
>>> Thanks,
>>> Aman Deep Singh
>


Re: Parallel API interface into SOLR

2017-06-12 Thread Erick Erickson
Have you looked at Streaming Aggregation/Streaming Expressions/Parallel SQL etc?

Best,
Erick

On Mon, Jun 12, 2017 at 9:24 AM, Rohit Jain  wrote:
> Hi folks,
>
> We have a solution where we would like to connect to SOLR via an API, submit 
> a query, and then pre-process the results before we return the results to our 
> users.  However, in some cases, it is possible that the results being 
> returned by SOLR, in a large distributed cluster deployment, is very large.  
> In these cases, we would like to set up parallel streams, so that each 
> parallel SOLR worker feeds directly into one of our processes distributed 
> across the cluster.  That way, we can pre-process those results in parallel, 
> before we consolidate (and potentially reduce / aggregate) the results 
> further for the user, who has a single client connection to our solution.  
> Sort of a MapReduce type scenario where our processors are the reducers.  We 
> could consume the results as returned by these SOLR Worker processes, or 
> perhaps have them shuffled based on a shard key, before our processes would 
> receive them.
>
> Any ideas on how this could be done?
>
> Rohit Jain


Re: _version_ as LongPointField returns error

2017-06-12 Thread Chris Hostetter

just replying to some comments/discussion in general rather than
individual msgs/sentences..

* uninversion/FieldCache of *singlevalued* Points fields was fixed in SOLR-10472

* currently a bad idea to use indexed="true" Points for _version_ due to 
SOLR-10832

* AFAICT it's a good idea (in general, regardless of type) to use 
indexed="true" docValues="true" for _version_ (once SOLR-10832 is fixed) 
to ensure VersionInfo.getMaxVersionFromIndex doesn't make core 
load/reloads (and CDCR apparently) slow.



: Date: Mon, 12 Jun 2017 12:32:50 -0400
: From: Yonik Seeley 
: Reply-To: solr-user@lucene.apache.org
: To: "solr-user@lucene.apache.org" 
: Subject: Re: _version_ as LongPointField returns error
: 
: On Mon, Jun 12, 2017 at 12:24 PM, Shawn Feldman  
wrote:
: > Why do you need doc values though?  i'm never going to sort by version
: 
: Solr needs a quick lookup from docid->_version_
: If you don't have docValues, Solr tries to create an in-memory version
: (via the FieldCache).  That's not yet supported for Point* fields.
: 
: -Yonik
: 
: > On Mon, Jun 12, 2017 at 10:13 AM Yonik Seeley  wrote:
: >
: >> I think the _version_ field should be
: >>  - indexed="false"
: >>  - stored="false"
: >>  - docValues="true"
: >>
: >> -Yonik
: >>
: >>
: >> On Mon, Jun 12, 2017 at 12:08 PM, Shawn Feldman 
: >> wrote:
: >> > I changed all my TrieLong Fields to Point fields.  _version_ always
: >> returns
: >> > an error unless i turn on docvalues
: >> >
: >> >   
: >> >   
: >> >
: >> > Getting this error when i index.  Any ideas?
: >> >
: >> >
: >> >  Remote error message: Point fields can't use FieldCache. Use
: >> > docValues=true for field: _version_
: >> > solr2_1|at
: >> >
: >> 
org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:973)
: >> > solr2_1|at
: >> >
: >> 
org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1912)
: >> > solr2_1|at
: >> >
: >> 
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:182)
: >> > solr2_1|at
: >> >
: >> 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:78)
: >> > solr2_1|at
: >> >
: >> 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
: >> > solr2_1|at
: >> org.apache.solr.core.SolrCore.execute(SolrCore.java:2440)
: >> > solr2_1|at
: >> > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
: >>
: 

-Hoss
http://www.lucidworks.com/


Re: Proximity Search using edismax parser.

2017-06-12 Thread abhi Abhishek
Thanks for the suggestions Erik and Vrindavda,

I was trying to understand how the above query works when the slop is
set to 10. The debug output of the Solr query shows the terms being
looked up, but the transpositions performed during matching aren't
exposed.

I found the following Stack Overflow link, which describes the transpositions
performed when searching for a phrase with slop 4. Is there a guide to
understanding this?

https://stackoverflow.com/questions/25558195/lucene-proximity-search-for-phrase-with-more-than-two-words
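
(For reference: Lucene's phrase slop is an edit-distance budget on term
positions -- the quoted terms may be moved or reordered as long as the total
number of single-position moves is at most the slop, and swapping two adjacent
terms costs two moves.)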

Thanks in advance.

Best Regards,
Abhishek


On Mon, Jun 12, 2017 at 5:41 PM, Erik Hatcher 
wrote:

> Adding debug=true to your search requests will give you the parsing
> details, so you can see how edismax interprets the query string and
> parameters to turn it into the underlying dismax and phrase queries.
>
> Erik
>
> > On Jun 12, 2017, at 3:22 AM, abhi Abhishek  wrote:
> >
> > Hi All,
> >  How does a proximity query work in Solr?
> >
> > For example, if I am running a query like the one below against a field
> > containing the text "India registered a historical test match win against
> > the arch rival Pakistan here in Lords, England on Sunday"
> >
> > Query: “Test match India Pakistan” ~ 10
> >
> >I am interested in understanding the intermediate steps
> > involved here to understand the search behavior and determine how results
> > are being matched to the search phrase.
> >
> > Thanks in Advance,
> >
> > Abhishek
>
>


Re: _version_ as LongPointField returns error

2017-06-12 Thread Yonik Seeley
On Mon, Jun 12, 2017 at 12:24 PM, Shawn Feldman  wrote:
> Why do you need doc values though?  i'm never going to sort by version

Solr needs a quick lookup from docid->_version_
If you don't have docValues, Solr tries to create an in-memory version
(via the FieldCache).  That's not yet supported for Point* fields.

-Yonik

> On Mon, Jun 12, 2017 at 10:13 AM Yonik Seeley  wrote:
>
>> I think the _version_ field should be
>>  - indexed="false"
>>  - stored="false"
>>  - docValues="true"
>>
>> -Yonik
>>
>>
>> On Mon, Jun 12, 2017 at 12:08 PM, Shawn Feldman 
>> wrote:
>> > I changed all my TrieLong Fields to Point fields.  _version_ always
>> returns
>> > an error unless i turn on docvalues
>> >
>> >   
>> >   
>> >
>> > Getting this error when i index.  Any ideas?
>> >
>> >
>> >  Remote error message: Point fields can't use FieldCache. Use
>> > docValues=true for field: _version_
>> > solr2_1|at
>> >
>> org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:973)
>> > solr2_1|at
>> >
>> org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1912)
>> > solr2_1|at
>> >
>> org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:182)
>> > solr2_1|at
>> >
>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:78)
>> > solr2_1|at
>> >
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
>> > solr2_1|at
>> org.apache.solr.core.SolrCore.execute(SolrCore.java:2440)
>> > solr2_1|at
>> > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
>>


Re: _version_ as LongPointField returns error

2017-06-12 Thread Shawn Feldman
Why do you need doc values, though?  I'm never going to sort by version.

On Mon, Jun 12, 2017 at 10:13 AM Yonik Seeley  wrote:

> I think the _version_ field should be
>  - indexed="false"
>  - stored="false"
>  - docValues="true"
>
> -Yonik
>
>
> On Mon, Jun 12, 2017 at 12:08 PM, Shawn Feldman 
> wrote:
> > I changed all my TrieLong Fields to Point fields.  _version_ always
> returns
> > an error unless i turn on docvalues
> >
> >   
> >   
> >
> > Getting this error when i index.  Any ideas?
> >
> >
> >  Remote error message: Point fields can't use FieldCache. Use
> > docValues=true for field: _version_
> > solr2_1|at
> >
> org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:973)
> > solr2_1|at
> >
> org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1912)
> > solr2_1|at
> >
> org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:182)
> > solr2_1|at
> >
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:78)
> > solr2_1|at
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
> > solr2_1|at
> org.apache.solr.core.SolrCore.execute(SolrCore.java:2440)
> > solr2_1|at
> > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
>


Parallel API interface into SOLR

2017-06-12 Thread Rohit Jain
Hi folks,

We have a solution where we would like to connect to SOLR via an API, submit a 
query, and then pre-process the results before we return the results to our 
users.  However, in some cases, it is possible that the results being returned 
by SOLR, in a large distributed cluster deployment, is very large.  In these 
cases, we would like to set up parallel streams, so that each parallel SOLR 
worker feeds directly into one of our processes distributed across the cluster. 
 That way, we can pre-process those results in parallel, before we consolidate 
(and potentially reduce / aggregate) the results further for the user, who has 
a single client connection to our solution.  Sort of a MapReduce type scenario 
where our processors are the reducers.  We could consume the results as 
returned by these SOLR Worker processes, or perhaps have them shuffled based on 
a shard key, before our processes would receive them.

Any ideas on how this could be done?

Rohit Jain


Re: _version_ as LongPointField returns error

2017-06-12 Thread Yonik Seeley
I think the _version_ field should be
 - indexed="false"
 - stored="false"
 - docValues="true"

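In schema.xml that would look something like this (assuming a points-based
long fieldType named "plong"):

<field name="_version_" type="plong" indexed="false" stored="false" docValues="true"/>
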
-Yonik


On Mon, Jun 12, 2017 at 12:08 PM, Shawn Feldman  wrote:
> I changed all my TrieLong Fields to Point fields.  _version_ always returns
> an error unless i turn on docvalues
>
>   
>   
>
> Getting this error when i index.  Any ideas?
>
>
>  Remote error message: Point fields can't use FieldCache. Use
> docValues=true for field: _version_
> solr2_1|at
> org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:973)
> solr2_1|at
> org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1912)
> solr2_1|at
> org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:182)
> solr2_1|at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:78)
> solr2_1|at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
> solr2_1|at org.apache.solr.core.SolrCore.execute(SolrCore.java:2440)
> solr2_1|at
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
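
Spelled out as a schema entry, that recommendation would look roughly like the line below; the type name plong is an assumption about what the point-based long type is called in this particular schema:

<field name="_version_" type="plong" indexed="false" stored="false" docValues="true"/>

With docValues enabled, version lookups no longer fall back to the FieldCache that point fields cannot provide, which is exactly what the quoted error message complains about.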


Re: _version_ as LongPointField returns error

2017-06-12 Thread Shawn Feldman
Logged this ticket: https://issues.apache.org/jira/browse/SOLR-10872

On Mon, Jun 12, 2017 at 10:08 AM Shawn Feldman 
wrote:

> I changed all my TrieLong Fields to Point fields.  _version_ always
> returns an error unless i turn on docvalues
>
> [field definitions stripped in archiving]
>
> Getting this error when i index.  Any ideas?
>
>
>  Remote error message: Point fields can't use FieldCache. Use
> docValues=true for field: _version_
> solr2_1|at
> org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:973)
> solr2_1|at
> org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1912)
> solr2_1|at
> org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:182)
> solr2_1|at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:78)
> solr2_1|at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
> solr2_1|at
> org.apache.solr.core.SolrCore.execute(SolrCore.java:2440)
> solr2_1|at
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
>


_version_ as LongPointField returns error

2017-06-12 Thread Shawn Feldman
I changed all my TrieLong fields to Point fields.  _version_ always returns
an error unless I turn on docValues:

[field definitions stripped in archiving]

Getting this error when I index.  Any ideas?


 Remote error message: Point fields can't use FieldCache. Use
docValues=true for field: _version_
solr2_1|at
org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:973)
solr2_1|at
org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1912)
solr2_1|at
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:182)
solr2_1|at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:78)
solr2_1|at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
solr2_1|at org.apache.solr.core.SolrCore.execute(SolrCore.java:2440)
solr2_1|at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)


Re: Phrase Query only forward direction

2017-06-12 Thread Erik Hatcher
Understood.   If you need ordered, “sloppy” (some distance) phrases, you could 
OR in a {!complexphrase} query.

https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
 


Something like:

q=({!edismax … ps=0 v=$qq}) OR {!complexphrase df=nameSearch v=$qq}

where qq=12345 masitha
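
Assembled into a single request against the earlier example collection (a sketch; complexphrase's inOrder flag, which defaults to true, is what enforces forward-only matching):

http://localhost:8983/solr/l4_collection/select?q={!complexphrase inOrder=true df=nameSearch}"12345 masitha"~5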

Erik


> On Jun 12, 2017, at 9:57 AM, Aman Deep Singh  
> wrote:
> 
> Yes Erik I can use ps=0 but, my problem is that I want phrase which have
> same sequence and they can be present with in some distance
> E.g.
> If I have document masitha xyz 12345
> I want that to be boosted since the sequence is in order .That's why I have
> use ps=5
> Thanks,
> Aman Deep Singh
> 
> On 12-Jun-2017 5:44 PM, "Erik Hatcher"  wrote:
> 
> Using ps=5 causes the phrase matching to be unordered matching.   You’ll
> have to set ps=0, if using edismax, to get exact order phrase matches.
> 
>Erik
> 
> 
>> On Jun 12, 2017, at 1:09 AM, Aman Deep Singh 
> wrote:
>> 
>> Hi,
>> I'm using a phrase query, but it was applying the phrase boost to the query
>> where terms are in reverse order also, which I don't want. Is there any way
>> to avoid the phrase boost for reverse order and apply boost only in case of
>> terms are in the same sequence?
>> 
>> Solr version 6.5.1
>> 
>> e.g.
>> http://localhost:8983/solr/l4_collection/select?debugQuery=on&defType=edismax&fl=score,nameSearch&indent=on&mm=100%25&pf=nameSearch&q=12345%20masitha&qf=nameSearch&wt=xml&ps=5
>> 
>> 
>> while my document has value masitha 12345
>> 
>> in the debug query it is applying boost as
>> 23.28365 = sum of:
>> 15.112219 = sum of:
>> 9.669338 = weight(nameSearch:12345 in 0) [SchemaSimilarity], result of:
>> 9.669338 = score(doc=0,freq=1.0 = termFreq=1.0
>> ), product of:
>> 7.6397386 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq
>> + 0.5)) from:
>> 2.0 = docFreq
>> 5197.0 = docCount
>> 1.2656635 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b
>> * fieldLength / avgFieldLength)) from:
>> 1.0 = termFreq=1.0
>> 1.2 = parameter k1
>> 0.75 = parameter b
>> 5.2576485 = avgFieldLength
>> 2.56 = fieldLength
>> 5.44288 = weight(nameSearch:masitha in 0) [SchemaSimilarity], result of:
>> 5.44288 = score(doc=0,freq=1.0 = termFreq=1.0
>> ), product of:
>> 4.3004165 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq
>> + 0.5)) from:
>> 70.0 = docFreq
>> 5197.0 = docCount
>> 1.2656635 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b
>> * fieldLength / avgFieldLength)) from:
>> 1.0 = termFreq=1.0
>> 1.2 = parameter k1
>> 0.75 = parameter b
>> 5.2576485 = avgFieldLength
>> 2.56 = fieldLength
>> 8.171431 = weight(*nameSearch:"12345 masitha"~5 *in 0) [SchemaSimilarity],
>> result of:
>> 8.171431 = score(doc=0,freq=0.3334 = phraseFreq=0.3334
>> ), product of:
>> 11.940155 = idf(), sum of:
>> 7.6397386 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq
>> + 0.5)) from:
>> 2.0 = docFreq
>> 5197.0 = docCount
>> 4.3004165 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq
>> + 0.5)) from:
>> 70.0 = docFreq
>> 5197.0 = docCount
>> 0.6843655 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b
>> * fieldLength / avgFieldLength)) from:
>> 0.3334 = phraseFreq=0.3334
>> 1.2 = parameter k1
>> 0.75 = parameter b
>> 5.2576485 = avgFieldLength
>> 2.56 = fieldLength
>> 
>> Thanks,
>> Aman Deep Singh



Re: Phrase Query only forward direction

2017-06-12 Thread Aman Deep Singh
Yes Erik, I can use ps=0, but my problem is that I want phrases which have the
same sequence and whose terms can be present within some distance.
E.g.
If I have the document "masitha xyz 12345",
I want that to be boosted since the sequence is in order. That's why I have
used ps=5.
Thanks,
Aman Deep Singh

On 12-Jun-2017 5:44 PM, "Erik Hatcher"  wrote:

Using ps=5 causes the phrase matching to be unordered matching.   You’ll
have to set ps=0, if using edismax, to get exact order phrase matches.

Erik


> On Jun 12, 2017, at 1:09 AM, Aman Deep Singh 
wrote:
>
> Hi,
> I'm using a phrase query, but it was applying the phrase boost to the query
> where terms are in reverse order also, which I don't want. Is there any way
> to avoid the phrase boost for reverse order and apply boost only in case of
> terms are in the same sequence?
>
> Solr version 6.5.1
>
> e.g.
> http://localhost:8983/solr/l4_collection/select?debugQuery=on&defType=edismax&fl=score,nameSearch&indent=on&mm=100%25&pf=nameSearch&q=12345%20masitha&qf=nameSearch&wt=xml&ps=5
>
>
> while my document has value masitha 12345
>
> in the debug query it is applying boost as
> 23.28365 = sum of:
> 15.112219 = sum of:
> 9.669338 = weight(nameSearch:12345 in 0) [SchemaSimilarity], result of:
> 9.669338 = score(doc=0,freq=1.0 = termFreq=1.0
> ), product of:
> 7.6397386 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq
> + 0.5)) from:
> 2.0 = docFreq
> 5197.0 = docCount
> 1.2656635 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b
> * fieldLength / avgFieldLength)) from:
> 1.0 = termFreq=1.0
> 1.2 = parameter k1
> 0.75 = parameter b
> 5.2576485 = avgFieldLength
> 2.56 = fieldLength
> 5.44288 = weight(nameSearch:masitha in 0) [SchemaSimilarity], result of:
> 5.44288 = score(doc=0,freq=1.0 = termFreq=1.0
> ), product of:
> 4.3004165 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq
> + 0.5)) from:
> 70.0 = docFreq
> 5197.0 = docCount
> 1.2656635 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b
> * fieldLength / avgFieldLength)) from:
> 1.0 = termFreq=1.0
> 1.2 = parameter k1
> 0.75 = parameter b
> 5.2576485 = avgFieldLength
> 2.56 = fieldLength
> 8.171431 = weight(*nameSearch:"12345 masitha"~5 *in 0) [SchemaSimilarity],
> result of:
> 8.171431 = score(doc=0,freq=0.3334 = phraseFreq=0.3334
> ), product of:
> 11.940155 = idf(), sum of:
> 7.6397386 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq
> + 0.5)) from:
> 2.0 = docFreq
> 5197.0 = docCount
> 4.3004165 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq
> + 0.5)) from:
> 70.0 = docFreq
> 5197.0 = docCount
> 0.6843655 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b
> * fieldLength / avgFieldLength)) from:
> 0.3334 = phraseFreq=0.3334
> 1.2 = parameter k1
> 0.75 = parameter b
> 5.2576485 = avgFieldLength
> 2.56 = fieldLength
>
> Thanks,
> Aman Deep Singh


DIH issue with streaming xml file

2017-06-12 Thread Miller, William K - Norman, OK - Contractor
I am using Solr 6.5.1 and working on importing XML files using the 
DataImportHandler.  I want to get the files from a remote server, but I 
am dealing with multiple XML files in multiple folders.  I am using a nested 
entity in my dataConfig.  Below is an example of how I have my dataConfig set 
up; I got most of it from an online reference.  In this example I am getting 
the XML files from a folder on the Solr server, but as I mentioned above I want 
to get the files from a remote server.  I have looked at the different Entity 
Processors for the DIH, but have not seen anything that seems to work.  Is 
there a way to configure the config below to let me do this?

[the dataConfig XML did not survive archiving]
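
A typical nested setup of the kind described (an illustrative sketch only: baseDir, forEach, and the field names are assumptions, not the poster's actual values) looks roughly like:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- outer entity: walks the folder tree and emits one row per XML file -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/data/xmlfiles" fileName=".*\.xml"
            recursive="true" rootEntity="false" dataSource="null">
      <!-- inner entity: streams each file and extracts fields via XPath -->
      <entity name="record" processor="XPathEntityProcessor"
              url="${files.fileAbsolutePath}" forEach="/records/record"
              stream="true">
        <field column="id" xpath="/records/record/id"/>
        <field column="title" xpath="/records/record/title"/>
      </entity>
    </entity>
  </document>
</dataConfig>

On the remote-server question: FileListEntityProcessor only walks a local filesystem, so the usual options are to mount the remote folders locally (NFS/SMB) or, if the files are reachable over HTTP, to switch the inner entity to a URLDataSource.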
~~~
William Kevin Miller
ECS Federal, Inc.
USPS/MTSC
(405) 573-2158



Re: Phrase Query only forward direction

2017-06-12 Thread Erik Hatcher
Using ps=5 causes the phrase matching to be unordered.   You’ll have 
to set ps=0, if using edismax, to get exact-order phrase matches.

Erik


> On Jun 12, 2017, at 1:09 AM, Aman Deep Singh  
> wrote:
> 
> Hi,
> I'm using a phrase query, but it was applying the phrase boost to the query
> where terms are in reverse order also, which I don't want. Is there any way
> to avoid the phrase boost for reverse order and apply boost only in case of
> terms are in the same sequence?
> 
> Solr version 6.5.1
> 
> e.g.
> http://localhost:8983/solr/l4_collection/select?debugQuery=on&defType=edismax&fl=score,nameSearch&indent=on&mm=100%25&pf=nameSearch&q=12345%20masitha&qf=nameSearch&wt=xml&ps=5
> 
> 
> while my document has value masitha 12345
> 
> in the debug query it is applying boost as
> 23.28365 = sum of:
> 15.112219 = sum of:
> 9.669338 = weight(nameSearch:12345 in 0) [SchemaSimilarity], result of:
> 9.669338 = score(doc=0,freq=1.0 = termFreq=1.0
> ), product of:
> 7.6397386 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq
> + 0.5)) from:
> 2.0 = docFreq
> 5197.0 = docCount
> 1.2656635 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b
> * fieldLength / avgFieldLength)) from:
> 1.0 = termFreq=1.0
> 1.2 = parameter k1
> 0.75 = parameter b
> 5.2576485 = avgFieldLength
> 2.56 = fieldLength
> 5.44288 = weight(nameSearch:masitha in 0) [SchemaSimilarity], result of:
> 5.44288 = score(doc=0,freq=1.0 = termFreq=1.0
> ), product of:
> 4.3004165 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq
> + 0.5)) from:
> 70.0 = docFreq
> 5197.0 = docCount
> 1.2656635 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b
> * fieldLength / avgFieldLength)) from:
> 1.0 = termFreq=1.0
> 1.2 = parameter k1
> 0.75 = parameter b
> 5.2576485 = avgFieldLength
> 2.56 = fieldLength
> 8.171431 = weight(*nameSearch:"12345 masitha"~5 *in 0) [SchemaSimilarity],
> result of:
> 8.171431 = score(doc=0,freq=0.3334 = phraseFreq=0.3334
> ), product of:
> 11.940155 = idf(), sum of:
> 7.6397386 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq
> + 0.5)) from:
> 2.0 = docFreq
> 5197.0 = docCount
> 4.3004165 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq
> + 0.5)) from:
> 70.0 = docFreq
> 5197.0 = docCount
> 0.6843655 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b
> * fieldLength / avgFieldLength)) from:
> 0.3334 = phraseFreq=0.3334
> 1.2 = parameter k1
> 0.75 = parameter b
> 5.2576485 = avgFieldLength
> 2.56 = fieldLength
> 
> Thanks,
> Aman Deep Singh



Re: Proximity Search using edismax parser.

2017-06-12 Thread Erik Hatcher
Adding debug=true to your search requests will give you the parsing details, 
so you can see how edismax interprets the query string and parameters to turn 
it into the underlying dismax and phrase queries.
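
For example (hypothetical core and field names):

http://localhost:8983/solr/mycore/select?defType=edismax&qf=body&q="test match india pakistan"~10&debug=true

The parsedquery section of the debug output then shows the generated PhraseQuery, including its slop.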

Erik

> On Jun 12, 2017, at 3:22 AM, abhi Abhishek  wrote:
> 
> Hi All,
>  How does proximity Query work in SOLR.
> 
> Example if i am running a query like below, for the field containing the
> text “India registered a historical test match win against the arch rival
> Pakistan here in Lords, England on Sunday”
> 
> Query: “Test match India Pakistan” ~ 10
> 
>I am interested in understanding the intermediate steps
> involved here to understand the search behavior and determine how results
> are being matched to the search phrase.
> 
> Thanks in Advance,
> 
> Abhishek



Re: Proximity Search using edismax parser.

2017-06-12 Thread vrindavda
Hi, you can refer to: http://yonik.com/solr/query-syntax/





Proximity Search using edismax parser.

2017-06-12 Thread abhi Abhishek
Hi All,
  How does a proximity query work in SOLR?

For example, if I am running a query like the one below, for a field containing the
text “India registered a historical test match win against the arch rival
Pakistan here in Lords, England on Sunday”

Query: “Test match India Pakistan” ~ 10

I am interested in the intermediate steps involved here, to understand
the search behavior and determine how results are being matched to the
search phrase.
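
(A sketch of what happens, based on Lucene's sloppy-phrase matching: the quoted string is analyzed into the terms test, match, india, pakistan with their positions, and the parser builds a PhraseQuery with slop=10. A document matches if those terms can be shuffled into query order within a total budget of 10 position moves, so out-of-order terms are allowed as long as the moves fit. The distance also feeds scoring: phraseFreq is roughly 1/(distance+1), meaning tighter, more in-order matches score higher; a phraseFreq of 0.333, for instance, corresponds to an edit distance of 2.)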

Thanks in Advance,

Abhishek


Re: Searching under multiple field

2017-06-12 Thread Zheng Lin Edwin Yeo
Hi,

I found that this edismax query works.

http://localhost:8983/solr/collection1/select?defType=edismax&q=*test* AND
field1_tc:"Main"*&qf=field1_s field2_s field3_s field4_s
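
If some fields should count more than others, edismax also accepts per-field boosts inside qf (the weights below are made up for illustration):

qf=field1_s^5 field2_s^2 field3_s field4_s

This keeps the single-query form while ranking field1_s matches highest.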

Regards,
Edwin


On 12 June 2017 at 11:20, Zheng Lin Edwin Yeo  wrote:

> Hi Ravi,
>
> Thanks for your suggestion.
>
> I'm looking at the edismax query parser, but could not figure out how we
> can streamline the query using that.
>
> As for using copyField, it is probably not a good idea, as it will
> increase the index size.
>
> Regards,
> Edwin
>
>
> On 12 June 2017 at 09:34, ANNAMANENI RAVEENDRA 
> wrote:
>
>> Hi,
>>
>> Use dismax or edismax query parser
>>
>> Or
>>
>> Use copy field concept
>>
>> Thanks
>> Ravi
>>
>> On Sun, 11 Jun 2017 at 9:32 PM, Zheng Lin Edwin Yeo > >
>> wrote:
>>
>> > Hi,
>> >
>> > Currently, I'm using the following query to search for the same word
>> under
>> > different fields
>> >
>> > http://localhost:8983/solr/collection1/select?q=(field1_s:*test* OR
>> > field2_s:*test* OR field3_s:*test* OR field4_s:*test*) AND
>> > field1_tc:"Main"*
>> >
>> > Is there a way to better streamline the query, so that we will not have
>> to
>> > use so many "OR" in the query?
>> >
>> > I'm using Solr 6.5.1.
>> >
>> > Regards,
>> > Edwin
>> >
>>
>
>