Re: Negating multiple array fields

2016-02-16 Thread Binoy Dalal
Hi Shawn,
Please correct me if I'm wrong here, but don't the all-inclusive range
query [* TO *] and a wildcard-only query like the one above essentially do
the same thing from a black-box perspective?
In that case, wouldn't it be better to default a wildcard-only query to
an all-inclusive range query?

On Wed, 17 Feb 2016, 06:47 Shawn Heisey  wrote:

> On 2/15/2016 9:22 AM, Jack Krupansky wrote:
> > I should also have noted that your full query:
> >
> > (-persons:*)AND(-places:*)AND(-orgs:*)
> >
> > can be written as:
> >
> > -persons:* -places:* -orgs:*
> >
> > Which may work as is, or can also be written as:
> >
> > *:* -persons:* -places:* -orgs:*
>
> Salman,
>
> One fact of Lucene operation is that purely negative queries do not
> work.  A negative query clause is like a subtraction.  If you make a
> query that only says "subtract these values", then you aren't going to
> get anything, because you did not start with anything.
>
> Adding the "*:*" clause at the beginning of the query says "start with
> everything."
>
> You might ask why a query of -field:value works, when I just said that
> it *won't* work.  This is because Solr has detected the problem and
> fixed it.  When the query is very simple (a single negated clause), Solr
> is able to detect the unworkable situation and implicitly add the "*:*"
> starting point, producing the expected results.  With more complex
> queries, like the one you are trying, this detection fails, and the
> query is executed as-is.
>
> Jack is an awesome member of this community.  I do not want to disparage
> him at all when I tell you that the rewritten query he provided will
> work, but is not optimal.  It can be optimized as the following:
>
> *:* -persons:[* TO *] -places:[* TO *] -orgs:[* TO *]
>
> A query clause of the format "field:*" is a wildcard query.  Behind the
> scenes, Solr will interpret this as "all possible values for field" --
> which sounds like it would be exactly what you're looking for, except
> that if there are ten million possible values in the field you're
> searching, the constructed Lucene query will quite literally include all
> ten million values.  Wildcard queries tend to use a lot of memory and
> run slowly.
>
> The [* TO *] syntax is an all-inclusive range query, which will usually
> be much faster than a wildcard query.
>
> Thanks,
> Shawn
>
> --
Regards,
Binoy Dalal


Re: Negating multiple array fields

2016-02-16 Thread Shawn Heisey
On 2/15/2016 9:22 AM, Jack Krupansky wrote:
> I should also have noted that your full query:
>
> (-persons:*)AND(-places:*)AND(-orgs:*)
>
> can be written as:
>
> -persons:* -places:* -orgs:*
>
> Which may work as is, or can also be written as:
>
> *:* -persons:* -places:* -orgs:*

Salman,

One fact of Lucene operation is that purely negative queries do not
work.  A negative query clause is like a subtraction.  If you make a
query that only says "subtract these values", then you aren't going to
get anything, because you did not start with anything.

Adding the "*:*" clause at the beginning of the query says "start with
everything."

You might ask why a query of -field:value works, when I just said that
it *won't* work.  This is because Solr has detected the problem and
fixed it.  When the query is very simple (a single negated clause), Solr
is able to detect the unworkable situation and implicitly add the "*:*"
starting point, producing the expected results.  With more complex
queries, like the one you are trying, this detection fails, and the
query is executed as-is.

Jack is an awesome member of this community.  I do not want to disparage
him at all when I tell you that the rewritten query he provided will
work, but is not optimal.  It can be optimized as follows:

*:* -persons:[* TO *] -places:[* TO *] -orgs:[* TO *]

A query clause of the format "field:*" is a wildcard query.  Behind the
scenes, Solr will interpret this as "all possible values for field" --
which sounds like it would be exactly what you're looking for, except
that if there are ten million possible values in the field you're
searching, the constructed Lucene query will quite literally include all
ten million values.  Wildcard queries tend to use a lot of memory and
run slowly.

The [* TO *] syntax is an all-inclusive range query, which will usually
be much faster than a wildcard query.

Thanks,
Shawn
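
To make the contrast concrete, the three forms discussed above look like this (a sketch; the field names are taken from the thread):

```
# Purely negative: no starting set, so a complex query returns nothing
q=-persons:* -places:* -orgs:*

# Workable: start with everything, then subtract (each wildcard expands
# to every distinct term in its field, which can be slow):
q=*:* -persons:* -places:* -orgs:*

# Optimized: all-inclusive range queries instead of wildcards:
q=*:* -persons:[* TO *] -places:[* TO *] -orgs:[* TO *]
```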



RE: Solr and Nutch integration

2016-02-16 Thread Markus Jelsma
Hello Tom - Nutch 2.x has, IIRC, the old SolrServer client implemented. It should
just send an HTTP request to a specified node. The Solr node will then forward
it to a destination shard. In Nutch, you should set up indexer-solr as an
indexing plugin in the plugin.includes configuration directive and use the
bin/nutch index ... command.

If there are errors in the logs, please check Nutch's logs/hadoop.log and/or Solr's
log and copy/paste them to the mailing list.

It would be best to continue this on Nutch's mailing list. In most cases it is a
Nutch configuration problem, except when you are not using Nutch's schema.xml in
your Solr setup.

Regards,
Markus
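
A minimal sketch of the Nutch side of this setup (indexer-solr is the plugin Markus refers to; the rest of the plugin list and the Solr URL below are illustrative and depend on the installation):

```xml
<!-- conf/nutch-site.xml -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
<property>
  <name>solr.server.url</name>
  <value>http://localhost:8983/solr/nutch</value>
</property>
```

Indexing is then triggered with something like `bin/nutch index crawl/crawldb -linkdb crawl/linkdb crawl/segments/*` (the crawl paths are illustrative).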

 
 
-Original message-
> From:Tom Running 
> Sent: Tuesday 16th February 2016 21:40
> To: solr-user 
> Subject: Solr and Nutch integration
> 
> I am having a problem configuring Solr to read Nutch data or integrating with
> Nutch.
> Has anyone been able to get Solr 5.4.x to work with Nutch?
> 
> I went through a lot of Google articles and am still not able to get Solr 5.4.1
> to search Nutch content.
> 
> Any howto or working configuration sample that you can share would be
> greatly appreciated.
> 
> Thanks,
> Toom
> 


Re: Using Solr's spatial functionality for astronomical catalog

2016-02-16 Thread david.w.smi...@gmail.com
Ah; I saw that.  I'm glad you figured it out.  Yes, you needed the SQL
alias.  I'm kinda surprised you didn't get an error about a field by the
name of your expression not existing... but maybe you have a catch-all
dynamic field or maybe you're in data-driven mode.  In either case, I'd
expect a quick select of your data would show those fields.  And your first
query wasn't a spatial query; there is no spatial=true parameter.  You got
it right the second time.
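
For reference, the difference between the two attempts looks like this (assuming a core named `spatial` with an indexed RPT field `SourceRpt`, as in the thread):

```
# Not spatial: sfield/pt/d alone do nothing without a spatial query
# parser, so this matches all documents:
q=*:*&sfield=SourceRpt&pt=3,3&d=0.0001

# Spatial: the geofilt query parser applies the point-radius filter:
q=*:*&fq={!geofilt sfield=SourceRpt pt=0,0 d=5}
```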

On Tue, Feb 16, 2016 at 4:03 AM Colin Freas  wrote:

>
> Looks like the only issue was that I did not have an alias for SourceRpt
> field in the SQL.
>
> With that in place, everything seems to work more or less as expected.
> SourceRpt shows up where it should.
>
> Queries like
>
> http://localhost:8983/solr/spatial/select?q=*:*&fq={!geofilt%20sfield=SourceRpt}&pt=0,0&d=5
>
>
>
> ... return appropriate subsets.
>
> Doh,
> Colin
>
> On 2/16/16, 3:31 AM, "Colin Freas"  wrote:
>
> >
> >David, thanks for getting back to me.  SpatialRecursivePrefixTreeFieldType
> >seems to be what I need, and the default search seems appropriate.  This
> >is for entries in an astronomical catalog, so great circle distances on a
> >perfect sphere is what I'm after.
> >
> >I am having a bit of difficulty though.
> >
> >Having gotten records importing via our database into a schema on both a
> >stand-alone Solr instance and in a SolrCloud cluster, I've moved on to
> >"spatializing" the appropriate fields, and everything looks like it's
> >working, in that there are no errors thrown.  But when I try what I think
> >is a valid spatial query, it doesn't work.
> >
> >Here's what I'm doing.  Pertinent bits from my schema:
> >
> >   ...
> >stored="false"
> >required="false" multiValued="false" />
> >   CatID
> >   
> >   
> >class="solr.SpatialRecursivePrefixTreeFieldType"
> >   geo="true"
> >   distanceUnits="degrees" />
> >   ...
> >   
> >
> >In my db-config.xml, I've got this SQL:
> >   
> >
> >
> >When I run a data import through Solr's admin GUI and look at the verbose
> >debug output, something seems off.  Top of the output is this:
> >   {
> >   "responseHeader": {
> >   "status": 0,
> >   "QTime": 75
> >   },
> >   "initArgs": [
> >   "defaults",
> >   [
> >   "config",
> >   "hsc-db-config.xml"
> >   ]
> >   ],
> >   "command": "full-import",
> >   "mode": "debug",
> >   "documents": [
> >   {
> >   "MatchDec": [
> >   -0.67312569921
> >   ],
> >   "SourceDec": [
> >   -0.67312569921
> >   ],
> >   "MatchRA": [
> >   0.5681586795334927
> >   ],
> >   "SourceRA": [
> >   0.5681586795334927
> >   ],
> >   "CatID": [
> >   25558943
> >   ]
> >   },
> >
> >
> >There's no SourceRpt field there.  But in the verbose output of what's
> >returned from the query, the SourceRpt field seems to be correctly put
> >together:
> >
> >   "verbose-output": [
> >   "entity:observation",
> >   [
> >   "document#1",
> >   [
> >   "query",
> >   "SELECT CatID, MatchRA, MatchDec, SourceRA, SourceDec,   +
> >ltrim(rtrim(Str(SourceRA,25,16))) + ',' +
> >ltrim(rtrim(Str(SourceDec,25,16))) +  FROM xcat.BestCatalog",
> >   "time-taken",
> >   "0:0:0.22",
> >   null,
> >   "--- row #1-",
> >   "",
> >   "'0.5681586795334928,-0.673125699210'",
> >   "MatchDec",
> >   -0.67312569921,
> >   "SourceDec",
> >   -0.67312569921,
> >   "MatchRA",
> >   0.5681586795334927,
> >   "SourceRA",
> >   0.5681586795334927,
> >   "CatID",
> >   25558943,
> >   null,
> >   "-"
> >   ]
> >
> >
> >
> >I try a spatial search like this:
> >
> >http://localhost:8983/solr/spatial/select?q=*%3A*&wt=json&indent=true&spatial=true&pt=3%2C3&sfield=SourceRpt&d=0.0001
> >
> >
> >... And I get back all (10) records in the core, when I would expect 0,
> >given the very small distance I supply to a point well away from any of
> >the records.
> >
> >I'm not sure what's going on.  I don't know if this is a simple Solr
> >config error I'm missing, or if there's some spatial magic I'm unaware of.
> >
> >
> >Any thoughts appreciated.
> >
> >-Colin
> >
> >
> >On 1/20/16, 9:34 PM, "david.w.smi...@gmail.com"  >
> >wrote:
> >
> >>Hello Colin,
> >>
> >>If the spatial field you use is the SpatialRecursivePrefixTreeFieldType
> >>one
> >>(RPT for short) with geo="true" then the circle shape (i.e. point-radius
> >>filter) implied by the geofilt Solr QParser is on a sphere.  That is, it
> >>uses the "great circle" distance computed using the Haversine formula by
> >>default, though it can be configured to use the Law of Cosines formula or

Solr and Nutch integration

2016-02-16 Thread Tom Running
I am having a problem configuring Solr to read Nutch data or integrating with
Nutch.
Has anyone been able to get Solr 5.4.x to work with Nutch?

I went through a lot of Google articles and am still not able to get Solr 5.4.1
to search Nutch content.

Any howto or working configuration sample that you can share would be
greatly appreciated.

Thanks,
Toom


Re: SOLR ranking

2016-02-16 Thread david.w.smi...@gmail.com
I just want to interject to say one thing:
You *can* sort on multi-valued fields as of recent Solr 5 releases.  It's
done using the "field" function query with either a "min" or "max" 2nd
argument:
https://cwiki.apache.org/confluence/display/solr/Function+Queries
Of course it'd be nicer to simply sort asc/desc on the field as one normally
would and not use this special syntax, but AFAIK that convenience hasn't been
added yet.

~ David
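
The function-query sort David describes would look something like this (the field name `price` is illustrative):

```
# Sort by the minimum value in a multi-valued field, ascending:
sort=field(price,min) asc

# Sort by the maximum value, descending:
sort=field(price,max) desc
```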

On Mon, Feb 15, 2016 at 10:26 AM Binoy Dalal  wrote:

> I'm sorry, missed that part. It's true, you cannot sort on multivalued
> fields. The workaround will be pretty complex; you'll either have to find
> the max or min value of the fields at index time and store those in
> separate fields and use those to sort, or somehow come up with some
> function that can convert the values from your multivalued field into a
> single value (something like sum(field)) but it surely won't be trivial.
>
> Instead you should do what Emir's saying.
> Boost your fields at index or query time based on how you want to sort your
> documents.
> So in your case, give the highest boost to topic_title then a little lower
> to subtopic_title and so on. This should return your documents in the
> correct order.
> You will have to play around with the boost values a little to get them
> right, though.
>
> Alternatively, you could boost on the multivalued fields and then sort
> based on your single valued fields.
>
> Either way, you'll have to experiment and see what works best for you.
>
> On Mon, Feb 15, 2016 at 8:21 PM Nitin.K  wrote:
>
> > Thanks Binoy..
> >
> > Actually it is throwing following error:
> >
> > can not sort on multivalued field: index_term
> >
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/SOLR-ranking-tp4257367p4257378.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> --
> Regards,
> Binoy Dalal
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


RE: Which open-source crawler to use with SolrJ and Postgresql ?

2016-02-16 Thread Markus Jelsma
Hello - Nutch 1.x is much more feature-rich than 2.x; both can do tremendously
large crawls with ease. I haven't tried all the others mentioned except ManifoldCF,
which is very good at retrieving data from shared file systems and stuff like
FileNet.

We use Nutch 1.x for most of our crawls, small and large, and actively create
issues and commits. Nutch 2.x is fun, though, in case your primary data store is
not a Hadoop sequence file but any store supported by Apache Gora, which has
matured and stabilized a lot.

Markus

 
 
-Original message-
> From:Davis, Daniel (NIH/NLM) [C] 
> Sent: Tuesday 16th February 2016 17:08
> To: solr-user@lucene.apache.org
> Subject: RE: Which open-source crawler to use with SolrJ and Postgresql ?
> 
> I'm far, far from an expert on this sort of thing, but my personal experience 
> a year ago was that Nutch-1 was easier to use, and the blog post I link below 
> suggests that the abstraction layer in Nutch-2 really costs some time. I 
> expect that Nutch-2 has matured some since then, but going with Nutch-1 is 
> not a bad choice.
> 
> http://digitalpebble.blogspot.com/2013/09/nutch-fight-17-vs-221.html
> 
> There are other dogs in this fight, as shown by the SolrEcosystem wiki page:
> 
> https://wiki.apache.org/solr/SolrEcosystem
> 
> - Apache Manifold CF has a crawler for web pages and a GUI to configure and 
> start things that must be done by hand for Nutch (unless there is a front-end 
> I don't know about).  Web crawling is not the prime reason for which 
> Manifold CF exists.
> - Heritrix is a good crawler, dedicated to handling broad and incremental 
> crawling well.
> - Norconex Collectors is sort of a toolkit for building such crawlers.
> - Aspire (by Search Technologies) seems a bit complex, but has a web crawler. 
>   Again, it's more of a toolkit for building such crawlers.
> 
> I sure wish I knew which one to go with ;)
> 
> Dan Davis, Systems/Applications Architect (Contractor),
> Office of Computer and Communications Systems,
> National Library of Medicine, NIH
> 
> 
> 
> -Original Message-
> From: Emir Arnautovic [mailto:emir.arnauto...@sematext.com] 
> Sent: Tuesday, February 16, 2016 10:58 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Which open-source crawler to use with SolrJ and Postgresql ?
> 
> Markus,
> Ticket I run into is for Nutch2 and NUTCH-2197 is for Nutch1.
> 
> Haven't been using Nutch for a while so cannot recommend version.
> 
> Thanks,
> Emir
> 
> On 16.02.2016 16:37, Markus Jelsma wrote:
> > Nutch has Solr 5 cloud support in trunk, i committed it earlier this month.
> > https://issues.apache.org/jira/browse/NUTCH-2197
> >
> > Markus
> >   
> > -Original message-
> >> From:Emir Arnautovic 
> >> Sent: Tuesday 16th February 2016 16:26
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Which open-source crawler to use with SolrJ and Postgresql ?
> >>
> >> Hi,
> >> It is most common to use Nutch as crawler, but it seems that it still 
> >> does not have support for SolrCloud (if I am reading this ticket 
> >> correctly https://issues.apache.org/jira/browse/NUTCH-1662). Anyway, 
> >> I would recommend Nutch with standard http client.
> >>
> >> Regards,
> >> Emir
> >>
> >> On 16.02.2016 16:02, Victor D'agostino wrote:
> >>> Hi
> >>>
> >>> I am building a Solr 5 architecture with 3 Solr nodes and 1 zookeeper.
> >>> The database backend is postgresql 9 on RHEL 6.
> >>>
> >>> I am looking for a free open-source crawler which use SolrJ.
> >>>
> >>> What do you guys recommend ?
> >>>
> >>> Best regards
> >>> Victor d'Agostino
> >>>
> >>>
> >>> 
> >>> 
> >>> This message and any attached documents may contain confidential 
> >>> information. If it was not intended for you, please delete it and 
> >>> notify the sender immediately. Any use of this message not in 
> >>> accordance with its purpose, and any distribution or publication, 
> >>> in whole or in part and by any means, is strictly prohibited. As 
> >>> communications over the Internet are not secure, the integrity of 
> >>> this message is not guaranteed and the sending company cannot be 
> >>> held responsible for its content.
> >> --
> >> Monitoring * Alerting * Anomaly Detection * Centralized Log 
> >> Management Solr & Elasticsearch Support * http://sematext.com/
> >>
> >>
> 
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & 
> Elasticsearch Support * http://sematext.com/
> 
> 


Re: Errors on master after upgrading to 4.10.3

2016-02-16 Thread Joseph Hagerty
Does literally nobody else see this error in their logs? I see this error
hundreds of times per day, in occasional bursts. Should I file this as a
bug?

On Mon, Feb 15, 2016 at 4:56 PM, Joseph Hagerty  wrote:

> After migrating from 3.5 to 4.10.3, I'm seeing the following error with
> alarming regularity in the master's error log:
>
> 2/15/2016, 4:32:22 PM ERROR PDSimpleFont Can't determine the width of the
> space character using 250 as default
> I can't seem to glean much information about this one from the web. Has
> anyone else fought this error?
>
> In case this helps, here's some technical/miscellaneous info:
>
> - I'm running a master-slave set-up.
>
> - I rely on the ERH (tika/solr-cell/whatever) for extracting plaintext
> from .docs and .pdfs. I'm guessing that PDSimpleFont is a component of
> this, but I don't know the first thing about it.
>
> - I have the clients specifying 'autocommit=6s' in their requests, which I
> realize is a pretty aggressive commit interval, but so far that hasn't
> caused any problems I couldn't surmount.
>
> - There are north of 11 million docs in my index, which is 36 gigs thick.
> The storage volume is only 10% full.
>
> - When I migrated from 3.5 to 4.10.3, I correctly performed a reindex due
> to incompatibility between versions.
>
> - Both master and slave are running on AWS instances, C4.4XL's (16 cores,
> 30 gigs of RAM).
>
> So far, I have been unable to reproduce this error on my own: I can only
> observe it in the logs. I haven't been able to tie it to any specific
> document.
>
> Let me know if further information would be helpful.
>
>
>
>


-- 
- Joe


Re: Why is my index size going up (or: why it was smaller)?

2016-02-16 Thread Chris Hostetter

: I'm testing this on Windows, so that may be a factor too (the OS is not
: releasing file handles?!)

specifically: Windows won't let Solr delete files on disk that have open 
file handles...

https://wiki.apache.org/solr/FAQ#Why_doesn.27t_my_index_directory_get_smaller_.28immediately.29_when_i_delete_documents.3F_force_a_merge.3F_optimize.3F



-Hoss
http://www.lucidworks.com/


Re: Why is my index size going up (or: why it was smaller)?

2016-02-16 Thread Steven White
Here is how I was testing: stop Solr, delete the "data" folder, start Solr,
start indexing, and finally check index size.

I used the same pattern before and after my fix (see my original email),
and each time I ran this test, the index size ended up being larger;
restarting Solr did the trick.

Each document I'm adding is unique so there is no deletion involved here at
all.

I'm testing this on Windows, so that may be a factor too (the OS is not
releasing file handles?!)

Steve


On Tue, Feb 16, 2016 at 11:57 AM, Shawn Heisey  wrote:

> On 2/16/2016 9:37 AM, Steven White wrote:
> > I found the issue: as soon as I restart Solr, the index size goes down.
> >
> > My index and data size must have been at a border line where some
> segments
> > are not released on my last document commit.
>
> I think the only likely thing that could cause this behavior is having
> index segments that are composed fully of deleted documents, which
> supports the idea that Upayavira mentioned.  An optimize would probably
> cause the same behavior as the restart.
>
> If you do enough indexing to cause a segment merge, that would probably
> also remove segments composed only of deleted documents.
>
> Thanks,
> Shawn
>
>


Re: Why is my index size going up (or: why it was smaller)?

2016-02-16 Thread Shawn Heisey
On 2/16/2016 9:37 AM, Steven White wrote:
> I found the issue: as soon as I restart Solr, the index size goes down.
>
> My index and data size must have been at a border line where some segments
> are not released on my last document commit.

I think the only likely thing that could cause this behavior is having
index segments that are composed fully of deleted documents, which
supports the idea that Upayavira mentioned.  An optimize would probably
cause the same behavior as the restart.

If you do enough indexing to cause a segment merge, that would probably
also remove segments composed only of deleted documents.

Thanks,
Shawn



RE: Delay in replication between cloud servers

2016-02-16 Thread Cool Techi
Further, we have noticed that the delay increases a couple of hours after a 
restart. Details related to solrconfig.xml are given below:
  
  <autoCommit>
    <maxTime>15000</maxTime>
    <maxDocs>25000</maxDocs>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>

Regards,
Rohit

> From: cooltec...@outlook.com
> To: solr-user@lucene.apache.org
> Subject: Delay in replication between cloud servers
> Date: Tue, 16 Feb 2016 20:20:04 +0530
> 
> We are using SolrCloud with 1 shard and a replication factor of 3. We are 
> noticing that the time for data to become available across all replicas from 
> the leader is very high.
> The data rate is not very high; is there any way to control this? In a 
> master-slave setup we could give a replication time.
> Regards, Rohit
> 
  

Re: Why is my index size going up (or: why it was smaller)?

2016-02-16 Thread Steven White
I found the issue: as soon as I restart Solr, the index size goes down.

My index and data size must have been at a border line where some segments
are not released on my last document commit.

Steve

On Mon, Feb 15, 2016 at 11:09 PM, Shawn Heisey  wrote:

> On 2/15/2016 1:12 PM, Steven White wrote:
> > I'm fixing code that I noticed to have a defect.  My expectation was that
> > once I make the fix, the index size will be smaller but instead I see it
> > growing.
>
> I'm going to assume that SolrField_ID_LIST and SolrField_ALL_FIELDS_DATA
> are String instances that contain "ID_LIST" and "ALL_FIELDS_DATA".
>
> All three pieces of code will add exactly one document with exactly two
> fields.  The value of "field" is never used in any of the code loops,
> and "doc" is never reset/changed.
>
> I'm guessing that the actual code is more complex than the code
> fragments that you shared.  We will need to see actual code, because the
> shared code looks incomplete.
>
> Thanks,
> Shawn
>
>


RE: Which open-source crawler to use with SolrJ and Postgresql ?

2016-02-16 Thread Davis, Daniel (NIH/NLM) [C]
I'm far, far from an expert on this sort of thing, but my personal experience 
a year ago was that Nutch-1 was easier to use, and the blog post I link below 
suggests that the abstraction layer in Nutch-2 really costs some time. I 
expect that Nutch-2 has matured some since then, but going with Nutch-1 is not 
a bad choice.

http://digitalpebble.blogspot.com/2013/09/nutch-fight-17-vs-221.html

There are other dogs in this fight, as shown by the SolrEcosystem wiki page:

https://wiki.apache.org/solr/SolrEcosystem

- Apache Manifold CF has a crawler for web pages and a GUI to configure and 
start things that must be done by hand for Nutch (unless there is a front-end I 
don't know about).  Web crawling is not the prime reason for which Manifold 
CF exists.
- Heritrix is a good crawler, dedicated to handling broad and incremental 
crawling well.
- Norconex Collectors is sort of a toolkit for building such crawlers.
- Aspire (by Search Technologies) seems a bit complex, but has a web crawler. 
  Again, it's more of a toolkit for building such crawlers.

I sure wish I knew which one to go with ;)

Dan Davis, Systems/Applications Architect (Contractor),
Office of Computer and Communications Systems,
National Library of Medicine, NIH



-Original Message-
From: Emir Arnautovic [mailto:emir.arnauto...@sematext.com] 
Sent: Tuesday, February 16, 2016 10:58 AM
To: solr-user@lucene.apache.org
Subject: Re: Which open-source crawler to use with SolrJ and Postgresql ?

Markus,
Ticket I run into is for Nutch2 and NUTCH-2197 is for Nutch1.

Haven't been using Nutch for a while so cannot recommend version.

Thanks,
Emir

On 16.02.2016 16:37, Markus Jelsma wrote:
> Nutch has Solr 5 cloud support in trunk, i committed it earlier this month.
> https://issues.apache.org/jira/browse/NUTCH-2197
>
> Markus
>   
> -Original message-
>> From:Emir Arnautovic 
>> Sent: Tuesday 16th February 2016 16:26
>> To: solr-user@lucene.apache.org
>> Subject: Re: Which open-source crawler to use with SolrJ and Postgresql ?
>>
>> Hi,
>> It is most common to use Nutch as crawler, but it seems that it still 
>> does not have support for SolrCloud (if I am reading this ticket 
>> correctly https://issues.apache.org/jira/browse/NUTCH-1662). Anyway, 
>> I would recommend Nutch with standard http client.
>>
>> Regards,
>> Emir
>>
>> On 16.02.2016 16:02, Victor D'agostino wrote:
>>> Hi
>>>
>>> I am building a Solr 5 architecture with 3 Solr nodes and 1 zookeeper.
>>> The database backend is postgresql 9 on RHEL 6.
>>>
>>> I am looking for a free open-source crawler which use SolrJ.
>>>
>>> What do you guys recommend ?
>>>
>>> Best regards
>>> Victor d'Agostino
>>>
>>>
>>> 
>>> 
>> --
>> Monitoring * Alerting * Anomaly Detection * Centralized Log 
>> Management Solr & Elasticsearch Support * http://sematext.com/
>>
>>

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & 
Elasticsearch Support * http://sematext.com/



Re: Which open-source crawler to use with SolrJ and Postgresql ?

2016-02-16 Thread Emir Arnautovic

Markus,
Ticket I run into is for Nutch2 and NUTCH-2197 is for Nutch1.

Haven't been using Nutch for a while so cannot recommend version.

Thanks,
Emir

On 16.02.2016 16:37, Markus Jelsma wrote:

Nutch has Solr 5 cloud support in trunk, i committed it earlier this month.
https://issues.apache.org/jira/browse/NUTCH-2197

Markus
  
-Original message-

From:Emir Arnautovic 
Sent: Tuesday 16th February 2016 16:26
To: solr-user@lucene.apache.org
Subject: Re: Which open-source crawler to use with SolrJ and Postgresql ?

Hi,
It is most common to use Nutch as crawler, but it seems that it still
does not have support for SolrCloud (if I am reading this ticket
correctly https://issues.apache.org/jira/browse/NUTCH-1662). Anyway, I
would recommend Nutch with standard http client.

Regards,
Emir

On 16.02.2016 16:02, Victor D'agostino wrote:

Hi

I am building a Solr 5 architecture with 3 Solr nodes and 1 zookeeper.
The database backend is postgresql 9 on RHEL 6.

I am looking for a free open-source crawler which use SolrJ.

What do you guys recommend ?

Best regards
Victor d'Agostino





--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/




--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Which open-source crawler to use with SolrJ and Postgresql ?

2016-02-16 Thread Victor D'agostino

Hi,

Thanks for your help.
Nutch is exactly what I'm looking for, and I'm feeling lucky that the SolrCloud 
support has just been committed!


I'll try the trunk version and wait until the 1.12 version is released.

Regards
Victor


Nutch has Solr 5 cloud support in trunk, i committed it earlier this month.
https://issues.apache.org/jira/browse/NUTCH-2197

Markus
  
-Original message-

From:Emir Arnautovic 
Sent: Tuesday 16th February 2016 16:26
To: solr-user@lucene.apache.org
Subject: Re: Which open-source crawler to use with SolrJ and Postgresql ?

Hi,
It is most common to use Nutch as crawler, but it seems that it still
does not have support for SolrCloud (if I am reading this ticket
correctly https://issues.apache.org/jira/browse/NUTCH-1662). Anyway, I
would recommend Nutch with standard http client.

Regards,
Emir

On 16.02.2016 16:02, Victor D'agostino wrote:

Hi

I am building a Solr 5 architecture with 3 Solr nodes and 1 zookeeper.
The database backend is postgresql 9 on RHEL 6.

I am looking for a free open-source crawler which use SolrJ.

What do you guys recommend ?

Best regards
Victor d'Agostino















RE: Which open-source crawler to use with SolrJ and Postgresql ?

2016-02-16 Thread Markus Jelsma
Nutch has Solr 5 cloud support in trunk, i committed it earlier this month.
https://issues.apache.org/jira/browse/NUTCH-2197

Markus 
 
-Original message-
> From:Emir Arnautovic 
> Sent: Tuesday 16th February 2016 16:26
> To: solr-user@lucene.apache.org
> Subject: Re: Which open-source crawler to use with SolrJ and Postgresql ?
> 
> Hi,
> It is most common to use Nutch as crawler, but it seems that it still 
> does not have support for SolrCloud (if I am reading this ticket 
> correctly https://issues.apache.org/jira/browse/NUTCH-1662). Anyway, I 
> would recommend Nutch with standard http client.
> 
> Regards,
> Emir
> 
> On 16.02.2016 16:02, Victor D'agostino wrote:
> > Hi
> >
> > I am building a Solr 5 architecture with 3 Solr nodes and 1 zookeeper.
> > The database backend is postgresql 9 on RHEL 6.
> >
> > I am looking for a free open-source crawler which use SolrJ.
> >
> > What do you guys recommend ?
> >
> > Best regards
> > Victor d'Agostino
> >
> >
> 
> -- 
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
> 
> 


Re: Which open-source crawler to use with SolrJ and Postgresql ?

2016-02-16 Thread Emir Arnautovic

Hi,
It is most common to use Nutch as a crawler, but it seems that it still 
does not have support for SolrCloud (if I am reading this ticket 
correctly https://issues.apache.org/jira/browse/NUTCH-1662). Anyway, I 
would recommend Nutch with the standard HTTP client.


Regards,
Emir

On 16.02.2016 16:02, Victor D'agostino wrote:

Hi

I am building a Solr 5 architecture with 3 Solr nodes and 1 zookeeper.
The database backend is postgresql 9 on RHEL 6.

I am looking for a free open-source crawler which uses SolrJ.

What do you guys recommend ?

Best regards
Victor d'Agostino






--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Which open-source crawler to use with SolrJ and Postgresql ?

2016-02-16 Thread Victor D'agostino

Hi

I am building a Solr 5 architecture with 3 Solr nodes and 1 zookeeper.
The database backend is postgresql 9 on RHEL 6.

I am looking for a free open-source crawler which uses SolrJ.

What do you guys recommend ?

Best regards
Victor d'Agostino






Delay in replication between cloud servers

2016-02-16 Thread Cool Techi
We are using SolrCloud with 1 shard and a replication factor of 3. We are 
noticing that the time for data to become available across all replicas from 
the leader is very high.
The data rate is not very high; is there any way to control this? In a 
master-slave setup we could give a replication poll time.
Regards,Rohit  
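In SolrCloud, updates are forwarded to every replica and indexed locally, so the visibility delay is usually governed by the (soft) commit settings rather than by a replication interval as in master-slave. A minimal solrconfig.xml sketch; the interval values here are illustrative assumptions, not recommendations:

```xml
<!-- solrconfig.xml sketch: interval values are hypothetical -->
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit: flushes to disk without opening a new searcher -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commit: controls how quickly new documents become searchable -->
  <autoSoftCommit>
    <maxTime>5000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

A commitWithin parameter on the update request is another way to bound visibility per request.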
  

Re: join and NOT together

2016-02-16 Thread marotosg
Actually, I was wrong; this doesn't work: (-DocType:pdf)
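If the cause is the usual purely-negative-clause problem, a sketch of the workaround is to give Lucene an explicit starting set, since Solr only adds the implicit match-all for a simple top-level negation, not for a negation nested inside another clause. The field name is taken from the message above; the rest is illustrative:

```
*:* -DocType:pdf
```

For example, as a filter query: fq=(*:* -DocType:pdf)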



--
View this message in context: 
http://lucene.472066.n3.nabble.com/join-and-NOT-together-tp4257411p4257620.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: doubt about timeAllowed

2016-02-16 Thread Anatoli Matuskova
Is there any way to tell timeAllowed to affect just the query component and
not the others?
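For reference, timeAllowed is a single request parameter in milliseconds and applies to the request as a whole; when the limit is hit, the response header is flagged with partialResults=true. A sketch (host, collection, and field are placeholders):

```
http://localhost:8983/solr/collection1/select?q=field:value&timeAllowed=500
```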



--
View this message in context: 
http://lucene.472066.n3.nabble.com/doubt-about-timeAllowed-tp4257363p4257622.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR ranking

2016-02-16 Thread Alessandro Benedetti
Sorry for the misleading mail; actually, if you play with the slop factor,
this becomes easy.

A proximity search can be done with a sloppy phrase query. The closer
> together the two terms appear in the document, the higher the score will
> be. A sloppy phrase query specifies a maximum "slop", or the number of
> positions tokens need to be moved to get a match.


I assume playing with pf and ps should be your solution.

Cheers
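A sketch of the parameter combination being suggested; field names and boost values are placeholders, not recommendations:

```
defType=edismax
q=eating disorders
mm=100%              # every search term is mandatory
qf=topic_title^100 content^3
pf=topic_title^200   # boost documents where the query terms match as a phrase
ps=2                 # allow up to 2 positions of slop in the pf phrase match
```

With ps=0 the pf boost only rewards exact phrases; raising ps lets near matches such as "A C B" share the boost.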

On 16 February 2016 at 12:50, Alessandro Benedetti 
wrote:

> You can describe the pf field as an exact phrase query : ""~0 .
> But
> You can specify the slop with :
>
> The ps Parameter
>
> Default amount of slop on phrase queries built with pf, pf2 and/or pf3 fields
> (affects boosting).
>
> Just take a look at the edismax page in the wiki; it seems well described:
>
>
> https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser
>
> But if this is what you want :
>
> Query : A B
>
> Results :
>
> 1) A B
>
> 2) A C B
>
> 3) A C C B
>
> ...
>
> 4) A C C C C C C C C C B
>
>
> It's not going to be simple.
>
> On 16 February 2016 at 12:33, Binoy Dalal  wrote:
>
>> By my understanding, it will depend on whether you're explicitly running
>> the phrase query or whether you're also searching for the terms
>> individually.
>> In the first case, it will not match.
>> In the second case, it will match just as long as your field contains all
>> the terms.
>>
>> On Tue, 16 Feb 2016, 17:52 Modassar Ather  wrote:
>>
>> > In that case will a phrase with a given slop match a document having the
>> > terms of the given phrase with more than the given slop in between them
>> > when pf field and mm=100%? Per my understanding as a phrase it will not
>> > match for sure.
>> >
>> > Best,
>> > Modassar
>> >
>> >
>> > On Tue, Feb 16, 2016 at 5:26 PM, Alessandro Benedetti <
>> > abenede...@apache.org
>> > > wrote:
>> >
>> > > If I remember well , it is going to be as a phrase query ( when you
>> use
>> > the
>> > > "quotes") .
>> > > So the close proximity means a match of the phrase with 0 tolerance (
>> so
>> > > the terms must respect the position distance in the query).
>> > > If I remember well I debugged that recently.
>> > >
>> > > Cheers
>> > >
>> > > On 16 February 2016 at 11:42, Modassar Ather 
>> > > wrote:
>> > >
>> > > > Actually you can get it with the edismax.
>> > > > Just set mm to 100% and then configure a pf field ( or more) .
>> > > > You are going to search all the search terms mandatory and boost
>> > phrases
>> > > > match .
>> > > >
>> > > > @Alessandro Thanks for your insight.
>> > > > I thought that the document will be boosted if all of the terms
>> appear
>> > in
>> > > > close proximity by setting pf. Not sure how much is meant by the
>> close
>> > > > proximity. Checked it on dismax query parser wiki too.
>> > > >
>> > > > Best,
>> > > > Modassar
>> > > >
>> > > > On Tue, Feb 16, 2016 at 3:36 PM, Alessandro Benedetti <
>> > > > abenede...@apache.org
>> > > > > wrote:
>> > > >
>> > > > > Binoy, the omitTermFreqAndPositions is set only for text_ws which
>> is
>> > > used
>> > > > > only on the "indexed_terms" field.
>> > > > > The text_general fields seem fine to me.
>> > > > >
>> > > > > Are you omitting norms on purpose ? To be fair it could be
>> relevant
>> > in
>> > > > > title or short topic searches to boost up short field values,
>> > > containing
>> > > > a
>> > > > > lot of terms from the searched query.
>> > > > >
>> > > > > To respond Modassar :
>> > > > >
>> > > > > I don't think the phrase will be searched as individual ANDed
>> terms
>> > > until
>> > > > > > the query has it like below.
>> > > > > > "Eating Disorders" OR (Eating AND Disorders).
>> > > > > >
>> > > > >
>> > > > > Actually you can get it with the edismax.
>> > > > > Just set mm to 100% and then configure a pf field ( or more) .
>> > > > > You are going to search all the search terms mandatory and boost
>> > > phrases
>> > > > > match .
>> > > > >
>> > > > > Cheers
>> > > > >
>> > > > > On 16 February 2016 at 07:57, Emir Arnautovic <
>> > > > > emir.arnauto...@sematext.com>
>> > > > > wrote:
>> > > > >
>> > > > > > Hi Nitin,
>> > > > > > You can use pf parameter to boost results with exact phrase. You
>> > can
>> > > > also
>> > > > > > use pf2 and pf3 to boost results with bigrams (phrase matches
>> with
>> > 2
>> > > > or 3
>> > > > > > words in case input is with more than 3 words)
>> > > > > >
>> > > > > > Regards,
>> > > > > > Emir
>> > > > > >
>> > > > > >
>> > > > > > On 16.02.2016 06:18, Nitin.K wrote:
>> > > > > >
>> > > > > >> I am using edismax parser with the following query:
>> > > > > >>
>> > > > > >>
>> > > > > >>
>> > > > >
>> > > >
>> > >
>> >
>> localhost:8983/solr/tgl/select?q=eating%20disorders&wt=xml&tie=1.0&rows=200&q.op=AND&indent=true&defType=edismax&stopwords=true&lowercaseOperators=true&debugQuery=true&qf=topic_title%5E100+subtopic_title%5E40+index_term%5E20+drug%5E15+content%5E3&pf2=topTitle%5E200+subTopTitle%5E80+indTerm%5E40+drugString%5E30+

Re: SOLR ranking

2016-02-16 Thread Alessandro Benedetti
You can describe the pf field as an exact phrase query : ""~0 .
But
You can specify the slop with :

The ps Parameter

Default amount of slop on phrase queries built with pf, pf2 and/or pf3 fields
(affects boosting).

Just take a look at the edismax page in the wiki; it seems well described:

https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser

But if this is what you want :

Query : A B

Results :

1) A B

2) A C B

3) A C C B

...

4) A C C C C C C C C C B


It's not going to be simple.

On 16 February 2016 at 12:33, Binoy Dalal  wrote:

> By my understanding, it will depend on whether you're explicitly running
> the phrase query or whether you're also searching for the terms
> individually.
> In the first case, it will not match.
> In the second case, it will match just as long as your field contains all
> the terms.
>
> On Tue, 16 Feb 2016, 17:52 Modassar Ather  wrote:
>
> > In that case will a phrase with a given slop match a document having the
> > terms of the given phrase with more than the given slop in between them
> > when pf field and mm=100%? Per my understanding as a phrase it will not
> > match for sure.
> >
> > Best,
> > Modassar
> >
> >
> > On Tue, Feb 16, 2016 at 5:26 PM, Alessandro Benedetti <
> > abenede...@apache.org
> > > wrote:
> >
> > > If I remember well , it is going to be as a phrase query ( when you use
> > the
> > > "quotes") .
> > > So the close proximity means a match of the phrase with 0 tolerance (
> so
> > > the terms must respect the position distance in the query).
> > > If I remember well I debugged that recently.
> > >
> > > Cheers
> > >
> > > On 16 February 2016 at 11:42, Modassar Ather 
> > > wrote:
> > >
> > > > Actually you can get it with the edismax.
> > > > Just set mm to 100% and then configure a pf field ( or more) .
> > > > You are going to search all the search terms mandatory and boost
> > phrases
> > > > match .
> > > >
> > > > @Alessandro Thanks for your insight.
> > > > I thought that the document will be boosted if all of the terms
> appear
> > in
> > > > close proximity by setting pf. Not sure how much is meant by the
> close
> > > > proximity. Checked it on dismax query parser wiki too.
> > > >
> > > > Best,
> > > > Modassar
> > > >
> > > > On Tue, Feb 16, 2016 at 3:36 PM, Alessandro Benedetti <
> > > > abenede...@apache.org
> > > > > wrote:
> > > >
> > > > > Binoy, the omitTermFreqAndPositions is set only for text_ws which
> is
> > > used
> > > > > only on the "indexed_terms" field.
> > > > > The text_general fields seem fine to me.
> > > > >
> > > > > Are you omitting norms on purpose ? To be fair it could be relevant
> > in
> > > > > title or short topic searches to boost up short field values,
> > > containing
> > > > a
> > > > > lot of terms from the searched query.
> > > > >
> > > > > To respond Modassar :
> > > > >
> > > > > I don't think the phrase will be searched as individual ANDed terms
> > > until
> > > > > > the query has it like below.
> > > > > > "Eating Disorders" OR (Eating AND Disorders).
> > > > > >
> > > > >
> > > > > Actually you can get it with the edismax.
> > > > > Just set mm to 100% and then configure a pf field ( or more) .
> > > > > You are going to search all the search terms mandatory and boost
> > > phrases
> > > > > match .
> > > > >
> > > > > Cheers
> > > > >
> > > > > On 16 February 2016 at 07:57, Emir Arnautovic <
> > > > > emir.arnauto...@sematext.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Nitin,
> > > > > > You can use pf parameter to boost results with exact phrase. You
> > can
> > > > also
> > > > > > use pf2 and pf3 to boost results with bigrams (phrase matches
> with
> > 2
> > > > or 3
> > > > > > words in case input is with more than 3 words)
> > > > > >
> > > > > > Regards,
> > > > > > Emir
> > > > > >
> > > > > >
> > > > > > On 16.02.2016 06:18, Nitin.K wrote:
> > > > > >
> > > > > >> I am using edismax parser with the following query:
> > > > > >>
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> localhost:8983/solr/tgl/select?q=eating%20disorders&wt=xml&tie=1.0&rows=200&q.op=AND&indent=true&defType=edismax&stopwords=true&lowercaseOperators=true&debugQuery=true&qf=topic_title%5E100+subtopic_title%5E40+index_term%5E20+drug%5E15+content%5E3&pf2=topTitle%5E200+subTopTitle%5E80+indTerm%5E40+drugString%5E30+content%5E6
> > > > > >>
> > > > > >> Configuration of schema.xml
> > > > > >>
> > > > > >>  > > > > stored="true"
> > > > > >> />
> > > > > >>  > stored="false"/>
> > > > > >>
> > > > > >>  > > > > >> stored="true"/>
> > > > > >>  > > > stored="false"/>
> > > > > >>
> > > > > >>  > stored="true"
> > > > > >> multiValued="true"/>
> > > > > >>  stored="false"
> > > > > >> multiValued="true"/>
> > > > > >>
> > > > > >>  > > > > >> multiValued="true"/>
> > > > > >>  > stored="false"
> > > > > >> multiValued="true"/>
> > > > > >>
> > > > > >>  > > > stored="true"/>
> > > > > >>
> > > > > >> 
> > > > > >> 
> > > > > >> 
> > > > > >> 
> > > > > >>
> > > > > >>  > > > > >> p

Re: SOLR ranking

2016-02-16 Thread Binoy Dalal
By my understanding, it will depend on whether you're explicitly running
the phrase query or whether you're also searching for the terms
individually.
In the first case, it will not match.
In the second case, it will match just as long as your field contains all
the terms.

On Tue, 16 Feb 2016, 17:52 Modassar Ather  wrote:

> In that case will a phrase with a given slop match a document having the
> terms of the given phrase with more than the given slop in between them
> when pf field and mm=100%? Per my understanding as a phrase it will not
> match for sure.
>
> Best,
> Modassar
>
>
> On Tue, Feb 16, 2016 at 5:26 PM, Alessandro Benedetti <
> abenede...@apache.org
> > wrote:
>
> > If I remember well , it is going to be as a phrase query ( when you use
> the
> > "quotes") .
> > So the close proximity means a match of the phrase with 0 tolerance ( so
> > the terms must respect the position distance in the query).
> > If I remember well I debugged that recently.
> >
> > Cheers
> >
> > On 16 February 2016 at 11:42, Modassar Ather 
> > wrote:
> >
> > > Actually you can get it with the edismax.
> > > Just set mm to 100% and then configure a pf field ( or more) .
> > > You are going to search all the search terms mandatory and boost
> phrases
> > > match .
> > >
> > > @Alessandro Thanks for your insight.
> > > I thought that the document will be boosted if all of the terms appear
> in
> > > close proximity by setting pf. Not sure how much is meant by the close
> > > proximity. Checked it on dismax query parser wiki too.
> > >
> > > Best,
> > > Modassar
> > >
> > > On Tue, Feb 16, 2016 at 3:36 PM, Alessandro Benedetti <
> > > abenede...@apache.org
> > > > wrote:
> > >
> > > > Binoy, the omitTermFreqAndPositions is set only for text_ws which is
> > used
> > > > only on the "indexed_terms" field.
> > > > The text_general fields seem fine to me.
> > > >
> > > > Are you omitting norms on purpose ? To be fair it could be relevant
> in
> > > > title or short topic searches to boost up short field values,
> > containing
> > > a
> > > > lot of terms from the searched query.
> > > >
> > > > To respond Modassar :
> > > >
> > > > I don't think the phrase will be searched as individual ANDed terms
> > until
> > > > > the query has it like below.
> > > > > "Eating Disorders" OR (Eating AND Disorders).
> > > > >
> > > >
> > > > Actually you can get it with the edismax.
> > > > Just set mm to 100% and then configure a pf field ( or more) .
> > > > You are going to search all the search terms mandatory and boost
> > phrases
> > > > match .
> > > >
> > > > Cheers
> > > >
> > > > On 16 February 2016 at 07:57, Emir Arnautovic <
> > > > emir.arnauto...@sematext.com>
> > > > wrote:
> > > >
> > > > > Hi Nitin,
> > > > > You can use pf parameter to boost results with exact phrase. You
> can
> > > also
> > > > > use pf2 and pf3 to boost results with bigrams (phrase matches with
> 2
> > > or 3
> > > > > words in case input is with more than 3 words)
> > > > >
> > > > > Regards,
> > > > > Emir
> > > > >
> > > > >
> > > > > On 16.02.2016 06:18, Nitin.K wrote:
> > > > >
> > > > >> I am using edismax parser with the following query:
> > > > >>
> > > > >>
> > > > >>
> > > >
> > >
> >
> localhost:8983/solr/tgl/select?q=eating%20disorders&wt=xml&tie=1.0&rows=200&q.op=AND&indent=true&defType=edismax&stopwords=true&lowercaseOperators=true&debugQuery=true&qf=topic_title%5E100+subtopic_title%5E40+index_term%5E20+drug%5E15+content%5E3&pf2=topTitle%5E200+subTopTitle%5E80+indTerm%5E40+drugString%5E30+content%5E6
> > > > >>
> > > > >> Configuration of schema.xml
> > > > >>
> > > > >>  > > > stored="true"
> > > > >> />
> > > > >>  stored="false"/>
> > > > >>
> > > > >>  > > > >> stored="true"/>
> > > > >>  > > stored="false"/>
> > > > >>
> > > > >>  stored="true"
> > > > >> multiValued="true"/>
> > > > >>  > > > >> multiValued="true"/>
> > > > >>
> > > > >>  > > > >> multiValued="true"/>
> > > > >>  stored="false"
> > > > >> multiValued="true"/>
> > > > >>
> > > > >>  > > stored="true"/>
> > > > >>
> > > > >> 
> > > > >> 
> > > > >> 
> > > > >> 
> > > > >>
> > > > >>  > > > >> positionIncrementGap="100" omitNorms="true">
> > > > >> 
> > > > >>  > > > class="solr.StandardTokenizerFactory"/>
> > > > >>  > > > >> ignoreCase="true"
> > > > >> words="stopwords.txt" />
> > > > >>  > class="solr.LowerCaseFilterFactory"/>
> > > > >> 
> > > > >> 
> > > > >>  > > > class="solr.StandardTokenizerFactory"/>
> > > > >>  > > > >> ignoreCase="true"
> > > > >> words="stopwords.txt" />
> > > > >>  > > > >> synonyms="synonyms.txt"
> > > > >> ignoreCase="true" expand="true"/>
> > > > >>  > class="solr.LowerCaseFilterFactory"/>
> > > > >> 
> > > > >> 
> > > > >>  > > > >> positionIncrementGap="100"
> > > > >> omitTermFreqAndPositions="true" omitNorms="true">
> > > > >> 

Re: SOLR ranking

2016-02-16 Thread Modassar Ather
In that case will a phrase with a given slop match a document having the
terms of the given phrase with more than the given slop in between them
when pf field and mm=100%? Per my understanding as a phrase it will not
match for sure.

Best,
Modassar


On Tue, Feb 16, 2016 at 5:26 PM, Alessandro Benedetti  wrote:

> If I remember well , it is going to be as a phrase query ( when you use the
> "quotes") .
> So the close proximity means a match of the phrase with 0 tolerance ( so
> the terms must respect the position distance in the query).
> If I remember well I debugged that recently.
>
> Cheers
>
> On 16 February 2016 at 11:42, Modassar Ather 
> wrote:
>
> > Actually you can get it with the edismax.
> > Just set mm to 100% and then configure a pf field ( or more) .
> > You are going to search all the search terms mandatory and boost phrases
> > match .
> >
> > @Alessandro Thanks for your insight.
> > I thought that the document will be boosted if all of the terms appear in
> > close proximity by setting pf. Not sure how much is meant by the close
> > proximity. Checked it on dismax query parser wiki too.
> >
> > Best,
> > Modassar
> >
> > On Tue, Feb 16, 2016 at 3:36 PM, Alessandro Benedetti <
> > abenede...@apache.org
> > > wrote:
> >
> > > Binoy, the omitTermFreqAndPositions is set only for text_ws which is
> used
> > > only on the "indexed_terms" field.
> > > The text_general fields seem fine to me.
> > >
> > > Are you omitting norms on purpose ? To be fair it could be relevant in
> > > title or short topic searches to boost up short field values,
> containing
> > a
> > > lot of terms from the searched query.
> > >
> > > To respond Modassar :
> > >
> > > I don't think the phrase will be searched as individual ANDed terms
> until
> > > > the query has it like below.
> > > > "Eating Disorders" OR (Eating AND Disorders).
> > > >
> > >
> > > Actually you can get it with the edismax.
> > > Just set mm to 100% and then configure a pf field ( or more) .
> > > You are going to search all the search terms mandatory and boost
> phrases
> > > match .
> > >
> > > Cheers
> > >
> > > On 16 February 2016 at 07:57, Emir Arnautovic <
> > > emir.arnauto...@sematext.com>
> > > wrote:
> > >
> > > > Hi Nitin,
> > > > You can use pf parameter to boost results with exact phrase. You can
> > also
> > > > use pf2 and pf3 to boost results with bigrams (phrase matches with 2
> > or 3
> > > > words in case input is with more than 3 words)
> > > >
> > > > Regards,
> > > > Emir
> > > >
> > > >
> > > > On 16.02.2016 06:18, Nitin.K wrote:
> > > >
> > > >> I am using edismax parser with the following query:
> > > >>
> > > >>
> > > >>
> > >
> >
> localhost:8983/solr/tgl/select?q=eating%20disorders&wt=xml&tie=1.0&rows=200&q.op=AND&indent=true&defType=edismax&stopwords=true&lowercaseOperators=true&debugQuery=true&qf=topic_title%5E100+subtopic_title%5E40+index_term%5E20+drug%5E15+content%5E3&pf2=topTitle%5E200+subTopTitle%5E80+indTerm%5E40+drugString%5E30+content%5E6
> > > >>
> > > >> Configuration of schema.xml
> > > >>
> > > >>  > > stored="true"
> > > >> />
> > > >> 
> > > >>
> > > >>  > > >> stored="true"/>
> > > >>  > stored="false"/>
> > > >>
> > > >>  > > >> multiValued="true"/>
> > > >>  > > >> multiValued="true"/>
> > > >>
> > > >>  > > >> multiValued="true"/>
> > > >>  > > >> multiValued="true"/>
> > > >>
> > > >>  > stored="true"/>
> > > >>
> > > >> 
> > > >> 
> > > >> 
> > > >> 
> > > >>
> > > >>  > > >> positionIncrementGap="100" omitNorms="true">
> > > >> 
> > > >>  > > class="solr.StandardTokenizerFactory"/>
> > > >>  > > >> ignoreCase="true"
> > > >> words="stopwords.txt" />
> > > >>  class="solr.LowerCaseFilterFactory"/>
> > > >> 
> > > >> 
> > > >>  > > class="solr.StandardTokenizerFactory"/>
> > > >>  > > >> ignoreCase="true"
> > > >> words="stopwords.txt" />
> > > >>  > > >> synonyms="synonyms.txt"
> > > >> ignoreCase="true" expand="true"/>
> > > >>  class="solr.LowerCaseFilterFactory"/>
> > > >> 
> > > >> 
> > > >>  > > >> positionIncrementGap="100"
> > > >> omitTermFreqAndPositions="true" omitNorms="true">
> > > >> 
> > > >>  > > >> class="solr.WhitespaceTokenizerFactory"/>
> > > >>  > > >> ignoreCase="true"
> > > >> words="stopwords.txt" />
> > > >>  class="solr.LowerCaseFilterFactory"/>
> > > >> 
> > > >> 
> > > >>
> > > >>
> > > >> I want that if a user searches for a phrase, then that phrase should
> > > >> always take priority in comparison to the individual words;
> > > >>
> > > >> Example: "Eating Disorders"
> > > >>
> > > >> First it will search for "Eating Disorders" together and then the
> > > >> individual
> > > >> words "Eating" and "Disorders"
> > > >> but while searching for individual words, it will always return
> 

Re: Data Import Handler Usage

2016-02-16 Thread vidya
Hi

The Dataimport section in the web UI still shows me that no data import
handler is defined, and no data is being added to my new collection.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Data-Import-Handler-Usage-tp4257518p4257576.html
Sent from the Solr - User mailing list archive at Nabble.com.
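The Dataimport tab only lists a handler once one is registered in solrconfig.xml and the DIH jars are on the classpath. A minimal sketch, assuming the standard distribution layout and a data-config.xml placed next to solrconfig.xml (the lib path and filename are assumptions):

```xml
<!-- solrconfig.xml sketch: lib path and config filename are assumptions -->
<lib dir="${solr.install.dir:../../../..}/dist/"
     regex="solr-dataimporthandler-.*\.jar" />

<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
```

After reloading the core (or restarting Solr), the tab should pick up the handler.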


Re: SOLR ranking

2016-02-16 Thread Alessandro Benedetti
If I remember correctly, it is going to be treated as a phrase query (when you
use the "quotes").
So close proximity means a match of the phrase with 0 tolerance (i.e.,
the terms must respect the position distances in the query).
If I remember correctly, I debugged that recently.

Cheers

On 16 February 2016 at 11:42, Modassar Ather  wrote:

> Actually you can get it with the edismax.
> Just set mm to 100% and then configure a pf field ( or more) .
> You are going to search all the search terms mandatory and boost phrases
> match .
>
> @Alessandro Thanks for your insight.
> I thought that the document will be boosted if all of the terms appear in
> close proximity by setting pf. Not sure how much is meant by the close
> proximity. Checked it on dismax query parser wiki too.
>
> Best,
> Modassar
>
> On Tue, Feb 16, 2016 at 3:36 PM, Alessandro Benedetti <
> abenede...@apache.org
> > wrote:
>
> > Binoy, the omitTermFreqAndPositions is set only for text_ws which is used
> > only on the "indexed_terms" field.
> > The text_general fields seem fine to me.
> >
> > Are you omitting norms on purpose ? To be fair it could be relevant in
> > title or short topic searches to boost up short field values, containing
> a
> > lot of terms from the searched query.
> >
> > To respond Modassar :
> >
> > I don't think the phrase will be searched as individual ANDed terms until
> > > the query has it like below.
> > > "Eating Disorders" OR (Eating AND Disorders).
> > >
> >
> > Actually you can get it with the edismax.
> > Just set mm to 100% and then configure a pf field ( or more) .
> > You are going to search all the search terms mandatory and boost phrases
> > match .
> >
> > Cheers
> >
> > On 16 February 2016 at 07:57, Emir Arnautovic <
> > emir.arnauto...@sematext.com>
> > wrote:
> >
> > > Hi Nitin,
> > > You can use pf parameter to boost results with exact phrase. You can
> also
> > > use pf2 and pf3 to boost results with bigrams (phrase matches with 2
> or 3
> > > words in case input is with more than 3 words)
> > >
> > > Regards,
> > > Emir
> > >
> > >
> > > On 16.02.2016 06:18, Nitin.K wrote:
> > >
> > >> I am using edismax parser with the following query:
> > >>
> > >>
> > >>
> >
> localhost:8983/solr/tgl/select?q=eating%20disorders&wt=xml&tie=1.0&rows=200&q.op=AND&indent=true&defType=edismax&stopwords=true&lowercaseOperators=true&debugQuery=true&qf=topic_title%5E100+subtopic_title%5E40+index_term%5E20+drug%5E15+content%5E3&pf2=topTitle%5E200+subTopTitle%5E80+indTerm%5E40+drugString%5E30+content%5E6
> > >>
> > >> Configuration of schema.xml
> > >>
> > >>  > stored="true"
> > >> />
> > >> 
> > >>
> > >>  > >> stored="true"/>
> > >>  stored="false"/>
> > >>
> > >>  > >> multiValued="true"/>
> > >>  > >> multiValued="true"/>
> > >>
> > >>  > >> multiValued="true"/>
> > >>  > >> multiValued="true"/>
> > >>
> > >>  stored="true"/>
> > >>
> > >> 
> > >> 
> > >> 
> > >> 
> > >>
> > >>  > >> positionIncrementGap="100" omitNorms="true">
> > >> 
> > >>  > class="solr.StandardTokenizerFactory"/>
> > >>  > >> ignoreCase="true"
> > >> words="stopwords.txt" />
> > >> 
> > >> 
> > >> 
> > >>  > class="solr.StandardTokenizerFactory"/>
> > >>  > >> ignoreCase="true"
> > >> words="stopwords.txt" />
> > >>  > >> synonyms="synonyms.txt"
> > >> ignoreCase="true" expand="true"/>
> > >> 
> > >> 
> > >> 
> > >>  > >> positionIncrementGap="100"
> > >> omitTermFreqAndPositions="true" omitNorms="true">
> > >> 
> > >>  > >> class="solr.WhitespaceTokenizerFactory"/>
> > >>  > >> ignoreCase="true"
> > >> words="stopwords.txt" />
> > >> 
> > >> 
> > >> 
> > >>
> > >>
> > >> I want that if a user searches for a phrase, then that phrase should
> > >> always take priority in comparison to the individual words;
> > >>
> > >> Example: "Eating Disorders"
> > >>
> > >> First it will search for "Eating Disorders" together and then the
> > >> individual
> > >> words "Eating" and "Disorders"
> > >> but while searching for individual words, it will always return those
> > >> documents where both the words should exist for which i am already
> using
> > >> q.op="AND" in my query.
> > >>
> > >> Thanks,
> > >> Nitin
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >> View this message in context:
> > >>
> http://lucene.472066.n3.nabble.com/SOLR-ranking-tp4257367p4257510.html
> > >> Sent from the Solr - User mailing list archive at Nabble.com.
> > >>
> > >
> > > --
> > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > > Solr & Elasticsearch Support * http://sematext.com/
> > >
> > >
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of t

Re: SOLR ranking

2016-02-16 Thread Modassar Ather
Actually you can get it with the edismax.
Just set mm to 100% and then configure a pf field ( or more) .
You are going to search all the search terms mandatory and boost phrases
match .

@Alessandro Thanks for your insight.
I thought that the document will be boosted if all of the terms appear in
close proximity by setting pf. Not sure how much is meant by the close
proximity. Checked it on dismax query parser wiki too.

Best,
Modassar

On Tue, Feb 16, 2016 at 3:36 PM, Alessandro Benedetti  wrote:

> Binoy, the omitTermFreqAndPositions is set only for text_ws which is used
> only on the "indexed_terms" field.
> The text_general fields seem fine to me.
>
> Are you omitting norms on purpose ? To be fair it could be relevant in
> title or short topic searches to boost up short field values, containing a
> lot of terms from the searched query.
>
> To respond Modassar :
>
> I don't think the phrase will be searched as individual ANDed terms until
> > the query has it like below.
> > "Eating Disorders" OR (Eating AND Disorders).
> >
>
> Actually you can get it with the edismax.
> Just set mm to 100% and then configure a pf field ( or more) .
> You are going to search all the search terms mandatory and boost phrases
> match .
>
> Cheers
>
> On 16 February 2016 at 07:57, Emir Arnautovic <
> emir.arnauto...@sematext.com>
> wrote:
>
> > Hi Nitin,
> > You can use pf parameter to boost results with exact phrase. You can also
> > use pf2 and pf3 to boost results with bigrams (phrase matches with 2 or 3
> > words in case input is with more than 3 words)
> >
> > Regards,
> > Emir
> >
> >
> > On 16.02.2016 06:18, Nitin.K wrote:
> >
> >> I am using edismax parser with the following query:
> >>
> >>
> >>
> localhost:8983/solr/tgl/select?q=eating%20disorders&wt=xml&tie=1.0&rows=200&q.op=AND&indent=true&defType=edismax&stopwords=true&lowercaseOperators=true&debugQuery=true&qf=topic_title%5E100+subtopic_title%5E40+index_term%5E20+drug%5E15+content%5E3&pf2=topTitle%5E200+subTopTitle%5E80+indTerm%5E40+drugString%5E30+content%5E6
> >>
> >> Configuration of schema.xml
> >>
> >>  stored="true"
> >> />
> >> 
> >>
> >>  >> stored="true"/>
> >> 
> >>
> >>  >> multiValued="true"/>
> >>  >> multiValued="true"/>
> >>
> >>  >> multiValued="true"/>
> >>  >> multiValued="true"/>
> >>
> >> 
> >>
> >> 
> >> 
> >> 
> >> 
> >>
> >>  >> positionIncrementGap="100" omitNorms="true">
> >> 
> >>  class="solr.StandardTokenizerFactory"/>
> >>  >> ignoreCase="true"
> >> words="stopwords.txt" />
> >> 
> >> 
> >> 
> >>  class="solr.StandardTokenizerFactory"/>
> >>  >> ignoreCase="true"
> >> words="stopwords.txt" />
> >>  >> synonyms="synonyms.txt"
> >> ignoreCase="true" expand="true"/>
> >> 
> >> 
> >> 
> >>  >> positionIncrementGap="100"
> >> omitTermFreqAndPositions="true" omitNorms="true">
> >> 
> >>  >> class="solr.WhitespaceTokenizerFactory"/>
> >>  >> ignoreCase="true"
> >> words="stopwords.txt" />
> >> 
> >> 
> >> 
> >>
> >>
> >> I want that, if a user searches for a phrase, the phrase always takes
> >> priority over the individual words;
> >>
> >> Example: "Eating Disorders"
> >>
> >> First it should search for "Eating Disorders" together, and then for
> >> the individual words "Eating" and "Disorders";
> >> but when searching for the individual words, it should only return
> >> documents where both words exist, for which I am already using
> >> q.op="AND" in my query.
> >>
> >> Thanks,
> >> Nitin
> >>
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >> http://lucene.472066.n3.nabble.com/SOLR-ranking-tp4257367p4257510.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >
> > --
> > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > Solr & Elasticsearch Support * http://sematext.com/
> >
> >
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


Re: SOLR ranking

2016-02-16 Thread Emir Arnautovic

Hi Nitin,
Not sure if you changed what fields you use for phrase boost, but in 
example you sent, all fields except content are "string" fields and 
content is boosted with 6 while topic_title in qf is boosted with 100. 
Try setting same field you use in qf in pf2 and you should see the 
difference. After that you can play with field analysis and which field 
to use just for boosting.


Regards,
Emir

On 16.02.2016 11:30, Nitin.K wrote:

Hi Emir,

I tried using the boost parameters for phrase search by removing
omitTermFreqAndPositions from the multivalued field type, but somehow, when
searching phrases, the documents that have an exact match are not coming up
first. Instead, in the content field, it is considering the combined count
of both terms and deciding the order based on that.

Kindly let me know how I can first search the phrase and then fall back to
the individual words (i.e. word-1 AND word-2).



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-ranking-tp4257367p4257556.html
Sent from the Solr - User mailing list archive at Nabble.com.


--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
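
To make Emir's point concrete: Nitin's request boosts string copies
(topTitle, subTopTitle, ...) in pf2, while qf uses the analyzed text fields.
Reusing the qf fields in pf2 would look roughly like this (an illustrative
request, not tested against this schema):

```
localhost:8983/solr/tgl/select?q=eating disorders&defType=edismax&q.op=AND
    &qf=topic_title^100 subtopic_title^40 index_term^20 drug^15 content^3
    &pf2=topic_title^200 subtopic_title^80 index_term^40 drug^30 content^6
```

Since phrase boosting needs token positions, the pf2 fields must be analyzed
(tokenized) fields, not string copies.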



Re: SOLR ranking

2016-02-16 Thread Alessandro Benedetti
Nitin, have you read my reply?

kindly let me know, how can i first search the phrase and then go to the
> individual words (i.e word-1 AND word-2)
>

On 16 February 2016 at 10:45, Binoy Dalal  wrote:

> Based on a quick look at the documentation, I think that you should use
> termPositions=true to achieve what you want.
>
> On Tue, 16 Feb 2016, 16:08 Nitin.K  wrote:
>
> > Hi Emir,
> >
> > I tried using the boost parameters for phrase search by removing
> > omitTermFreqAndPositions from the multivalued field type, but somehow,
> > when searching phrases, the documents that have an exact match are not
> > coming up first. Instead, in the content field, it is considering the
> > combined count of both terms and deciding the order based on that.
> >
> > Kindly let me know how I can first search the phrase and then fall back
> > to the individual words (i.e. word-1 AND word-2)
> >
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/SOLR-ranking-tp4257367p4257556.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> --
> Regards,
> Binoy Dalal
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Need to move on SOlr cloud (help required)

2016-02-16 Thread Paul Borgermans
On 16 February 2016 at 06:09, Midas A  wrote:

> Susheel,
>
> Is there any client available in PHP for SolrCloud which maintains the
> same (cluster state)?
>
>
No, there is none. I recommend HAProxy for non-SolrJ clients and for
load-balancing SolrCloud.
HAProxy also makes it easy to do rolling updates of your SolrCloud nodes.

Hth
Paul


>
> On Tue, Feb 16, 2016 at 7:31 AM, Susheel Kumar 
> wrote:
>
> > In SolrJ, you would use CloudSolrClient which interacts with Zookeeper
> > (which maintains Cluster State). See CloudSolrClient API. So that's how
> > SolrJ would know which node is down or not.
>


Re: SOLR ranking

2016-02-16 Thread Binoy Dalal
Based on a quick look at the documentation, I think that you should use
termPositions=true to achieve what you want.

On Tue, 16 Feb 2016, 16:08 Nitin.K  wrote:

> Hi Emir,
>
> I tried using the boost parameters for phrase search by removing
> omitTermFreqAndPositions from the multivalued field type, but somehow,
> when searching phrases, the documents that have an exact match are not
> coming up first. Instead, in the content field, it is considering the
> combined count of both terms and deciding the order based on that.
>
> Kindly let me know how I can first search the phrase and then fall back
> to the individual words (i.e. word-1 AND word-2)
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SOLR-ranking-tp4257367p4257556.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
Regards,
Binoy Dalal


Re: SOLR ranking

2016-02-16 Thread Nitin.K
Hi Emir,

I tried using the boost parameters for phrase search by removing
omitTermFreqAndPositions from the multivalued field type, but somehow, when
searching phrases, the documents that have an exact match are not coming up
first. Instead, in the content field, it is considering the combined count
of both terms and deciding the order based on that.

Kindly let me know how I can first search the phrase and then fall back to
the individual words (i.e. word-1 AND word-2).



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-ranking-tp4257367p4257556.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR ranking

2016-02-16 Thread Nitin.K
You are absolutely right, Binoy!

But my problem is: we don't want term frequency to be taken into account for
index_term or drug (i.e. we don't want the number of occurrences of the
search term to be considered for either of these fields).
Is it possible to omit the term frequency for these two fields while still
indexing them with term positions for phrase search?

I tried using omitTermFreqAndPositions="true" and omitPositions="false", but
that is not working for me.

Thanks,
Nitin




--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-ranking-tp4257367p4257551.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: join and NOT together

2016-02-16 Thread Sergio García Maroto
My debugQuery=true returns related to the NOT:

0.06755901 = (MATCH) sum of: 0.06755901 = (MATCH) MatchAllDocsQuery,
product of: 0.06755901 = queryNorm

I tried changing v='(*:* -DocType:pdf)'  to v='(-DocType:pdf)'
and it worked.

Could anyone explain the difference?

Thanks
Sergio


On 15 February 2016 at 21:12, Mikhail Khludnev 
wrote:

> Hello Sergio,
>
> What debougQuery=true output does look like?
>
> On Mon, Feb 15, 2016 at 7:10 PM, marotosg  wrote:
>
> > Hi,
> >
> > I am trying to solve an issue when doing a search joining two collections
> > and negating the cross core query.
> >
> > Let's say I have one collection person and another collection documents
> and
> > I can join them using local param !join because I have PersonIDS in
> > document
> > collection.
> >
> > if my query is like below. Query executed against Person Core. I want to
> > retrieve people with name Peter and not documents attached of type pdf.
> >
> > q=PersonName:peter AND {!type=join from=DocPersonID to=PersonID
> > fromIndex=document v='(*:* -DocType:pdf)' }
> >
> > If I have for person 1 called peter two documents one of type:pdf and
> other
> > one of type:word.
> > Then this person will come back.
> >
> > Is there any way of excluding that person if any of the docs fulfill the
> > NOT.
> >
> > Thanks
> > Sergio
> >
> >
> >
> >
> >
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/join-and-NOT-together-tp4257411.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>
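
On Sergio's underlying goal — excluding a person when *any* of their
documents is a pdf — one commonly used pattern is to negate the join itself
rather than negating inside the fromIndex query. A sketch, untested against
this schema (field names taken from Sergio's mail, using the `_query_`
magic field to nest the join):

```
q=PersonName:peter AND -_query_:"{!join from=DocPersonID to=PersonID
    fromIndex=document}DocType:pdf"
```

Here the join finds persons that have at least one pdf document, and the
outer negation removes them; with the join inside the negation, a person
with one pdf and one word document is excluded, which is the behaviour
Sergio asked for.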


Re: SOLR ranking

2016-02-16 Thread Binoy Dalal
@Nitin
Why are you phrase boosting on string fields?
More often than not, it won't do anything because the phrases simply won't
match the entire string.

On Tue, 16 Feb 2016, 15:36 Alessandro Benedetti 
wrote:

> Binoy, omitTermFreqAndPositions is set only for text_ws, which is used
> only on the "indexed_terms" field.
> The text_general fields seem fine to me.
>
> Are you omitting norms on purpose? Norms can be relevant in title or
> short-topic searches, boosting short field values that contain many of
> the searched terms.
>
> To respond Modassar :
>
> I don't think the phrase will be searched as individual ANDed terms until
> > the query has it like below.
> > "Eating Disorders" OR (Eating AND Disorders).
> >
>
> Actually you can get it with edismax.
> Just set mm to 100% and then configure a pf field (or more).
> All search terms become mandatory, and phrase matches are boosted.
>
> Cheers
>
> On 16 February 2016 at 07:57, Emir Arnautovic <
> emir.arnauto...@sematext.com>
> wrote:
>
> > Hi Nitin,
> > You can use pf parameter to boost results with exact phrase. You can also
> > use pf2 and pf3 to boost results with bigrams (phrase matches with 2 or 3
> > words in case input is with more than 3 words)
> >
> > Regards,
> > Emir
> >
> >
> > On 16.02.2016 06:18, Nitin.K wrote:
> >
> >> I am using edismax parser with the following query:
> >>
> >>
> >>
> localhost:8983/solr/tgl/select?q=eating%20disorders&wt=xml&tie=1.0&rows=200&q.op=AND&indent=true&defType=edismax&stopwords=true&lowercaseOperators=true&debugQuery=true&qf=topic_title%5E100+subtopic_title%5E40+index_term%5E20+drug%5E15+content%5E3&pf2=topTitle%5E200+subTopTitle%5E80+indTerm%5E40+drugString%5E30+content%5E6
> >>
> >> Configuration of schema.xml
> >>
> >>  stored="true"
> >> />
> >> 
> >>
> >>  >> stored="true"/>
> >> 
> >>
> >>  >> multiValued="true"/>
> >>  >> multiValued="true"/>
> >>
> >>  >> multiValued="true"/>
> >>  >> multiValued="true"/>
> >>
> >> 
> >>
> >> 
> >> 
> >> 
> >> 
> >>
> >>  >> positionIncrementGap="100" omitNorms="true">
> >> 
> >>  class="solr.StandardTokenizerFactory"/>
> >>  >> ignoreCase="true"
> >> words="stopwords.txt" />
> >> 
> >> 
> >> 
> >>  class="solr.StandardTokenizerFactory"/>
> >>  >> ignoreCase="true"
> >> words="stopwords.txt" />
> >>  >> synonyms="synonyms.txt"
> >> ignoreCase="true" expand="true"/>
> >> 
> >> 
> >> 
> >>  >> positionIncrementGap="100"
> >> omitTermFreqAndPositions="true" omitNorms="true">
> >> 
> >>  >> class="solr.WhitespaceTokenizerFactory"/>
> >>  >> ignoreCase="true"
> >> words="stopwords.txt" />
> >> 
> >> 
> >> 
> >>
> >>
> >> I want that, if a user searches for a phrase, the phrase always takes
> >> priority over the individual words;
> >>
> >> Example: "Eating Disorders"
> >>
> >> First it should search for "Eating Disorders" together, and then for
> >> the individual words "Eating" and "Disorders";
> >> but when searching for the individual words, it should only return
> >> documents where both words exist, for which I am already using
> >> q.op="AND" in my query.
> >>
> >> Thanks,
> >> Nitin
> >>
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >> http://lucene.472066.n3.nabble.com/SOLR-ranking-tp4257367p4257510.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >
> > --
> > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > Solr & Elasticsearch Support * http://sematext.com/
> >
> >
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>
-- 
Regards,
Binoy Dalal


Re: SOLR ranking

2016-02-16 Thread Alessandro Benedetti
Binoy, omitTermFreqAndPositions is set only for text_ws, which is used
only on the "indexed_terms" field.
The text_general fields seem fine to me.

Are you omitting norms on purpose? Norms can be relevant in title or
short-topic searches, boosting short field values that contain many of
the searched terms.

To respond Modassar :

I don't think the phrase will be searched as individual ANDed terms until
> the query has it like below.
> "Eating Disorders" OR (Eating AND Disorders).
>

Actually you can get it with edismax.
Just set mm to 100% and then configure a pf field (or more).
All search terms become mandatory, and phrase matches are boosted.

Cheers

On 16 February 2016 at 07:57, Emir Arnautovic 
wrote:

> Hi Nitin,
> You can use pf parameter to boost results with exact phrase. You can also
> use pf2 and pf3 to boost results with bigrams (phrase matches with 2 or 3
> words in case input is with more than 3 words)
>
> Regards,
> Emir
>
>
> On 16.02.2016 06:18, Nitin.K wrote:
>
>> I am using edismax parser with the following query:
>>
>>
>> localhost:8983/solr/tgl/select?q=eating%20disorders&wt=xml&tie=1.0&rows=200&q.op=AND&indent=true&defType=edismax&stopwords=true&lowercaseOperators=true&debugQuery=true&qf=topic_title%5E100+subtopic_title%5E40+index_term%5E20+drug%5E15+content%5E3&pf2=topTitle%5E200+subTopTitle%5E80+indTerm%5E40+drugString%5E30+content%5E6
>>
>> Configuration of schema.xml
>>
>> > />
>> 
>>
>> > stored="true"/>
>> 
>>
>> > multiValued="true"/>
>> > multiValued="true"/>
>>
>> > multiValued="true"/>
>> > multiValued="true"/>
>>
>> 
>>
>> 
>> 
>> 
>> 
>>
>> > positionIncrementGap="100" omitNorms="true">
>> 
>> 
>> > ignoreCase="true"
>> words="stopwords.txt" />
>> 
>> 
>> 
>> 
>> > ignoreCase="true"
>> words="stopwords.txt" />
>> > synonyms="synonyms.txt"
>> ignoreCase="true" expand="true"/>
>> 
>> 
>> 
>> > positionIncrementGap="100"
>> omitTermFreqAndPositions="true" omitNorms="true">
>> 
>> > class="solr.WhitespaceTokenizerFactory"/>
>> > ignoreCase="true"
>> words="stopwords.txt" />
>> 
>> 
>> 
>>
>>
>> I want that, if a user searches for a phrase, the phrase always takes
>> priority over the individual words;
>>
>> Example: "Eating Disorders"
>>
>> First it should search for "Eating Disorders" together, and then for
>> the individual words "Eating" and "Disorders";
>> but when searching for the individual words, it should only return
>> documents where both words exist, for which I am already using
>> q.op="AND" in my query.
>>
>> Thanks,
>> Nitin
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/SOLR-ranking-tp4257367p4257510.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>


-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
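
As a concrete illustration of Alessandro's mm + pf suggestion, an edismax
request might look like this (boosts and field names are illustrative,
borrowed from Nitin's earlier mail):

```
localhost:8983/solr/tgl/select?q=eating disorders&defType=edismax
    &mm=100%&qf=topic_title^100 content^3
    &pf=topic_title^200 content^6
```

With mm=100%, both "eating" and "disorders" must match (so explicit AND is
unnecessary), and documents containing the exact phrase get the extra pf
boost and rank first.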


Re: Data Import Handler Usage

2016-02-16 Thread Erik Hatcher
The "other" collection (destination of the import) is the collection where that 
data import handler definition resides. 

   Erik

> On Feb 16, 2016, at 01:54, vidya  wrote:
> 
> Hi
> 
> I have gone through the documentation on defining a data import handler in
> Solr, but I could not implement it.
> I have created a data-config.xml file that specifies moving data from the
> collection1 core to another collection; I don't know where I need to
> specify that second collection.
> 
> 
>  
> url="http://localhost:8983/solr/collection1"; query="*:*"/>
>  
> 
> 
> and request handler is defined as follows in solrconfig.xml
> 
>  class="org.apache.solr.handler.dataimport.DataImportHandler">
>
>  /home/username/data-config.xml
>
>  
> 
> Even after adding this, I could not see any data import handler in the
> admin web page for importing.
> Why is that? And what changes need to be made?
> I have followed the following url : 
> http://www.codewrecks.com/blog/index.php/2013/4/29/loading-data-from-sql-server-to-solr-with-a-data-import-handler
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Data-Import-Handler-Usage-tp4257518.html
> Sent from the Solr - User mailing list archive at Nabble.com.
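
In other words, the handler (and its data-config.xml) must live in the
*destination* collection's configuration. A sketch of such a data-config.xml
using SolrEntityProcessor to pull from collection1 (the URL and query are
from vidya's mail; the entity name is illustrative):

```xml
<dataConfig>
  <document>
    <!-- Pulls documents from collection1 into the collection
         that hosts this DIH request handler -->
    <entity name="src"
            processor="SolrEntityProcessor"
            url="http://localhost:8983/solr/collection1"
            query="*:*"/>
  </document>
</dataConfig>
```

Note also that the config file is usually referenced relative to the
collection's conf directory, which may explain why the handler did not
appear in the admin UI.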


Re: Using Solr's spatial functionality for astronomical catalog

2016-02-16 Thread Colin Freas

Looks like the only issue was that I did not have an alias for SourceRpt
field in the SQL.

With that in place, everything seems to work more or less as expected.
SourceRpt shows up where it should.

Queries like

http://localhost:8983/solr/spatial/select?q=*:*&fq={!geofilt%20sfield=Sour
ceRpt}&pt=0,0&d=5



... return appropriate subsets.

Doh,
Colin

On 2/16/16, 3:31 AM, "Colin Freas"  wrote:

>
>David, thanks for getting back to me.  SpatialRecursivePrefixTreeFieldType
>seems to be what I need, and the default search seems appropriate.  This
>is for entries in an astronomical catalog, so great circle distances on a
>perfect sphere is what I'm after.
>
>I am having a bit of difficulty though.
>
>Having gotten records importing via our database into a schema on both a
>stand-alone Solr instance and in a SolrCloud cluster, I've moved on to
>"spatializing" the appropriate fields, and everything looks like it's
>working, in that there are no errors thrown.  But when I try what I think
>is a valid spatial query, it doesn't work.
>
>Here's what I'm doing.  Pertinent bits from my schema:
>
>   ...
>   required="false" multiValued="false" />
>   CatID
>   
>   
>class="solr.SpatialRecursivePrefixTreeFieldType"
>   geo="true"
>   distanceUnits="degrees" />
>   ...
>   
>
>In my db-config.xml, I've got this sql:
>   
>
>
>When I run a data import through Solr's admin gui and look at the verbose
>debug output, something seems off.  Top of the output is this:
>   {
>   "responseHeader": {
>   "status": 0,
>   "QTime": 75
>   },
>   "initArgs": [
>   "defaults",
>   [
>   "config",
>   "hsc-db-config.xml"
>   ]
>   ],
>   "command": "full-import",
>   "mode": "debug",
>   "documents": [
>   {
>   "MatchDec": [
>   -0.67312569921
>   ],
>   "SourceDec": [
>   -0.67312569921
>   ],
>   "MatchRA": [
>   0.5681586795334927
>   ],
>   "SourceRA": [
>   0.5681586795334927
>   ],
>   "CatID": [
>   25558943
>   ]
>   },
>
>
>There's no SourceRpt field there.  But in verbose output of what's
>returned from the query, the SourceRpt field seems to be correctly put
>together:
>
>   "verbose-output": [
>   "entity:observation",
>   [
>   "document#1",
>   [
>   "query",
>   "SELECT CatID, MatchRA, MatchDec, SourceRA, SourceDec,   +
>ltrim(rtrim(Str(SourceRA,25,16))) + ',' +
>ltrim(rtrim(Str(SourceDec,25,16))) +  FROM xcat.BestCatalog",
>   "time-taken",
>   "0:0:0.22",
>   null,
>   "--- row #1-",
>   "",
>   "'0.5681586795334928,-0.673125699210'",
>   "MatchDec",
>   -0.67312569921,
>   "SourceDec",
>   -0.67312569921,
>   "MatchRA",
>   0.5681586795334927,
>   "SourceRA",
>   0.5681586795334927,
>   "CatID",
>   25558943,
>   null,
>   "-"
>   ]
>
>
>
>I try a spatial search like this:
>   
> http://localhost:8983/solr/spatial/select?q=*%3A*&wt=json&indent=true&spa
>t
>ial=true&pt=3%2C3&sfield=SourceRpt&d=0.0001
>
>
>... And I get back all (10) records in the core, when I would expect 0,
>given the very small distance I supply to a point well away from any of
>the records.
>
>I'm not sure what's going on.  I don't know if this is a simple Solr
>config error I'm missing, or if there's some spatial magic I'm unaware of.
> 
>
>Any thoughts appreciated.
>
>-Colin
>
>
>On 1/20/16, 9:34 PM, "david.w.smi...@gmail.com" 
>wrote:
>
>>Hello Colin,
>>
>>If the spatial field you use is the SpatialRecursivePrefixTreeFieldType
>>one
>>(RPT for short) with geo="true" then the circle shape (i.e. point-radius
>>filter) implied by the geofilt Solr QParser is on a sphere.  That is, it
>>uses the "great circle" distance computed using the Haversine formula by
>>default, though it can be configured to use the Law of Cosines formula or
>>Vincenty (spherical version) formula if you so choose.  Using geodist()
>>for
>>spatial distance sorting/boosting also uses this.  If you use LatLonType
>>then geofilt & geodist() use Haversine too.
>>
>>If you use polygons or line strings, then it's *not* using a spherical
>>model; it's using a Euclidean (flat) model on plate carrée.  I am
>>currently
>>working on adapting the Spatial4j library to work with Lucene's Geo3D
>>(aka
>>spatial 3d) which has both a spherical model and an ellipsoidal model,
>>which can be configured with the characteristics specified by WGS84.  If
>>you are super-eager to get this yourself without waiting, then you could
>>write a Solr QParser that constructs a Geo3dShape wrapping a Geo3D
>>GeoShape
>>object constructed from query parameters.  You might alternatively try
>>and
>>use Geo3DPointField on Lucene 6 trunk.
>>
>>~ David
>>
>>On Tue, Jan 19, 2016 at
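
In schema terms, the working setup boils down to an RPT field type plus a
geofilt filter query. A minimal sketch (type and field names here are
illustrative, since Colin's actual schema was mangled in the archive):

```xml
<!-- Spatial type: great-circle distances on a sphere, degrees as units -->
<fieldType name="location_rpt"
           class="solr.SpatialRecursivePrefixTreeFieldType"
           geo="true" distanceUnits="degrees"/>

<!-- Point per document, indexed as "lat,lon" (here Dec,RA-style) -->
<field name="SourceRpt" type="location_rpt" indexed="true" stored="true"/>
```

Filtering then uses e.g. `fq={!geofilt sfield=SourceRpt}&pt=0,0&d=5`, as in
Colin's follow-up; without the SQL alias, the field was never populated, so
the geofilt had nothing to filter on.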

Re: Using Solr's spatial functionality for astronomical catalog

2016-02-16 Thread Colin Freas

David, thanks for getting back to me.  SpatialRecursivePrefixTreeFieldType
seems to be what I need, and the default search seems appropriate.  This
is for entries in an astronomical catalog, so great circle distances on a
perfect sphere is what I'm after.

I am having a bit of difficulty though.

Having gotten records importing via our database into a schema on both a
stand-alone Solr instance and in a SolrCloud cluster, I've moved on to
"spatializing" the appropriate fields, and everything looks like it's
working, in that there are no errors thrown.  But when I try what I think
is a valid spatial query, it doesn't work.

Here's what I'm doing.  Pertinent bits from my schema:

...

CatID



...


In my db-config.xml, I've got this sql:



When I run a data import through Solr's admin gui and look at the verbose
debug output, something seems off.  Top of the output is this:
{
"responseHeader": {
"status": 0,
"QTime": 75
},
"initArgs": [
"defaults",
[
"config",
"hsc-db-config.xml"
]
],
"command": "full-import",
"mode": "debug",
"documents": [
{
"MatchDec": [
-0.67312569921
],
"SourceDec": [
-0.67312569921
],
"MatchRA": [
0.5681586795334927
],
"SourceRA": [
0.5681586795334927
],
"CatID": [
25558943
]
},


There's no SourceRpt field there.  But in verbose output of what's
returned from the query, the SourceRpt field seems to be correctly put
together:

"verbose-output": [
"entity:observation",
[
"document#1",
[
"query",
"SELECT CatID, MatchRA, MatchDec, SourceRA, SourceDec,   +
ltrim(rtrim(Str(SourceRA,25,16))) + ',' +
ltrim(rtrim(Str(SourceDec,25,16))) +  FROM xcat.BestCatalog",
"time-taken",
"0:0:0.22",
null,
"--- row #1-",
"",
"'0.5681586795334928,-0.673125699210'",
"MatchDec",
-0.67312569921,
"SourceDec",
-0.67312569921,
"MatchRA",
0.5681586795334927,
"SourceRA",
0.5681586795334927,
"CatID",
25558943,
null,
"-"
]



I try a spatial search like this:

http://localhost:8983/solr/spatial/select?q=*%3A*&wt=json&indent=true&spat
ial=true&pt=3%2C3&sfield=SourceRpt&d=0.0001


... And I get back all (10) records in the core, when I would expect 0,
given the very small distance I supply to a point well away from any of
the records.

I'm not sure what's going on.  I don't know if this is a simple Solr
config error I'm missing, or if there's some spatial magic I'm unaware of.
 

Any thoughts appreciated.

-Colin


On 1/20/16, 9:34 PM, "david.w.smi...@gmail.com" 
wrote:

>Hello Colin,
>
>If the spatial field you use is the SpatialRecursivePrefixTreeFieldType
>one
>(RPT for short) with geo="true" then the circle shape (i.e. point-radius
>filter) implied by the geofilt Solr QParser is on a sphere.  That is, it
>uses the "great circle" distance computed using the Haversine formula by
>default, though it can be configured to use the Law of Cosines formula or
>Vincenty (spherical version) formula if you so choose.  Using geodist()
>for
>spatial distance sorting/boosting also uses this.  If you use LatLonType
>then geofilt & geodist() use Haversine too.
>
>If you use polygons or line strings, then it's *not* using a spherical
>model; it's using a Euclidean (flat) model on plate carrée.  I am
>currently
>working on adapting the Spatial4j library to work with Lucene's Geo3D (aka
>spatial 3d) which has both a spherical model and an ellipsoidal model,
>which can be configured with the characteristics specified by WGS84.  If
>you are super-eager to get this yourself without waiting, then you could
>write a Solr QParser that constructs a Geo3dShape wrapping a Geo3D
>GeoShape
>object constructed from query parameters.  You might alternatively try and
>use Geo3DPointField on Lucene 6 trunk.
>
>~ David
>
>On Tue, Jan 19, 2016 at 11:07 AM Colin Freas  wrote:
>
>>
>> Greetings!
>>
>> I have recently stood up an instance of Solr, indexing a catalog of
>>about
>> 100M records representing points on the celestial sphere.  All of the
>> fields are strings, floats, and non-spatial types.  I'd like to convert
>>the
>> positional data to an appropriate spatial point data type supported by
>>Solr.
>>
>> I have a couple of questions about indexing spatial data using Solr,
>>since
>> it seems spatial4j, and the spatial functionality in Solr generally, is
>> more GIS geared.  I worry that the measurements of lat/long on the
>> imperfect sphere of the Earth wouldn't match up with the astronomical
>>r

Re: Highlight brings the content from the first pages of pdf

2016-02-16 Thread Binoy Dalal
Yeah.
Under <lst name="defaults">, an entry like so:
<str name="fl">fields</str>

On Tue, 16 Feb 2016, 13:00 Anil  wrote:

> you mean default fl ?
>
> On 16 February 2016 at 12:57, Binoy Dalal  wrote:
>
> > Oh wait. We don't append the fl parameter to the query.
> > We've configured it in the request handler in solrconfig.xml
> > Maybe that is something that you can do.
> >
> > On Tue, 16 Feb 2016, 12:39 Anil  wrote:
> >
> > > Thanks for your response Binoy.
> > >
> > > Yes, I am looking for an alternative to this. With a long list of
> > > fields, the URL will become long and might lead to a "URL too long"
> > > exception when using HTTP requests.
> > >
> > > On 16 February 2016 at 11:01, Binoy Dalal 
> > wrote:
> > >
> > > > Filling in the fl parameter with all the required fields is what we
> do
> > at
> > > > my project as well, and I don't think there is any alternative to
> this.
> > > >
> > > > Maybe somebody else can advise on this?
> > > >
> > > > On Tue, 16 Feb 2016, 10:30 Anil  wrote:
> > > >
> > > > > Any help on this ? Thanks.
> > > > >
> > > > > On 15 February 2016 at 19:06, Anil  wrote:
> > > > >
> > > > > > Yes. But I have a long list of fields.
> > > > > >
> > > > > > I feel adding all the fields to fl is not good practice unless one
> > > > > > is interested in only a few fields. In my case, I am interested
> > > > > > in all fields except one.
> > > > > >
> > > > > > Is there any alternative approach? Thanks in advance.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On 15 February 2016 at 17:27, Binoy Dalal <
> binoydala...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > >> If I understand correctly, you have already highlighted the
> field
> > > and
> > > > > only
> > > > > >> want to return the highlights and not the field itself.
> > > > > >> Well in that case, simply remove the field name from your fl
> list.
> > > > > >>
> > > > > >> On Mon, 15 Feb 2016, 17:04 Anil  wrote:
> > > > > >>
> > > > > >> > How can the highlighted field be excluded from the main
> > > > > >> > result, since it is available in the highlight section?
> > > > > >> >
> > > > > >> > In my scenario, one field (let's say commands) of each Solr
> > > > > >> > document would be around 10 MB. I don't want to fetch that
> > > > > >> > field in the response when its highlight snippets are
> > > > > >> > available in the response.
> > > > > >> >
> > > > > >> > Please advice.
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >> > On 15 February 2016 at 15:36, Evert R.  >
> > > > wrote:
> > > > > >> >
> > > > > >> > > Hello Mark,
> > > > > >> > >
> > > > > >> > > Thanks for you reply.
> > > > > >> > >
> > > > > >> > > All text is indexed (1 pdf file). It works now.
> > > > > >> > >
> > > > > >> > > Best regard,
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > *--Evert*
> > > > > >> > >
> > > > > >> > > 2016-02-14 23:47 GMT-02:00 Mark Ehle :
> > > > > >> > >
> > > > > >> > > > is all the text being indexed? Check to make sure that
> > there's
> > > > > >> actually
> > > > > >> > > the
> > > > > >> > > > data you are looking for in the index. Is there a setting
> in
> > > > tika
> > > > > >> that
> > > > > >> > > > limits how much is indexed? I seem to remember confronting
> > > this
> > > > > >> problem
> > > > > >> > > > myself once, and the data that I wanted just wasn't in the
> > > index
> > > > > >> > because
> > > > > >> > > it
> > > > > >> > > > was never put there in the first place.Something about
> > > > > >> > setMaxStringLength
> > > > > >> > > > orsomething.
> > > > > >> > > >
> > > > > >> > > > On Sun, Feb 14, 2016 at 8:28 PM, Binoy Dalal <
> > > > > >> binoydala...@gmail.com>
> > > > > >> > > > wrote:
> > > > > >> > > >
> > > > > >> > > > > What you've done so far will highlight every instance of
> > > > > "nietava"
> > > > > >> > > found
> > > > > >> > > > in
> > > > > >> > > > > the field, and return it, i.e., your entire field will
> > > return
> > > > > with
> > > > > >> > all
> > > > > >> > > > the
> > > > > >> > > > > "nietava"s in  tags.
> > > > > >> > > > > If you do not want the entire field, only portions of
> your
> > > > field
> > > > > >> > > > containing
> > > > > >> > > > > the matched terms, then use hl.snippets parameter = the
> > > number
> > > > > of
> > > > > >> > > > snippets
> > > > > >> > > > > you want, in this particular case 3, along with the
> > > > hl.fragsize
> > > > > >> > > parameter
> > > > > >> > > > > set to the same number as your hl.mazAnalyzedChars (or a
> > > > really
> > > > > >> large
> > > > > >> > > > > number).
> > > > > >> > > > >
> > > > > >> > > > > I suggest you go through the wiki documentation for
> > > > highlighting
> > > > > >> > once (
> > > > > >> > > > > https://wiki.apache.org/solr/HighlightingParameters).
> It
> > > > should
> > > > > >> > answer
> > > > > >> > > > all
> > > > > >> > > > > of your questions regarding the use of the standard
> > > > highlighter
> > > > > >> that
> > > > > >> > > you
> > > > > >> > > > > might have.
> > > > > >> > > > >
> > >
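
Binoy's suggestion — moving the long fl list out of the URL and into the
request handler defaults — would look roughly like this in solrconfig.xml
(handler name and field list are illustrative, not from Anil's setup):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Return everything except the very large "commands" field -->
    <str name="fl">id,title,author,score</str>
    <!-- Highlighting still returns snippets from "commands" -->
    <str name="hl">true</str>
    <str name="hl.fl">commands</str>
  </lst>
</requestHandler>
```

This keeps the request URLs short while still excluding the 10 MB field
from the main result; the highlight section carries only the snippets.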