Re: Query String Limit

2016-05-09 Thread Zheng Lin Edwin Yeo
Thanks for the info, Shawn.

Regards,
Edwin

On 10 May 2016 at 11:27, Shawn Heisey  wrote:

> On 5/9/2016 8:31 PM, Zheng Lin Edwin Yeo wrote:
> > Would like to check: when increasing maxBooleanClauses to a large
> > value, if there are 50 collections in my cluster, will I have to increase
> > that in the solrconfig.xml for all 50 collections before it will
> work?
>
> The maxBooleanClauses is a global Lucene setting across the entire JVM.
> The last core that gets started (the order is not completely
> controllable) will set the value, so if that core's config doesn't have
> a value set, it will go back to the default of 1024.
>
> Which means that yes, you must set it in EVERY core's configuration.
>
> Thanks,
> Shawn
>
>
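For reference, the setting lives in the <query> section of each collection's
solrconfig.xml; a minimal sketch (the 10240 value is only an example, pick
whatever limit your queries actually need):

  <query>
    <maxBooleanClauses>10240</maxBooleanClauses>
    ...
  </query>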


Re: solrcloud performance problem

2016-05-09 Thread Toke Eskildsen
On Tue, 2016-05-10 at 00:41 +0800, lltvw wrote:
> Recently we set up a 4.10 SolrCloud env with about 90 million docs indexed
> in it. This SolrCloud has 12 shards, each shard on a separate
> machine, but when we try to search for some info on SolrCloud, the
> response time is about 300ms.

Could you provide us with a sample request? Preferably taken from the
log of one of the shards, so that we also get timing. There will
probably be 2 entries in the log for each request you issue. This will
make it easier for us to check if you have some of the typical problems,
such as very high rows or facet.limit.

- Toke Eskildsen, State and University Library, Denmark
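For reference, a hypothetical example of the kind of shard-log request Toke is
asking about; the field names are invented, but rows and facet.limit are the
parameters he suggests checking:

  /solr/collection1/select?q=text:something&rows=10&facet=true&facet.field=category&facet.limit=100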




Re: Filter queries & caching

2016-05-09 Thread Jay Potharaju
Thanks for the explanation, Erick.

So that I understand this clearly


1) fq=filter(fromfield:[* TO NOW/DAY+1DAY] && tofield:[NOW/DAY-7DAY TO *])
&& fq=type:abc
2) fq=fromfield:[* TO NOW/DAY+1DAY] && fq=tofield:[NOW/DAY-7DAY TO *] &&
fq=type:abc

Using 1) would benefit from having 2 separate filterCache entries instead of 3
slots in the cache. But in general both would be using the filter cache.
And secondly, it would be more useful to use filter() in a scenario like the
one above (mentioned in your email).
Thanks




On Mon, May 9, 2016 at 9:43 PM, Erick Erickson 
wrote:

> You're confusing a query clause with fq when thinking about filter() I
> think.
>
> Essentially they don't need to be used together, i.e.
>
> q=myclause AND filter(field:value)
>
> is identical to
>
> q=myclause&fq=field:value
>
> both in docs returned and filterCache usage.
>
> q=myclause&filter(fq=field:value)
>
> actually uses two filterCache entries, so is probably not what you want to
> use.
>
> the filter() syntax attached to a q clause (not an fq clause) is meant
> to allow you to get speedups when
> you want to use compound clauses without having every combination be
> separate filterCache entries.
>
> Consider the following:
> fq=A OR B
> fq=A AND B
> fq=A
> fq=B
>
> These would require 4 filterCache entries.
>
> q=filter(A) OR filter(B)
> q=filter(A) AND filter(B)
> q=filter(A)
> q=filter(B)
>
> would only require two. Yet all of them would be satisfied only by
> looking at the filterCache.
>
> Aside from the example immediately above, which one you use is largely
> a matter of taste.
>
> Best,
> Erick
>
> On Mon, May 9, 2016 at 12:47 PM, Jay Potharaju 
> wrote:
> > Thanks Ahmet... but I am still not clear: how is adding the filter() option
> > better, or is it the same as the filterCache?
> >
> > My question is below.
> >
> > "As mentioned above adding filter() will add the filter query to the
> cache.
> > This would mean that results are fetched from cache instead of running n
> > number of filter queries  in parallel.
> > Is it necessary to use the filter() option? I was under the impression
> that
> > all filter queries will get added to the "filtercache". What is the
> > advantage of using filter()?"
> >
> > Thanks
> >
> > On Sun, May 8, 2016 at 6:30 PM, Ahmet Arslan 
> > wrote:
> >
> >> Hi,
> >>
> >> As I understand it, it is useful in case you use an OR operator between
> >> two restricting clauses.
> >> Recall that multiple fq means implicit AND.
> >>
> >> ahmet
> >>
> >>
> >>
> >> On Monday, May 9, 2016 4:02 AM, Jay Potharaju 
> >> wrote:
> >> As mentioned above adding filter() will add the filter query to the
> cache.
> >> This would mean that results are fetched from cache instead of running n
> >> number of filter queries  in parallel.
> >> Is it necessary to use the filter() option? I was under the impression
> that
> >> all filter queries will get added to the "filtercache". What is the
> >> advantage of using filter()?
> >>
> >> *From
> >> doc:
> >>
> https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig
> >> <
> >>
> https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig
> >> >*
> >> This cache is used by SolrIndexSearcher for filters (DocSets) for
> unordered
> >> sets of all documents that match a query. The numeric attributes control
> >> the number of entries in the cache.
> >> Solr uses the filterCache to cache results of queries that use the fq
> >> search parameter. Subsequent queries using the same parameter setting
> >> result in cache hits and rapid returns of results. See Searching for a
> >> detailed discussion of the fq parameter.
> >>
> >> *From Yonik's site: http://yonik.com/solr/query-syntax/#FilterQuery
> >> *
> >>
> >> (Since Solr 5.4)
> >>
> >> A filter query retrieves a set of documents matching a query from the
> >> filter cache. Since scores are not cached, all documents that match the
> >> filter produce the same score (0 by default). Cached filters will be
> >> extremely fast when they are used again in another query.
> >>
> >>
> >> Thanks
> >>
> >>
> >> On Fri, May 6, 2016 at 9:46 AM, Jay Potharaju 
> >> wrote:
> >>
> >> > We have high query load and considering that I think the suggestions
> made
> >> > above will help with performance.
> >> > Thanks
> >> > Jay
> >> >
> >> > On Fri, May 6, 2016 at 7:26 AM, Shawn Heisey 
> >> wrote:
> >> >
> >> >> On 5/6/2016 7:19 AM, Shawn Heisey wrote:
> >> >> > With three separate
> >> >> > fq parameters, you'll get three cache entries in filterCache from
> the
> >> >> > one query.
> >> >>
> >> >> One more tidbit of information related to this:
> >> >>
> >> >> When you have multiple filters and they aren't cached, I am
> reasonably
> >> >> certain that they run in parallel.  Instead of one complex filter,
> you
> >> >> would have three simple filters running simultaneously.  For low to
> >> >> medium query loads on a server with a whole bunch of CPUs, where
> there
> >> is plenty of spare CPU power, this can be a real gain in performance ...
> >> but if the query load is really high, it might be a bad thing.
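For illustration, a minimal sketch of the pattern Erick describes in this
thread, using hypothetical fields; the two compound queries below share the
same two filterCache entries:

  q=filter(inStock:true) AND filter(popularity:[5 TO *])
  q=filter(inStock:true) OR filter(popularity:[5 TO *])

while q=myclause&fq=inStock:true stays equivalent to
q=myclause AND filter(inStock:true) in both results and cache usage.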

Unable to achieve boosting into solr 5.5

2016-05-09 Thread Upendra Kumar Baliyan
Hi,

We are using Solr 5.5, but could not achieve field boosting. We are not getting
the expected results with the below configuration.



Below is the configuration in solrconfig.xml for the request handler defaults:

   <str name="defType">edismax</str>

   <str name="qf">metatag.keywords^10.0 metatag.description^9.0 h1^7.0 h2^6.0 h3^5.0
     h4^4.0 _text_^1.0 id^0.5</str>

   <str name="mm">100%</str>

   <str name="q.alt">*:*</str>

   <int name="rows">10</int>

   <str name="fl">*,score</str>

   <str name="echoParams">explicit</str>





Any help?



Regards

Upendra Kumar Baliyan



Re:Re: solrcloud performance problem

2016-05-09 Thread lltvw
Shawn, 

By using the jps command to double-check the params used to start Solr, I found
that the max heap size is already set to 10G. So I made a big mistake yesterday.

But using the Solr admin UI, when I select the collection with the performance
problem, on the overview page I find that the heap memory is about 8M. What is
wrong?

Every time I search for different characters, the QTime from the response
header is always greater than 300ms. If I search again, since I can hit the
cache, the response time drops to about 30ms.






--
Sent from my NetEase Mail mobile edition


On 2016-05-10 11:35:27, "Shawn Heisey" wrote:
>On 5/9/2016 9:11 PM, lltvw wrote:
>> You are right, the max heap is 512MB, thanks.
>
>90 million documents split into 12 shards means 7.5 million documents
>per shard.
>
>With that many documents and a 512MB heap, you're VERY lucky if Solr
>doesn't experience OutOfMemoryError problems -- which will make Solr's
>behavior very unpredictable.
>
>Your server has plenty of memory.  Because of the very small max heap,
>it is probably spending a lot of time doing garbage collection.  You'll
>actually need to *increase* your heap size.  I would recommend starting
>with 4GB.
>
>Exactly how to do this will depend on how you're starting it.  If you
>are starting it with "bin/solr" then you can add a -m 4g option to that
>commandline.
>
>Thanks,
>Shawn
>


Using Ping Request Handler in SolrCloud within a load balancer

2016-05-09 Thread Sandy Foley
A couple of questions ...



We've upconfig'd the ping request handler to ZooKeeper within the
solrconfig.xml.  SolrCloud and ZooKeeper are working fine.

I understand that the /solr/admin/ping command is for a ping on its local
server only (not from a remote machine).  This is working. I also understand
that /solr/[core]/admin/ping can be used from a load balancer to ping a
particular core on a server. This is working also.

Question #1: Is there a SINGLE command that can be issued to each server from a
load balancer to check the ping status of each server?

Question #2: When running /solr/admin/ping from the load balancer to each Solr
node, one of the three nodes returns a status ok.  It's the same node every
time; it's the first node that we set up of the 3 (which is not always the
leader). The zkcli upconfig command has always been issued from this first
node. Out of curiosity, if this command is for local ping only, why does this
return status ok on one node (issued from the load balancer) and not the other
nodes?

Configuration:
Windows, Tomcat 8.0, SolrCloud 4.10.3 (3 nodes), external ZooKeeper 3.4.6
ensemble (3 servers)
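For reference, the ping handler as it appears in a stock solrconfig.xml (a
minimal sketch; the healthcheckFile option, commented out in the sample, is
what lets a load balancer flip a node in and out of rotation via
action=enable/disable):

  <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
    <lst name="invariants">
      <str name="q">solrpingquery</str>
    </lst>
    <lst name="defaults">
      <str name="echoParams">all</str>
    </lst>
    <!-- <str name="healthcheckFile">server-enabled.txt</str> -->
  </requestHandler>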

  

Re: building solr with an error UNRESOLVED DEPENDENCIES

2016-05-09 Thread Erick Erickson
Not that particular one, but every once in a while the Ivy cache goes
a bit wonky on my local machine. It's a bit painful, but on my mac
there's a directory
~/.ivy2
I go there and "rm -r cache" then restart the build. Be prepared to
wait a while while the build process downloads a lot of jars.

You're working on a Windows box, so I'm not quite sure where the Ivy
cache will be located, but you get the idea...

Best,
Erick

On Mon, May 9, 2016 at 6:43 PM, 黄炜文jwos <1055069...@qq.com> wrote:
> When I try to compile Solr 5.4.0 from src, I met an error:
>
>
> [ivy:retrieve]  report for org.apache.solr#core;working@zozt-PC test.MiniKdc 
> produced in F:\solr-5.4.0-src 
> (1)\solr-5.4.0\lucene\build\ivy-resolution-cache\org.apache.solr-core-test.MiniKdc.xml
> [ivy:retrieve]  resolve done (1982ms resolve - 156ms download)
> [ivy:retrieve]
> [ivy:retrieve] :: problems summary ::
> [ivy:retrieve]  WARNINGS
> [ivy:retrieve]  ::
> [ivy:retrieve]  ::  UNRESOLVED DEPENDENCIES ::
> [ivy:retrieve]  ::
> [ivy:retrieve]  :: 
> org.apache.directory.server#apacheds-interceptors-admin;2.0.0-M15: 
> configuration not found in 
> org.apache.directory.server#apacheds-interceptors-admin;2.0.0-M15: 'master'. 
> It was required from org.apache.solr#core;working@zozt-PC test.MiniKdc
>
>
>
> I'm not familiar with Ivy and couldn't figure out what the problem is. Has
> anyone met this situation before?
> Please give me some advice, thanks a lot
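For reference, a short sketch of the cache-clearing Erick describes; the
Windows location is an assumption (Ivy's default cache normally lives under
the user's home directory):

  # macOS/Linux
  rm -r ~/.ivy2/cache

  # Windows (assumed default location)
  rmdir /s /q "%USERPROFILE%\.ivy2\cache"

Then re-run the ant build and expect a long wait while the jars are
downloaded again.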


Re: Filter queries & caching

2016-05-09 Thread Erick Erickson
You're confusing a query clause with fq when thinking about filter() I think.

Essentially they don't need to be used together, i.e.

q=myclause AND filter(field:value)

is identical to

q=myclause&fq=field:value

both in docs returned and filterCache usage.

q=myclause&filter(fq=field:value)

actually uses two filterCache entries, so is probably not what you want to use.

the filter() syntax attached to a q clause (not an fq clause) is meant
to allow you to get speedups when
you want to use compound clauses without having every combination be
separate filterCache entries.

Consider the following:
fq=A OR B
fq=A AND B
fq=A
fq=B

These would require 4 filterCache entries.

q=filter(A) OR filter(B)
q=filter(A) AND filter(B)
q=filter(A)
q=filter(B)

would only require two. Yet all of them would be satisfied only by
looking at the filterCache.

Aside from the example immediately above, which one you use is largely
a matter of taste.

Best,
Erick

On Mon, May 9, 2016 at 12:47 PM, Jay Potharaju  wrote:
> Thanks Ahmet... but I am still not clear: how is adding the filter() option
> better, or is it the same as the filterCache?
>
> My question is below.
>
> "As mentioned above adding filter() will add the filter query to the cache.
> This would mean that results are fetched from cache instead of running n
> number of filter queries  in parallel.
> Is it necessary to use the filter() option? I was under the impression that
> all filter queries will get added to the "filtercache". What is the
> advantage of using filter()?"
>
> Thanks
>
> On Sun, May 8, 2016 at 6:30 PM, Ahmet Arslan 
> wrote:
>
>> Hi,
>>
>> As I understand it, it is useful in case you use an OR operator between
>> two restricting clauses.
>> Recall that multiple fq means implicit AND.
>>
>> ahmet
>>
>>
>>
>> On Monday, May 9, 2016 4:02 AM, Jay Potharaju 
>> wrote:
>> As mentioned above adding filter() will add the filter query to the cache.
>> This would mean that results are fetched from cache instead of running n
>> number of filter queries  in parallel.
>> Is it necessary to use the filter() option? I was under the impression that
>> all filter queries will get added to the "filtercache". What is the
>> advantage of using filter()?
>>
>> *From
>> doc:
>> https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig
>> <
>> https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig
>> >*
>> This cache is used by SolrIndexSearcher for filters (DocSets) for unordered
>> sets of all documents that match a query. The numeric attributes control
>> the number of entries in the cache.
>> Solr uses the filterCache to cache results of queries that use the fq
>> search parameter. Subsequent queries using the same parameter setting
>> result in cache hits and rapid returns of results. See Searching for a
>> detailed discussion of the fq parameter.
>>
>> *From Yonik's site: http://yonik.com/solr/query-syntax/#FilterQuery
>> *
>>
>> (Since Solr 5.4)
>>
>> A filter query retrieves a set of documents matching a query from the
>> filter cache. Since scores are not cached, all documents that match the
>> filter produce the same score (0 by default). Cached filters will be
>> extremely fast when they are used again in another query.
>>
>>
>> Thanks
>>
>>
>> On Fri, May 6, 2016 at 9:46 AM, Jay Potharaju 
>> wrote:
>>
>> > We have high query load and considering that I think the suggestions made
>> > above will help with performance.
>> > Thanks
>> > Jay
>> >
>> > On Fri, May 6, 2016 at 7:26 AM, Shawn Heisey 
>> wrote:
>> >
>> >> On 5/6/2016 7:19 AM, Shawn Heisey wrote:
>> >> > With three separate
>> >> > fq parameters, you'll get three cache entries in filterCache from the
>> >> > one query.
>> >>
>> >> One more tidbit of information related to this:
>> >>
>> >> When you have multiple filters and they aren't cached, I am reasonably
>> >> certain that they run in parallel.  Instead of one complex filter, you
>> >> would have three simple filters running simultaneously.  For low to
>> >> medium query loads on a server with a whole bunch of CPUs, where there
>> >> is plenty of spare CPU power, this can be a real gain in performance ...
>> >> but if the query load is really high, it might be a bad thing.
>> >>
>> >> Thanks,
>> >> Shawn
>> >>
>> >>
>> >
>> >
>> > --
>> > Thanks
>> > Jay Potharaju
>>
>> >
>> >
>>
>>
>>
>> --
>> Thanks
>> Jay Potharaju
>>
>
>
>
> --
> Thanks
> Jay Potharaju


Re: Streaming expressions join operations

2016-05-09 Thread Ryan Cutter
Yes, the people collection has the personId and pets has ownerId, as
described.
On May 9, 2016 8:55 PM, "Joel Bernstein"  wrote:

> The example is using two collections: people and pets. So these collections
> would need to be present for the join expression to work.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, May 9, 2016 at 10:43 PM, Ryan Cutter  wrote:
>
> > Thanks Joel, I added the personId and ownerId fields before ingesting a
> > little data.  I made them to be stored=true/multiValue=false/longs (and
> > strings, later).  Is additional schema required?
> >
> > On Mon, May 9, 2016 at 6:45 PM, Joel Bernstein 
> wrote:
> >
> > > Hi,
> > >
> > > The example in the cwiki would require setting up the people and pets
> > > collections. Unless I'm mistaken this won't work with the out of the
> box
> > > schemas. So you'll need to setup some test schemas to get started.
> > Although
> > > having out of the box streaming schemas is a great idea.
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Mon, May 9, 2016 at 9:22 PM, Ryan Cutter 
> > wrote:
> > >
> > > > Hello, I'm checking out the cool stream join operations in Solr 6.0
> but
> > > can't seem to get the example listed on the wiki to work:
> > > >
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions#StreamingExpressions-innerJoin
> > > >
> > > > innerJoin(
> > > >   search(people, q=*:*, fl="personId,name", sort="personId asc"),
> > > >   search(pets, q=type:cat, fl="ownerId,petName", sort="ownerId asc"),
> > > >   on="personId=ownerId"
> > > > )
> > > >
> > > > ERROR - 2016-05-09 21:42:43.497; [c:pets s:shard1 r:core_node1
> > > > x:pets_shard1_replica1] org.apache.solr.common.SolrException;
> > > > java.io.IOException: java.lang.NullPointerException
> > > >
> > > > at
> > > >
> > > >
> > >
> >
> org.apache.solr.client.solrj.io.stream.CloudSolrStream.constructStreams(CloudSolrStream.java:339)
> > > >
> > > > 1. Joel Bernstein pointed me at SOLR-9058.  Is this the likely bug?
> > > > 2. What kind of field should personId and ownerId be?  long, string,
> > > > something else?
> > > > 3. Does someone have an example schema or dataset that show off these
> > > > joins?  If not, it's something I could work on for future souls.
> > > >
> > > > Thanks! Ryan
> > > >
> > >
> >
>
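For reference, a minimal sketch of the join-key fields each schema would need;
the stored/docValues attributes are assumptions, but the types on both sides
of on="personId=ownerId" should match:

  <!-- people collection -->
  <field name="personId" type="string" indexed="true" stored="true" docValues="true"/>

  <!-- pets collection -->
  <field name="ownerId" type="string" indexed="true" stored="true" docValues="true"/>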


Re: Solr 5.x bug with Service installation script?

2016-05-09 Thread Erick Erickson
How do you shut down your Solrs? Any kind of un-graceful
stopping (kill -9 is a favorite) may leave the lock file around.

It can't be coming from nowhere, so my guess is that
it's present in the source or destination before
you do your copy...

Best,
Erick

On Mon, May 9, 2016 at 10:30 AM, A Laxmi  wrote:
> Yes, I always shut down both source and destination Solr before copying the
> index over from one to another. Somehow the write.lock only happens when
> Solr restarts from the service script. It loads just fine when started manually.
>
> On Mon, May 9, 2016 at 1:20 PM, Abdel Belkasri  wrote:
>
>> Did you copy the core while Solr is running? If yes, first shut down source
>> and destination Solr, copy the index to the other Solr, then restart the Solr
>> nodes. Lock files get written to the core while Solr is running and doing
>> indexing or searching, etc.
>>
>> On Mon, May 9, 2016 at 12:38 PM, A Laxmi  wrote:
>>
>> > Hi,
>> >
>> > I have installed Solr 5.3.1 using the Service Installation Script. I was
>> > able to successfully start and stop Solr using service solr start/stop
>> > commands and Solr loads up just fine.
>> >
>> > However, when I stop Solr service and copy an index of a core from one
>> > server to another with same exact version of Solr and its corresponding
>> > conf and restart the service, it complains about write.lock file when
>> none
>> > exists under the path that it specifies in the log.
>> >
>> > To validate whether the issue is with the data that is being copied or
>> the
>> > service script itself, I copied the collection directory with new index
>> > into example-DIH directory and restarted Solr manually bin/solr start -e
>> > dih -m 2g, it worked without any error. So, at least this validates that
>> > the collection data is just fine and the service script is creating a lock
>> > every time a new index is copied from another server though it has the
>> same
>> > exact Solr version.
>> >
>> > Did anyone experience the same? Any thoughts if this is a bug?
>> >
>> > Thanks!
>> > AL
>> >
>>
>>
>>
>> --
>> Abdel K. Belkasri, PhD
>>
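For reference, a short sketch of the usual cleanup after an unclean stop; the
data path is an assumption based on the service installation script's defaults:

  service solr stop
  # remove the stale lock before restarting (path assumed)
  rm /var/solr/data/<core>/data/index/write.lock
  service solr start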


Re: Solr re-indexing in case of store=false

2016-05-09 Thread Erick Erickson
Stored data is compressed by default, anecdotally there's about
a 2:1 compression ratio.

But the _other_ reason not to store all the data is that
it then gets replicated. If you have master/slave or SolrCloud
with replicas, you have N copies of your index and each and
every one of them has a copy of all your stored data

Best,
Erick

On Mon, May 9, 2016 at 6:14 AM, Ali Nazemian  wrote:
> Dear Erick,
> Hi,
> Thank you very much. About the storing part you are right, unless the
> primary datastore uses some kind of data compression, which in my case it
> does (I am using Cassandra as the primary datastore); I am not sure whether
> Solr has any kind of compression or not.
> According to your reply, it seems that I have to do it the hard way, I
> mean using the primary datastore to rebuild the index from scratch.
>
> Sincerely,
>
> On Sun, May 8, 2016 at 11:07 PM, Erick Erickson 
> wrote:
>
>> bq: I would be grateful if somebody could introduce other way of
>> re-indexing
>> the whole data without using another datastore
>>
>> Not possible currently. Consider what's _in_ the index when stored="false".
>> The actual terms are the output of the entire analysis chain, including
>> stemming, stopword removal, synonym substitution etc. Since the
>> indexing process is lossy, you simply cannot reconstruct the original
>> stream from the indexed terms.
>>
>> I suppose one _could_ do this in the case of docValues only index with
>> the new return-values-from-docvalues functionality, but even that's lossy
>> because the order of returned values may not be the original insertion
>> order. And if that suits your needs, a pretty simple driver program would
>> suffice.
>>
>> To do this from indexed-only terms you'd have to somehow store the
>> original version of each term or store some codes indicating exactly
>> how to reconstruct the original stream, which very possibly would take
>> up as much space as if you'd just stored the values anyway. _And_ it
>> would burden every one else who didn't want to do this with a bloated
>> index.
>>
>> Best,
>> Erick
>>
>> On Sun, May 8, 2016 at 4:25 AM, Ali Nazemian 
>> wrote:
>> > Dear all,
>> > Hi,
>> > I was wondering, is it possible to re-index Solr 6.0 data in case of
>> > store=false? I am using Solr as a secondary datastore, and for the sake
>> of
>> > space efficiency all the fields (except id) are considered as
>> store=false.
>> > Currently, due to some changes in application business, Solr schema
>> should
>> > change, and in order to see the effect of changing schema on old data, I
>> > have to do the re-index process.  I know that one way of re-indexing in
>> > Solr is reading data from one collection (core) and inserting that to
>> > another one, but this solution is not possible for store=false fields,
>> and
>> > re-indexing the whole data through primary datastore is kind of costly,
>> so
>> > I would be grateful if somebody could introduce other way of re-indexing
>> > the whole data without using another datastore.
>> >
>> > Sincerely,
>> >
>> > --
>> > A.Nazemian
>>
>
>
>
> --
> A.Nazemian
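For reference, a minimal sketch of the docValues-only case Erick mentions;
useDocValuesAsStored lets Solr return such a field even with stored="false"
(the field name is hypothetical, and returned multi-values may not keep
insertion order):

  <field name="category" type="string" indexed="true" stored="false"
         docValues="true" useDocValuesAsStored="true"/>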


Re: Replicate Between sites

2016-05-09 Thread Erick Erickson
bq: How could a similar thing be done in 4.9.1?

That's not going to happen. More precisely,
there is zero chance that anyone will take on that
work unless it's a custom one-off that you
hire done or develop internally. And even
if someone took this on, it'd never be officially
released.

IOW, if you want to try backporting it on your own,
have at it but that'll be completely unsupported.

One thing people have done is create two
independent clusters, complete to separate ZK
ensembles and have the indexing client send
updates to both DCs. At that point it also makes
sense to have them both serve queries.

Another choice is to have your system-of-record
replicated to both DCs, and have the indexing
process run in both DCs from the local copy of
the system-of-record to the local Solr
clusters independently of each other.

Best,
Erick

On Mon, May 9, 2016 at 12:31 PM, Abdel Belkasri  wrote:
> Hi Alex,
>
> just started reading about CDCR, looks very promising. Is this only in
> 6.0? Our PROD servers are running 4.9.1 and we cannot upgrade just yet. How
> could a similar thing be done in 4.9.1?
>
> Thanks,
> --Abdel
>
> On Mon, May 9, 2016 at 2:59 PM, Alexandre Rafalovitch 
> wrote:
>
>> Have you looked at Cross Data Center replication that's the new big
>> feature in Solr 6.0?
>>
>> Regards,
>>Alex.
>> 
>> Newsletter and resources for Solr beginners and intermediates:
>> http://www.solr-start.com/
>>
>>
>> On 10 May 2016 at 02:13, Abdel Belkasri  wrote:
>> > Hi there,
>> >
>> > we have the main site setup as follows:
>> > SolrCloud:
>> > App --> smart client (SolrJ) --> ensemble of ZooKeeper --> SolrCloud nodes
>> > (with slice/shard/replica)
>> > Works fine.
>> >
>> > On the DR site we have a mirror setup, how can we keep the two site in
>> > sync, so that if something happened we point the app to DR and get back
>> up
>> > and running?
>> >
>> > Note: making zookeeper span the two sites is not an option because of
>> > network latency.
>> >
>> > We are looking for replication (the kind of master-slave that exists in
>> > classic Solr)... how is that achieved in SolrCloud?
>> >
>> > Thanks,
>> > --Abdel.
>>
>
>
>
> --
> Abdel K. Belkasri, PhD
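For reference, a minimal SolrJ 4.x sketch of the dual-write approach Erick
describes; the ZooKeeper host strings and collection name are placeholders,
and a real indexer would need retry/queueing so one DC being down doesn't
lose updates:

  import org.apache.solr.client.solrj.impl.CloudSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class DualWriter {
    public static void main(String[] args) throws Exception {
      // one client per independent cluster (separate ZK ensembles)
      CloudSolrServer mainDc = new CloudSolrServer("zk-main1:2181,zk-main2:2181,zk-main3:2181");
      CloudSolrServer drDc = new CloudSolrServer("zk-dr1:2181,zk-dr2:2181,zk-dr3:2181");
      mainDc.setDefaultCollection("mycollection");
      drDc.setDefaultCollection("mycollection");

      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "1");
      // send the same update to both DCs
      mainDc.add(doc);
      drDc.add(doc);
      mainDc.commit();
      drDc.commit();
    }
  }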


Re: Advice to add additional non-related fields to a collection or create a subset of it?

2016-05-09 Thread Erick Erickson
Not quite sure where you are at with this. It sounds
like your slow loading is fixed and was a coding
issue on your part, that happens to us all.

bq: Is it advisable to have as few
queries to Solr on a page as possible?

Of course it is advisable to have as few Solr queries
executed to display a page as possible. Every one
costs you at least _some_ turnaround time. You can
mitigate this (assuming your Solr server isn't running
flat out) by issuing the subsequent queries in parallel
threads.

But it's not really a question to me of advisability, it's a
question of what your application needs to deliver. The
use-case drives all. You can do some tricks like display
partial pages and fill in the rest behind the scenes to
display when your user clicks something and the like.

bq: In my case, by denormalizing, that means putting the
product and supplier information into one collection?
The supplier information are stored but not indexed in the collection.

It Depends(tm). If all you want to do is provide supplier
information when people do product searches then stored-only
is fine.

If you want to perform queries like "show me all the products
supplied by supplier X", then you need to index at least
some values too.

Best,
Erick

On Sun, May 8, 2016 at 10:36 PM, Derek Poh  wrote:
> Hi Erick
>
> In my case, by denormalizing, that means putting the product and supplier
> information into one collection?
> The supplier information is stored but not indexed in the collection.
>
> We have identified it was a combination of a loop and bad source data that
> caused an endless loop under a certain scenario.
>
> Is it advisable to have as few queries to Solr on a page as possible?
>
>
> On 5/6/2016 11:17 PM, Erick Erickson wrote:
>>
>> Denormalizing the data is usually the first thing to try. That's
>> certainly the preferred option if it doesn't bloat the index
>> unacceptably.
>>
>> But my real question is what have you done to try to figure out _why_
>> it's slow? Do you have some loop
>> like
>> for (each found document)
>>     extract all the supplier IDs and query Solr for them
>>
>> ? That's a fundamental design decision that will be expensive.
>>
>> Have you examined the time each query takes to see if Solr is really
>> the bottleneck or whether it's "something else"? Mind you, I have no
>> clue what "something else" is here
>>
>> Do you ever return lots of rows (i.e. thousands)?
>>
>> Solr serves queries very quickly, so I'd concentrate on identifying what
>> is slow before jumping to a solution
>>
>> Best,
>> Erick
>>
>> On Wed, May 4, 2016 at 10:28 PM, Derek Poh  wrote:
>>>
>>> Hi
>>>
>>> We have a "product" collection and a "supplier" collection.
>>> The "product" collection contains products information and "supplier"
>>> collection contains the product's suppliers information.
>>> We have a subsidiary page that query on "product" collection for the
>>> search.
>>> The display result include product and supplier information.
>>> This page will query the "product" collection to get the matching product
>>> records.
>>>  From this query a list of the matching product's supplier id is
>>> extracted
>>> and used in a filter query against the "supplier" collection to get the
>>> necessary supplier's information.
>>>
>>> The loading of this page is very slow, it leads to timeout at times as
>>> well.
>>> Besides looking at tweaking the code of the page, we are also looking at
>>> what
>>> tweaking can be done on the Solr side. Reducing the number of queries
>>> generated
>>> by this page was one of the options to try.
>>>
>>> The main "product" collection is also use by our site main search page
>>> and
>>> other subsidiary pages as well. So the query load on it is substantial.
>>> It has about 6.5 million documents and index size of 38-39 GB.
>>> It is setup as 1 shard with 5 replicas. Each replica is on it's own
>>> server.
>>> Total of 5 servers.
>>> There are other smaller collections with similar 1 shard 5 replicas setup
>>> residing on these servers as well.
>>>
>>> I am thinking of either
>>> 1. Index supplier information into the "product" collection.
>>> 2. Create another similar "product" collection for this page to use. This
>>> collection will have lesser product fields and will include the required
>>> supplier fields. But the number of documents in it will be the same as
>>> the
>>> main "product" collection. The index size will be smallerthough.
>>>
>>> With either 2 options we do not need to query "supplier" collection. So
>>> there is one less query and hopefully it will improve the performance of
>>> this page.
>>>
>>> What is the advise between the 2 options?
>>> Any other advice or options?
>>>
>>> Derek
>>>
>>> --
>>> CONFIDENTIALITY NOTICE
>>> This e-mail (including any attachments) may contain confidential and/or
>>> privileged information. If you are not the intended recipient or have
>>> received this e-mail in error, please inform the sender immediately and
>>> delete this e-mail.
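For reference, a minimal SolrJ sketch of the parallel-query idea Erick
mentions above, for queries that don't depend on each other's results; the
client URLs and query strings are placeholders (SolrJ 5.x-style client
construction assumed):

  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import java.util.concurrent.Future;
  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class ParallelQueries {
    public static void main(String[] args) throws Exception {
      SolrClient productClient = new HttpSolrClient("http://localhost:8983/solr/product");
      SolrClient supplierClient = new HttpSolrClient("http://localhost:8983/solr/supplier");

      ExecutorService pool = Executors.newFixedThreadPool(2);
      // issue both queries concurrently instead of back to back
      Future<QueryResponse> products = pool.submit(() -> productClient.query(new SolrQuery("widget")));
      Future<QueryResponse> suppliers = pool.submit(() -> supplierClient.query(new SolrQuery("supplier_id:(s1 s2 s3)")));
      QueryResponse productResults = products.get();   // blocks until finished
      QueryResponse supplierResults = suppliers.get();
      pool.shutdown();
    }
  }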

Re: Streaming expressions join operations

2016-05-09 Thread Joel Bernstein
The example is using two collections: people and pets. So these collections
would need to be present for the join expression to work.

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, May 9, 2016 at 10:43 PM, Ryan Cutter  wrote:

> Thanks Joel, I added the personId and ownerId fields before ingesting a
> little data.  I made them to be stored=true/multiValue=false/longs (and
> strings, later).  Is additional schema required?
>
> On Mon, May 9, 2016 at 6:45 PM, Joel Bernstein  wrote:
>
> > Hi,
> >
> > The example in the cwiki would require setting up the people and pets
> > collections. Unless I'm mistaken this won't work with the out of the box
> > schemas. So you'll need to setup some test schemas to get started.
> Although
> > having out of the box streaming schemas is a great idea.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Mon, May 9, 2016 at 9:22 PM, Ryan Cutter 
> wrote:
> >
> > > Hello, I'm checking out the cool stream join operations in Solr 6.0 but
> > > can't seem to get the example listed on the wiki to work:
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions#StreamingExpressions-innerJoin
> > >
> > > innerJoin(
> > >   search(people, q=*:*, fl="personId,name", sort="personId asc"),
> > >   search(pets, q=type:cat, fl="ownerId,petName", sort="ownerId asc"),
> > >   on="personId=ownerId"
> > > )
> > >
> > > ERROR - 2016-05-09 21:42:43.497; [c:pets s:shard1 r:core_node1
> > > x:pets_shard1_replica1] org.apache.solr.common.SolrException;
> > > java.io.IOException: java.lang.NullPointerException
> > >
> > > at
> > >
> > >
> >
> org.apache.solr.client.solrj.io.stream.CloudSolrStream.constructStreams(CloudSolrStream.java:339)
> > >
> > > 1. Joel Bernstein pointed me at SOLR-9058.  Is this the likely bug?
> > > 2. What kind of field should personId and ownerId be?  long, string,
> > > something else?
> > > 3. Does someone have an example schema or dataset that show off these
> > > joins?  If not, it's something I could work on for future souls.
> > >
> > > Thanks! Ryan
> > >
> >
>


Re: Solr edismax field boosting

2016-05-09 Thread Nick D
One thing to note: you can also tack on wt=ruby&indent=true; it makes the
debug explain data look better for pasting.

But what I am seeing is that the score is based entirely on the fact that it
found the content you were looking for in an unboosted field, i.e.
*_text_*, so your boosts don't look to be having any effect on scoring the
way you are set up currently.  Next, if you look at what is creating the
score difference, you can see it's being computed from the tfNorm values.

But maybe paste in a cleaner version of the debug, because getting all the
scoring lined up is a bit of a pain. You can get it to look something like
this with wt=ruby&indent=true:

'
10.541302 = (MATCH) sum of:
  10.541302 = (MATCH) max plus 0.01 times others of:
10.518621 = (MATCH) weight(ngram_tags_a:"developer group"~1 in 88)
[DefaultSimilarity], result of:
  10.518621 = score(doc=88,freq=1.0), product of:
0.64834845 = queryWeight, product of:
  16.223717 = idf(), sum of:
9.416559 = idf(docFreq=21, maxDocs=99469)
6.8071575 = idf(docFreq=298, maxDocs=99469)
  0.039963003 = queryNorm
16.223717 = fieldWeight in 88, product of:
  1.0 = tf(freq=1.0), with freq of:
1.0 = phraseFreq=1.0
  16.223717 = idf(), sum of:
9.416559 = idf(docFreq=21, maxDocs=99469)
6.8071575 = idf(docFreq=298, maxDocs=99469)
  1.0 = fieldNorm(doc=88)
0.32740274 = (MATCH) weight(ngram_content:"developer group"~1 in 88)
[DefaultSimilarity], result of:
  0.32740274 = score(doc=88,freq=2.0), product of:
0.38474476 = queryWeight, product of:
  9.627523 = idf(), sum of:
6.387304 = idf(docFreq=454, maxDocs=99469)
3.240219 = idf(docFreq=10586, maxDocs=99469)
  0.039963003 = queryNorm
0.85096085 = fieldWeight in 88, product of:
  1.4142135 = tf(freq=2.0), with freq of:
2.0 = phraseFreq=2.0
  9.627523 = idf(), sum of:
6.387304 = idf(docFreq=454, maxDocs=99469)
3.240219 = idf(docFreq=10586, maxDocs=99469)
  0.0625 = fieldNorm(doc=88)
1.9406005 = (MATCH) weight(ngram_label:"developer group"~1 in 88)
[DefaultSimilarity], result of:
  1.9406005 = score(doc=88,freq=1.0), product of:
0.556964 = queryWeight, product of:
  13.936991 = idf(), sum of:
9.3721075 = idf(docFreq=22, maxDocs=99469)
4.5648837 = idf(docFreq=2814, maxDocs=99469)
  0.039963003 = queryNorm
3.4842477 = fieldWeight in 88, product of:
  1.0 = tf(freq=1.0), with freq of:
1.0 = phraseFreq=1.0
  13.936991 = idf(), sum of:
9.3721075 = idf(docFreq=22, maxDocs=99469)
4.5648837 = idf(docFreq=2814, maxDocs=99469)
  0.25 = fieldNorm(doc=88)
'

Also, I don't know what Solr version you may be using, so your explain data
might look a bit different.

This link is a bit out of date but may help you understand how the scoring
works:
https://wiki.apache.org/solr/SolrRelevancyFAQ#How_are_documents_scored

Nick



On Mon, May 9, 2016 at 8:08 PM, Megha Bhandari 
wrote:

> To clarify the debug information given earlier, we changed the query
> fields (qf) to the following to ignore the title field completely:
>
> metatag.description^9 h1^7 h2^6 h3^5 h4^4 _text_^1 id^0.5
>
> But still title results are coming on top
>
> Full response with debug on:
>
> Full response
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":13,
> "params":{
>   "mm":"100%",
>   "q":"Foo",
>   "tie":"0.99",
>   "defType":"edismax",
>   "q.alt":"Foo",
>   "indent":"on",
>   "qf":"metatag.description^9 h1^7 h2^6 h3^5 h4^4 _text_^1 id^0.5",
>   "wt":"json",
>   "debugQuery":"on",
>   "_":"1462810987788"}},
>   "response":{"numFound":3,"start":0,"maxScore":0.8430033,"docs":[
>   {
> "h2":["Looks like your browser is a little out-of-date."],
> "h3":["Already a member?"],
> "title":"Foo Custon",
> "id":"
> http://localhost:4503/content/uhcdotcom/en/home/waysin/poc/Foo-custon.html
> ",
> "tstamp":"2016-05-09T17:15:57.604Z",
> "metatag.hideininternalsearch":[false],
> "segment":[20160509224553],
> "digest":["844296a63233b3e4089424fe1ec9d036"],
> "boost":[1.4142135],
> "lang":"en",
> "_version_":1533871839698223104,
> "host":"localhost",
> "url":"
> http://localhost:4503/content/uhcdotcom/en/home/waysin/poc/Foo-custon.html
> ",
> "score":0.8430033},
>   {
> "metatag.description":"test",
> "h1":["Health care"],
> "h2":["Looks like your browser is a little out-of-date."],
> "h3":["Already a member?"],
> "title":"Foo",
> "id":"
> http://localhost:4503/content/uhcdotcom/en/home/waysin/poc/Foo-custon/health-care.html
> ",
> "tstamp":"2016-05-09T17:15:57.838Z",
> "metatag.hideininternalsearc

Re: solrcloud performance problem

2016-05-09 Thread Shawn Heisey
On 5/9/2016 9:11 PM, lltvw wrote:
> You are right, the max heap is 512MB, thanks.

90 million documents split into 12 shards means 7.5 million documents
per shard.

With that many documents and a 512MB heap, you're VERY lucky if Solr
doesn't experience OutOfMemoryError problems -- which will make Solr's
behavior very unpredictable.

Your server has plenty of memory.  Because of the very small max heap,
it is probably spending a lot of time doing garbage collection.  You'll
actually need to *increase* your heap size.  I would recommend starting
with 4GB.

Exactly how to do this will depend on how you're starting it.  If you
are starting it with "bin/solr" then you can add a -m 4g option to that
commandline.

Thanks,
Shawn
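For reference, the two usual ways to set it (a sketch; SOLR_HEAP in the
include script is the persistent route for service installs):

  # one-off on the command line
  bin/solr start -m 4g

  # or persistently in bin/solr.in.sh (solr.in.cmd on Windows)
  SOLR_HEAP="4g"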



Re: Query String Limit

2016-05-09 Thread Shawn Heisey
On 5/9/2016 8:31 PM, Zheng Lin Edwin Yeo wrote:
> Would like to check: when increasing maxBooleanClauses to a large
> value, if there are 50 collections in my cluster, will I have to increase
> that in the solrconfig.xml for all 50 collections before it will work?

The maxBooleanClauses is a global Lucene setting across the entire JVM. 
The last core that gets started (the order is not completely
controllable) will set the value, so if that core's config doesn't have
a value set, it will go back to the default of 1024.

Which means that yes, you must set it in EVERY core's configuration.

Thanks,
Shawn



Re:Re: solrcloud performance problem

2016-05-09 Thread lltvw
Hi Shawn,

You are right, the max heap is 512MB, thanks.




--
Sent from my NetEase Mail mobile edition


On 2016-05-10 10:02:44, "Shawn Heisey" wrote:
>On 5/9/2016 4:41 PM, lltvw wrote:
>> Shawn, thanks.
>>
>> Each machine has 48G of memory installed, and now has 20G free. I checked the
>> JVM heap size using the Solr admin UI; the heap size is about 20M.
>
>What is the *max* heap?  An unmodified install of Solr 5.x or later has
>a max heap of 512MB.
>
>In the admin UI, there are three numbers for "JVM Memory".  The max heap
>is the largest of the three numbers.
>
>Thanks,
>Shawn
>


RE: Solr edismax field boosting

2016-05-09 Thread Megha Bhandari
To clarify the debug information given earlier, we changed the query fields (qf)
to the following to ignore the title field completely:

metatag.description^9 h1^7 h2^6 h3^5 h4^4 _text_^1 id^0.5

But still title results are coming on top

Full response with debug on:

Full response

{
  "responseHeader":{
"status":0,
"QTime":13,
"params":{
  "mm":"100%",
  "q":"Foo",
  "tie":"0.99",
  "defType":"edismax",
  "q.alt":"Foo",
  "indent":"on",
  "qf":"metatag.description^9 h1^7 h2^6 h3^5 h4^4 _text_^1 id^0.5",
  "wt":"json",
  "debugQuery":"on",
  "_":"1462810987788"}},
  "response":{"numFound":3,"start":0,"maxScore":0.8430033,"docs":[
  {
"h2":["Looks like your browser is a little out-of-date."],
"h3":["Already a member?"],
"title":"Foo Custon",

"id":"http://localhost:4503/content/uhcdotcom/en/home/waysin/poc/Foo-custon.html";,
"tstamp":"2016-05-09T17:15:57.604Z",
"metatag.hideininternalsearch":[false],
"segment":[20160509224553],
"digest":["844296a63233b3e4089424fe1ec9d036"],
"boost":[1.4142135],
"lang":"en",
"_version_":1533871839698223104,
"host":"localhost",

"url":"http://localhost:4503/content/uhcdotcom/en/home/waysin/poc/Foo-custon.html";,
"score":0.8430033},
  {
"metatag.description":"test",
"h1":["Health care"],
"h2":["Looks like your browser is a little out-of-date."],
"h3":["Already a member?"],
"title":"Foo",

"id":"http://localhost:4503/content/uhcdotcom/en/home/waysin/poc/Foo-custon/health-care.html";,
"tstamp":"2016-05-09T17:15:57.838Z",
"metatag.hideininternalsearch":[false],
"metatag.topresultthumbnailalt":[","],
"segment":[20160509224553],
"digest":["dd4ef8879be2d4d3f28e24928e9b84c5"],
"boost":[1.4142135],
"lang":"en",
"metatag.keywords":",",
"_version_":1533871839731777536,
"host":"localhost",

"url":"http://localhost:4503/content/uhcdotcom/en/home/waysin/poc/Foo-custon/health-care.html";,
"score":0.7009616},
  {
"metatag.description":"Foo decription testing",
"h1":["healthcare description"],
"h2":["Looks like your browser is a little out-of-date."],
"h3":["Already a member?"],
"title":"healthcare description",

"id":"http://localhost:4503/content/uhcdotcom/en/home/waysin/poc/Foo-custon/healthcare-description.html";,
"tstamp":"2016-05-09T17:15:57.682Z",
"metatag.hideininternalsearch":[false],
"metatag.topresultthumbnailalt":[","],
"segment":[20160509224553],
"digest":["6262795db6aed05a5de7cc3cbe496401"],
"boost":[1.4142135],
"lang":"en",
"metatag.keywords":",",
"_version_":1533871839739117568,
"host":"localhost",

"url":"http://localhost:4503/content/uhcdotcom/en/home/waysin/poc/Foo-custon/healthcare-description.html";,
"score":0.5481102}]
  },
  "debug":{
"rawquerystring":"Foo",
"querystring":"Foo",
"parsedquery":"(+DisjunctionMaxQuery(((metatag.description:Foo)^9.0 | 
(h1:Foo)^7.0 | (h2:Foo)^6.0 | (h3:Foo)^5.0 | (id:Foo)^0.5 | (h4:Foo)^4.0 | 
_text_:Foo)~0.99))/no_coord",
"parsedquery_toString":"+((metatag.description:Foo)^9.0 | (h1:Foo)^7.0 | 
(h2:Foo)^6.0 | (h3:Foo)^5.0 | (id:Foo)^0.5 | (h4:Foo)^4.0 | _text_:Foo)~0.99",
"explain":{
  
"http://localhost:4503/content/uhcdotcom/en/home/waysin/poc/Foo-custon.html":"\n0.84300333
 = max plus 0.99 times others of:\n  0.84300333 = weight(_text_:Foo in 0) [], 
result of:\n0.84300333 = score(doc=0,freq=6.0 = termFreq=6.0\n), product 
of:\n  0.44183275 = idf(docFreq=4, docCount=6)\n  1.9079694 = tfNorm, 
computed from:\n6.0 = termFreq=6.0\n1.2 = parameter k1\n
0.75 = parameter b\n121.64 = avgFieldLength\n83.591835 = 
fieldLength\n",
  
"http://localhost:4503/content/uhcdotcom/en/home/waysin/poc/Foo-custon/health-care.html":"\n0.7009616
 = max plus 0.99 times others of:\n  0.7009616 = weight(_text_:Foo in 3) [], 
result of:\n0.7009616 = score(doc=3,freq=2.0 = termFreq=2.0\n), product 
of:\n  0.44183275 = idf(docFreq=4, docCount=6)\n  1.5864862 = tfNorm, 
computed from:\n2.0 = termFreq=2.0\n1.2 = parameter k1\n
0.75 = parameter b\n121.64 = avgFieldLength\n64.0 = 
fieldLength\n",
  
"http://localhost:4503/content/uhcdotcom/en/home/waysin/poc/Foo-custon/healthcare-description.html":"\n0.5481102
 = max plus 0.99 times others of:\n  0.5481102 = weight(_text_:Foo in 4) [], 
result of:\n0.5481102 = score(doc=4,freq=1.0 = termFreq=1.0\n), product 
of:\n  0.44183275 = idf(docFreq=4, docCount=6)\n  1.2405376 = tfNorm, 
computed from:\n1.0 = termFreq=1.0\n1.2 = parameter k1\n
0.75 = parameter b\n121.64 = avgFieldLength\

Re: Streaming expressions join operations

2016-05-09 Thread Ryan Cutter
Thanks Joel, I added the personId and ownerId fields before ingesting a
little data.  I made them to be stored=true/multiValue=false/longs (and
strings, later).  Is additional schema required?

On Mon, May 9, 2016 at 6:45 PM, Joel Bernstein  wrote:

> Hi,
>
> The example in the cwiki would require setting up the people and pets
> collections. Unless I'm mistaken this won't work with the out of the box
> schemas. So you'll need to setup some test schemas to get started. Although
> having out of the box streaming schemas is a great idea.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, May 9, 2016 at 9:22 PM, Ryan Cutter  wrote:
>
> > Hello, I'm checking out the cool stream join operations in Solr 6.0 but
> > can't seem to get the example listed on the wiki to work:
> >
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions#StreamingExpressions-innerJoin
> >
> > innerJoin(
> >   search(people, q=*:*, fl="personId,name", sort="personId asc"),
> >   search(pets, q=type:cat, fl="ownerId,petName", sort="ownerId asc"),
> >   on="personId=ownerId"
> > )
> >
> > ERROR - 2016-05-09 21:42:43.497; [c:pets s:shard1 r:core_node1
> > x:pets_shard1_replica1] org.apache.solr.common.SolrException;
> > java.io.IOException: java.lang.NullPointerException
> >
> > at
> >
> >
> org.apache.solr.client.solrj.io.stream.CloudSolrStream.constructStreams(CloudSolrStream.java:339)
> >
> > 1. Joel Bernstein pointed me at SOLR-9058.  Is this the likely bug?
> > 2. What kind of field should personId and ownerId be?  long, string,
> > something else?
> > 3. Does someone have an example schema or dataset that show off these
> > joins?  If not, it's something I could work on for future souls.
> >
> > Thanks! Ryan
> >
>


RE: Solr edismax field boosting

2016-05-09 Thread Megha Bhandari
Hi

Following is the debug information with debug=true

Excerpt of debug information :

"debug":{
"rawquerystring":"Upendra",
"querystring":"Upendra",
"parsedquery":"(+DisjunctionMaxQuery(((metatag.description:Upendra)^9.0 | 
(h1:Upendra)^7.0 | (h2:Upendra)^6.0 | (h3:Upendra)^5.0 | (id:Upendra)^0.5 | 
(h4:Upendra)^4.0 | _text_:upendra)~0.99))/no_coord",
"parsedquery_toString":"+((metatag.description:Upendra)^9.0 | 
(h1:Upendra)^7.0 | (h2:Upendra)^6.0 | (h3:Upendra)^5.0 | (id:Upendra)^0.5 | 
(h4:Upendra)^4.0 | _text_:upendra)~0.99",
"explain":{
  
"http://localhost:4503/content/uhcdotcom/en/home/waysin/poc/upendra-custon.html":"\n0.84300333
 = max plus 0.99 times others of:\n  0.84300333 = weight(_text_:upendra in 0) 
[], result of:\n0.84300333 = score(doc=0,freq=6.0 = termFreq=6.0\n), 
product of:\n  0.44183275 = idf(docFreq=4, docCount=6)\n  1.9079694 = 
tfNorm, computed from:\n6.0 = termFreq=6.0\n1.2 = parameter 
k1\n0.75 = parameter b\n121.64 = avgFieldLength\n
83.591835 = fieldLength\n",
  
"http://localhost:4503/content/uhcdotcom/en/home/waysin/poc/upendra-custon/health-care.html":"\n0.7009616
 = max plus 0.99 times others of:\n  0.7009616 = weight(_text_:upendra in 3) 
[], result of:\n0.7009616 = score(doc=3,freq=2.0 = termFreq=2.0\n), product 
of:\n  0.44183275 = idf(docFreq=4, docCount=6)\n  1.5864862 = tfNorm, 
computed from:\n2.0 = termFreq=2.0\n1.2 = parameter k1\n
0.75 = parameter b\n121.64 = avgFieldLength\n64.0 = 
fieldLength\n",
  
"http://localhost:4503/content/uhcdotcom/en/home/waysin/poc/upendra-custon/healthcare-description.html":"\n0.5481102
 = max plus 0.99 times others of:\n  0.5481102 = weight(_text_:upendra in 4) 
[], result of:\n0.5481102 = score(doc=4,freq=1.0 = termFreq=1.0\n), product 
of:\n  0.44183275 = idf(docFreq=4, docCount=6)\n  1.2405376 = tfNorm, 
computed from:\n1.0 = termFreq=1.0\n1.2 = parameter k1\n
0.75 = parameter b\n121.64 = avgFieldLength\n64.0 = 
fieldLength\n"},
"QParser":"ExtendedDismaxQParser",
"altquerystring":null,
"boost_queries":null,


Full response

{
  "responseHeader":{
"status":0,
"QTime":13,
"params":{
  "mm":"100%",
  "q":"Upendra",
  "tie":"0.99",
  "defType":"edismax",
  "q.alt":"Upendra",
  "indent":"on",
  "qf":"metatag.description^9 h1^7 h2^6 h3^5 h4^4 _text_^1 id^0.5",
  "wt":"json",
  "debugQuery":"on",
  "_":"1462810987788"}},
  "response":{"numFound":3,"start":0,"maxScore":0.8430033,"docs":[
  {
"h2":["Looks like your browser is a little out-of-date."],
"h3":["Already a member?"],
"strtitle":"Upendra Custon",

"id":"http://localhost:4503/content/uhcdotcom/en/home/waysin/poc/upendra-custon.html";,
"tstamp":"2016-05-09T17:15:57.604Z",
"metatag.hideininternalsearch":[false],
"segment":[20160509224553],
"digest":["844296a63233b3e4089424fe1ec9d036"],
"boost":[1.4142135],
"lang":"en",
"_version_":1533871839698223104,
"host":"localhost",

"url":"http://localhost:4503/content/uhcdotcom/en/home/waysin/poc/upendra-custon.html";,
"score":0.8430033},
  {
"metatag.description":"test",
"h1":["Health care"],
"h2":["Looks like your browser is a little out-of-date."],
"h3":["Already a member?"],
"strtitle":"Upendra",

"id":"http://localhost:4503/content/uhcdotcom/en/home/waysin/poc/upendra-custon/health-care.html";,
"tstamp":"2016-05-09T17:15:57.838Z",
"metatag.hideininternalsearch":[false],
"metatag.topresultthumbnailalt":[","],
"segment":[20160509224553],
"digest":["dd4ef8879be2d4d3f28e24928e9b84c5"],
"boost":[1.4142135],
"lang":"en",
"metatag.keywords":",",
"_version_":1533871839731777536,
"host":"localhost",

"url":"http://localhost:4503/content/uhcdotcom/en/home/waysin/poc/upendra-custon/health-care.html";,
"score":0.7009616},
  {
"metatag.description":"Upendra decription testing",
"h1":["healthcare description"],
"h2":["Looks like your browser is a little out-of-date."],
"h3":["Already a member?"],
"strtitle":"healthcare description",

"id":"http://localhost:4503/content/uhcdotcom/en/home/waysin/poc/upendra-custon/healthcare-description.html";,
"tstamp":"2016-05-09T17:15:57.682Z",
"metatag.hideininternalsearch":[false],
"metatag.topresultthumbnailalt":[","],
"segment":[20160509224553],
"digest":["6262795db6aed05a5de7cc3cbe496401"],
"boost":[1.4142135],
"lang":"en",
"metatag.keywords":",",
"_version_":1533871839739117568,
"host":"localhost",

"url":"http://localhost:4503/c

Re: Query String Limit

2016-05-09 Thread Zheng Lin Edwin Yeo
Hi Prasanna

Would like to check: when increasing maxBooleanClauses to a large
value, if there are 50 collections in my cluster, will I have to increase
that in the solrconfig.xml for all 50 collections before it will work?

Regards,
Edwin


On 6 May 2016 at 23:42, Erick Erickson  wrote:

> By the way, this is the use-case for the TermsQueryParser
> rather than a standard clause, see:
>
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser
>
> I _think_ that this doesn't trip the maxBooleanClauses bits...
>
> Best,
> Erick
>
> On Fri, May 6, 2016 at 5:01 AM, Prasanna S. Dhakephalkar
>  wrote:
> > Hi,
> >
> > This got resolved. Needed to do 2 things
> >
> > 1. maxBooleanClauses needed to be set to a large value (up from 1024) in
> solrconfig.xml for all cores.
> > 2. In the jetty.xml file, solr.jetty.request.header.size needed to be set to
> a higher value (up from 8192)
> >
> > Thanks all for giving pointers to come to a solution.
> >
> > Regards,
> >
> > Prasanna.
> >
> > -Original Message-
> > From: Susmit Shukla [mailto:shukla.sus...@gmail.com]
> > Sent: Thursday, May 5, 2016 11:31 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Query String Limit
> >
> > Hi Prasanna,
> >
> > What is the exact number you set it to?
> > What error did you get on solr console and in the solr logs?
> > Did you reload the core/restarted solr after bumping up the solrconfig?
> >
> > Thanks,
> > Susmit
> >
> > On Wed, May 4, 2016 at 9:45 PM, Prasanna S. Dhakephalkar <
> prasann...@merajob.in> wrote:
> >
> >> Hi
> >>
> >> We had increased the maxBooleanClauses to a large number, but it did
> >> not work
> >>
> >> Here is the query
> >>
> >>
> >> http://localhost:8983/solr/collection1/select?fq=record_id%3A(604929+5
> >> 04197+
> >>
> >>
> 500759+510957+624719+524081+544530+375687+494822+468221+553049+441998+495212
> >>
> >>
> +462613+623866+344379+462078+501936+189274+609976+587180+620273+479690+60601
> >>
> >>
> 8+487078+496314+497899+374231+486707+516582+74518+479684+1696152+1090711+396
> >>
> >>
> 784+377205+600603+539686+550483+436672+512228+1102968+600604+487699+612271+4
> >>
> >>
> 87978+433952+479846+492699+380838+412290+487086+515836+487957+525335+495426+
> >>
> >> 619724+49726+444558+67422+368749+630542+473638+613887+1679503+509367+1
> >> 619724+49726+444558+67422+368749+630542+473638+613887+1679503+509367+1
> >> 619724+49726+444558+67422+368749+630542+473638+613887+1679503+509367+0
> >> 619724+49726+444558+67422+368749+630542+473638+613887+1679503+509367+8
> >> 619724+49726+444558+67422+368749+630542+473638+613887+1679503+509367+2
> >> 619724+49726+444558+67422+368749+630542+473638+613887+1679503+509367+9
> >> 619724+49726+444558+67422+368749+630542+473638+613887+1679503+509367+9
> >>
> >>
> +498818+528683+530270+595087+468595+585998+487888+600612+515884+455568+60643
> >>
> >>
> 8+526281+497992+460147+587530+576456+526021+790508+486148+469160+365923+4846
> >>
> >>
> 54+510829+488792+610933+254610+632700+522376+594418+514817+439283+1676569+52
> >>
> >>
> 4031+431557+521628+609255+627205+1255921+57+477017+519675+548373+350309+
> >>
> >>
> 491176+524276+570935+549458+495765+512814+494722+382249+619036+477309+487718
> >>
> >>
> +470604+514622+1240902+570607+613830+519130+479708+630293+496994+623870+5706
> >>
> >>
> 72+390434+483496+609115+490875+443859+292168+522383+501802+606498+596773+479
> >>
> >>
> 881+486020+488654+490422+512636+495512+489480+626269+614618+498967+476988+47
> >>
> >>
> 7608+486568+270095+295480+478367+607120+583892+593474+494373+368030+484522+5
> >>
> >>
> 01183+432822+448109+553418+584084+614868+486206+481014+495027+501880+479113+
> >>
> >>
> 615208+488161+512278+597663+569409+139097+489490+584000+493619+607479+281080
> >>
> >>
> +518617+518803+487896+719003+584153+484341+505689+278177+539722+548001+62529
> >>
> >>
> 6+1676456+507566+619039+501882+530385+474125+293642+612857+568418+640839+519
> >>
> >>
> 893+524335+612859+618762+479460+479719+593700+573677+525991+610965+462087+52
> >>
> >>
> 1251+501197+443642+1684784+533972+510695+475499+490644+613829+613893+479467+
> >>
> >>
> 542478+1102898+499230+436921+458632+602303+488468+1684407+584373+494603+4992
> >>
> >>
> 45+548019+600436+606997+59+503156+440428+518759+535013+548023+494273+649
> >>
> >>
> 062+528704+469282+582249+511250+496466+497675+505937+489504+600444+614240+19
> >>
> >>
> 35577+464232+522398+613809+1206232+607149+607644+498059+506810+487115+550976
> >>
> >>
> +638174+600849+525655+625011+500082+606336+507156+487887+333601+457209+60111
> >>
> >>
> 0+494927+1712081+601280+486061+501558+600451+263864+527378+571918+472415+608
> >>
> >>
> 130+212386+380460+590400+478850+631886+486782+608013+613824+581767+527023+62
> >>
> >>
> 3207+607013+505819+485418+486786+537626+507047+92+527473+495520+553141+5
> >>
> >>
> 17837+497295+563266+495506+532725+267057+497321+453249+524341+429654+720001+
> >>
> >>
> 539946+490813+479491+479628+479630+1125985+351147+524296+565077+439949+61241
> >>
> >>
> 3+495854+479493
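For reference, a minimal sketch of Erick's TermsQueryParser suggestion applied
to the record_id filter above, with just the first few IDs shown; he believes
this form avoids the maxBooleanClauses limit, and sending the fq as a POST
parameter also sidesteps the request-header size limit:

  fq={!terms f=record_id}604929,504197,500759,510957,624719,...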

Re: Solr edismax field boosting

2016-05-09 Thread Nick D
You can add the debug flag to the end of the request and see exactly what
the scoring is and why things are happening.

&debug=ALL will show you everything including the scoring.

Showing the result of the debug query should help you (or, if you add it into
your question here, help us) decipher what is going on with your scoring and
how the boosts are(n't) working.

Nick

On Mon, May 9, 2016 at 7:22 PM, Megha Bhandari 
wrote:

> Hi
>
> We are trying to boost certain fields with relevancy. However, we are not
> getting results as per expectation. Below is the configuration in
> solrconfig.xml.
> Even though the title field has a lesser boost than metatag.description,
> results for the title field are coming higher.
>
> We even created test data that has data only in the description in
> metatag.description and title. Example: page 1 has foo in the description and
> page 2 has foo in the title. Solr is still returning page 2 before page 1.
>
> We are using Solr 5.5 and Nutch 1.11 currently.
>
> Following is the configuration we are using. Any ideas on what we are
> missing to enable correct field boosting?
>
> 
> 
>   
> metatag.keywords^10 metatag.description^9 title^8 h1^7 h2^6 h3^5
> h4^4 id _text_^1
>   
>   explicit
>   10
>
>   
>
>   explicit
>   _text_
>   default
>   on
>   false
>   10
>   5
>   5
>   false
>   true
>   10
>   5
> 
>   id title metatag.description itemtype
> lang metatag.hideininternalsearch metatag.topresultthumbnailalt
> metatag.topresultthumbnailurl playerid playerkey
>   on
>   0
>   title metatag.description
>   
>   
> 
> 
>   spellcheck
> elevator
> 
>   
>
> Thanks
> Megha
>


Solr edismax field boosting

2016-05-09 Thread Megha Bhandari
Hi

We are trying to boost certain fields with relevancy. However, we are not
getting results as per expectation. Below is the configuration in
solrconfig.xml.
Even though the title field has a lesser boost than metatag.description,
results for the title field are coming higher.

We even created test data that has data only in the description in
metatag.description and title. Example: page 1 has foo in the description and
page 2 has foo in the title. Solr is still returning page 2 before page 1.

We are using Solr 5.5 and Nutch 1.11 currently.

Following is the configuration we are using. Any ideas on what we are missing 
to enable correct field boosting?



  
metatag.keywords^10 metatag.description^9 title^8 h1^7 h2^6 h3^5 h4^4 
id _text_^1
  
  explicit
  10

  

  explicit
  _text_
  default
  on
  false
  10
  5
  5
  false
  true
  10
  5

  id title metatag.description itemtype lang 
metatag.hideininternalsearch metatag.topresultthumbnailalt 
metatag.topresultthumbnailurl playerid playerkey
  on
  0
  title metatag.description
  
  


  spellcheck
elevator

  

Thanks
Megha


Re: solrcloud performance problem

2016-05-09 Thread Shawn Heisey
On 5/9/2016 4:41 PM, lltvw wrote:
> Shawn, thanks.
>
> Each machine has 48G of memory installed, with about 20G free. I checked
> the JVM heap size using the Solr admin UI; the heap size is about 20M.

What is the *max* heap?  An unmodified install of Solr 5.x or later has
a max heap of 512MB.

In the admin UI, there are three numbers for "JVM Memory".  The max heap
is the largest of the three numbers.

Thanks,
Shawn



Re: Streaming expressions join operations

2016-05-09 Thread Joel Bernstein
Hi,

The example in the cwiki would require setting up the people and pets
collections. Unless I'm mistaken this won't work with the out of the box
schemas. So you'll need to setup some test schemas to get started. Although
having out of the box streaming schemas is a great idea.
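
Once the collections exist, a rough sketch of sending that expression to
the /stream handler from SolrJ (treat this as an outline; the SolrStream
constructor moved from a raw Map to SolrParams during the 6.x line):

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.SolrStream;

public class InnerJoinExample {
  public static void main(String[] args) throws Exception {
    String expr = "innerJoin("
        + "search(people, q=*:*, fl=\"personId,name\", sort=\"personId asc\"),"
        + "search(pets, q=type:cat, fl=\"ownerId,petName\", sort=\"ownerId asc\"),"
        + "on=\"personId=ownerId\")";
    Map<String, String> props = new HashMap<>();
    props.put("qt", "/stream"); // let the server-side handler parse the expression
    props.put("expr", expr);
    SolrStream stream = new SolrStream("http://localhost:8983/solr/people", props);
    try {
      stream.open();
      for (Tuple t = stream.read(); !t.EOF; t = stream.read()) {
        System.out.println(t.getString("name") + " -> " + t.getString("petName"));
      }
    } finally {
      stream.close();
    }
  }
}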

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, May 9, 2016 at 9:22 PM, Ryan Cutter  wrote:

> Hello, I'm checking out the cool stream join operations in Solr 6.0 but
> can't seem to get the example listed on the wiki to work:
>
>
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions#StreamingExpressions-innerJoin
>
> innerJoin(
>   search(people, q=*:*, fl="personId,name", sort="personId asc"),
>   search(pets, q=type:cat, fl="ownerId,petName", sort="ownerId asc"),
>   on="personId=ownerId"
> )
>
> ERROR - 2016-05-09 21:42:43.497; [c:pets s:shard1 r:core_node1
> x:pets_shard1_replica1] org.apache.solr.common.SolrException;
> java.io.IOException: java.lang.NullPointerException
>
> at
>
> org.apache.solr.client.solrj.io.stream.CloudSolrStream.constructStreams(CloudSolrStream.java:339)
>
> 1. Joel Bernstein pointed me at SOLR-9058.  Is this the likely bug?
> 2. What kind of field should personId and ownerId be?  long, string,
> something else?
> 3. Does someone have an example schema or dataset that show off these
> joins?  If not, it's something I could work on for future souls.
>
> Thanks! Ryan
>


building solr with an error UNRESOLVED DEPENDENCIES

2016-05-09 Thread ??????jwos
When I try to compile Solr 5.4.0 from source, I get an error:


[ivy:retrieve]  report for org.apache.solr#core;working@zozt-PC test.MiniKdc 
produced in F:\solr-5.4.0-src 
(1)\solr-5.4.0\lucene\build\ivy-resolution-cache\org.apache.solr-core-test.MiniKdc.xml
[ivy:retrieve]  resolve done (1982ms resolve - 156ms download)
[ivy:retrieve]
[ivy:retrieve] :: problems summary ::
[ivy:retrieve]  WARNINGS
[ivy:retrieve]  ::
[ivy:retrieve]  ::  UNRESOLVED DEPENDENCIES ::
[ivy:retrieve]  ::
[ivy:retrieve]  :: 
org.apache.directory.server#apacheds-interceptors-admin;2.0.0-M15: 
configuration not found in 
org.apache.directory.server#apacheds-interceptors-admin;2.0.0-M15: 'master'. It 
was required from org.apache.solr#core;working@zozt-PC test.MiniKdc



I'm not familiar with Ivy and couldn't figure out what the problem is. Has
anyone met this situation before?
Please give me some advice, thanks a lot

Re: Indexing xml documents using solrj 6.0 + solr 6.0

2016-05-09 Thread Abdel Belkasri
Did you look at this:
https://cwiki.apache.org/confluence/display/solr/Using+SolrJ
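
For the folder use case, a minimal sketch along the lines of that page
(URL, folder path and the use of the file name as literal.id are
placeholder choices; the /update/extract handler, i.e. Solr Cell/Tika, must
be enabled in solrconfig.xml, and the plain HttpSolrClient constructor was
deprecated in favor of a Builder in later SolrJ releases):

import java.io.File;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class FolderIndexer {
  public static void main(String[] args) throws Exception {
    SolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");
    File[] files = new File("/path/to/docs").listFiles();
    if (files == null) return; // not a directory
    for (File f : files) {
      // /update/extract routes the raw file through Tika (Solr Cell)
      ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
      req.addFile(f, "application/octet-stream"); // let Tika detect the real type
      req.setParam("literal.id", f.getName());    // supply the required unique key
      client.request(req);
    }
    client.commit();
    client.close();
  }
}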

Regards,
--Abdel.

On Mon, May 9, 2016 at 1:32 PM, Mat San  wrote:

> Hello,
>
> Could I please ask for urgent help, since I'm new to SolrJ and Solr? I've
> read all the documentation but did not find a complete example in Java of
> how to index arbitrary XML documents and rich documents. (These documents
> are placed in a folder.)
>
> Can somebody provide some examples please (Java code) ??
>
>
> Many thanks in advance,
>
> Matteo
>



-- 
Abdel K. Belkasri, PhD


Streaming expressions join operations

2016-05-09 Thread Ryan Cutter
Hello, I'm checking out the cool stream join operations in Solr 6.0 but
can't seem to get the example listed on the wiki to work:

https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions#StreamingExpressions-innerJoin

innerJoin(
  search(people, q=*:*, fl="personId,name", sort="personId asc"),
  search(pets, q=type:cat, fl="ownerId,petName", sort="ownerId asc"),
  on="personId=ownerId"
)

ERROR - 2016-05-09 21:42:43.497; [c:pets s:shard1 r:core_node1
x:pets_shard1_replica1] org.apache.solr.common.SolrException;
java.io.IOException: java.lang.NullPointerException

at
org.apache.solr.client.solrj.io.stream.CloudSolrStream.constructStreams(CloudSolrStream.java:339)

1. Joel Bernstein pointed me at SOLR-9058.  Is this the likely bug?
2. What kind of field should personId and ownerId be?  long, string,
something else?
3. Does someone have an example schema or dataset that show off these
joins?  If not, it's something I could work on for future souls.

Thanks! Ryan


Re:Re: solrcloud performance problem

2016-05-09 Thread lltvw
Shawn, thanks.

Each machine has 48G of memory installed, with about 20G free. I checked
the JVM heap size using the Solr admin UI; the heap size is about 20M.




--
Sent from my NetEase Mail mobile client


On 2016-05-10 02:04:22, "Shawn Heisey"  wrote:
>On 5/9/2016 10:52 AM, lltvw wrote:
>> Sorry, I missed the size of each shard, the size is about 3G each. Thanks.
>>
>> On 2016-05-10 00:41:13, lltvw  wrote:
>>> Recently we set up a 4.10 solrcloud env with about 9000 docs indexed in
>>> it. This solrcloud has 12 shards, each shard on a separate machine, but
>>> when we try to search for some information on solrcloud, the response
>>> time is about 300ms.
>
>How much RAM is installed in each of these servers, and what is the max
>heap size on the Solr instance?
>
>Best guess right now, with limited information, is that you will need to
>install more memory on each server, or possibly reduce the max heap size
>so there's more memory available to the OS for caching your index.
>
>Thanks,
>Shawn
>


Error on creating new collection with existing configs

2016-05-09 Thread Jay Potharaju
Hi,
I created a new config and uploaded it to zk with the name test_conf. And
then created a collection which uses this config.

CREATE COLLECTION:
/solr/admin/collections?action=CREATE&name=test2&numShards=1&replicationFactor=2&collection.configName=test_conf

 When indexing the data using DIH I get an error.

org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode

for /configs/test2/dataimport.properties


When I create the collection using the command line and don't pass the
config name but just the conf dir, DIH indexing works.

Using Solr 5.5

Am I missing something??

-- 
Thanks
Jay


Indexing xml documents using solrj 6.0 + solr 6.0

2016-05-09 Thread Mat San
Hello,

Could I please ask for urgent help, since I'm new to SolrJ and Solr? I've
read all the documentation but did not find a complete example in Java of
how to index arbitrary XML documents and rich documents. (These documents
are placed in a folder.)

Can somebody provide some examples please (Java code) ??


Many thanks in advance,

Matteo


Solr Grouping

2016-05-09 Thread Srinivas Mudam
Hi

How can I customize the group limit?

I have 5 groups, and I want a different limit for each group, like 3,3,3,2,1.

Could you please provide a solution for this?


Thanks,
Srinivas Mudam.


Re: Filter queries & caching

2016-05-09 Thread Jay Potharaju
Thanks Ahmet... but I am still not clear how adding the filter() option is
better, or whether it is the same as the filterCache.

My question is below.

"As mentioned above adding filter() will add the filter query to the cache.
This would mean that results are fetched from cache instead of running n
number of filter queries  in parallel.
Is it necessary to use the filter() option? I was under the impression that
all filter queries will get added to the "filtercache". What is the
advantage of using filter()?"

Thanks

On Sun, May 8, 2016 at 6:30 PM, Ahmet Arslan 
wrote:

> Hi,
>
> As I understand it useful incase you use an OR operator between two
> restricting clauses.
> Recall that multiple fq means implicit AND.
>
> ahmet
>
>
>
> On Monday, May 9, 2016 4:02 AM, Jay Potharaju 
> wrote:
> As mentioned above adding filter() will add the filter query to the cache.
> This would mean that results are fetched from cache instead of running n
> number of filter queries  in parallel.
> Is it necessary to use the filter() option? I was under the impression that
> all filter queries will get added to the "filtercache". What is the
> advantage of using filter()?
>
> *From doc:
> https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig*
> This cache is used by SolrIndexSearcher for filters (DocSets) for unordered
> sets of all documents that match a query. The numeric attributes control
> the number of entries in the cache.
> Solr uses the filterCache to cache results of queries that use the fq
> search parameter. Subsequent queries using the same parameter setting
> result in cache hits and rapid returns of results. See Searching for a
> detailed discussion of the fq parameter.
>
> *From Yonik's site: http://yonik.com/solr/query-syntax/#FilterQuery
> *
>
> (Since Solr 5.4)
>
> A filter query retrieves a set of documents matching a query from the
> filter cache. Since scores are not cached, all documents that match the
> filter produce the same score (0 by default). Cached filters will be
> extremely fast when they are used again in another query.
>
>
> Thanks
>
>
> On Fri, May 6, 2016 at 9:46 AM, Jay Potharaju 
> wrote:
>
> > We have high query load and considering that I think the suggestions made
> > above will help with performance.
> > Thanks
> > Jay
> >
> > On Fri, May 6, 2016 at 7:26 AM, Shawn Heisey 
> wrote:
> >
> >> On 5/6/2016 7:19 AM, Shawn Heisey wrote:
> >> > With three separate
> >> > fq parameters, you'll get three cache entries in filterCache from the
> >> > one query.
> >>
> >> One more tidbit of information related to this:
> >>
> >> When you have multiple filters and they aren't cached, I am reasonably
> >> certain that they run in parallel.  Instead of one complex filter, you
> >> would have three simple filters running simultaneously.  For low to
> >> medium query loads on a server with a whole bunch of CPUs, where there
> >> is plenty of spare CPU power, this can be a real gain in performance ...
> >> but if the query load is really high, it might be a bad thing.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
> >
> >
> > --
> > Thanks
> > Jay Potharaju
>
> >
> >
>
>
>
> --
> Thanks
> Jay Potharaju
>



-- 
Thanks
Jay Potharaju
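
To make the difference concrete, a small SolrJ sketch (the field names are
invented, and the embedded filter() syntax needs Solr 5.4 or later):

import org.apache.solr.client.solrj.SolrQuery;

public class FilterCacheForms {
  // Each fq clause becomes its own filterCache entry.
  static SolrQuery separateFilterQueries() {
    SolrQuery q = new SolrQuery("*:*");
    q.addFilterQuery("type:abc");
    q.addFilterQuery("fromfield:[* TO NOW/DAY+1DAY]");
    return q;
  }

  // filter() also uses the filterCache, but lets a cached clause take part
  // in boolean logic (e.g. an OR) that separate fq parameters cannot express.
  static SolrQuery embeddedFilterClauses() {
    return new SolrQuery("filter(type:abc) OR filter(fromfield:[* TO NOW/DAY+1DAY])");
  }
}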


Re: Re-ranking query: issue with sort criteria and how to disable it

2016-05-09 Thread Andrea Gazzarini

Hi Joel,

just created [1] a new issue for that.

Many thanks again

Andrea

[1] https://issues.apache.org/jira/browse/SOLR-9095


On 06/05/16 20:21, Joel Bernstein wrote:

Maybe one ticket would work. Something like: "ReRanker should gracefully
handle sorts without score". Then you can describe the two scenarios. It
might be that these problems are tackled outside of the
ReRankQParserPlugin. Possibly the QueryComponent could add some logic that
would tack on the secondary score sort or remove the reRanker.

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, May 6, 2016 at 1:39 PM, Andrea Gazzarini  wrote:


Hi Joel,
many thanks for the response and sorry for this late reply.

About the first question, I can open a JIRA for that. Instead, for
disabling the component I think it would be useful to add

- an automatic behaviour: if the sort criteria excludes the score the
re-ranking could be automatically excluded
- a parameter / flag (something like *rr=true*) which enables / disables
the reranking. In this way such behaviour could be also driven on the
client side

What do you think? I guess this should be another JIRA

Best,
Andrea


On Fri, May 6, 2016 at 3:32 PM, Joel Bernstein  wrote:


I would consider the NPE when sort by score is not included a bug. There is
the workaround that you mentioned, which is to have a compound sort which
includes score.

The second issue though of disabling the ReRanker when someone doesn't
include a sort by score, would be a new feature of the ReRanker. I think
it's a good idea but it's not implemented yet.

I'm not sure if anyone has any ideas about conditionally adding the
ReRanker using configurations?

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, May 6, 2016 at 4:10 AM, Andrea Gazzarini 

wrote:

Hi guys,
I have a Solr 4.10.4 instance with a RequestHandler that has a re-ranking
query configured like this:


 dismax
 ...
 {!boost b=someFunction() v=$q}
 {!rerank reRankQuery=$rqq reRankDocs=60
reRankWeight=1.2}
 score desc


Everything is working until the client sends a sort param that doesn't
include the score field. So if for example the request contains "sort=price
asc" then a NullPointerException is thrown:

09:46:08,548 ERROR [org.apache.solr.core.SolrCore] java.lang.NullPointerException
[INFO] [talledLocalContainer] at
org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291)
[INFO] [talledLocalContainer] at
org.apache.solr.search.ReRankQParserPlugin$ReRankCollector.collect(ReRankQParserPlugin.java:263)
[INFO] [talledLocalContainer] at
org.apache.solr.search.SolrIndexSearcher.sortDocSet(SolrIndexSearcher.java:1999)
[INFO] [talledLocalContainer] at
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1423)
[INFO] [talledLocalContainer] at
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:514)
[INFO] [talledLocalContainer] at
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:484)
[INFO] [talledLocalContainer] at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
[INFO] [talledLocalContainer] at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)

The only way to avoid this exception is to _explicitly_ add the "score
desc" value to the incoming sort (i.e. sort=price asc, score desc). In this
way I get no exception. I said "explicitly" because adding an "appends"
section in my handler

 score desc

is completely ignored in practice (i.e. I'm still getting the NPE above),
even though I don't know if that by itself would solve my problem.
However, when I explicitly add "sort=price asc, score desc", as a
consequence of the re-ranking the top 60 results, although I said to Solr
"order by price", are still shuffled, and that's not what I want.

On top of that I have two questions:

  * Any idea about the exception above?
  * How can I disable the re-ranking query in case the order is not by
score?

About the second question, I'm thinking of the following solutions, but
I'm not sure if there's a better way to do that.

1. Create another request handler, which is basically a clone of the
handler above but without the re-ranking stuff
2. Use local params for the reRankDocs...


 dismax
 ...
 {!boost b=someFunction() v=$q}
 {!rerank reRankQuery=$rqq reRankDocs=*$rrd*
reRankWeight=1.2}
*60*
 score desc


...and have (in case of sorting by something different from the score) the
client sending an additional param "rdd=0". This is working but I still
need to explicitly declare "sort=price asc, score desc"

Any thoughts?

Best,
Andrea
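
Until the issue is fixed, the compound-sort workaround discussed in this
thread can be driven from the client, roughly like this (assuming the
handler exposes the rq/rqq parameters as configured above; price and
someFunction() are from the example):

import org.apache.solr.client.solrj.SolrQuery;

public class ReRankWorkaround {
  static SolrQuery priceSortWithReRank(String userQuery) {
    SolrQuery q = new SolrQuery(userQuery);
    q.set("rqq", "{!boost b=someFunction() v=$q}");
    q.set("rq", "{!rerank reRankQuery=$rqq reRankDocs=60 reRankWeight=1.2}");
    // Appending a secondary score sort avoids the NPE in TopFieldCollector.
    q.set("sort", "price asc, score desc");
    return q;
  }
}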






Re: Replicate Between sites

2016-05-09 Thread Abdel Belkasri
Hi Alex,

just started reading about CDCR, looks very promissing. Is this only in
6.0? our PROD server are running 4.9.1 and we cannot upgrade just yet. How
similar thing could be done in 4.9.1?

Thanks,
--Abdel

On Mon, May 9, 2016 at 2:59 PM, Alexandre Rafalovitch 
wrote:

> Have you looked at Cross Data Center replication that's the new big
> feature in Solr 6.0?
>
> Regards,
>Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 10 May 2016 at 02:13, Abdel Belkasri  wrote:
> > Hi there,
> >
> > we have the main site setup as follows:
> > solrCloud:
> > App --> smart client (SolrJ) --> ensemble of ZooKeeper --> SolrCloud nodes
> > (with slice/shard/replica)
> > Works fine.
> >
> > On the DR site we have a mirror setup, how can we keep the two site in
> > sync, so that if something happened we point the app to DR and get back
> up
> > and running?
> >
> > Note: making zookeeper span the two sites is not an option because of
> > network latency.
> >
> > We are looking for replication (the kind of master-slave that exists in
> > classic Solr)... how is that achieved in SolrCloud?
> >
> > Thanks,
> > --Abdel.
>



-- 
Abdel K. Belkasri, PhD


auto purge for embedded zookeeper

2016-05-09 Thread tedsolr
I have a development environment that is using an embedded zookeeper, and the
zoo_data folder continues to grow. It's filled with snapshot files that are
not getting purged. zoo.cfg has properties
autopurge.snapRetainCount=10
autopurge.purgeInterval=1
Perhaps it's not in the correct location so it's not getting read? Or maybe
these props don't apply for embedded instances?

Anyone know? Thanks!
v5.2.1



--
View this message in context: 
http://lucene.472066.n3.nabble.com/auto-purge-for-embedded-zookeeper-tp4275561.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Replicate Between sites

2016-05-09 Thread Alexandre Rafalovitch
Have you looked at Cross Data Center replication that's the new big
feature in Solr 6.0?

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 10 May 2016 at 02:13, Abdel Belkasri  wrote:
> Hi there,
>
> we have the main site setup as follows:
> solrCloud:
> App --> smart client (SolrJ) --> ensemble of ZooKeeper --> SolrCloud nodes
> (with slice/shard/replica)
> Works fine.
>
> On the DR site we have a mirror setup, how can we keep the two site in
> sync, so that if something happened we point the app to DR and get back up
> and running?
>
> Note: making zookeeper span the two sites is not an option because of
> network latency.
>
> We are looking for replication (the kind of master-slave that exists in
> classic Solr)... how is that achieved in SolrCloud?
>
> Thanks,
> --Abdel.


Re: solrcloud performance problem

2016-05-09 Thread Shawn Heisey
On 5/9/2016 10:52 AM, lltvw wrote:
> Sorry, I missed the size of each shard, the size is about 3G each. Thanks.
>
> On 2016-05-10 00:41:13, lltvw  wrote:
>> Recently we set up a 4.10 solrcloud env with about 9000 docs indexed in
>> it. This solrcloud has 12 shards, each shard on a separate machine, but
>> when we try to search for some information on solrcloud, the response
>> time is about 300ms.

How much RAM is installed in each of these servers, and what is the max
heap size on the Solr instance?

Best guess right now, with limited information, is that you will need to
install more memory on each server, or possibly reduce the max heap size
so there's more memory available to the OS for caching your index.

Thanks,
Shawn



Re: Solr 5.x bug with Service installation script?

2016-05-09 Thread A Laxmi
Yes, I always shut down both source and destination Solr before copying the
index over from one to the other. Somehow the write.lock issue only happens
when Solr restarts from the service script. It loads just fine when started
manually.

On Mon, May 9, 2016 at 1:20 PM, Abdel Belkasri  wrote:

> Did you copy the core while Solr was running? If yes, first shut down the
> source and destination Solr, copy the index to the other Solr, then restart
> the Solr nodes. Lock files get written to the core while Solr is running
> and doing indexing or searching, etc.
>
> On Mon, May 9, 2016 at 12:38 PM, A Laxmi  wrote:
>
> > Hi,
> >
> > I have installed Solr 5.3.1 using the Service Installation Script. I was
> > able to successfully start and stop Solr using service solr start/stop
> > commands and Solr loads up just fine.
> >
> > However, when I stop Solr service and copy an index of a core from one
> > server to another with same exact version of Solr and its corresponding
> > conf and restart the service, it complains about write.lock file when
> none
> > exists under the path that it specifies in the log.
> >
> > To validate whether the issue is with the data that is being copied or
> > the service script itself, I copied the collection directory with the new
> > index into the example-DIH directory and restarted Solr manually with
> > bin/solr start -e dih -m 2g, and it worked without any error. So at least
> > this validates that the collection data is just fine and the service
> > script is creating a lock every time a new index is copied from another
> > server, though it has the same exact Solr version.
> >
> > Did anyone experience the same? Any thoughts if this is a bug?
> >
> > Thanks!
> > AL
> >
>
>
>
> --
> Abdel K. Belkasri, PhD
>


Re: Solr 5.x bug with Service installation script?

2016-05-09 Thread Abdel Belkasri
Did you copy the core while Solr was running? If yes, first shut down the
source and destination Solr, copy the index to the other Solr, then restart
the Solr nodes. Lock files get written to the core while Solr is running and
doing indexing or searching, etc.

On Mon, May 9, 2016 at 12:38 PM, A Laxmi  wrote:

> Hi,
>
> I have installed Solr 5.3.1 using the Service Installation Script. I was
> able to successfully start and stop Solr using service solr start/stop
> commands and Solr loads up just fine.
>
> However, when I stop Solr service and copy an index of a core from one
> server to another with same exact version of Solr and its corresponding
> conf and restart the service, it complains about write.lock file when none
> exists under the path that it specifies in the log.
>
> To validate whether the issue is with the data that is being copied or the
> service script itself, I copied the collection directory with the new index
> into the example-DIH directory and restarted Solr manually with bin/solr
> start -e dih -m 2g, and it worked without any error. So at least this
> validates that the collection data is just fine and the service script is
> creating a lock every time a new index is copied from another server,
> though it has the same exact Solr version.
>
> Did anyone experience the same? Any thoughts if this is a bug?
>
> Thanks!
> AL
>



-- 
Abdel K. Belkasri, PhD


Re:solrcloud performance problem

2016-05-09 Thread lltvw
Sorry, I missed the size of each shard, the size is about 3G each. Thanks.




-


On 2016-05-10 00:41:13, lltvw  wrote:
>
>Hi all,
>
>Recently we set up a 4.10 solrcloud env with about 9000 docs indexed in it.
>This solrcloud has 12 shards, each shard on a separate machine, but when we
>try to search for some information on solrcloud, the response time is about
>300ms.
>
>The performance does not seem good; please advise how to tune it. Thanks
>very much.
>
>
>
>--
>Sent from my NetEase Mail mobile client


solrcloud performance problem

2016-05-09 Thread lltvw

Hi all,

Recently we set up a 4.10 solrcloud env with about 9000 docs indexed in it.
This solrcloud has 12 shards, each shard on a separate machine, but when we
try to search for some information on solrcloud, the response time is about
300ms.

The performance does not seem good; please advise how to tune it. Thanks
very much.



--
Sent from my NetEase Mail mobile client

Solr 5.x bug with Service installation script?

2016-05-09 Thread A Laxmi
Hi,

I have installed Solr 5.3.1 using the Service Installation Script. I was
able to successfully start and stop Solr using service solr start/stop
commands and Solr loads up just fine.

However, when I stop Solr service and copy an index of a core from one
server to another with same exact version of Solr and its corresponding
conf and restart the service, it complains about write.lock file when none
exists under the path that it specifies in the log.

To validate whether the issue is with the data that is being copied or the
service script itself, I copied the collection directory with the new index
into the example-DIH directory and restarted Solr manually with bin/solr
start -e dih -m 2g, and it worked without any error. So at least this
validates that the collection data is just fine and the service script is
creating a lock every time a new index is copied from another server, though
it has the same exact Solr version.

Did anyone experience the same? Any thoughts if this is a bug?

Thanks!
AL


Replicate Between sites

2016-05-09 Thread Abdel Belkasri
Hi there,

we have the main site set up as follows:
solrCloud:
App --> smart client (SolrJ) --> ensemble of ZooKeeper --> SolrCloud nodes
(with slice/shard/replica)
Works fine.

On the DR site we have a mirror setup, how can we keep the two site in
sync, so that if something happened we point the app to DR and get back up
and running?

Note: making zookeeper span the two sites is not an option because of
network latency.

We are looking for replication (the kind of master-slave that exists in
classic Solr)... how is that achieved in SolrCloud?

Thanks,
--Abdel.


Re: Solr 5.4.1 Mergeindexes duplicate rows

2016-05-09 Thread Kalpana
Hello

Can anyone help me with a merge? Currently I have two cores already
pulling data from a SQL table based on the query I set up.

Solr is running.

I also have a third core set up with a schema similar to the first two, and
then I wrote this in the URL and hit enter:
http://localhost:8983/solr/admin/cores?action=mergeindexes&core=Sitecore_SharePoint&srcCore=sitecore_web_index&srcCore=SharePoint_All

I stop and start Solr and I see data with duplicates.

Am I doing this right?

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-5-4-1-Mergeindexes-duplicate-rows-tp4275153p4275491.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 6 / Solrj RuntimeException: First tuple is not a metadata tuple

2016-05-09 Thread Joel Bernstein
Yes, /sql has been in trunk / master for a long time (more than 6 months).
But it was not released in the 5x branch because it requires Java 8.

I'm wondering what the issue is with the server setup that you have that is
throwing the NPE.

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, May 9, 2016 at 4:21 AM, deniz  wrote:

> was able to get "gettingstarted" example running with sql, on my local only
> with a single zk...
>
> still not sure why the core/collection i have tried, didnt work till now...
>
> thanks a lot for pointing out for the version related issues, it made me
> change my focus from client to server side :)
>
>
>
> -
> Zeki ama calismiyor... Calissa yapar...
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-6-Solrj-RuntimeException-First-tuple-is-not-a-metadata-tuple-tp4274451p4275447.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: CloudSolrStream returns only top 30 results.

2016-05-09 Thread Joel Bernstein
Yes, you need to specify the /export request handler using the following
named paramter: qt="/export"

This is described in the documentation in a number of places.
https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions

By default CloudSolrStream uses the /select handler and uses the default
rows and start.
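
In code this amounts to one extra parameter; a sketch with placeholder
zkHost, collection and field names (note that /export also requires the
fl/sort fields to have docValues):

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.CloudSolrStream;

public class ExportAllDocs {
  public static void main(String[] args) throws Exception {
    Map<String, String> props = new HashMap<>();
    props.put("q", "*:*");
    props.put("fl", "id");
    props.put("sort", "id asc");
    props.put("qt", "/export"); // without this, /select returns only the default rows
    CloudSolrStream stream = new CloudSolrStream("zkhost:2181", "collection1", props);
    try {
      stream.open();
      for (Tuple t = stream.read(); !t.EOF; t = stream.read()) {
        // process each tuple; /export streams the entire sorted result set
      }
    } finally {
      stream.close();
    }
  }
}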



Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, May 9, 2016 at 7:16 AM, Roshan Kamble <
roshan.kam...@smartstreamrdu.com> wrote:

> Hello,
>
> I plan to use the streaming API for join queries.
>
> But it has been observed that CloudSolrStream returns only the top 30
> matching records.
>
>
> Is there any configuration which needs to be done to retrieve ALL records,
> or a pagination-like provision to specify start or rows attributes?
>
>
> Regards,
> Roshan
> 
>


Re: Solr re-indexing in case of store=false

2016-05-09 Thread Ali Nazemian
Dear Erick,
Hi,
Thank you very much. About the storing part you are right, unless the
primary datastore uses some kind of data compression, which in my case it
does (I am using Cassandra as the primary datastore); I am not sure whether
Solr has any kind of compression.
According to your reply, it seems that I have to do it the hard way: I
mean using the primary datastore to build the index from scratch.
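
For the record, the driver for such a rebuild usually has a very simple
shape. A rough sketch, where fetchRowsFromPrimaryStore() is only a stand-in
for whatever Cassandra read ends up being used:

import java.util.Collections;
import java.util.Map;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class Reindexer {
  public static void main(String[] args) throws Exception {
    SolrClient solr = new HttpSolrClient("http://localhost:8983/solr/newcollection");
    for (Map<String, Object> row : fetchRowsFromPrimaryStore()) {
      SolrInputDocument doc = new SolrInputDocument();
      row.forEach(doc::addField); // map datastore columns onto schema fields
      solr.add(doc);
    }
    solr.commit(); // or rely on autoCommit for very large loads
    solr.close();
  }

  // Placeholder: replace with a real Cassandra read (e.g. via the DataStax driver).
  static Iterable<Map<String, Object>> fetchRowsFromPrimaryStore() {
    return Collections.emptyList();
  }
}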

Sincerely,

On Sun, May 8, 2016 at 11:07 PM, Erick Erickson 
wrote:

> bq: I would be grateful if somebody could introduce other way of
> re-indexing
> the whole data without using another datastore
>
> Not possible currently. Consider what's _in_ the index when stored="false".
> The actual terms are the output of the entire analysis chain, including
> stemming, stopword removal, synonym substitution etc. Since the
> indexing process is lossy, you simply cannot reconstruct the original
> stream from the indexed terms.
>
> I suppose one _could_ do this in the case of docValues only index with
> the new return-values-from-docvalues functionality, but even that's lossy
> because the order of returned values may not be the original insertion
> order. And if that suits your needs, a pretty simple driver program would
> suffice.
>
> To do this from indexed-only terms you'd have to somehow store the
> original version of each term or store some codes indicating exactly
> how to reconstruct the original steam, which very possibly would take
> up as much space as if you'd just stored the values anyway. _And_ it
> would burden every one else who didn't want to do this with a bloated
> index.
>
> Best,
> Erick
>
> On Sun, May 8, 2016 at 4:25 AM, Ali Nazemian 
> wrote:
> > Dear all,
> > Hi,
> > I was wondering, is it possible to re-index Solr 6.0 data in case of
> > store=false? I am using Solr as a secondary datastore, and for the sake
> of
> > space efficiency all the fields (except id) are considered as
> store=false.
> > Currently, due to some changes in application business, Solr schema
> should
> > change, and in order to see the effect of changing schema on old data, I
> > have to do the re-index process.  I know that one way of re-indexing in
> > Solr is reading data from one collection (core) and inserting that to
> > another one, but this solution is not possible for store=false fields,
> and
> > re-indexing the whole data through primary datastore is kind of costly,
> so
> > I would be grateful if somebody could introduce other way of re-indexing
> > the whole data without using another datastore.
> >
> > Sincerely,
> >
> > --
> > A.Nazemian
>



-- 
A.Nazemian


Re: Re: Re: MoreLikeThis Component - how to get fields of documents

2016-05-09 Thread Dr. Jan Frederik Maas

Hey Alessandro,

it seems that the edismax MLT handler is not able to work correctly with
SolrCloud/sharding:


https://issues.apache.org/jira/browse/SOLR-4414

Using the MLT handler we randomly got a response for only very few
requests, while the MLT component works fine (except for the problem I
encountered...)


Best wishes,
Jan


Am 09.05.2016 um 12:01 schrieb Alessandro Benedetti:

Hey Jan, any reason you are using the MLTComponent, instead of a specific
Request Handler with the MLT query parser ?


 
 
 Entities Similarity
 edismax
 5
 *,score
 explicit
 json
 {!mlt qf=field1,field2,field3,field4 mintf=1
mindf=5 maxqt=50 v=$entity_id}
 
 0
 



This should solve easily your problems,

Cheers


On Mon, May 9, 2016 at 9:58 AM, Dr. Jan Frederik Maas <
jan.m...@sub.uni-hamburg.de> wrote:


Hi Edwin,

thanks for your reply - we currently use 5.0.0

Best wishes,
Jan


Am 05.05.2016 um 05:26 schrieb Zheng Lin Edwin Yeo:


Hi Jan,

Which version of Solr are you using?

Regards,
Edwin


On 26 April 2016 at 23:46, Dr. Jan Frederik Maas <
jan.m...@sub.uni-hamburg.de> wrote:

Hello,

I want to use the moreLikeThis Component to get similar documents from a
sharded SOLR. This works quite well except for the fact that the
documents
in the moreLikeThis-list only contain the id/unique key of the documents.

Is it possible to get the other fields? I can of course do another query
for the given IDs, but this would be complicated and very slow.

For example:



http://mysolrsystem/?q=id:524507260&mlt=true&mlt.fl=title&mlt.mintf=0&indent=true&mlt.match.include=true&fl=title,id,topic

creates

(...)


646199803
613210832
562239472
819200034
539877271

(...)

I tried to modify the fl-parameter, but I can only switch the ID-field in
the moreLikeThis-Documents on and off (the latter resulting in empty
document tags). In the result list however, the fl-fields are shown as
specified.

I would be very grateful for help.

Best wishes,
Jan Maas



--
Dr.-Ing. Jan Frederik Maas
Staats- und Universitaetsbibliothek Hamburg Carl von Ossietzky
IuK-Technik / Digitale Bibliothek / Entwicklung
Fachreferent für Informatik, Technik
Von-Melle-Park 3, 20146 Hamburg
Telefon (040) 42838-6674 | Fax (040) 41345070
Mail: m...@sub.uni-hamburg.de
http://www.sub.uni-hamburg.de







--
Dr.-Ing. Jan Frederik Maas
Staats- und Universitaetsbibliothek Hamburg Carl von Ossietzky
IuK-Technik / Digitale Bibliothek / Entwicklung
Fachreferent für Informatik, Technik
Von-Melle-Park 3, 20146 Hamburg
Telefon (040) 42838-6674 | Fax (040) 41345070
Mail: m...@sub.uni-hamburg.de
http://www.sub.uni-hamburg.de



CloudSolrStream returns only top 30 results.

2016-05-09 Thread Roshan Kamble
Hello,

I plan to use the streaming API for join queries.

But it has been observed that CloudSolrStream returns only the top 30
matching records.


Is there any configuration which needs to be done to retrieve ALL records,
or a pagination-like provision to specify start or rows attributes?


Regards,
Roshan



RE: Nodes appear twice in state.json

2016-05-09 Thread Markus Jelsma
Issue created!
https://issues.apache.org/jira/browse/SOLR-9089
 
-Original message-
> From:Shalin Shekhar Mangar 
> Sent: Thursday 5th May 2016 12:58
> To: solr-user@lucene.apache.org
> Subject: Re: Nodes appear twice in state.json
> 
> Hmm not good. Definitely a new bug. Please open an issue.
> 
> Please look up the core node name in core.properties for that particular
> core and remove the other one from state.json manually. Probably best to do
> a cluster restart to avoid surprises. This is certainly uncharted territory.
> 
> On Wed, May 4, 2016 at 6:54 PM, Markus Jelsma 
> wrote:
> 
> > Hi - we've just upgraded a development environment from 5.5 to Solr 6.0.
> > After the upgrade, which went fine, we see two replica's appear twice in
> > the cloud view (see below), both being leader. We've seen this happen
> > before on some older 5.x versions. Is there a Jira issue i am missing? An
> > unknown issue?
> >
> > Also, how to fix this. How do we remove the double node from the
> > state.json?
> >
> > Many thanks!
> > Markus
> >
> > {"search":{
> > "replicationFactor":"3",
> > "shards":{
> >   "shard1":{
> > "range":"8000-",
> > "state":"active",
> > "replicas":{
> >   "core_node6":{
> > "core":"search_shard1_replica1",
> > "base_url":"http://idx5.oi.dev:8983/solr";,
> > "node_name":"idx5.oi.dev:8983_solr",
> > "state":"down"},
> >   "core_node2":{
> > "core":"search_shard1_replica2",
> > "base_url":"http://idx2.oi.dev:8983/solr";,
> > "node_name":"idx2.oi.dev:8983_solr",
> > "state":"active",
> > "leader":"true"},
> >   "core_node3":{
> > "core":"search_shard1_replica2",
> > "base_url":"http://idx2.oi.dev:8983/solr";,
> > "node_name":"idx2.oi.dev:8983_solr",
> > "state":"down",
> > "leader":"true"},
> >   "core_node5":{
> > "core":"search_shard1_replica3",
> > "base_url":"http://idx3.oi.dev:8983/solr";,
> > "node_name":"idx3.oi.dev:8983_solr",
> > "state":"down"}}},
> >   "shard2":{
> > "range":"0-7fff",
> > "state":"active",
> > "replicas":{
> >   "core_node1":{
> > "core":"search_shard2_replica1",
> > "base_url":"http://idx4.oi.dev:8983/solr";,
> > "node_name":"idx4.oi.dev:8983_solr",
> > "state":"down"},
> >   "core_node2":{
> > "core":"search_shard2_replica2",
> > "base_url":"http://idx6.oi.dev:8983/solr";,
> > "node_name":"idx6.oi.dev:8983_solr",
> > "state":"down"},
> >   "core_node4":{
> > "core":"search_shard2_replica3",
> > "base_url":"http://idx1.oi.dev:8983/solr";,
> > "node_name":"idx1.oi.dev:8983_solr",
> > "state":"active",
> > "leader":"true",
> > "router":{"name":"compositeId"},
> > "maxShardsPerNode":"1",
> > "autoAddReplicas":"false"}}
> >
> >
> >
> >
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 


Re: Re: MoreLikeThis Component - how to get fields of documents

2016-05-09 Thread Alessandro Benedetti
Hey Jan, any reason you are using the MLTComponent, instead of a specific
Request Handler with the MLT query parser ?




Entities Similarity
edismax
5
*,score
explicit
json
{!mlt qf=field1,field2,field3,field4 mintf=1
mindf=5 maxqt=50 v=$entity_id}

0




This should easily solve your problems,
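
The same parser can also be used ad hoc from SolrJ; a small sketch with the
qf list shortened and the document id taken from the earlier example:

import org.apache.solr.client.solrj.SolrQuery;

public class MltQuery {
  static SolrQuery similarTo(String docId) {
    // {!mlt} takes the uniqueKey of the seed document as the query value.
    SolrQuery q = new SolrQuery("{!mlt qf=title mintf=1 mindf=5}" + docId);
    q.setFields("id", "title", "topic", "score"); // full fields, unlike the MLT component
    q.setRows(5);
    return q;
  }
}

Here similarTo("524507260") would roughly reproduce the request from the
earlier message, but with full documents in the response.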

Cheers


On Mon, May 9, 2016 at 9:58 AM, Dr. Jan Frederik Maas <
jan.m...@sub.uni-hamburg.de> wrote:

> Hi Edwin,
>
> thanks for your reply - we currently use 5.0.0
>
> Best wishes,
> Jan
>
>
> Am 05.05.2016 um 05:26 schrieb Zheng Lin Edwin Yeo:
>
>> Hi Jan,
>>
>> Which version of Solr are you using?
>>
>> Regards,
>> Edwin
>>
>>
>> On 26 April 2016 at 23:46, Dr. Jan Frederik Maas <
>> jan.m...@sub.uni-hamburg.de> wrote:
>>
>> Hello,
>>>
>>> I want to use the moreLikeThis Component to get similar documents from a
>>> sharded SOLR. This works quite well except for the fact that the
>>> documents
>>> in the moreLikeThis-list only contain the id/unique key of the documents.
>>>
>>> Is it possible to get the other fields? I can of course do another query
>>> for the given IDs, but this would be complicated and very slow.
>>>
>>> For example:
>>>
>>>
>>>
>>> http://mysolrsystem/?q=id:524507260&mlt=true&mlt.fl=title&mlt.mintf=0&indent=true&mlt.match.include=true&fl=title,id,topic
>>>
>>> creates
>>>
>>> (...)
>>> 
>>> 
>>> 646199803
>>> 613210832
>>> 562239472
>>> 819200034
>>> 539877271
>>> 
>>> (...)
>>>
>>> I tried to modify the fl-parameter, but I can only switch the ID-field in
>>> the moreLikeThis-Documents on and off (the latter resulting in empty
>>> document tags). In the result list however, the fl-fields are shown as
>>> specified.
>>>
>>> I would be very grateful for help.
>>>
>>> Best wishes,
>>> Jan Maas
>>>
>>>
>
> --
> Dr.-Ing. Jan Frederik Maas
> Staats- und Universitaetsbibliothek Hamburg Carl von Ossietzky
> IuK-Technik / Digitale Bibliothek / Entwicklung
> Fachreferent für Informatik, Technik
> Von-Melle-Park 3, 20146 Hamburg
> Telefon (040) 42838-6674 | Fax (040) 41345070
> Mail: m...@sub.uni-hamburg.de
> http://www.sub.uni-hamburg.de
>
>


-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Facet ignoring repeated word

2016-05-09 Thread Ahmet Arslan
Hi,


I understand the word cloud part.
It looks like you want to use within-resultList term frequency information.
In your first mail, I thought you wanted within-document term frequency.

TermsComponent reports within-collection term frequency.

I am not sure how to retrieve within-resultList term frequency.
Traversing the result list and collecting term vector data seems plausible.
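
A rough SolrJ sketch of that traversal (the /tvrh handler name and the
questionId/comments fields are assumptions based on this thread, not a
fixed API):

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.util.NamedList;

public class WordCloudCounts {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/surveys");
    SolrQuery q = new SolrQuery("questionId:1"); // restrict to one question
    q.setRequestHandler("/tvrh");                // TermVectorComponent handler
    q.set("tv.fl", "comments");
    q.set("tv.tf", "true");
    q.setRows(1000);
    QueryResponse rsp = client.query(q);

    // Response shape: termVectors -> docKey -> field -> term -> {tf=...}
    NamedList<?> tvs = (NamedList<?>) rsp.getResponse().get("termVectors");
    Map<String, Long> counts = new HashMap<>();
    for (Map.Entry<String, ?> doc : tvs) {
      if (!(doc.getValue() instanceof NamedList)) continue; // skip metadata entries
      Object field = ((NamedList<?>) doc.getValue()).get("comments");
      if (!(field instanceof NamedList)) continue;
      for (Map.Entry<String, ?> term : (NamedList<?>) field) {
        Number tf = (Number) ((NamedList<?>) term.getValue()).get("tf");
        counts.merge(term.getKey(), tf.longValue(), Long::sum);
      }
    }
    counts.forEach((word, n) -> System.out.println(word + ":" + n));
    client.close();
  }
}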

Ahmet

 



On Monday, May 9, 2016 11:55 AM, "G, Rajesh"  wrote:
Hi Ahmet,

Please let me know if I am not clear

Thanks
Rajesh




-Original Message-
From: G, Rajesh [mailto:r...@cebglobal.com]
Sent: Friday, May 6, 2016 1:08 PM
To: Ahmet Arslan ; solr-user@lucene.apache.org
Subject: RE: Facet ignoring repeated word

Hi Ahmet,



Sorry it is Word Cloud  
https://urldefense.proofpoint.com/v2/url?u=https-3A__www.google.co.uk_webhp-3Fsourceid-3Dchrome-2Dinstant-26ion-3D1-26espv-3D2-26ie-3DUTF-2D8-23newwindow-3D1-26q-3Dword-2Bcloud&d=CwIGaQ&c=zzHkMf6HMoOvCB4yTPe0Gg&r=05YCVYE-IrDXcnbr1V8J9Q&m=k-w03YA11ltRmGgXa55Yx2gs1Jk1QowoFIE32lm9QMU&s=X_BPC_BR1vgdcijmmd50zYBOnIP97BfPfS2H7MxC9V4&e=




We have comments from a survey. We want to build a word cloud using the
field comments.



e.g For question 1 the comments are



Comment 1. Projects, technology, features, performance

Comment 2. Too many projects and technology, not enough people to run
projects



I want to run a query for question 1 that will produce the below result



projects: 3

technology:2

features:1

performance:1

Too:1

Many:1

Enough:1

People:1

Run:1





Facet produces the result but ignores repeated words in a document [the
projects count will be 2 instead of 3].



projects: 2

technology:2

features:1

performance:1

Too:1

Many:1

Enough:1

People:1

Run:1



TermVectorComponent produces the result as expected, but the results are
not grouped by words; instead they are grouped by id.

[sample term vector output omitted; the tf counts in it appear per document
id, not per word]

I wanted to know if it is possible to produce a result that is grouped by
word and that does not ignore repeated words in a document. If it is not
possible, then I have to write some script that takes the above result from
Solr, groups the words, and sums the counts.



Thanks

Rajesh












-Original Message-

From: Ahmet Arslan [mailto:iori...@yahoo.com]

Sent: Friday, May 6, 2016 12:39 PM

To: G, Rajesh ; solr-user@lucene.apache.org

Subject: Re: Facet ignoring repeated word



Hi Rajesh,



Can you please explain what do you mean by "tag cloud"?

How it is related to a query?

Please explain your requirements.



Ahmet







On Friday, May 6, 2016 8:44 AM, "G,"  wrote:

Hi,



Can you please help? If there is a solution then it will be easy; otherwise
I have to create a script in Python that can process the results from
TermVectorComponent and group the results by words across documents to find
the word count. The Python script will accept the exported Solr result as
input.



Thanks

Rajesh








issue of the popped message "ERROR: Solr at http://xxxx:8983/solr did not come online within 30 seconds!" in Solr with basic authentication

2016-05-09 Thread tjlp
Hi,
I am using Solr 5.5.0. I configured basic authentication for Solr following
http://muddyazian.blogspot.com/2013/11/how-to-require-password-authentication.html.
Then when I start Solr, the message "ERROR: Solr at http://:8983/solr did
not come online within 30 seconds!" pops up on the console. However, I can
access the Solr URL and the basic authentication works as expected. How can
I disable this message, or is there any other method to make Solr not
report this error?
Thanks


Re: Re: MoreLikeThis Component - how to get fields of documents

2016-05-09 Thread Dr. Jan Frederik Maas

Hi Edwin,

thanks for your reply - we currently use 5.0.0

Best wishes,
Jan

Am 05.05.2016 um 05:26 schrieb Zheng Lin Edwin Yeo:

Hi Jan,

Which version of Solr are you using?

Regards,
Edwin


On 26 April 2016 at 23:46, Dr. Jan Frederik Maas <
jan.m...@sub.uni-hamburg.de> wrote:


Hello,

I want to use the moreLikeThis Component to get similar documents from a
sharded SOLR. This works quite well except for the fact that the documents
in the moreLikeThis-list only contain the id/unique key of the documents.

Is it possible to get the other fields? I can of course do another query
for the given IDs, but this would be complicated and very slow.

For example:


http://mysolrsystem/?q=id:524507260&mlt=true&mlt.fl=title&mlt.mintf=0&indent=true&mlt.match.include=true&fl=title,id,topic

creates

(...)


646199803
613210832
562239472
819200034
539877271

(...)

I tried to modify the fl-parameter, but I can only switch the ID-field in
the moreLikeThis-Documents on and off (the latter resulting in empty
document tags). In the result list however, the fl-fields are shown as
specified.

I would be very grateful for help.

Best wishes,
Jan Maas




--
Dr.-Ing. Jan Frederik Maas
Staats- und Universitaetsbibliothek Hamburg Carl von Ossietzky
IuK-Technik / Digitale Bibliothek / Entwicklung
Fachreferent für Informatik, Technik
Von-Melle-Park 3, 20146 Hamburg
Telefon (040) 42838-6674 | Fax (040) 41345070
Mail: m...@sub.uni-hamburg.de
http://www.sub.uni-hamburg.de



RE: Facet ignoring repeated word

2016-05-09 Thread G, Rajesh
Hi Ahmet,

Please let me know if I am not clear

Thanks
Rajesh




-Original Message-
From: G, Rajesh [mailto:r...@cebglobal.com]
Sent: Friday, May 6, 2016 1:08 PM
To: Ahmet Arslan ; solr-user@lucene.apache.org
Subject: RE: Facet ignoring repeated word

Hi Ahmet,



Sorry it is Word Cloud  
https://urldefense.proofpoint.com/v2/url?u=https-3A__www.google.co.uk_webhp-3Fsourceid-3Dchrome-2Dinstant-26ion-3D1-26espv-3D2-26ie-3DUTF-2D8-23newwindow-3D1-26q-3Dword-2Bcloud&d=CwIGaQ&c=zzHkMf6HMoOvCB4yTPe0Gg&r=05YCVYE-IrDXcnbr1V8J9Q&m=k-w03YA11ltRmGgXa55Yx2gs1Jk1QowoFIE32lm9QMU&s=X_BPC_BR1vgdcijmmd50zYBOnIP97BfPfS2H7MxC9V4&e=



We have comments from a survey. We want to build a word cloud using the
field comments.



e.g For question 1 the comments are



Comment 1. Projects, technology, features, performance

Comment 2. Too many projects and technology, not enough people to run
projects



I want to run a query for question 1 that will produce the below result



projects: 3

technology:2

features:1

performance:1

Too:1

Many:1

Enough:1

People:1

Run:1





Facet produces the result but ignores repeated words in a document [the
projects count will be 2 instead of 3].



projects: 2

technology:2

features:1

performance:1

Too:1

Many:1

Enough:1

People:1

Run:1



TermVectorComponent produces the result as expected, but the results are
not grouped by words; instead they are grouped by id.

[sample term vector output omitted; the tf counts in it appear per document
id, not per word]

I wanted to know if it is possible to produce a result that is grouped by
word and that does not ignore repeated words in a document. If it is not
possible, then I have to write some script that takes the above result from
Solr, groups the words, and sums the counts.



Thanks

Rajesh












-Original Message-

From: Ahmet Arslan [mailto:iori...@yahoo.com]

Sent: Friday, May 6, 2016 12:39 PM

To: G, Rajesh ; solr-user@lucene.apache.org

Subject: Re: Facet ignoring repeated word



Hi Rajesh,



Can you please explain what do you mean by "tag cloud"?

How it is related to a query?

Please explain your requirements.



Ahmet







On Friday, May 6, 2016 8:44 AM, "G,"  wrote:

Hi,



Can you please help? If there is a solution then it will be easy; otherwise
I have to create a script in Python that can process the results from
TermVectorComponent and group the results by words across documents to find
the word count. The Python script will accept the exported Solr result as
input.



Thanks

Rajesh












-Original Message-

From: G, Rajesh [mailto:r...@cebglobal.com]

Sent: Thursday, May 5, 2016 4:29 PM

To: Ahmet Arslan ; solr-user@lucene.apache.org; 
erickerick...@gmail.com

Subject: RE: Facet ignoring repeated word



Hi,



TermV

Re: Solr Suggester no results

2016-05-09 Thread Grigoris Iliopoulos
Yes, I also realized that stored="false" was the problem. It is also
stated clearly in the documentation: "To be used as the basis for a
suggestion, the field must be stored."
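
Once the field is stored and the dictionary rebuilt, the same request can
also be issued from SolrJ; a minimal sketch (the raw response is printed
because a typed suggester response API only appeared in later SolrJ
versions):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SuggestRequest {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/company");
    SolrQuery q = new SolrQuery();
    q.setRequestHandler("/suggest");
    q.set("suggest", "true");
    q.set("suggest.q", "Ath"); // matching is case sensitive with this setup
    // suggest.dictionary comes from the handler defaults configured above
    QueryResponse rsp = client.query(q);
    System.out.println(rsp.getResponse().get("suggest"));
    client.close();
  }
}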

Thanks for your time,
Grigoris

2016-05-06 19:42 GMT+03:00 Erick Erickson :

> First off, kudos for providing the details, that really helps!
>
> The root of your problem is that your suggest field has stored="false".
> DocumentDictionaryFactory reads through all the
> docs in your corpus, extracts the stored data and puts it in the FST. Since
> you don't have any stored data your FST is...er...minimal.
>
> I'd also add
> suggester_fuzzy_dir
> to the searchComponent. You'll find the FST on disk in that directory
> where it
> can be read next time Solr starts up. It is also helpful for figuring out
> whether there are suggestions to be had.
>
> And a minor nit, you probably don't want to specify suggest.dictionary
> in your query,
> that's already specified in your config.
>
> And it looks like you're alive to the fact that with that setup
> capitalization matters
> as does the fact that these suggestions be matched from the beginning of
> the
> field...
>
> Best,
> Erick
>
> On Thu, May 5, 2016 at 1:05 AM, Grigoris Iliopoulos
>  wrote:
> > Hi there,
> >
> > I want to use the Solr suggester component for city names. I have the
> > following settings:
> > schema.xml
> >
> > Field definition
> >
> >  positionIncrementGap="100">
> >   
> > 
> > 
> > 
> >   
> > 
> >
> > The field i want to apply the suggester on
> >
> > 
> >
> > The copy field
> >
> > 
> >
> > The field
> >
> >  indexed="true" />
> >
> > solr-config.xml
> >
> >  startup="lazy">
> >   
> > true
> > 10
> > mySuggester
> >   
> >   
> > suggest
> >   
> > 
> >
> >
> >
> > 
> >   
> > mySuggester
> > FuzzyLookupFactory
> > DocumentDictionaryFactory
> > citySuggest
> > string
> >   
> > 
> >
> > Then i run
> >
> >
> http://localhost:8983/solr/company/suggest?suggest=true&suggest.dictionary=mySuggester&wt=json&suggest.q=Ath&suggest.build=true
> >
> > to build the suggest component
> >
> > Finally i run
> >
> >
> http://localhost:8983/solr/company/suggest?suggest=true&suggest.dictionary=mySuggester&wt=json&suggest.q=Ath
> >
> > but i get an empty result set
> >
> >
> {"responseHeader":{"status":0,"QTime":0},"suggest":{"mySuggester":{"Ath":{"numFound":0,"suggestions":[]
> >
> > Are there any obvious mistakes? Any thoughts?
>


Re: Solr 6 / Solrj RuntimeException: First tuple is not a metadata tuple

2016-05-09 Thread deniz
was able to get "gettingstarted" example running with sql, on my local only
with a single zk... 

still not sure why the core/collection i have tried, didnt work till now...

thanks a lot for pointing out for the version related issues, it made me
change my focus from client to server side :) 



-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-6-Solrj-RuntimeException-First-tuple-is-not-a-metadata-tuple-tp4274451p4275447.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Issues of deploying mutiple web applications to Solr Jetty

2016-05-09 Thread Jan Høydahl
Hi

Don’t add multiple webapps to Solr’s Jetty. Install a separate Jetty on another 
port
for your other apps. Solr using Jetty is an implementation detail which may 
suddenly
disappear in a future release.

You will find that Jetty is very lightweight, and it is a much cleaner
approach to run a separate one for your apps!

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 9 May 2016 at 08:00, t...@sina.com wrote:
> 
> Hi,
> At present I am trying to deploy multiple web applications to Solr's
> Jetty, so I add context files for each web application under the folder
> $SOLR.HOME/contexts. Now I get a library conflict issue: one of the web
> applications uses an old version of slf4j, which conflicts with the slf4j
> libraries used by Solr itself, so some error messages are printed on the
> console as follows. How can I configure the web applications to avoid the
> conflict and keep different versions of slf4j for different applications?
> 
> log4j:ERROR A "org.apache.log4j.EnhancedPatternLayout" object is not 
> assignable
> to a "org.apache.log4j.Layout" variable.
> log4j:ERROR The class "org.apache.log4j.Layout" was loaded by
> log4j:ERROR [WebAppClassLoader=QuartzWeb@77fbd92c] whereas object of type
> log4j:ERROR "org.apache.log4j.EnhancedPatternLayout" was loaded by 
> [startJarLoad
> er@681a9515].
> log4j:ERROR A "org.apache.log4j.EnhancedPatternLayout" object is not 
> assignable
> to a "org.apache.log4j.Layout" variable.
> log4j:ERROR The class "org.apache.log4j.Layout" was loaded by
> log4j:ERROR [WebAppClassLoader=QuartzWeb@77fbd92c] whereas object of type
> log4j:ERROR "org.apache.log4j.EnhancedPatternLayout" was loaded by 
> [startJarLoad
> er@681a9515].
> log4j:ERROR No layout set for the appender named [file].
> log4j:ERROR No layout set for the appender named [CONSOLE].
> ThanksLiu Peng