RE: MLT and facetting

2019-02-26 Thread Martin Frank Hansen (MHQ)
Hi Edwin,

Thanks for your response. Are you sure it is a bug? Or are the two simply not meant to work together?
After some thinking, I do see a problem with faceting an MLT result: MLT results have a clear ordering of the documents, which will be hard to maintain with facets. How will faceting MLT results deal with the ordering of the documents?
Will the ordering just be ignored?

Best regards

Martin 



Internal - KMD A/S

-Original Message-
From: Zheng Lin Edwin Yeo  
Sent: 27 February 2019 03:38
To: solr-user@lucene.apache.org
Subject: Re: MLT and facetting

Hi Martin,

I also get the same problem in Solr 7.7 if I turn on faceting in /mlt 
requestHandler.

Found this issue in the JIRA:
https://issues.apache.org/jira/browse/SOLR-7883
Seems like it is a bug in Solr and it has not been resolved yet.

Regards,
Edwin

On Tue, 26 Feb 2019 at 21:03, Martin Frank Hansen (MHQ)  wrote:

> Hi Edwin,
>
> Here it is:
>
>
> 
>
>
> -
>
>
> -
>
> text
>
> 1
>
> 1
>
> true
>
> 
>
> 
>
>
> Internal - KMD A/S
>
> -Original Message-
> From: Zheng Lin Edwin Yeo 
> Sent: 26 February 2019 08:24
> To: solr-user@lucene.apache.org
> Subject: Re: MLT and facetting
>
> Hi Martin,
>
> What is your setting in your /mlt requestHandler in solrconfig.xml?
>
> Regards,
> Edwin
>
> On Tue, 26 Feb 2019 at 14:43, Martin Frank Hansen (MHQ) 
> wrote:
>
> > Hi Edwin,
> >
> > Thanks for your response.
> >
> > Yes you are right. It was simply the search parameters from Solr.
> >
> > The query looks like this:
> >
> > http://
> > .../solr/.../mlt?df=text=Journalnummer=on=id,Jo
> > ur
> > nalnummer=id:*6512815*
> >
> > best regards,
> >
> > Martin
> >
> >
> > Internal - KMD A/S
> >
> > -Original Message-
> > From: Zheng Lin Edwin Yeo 
> > Sent: 26 February 2019 03:54
> > To: solr-user@lucene.apache.org
> > Subject: Re: MLT and facetting
> >
> > Hi Martin,
> >
> > I think there are some pictures which are not being sent through in 
> > the email.
> >
> > Do send your query that you are using, and which version of Solr you 
> > are using?
> >
> > Regards,
> > Edwin
> >
> > On Mon, 25 Feb 2019 at 20:54, Martin Frank Hansen (MHQ) 
> > wrote:
> >
> > > Hi,
> > >
> > >
> > >
> > > I am trying to combine the mlt functionality with facets, but Solr 
> > > throws
> > > org.apache.solr.common.SolrException: ":"Unable to compute facet 
> > > ranges, facet context is not set".
> > >
> > >
> > >
> > > What I am trying to do is quite simple, find similar documents 
> > > using mlt and group these using the facet parameter. When using 
> > > mlt and facets separately everything works fine, but not when 
> > > combining the
> > functionality.
> > >
> > >
> > >
> > >
> > >
> > > {
> > >   "responseHeader":{
> > >     "status":500,
> > >     "QTime":109},
> > >   "match":{"numFound":1,"start":0,"docs":[
> > >       {
> > >         "Journalnummer":" 00759",
> > >         "id":"6512815"}]},
> > >   "response":{"numFound":602234,"start":0,"docs":[
> > >       {
> > >         "Journalnummer":" 00759",
> > >         "id":"6512816"},
> > >       {
> > >         "Journalnummer":" 00759",
> > >         "id":"6834653"},
> > >       {
> > >         "Journalnummer":" 00739",
> > >         "id":"6202373"},
> > >       {
> > >         "Journalnummer":" 00739",
> > >         "id":"6748105"},
> > >
> > >       {
> > >         "Journalnummer":" 00803",
> > >         "id":"7402155"}]},
> > >   "error":{
> > >     "metadata":[
> > >       "error-class","org.apache.solr.common.SolrException",
> > >       "root-error-class","org.apache.solr.common.SolrException"],
> > >     "msg":"Unable to compute facet ranges, facet context is not set",
> > >     "trace":"org.apache.solr.common.SolrException: Unable to compute facet ranges, facet context is not set\n\tat
> > > org.apache.solr.handler.component.RangeFacetProcessor.getFacetRangeCounts(RangeFacetProcessor.java:66)\n\tat
> > > org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetComponent.java:331)\n\tat
> > > org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetComponent.java:295)\n\tat
> > > org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLikeThisHandler.java:240)\n\tat
> > > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
> > > org.apache.solr.core.SolrCore.execute(SolrCore.java:2541)\n\tat
> > > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)\n\tat
> > > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)\n\tat
> > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)\n\tat
> > > 

Re: MLT and facetting

2019-02-26 Thread Zheng Lin Edwin Yeo
Hi Martin,

I also get the same problem in Solr 7.7 if I turn on faceting in /mlt
requestHandler.

Found this issue in the JIRA:
https://issues.apache.org/jira/browse/SOLR-7883
Seems like it is a bug in Solr and it has not been resolved yet.

Regards,
Edwin


Cannot get MBean info via JConsole

2019-02-26 Thread Yasufumi Mizoguchi
Hi,

I want to access MBean information via JConsole with Solr 6.2.
I can get the information via the MBeanRequestHandler, but not via JConsole from the same host that Solr runs on.
So, how can I do it via JConsole?

Any information about this would be greatly appreciated.

Thanks,
Yasufumi.
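For what it's worth, Solr's startup scripts can enable remote JMX so JConsole can attach. A sketch of the relevant solr.in.sh settings (solr.in.cmd on Windows), assuming a standard install layout; the port value is an arbitrary choice, pick any free port on the host:

```shell
# bin/solr.in.sh -- read by bin/solr at startup
ENABLE_REMOTE_JMX_OPTS="true"   # adds the com.sun.management.jmxremote JVM options
RMI_PORT="18983"                # the port JConsole connects to
```

With these set, restarting Solr and pointing JConsole at localhost:18983 should expose the same MBeans that the MBeanRequestHandler reports.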


Re: Overseer could not get tags

2019-02-26 Thread dshih
Opened SOLR-13274



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Overseer could not get tags

2019-02-26 Thread dshih
We are seeing the same issue running 7.4.0.  Increasing the request and
response header size did not resolve the issue.  Should we open a JIRA
ticket if one does not already exist?





Re: SolrCloud exclusive features

2019-02-26 Thread Arnold Bronley
Here is what I have found on my own little research. Please correct me if I
am wrong. Also, please feel free to add more features.


   - Collections API
   - ConfigSets API
   - Zookeeper CLI
   - Streaming expressions
   - Parallel SQL interface
   - Authorization plugins
   - Blob store API


On Sat, Feb 16, 2019 at 7:07 PM Arnold Bronley 
wrote:

> I am glad to learn that there are others in similar need. A list for
> SolrCloud exclusive features will be really awesome.
> Can any Solr devs please reply to this thread?
>
>
> On Fri, Feb 15, 2019 at 8:39 AM David Hastings <
> hastings.recurs...@gmail.com> wrote:
>
>> >streaming expressions are only available in
>> SolrCloud mode and not in Solr master-slave mode?
>>
>> yes, and its annoying as there are features of solr cloud I do not like.
>> as far as a comprehensive list, that I do not know but would be interested
>> in one as well
>>
>> On Thu, Feb 14, 2019 at 5:07 PM Arnold Bronley 
>> wrote:
>>
>> > Hi,
>> >
>> > Are there any features that are only exclusive to SolrCloud?
>> >
>> > e.g. when I am reading Streaming Expressions documentation, first
>> sentence
>> > there says 'Streaming Expressions provide a simple yet powerful stream
>> > processing language for Solr Cloud.'
>> >
>> > So, does this mean that streaming expressions are only available in
>> > SolrCloud mode and not in Solr master-slave mode?
>> >
>> > If yes, is there a list of such features that only exclusively
>> available in
>> > SolrCloud?
>> >
>>
>


Python Client for Solr Cloud - Leader aware

2019-02-26 Thread Ganesh Sethuraman
We are using Solr Cloud 7.2.1. Is there a leader-aware Python client (like SolrJ for Java) that can send updates to the leader, and is it highly available?
I see the PySolr project (https://pypi.org/project/pysolr/), but I am not able to find any documentation on whether it supports leader-aware updates.

Regards
Ganesh


StreamingSolrClients intermittent Error SolrCloud setup

2019-02-26 Thread abhishek_itengg
Hi,

I am using a SolrCloud setup with 3 Solr nodes. Intermittently we see StreamingSolrClients errors in individual Solr node logs. These errors resolve automatically, but they come back every now and then.
We have 3 ZooKeepers, and I have verified that they have always maintained quorum.
Is this a known issue? Any ideas on how to resolve it?

Solr Version - 6.6.2
Zookeeper Version - 3.4.12

Below is the exception:-

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at
https://solrCloudTest1.Test.com:8983/solr/sitecore_core_index_shard1_replica1:
Bad Request



request:
https://solrCloudTest1.Test.com:8983//solr/sitecore_core_index_shard1_replica1/update?update.distrib=TOLEADER=https%3A%2F%2FsolrCloudTest1.Test.com%3A8983%2Fsolr%2Fsitecore_core_index_shard1_replica2%2F=javabin=2
Remote error message: ERROR:
[doc=sitecore://core/{----}?lang=da=1=sitecore_core_index]
Error adding field 'level_tl'='false' msg=For input string: "false"
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:345)
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)
at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source) 

Thanks,
Abhi
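The remote error quoted above looks like a type mismatch rather than a streaming problem: the field 'level_tl' is being sent the value 'false', and "For input string: \"false\"" is exactly the message Java's long parser produces. A minimal illustration of that message (an assumption: that level_tl maps to a long-typed field in the schema, as the *_tl suffix suggests):

```java
public class ParseDemo {
    public static void main(String[] args) {
        try {
            // What Solr ultimately does with the value of a long-typed field
            Long.parseLong("false");
        } catch (NumberFormatException e) {
            // Prints the same message seen in the Solr log above
            System.out.println(e.getMessage());
        }
    }
}
```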





Re: SolrCloud fails to restart after rebooting

2019-02-26 Thread abhishek_itengg
Shawn,

As you mentioned, it was indeed a problem with my network. Port 2888 was blocked, preventing ZooKeeper from communicating with its peers.

Thanks,
Abhi






Giving SolrJ credentials for Zookeeper

2019-02-26 Thread Snead, Ryan [USA]
I am following along with the example in the ZooKeeper Access Control section of the Apache Solr 7.5 Reference Guide. I have gotten to the point where I can use the zkcli.sh control script to access my secured ZooKeeper environment. I can also connect using ZooKeeper's own zkCli.sh and then authenticate using the auth command. The trouble comes after completing the steps in the article: how do I find out which parameters to set in SolrJ so that my indexer code can communicate with ZooKeeper?

The error my Java code returns when I try to process a QueryRequest is:

    Error reading cluster properties from zookeeper
    org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /clusterprops.json

My code is:

    solrClient = new CloudSolrClient.Builder("localhost:2181",
            Optional.of("/")).build();
    String solrQuery = String.format("PRODUCT_TYPE:USER and PRODUCT_SK:%s",
            productSk);
    SolrQuery q = new SolrQuery();
    q.set("q", solrQuery);
    QueryRequest request = new QueryRequest(q);
    numfound = request.process(solrClient).getResults().getNumFound();

The error occurs at the last line. I suspect that I need to set a property in solrClient, but it is not clear to me what that would be.

References:
https://lucene.apache.org/solr/guide/7_5/zookeeper-access-control.html
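Per that ZooKeeper Access Control page, SolrJ's ZooKeeper client picks up digest credentials from JVM system properties when the VM-params credentials provider is configured. A sketch (the class and property names are from the reference guide; the username and password values are placeholders, not values from this thread):

```java
public class ZkCredentialsSetup {
    public static void main(String[] args) {
        // Tell SolrJ's ZooKeeper client to send digest credentials taken
        // from VM parameters (provider class name per the reference guide).
        System.setProperty("zkCredentialsProvider",
            "org.apache.solr.common.cloud.VMParamsSingleSetCredentialsDigestZkCredentialsProvider");
        System.setProperty("zkDigestUsername", "admin-user");     // placeholder
        System.setProperty("zkDigestPassword", "admin-password"); // placeholder
        // ...then build the CloudSolrClient exactly as in the snippet above;
        // it should now authenticate before reading /clusterprops.json.
        System.out.println(System.getProperty("zkDigestUsername"));
    }
}
```

These properties can equally be passed on the command line as -D flags instead of being set programmatically.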




Re: questions regrading stored fields role in query time

2019-02-26 Thread Erick Erickson
It Depends (tm).

See: SOLR-12598 for details. The short form is that as of Solr 7.5, Solr 
attempts to do the most efficient thing possible when fetching fields to return 
to the client.

1> if all requested fields are docValues, return from docValues.
2> if _any_ field is stored, return from the stored (fdt) values.
3> if some are DV=true, but stored=false, get from both places
4> if some are DV=false but stored=true, get from both places.

To return a single stored=true field that is _not_ docValues, a minimum 16K 
block must be read from disk and decompressed. Much of the time, that will 
contain all of the fields and the uncompressed doc will be in the JVM’s heap so 
it’s more efficient to do that than pull it from MMapDirectory space.

If all values are dv=true, then not having to seek to disk/uncompress is 
probably more efficient so do it that way.

3 and 4 are really the same thing, you _can’t_ get all the fields from the same 
place, so you have to read/decompress _and_ pull from DV.
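The four cases can be paraphrased as plain logic (a sketch of the decision only, not Solr's actual SOLR-12598 implementation):

```java
public class FetchPlan {
    /** fields[i] = { hasDocValues, isStored } for each requested field. */
    static String plan(boolean[][] fields) {
        boolean allDocValues = true, allStored = true;
        for (boolean[] f : fields) {
            allDocValues &= f[0];
            allStored &= f[1];
        }
        if (allDocValues) return "docValues"; // case 1: skip the stored (fdt) read entirely
        if (allStored)    return "stored";    // case 2: one decompressed block covers everything
        return "both";                        // cases 3/4: mixed, read both structures
    }

    public static void main(String[] args) {
        // dv-only field plus dv-and-stored field: everything comes from docValues
        System.out.println(plan(new boolean[][] { {true, false}, {true, true} }));
    }
}
```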

But wrapped around all this is that you’re really not doing either for even a 
small fraction of the docs compared to searching. Say I have numFound of 
1,000,000 but return 10 docs. You only have to decompress 10 blocks at worst.

And, as Emir says, accessing the fdt files is only done for the 10 docs 
returned, so that really doesn’t impact the search times much…

Best,
Erick

> On Feb 26, 2019, at 2:40 AM, Emir Arnautović  
> wrote:
> 
> Hi Saurabh,
> DocValues can be used for retrieving field values (note that order will not 
> be preserved in case of multivalue field) but they are also stored in files, 
> just different structures. Doc values will load some structure in memory, but 
> will also use memory mapped files to access values (not familiar with this 
> code and just assuming) so in any case it will use “shared” OS caches. Those 
> caches will be affected when loading stored fields to do partial update. Also 
> it’ll take some memory when indexing documents. That is why storing and doing 
> partial updates could indirectly affect query performances. But that might be 
> insignificant and only test can tell for sure. Unless you have small index 
> and enough RAM, then I can also tell that for sure.
> 
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> 
> 
> 
>> On 26 Feb 2019, at 11:21, Saurabh Sharma  wrote:
>> 
>> Hi Emir,
>> 
>> I had this question in my mind if I store my only returnable field as
>> docValue in RAM.will my stored documents be referenced while constructing
>> the response after the query. Ideally, as the field asked to return i.e fl
>> is already in RAM then documents on disk should not be consulted for this
>> field.
>> 
>> Any insight about the usage of docValued field vs stored field and
>> preference order will help here in understanding the situation in a better
>> way.
>> 
>> Thanks
>> Saurabh
>> 
>> On Tue, Feb 26, 2019 at 2:41 PM Emir Arnautović <
>> emir.arnauto...@sematext.com> wrote:
>> 
>>> Hi Saurabh,
>>> Welcome to the channel!
>>> Storing fields should not affect query performances directly if you use
>>> lazy field loading and it is the default set. And it should not affect at
>>> all if you have enough RAM compared to index size. Otherwise OS caches
>>> might be affected by stored fields. The best way to tell is to tests with
>>> expected indexing/partial updates load and see if/how much it affects
>>> performances.
>>> 
>>> HTH,
>>> Emir
>>> --
>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>> 
>>> 
>>> 
 On 26 Feb 2019, at 09:34, Saurabh Sharma 
>>> wrote:
 
 Hi All ,
 
 
 I am new here on this channel.
 Few days back we upgraded our solr cloud to version 7.3 and doing
>>> real-time
 document posting with 15 seconds soft commit and 2 minutes hard commit
 time.As of now we posting full document to solr which includes data
 accumulations from various sources.
 
 Now we want to do partial updates.I went through the documentation and
 found that all the fields should be stored or docValues for partial
 updates. I have few questions regarding this?
 
 1) In case i am just fetching only 1 field while making query.What will
>>> the
 performance impact due to all fields being stored? Lets say i have an
>>> "id"
 field and i do have doc value true for the field, will solr use stored
 fields in this case? will it load whole document in RAM ?
 
 2)What's the impact of large stored fields (.fdt) on query time
 performance. Do query time even depend on the stored field or they just
 depend on indexes?
 
 
 Thanks and regards
 Saurabh
>>> 
>>> 
> 



Re: High Availability with two nodes

2019-02-26 Thread Shawn Heisey

On 2/26/2019 2:39 AM, Andreas Mock wrote:

currently we are looking at Apache Solr as a solution
for searching. One important component is high availability.
I digged around finding out that HA is built in via
SolrCloud which means I have to install ZooKeeper in
a production environment which needs at least three
nodes.


ZK requires at least three nodes.  This is how it is designed, and the 
only way it is possible to be absolutely certain that operation is 
correct.


The third node (running ZK only, not Solr) does not need much power. 
You could get a very cheap laptop for that role and it would probably 
work great.



So, now to my problem. I haven't found any documents
showing a way to get a two node cluster simply for
HA (active/passive).

Is there a recommended way or are there any solutions
out there showing a scenario with Solr Single Server
combined e.g. DRBD and Pacemaker?


SolrCloud cannot achieve this with two servers, due to ZK requirements.

You could set up two servers using master/slave replication ... but if 
your master dies, indexing becomes a challenge.  Switching which machine 
is master requires manual reconfiguration of all servers ... something 
that wouldn't be the case with SolrCloud ... because it does not have 
masters and slaves.  The leader for each index is transitory, assigned 
by a ZK election.


Thanks,
Shawn


Re: questions regrading stored fields role in query time

2019-02-26 Thread Shawn Heisey

On 2/26/2019 1:34 AM, Saurabh Sharma wrote:

Now we want to do partial updates.I went through the documentation and
found that all the fields should be stored or docValues for partial
updates. I have few questions regarding this?

1) In case i am just fetching only 1 field while making query.What will the
performance impact due to all fields being stored? Lets say i have an "id"
field and i do have doc value true for the field, will solr use stored
fields in this case? will it load whole document in RAM ?


I am not aware of any option to keep docValues in RAM.  If you have 
enough memory in your system (memory that has NOT been assigned to any 
program), then the OS *might* keep some or all of your index data in 
memory.  That functionality, present in all modern operating systems, is 
the secret to good performance.


The stored data is compressed.  The docValues data is not compressed. 
Uncompressing stored data uses CPU cycles.  Generally if data must be 
read off of disk, compressed will be faster.  But if the data has been 
cached by the OS and comes from memory, which you definitely want to 
happen if possible, uncompressed will likely be faster ... and it will 
definitely require less CPU.


If you have many fields but you're only fetching one, then docValues 
will almost certainly be faster than stored.  All of the stored fields 
for one document are compressed together, so Solr will be reading data 
that it won't actually be using, in order to achieve decompression.


I believe that if you have both stored data and docValues for a field, 
Solr will use stored data for search results.  I am not positive that 
this is the case, but I think it's what happens.



2)What's the impact of large stored fields (.fdt) on query time
performance. Do query time even depend on the stored field or they just
depend on indexes?


The size of your stored data will have no *DIRECT* impact on query 
performance.  Stored data is not consulted for the query part.  It is 
consulted when document data is retrieved to return with the response.


A large amount of stored data can have an indirect impact on query 
performance.  If there is insufficient memory available to the OS disk 
cache, then reading the stored data to return results to the client will 
push information out of the disk cache that is needed for queries.  If 
that happens, then Solr will need to re-read that data off the disk to 
do a query.  Because disks are glacially slow compared to memory, 
performance will be impacted.


Here's a page about performance problems.  Most of it is about memory, 
since that is usually the resource that has the biggest effect on 
performance:


https://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn


Re: SOLR Tokenizer “solr.SimplePatternSplitTokenizerFactory” splits at unexpected characters

2019-02-26 Thread Shawn Heisey

On 2/26/2019 12:18 AM, Stephan Damson wrote:

If we take the example input "operative", the analyzer shows that during indexing the input gets split into the tokens 
"ope", "a" and "ive"; that is, the tokenizer splits at the characters "r" and "t", and not at 
the expected whitespace characters (CR, TAB). Just to be sure, I also tried using more than one backslash in the pattern (e.g. \t and 
\\t), but this did not change how the input is tokenized during indexing.


I tried your fieldType on 7.5.0 and I see the same problem.  I couldn't 
get it working no matter what I tried.


I then tested it on 7.7.0 and it works properly in that version.

Thanks,
Shawn
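For reference, the field type under discussion looks something like the following (a reconstruction based on the reference-guide example for this tokenizer; Stephan's exact pattern was not preserved in the digest):

```xml
<fieldType name="text_split_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- split on runs of space, tab, CR, LF; pattern uses Lucene RegExp syntax -->
    <tokenizer class="solr.SimplePatternSplitTokenizerFactory" pattern="[ \t\r\n]+"/>
  </analyzer>
</fieldType>
```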


Re: %solr_logs_dir% does not like spaces

2019-02-26 Thread Erick Erickson
If you can munge the solr.cmd file and it works for you, _please_ submit a JIRA 
and a patch!

most of the Solr devs develop on *nix boxes, so this kind of thing creeps in 
and we need to fix it.

Best,
Erick

> On Feb 26, 2019, at 6:38 AM, paul.d...@ub.unibe.ch wrote:
> 
> Perhaps the instances of %SOLR_LOGS_DIR% in the solr.cmd files should be 
> quoted i.e. "%SOLR_LOGS_DIR%" ??
> 
> 
> 
> Sent from Mail for Windows 10
> 
> 
> 
> From: Arturas Mazeika
> Sent: Tuesday, 26 February 2019 15:10
> To: solr-user@lucene.apache.org
> Subject: Re: %solr_logs_dir% does not like spaces
> 
> 
> 
> Hi Paul,
> 
> getting rid of space in "program files" is doable, you are right. One way
> to do it is through
> 
>   - echo %programfiles% ==> C:\Program Files
>   - echo %programfiles(x86)% ==> C:\Program Files (x86)
> 
> Getting rid of spaces in sub directories is very difficult as we use tons
> of those for different components of our suite.
> 
> Any other options to set it in some XML file or something?
> 
> Cheers,
> Arturas
> 
> 
> On Tue, Feb 26, 2019 at 3:03 PM  wrote:
> 
>> Looks like a bug in solr.cmd. You could try eliminating the spaces and/or
>> opening an issue.
>> 
>> 
>> 
>> Instead of ‘Program Files (x86)’ use ‘PROGRA~2’
>> 
>> And don’t have spaces in your subdirectory…
>> 
>> 
>> 
>> NB: Depending on your Windows Version you may Have another alias for
>> ‘Program Files (x86)’; use «dir /X» to view the aliases.
>> 
>> 
>> 
>> Sent from Mail for Windows 10
>> 
>> 
>> 
>> From: Arturas Mazeika
>> Sent: Tuesday, 26 February 2019 14:41
>> To: solr-user@lucene.apache.org
>> Subject: %solr_logs_dir% does not like spaces
>> 
>> 
>> 
>> Hi All,
>> 
>> I am testing solr 7.7 (and 7.6) under windows. My aim is to set logging
>> into a subdirectory that contains spaces of a directory that contains
>> spaces.
>> 
>> If I set on windows:
>> 
>> setx /m SOLR_LOGS_DIR "f:\solr_deployment\logs"
>> 
>> and start a solr instance:
>> 
>> F:\solr_deployment\solr-7.7.0\bin\solr.cmd start -h localhost -p 8983 -s
>> F:\solr_deployment\solr_data -m 1g
>> 
>> this goes smoothly.
>> 
>> However If I set the logging directory to:
>> 
>> setx /m SOLR_LOGS_DIR  "C:\Program Files (x86)\My Directory\Another
>> Directory\logs\solr"
>> 
>> then I get a cryptic error:
>> 
>> F:\solr_deployment\solr-7.7.0\bin\solr.cmd start -h localhost -p 8983 -s
>> F:\solr_deployment\solr_data -m 1g
>> Files was unexpected at this time.
>> 
>> If I comment "@echo off" in both solr.cmd and solr.cmd.in, it shows that
>> it
>> dies around those lines in solr.cmd:
>> 
>> F:\solr_deployment\solr-7.7.0\bin>IF "" == "" set STOP_KEY=solrrocks
>> Files was unexpected at this time.
>> 
>> In the solr.cmd the following block is shown:
>> 
>> IF "%STOP_KEY%"=="" set STOP_KEY=solrrocks
>> 
>> @REM This is quite hacky, but examples rely on a different log4j2.xml
>> @REM so that we can write logs for examples to %SOLR_HOME%\..\logs
>> IF [%SOLR_LOGS_DIR%] == [] (
>>  set "SOLR_LOGS_DIR=%SOLR_SERVER_DIR%\logs"
>> ) ELSE (
>>  set SOLR_LOGS_DIR=%SOLR_LOGS_DIR:"=%
>> )
>> 
>> comments?
>> 
>> Cheers,
>> Arturas
>> 



Re: High Availability with two nodes

2019-02-26 Thread Walter Underwood
Yes, you need three Zookeeper nodes. You cannot have an HA Solr Cloud 
installation with only two hosts. The Zookeeper hosts do not need to be large.

A master/slave configuration might be fine, but we need to know more before 
recommending that.

How many documents? How big are they? How fresh does the index need to be (time 
between repository change and available to search)? How often does your source 
data change? How much downtime is allowable to switch over?

Also, I have no idea what “DRBD” or “Pacemaker” means in this context.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)
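For reference, a minimal three-node ZooKeeper ensemble is declared in each node's zoo.cfg along these lines (hostnames and the data directory are placeholders):

```
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```

The third host only runs ZooKeeper, so, as Shawn notes elsewhere in this thread, it can be a much smaller machine than the two Solr nodes.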

> On Feb 26, 2019, at 7:25 AM, Andreas Mock  wrote:
> 
> Hi Walter,
> 
> but I thought I need at least 3 zookeeper nodes? Is this not valid?
> I only have two servers. So, how can I have a two server SolrCloud
> installation? Am I missing something?
> 
> Best regards
> Andreas
> 
>> -Original Message-
>> From: Walter Underwood 
>> Sent: Tuesday, 26 February 2019 16:14
>> To: solr-user@lucene.apache.org
>> Subject: Re: High Availability with two nodes
>> 
>> Solr Cloud automatically choose a leader and a follower.
>> 
>> I am not a fan of cold standby hosts, because you don’t really know
>> whether they work. You have two hosts, so keep them both hot, put a load
>> balancer in front of them, and send all the traffic to both of them all
>> the time. If one fails, you are still up.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Feb 26, 2019, at 1:39 AM, Andreas Mock 
>> wrote:
>>> 
>>> Hi all,
>>> 
>>> currently we are looking at Apache Solr as a solution
>>> for searching. One important component is high availability.
>>> I digged around finding out that HA is built in via
>>> SolrCloud which means I have to install ZooKeeper in
>>> a production environment which needs at least three
>>> nodes.
>>> 
>>> So, now to my problem. I haven't found any documents
>>> showing a way to get a two node cluster simply for
>>> HA (active/passive).
>>> 
>>> Is there a recommended way or are there any solutions
>>> out there showing a scenario with Solr Single Server
>>> combined e.g. DRBD and Pacemaker?
>>> 
>>> Any hints and pointers are very welcome.
>>> 
>>> Thank you in advance
>>> Andreas
>>> 
> 



Re: High Availability with two nodes

2019-02-26 Thread Andreas Mock
Hi Walter,

but I thought I need at least 3 zookeeper nodes? Is this not valid?
I only have two servers. So, how can I have a two server SolrCloud
installation? Am I missing something?

Best regards
Andreas

> -Ursprüngliche Nachricht-
> Von: Walter Underwood 
> Gesendet: Dienstag, 26. Februar 2019 16:14
> An: solr-user@lucene.apache.org
> Betreff: Re: High Availability with two nodes
> 
> Solr Cloud automatically chooses a leader and a follower.
> 
> I am not a fan of cold standby hosts, because you don’t really know
> whether they work. You have two hosts, so keep them both hot, put a load
> balancer in front of them, and send all the traffic to both of them all
> the time. If one fails, you are still up.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> > On Feb 26, 2019, at 1:39 AM, Andreas Mock 
> wrote:
> >
> > Hi all,
> >
> > currently we are looking at Apache Solr as a solution
> > for searching. One important component is high availability.
> > I digged around finding out that HA is built in via
> > SolrCloud which means I have to install ZooKeeper in
> > a production environment which needs at least three
> > nodes.
> >
> > So, now to my problem. I haven't found any documents
> > showing a way to get a two node cluster simply for
> > HA (active/passive).
> >
> > Is there a recommended way or are there any solutions
> > out there showing a scenario with Solr Single Server
> > combined e.g. DRBD and Pacemaker?
> >
> > Any hints and pointers are very welcome.
> >
> > Thank you in advance
> > Andreas
> >



Re: High Availability with two nodes

2019-02-26 Thread Walter Underwood
Solr Cloud automatically chooses a leader and a follower.

I am not a fan of cold standby hosts, because you don’t really know whether 
they work. You have two hosts, so keep them both hot, put a load balancer in 
front of them, and send all the traffic to both of them all the time. If one 
fails, you are still up.
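
To make the two-hot-nodes setup concrete, here is a minimal sketch of an HAProxy backend with active health checks against Solr's ping handler. Hostnames, ports, and the collection name are assumptions for illustration, not taken from this thread:

```
# haproxy.cfg fragment -- round-robins all traffic across both Solr hosts
# and ejects a host that fails its health check.
frontend solr_front
    bind *:8983
    default_backend solr_back

backend solr_back
    balance roundrobin
    # Poll the ping handler of a collection named "mycoll" (assumed name)
    option httpchk GET /solr/mycoll/admin/ping
    server solr1 solr1.example.com:8983 check inter 5s fall 3 rise 2
    server solr2 solr2.example.com:8983 check inter 5s fall 3 rise 2
```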

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 26, 2019, at 1:39 AM, Andreas Mock  wrote:
> 
> Hi all,
> 
> currently we are looking at Apache Solr as a solution
> for searching. One important component is high availability.
> I digged around finding out that HA is built in via
> SolrCloud which means I have to install ZooKeeper in 
> a production environment which needs at least three
> nodes.
> 
> So, now to my problem. I haven't found any documents
> showing a way to get a two node cluster simply for
> HA (active/passive).
> 
> Is there a recommended way or are there any solutions
> out there showing a scenario with Solr Single Server
> combined e.g. DRBD and Pacemaker?
> 
> Any hints and pointers are very welcome.
> 
> Thank you in advance
> Andreas 
> 



AW: %solr_logs_dir% does not like spaces

2019-02-26 Thread paul.dodd
Perhaps the instances of %SOLR_LOGS_DIR% in the solr.cmd files should be quoted,
i.e. "%SOLR_LOGS_DIR%"?



Gesendet von Mail für Windows 10



Von: Arturas Mazeika
Gesendet: Dienstag, 26. Februar 2019 15:10
An: solr-user@lucene.apache.org
Betreff: Re: %solr_logs_dir% does not like spaces



Hi Paul,

getting rid of space in "program files" is doable, you are right. One way
to do it is through

   - echo %programfiles% ==> C:\Program Files
   - echo %programfiles(x86)% ==> C:\Program Files (x86)

Getting rid of spaces in sub directories is very difficult as we use tons
of those for different components of our suite.

Any other options to set it in some XML file or something?

Cheers,
Arturas


On Tue, Feb 26, 2019 at 3:03 PM  wrote:

> Looks like a bug in solr.cmd. You could try eliminating the spaces and/or
> opening an issue.
>
>
>
> Instead of ‘Program Files (x86)’ use ‘PROGRA~2’
>
> And don’t have spaces in your subdirectory…
>
>
>
> NB: Depending on your Windows Version you may Have another alias for
> ‘Program Files (x86)’; use «dir /X» to view the aliases.
>
>
>
> Gesendet von Mail für
> Windows 10
>
>
>
> Von: Arturas Mazeika
> Gesendet: Dienstag, 26. Februar 2019 14:41
> An: solr-user@lucene.apache.org
> Betreff: %solr_logs_dir% does not like spaces
>
>
>
> Hi All,
>
> I am testing solr 7.7 (and 7.6) under windows. My aim is to set logging
> into a subdirectory that contains spaces of a directory that contains
> spaces.
>
> If I set on windows:
>
> setx /m SOLR_LOGS_DIR "f:\solr_deployment\logs"
>
> and start a solr instance:
>
> F:\solr_deployment\solr-7.7.0\bin\solr.cmd start -h localhost -p 8983 -s
> F:\solr_deployment\solr_data -m 1g
>
> this goes smoothly.
>
> However If I set the logging directory to:
>
> setx /m SOLR_LOGS_DIR  "C:\Program Files (x86)\My Directory\Another
> Directory\logs\solr"
>
> then I get a cryptic error:
>
> F:\solr_deployment\solr-7.7.0\bin\solr.cmd start -h localhost -p 8983 -s
> F:\solr_deployment\solr_data -m 1g
> Files was unexpected at this time.
>
> If I comment "@echo off" in both solr.cmd and solr.cmd.in, it shows that
> it
> dies around those lines in solr.cmd:
>
> F:\solr_deployment\solr-7.7.0\bin>IF "" == "" set STOP_KEY=solrrocks
> Files was unexpected at this time.
>
> In the solr.cmd the following block is shown:
>
> IF "%STOP_KEY%"=="" set STOP_KEY=solrrocks
>
> @REM This is quite hacky, but examples rely on a different log4j2.xml
> @REM so that we can write logs for examples to %SOLR_HOME%\..\logs
> IF [%SOLR_LOGS_DIR%] == [] (
>   set "SOLR_LOGS_DIR=%SOLR_SERVER_DIR%\logs"
> ) ELSE (
>   set SOLR_LOGS_DIR=%SOLR_LOGS_DIR:"=%
> )
>
> comments?
>
> Cheers,
> Arturas
>


Re: Suggester autocomplete for address information

2019-02-26 Thread Kehan Harman
I'd like to clarify that what I am looking for is the right field type for
the address field that will suggest values as follows for the input:
Input:
"123 SM"
Suggestions:

   - 123-127 SMITH STREET, KEMPSEY NSW 2440
   - 123 SMYTHE STREET. RANDOM PLACE 


In addition to this, I want the search to also provide results if I
simply include the postcode (four digits here in Oz), as follows:

Input:
"2440"

Suggestions:

   - 123-127 SMITH STREET, KEMPSEY NSW 2440
   - 120 SMITH STREET, KEMPSEY NSW 2440
   - 65 SMITH STREET, KEMPSEY NSW 2440
   - 2440 ANOTHER RANDOM ROAD, RANDOM PLACE 


In short, I would like it to try to match the beginning part of the address
first and, if that fails, to start matching on later parts of the string such
as suburb, state and postcode.

The field type that I'm currently using as the basis of these suggestions
is as follows:



<filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>

Thanks,
Kehan


On Tue, 26 Feb 2019 at 21:54, Kehan Harman <
kehan.har...@gaiaresources.com.au> wrote:

> Hi All,
>
> I'm new to Solr & the community so feel free to ignore / remove if this is
> the incorrect mailing list for this query.
>
> I'm trying to build an autocomplete using a Solr index for addresses in a
> format similar to:
>
> 123 Smith Street, KEMPSEY, NSW 2440
>
> I'm looking to have these addresses suggest values to users based on their
> input with some spellchecking capability.
>
> My documents contain contents like:
> { "id":"ANSW718363409", "table":"ADDRESS_DEFAULT_GEOCODE", "address":"123-127
> SMITH STREET, KEMPSEY NSW 2440", "address_location":
> "-31.07321967,152.84505473", "address_latitude":-31.07322, "
> address_longitude":152.84506, "locality_pid":"NSW2119", "locality_latitude
> ":-31.060476, "locality_longitude":152.84819, "suburb_postcode":"KEMPSEY
> NSW 2440", "number_first":123, "number_last":127, "street_number":
> "123-127", "street_name":"SMITH", "street_type_code":"STREET", "
> locality_name":"KEMPSEY", "state_name":"NEW SOUTH WALES", "
> state_abbreviation":"NSW", "postcode":"2440", "_version_":
> 1626515771141128204}
>
> These are Australian addresses extracted from
> https://data.gov.au/dataset/ds-dga-19432f89-dc3a-4ef3-b943-5326ef1dbecc/details
> .
>
> My managed schema has the following fields - I'm using the example managed
> schema *sample_techproducts_configs* with some additional fields that
> have been added using the schema API.:
>
> <field name="id" type="string" multiValued="false" indexed="true" required="true" stored="true"/>
> <field name="table" type="string" multiValued="false" indexed="true" stored="true"/>
> <field name="address" type="text_en" multiValued="false" indexed="true" stored="true"/>
> <field name="address_location" type="location" multiValued="false" indexed="true" stored="true"/>
> <field name="address_latitude" type="float" multiValued="false" indexed="true" stored="true"/>
> <field name="address_longitude" type="float" multiValued="false" indexed="true" stored="true"/>
> <field name="flat_number" type="int" multiValued="false" indexed="true" stored="true"/>
> <field name="locality_pid" type="string" multiValued="false" indexed="true" stored="true"/>
> <field name="locality_location" type="location" multiValued="false" indexed="true" stored="true"/>
> <field name="locality_latitude" type="float" multiValued="false" indexed="true" stored="true"/>
> <field name="locality_longitude" type="float" multiValued="false" indexed="true" stored="true"/>
> <field name="locality_name" type="string" multiValued="false" indexed="true" stored="true"/>
> <field name="suburb_postcode" type="text_en" multiValued="false" indexed="true" stored="true"/>
> <field name="number_first" type="int" multiValued="false" indexed="true" stored="true"/>
> <field name="number_last" type="int" multiValued="false" indexed="true" stored="true"/>
> <field name="street_number" type="string" multiValued="false" indexed="true" stored="true"/>
> <field name="street_name" type="string" multiValued="false" indexed="true" stored="true"/>
> <field name="street_type_code" type="string" multiValued="false" indexed="true" stored="true"/>
> <field name="state_name" type="string" multiValued="false" indexed="true" stored="true"/>
> <field name="state_abbreviation" type="string" multiValued="false" indexed="true" stored="true"/>
> <field name="postcode" type="string" multiValued="false" indexed="true" stored="true"/>
> <field name="type" type="string" multiValued="false" indexed="true" stored="true"/>
>
> The search component / requestHandler are defined as follows.
>
> <searchComponent name="suggest" class="solr.SuggestComponent">
>   <lst name="suggester">
>     <str name="name">suburb</str>
>     <str name="lookupImpl">FuzzyLookupFactory</str>
>     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>     <str name="field">suburb_postcode</str>
>     <str name="suggestAnalyzerFieldType">string</str>
>     <str name="buildOnStartup">true</str>
>   </lst>
>   <lst name="suggester">
>     <str name="name">address</str>
>     <str name="lookupImpl">FuzzyLookupFactory</str>
>     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>     <str name="field">address</str>
>     <str name="suggestAnalyzerFieldType">string</str>
>     <str name="buildOnStartup">true</str>
>   </lst>
> </searchComponent>
>
> <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
>   <lst name="defaults">
>     <str name="suggest">true</str>
>     <str name="suggest.count">10</str>
>   </lst>
>   <arr name="components">
>     <str>suggest</str>
>   </arr>
> </requestHandler>
>
> Please let me know if you need any more information in order to answer
> this?
> Thanks,
> Kehan
>
>
>

-- 
**
Kehan Harman
Gaia Resources
p +61 8 92277309
m +61 406872510
w www.gaiaresources.com.au
e kehan.har...@gaiaresources.com.au
t @kehan 
g kehh 

I acknowledge the traditional custodians of the lands and waters where we
live and work, and pay my respects to elders 

Re: %solr_logs_dir% does not like spaces

2019-02-26 Thread Arturas Mazeika
Hi Paul,

getting rid of the space in "Program Files" is doable, you are right. One way
to do it is through:

   - echo %programfiles% ==> C:\Program Files
   - echo %programfiles(x86)% ==> C:\Program Files (x86)

Getting rid of spaces in subdirectories is very difficult, as we use tons
of those for different components of our suite.

Are there any other options, e.g. setting it in some XML file?

Cheers,
Arturas


On Tue, Feb 26, 2019 at 3:03 PM  wrote:

> Looks like a bug in solr.cmd. You could try eliminating the spaces and/or
> opening an issue.
>
>
>
> Instead of ‘Program Files (x86)’ use ‘PROGRA~2’
>
> And don’t have spaces in your subdirectory…
>
>
>
> NB: Depending on your Windows Version you may Have another alias for
> ‘Program Files (x86)’; use «dir /X» to view the aliases.
>
>
>
> Gesendet von Mail für
> Windows 10
>
>
>
> Von: Arturas Mazeika
> Gesendet: Dienstag, 26. Februar 2019 14:41
> An: solr-user@lucene.apache.org
> Betreff: %solr_logs_dir% does not like spaces
>
>
>
> Hi All,
>
> I am testing solr 7.7 (and 7.6) under windows. My aim is to set logging
> into a subdirectory that contains spaces of a directory that contains
> spaces.
>
> If I set on windows:
>
> setx /m SOLR_LOGS_DIR "f:\solr_deployment\logs"
>
> and start a solr instance:
>
> F:\solr_deployment\solr-7.7.0\bin\solr.cmd start -h localhost -p 8983 -s
> F:\solr_deployment\solr_data -m 1g
>
> this goes smoothly.
>
> However If I set the logging directory to:
>
> setx /m SOLR_LOGS_DIR  "C:\Program Files (x86)\My Directory\Another
> Directory\logs\solr"
>
> then I get a cryptic error:
>
> F:\solr_deployment\solr-7.7.0\bin\solr.cmd start -h localhost -p 8983 -s
> F:\solr_deployment\solr_data -m 1g
> Files was unexpected at this time.
>
> If I comment "@echo off" in both solr.cmd and solr.cmd.in, it shows that
> it
> dies around those lines in solr.cmd:
>
> F:\solr_deployment\solr-7.7.0\bin>IF "" == "" set STOP_KEY=solrrocks
> Files was unexpected at this time.
>
> In the solr.cmd the following block is shown:
>
> IF "%STOP_KEY%"=="" set STOP_KEY=solrrocks
>
> @REM This is quite hacky, but examples rely on a different log4j2.xml
> @REM so that we can write logs for examples to %SOLR_HOME%\..\logs
> IF [%SOLR_LOGS_DIR%] == [] (
>   set "SOLR_LOGS_DIR=%SOLR_SERVER_DIR%\logs"
> ) ELSE (
>   set SOLR_LOGS_DIR=%SOLR_LOGS_DIR:"=%
> )
>
> comments?
>
> Cheers,
> Arturas
>


AW: %solr_logs_dir% does not like spaces

2019-02-26 Thread paul.dodd
Looks like a bug in solr.cmd. You could try eliminating the spaces and/or 
opening an issue.



Instead of ‘Program Files (x86)’ use ‘PROGRA~2’

And don’t have spaces in your subdirectory…



NB: Depending on your Windows version you may have another alias for ‘Program
Files (x86)’; use «dir /X» to view the aliases.



Gesendet von Mail für Windows 10



Von: Arturas Mazeika
Gesendet: Dienstag, 26. Februar 2019 14:41
An: solr-user@lucene.apache.org
Betreff: %solr_logs_dir% does not like spaces



Hi All,

I am testing solr 7.7 (and 7.6) under windows. My aim is to set logging
into a subdirectory that contains spaces of a directory that contains
spaces.

If I set on windows:

setx /m SOLR_LOGS_DIR "f:\solr_deployment\logs"

and start a solr instance:

F:\solr_deployment\solr-7.7.0\bin\solr.cmd start -h localhost -p 8983 -s
F:\solr_deployment\solr_data -m 1g

this goes smoothly.

However If I set the logging directory to:

setx /m SOLR_LOGS_DIR  "C:\Program Files (x86)\My Directory\Another
Directory\logs\solr"

then I get a cryptic error:

F:\solr_deployment\solr-7.7.0\bin\solr.cmd start -h localhost -p 8983 -s
F:\solr_deployment\solr_data -m 1g
Files was unexpected at this time.

If I comment "@echo off" in both solr.cmd and solr.cmd.in, it shows that it
dies around those lines in solr.cmd:

F:\solr_deployment\solr-7.7.0\bin>IF "" == "" set STOP_KEY=solrrocks
Files was unexpected at this time.

In the solr.cmd the following block is shown:

IF "%STOP_KEY%"=="" set STOP_KEY=solrrocks

@REM This is quite hacky, but examples rely on a different log4j2.xml
@REM so that we can write logs for examples to %SOLR_HOME%\..\logs
IF [%SOLR_LOGS_DIR%] == [] (
  set "SOLR_LOGS_DIR=%SOLR_SERVER_DIR%\logs"
) ELSE (
  set SOLR_LOGS_DIR=%SOLR_LOGS_DIR:"=%
)

comments?

Cheers,
Arturas


Suggester autocomplete for address information

2019-02-26 Thread Kehan Harman
Hi All,

I'm new to Solr & the community so feel free to ignore / remove if this is
the incorrect mailing list for this query.

I'm trying to build an autocomplete using a Solr index for addresses in a
format similar to:

123 Smith Street, KEMPSEY, NSW 2440

I'm looking to have these addresses suggest values to users based on their
input with some spellchecking capability.

My documents contain contents like:
{ "id":"ANSW718363409", "table":"ADDRESS_DEFAULT_GEOCODE", "address":"123-127
SMITH STREET, KEMPSEY NSW 2440", "address_location":
"-31.07321967,152.84505473", "address_latitude":-31.07322, "
address_longitude":152.84506, "locality_pid":"NSW2119", "locality_latitude":
-31.060476, "locality_longitude":152.84819, "suburb_postcode":"KEMPSEY NSW
2440", "number_first":123, "number_last":127, "street_number":"123-127", "
street_name":"SMITH", "street_type_code":"STREET", "locality_name":"KEMPSEY",
"state_name":"NEW SOUTH WALES", "state_abbreviation":"NSW", "postcode":
"2440", "_version_":1626515771141128204}

These are Australian addresses extracted from
https://data.gov.au/dataset/ds-dga-19432f89-dc3a-4ef3-b943-5326ef1dbecc/details
.

My managed schema has the following fields - I'm using the example managed
schema *sample_techproducts_configs* with some additional fields that have
been added using the schema API:

<field name="id" type="string" multiValued="false" indexed="true" required="true" stored="true"/>
<field name="table" type="string" multiValued="false" indexed="true" stored="true"/>
<field name="address" type="text_en" multiValued="false" indexed="true" stored="true"/>
<field name="address_location" type="location" multiValued="false" indexed="true" stored="true"/>
<field name="address_latitude" type="float" multiValued="false" indexed="true" stored="true"/>
<field name="address_longitude" type="float" multiValued="false" indexed="true" stored="true"/>
<field name="flat_number" type="int" multiValued="false" indexed="true" stored="true"/>
<field name="locality_pid" type="string" multiValued="false" indexed="true" stored="true"/>
<field name="locality_location" type="location" multiValued="false" indexed="true" stored="true"/>
<field name="locality_latitude" type="float" multiValued="false" indexed="true" stored="true"/>
<field name="locality_longitude" type="float" multiValued="false" indexed="true" stored="true"/>
<field name="locality_name" type="string" multiValued="false" indexed="true" stored="true"/>
<field name="suburb_postcode" type="text_en" multiValued="false" indexed="true" stored="true"/>
<field name="number_first" type="int" multiValued="false" indexed="true" stored="true"/>
<field name="number_last" type="int" multiValued="false" indexed="true" stored="true"/>
<field name="street_number" type="string" multiValued="false" indexed="true" stored="true"/>
<field name="street_name" type="string" multiValued="false" indexed="true" stored="true"/>
<field name="street_type_code" type="string" multiValued="false" indexed="true" stored="true"/>
<field name="state_name" type="string" multiValued="false" indexed="true" stored="true"/>
<field name="state_abbreviation" type="string" multiValued="false" indexed="true" stored="true"/>
<field name="postcode" type="string" multiValued="false" indexed="true" stored="true"/>
<field name="type" type="string" multiValued="false" indexed="true" stored="true"/>

The search component / requestHandler are defined as follows.

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suburb</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">suburb_postcode</str>
    <str name="suggestAnalyzerFieldType">string</str>
    <str name="buildOnStartup">true</str>
  </lst>
  <lst name="suggester">
    <str name="name">address</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">address</str>
    <str name="suggestAnalyzerFieldType">string</str>
    <str name="buildOnStartup">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
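
For reference, a suggester defined along those lines is typically queried like this (host and collection are placeholders; the dictionary name comes from the config above):

```
http://localhost:8983/solr/<collection>/suggest?suggest=true&suggest.dictionary=address&suggest.q=123%20SM
```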

Please let me know if you need any more information in order to answer this?
Thanks,
Kehan


%solr_logs_dir% does not like spaces

2019-02-26 Thread Arturas Mazeika
Hi All,

I am testing Solr 7.7 (and 7.6) on Windows. My aim is to write logs into a
subdirectory whose path contains spaces, inside a parent directory that also
contains spaces.

If I set on windows:

setx /m SOLR_LOGS_DIR "f:\solr_deployment\logs"

and start a solr instance:

F:\solr_deployment\solr-7.7.0\bin\solr.cmd start -h localhost -p 8983 -s
F:\solr_deployment\solr_data -m 1g

this goes smoothly.

However, if I set the logging directory to:

setx /m SOLR_LOGS_DIR  "C:\Program Files (x86)\My Directory\Another
Directory\logs\solr"

then I get a cryptic error:

F:\solr_deployment\solr-7.7.0\bin\solr.cmd start -h localhost -p 8983 -s
F:\solr_deployment\solr_data -m 1g
Files was unexpected at this time.

If I comment out "@echo off" in both solr.cmd and solr.cmd.in, it shows that it
dies around these lines in solr.cmd:

F:\solr_deployment\solr-7.7.0\bin>IF "" == "" set STOP_KEY=solrrocks
Files was unexpected at this time.

In the solr.cmd the following block is shown:

IF "%STOP_KEY%"=="" set STOP_KEY=solrrocks

@REM This is quite hacky, but examples rely on a different log4j2.xml
@REM so that we can write logs for examples to %SOLR_HOME%\..\logs
IF [%SOLR_LOGS_DIR%] == [] (
  set "SOLR_LOGS_DIR=%SOLR_SERVER_DIR%\logs"
) ELSE (
  set SOLR_LOGS_DIR=%SOLR_LOGS_DIR:"=%
)
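
For what it's worth, a space-safe variant of that block would avoid expanding the variable inside the comparison; this is an untested sketch, not the shipped solr.cmd:

```bat
@REM "IF [%SOLR_LOGS_DIR%] == []" expands the value before cmd parses the line,
@REM so the parentheses in "Program Files (x86)" break parsing
@REM ("Files was unexpected at this time."). "IF defined" tests the variable
@REM without expanding it.
IF not defined SOLR_LOGS_DIR (
  set "SOLR_LOGS_DIR=%SOLR_SERVER_DIR%\logs"
) ELSE (
  @REM Strip embedded quotes; %-expansion happens before the quoted set is parsed.
  set "SOLR_LOGS_DIR=%SOLR_LOGS_DIR:"=%"
)
```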

comments?

Cheers,
Arturas


RE: MLT and facetting

2019-02-26 Thread Martin Frank Hansen (MHQ)
Hi Edwin,

Here it is (the tags were stripped in transit; the parameter values are):

text
1
1
true


Internal - KMD A/S

-Original Message-
From: Zheng Lin Edwin Yeo  
Sent: 26. februar 2019 08:24
To: solr-user@lucene.apache.org
Subject: Re: MLT and facetting

Hi Martin,

What is your setting in your /mlt requestHandler in solrconfig.xml?

Regards,
Edwin

On Tue, 26 Feb 2019 at 14:43, Martin Frank Hansen (MHQ)  wrote:

> Hi Edwin,
>
> Thanks for your response.
>
> Yes you are right. It was simply the search parameters from Solr.
>
> The query looks like this:
>
> http://
> .../solr/.../mlt?df=text=Journalnummer=on=id,Jour
> nalnummer=id:*6512815*
>
> best regards,
>
> Martin
>
>
> Internal - KMD A/S
>
> -Original Message-
> From: Zheng Lin Edwin Yeo 
> Sent: 26. februar 2019 03:54
> To: solr-user@lucene.apache.org
> Subject: Re: MLT and facetting
>
> Hi Martin,
>
> I think there are some pictures which are not being sent through in 
> the email.
>
> Do send your query that you are using, and which version of Solr you 
> are using?
>
> Regards,
> Edwin
>
> On Mon, 25 Feb 2019 at 20:54, Martin Frank Hansen (MHQ) 
> wrote:
>
> > Hi,
> >
> >
> >
> > I am trying to combine the mlt functionality with facets, but Solr 
> > throws
> > org.apache.solr.common.SolrException: ":"Unable to compute facet 
> > ranges, facet context is not set".
> >
> >
> >
> > What I am trying to do is quite simple, find similar documents using 
> > mlt and group these using the facet parameter. When using mlt and 
> > facets separately everything works fine, but not when combining the
> functionality.
> >
> >
> >
> >
> >
> > {
> >
> >   "responseHeader":{
> >
> > "status":500,
> >
> > "QTime":109},
> >
> >   "match":{"numFound":1,"start":0,"docs":[
> >
> >   {
> >
> > "Journalnummer":" 00759",
> >
> > "id":"6512815"  },
> >
> >   "response":{"numFound":602234,"start":0,"docs":[
> >
> >   {
> >
> > "Journalnummer":" 00759",
> >
> > "id":"6512816",
> >
> >   {
> >
> > "Journalnummer":" 00759",
> >
> > "id":"6834653"
> >
> >   {
> >
> > "Journalnummer":" 00739",
> >
> > "id":"6202373"
> >
> >   {
> >
> > "Journalnummer":" 00739",
> >
> > "id":"6748105"
> >
> >
> >
> >   {
> >
> > "Journalnummer":" 00803",
> >
> > "id":"7402155"
> >
> >   },
> >
> >   "error":{
> >
> > "metadata":[
> >
> >   "error-class","org.apache.solr.common.SolrException",
> >
> >   "root-error-class","org.apache.solr.common.SolrException"],
> >
> > "msg":"Unable to compute facet ranges, facet context is not 
> > set",
> >
> > "trace":"org.apache.solr.common.SolrException: Unable to compute facet ranges, facet context is not set
> >   at org.apache.solr.handler.component.RangeFacetProcessor.getFacetRangeCounts(RangeFacetProcessor.java:66)
> >   at org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetComponent.java:331)
> >   at org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetComponent.java:295)
> >   at org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLikeThisHandler.java:240)
> >   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
> >   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2541)
> >   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)
> >   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)
> >   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
> >   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
> >   at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
> >   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
> >   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
> >   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> >   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> >   at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
> >   at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
> >   at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
> >   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
> >   at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
> >   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
> >   at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
> > 

Re: Spring Boot Solr+ Kerberos+ Ambari

2019-02-26 Thread Rushikesh Garadade
Hi,
Thanks for the links. I have followed these steps earlier as well; however,
I did not execute the Ranger steps, as I don't want authorization.
I didn't get any success.

That's why my question is:
*Is Ranger mandatory when you just want authentication with Kerberos?*


Thank you,
Rushikesh Garadade

On Thu, Feb 21, 2019, 6:34 PM Furkan KAMACI  wrote:

> Hi,
>
> You can also check here:
>
> https://community.hortonworks.com/articles/15159/securing-solr-collections-with-ranger-kerberos.html
> On
> the other hand, we have a section for Solr Kerberos at documentation:
>
> https://lucene.apache.org/solr/guide/6_6/kerberos-authentication-plugin.html
> For
> any Ambari specific questions, you can ask them at this forum:
> https://community.hortonworks.com/topics/forum.html
>
> Kind Regards,
> Furkan KAMACI
>
> On Thu, Feb 21, 2019 at 1:43 PM Rushikesh Garadade <
> rushikeshgarad...@gmail.com> wrote:
>
> > Hi Furkan,
> > I think the link you provided is for ranger audit setting, please correct
> > me if wrong?
> >
> > I use HDP 2.6.5. which has Solr 5.6
> >
> > Thank you,
> > Rushikesh Garadade
> >
> >
> > On Thu, Feb 21, 2019, 2:57 PM Furkan KAMACI 
> > wrote:
> >
> > > Hi Rushikesh,
> > >
> > > Did you check here:
> > >
> > >
> >
> https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_security/content/solr_ranger_configure_solrcloud_kerberos.html
> > >
> > > By the way, which versions do you use?
> > >
> > > Kind Regards,
> > > Furkan KAMACI
> > >
> > > On Thu, Feb 21, 2019 at 11:41 AM Rushikesh Garadade <
> > > rushikeshgarad...@gmail.com> wrote:
> > >
> > > > Hi All,
> > > >
> > > > I am trying to set Kerberos for Solr which is installed on
> Hortonworks
> > > > Ambari.
> > > >
> > > > Q1. Is Ranger a mandatory component for Solr Kerberos configuration
> on
> > > > ambari.?
> > > >
> > > > I am getting little confused with documents available on internet for
> > > this.
> > > > I tried to do without ranger but not getting any success.
> > > >
> > > > If is there any good document for the same, please let me know.
> > > >
> > > > Thanks,
> > > > Rushikesh Garadade.
> > > >
> > >
> >
>


Re: LTR feature based on other collection data

2019-02-26 Thread Kamal Kishore Aggarwal
It looks to me like I can modify the *SolrFeature* class, but I don't know how
to create the IndexSearcher and SolrQueryRequest params for the new request
against the second collection.

@Override
public FeatureWeight createWeight(IndexSearcher searcher, boolean needsScores,
    SolrQueryRequest request, Query originalQuery, Map<String, String[]> efi)
    throws IOException {
  return new SolrFeatureWeight(searcher, request, originalQuery, efi);
}

Regards
Kamal


On Tue, Feb 26, 2019 at 12:34 PM Kamal Kishore Aggarwal <
kkroyal@gmail.com> wrote:

> Hi,
>
> I am working on LTR using solr 6.6.2. I am working on custom feature
> creation. I am able to create few custom features as per our requirement.
>
> But, there are certain features, for which the data is stored in other
> collection. Data like count of clicks, last date when the product was
> ordered, etc. These type of information is stored in another collection and
> we are not planning to put this info. in first collection.
>
> Now, we need to use the data in other collection to generate the score of
> the document  in LTR. We are open to develop custom components as well.
>
> Is there a way, we can modify our query using some join. But, we know join
> is expensive.
>
> Please suggest. Thanks in advance.
>
> Regards
> Kamal Kishore
>


AW: High Availability with two nodes

2019-02-26 Thread Andreas Mock
Hi Jörn,

thank you.

What would this scenario look like?
A single server on both nodes?
But how would you keep the indexes in sync?

Best regards
Andreas


> -Ursprüngliche Nachricht-
> Von: Jörn Franke 
> Gesendet: Dienstag, 26. Februar 2019 11:29
> An: solr-user@lucene.apache.org
> Betreff: Re: High Availability with two nodes
> 
> I would go for SolrCloud, but for simple active / passive scenarios you
> can use a simple http load balancer with health checks.
> 
> > Am 26.02.2019 um 10:39 schrieb Andreas Mock :
> >
> > Hi all,
> >
> > currently we are looking at Apache Solr as a solution
> > for searching. One important component is high availability.
> > I digged around finding out that HA is built in via
> > SolrCloud which means I have to install ZooKeeper in
> > a production environment which needs at least three
> > nodes.
> >
> > So, now to my problem. I haven't found any documents
> > showing a way to get a two node cluster simply for
> > HA (active/passive).
> >
> > Is there a recommended way or are there any solutions
> > out there showing a scenario with Solr Single Server
> > combined e.g. DRBD and Pacemaker?
> >
> > Any hints and pointers are very welcome.
> >
> > Thank you in advance
> > Andreas
> >


Re: questions regrading stored fields role in query time

2019-02-26 Thread Emir Arnautović
Hi Saurabh,
DocValues can be used for retrieving field values (note that order will not be
preserved in the case of a multivalued field), but they are also stored in files,
just in different structures. Doc values load some structures into memory, but
will also use memory-mapped files to access values (I am not familiar with this
code and am just assuming), so in any case they will use the "shared" OS caches.
Those caches will be affected when loading stored fields to do a partial update,
and some memory is also used when indexing documents. That is why storing fields
and doing partial updates could indirectly affect query performance. But that
might be insignificant, and only a test can tell for sure. Unless you have a
small index and enough RAM; in that case I can tell you for sure even without a
test.
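
The lazy loading mentioned above is controlled by the enableLazyFieldLoading flag in solrconfig.xml; a minimal sketch (true is the value shipped in the stock example configs):

```xml
<query>
  <!-- Load each stored field only when it is actually requested (e.g. via fl),
       instead of materializing the whole stored document per hit. -->
  <enableLazyFieldLoading>true</enableLazyFieldLoading>
</query>
```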

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 26 Feb 2019, at 11:21, Saurabh Sharma  wrote:
> 
> Hi Emir,
> 
> I had this question in my mind if I store my only returnable field as
> docValue in RAM.will my stored documents be referenced while constructing
> the response after the query. Ideally, as the field asked to return i.e fl
> is already in RAM then documents on disk should not be consulted for this
> field.
> 
> Any insight about the usage of docValued field vs stored field and
> preference order will help here in understanding the situation in a better
> way.
> 
> Thanks
> Saurabh
> 
> On Tue, Feb 26, 2019 at 2:41 PM Emir Arnautović <
> emir.arnauto...@sematext.com> wrote:
> 
>> Hi Saurabh,
>> Welcome to the channel!
>> Storing fields should not affect query performances directly if you use
>> lazy field loading and it is the default set. And it should not affect at
>> all if you have enough RAM compared to index size. Otherwise OS caches
>> might be affected by stored fields. The best way to tell is to tests with
>> expected indexing/partial updates load and see if/how much it affects
>> performances.
>> 
>> HTH,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> 
>> 
>> 
>>> On 26 Feb 2019, at 09:34, Saurabh Sharma 
>> wrote:
>>> 
>>> Hi All ,
>>> 
>>> 
>>> I am new here on this channel.
>>> Few days back we upgraded our solr cloud to version 7.3 and doing
>> real-time
>>> document posting with 15 seconds soft commit and 2 minutes hard commit
>>> time.As of now we posting full document to solr which includes data
>>> accumulations from various sources.
>>> 
>>> Now we want to do partial updates.I went through the documentation and
>>> found that all the fields should be stored or docValues for partial
>>> updates. I have few questions regarding this?
>>> 
>>> 1) In case i am just fetching only 1 field while making query.What will
>> the
>>> performance impact due to all fields being stored? Lets say i have an
>> "id"
>>> field and i do have doc value true for the field, will solr use stored
>>> fields in this case? will it load whole document in RAM ?
>>> 
>>> 2)What's the impact of large stored fields (.fdt) on query time
>>> performance. Do query time even depend on the stored field or they just
>>> depend on indexes?
>>> 
>>> 
>>> Thanks and regards
>>> Saurabh
>> 
>> 



Re: High Availability with two nodes

2019-02-26 Thread Jörn Franke
I would go for SolrCloud, but for simple active / passive scenarios you can use 
a simple http load balancer with health checks.

> Am 26.02.2019 um 10:39 schrieb Andreas Mock :
> 
> Hi all,
> 
> currently we are looking at Apache Solr as a solution
> for searching. One important component is high availability.
> I digged around finding out that HA is built in via
> SolrCloud which means I have to install ZooKeeper in 
> a production environment which needs at least three
> nodes.
> 
> So, now to my problem. I haven't found any documents
> showing a way to get a two node cluster simply for
> HA (active/passive).
> 
> Is there a recommended way or are there any solutions
> out there showing a scenario with Solr Single Server
> combined e.g. DRBD and Pacemaker?
> 
> Any hints and pointers are very welcome.
> 
> Thank you in advance
> Andreas 
> 


Re: questions regrading stored fields role in query time

2019-02-26 Thread Saurabh Sharma
Hi Emir,

I still have this question: if my only returnable field is kept as docValues (in RAM), will the stored documents on disk be consulted when constructing the response after the query? Ideally, since the field requested in fl is already in RAM, the documents on disk should not be touched for this field.

Any insight into docValued vs. stored fields, and which one Solr prefers when returning results, would help me understand the situation better.

Thanks
Saurabh
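The setup the question describes can be sketched in schema terms, with a hypothetical field name (`useDocValuesAsStored`, which defaults to true since Solr 6, lets `fl` return the field from docValues without reading the stored-fields .fdt file):

```
<!-- schema.xml sketch: field kept as docValues only, not stored -->
<field name="category" type="string" indexed="true" stored="false"
       docValues="true" useDocValuesAsStored="true"/>
```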



High Availability with two nodes

2019-02-26 Thread Andreas Mock
Hi all,

currently we are looking at Apache Solr as a solution
for searching. One important component is high availability.
I dug around and found that HA is built in via
SolrCloud, which means I have to install ZooKeeper in
a production environment, which needs at least three
nodes.

So, now to my problem: I haven't found any documents
showing a way to get a two-node cluster simply for
HA (active/passive).

Is there a recommended way, or are there any solutions
out there showing a scenario with Solr Single Server
combined with e.g. DRBD and Pacemaker?

Any hints and pointers are very welcome.

Thank you in advance
Andreas 



Re: questions regarding stored fields' role in query time

2019-02-26 Thread Emir Arnautović
Hi Saurabh,
Welcome to the channel!
Storing fields should not affect query performance directly if you use lazy
field loading, and it is enabled by default. It should not affect performance at
all if you have enough RAM relative to index size; otherwise OS caches might be
affected by stored fields. The best way to tell is to test with the expected
indexing/partial-update load and see if/how much it affects performance.
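The lazy-loading setting mentioned above lives in solrconfig.xml, and the configs that ship with Solr already enable it:

```
<!-- solrconfig.xml, inside the <query> section -->
<enableLazyFieldLoading>true</enableLazyFieldLoading>
```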

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/






questions regarding stored fields' role in query time

2019-02-26 Thread Saurabh Sharma
Hi All,

I am new here on this channel.
A few days back we upgraded our SolrCloud to version 7.3; we do real-time
document posting with a 15-second soft commit and a 2-minute hard commit.
As of now we post the full document to Solr, which includes data
accumulated from various sources.

Now we want to do partial updates. I went through the documentation and
found that all fields should be stored or docValues for partial
updates. I have a few questions about this:
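For reference, a partial (atomic) update is sent as a JSON body to /solr/&lt;collection&gt;/update: `set` replaces a value, `add` appends to a multi-valued field, and `inc` increments a numeric one. The field names below are hypothetical:

```
[
  { "id": "doc1",
    "price":      { "set": 19.95 },
    "tags":       { "add": "sale" },
    "popularity": { "inc": 1 } }
]
```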

1) If I fetch only one field per query, what is the performance impact of
all fields being stored? Say I have an "id" field with docValues enabled;
will Solr use stored fields in this case? Will it load the whole document
into RAM?

2) What is the impact of large stored fields (.fdt) on query-time
performance? Does query time depend on the stored fields, or only on the
indexes?


Thanks and regards
Saurabh