Tool to format the solr query for easier reading?

2019-01-07 Thread Hullegård, Jimi
Hi,

I often find myself having to analyze an already existing Solr query. But when 
the number of clauses and/or the depth of nested parentheses reaches a certain 
level, I can no longer grasp what the query is about at a quick glance. Sometimes 
I can look at the code generating the query, but it might be autogenerated in a 
complex way, or I might only have access to a log output of the query.

Here is an example query, based on a real query in our system:


system:(a) type:(x OR y OR z) date1:[* TO 2019-08-31T06:15:00Z/DAY+1DAYS]
((boolean1:false OR date2:[* TO 2019-08-31T06:15:00Z/DAY-30DAYS]))
-date3:[2019-08-31T06:15:00Z/DAY+1DAYS TO *]
(((*:* -date4:*) OR date5:* OR date3:[* TO 2019-08-31T06:15:00Z/DAY+1DAYS]))


Here I find it quite difficult to see which clauses are grouped together (using 
parentheses). What I tend to do in these circumstances is copy the query 
into a simple text editor, and then manually add line breaks and indentation 
matching the parenthesis levels.

For the query above, it would result in something like this:


system:(a)
type:(x OR y OR z)
date1:[* TO 2019-08-31T06:15:00Z/DAY+1DAYS]
(
 (boolean1:false OR date2:[* TO 2019-08-31T06:15:00Z/DAY-30DAYS])
)
-date3:[2019-08-31T06:15:00Z/DAY+1DAYS TO *]
(
 ((*:* -date4:*) OR date5:* OR date3:[* TO 2019-08-31T06:15:00Z/DAY+1DAYS])
)


But that is a slow process, and I might make a mistake that messes up the 
interpretation completely, especially when there are several levels of nested 
parentheses.

Does anyone know of any kind of tool that would help automate this? It wouldn't 
have to format its output like my example, as long as it makes it easier to see 
which opening and closing parentheses belong together, preferably using multiple 
lines and indentation.

A Java tool would be perfect, because then I could easily integrate it into our 
existing debugging tools, but an online formatter (like 
http://jsonformatter.curiousconcept.com) would also be very useful.
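
To illustrate the kind of thing I'm after, here is a rough sketch in plain Java 
(not an existing tool; it ignores quoted strings and escaped parentheses, so it 
is a starting point rather than a real query parser):

/** Rough sketch: re-indent a Solr query by parenthesis depth. */
public class QueryIndenter {

    public static String indent(String query) {
        StringBuilder out = new StringBuilder();
        int depth = 0;
        for (char c : query.toCharArray()) {
            if (c == '(') {
                newline(out, depth);   // '(' starts its own line
                out.append(c);
                depth++;
                newline(out, depth);   // clause body indented one level
            } else if (c == ')') {
                depth = Math.max(0, depth - 1);
                newline(out, depth);   // ')' closes back at the outer level
                out.append(c);
            } else {
                out.append(c);
            }
        }
        return out.toString().trim();
    }

    private static void newline(StringBuilder out, int depth) {
        out.append('\n');
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
    }

    public static void main(String[] args) {
        System.out.println(indent(
            "((boolean1:false OR date2:[* TO NOW/DAY-30DAYS])) -date3:[NOW/DAY TO *]"));
    }
}

Note that it also expands field-scoped parentheses like type:(x OR y OR z) onto 
separate lines, which may or may not be what you want.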

Regards
/Jimi

Svenskt Näringsliv processes your personal data in accordance with the GDPR. You 
can read more about our processing and your rights here: 
Integritetspolicy


Re: Re: Page faults

2019-01-07 Thread Erick Erickson
Having some replicas at 90G and some at 18G is totally unexpected with
compositeId routing unless you're using "multi-level routing", see:
https://lucidworks.com/2014/01/06/multi-level-composite-id-routing-solrcloud/

But let's be clear about what we're talking about here. I'm talking
specifically about the size of the index on disk for any particular
_replica_, meaning the size in places similar to:
pdv201806_shard1_replica1/data/index. I've never seen as much
disparity as you're describing, so we should get to the bottom of
that.

Do you have massive numbers of deleted docs in any of those shards?
The admin screen for any particular replica will show this number.


On another note: your cache sizes are probably not part of the page
fault question, but on the surface they're badly misconfigured, at
least the filterCache and queryResultCache. Each entry in the
filterCache is a map entry; the key is roughly the query and the value
is a bitset bounded by maxDoc/8 bytes. So if you have, say, 8M documents,
each filterCache entry could theoretically be 1M bytes (give or take), and
you could have up to 20,000 of them. You're probably just getting lucky,
either not having very many distinct fq clauses or indexing often
enough that the cache isn't growing for very long before being flushed.
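
To put numbers on it, the back-of-the-envelope worst case (assuming every
entry is a full bitset, which real entries often aren't):

  8,000,000 docs / 8 bits per byte ~= 1,000,000 bytes ~= 1 MB per entry
  20,000 entries x ~1 MB           ~= ~20 GB of heap, worst case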

Your queryResultCache takes up a lot less space, but still it's quite
large. It has two primary purposes:

1> paging. It generally stores a few integers (40 is common, maybe several
hundred, but who cares?) so hitting the next page won't have to search
again. This isn't terribly important in modern installations.

2> being used in autowarming to pre-load parts of the index into memory.

I'd consider knocking each of these back to the defaults (512), except
I'd put the autowarm count at, say, 16 or so.

The document cache is less clear; the recommendation is (number of
simultaneous queries you expect) x (your average rows parameter).
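
For reference, a sketch of what that might look like in solrconfig.xml (the
numbers are illustrative starting points, not official recommendations):

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="16"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="16"/>
<documentCache class="solr.LRUCache" size="512" initialSize="512"/>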

Best,
Erick

On Mon, Jan 7, 2019 at 12:43 PM Branham, Jeremy (Experis)
 wrote:
>
> Thanks Erick/Chris for the information.
> The page faults are occurring on each node of the cluster.
> These are VMs running SOLR v7.2.1 on RHEL 7. CPUx8, 64GB mem.
>
> We’re collecting GC information and using a DynaTrace agent, so I’m not sure 
> if / how much that contributes to the overhead.
>
> This cluster is used strictly for type-ahead/auto-complete functionality.
>
> I’ve also just noticed that the shards are imbalanced – 2 having about 90GB 
> and 2 having about 18GB of data.
> Having just joined this team, I’m not too familiar yet with the documents or 
> queries/updates [and maybe not relevant to the page faults].
> Although, I did check the schema, and most of the fields are stored=true, 
> docValues=true
>
> Solr v7.2.1
> OS: RHEL 7
>
> Collection Configuration -
> Shard count: 4
> configName: pdv201806
> replicationFactor: 2
> maxShardsPerNode: 1
> router: compositeId
> autoAddReplicas: false
>
> Cache configuration –
> filterCache class="solr.FastLRUCache"
>   size="20000"
>   initialSize="5000"
>   autowarmCount="10"
> queryResultCache class="solr.LRUCache"
>   size="5000"
>   initialSize="1000"
>   autowarmCount="0"
> documentCache class="solr.LRUCache"
>   size="15000"
>   initialSize="512"
>
> enableLazyFieldLoading=true
>
>
> JVM Information/Configuration –
> java.runtime.version: 1.8.0_162-b12
>
> -XX:+CMSParallelRemarkEnabled
> -XX:+CMSScavengeBeforeRemark
> -XX:+ParallelRefProcEnabled
> -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintGCDateStamps
> -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps
> -XX:+PrintHeapAtGC
> -XX:+PrintTenuringDistribution
> -XX:+ScavengeBeforeFullGC
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+UseConcMarkSweepGC
> -XX:+UseGCLogFileRotation
> -XX:+UseParNewGC
> -XX:-OmitStackTraceInFastThrow
> -XX:CMSInitiatingOccupancyFraction=70
> -XX:CMSMaxAbortablePrecleanTime=6000
> -XX:ConcGCThreads=4
> -XX:GCLogFileSize=20M
> -XX:MaxTenuringThreshold=8
> -XX:NewRatio=3
> -XX:ParallelGCThreads=8
> -XX:PretenureSizeThreshold=64m
> -XX:SurvivorRatio=4
> -XX:TargetSurvivorRatio=90
> -Xms16g
> -Xmx32g
> -Xss256k
> -verbose:gc
>
>
>
> Jeremy Branham
> jb...@allstate.com
>
> On 1/7/19, 1:16 PM, "Christopher Schultz"  
> wrote:
>
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> Erick,
>
> On 1/7/19 11:52, Erick Erickson wrote:
> > Images do not come through, so we don't see what you're seeing.
> >
> > That said, I'd expect page faults to happen:
> >
> > 1> when indexing. Besides what you'd expect (new segments written
> > to disk), there's segment merging going on in the background which
> > has to read segments from disk in order to merge.
> >
> > 2> when querying, any fields returned as part of a doc that has
> > stored=true docValues=false will require a disk access to get the
> > stored data.
>
> A page fault is not necessarily a disk access.

DateRangeField requires month?

2019-01-07 Thread Jeremy Smith
Hello,

 I am trying to use the DateRangeField and ran into an interesting issue.  
According to the documentation 
(https://lucene.apache.org/solr/guide/7_6/working-with-dates.html), these are 
both valid for the DateRangeField: 2000-11 and 2000-11T13.  I can confirm this 
is working in 7.6.  I would also expect to be able to use 2000T13, which would 
mean any time in the year 2000 between 1300 and 1400.  However, I get an error 
when trying to insert this value:


"error":{"metadata":


["error-class","org.apache.solr.common.SolrException","root-error-class","java.lang.NumberFormatException"],

"msg":"ERROR: Error adding field 'dtRange'='2000T13' msg=Couldn't parse 
date because: Improperly formatted date: 2000T13","code":400

}


I am using 7.6 with a super simple schema containing only _version_ and a 
DateRangeField and there's nothing special in my solrconfig.xml.  Is this 
behavior expected?  Should I open a jira issue?
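
In case it helps, here is roughly how I am reproducing it with SolrJ (the
core name, base URL and field name are just my local test setup):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class DateRangeRepro {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/daterange-test").build()) {
            SolrInputDocument ok = new SolrInputDocument();
            ok.addField("id", "1");
            ok.addField("dtRange", "2000-11T13");  // documented form, indexes fine
            client.add(ok);

            SolrInputDocument broken = new SolrInputDocument();
            broken.addField("id", "2");
            broken.addField("dtRange", "2000T13"); // year + hour only
            client.add(broken);                    // fails with the 400 above
            client.commit();
        }
    }
}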


Thanks,

Jeremy


Re: Re: Page faults

2019-01-07 Thread Branham, Jeremy (Experis)
Thanks Erick/Chris for the information.
The page faults are occurring on each node of the cluster.
These are VMs running SOLR v7.2.1 on RHEL 7. CPUx8, 64GB mem.

We’re collecting GC information and using a DynaTrace agent, so I’m not sure if 
/ how much that contributes to the overhead.

This cluster is used strictly for type-ahead/auto-complete functionality. 

I’ve also just noticed that the shards are imbalanced – 2 having about 90GB and 
2 having about 18GB of data.
Having just joined this team, I’m not too familiar yet with the documents or 
queries/updates [and maybe not relevant to the page faults]. 
Although, I did check the schema, and most of the fields are stored=true, 
docValues=true

Solr v7.2.1
OS: RHEL 7

Collection Configuration - 
Shard count: 4
configName: pdv201806
replicationFactor: 2
maxShardsPerNode: 1
router: compositeId
autoAddReplicas: false

Cache configuration –
filterCache class="solr.FastLRUCache"
  size="20000"
  initialSize="5000"
  autowarmCount="10"
queryResultCache class="solr.LRUCache"
  size="5000"
  initialSize="1000"
  autowarmCount="0"
documentCache class="solr.LRUCache"
  size="15000"
  initialSize="512"

enableLazyFieldLoading=true


JVM Information/Configuration –
java.runtime.version: 1.8.0_162-b12

-XX:+CMSParallelRemarkEnabled
-XX:+CMSScavengeBeforeRemark
-XX:+ParallelRefProcEnabled
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+ScavengeBeforeFullGC
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseConcMarkSweepGC
-XX:+UseGCLogFileRotation
-XX:+UseParNewGC
-XX:-OmitStackTraceInFastThrow
-XX:CMSInitiatingOccupancyFraction=70
-XX:CMSMaxAbortablePrecleanTime=6000
-XX:ConcGCThreads=4
-XX:GCLogFileSize=20M
-XX:MaxTenuringThreshold=8
-XX:NewRatio=3
-XX:ParallelGCThreads=8
-XX:PretenureSizeThreshold=64m
-XX:SurvivorRatio=4
-XX:TargetSurvivorRatio=90
-Xms16g
-Xmx32g
-Xss256k
-verbose:gc


 
Jeremy Branham
jb...@allstate.com

On 1/7/19, 1:16 PM, "Christopher Schultz"  wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Erick,

On 1/7/19 11:52, Erick Erickson wrote:
> Images do not come through, so we don't see what you're seeing.
> 
> That said, I'd expect page faults to happen:
> 
> 1> when indexing. Besides what you'd expect (new segments written
> to disk), there's segment merging going on in the background which
> has to read segments from disk in order to merge.
> 
> 2> when querying, any fields returned as part of a doc that has
> stored=true docValues=false will require a disk access to get the
> stored data.

A page fault is not necessarily a disk access. It almost always *is*,
but it's not because the application is calling fopen(). It's because
the OS is performing a memory operation which often results in a dip
into virtual memory.

Jeremy, are these page-faults occurring on all the machines in your
cluster, or only some? What is the hardware configuration of each
machine (specifically, memory)? What are your JVM settings for your
Solr instances? Is anything else running on these nodes?

It would help to understand what's happening on your servers. "I'm
seeing page faults" doesn't really help us help you.

Thanks,
- -chris

> On Mon, Jan 7, 2019 at 8:35 AM Branham, Jeremy (Experis) 
>  wrote:
>> 
>> Does anyone know if it is typical behavior for a SOLR cluster to
>> have lots of page faults (50-100 per second) under heavy load?
>> 
>> We are performing load testing on a cluster with 8 nodes, and my
>> performance engineer has brought this information to my attention.
>> 
>> I don’t know enough about memory management to say it is normal
>> or not.
>> 
>> 
>> 
>> The performance doesn’t appear to be suffering, but I don’t want
>> to overlook a potential hazard.
>> 
>> 
>> 
>> Thanks!
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> Jeremy Branham
>> 
>> jb...@allstate.com
>> 
>> Allstate Insurance Company | UCV Technology Services |
>> Information Services Group
>> 
>> 
> 
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlwzpYsACgkQHPApP6U8
pFgSHxAAgaXV5wkwV7Ru2QyhnvxUnIWY4Iom0IdZYrDuZBDxmFx9wzE7P33zmR3E
nrgZCqBtAMdxRSwG9BfyKircChZBssqtQpskw6mgJyzRyGvKVJjJ68r0vEio3Kjo
HjaJczBFWvdOKm42W1Li4SeymGyYXu/jmdkWLcIbEM4BgDQLf1HhSEphDeZzP4ST
GNDBrIA6XkUJwE1r58FUuj9l0XSKUAPLOPNAx1qGiAn4fKdbysVHvLcvJvJzC0pC
1kx000r+Mqdd61EzhM20ZDIvg2F3vgFgGCUtB31hIi18bfD8whoAafL2FSMkIccD

Re: Questions for SynonymGraphFilter and WordDelimiterGraphFilter

2019-01-07 Thread Wei
Thanks Thomas. You mentioned "Also there is no need for the
FlattenGraphFilter", that's quite interesting because the Solr
documentation says it's mandatory for indexing:
https://lucene.apache.org/solr/guide/7_6/filter-descriptions.html. Is there
any more explanation for this?

Best regards,
Wei


On Mon, Jan 7, 2019 at 7:56 AM Thomas Aglassinger <
t.aglassin...@netconomy.net> wrote:

> Hi Wei,
>
> here's a fairly simple field type we currently use in a project that seems
> to do the job with graph synonyms. Maybe this helps as a starting point for
> you:
>
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.ManagedSynonymGraphFilterFactory" managed="de"/>
>     <filter class="..."/>
>     <filter class="solr.WordDelimiterGraphFilterFactory" preserveOriginal="1"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>   </analyzer>
> </fieldType>
> 
>
> As you can see we use the same filters for both indexing and query, so
> this might have some impact on positional queries but so far it seems
> negligible for the short synonyms we use in practice. Also there is no need
> for the FlattenGraphFilter.
>
> The WhitespaceTokenizerFactory ensures that you can define synonyms with
> hyphens like mac-book -> macbook.
>
> Best regards, Thomas.
>
>
> On 05.01.19, 02:11, "Wei"  wrote:
>
> Hello,
>
We are upgrading to Solr 7.6.0 and noticed that SynonymFilter and
WordDelimiterFilter have been deprecated. The Solr doc recommends using
SynonymGraphFilter and WordDelimiterGraphFilter instead.
I guess the StopFilter messes up the SynonymGraphFilter output? Not sure
if it's a Solr defect or whether there is a guideline that StopFilter should
not be put after graph filters.

Thanks in advance for your input.
>
>
> Thanks,
>
> Wei
>
>
>


Re: Page faults

2019-01-07 Thread Christopher Schultz
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Erick,

On 1/7/19 11:52, Erick Erickson wrote:
> Images do not come through, so we don't see what you're seeing.
> 
> That said, I'd expect page faults to happen:
> 
> 1> when indexing. Besides what you'd expect (new segments written
> to disk), there's segment merging going on in the background which
> has to read segments from disk in order to merge.
> 
> 2> when querying, any fields returned as part of a doc that has
> stored=true docValues=false will require a disk access to get the
> stored data.

A page fault is not necessarily a disk access. It almost always *is*,
but it's not because the application is calling fopen(). It's because
the OS is performing a memory operation which often results in a dip
into virtual memory.
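
If you want to see how much of this actually hits disk, compare minor vs.
major faults for the Solr process (standard Linux tools; the PID below is a
placeholder):

ps -o pid,min_flt,maj_flt -p 12345   # cumulative minor/major fault counts
pidstat -r -p 12345 5                # per-second fault rates (needs sysstat)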

Jeremy, are these page-faults occurring on all the machines in your
cluster, or only some? What is the hardware configuration of each
machine (specifically, memory)? What are your JVM settings for your
Solr instances? Is anything else running on these nodes?

It would help to understand what's happening on your servers. "I'm
seeing page faults" doesn't really help us help you.

Thanks,
- -chris

> On Mon, Jan 7, 2019 at 8:35 AM Branham, Jeremy (Experis) 
>  wrote:
>> 
>> Does anyone know if it is typical behavior for a SOLR cluster to
>> have lots of page faults (50-100 per second) under heavy load?
>> 
>> We are performing load testing on a cluster with 8 nodes, and my
>> performance engineer has brought this information to my attention.
>> 
>> I don’t know enough about memory management to say it is normal
>> or not.
>> 
>> 
>> 
>> The performance doesn’t appear to be suffering, but I don’t want
>> to overlook a potential hazard.
>> 
>> 
>> 
>> Thanks!
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> Jeremy Branham
>> 
>> jb...@allstate.com
>> 
>> Allstate Insurance Company | UCV Technology Services |
>> Information Services Group
>> 
>> 
> 
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlwzpYsACgkQHPApP6U8
pFgSHxAAgaXV5wkwV7Ru2QyhnvxUnIWY4Iom0IdZYrDuZBDxmFx9wzE7P33zmR3E
nrgZCqBtAMdxRSwG9BfyKircChZBssqtQpskw6mgJyzRyGvKVJjJ68r0vEio3Kjo
HjaJczBFWvdOKm42W1Li4SeymGyYXu/jmdkWLcIbEM4BgDQLf1HhSEphDeZzP4ST
GNDBrIA6XkUJwE1r58FUuj9l0XSKUAPLOPNAx1qGiAn4fKdbysVHvLcvJvJzC0pC
1kx000r+Mqdd61EzhM20ZDIvg2F3vgFgGCUtB31hIi18bfD8whoAafL2FSMkIccD
H7X09PpUK8qPM/oQgqCKTtfmVR3M2pi3CSxLFSQ1/QucnF2wxWknOOWUH1TMU/L2
KUQHS6GwuTk+R/8PxdBRsZI8ON3MVb690ECV4QplYlkrtygXrLRg2YOgifgAXsKL
5Kg2mrpKoxfNnDWaRksy4GUDTsSxbkd1rpnHJEZ8le26HXvz9wrug/FtNPzqP8S9
dan2gkgiSqOM9GKlKkA72ROyQDhZa5YiXfGNdRrmfkiQzlDBEcGpD8pg1GwskRJl
yidTBfvRSyCHsI5NBGf65nTG+2WfUnr8wClHVK5QQGVilHBn6KzeHeDTL9ZpHvcn
GhkDMvc+9f8DR7Hr/mTiGjYIAvJZYiIJeYUoe0Bl2BHmGDv0tEk=
=OpZo
-END PGP SIGNATURE-


Re: Solr relevancy score different on replicated nodes

2019-01-07 Thread Erick Erickson
You misunderstand my point. The wall clock times _will_ be
different on leader and follower. It follows that the
documents contained in the individual segments on
the leader and follower will _not_ be identical.

This leads to _deleted_ documents being in different
segments on the leader and follower. Which also means
that the merge decisions will eventually merge different
segments.

Now remember that over time when you update a doc,
the doc is "marked as deleted", but some of the stats,
e.g. term frequency, _still_ include the data for the
deleted docs and will until the segment is merged.

So the term frequency for some term on the leader
will be slightly different than on the follower and thus
the scoring will differ depending on which replica
gets the query. Etc.

The fact that you deleted and re-added the follower
supports the above. And your scores will skew as
you continue to update documents over time.

Generally this isn't something that people concern
themselves with, but if it's important to you, you can
try enabling the ExactStatsCache; see:
https://lucene.apache.org/solr/guide/6_6/distributed-requests.html
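
That's a one-line addition to solrconfig.xml (followed by a reload); the
class name below is the one documented on that page:

<statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>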

Best,
Erick

On Sun, Jan 6, 2019 at 10:25 PM Ashish Bisht  wrote:
>
> Hi Erick,
>
> Thank you for the details, but it doesn't look like a time difference in
> autocommit caused this issue. As I said, if I run a retrieve-all or keyword
> query on both servers, they return the correct number of docs; it's just the
> relevancy score that takes different values.
>
> I waited for a brief period and the discrepancy was still there (with no
> indexing going on either). So I went ahead and deleted the follower node
> (thinking the leader replica should be in a correct state). After adding the
> new replica again, the issue is not appearing.
>
> We will monitor same if it appears in future.
>
> Regards
> Ashish
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Page faults

2019-01-07 Thread Erick Erickson
Images do not come through, so we don't see what you're seeing.

That said, I'd expect page faults to happen:

1> when indexing. Besides what you'd expect (new segments
 written to disk), there's segment merging going on in
 the background which has to read segments from disk
 in order to merge.

2> when querying, any fields returned as part of a doc
 that has stored=true docValues=false will require
 a disk access to get the stored data.

Best,
Erick


On Mon, Jan 7, 2019 at 8:35 AM Branham, Jeremy (Experis)
 wrote:
>
> Does anyone know if it is typical behavior for a SOLR cluster to have lots of 
> page faults (50-100 per second) under heavy load?
>
> We are performing load testing on a cluster with 8 nodes, and my performance 
> engineer has brought this information to my attention.
>
> I don’t know enough about memory management to say it is normal or not.
>
>
>
> The performance doesn’t appear to be suffering, but I don’t want to overlook 
> a potential hazard.
>
>
>
> Thanks!
>
>
>
>
>
>
>
>
>
> Jeremy Branham
>
> jb...@allstate.com
>
> Allstate Insurance Company | UCV Technology Services | Information Services 
> Group
>
>


Page faults

2019-01-07 Thread Branham, Jeremy (Experis)
Does anyone know if it is typical behavior for a SOLR cluster to have lots of 
page faults (50-100 per second) under heavy load?
We are performing load testing on a cluster with 8 nodes, and my performance 
engineer has brought this information to my attention.
I don’t know enough about memory management to say it is normal or not.

The performance doesn’t appear to be suffering, but I don’t want to overlook a 
potential hazard.

Thanks!




Jeremy Branham
jb...@allstate.com
Allstate Insurance Company | UCV Technology Services | Information Services 
Group



Re: Questions for SynonymGraphFilter and WordDelimiterGraphFilter

2019-01-07 Thread Thomas Aglassinger
Hi Wei,

here's a fairly simple field type we currently use in a project that seems to 
do the job with graph synonyms. Maybe this helps as a starting point for you:

<fieldType name="..." class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ManagedSynonymGraphFilterFactory" managed="de"/>
    <filter class="..."/>
    <filter class="solr.WordDelimiterGraphFilterFactory" preserveOriginal="1"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  </analyzer>
</fieldType>

As you can see we use the same filters for both indexing and query, so this 
might have some impact on positional queries but so far it seems negligible for 
the short synonyms we use in practice. Also there is no need for the 
FlattenGraphFilter.

The WhitespaceTokenizerFactory ensures that you can define synonyms with 
hyphens like mac-book -> macbook.
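
Since the synonyms are managed, mappings like that can also be added at
runtime through the Managed Resources REST API, followed by a core reload,
along these lines (the collection name is a placeholder):

curl -X PUT -H 'Content-Type: application/json' \
  --data-binary '{"mac-book": ["macbook"]}' \
  'http://localhost:8983/solr/mycollection/schema/analysis/synonyms/de'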

Best regards, Thomas.


On 05.01.19, 02:11, "Wei"  wrote:

Hello,

We are upgrading to Solr 7.6.0 and noticed that SynonymFilter and
WordDelimiterFilter have been deprecated. The Solr doc recommends using
SynonymGraphFilter and WordDelimiterGraphFilter instead.
I guess the StopFilter messes up the SynonymGraphFilter output? Not sure
if it's a Solr defect or whether there is a guideline that StopFilter should
not be put after graph filters.

Thanks in advance for your input.


Thanks,

Wei




Setting Solr Home via installation script

2019-01-07 Thread Stephon Harris
I am trying to install Solr as a service so that when a restart takes place
the Solr home directory is set to `example/schemaless/solr`, where there are
cores I have created while running Solr in the schemaless example.

As instructed in Taking Solr to Production, I ran the command

sudo bash ./install_solr_service.sh solr-7.4.0.tgz -i /opt/ -d example/schemaless/solr -u solr -s solr -p 8983

and it started Solr successfully; however, the Solr home was set to
/var/solr/data. I thought that by giving the -d option, Solr home would be set
to example/schemaless/solr. What should I do to get Solr home set to
example/schemaless/solr? Is there another way I should go about getting the
cores that I created under the schemaless directory into the Solr home
directory?
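
From what I can tell from the default install layout (I may be wrong about
this), the service reads an include script where SOLR_HOME could be pointed at
the schemaless directory, e.g.:

# /etc/default/solr.in.sh  (path created by install_solr_service.sh)
SOLR_HOME=/opt/solr/example/schemaless/solr

Would editing that be the right way, or is there a better approach?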

-- 
Stephon Harris

*Enterprise Knowledge, LLC*
*Web: *http://www.enterprise-knowledge.com/

*E-mail:* shar...@enterprise-knowledge.com/

*Cell:* 832-628-8352


RE: Solr Replication

2019-01-07 Thread Vadim Ivanov
When using CDCR with the new replica types, be aware of 
https://issues.apache.org/jira/browse/SOLR-12057

Parallel indexing to both clusters might be an option as well.
-- 
Vadim


> -Original Message-
> From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de]
> Sent: Monday, January 07, 2019 11:10 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Replication
> 
> In SolrCloud there are Data Centers.
> Your Cluster 1 is DataCenter 1 and your Cluster 2 is Data Center 2.
> You can then use CDCR (Cross Data Center Replication).
> http://lucene.apache.org/solr/guide/7_0/cross-data-center-replication-
> cdcr.html
> 
> Nevertheless I would give your Cluster 2 another 2 ZooKeeper instances.
> 
> Regards, Bernd
> 
> On 07.01.19 at 06:39, Mannar mannan wrote:
> > Hi All,
> >
> > I would like to configure master slave between two solr cloud clusters (for
> > failover). Below is the scenario
> >
> > Solr version : 7.0
> >
> > Cluster 1:
> > 3 zookeeper instances :   zk1, zk2, zk3
> > 2 solr instances : solr1, solr2
> >
> > Cluster 2:
> > 1 zookeeper instance : bkpzk1
> > 2 solr instances : bkpsolr1, bkpsolr2
> >
> > Master / Slave :  solr1 / bkpsolr1
> >solr2 / bkpsolr2
> >
> > Is it possible to have master / slave replication configured for solr
> > instances running in cluster1 & cluster2 (for failover). Kindly let me know
> > the possibility.
> >



Re: Web Server HTTP Header Internal IP Disclosure SOLR port

2019-01-07 Thread Jan Høydahl
Are you saying that the redirect from http://my.ip:8983/ to 
http://my.ip:8983/solr/ is a security issue for you? Please tell us how this 
could be a problem by providing a real example where you believe that Solr 
exposes some secret information that the requesting client should not gain 
access to. Remember that Solr is not just any random web server: it must be 
firewalled and not exposed to the internet. Your security scan tool may have 
other assumptions.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 7. jan. 2019 kl. 05:55 skrev Muniraj M :
> 
> Hi,
> 
> I am using Apache SOLR 6.6.5 as my search engine, and when we do a security
> scan on our server, we got the below response:
> 
> *When processing the following request : GET / HTTP/1.0 this web server
> leaks the following private IP address : X.X.X.X as found in the following
> collection of HTTP headers : HTTP/1.1 302 Found
> Location: http://X.X.X.X:8983/solr/
>  Content-Length: 0*
> 
> I have looked into this for some time, but haven't found any solution to fix
> this problem. Any idea of how to solve this would be really appreciated.
> 
> -- 
> Regards,
> *Muniraj M*



Re: How to have the same SOLR cores for both 8983 and 8984 ports

2019-01-07 Thread Jan Høydahl
Solr runs on only one port at a time, so there must be some misunderstanding 
here. If you have Solr running on both ports at the same time, then you have 
simply started a new instance, not reconfigured the previous one.

For us to help you more, you will have to provide more details on how you have 
installed and started Solr, where your SOLR_HOME is located, etc.
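
If the goal is a single instance serving the existing cores over SSL, the
usual approach is to stop the old instance and start it again on the new port
with the same Solr home, something like (the home path is a placeholder):

bin/solr stop -p 8983
bin/solr start -p 8984 -s /path/to/your/solr/home    # same home -> same cores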

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 7. jan. 2019 kl. 05:54 skrev Muniraj M :
> 
> Hi,
> 
> I am using Apache SOLR 6.6.5 as my search engine running on port 8983. I
> just wanted to enable SSL for Solr and followed this guide to make it
> work on port 8984 with SSL.
> 
> Here my problem is that on 8984 I am not able to see any of the cores that
> were already created under port 8983 (the port without SSL).
> 
> http://mywebsite.com:8983/solr/#/ ==> This have 3 cores
> 
> https://mywebsite.com:8984/solr/#/ ==> This don't have any cores
> 
> It would be really appreciated if anyone could provide a solution for
> having the same cores on both ports 8983 and 8984.
> 
> Thanks
> 
> -- 
> Regards,
> *Muniraj M*



Re: Solr Replication

2019-01-07 Thread Bernd Fehling

In SolrCloud there are Data Centers.
Your Cluster 1 is DataCenter 1 and your Cluster 2 is Data Center 2.
You can then use CDCR (Cross Data Center Replication).
http://lucene.apache.org/solr/guide/7_0/cross-data-center-replication-cdcr.html
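
On the source cluster, that is configured in solrconfig.xml roughly like this
(simplified sketch; zkHost and collection names are placeholders, and the
guide above shows the full handler, including the replicator and buffer
settings, plus the target-cluster side):

<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="replica">
    <str name="zkHost">bkpzk1:2181</str>
    <str name="source">mycollection</str>
    <str name="target">mycollection</str>
  </lst>
</requestHandler>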

Nevertheless I would give your Cluster 2 another 2 ZooKeeper instances.

Regards, Bernd

On 07.01.19 at 06:39, Mannar mannan wrote:

Hi All,

I would like to configure master slave between two solr cloud clusters (for
failover). Below is the scenario

Solr version : 7.0

Cluster 1:
3 zookeeper instances :   zk1, zk2, zk3
2 solr instances : solr1, solr2

Cluster 2:
1 zookeeper instance : bkpzk1
2 solr instances : bkpsolr1, bkpsolr2

Master / Slave :  solr1 / bkpsolr1
                  solr2 / bkpsolr2

Is it possible to have master / slave replication configured for solr
instances running in cluster1 & cluster2 (for failover). Kindly let me know
the possibility.