Re: Suppress stack trace in error response

2019-02-21 Thread Zheng Lin Edwin Yeo
Hi,

There's too little information provided in your question.
Could you explain more about the issue or the exception that you are facing?

Regards,
Edwin

On Thu, 21 Feb 2019 at 23:45, Branham, Jeremy (Experis) 
wrote:

> When Solr throws an exception, like when a client sends a badly formed
> query string, is there a way to suppress the stack trace in the error
> response?
>
>
>
> Jeremy Branham
> jb...@allstate.com
> Allstate Insurance Company | UCV Technology Services | Information
> Services Group
>
>


Re: Full index replication upon service restart

2019-02-21 Thread Erick Erickson
There really is no such thing as the replica falling “too far behind”. The 
process is
> leader gets an update
> leader indexes locally and forwards the documents to the follower
> follower acks back that it’s received the raw docs and is indexing them
> the leader acks back to the client that the update is complete.

So if you’re saying that the follower isn’t showing the docs you think it is,
something _else_ is going on that you have to identify.

This is for NRT replicas. TLOG and PULL replicas can “fall behind”, but you 
should
probably throttle on the client side.
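
If you want to check how far a TLOG or PULL follower is behind, one rough way
(a sketch only; host and core names below are made up, but command=details is
the standard ReplicationHandler API) is to compare index generation on the
leader and follower cores:

  # on the leader core
  curl "http://leader:8983/solr/mycoll_shard1_replica_t1/replication?command=details&wt=json"
  # on the follower core
  curl "http://follower:8983/solr/mycoll_shard1_replica_t2/replication?command=details&wt=json"
  # compare "indexVersion"/"generation" in the two responses; a follower that keeps
  # reporting an older generation than the leader is still catching up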

Best,
Erick

> On Feb 21, 2019, at 8:46 AM, Rahul Goswami  wrote:
> 
> Eric,
> Thanks for the insight. We are looking at tuning the architecture. We are
> also stopping the indexing application before we bring down the Solr nodes
> for maintenance. However when both nodes are up, and one replica is falling
> behind too much we want to throttle the requests. Is there an API in Solr
> to know whether a replica is falling behind from the leader ?
> 
> Thanks,
> Rahul
> 
> On Mon, Feb 11, 2019 at 10:28 PM Erick Erickson 
> wrote:
> 
>> bq. To answer your question about index size on
>> disk, it is 3 TB on every node. As mentioned it's a 32 GB machine and I
>> allocated 24GB to Java heap.
>> 
>> This is massively undersized in terms of RAM in my experience. You're
>> trying to cram 3TB of index into 32GB of memory. Frankly, I don't think
>> there's much you can do to increase stability in this situation, too many
>> things are going on. In particular, you're indexing during node restart.
>> 
>> That means that
>> 1> you'll almost inevitably get a full sync on start given your update
>> rate.
>> 2> while you're doing the full sync, all new updates are sent to the
>>  recovering replica and put in the tlog.
>> 3> When the initial replication is done, the documents sent to the
>> tlog while recovering are indexed. This is 7 hours of accumulated
>> updates.
>> 4> If much goes wrong in this situation, then you're talking another full
>> sync.
>> 5> rinse, repeat.
>> 
>> There are no magic tweaks here. You really have to rethink your
>> architecture. I'm actually surprised that your queries are performant.
>> I expect you're getting a _lot_ of I/O, that is the relevant parts of your
>> index are swapping in and out of the OS memory space. A _lot_.
>> Or you're only using a _very_ small bit of your index.
>> 
>> Sorry to be so negative, but this is not a situation that's amenable to
>> a quick fix.
>> 
>> Best,
>> Erick
>> 
>> 
>> 
>> 
>> On Mon, Feb 11, 2019 at 4:10 PM Rahul Goswami 
>> wrote:
>>> 
>>> Thanks for the response Eric. To answer your question about index size on
>>> disk, it is 3 TB on every node. As mentioned it's a 32 GB machine and I
>>> allocated 24GB to Java heap.
>>> 
>>> Further monitoring the recovery, I see that when the follower node is
>>> recovering, the leader node (which is NOT recovering) almost freezes with
>>> 100% CPU usage and 80%+ memory usage. Follower node's memory usage is
>> 80%+
>>> but CPU is very healthy. Also Follower node's log is filled up with
>> updates
>>> forwarded from the leader ("...PRE_UPDATE FINISH
>>> {update.distrib=FROMLEADER=...") and replication starts much
>>> afterwards.
>>> There have been instances when complete recovery took 10+ hours. We have
>>> upgraded to a 4 Gbps NIC between the nodes to see if it helps.
>>> 
>>> Also, a few followup questions:
>>> 
>>> 1) Is  there a configuration which would start throttling update requests
>>> if the replica falls behind a certain number of updates so as to not
>>> trigger an index replication later?  If not, would it be a worthy
>>> enhancement?
>>> 2) What would be a recommended hard commit interval for this kind of
>> setup
>>> ?
>>> 3) What are some of the improvements in 7.5 with respect to recovery as
>>> compared to 7.2.1?
>>> 4) What do the below peersync failure logs lines mean?  This would help
>> me
>>> better understand the reasons for peersync failure and maybe devise some
>>> alert mechanism to start throttling update requests from application
>>> program if feasible.
>>> 
>>> *PeerSync Failure type 1*:
>>> --
>>> 2019-02-04 20:43:50.018 INFO
>>> (recoveryExecutor-4-thread-2-processing-n:indexnode1:2_solr
>>> x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42
>>> s:shard11 c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 r:core_node45)
>>> [c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 s:shard11 r:core_node45
>>> x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42]
>>> org.apache.solr.update.PeerSync Fingerprint comparison: 1
>>> 
>>> 2019-02-04 20:43:50.018 INFO
>>> (recoveryExecutor-4-thread-2-processing-n:indexnode1:2_solr
>>> x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42
>>> s:shard11 c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 r:core_node45)
>>> [c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 s:shard11 

Re: dynamic field issue

2019-02-21 Thread Erick Erickson
Dynamic fields are exactly the same as statically defined fields at the Lucene 
level, so this is exactly equivalent to “would defining N fields statically 
increase heap”.

IOW, no more than defining that many fields manually.
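
To illustrate (schema.xml syntax; the field names and type here are just examples),
these two declarations are handled identically by Lucene:

  <field name="price_f" type="pfloat" indexed="true" stored="true" docValues="true"/>
  <dynamicField name="*_f" type="pfloat" indexed="true" stored="true" docValues="true"/>

The only difference is that the dynamicField pattern matches any incoming field
name ending in "_f" instead of one fixed name.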

Best,
Erick

> On Feb 21, 2019, at 8:40 AM, Midas A  wrote:
> 
> Here we are indexing dynamic fields and  we are using one of this field in*
> bf *.
> Would only indexing dynamic field will increase heap and load of master -
> slave solr servers ?
> 
> 
> Regards,
> Midas
> 
> On Thu, Feb 21, 2019 at 10:03 PM Erick Erickson 
> wrote:
> 
>> 300 is still not excessive. As far as memory goes, sure. If you’re
>> faceting, grouping, or sorting docValues would _certainly_ help with memory
>> consumption.
>> 
>>> On Feb 21, 2019, at 8:31 AM, Midas A  wrote:
>>> 
>>> Hi ,
>>> Plelase help me here we have crossed 100+ fields per dyanmic fields and
>> we
>>> have three dynamic fields.
>>> using docValues in dynamic fields will help in improving heap and query
>>> time ?
>>> 
>>> Regards,
>>> Abhishek Tiwari
>>> 
>>> 
>>> On Thu, Feb 21, 2019 at 9:38 PM Midas A  wrote:
>>> 
 Yes . We have crossed  100 fields .
 
 Would docValues help here ?
 
 What kind of information you want from my side ?
 
 On Thu, 21 Feb 2019, 9:31 pm Erick Erickson, 
 wrote:
 
> There’s no way to answer this given you’ve provided almost no
> information.
> 
> Do note that once you get to more than a few hundred fields,
> Solr still functions, but I’ve seen performance degrade and
> memory increase.
> 
> Best,
> Erick
> 
>> On Feb 21, 2019, at 7:54 AM, Midas A  wrote:
>> 
>> Thanks for quick reply .
>> 
>> we are creating  search *query(keyword)*  for dynamic field creation
>> to
>> use click ,cart  and order data  in search.
>> 
>> But we are experiencing  more heap and increase in query time .
>> What could be the problem? can be anything related to it ?
>> 
>> 
>> On Thu, Feb 21, 2019 at 8:43 PM Shawn Heisey 
> wrote:
>> 
>>> On 2/21/2019 8:01 AM, Midas A wrote:
 How many dynamic field we can create in solr ?. is there any
> limitation ?
 Is indexing dynamic field can increase heap memory on server .
>>> 
>>> At the Lucene level, there is absolutely no difference between a
>>> standard field and a dynamic field.  The difference in Solr is how
>> the
>>> field is defined, nothing more.
>>> 
>>> Lucene has no hard limitations on the number of fields you can
>> create,
>>> but the more you have the larger your index will probably be.  Larger
>>> indexes perform slower than smaller ones and require more resources
> like
>>> memory.
>>> 
>>> Thanks,
>>> Shawn
>>> 
> 
> 
>> 
>> 



Re: Solr Cell, Tika and UpdateProcessorChains

2019-02-21 Thread Erick Erickson
Several things:

1> Please don’t use add-unknown…. It’s fine for prototyping, but it guesses field
definitions.

2> The solrconfig appears to be malformed, I’m surprised it fires up at all.
This never terminates, for instance: 

Solr Cell, Tika and UpdateProcessorChains

2019-02-21 Thread Demian Katz
I'm posting this question on behalf of Whitney Clarke, who is a pending member 
of this list but is not able to post on her own yet. I've been working with her 
on some troubleshooting, but I'm not familiar with the components she's using 
and thought somebody here might be able to point her in the right direction 
more quickly than I can.

Here is her original inquiry:


I am pulling data from a local drive for indexing.  I am using solr cell and 
tika in schemaless mode.  I am attempting to rewrite certain field information 
prior to indexing using html-strip and regex UpdateProcessorChains.  However, 
when run, the UpdateProcessorChains never appear to get invoked.

For example,

I am looking to rewrite "url":"e:\\documents\\apiscript.txt" to 
be http://apiscript.txt .  My current solrconfig is trying to rewrite id and 
put the rewritten link into url, but this is just the recent attempt of many 
different ways I have tried to get it to work.

My other issue is with the content field.  I am trying to strip that field
down to just the actual text of the document.  I am getting all the metadata in it
as well.   Any suggestions?

Thanks,
Whitney


Whitney's latest solrconfig.xml is pasted in full below - as she notes, we've 
been through many iterations without any success. The key question is how to 
manipulate the data retrieved from Tika prior to indexing it. Is there a 
documented best practice for this type of situation, or any tips on how to 
troubleshoot when nothing appears to be happening?

Thanks,
Demian




[Whitney's full solrconfig.xml was pasted here, but the XML markup was stripped
by the list archive and only scattered element values survive. The recoverable
fragments indicate: luceneMatchVersion 7.3.1; autoCommit maxTime 15000 with
openSearcher=false; an update processor chain listing
"add-unknown-fields-to-the-schema", "html-strip-features" and "regex-replace";
a regex-replace processor on the "id" field with pattern "^[a-z]:\w+" and
replacement "http://"; a langid processor mapping
text,title,subject,description to language_s with default "en"; and a
/update/extract (Solr Cell) request handler.]
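
For anyone attempting the same thing, a minimal sketch of such a chain
(untested; the processor classes are standard Solr factories, while the chain
name, field names and regex are only examples based on Whitney's description)
would be:

  <updateRequestProcessorChain name="strip-and-rewrite">
    <!-- strip HTML markup from the extracted content field -->
    <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
      <str name="fieldName">content</str>
    </processor>
    <!-- rewrite a drive-letter prefix such as "e:\" into "http://" -->
    <processor class="solr.RegexReplaceProcessorFactory">
      <str name="fieldName">url</str>
      <str name="pattern">^[a-z]:\\+</str>
      <str name="replacement">http://</str>
      <bool name="literalReplacement">true</bool>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

  <requestHandler name="/update/extract"
                  class="solr.extraction.ExtractingRequestHandler">
    <lst name="defaults">
      <!-- the chain only runs for this handler if it is referenced here
           (or if the chain itself is marked default="true") -->
      <str name="update.chain">strip-and-rewrite</str>
      <str name="lowernames">true</str>
    </lst>
  </requestHandler>

A common reason a chain "never appears to get invoked" is exactly that last
point: a handler keeps using its default chain unless update.chain points at
the one you want.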
  



Re: dynamic field issue

2019-02-21 Thread Midas A
Here we are indexing dynamic fields and we are using one of these fields in
*bf*.
Would merely indexing dynamic fields increase heap usage and load on the
master-slave Solr servers?


Regards,
Midas

On Thu, Feb 21, 2019 at 10:03 PM Erick Erickson 
wrote:

> 300 is still not excessive. As far as memory goes, sure. If you’re
> faceting, grouping, or sorting docValues would _certainly_ help with memory
> consumption.
>
> > On Feb 21, 2019, at 8:31 AM, Midas A  wrote:
> >
> > Hi ,
> > Plelase help me here we have crossed 100+ fields per dyanmic fields and
> we
> > have three dynamic fields.
> > using docValues in dynamic fields will help in improving heap and query
> > time ?
> >
> > Regards,
> > Abhishek Tiwari
> >
> >
> > On Thu, Feb 21, 2019 at 9:38 PM Midas A  wrote:
> >
> >> Yes . We have crossed  100 fields .
> >>
> >> Would docValues help here ?
> >>
> >> What kind of information you want from my side ?
> >>
> >> On Thu, 21 Feb 2019, 9:31 pm Erick Erickson, 
> >> wrote:
> >>
> >>> There’s no way to answer this given you’ve provided almost no
> >>> information.
> >>>
> >>> Do note that once you get to more than a few hundred fields,
> >>> Solr still functions, but I’ve seen performance degrade and
> >>> memory increase.
> >>>
> >>> Best,
> >>> Erick
> >>>
>  On Feb 21, 2019, at 7:54 AM, Midas A  wrote:
> 
>  Thanks for quick reply .
> 
>  we are creating  search *query(keyword)*  for dynamic field creation
> to
>  use click ,cart  and order data  in search.
> 
>  But we are experiencing  more heap and increase in query time .
>  What could be the problem? can be anything related to it ?
> 
> 
>  On Thu, Feb 21, 2019 at 8:43 PM Shawn Heisey 
> >>> wrote:
> 
> > On 2/21/2019 8:01 AM, Midas A wrote:
> >> How many dynamic field we can create in solr ?. is there any
> >>> limitation ?
> >> Is indexing dynamic field can increase heap memory on server .
> >
> > At the Lucene level, there is absolutely no difference between a
> > standard field and a dynamic field.  The difference in Solr is how
> the
> > field is defined, nothing more.
> >
> > Lucene has no hard limitations on the number of fields you can
> create,
> > but the more you have the larger your index will probably be.  Larger
> > indexes perform slower than smaller ones and require more resources
> >>> like
> > memory.
> >
> > Thanks,
> > Shawn
> >
> >>>
> >>>
>
>


Re: Full index replication upon service restart

2019-02-21 Thread Rahul Goswami
Eric,
Thanks for the insight. We are looking at tuning the architecture. We are
also stopping the indexing application before we bring down the Solr nodes
for maintenance. However, when both nodes are up and one replica is falling
behind too much, we want to throttle the requests. Is there an API in Solr
to know whether a replica is falling behind the leader?

Thanks,
Rahul

On Mon, Feb 11, 2019 at 10:28 PM Erick Erickson 
wrote:

> bq. To answer your question about index size on
> disk, it is 3 TB on every node. As mentioned it's a 32 GB machine and I
> allocated 24GB to Java heap.
>
> This is massively undersized in terms of RAM in my experience. You're
> trying to cram 3TB of index into 32GB of memory. Frankly, I don't think
> there's much you can do to increase stability in this situation, too many
> things are going on. In particular, you're indexing during node restart.
>
> That means that
> 1> you'll almost inevitably get a full sync on start given your update
>  rate.
> 2> while you're doing the full sync, all new updates are sent to the
>   recovering replica and put in the tlog.
> 3> When the initial replication is done, the documents sent to the
>  tlog while recovering are indexed. This is 7 hours of accumulated
>  updates.
> 4> If much goes wrong in this situation, then you're talking another full
>  sync.
> 5> rinse, repeat.
>
> There are no magic tweaks here. You really have to rethink your
> architecture. I'm actually surprised that your queries are performant.
> I expect you're getting a _lot_ of I/O, that is the relevant parts of your
> index are swapping in and out of the OS memory space. A _lot_.
> Or you're only using a _very_ small bit of your index.
>
> Sorry to be so negative, but this is not a situation that's amenable to
> a quick fix.
>
> Best,
> Erick
>
>
>
>
> On Mon, Feb 11, 2019 at 4:10 PM Rahul Goswami 
> wrote:
> >
> > Thanks for the response Eric. To answer your question about index size on
> > disk, it is 3 TB on every node. As mentioned it's a 32 GB machine and I
> > allocated 24GB to Java heap.
> >
> > Further monitoring the recovery, I see that when the follower node is
> > recovering, the leader node (which is NOT recovering) almost freezes with
> > 100% CPU usage and 80%+ memory usage. Follower node's memory usage is
> 80%+
> > but CPU is very healthy. Also Follower node's log is filled up with
> updates
> > forwarded from the leader ("...PRE_UPDATE FINISH
> > {update.distrib=FROMLEADER=...") and replication starts much
> > afterwards.
> > There have been instances when complete recovery took 10+ hours. We have
> > upgraded to a 4 Gbps NIC between the nodes to see if it helps.
> >
> > Also, a few followup questions:
> >
> > 1) Is  there a configuration which would start throttling update requests
> > if the replica falls behind a certain number of updates so as to not
> > trigger an index replication later?  If not, would it be a worthy
> > enhancement?
> > 2) What would be a recommended hard commit interval for this kind of
> setup
> > ?
> > 3) What are some of the improvements in 7.5 with respect to recovery as
> > compared to 7.2.1?
> > 4) What do the below peersync failure logs lines mean?  This would help
> me
> > better understand the reasons for peersync failure and maybe devise some
> > alert mechanism to start throttling update requests from application
> > program if feasible.
> >
> > *PeerSync Failure type 1*:
> > --
> > 2019-02-04 20:43:50.018 INFO
> > (recoveryExecutor-4-thread-2-processing-n:indexnode1:2_solr
> > x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42
> > s:shard11 c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 r:core_node45)
> > [c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 s:shard11 r:core_node45
> > x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42]
> > org.apache.solr.update.PeerSync Fingerprint comparison: 1
> >
> > 2019-02-04 20:43:50.018 INFO
> > (recoveryExecutor-4-thread-2-processing-n:indexnode1:2_solr
> > x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42
> > s:shard11 c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 r:core_node45)
> > [c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 s:shard11 r:core_node45
> > x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42]
> > org.apache.solr.update.PeerSync Other fingerprint:
> > {maxVersionSpecified=1624579878580912128,
> > maxVersionEncountered=1624579893816721408, maxInHash=1624579878580912128,
> > versionsHash=-8308981502886241345, numVersions=32966082,
> numDocs=32966165,
> > maxDoc=1828452}, Our fingerprint:
> {maxVersionSpecified=1624579878580912128,
> > maxVersionEncountered=1624579975760838656, maxInHash=1624579878580912128,
> > versionsHash=4017509388564167234, numVersions=32966066, numDocs=32966165,
> > maxDoc=1828452}
> >
> > 2019-02-04 20:43:50.018 INFO
> > (recoveryExecutor-4-thread-2-processing-n:indexnode1:2_solr
> > 

RE: TLOG replica, updateHandler errors in metrics, no logs

2019-02-21 Thread Markus Jelsma
Hello Erick,

I just delete a replica and add again, but with type=tlog.
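
(For reference, the delete and re-add can be done with the standard Collections
API; the collection, shard and replica names below are placeholders:)

  curl "http://localhost:8983/solr/admin/collections?action=DELETEREPLICA&collection=mycoll&shard=shard1&replica=core_node5"
  curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1&type=tlog"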

Yes, it is reproducible both locally and in production, and with various 
collections. For each document added, the metric increments as well.

I'll open a ticket!

Thanks!
Markus

https://issues.apache.org/jira/browse/SOLR-13265


 
-Original message-
> From:Erick Erickson 
> Sent: Thursday 21st February 2019 17:06
> To: solr-user@lucene.apache.org
> Subject: Re: TLOG replica, updateHandler errors in metrics, no logs
> 
> How are you “moving”? There’s no provision that I know of to _change_ an 
> existing replica.
> 
> But no, if you’re starting with replicas created as TLOG then I haven’t heard 
> of this. If
> the documents are getting indexed and replicated properly then it sounds like 
> a bogus
> counter is being incremented. That said, if you can reliably reproduce this 
> should be 
> a JIRA IMO.
> 
> Best,
> Erick
> 
> > On Feb 21, 2019, at 2:33 AM, Markus Jelsma  
> > wrote:
> > 
> > Hello,
> > 
> > We are moving some replica's to TLOG, one collection runs 7.5, the others 
> > 7.7. When indexing, we see UPDATE.updateHandler.errors increment for each 
> > document being indexed, there is nothing in the logs.
> > 
> > Is this a known issue? 
> > 
> > Thanks,
> > Markus
> 
> 


Re: dynamic field issue

2019-02-21 Thread Erick Erickson
300 is still not excessive. As far as memory goes, sure. If you’re faceting, 
grouping, or sorting, docValues would _certainly_ help with memory consumption.
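
The same goes for function queries such as the bf mentioned elsewhere in this
thread: they read a per-document value, and with docValues that value comes
from the index files (largely off-heap) rather than from a heap-resident cache.
A rough sketch (the field, handler parameters and query are made up):

  <dynamicField name="*_clicks_i" type="pint" indexed="true" stored="false" docValues="true"/>

  q=shoes&defType=edismax&qf=title description&bf=field(red_shoes_clicks_i)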

> On Feb 21, 2019, at 8:31 AM, Midas A  wrote:
> 
> Hi ,
> Plelase help me here we have crossed 100+ fields per dyanmic fields and we
> have three dynamic fields.
> using docValues in dynamic fields will help in improving heap and query
> time ?
> 
> Regards,
> Abhishek Tiwari
> 
> 
> On Thu, Feb 21, 2019 at 9:38 PM Midas A  wrote:
> 
>> Yes . We have crossed  100 fields .
>> 
>> Would docValues help here ?
>> 
>> What kind of information you want from my side ?
>> 
>> On Thu, 21 Feb 2019, 9:31 pm Erick Erickson, 
>> wrote:
>> 
>>> There’s no way to answer this given you’ve provided almost no
>>> information.
>>> 
>>> Do note that once you get to more than a few hundred fields,
>>> Solr still functions, but I’ve seen performance degrade and
>>> memory increase.
>>> 
>>> Best,
>>> Erick
>>> 
 On Feb 21, 2019, at 7:54 AM, Midas A  wrote:
 
 Thanks for quick reply .
 
 we are creating  search *query(keyword)*  for dynamic field creation  to
 use click ,cart  and order data  in search.
 
 But we are experiencing  more heap and increase in query time .
 What could be the problem? can be anything related to it ?
 
 
 On Thu, Feb 21, 2019 at 8:43 PM Shawn Heisey 
>>> wrote:
 
> On 2/21/2019 8:01 AM, Midas A wrote:
>> How many dynamic field we can create in solr ?. is there any
>>> limitation ?
>> Is indexing dynamic field can increase heap memory on server .
> 
> At the Lucene level, there is absolutely no difference between a
> standard field and a dynamic field.  The difference in Solr is how the
> field is defined, nothing more.
> 
> Lucene has no hard limitations on the number of fields you can create,
> but the more you have the larger your index will probably be.  Larger
> indexes perform slower than smaller ones and require more resources
>>> like
> memory.
> 
> Thanks,
> Shawn
> 
>>> 
>>> 



Re: Newbie question - Error loading an existing config file

2019-02-21 Thread Erick Erickson
I have absolutely no idea when it comes to Drupal, the Drupal folks would be
much better equipped to answer.

Best,
Erick

> On Feb 21, 2019, at 8:16 AM, Greg Robinson  wrote:
> 
> Thanks for the feedback.
> 
> So here is where I'm at.
> 
> I first went ahead and deleted the existing core that was returning the
> error using the following command: bin/solr delete -c new_solr_core
> 
> Now when I access the admin panel, there are no errors.
> 
> I then referred to the large "warning" box on the CREATE action
> documentation:
> 
> "While it’s possible to create a core for a non-existent collection, this
> approach is not supported and not recommended. Always create a collection
> using the *Collections API* before creating a core directly for it."
> 
> I then tried to create a collection using the following "Collections API"
> command:
> 
> http://localhost:8983/solr/admin/collections?action=CREATE&name=newCollection&numShards=2&replicationFactor=1&wt=xml
> 
> This was the response:
> 
> 
> 
> <response>
>   <lst name="responseHeader">
>     <int name="status">400</int>
>     <int name="QTime">2</int>
>   </lst>
>   <lst name="error">
>     <lst name="metadata">
>       <str name="error-class">org.apache.solr.common.SolrException</str>
>       <str name="root-error-class">org.apache.solr.common.SolrException</str>
>     </lst>
>     <str name="msg">Solr instance is not running in SolrCloud mode.</str>
>     <int name="code">400</int>
>   </lst>
> </response>
> 
> I guess my main question is, do I need to be running in "SolrCloud mode" if
> my intention is to use Solr Server to index a Drupal 7 website? We're
> currently using opensolr.com which is working fine but we're trying to
> avoid the monthly costs associated with their "Shared Solr Cloud" plan.
> 
> Thanks!
> 
> 
> 
> 
> 
> On Wed, Feb 20, 2019 at 8:34 PM Shawn Heisey  wrote:
> 
>> On 2/20/2019 11:07 AM, Greg Robinson wrote:
>>> Lets try this: https://imgur.com/a/z5OzbLW
>>> 
>>> What I'm trying to do seems pretty straightforward:
>>> 
>>> 1. Install Solr Server 7.4 on Linux (Completed)
>>> 2. Connect my Drupal 7 site to the Solr Server and use it for indexing
>>> content
>>> 
>>> My understanding is that I must first create a core in order to connect
>> my
>>> drupal site to Solr Server. This is where I'm currently stuck.
>> 
>> The assertion in your screenshot that the dataDir must exist is
>> incorrect.  If current versions of Solr say this also, that is something
>> we will need to change.  This is what actually happens:  If all the
>> other requirements are met and the dataDir does not exist, it will be
>> created automatically when the core starts, if the process has
>> sufficient permissions.
>> 
>> See the large "warning" box on the CREATE action documentation for
>> details on what you need:
>> 
>> 
>> https://lucene.apache.org/solr/guide/7_4/coreadmin-api.html#coreadmin-create
>> 
>> The warning box is the one that has a red triangle to the left of it.
>> The red triangle contains an exclamation point.
>> 
>> The essence of what it says there is that the core's instance directory
>> must exist, that directory must contain a "conf" directory, and all
>> required config files must be in the conf directory.
>> 
>> If you're running in SolrCloud mode, then you're using the wrong API.
>> 
>> Thanks,
>> Shawn
>> 
> 
> 
> -- 
> Greg Robinson
> CEO - Mobile*Enhanced*
> www.mobileenhanced.com
> g...@mobileenhanced.com
> 303-598-1865



Re: dynamic field issue

2019-02-21 Thread Midas A
Hi,
Please help me. Here we have crossed 100+ fields per dynamic field, and we
have three dynamic fields.
Will using docValues in dynamic fields help in improving heap usage and query
time?

Regards,
Abhishek Tiwari


On Thu, Feb 21, 2019 at 9:38 PM Midas A  wrote:

> Yes . We have crossed  100 fields .
>
> Would docValues help here ?
>
> What kind of information you want from my side ?
>
> On Thu, 21 Feb 2019, 9:31 pm Erick Erickson, 
> wrote:
>
>> There’s no way to answer this given you’ve provided almost no
>> information.
>>
>> Do note that once you get to more than a few hundred fields,
>> Solr still functions, but I’ve seen performance degrade and
>> memory increase.
>>
>> Best,
>> Erick
>>
>> > On Feb 21, 2019, at 7:54 AM, Midas A  wrote:
>> >
>> > Thanks for quick reply .
>> >
>> > we are creating  search *query(keyword)*  for dynamic field creation  to
>> > use click ,cart  and order data  in search.
>> >
>> > But we are experiencing  more heap and increase in query time .
>> > What could be the problem? can be anything related to it ?
>> >
>> >
>> > On Thu, Feb 21, 2019 at 8:43 PM Shawn Heisey 
>> wrote:
>> >
>> >> On 2/21/2019 8:01 AM, Midas A wrote:
>> >>> How many dynamic field we can create in solr ?. is there any
>> limitation ?
>> >>> Is indexing dynamic field can increase heap memory on server .
>> >>
>> >> At the Lucene level, there is absolutely no difference between a
>> >> standard field and a dynamic field.  The difference in Solr is how the
>> >> field is defined, nothing more.
>> >>
>> >> Lucene has no hard limitations on the number of fields you can create,
>> >> but the more you have the larger your index will probably be.  Larger
>> >> indexes perform slower than smaller ones and require more resources
>> like
>> >> memory.
>> >>
>> >> Thanks,
>> >> Shawn
>> >>
>>
>>


Re: Newbie question - Error loading an existing config file

2019-02-21 Thread Greg Robinson
Thanks for the feedback.

So here is where I'm at.

I first went ahead and deleted the existing core that was returning the
error using the following command: bin/solr delete -c new_solr_core

Now when I access the admin panel, there are no errors.

I then referred to the large "warning" box on the CREATE action
documentation:

"While it’s possible to create a core for a non-existent collection, this
approach is not supported and not recommended. Always create a collection
using the *Collections API* before creating a core directly for it."

I then tried to create a collection using the following "Collections API"
command:

http://localhost:8983/solr/admin/collections?action=CREATE&name=newCollection&numShards=2&replicationFactor=1&wt=xml

This was the response:



<response>
  <lst name="responseHeader">
    <int name="status">400</int>
    <int name="QTime">2</int>
  </lst>
  <lst name="error">
    <lst name="metadata">
      <str name="error-class">org.apache.solr.common.SolrException</str>
      <str name="root-error-class">org.apache.solr.common.SolrException</str>
    </lst>
    <str name="msg">Solr instance is not running in SolrCloud mode.</str>
    <int name="code">400</int>
  </lst>
</response>

I guess my main question is, do I need to be running in "SolrCloud mode" if
my intention is to use Solr Server to index a Drupal 7 website? We're
currently using opensolr.com which is working fine but we're trying to
avoid the monthly costs associated with their "Shared Solr Cloud" plan.
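
(For what it's worth, in standalone, non-cloud mode a core can be created from
the command line instead of the Collections API; the core name and path below
are only examples, and the conf directory should hold the schema and
solrconfig supplied by the Drupal Solr module:)

  bin/solr create -c drupal -d /path/to/drupal-provided/solr/conf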

Thanks!





On Wed, Feb 20, 2019 at 8:34 PM Shawn Heisey  wrote:

> On 2/20/2019 11:07 AM, Greg Robinson wrote:
> > Lets try this: https://imgur.com/a/z5OzbLW
> >
> > What I'm trying to do seems pretty straightforward:
> >
> > 1. Install Solr Server 7.4 on Linux (Completed)
> > 2. Connect my Drupal 7 site to the Solr Server and use it for indexing
> > content
> >
> > My understanding is that I must first create a core in order to connect
> my
> > drupal site to Solr Server. This is where I'm currently stuck.
>
> The assertion in your screenshot that the dataDir must exist is
> incorrect.  If current versions of Solr say this also, that is something
> we will need to change.  This is what actually happens:  If all the
> other requirements are met and the dataDir does not exist, it will be
> created automatically when the core starts, if the process has
> sufficient permissions.
>
> See the large "warning" box on the CREATE action documentation for
> details on what you need:
>
>
> https://lucene.apache.org/solr/guide/7_4/coreadmin-api.html#coreadmin-create
>
> The warning box is the one that has a red triangle to the left of it.
> The red triangle contains an exclamation point.
>
> The essence of what it says there is that the core's instance directory
> must exist, that directory must contain a "conf" directory, and all
> required config files must be in the conf directory.
>
> If you're running in SolrCloud mode, then you're using the wrong API.
>
> Thanks,
> Shawn
>


-- 
Greg Robinson
CEO - Mobile*Enhanced*
www.mobileenhanced.com
g...@mobileenhanced.com
303-598-1865


Re: dynamic field issue

2019-02-21 Thread Midas A
Yes, we have crossed 100 fields.

Would docValues help here?

What kind of information do you want from my side?

On Thu, 21 Feb 2019, 9:31 pm Erick Erickson, 
wrote:

> There’s no way to answer this given you’ve provided almost no
> information.
>
> Do note that once you get to more than a few hundred fields,
> Solr still functions, but I’ve seen performance degrade and
> memory increase.
>
> Best,
> Erick
>
> > On Feb 21, 2019, at 7:54 AM, Midas A  wrote:
> >
> > Thanks for quick reply .
> >
> > we are creating  search *query(keyword)*  for dynamic field creation  to
> > use click ,cart  and order data  in search.
> >
> > But we are experiencing  more heap and increase in query time .
> > What could be the problem? can be anything related to it ?
> >
> >
> > On Thu, Feb 21, 2019 at 8:43 PM Shawn Heisey 
> wrote:
> >
> >> On 2/21/2019 8:01 AM, Midas A wrote:
> >>> How many dynamic field we can create in solr ?. is there any
> limitation ?
> >>> Is indexing dynamic field can increase heap memory on server .
> >>
> >> At the Lucene level, there is absolutely no difference between a
> >> standard field and a dynamic field.  The difference in Solr is how the
> >> field is defined, nothing more.
> >>
> >> Lucene has no hard limitations on the number of fields you can create,
> >> but the more you have the larger your index will probably be.  Larger
> >> indexes perform slower than smaller ones and require more resources like
> >> memory.
> >>
> >> Thanks,
> >> Shawn
> >>
>
>


Re: TLOG replica, updateHandler errors in metrics, no logs

2019-02-21 Thread Erick Erickson
How are you “moving”? There’s no provision that I know of to _change_ an 
existing replica.

But no, if you’re starting with replicas created as TLOG then I haven’t heard 
of this. If
the documents are getting indexed and replicated properly then it sounds like a 
bogus
counter is being incremented. That said, if you can reliably reproduce this, it
should be a JIRA IMO.

Best,
Erick

> On Feb 21, 2019, at 2:33 AM, Markus Jelsma  wrote:
> 
> Hello,
> 
> We are moving some replica's to TLOG, one collection runs 7.5, the others 
> 7.7. When indexing, we see UPDATE.updateHandler.errors increment for each 
> document being indexed, there is nothing in the logs.
> 
> Is this a known issue? 
> 
> Thanks,
> Markus



Re: dynamic field issue

2019-02-21 Thread Erick Erickson
There’s no way to answer this given you’ve provided almost no
information.

Do note that once you get to more than a few hundred fields,
Solr still functions, but I’ve seen performance degrade and
memory increase.

Best,
Erick

> On Feb 21, 2019, at 7:54 AM, Midas A  wrote:
> 
> Thanks for quick reply .
> 
> we are creating  search *query(keyword)*  for dynamic field creation  to
> use click ,cart  and order data  in search.
> 
> But we are experiencing  more heap and increase in query time .
> What could be the problem? can be anything related to it ?
> 
> 
> On Thu, Feb 21, 2019 at 8:43 PM Shawn Heisey  wrote:
> 
>> On 2/21/2019 8:01 AM, Midas A wrote:
>>> How many dynamic field we can create in solr ?. is there any limitation ?
>>> Is indexing dynamic field can increase heap memory on server .
>> 
>> At the Lucene level, there is absolutely no difference between a
>> standard field and a dynamic field.  The difference in Solr is how the
>> field is defined, nothing more.
>> 
>> Lucene has no hard limitations on the number of fields you can create,
>> but the more you have the larger your index will probably be.  Larger
>> indexes perform slower than smaller ones and require more resources like
>> memory.
>> 
>> Thanks,
>> Shawn
>> 



Re: dynamic field issue

2019-02-21 Thread Midas A
Thanks for the quick reply.

We are creating a dynamic field per search *query (keyword)* so that we can
use click, cart and order data in search.

But we are experiencing higher heap usage and an increase in query time.
What could be the problem? Could it be anything related to this?


On Thu, Feb 21, 2019 at 8:43 PM Shawn Heisey  wrote:

> On 2/21/2019 8:01 AM, Midas A wrote:
> > How many dynamic field we can create in solr ?. is there any limitation ?
> > Is indexing dynamic field can increase heap memory on server .
>
> At the Lucene level, there is absolutely no difference between a
> standard field and a dynamic field.  The difference in Solr is how the
> field is defined, nothing more.
>
> Lucene has no hard limitations on the number of fields you can create,
> but the more you have the larger your index will probably be.  Larger
> indexes perform slower than smaller ones and require more resources like
> memory.
>
> Thanks,
> Shawn
>


Suppress stack trace in error response

2019-02-21 Thread Branham, Jeremy (Experis)
When Solr throws an exception, like when a client sends a badly formed query 
string, is there a way to suppress the stack trace in the error response?



Jeremy Branham
jb...@allstate.com
Allstate Insurance Company | UCV Technology Services | Information Services 
Group



Re: Solr CDCR updating data in target and not in source

2019-02-21 Thread Susheel Kumar
Do you see CDCR forward messages, with some numbers, in the source Solr logs?
That will confirm whether data is indeed going through the source and being
forwarded to the target.

Also any auto/soft commit settings difference between source & target?
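
(The CDCR API on the source collection can also show whether updates are being
queued for the target; the host and collection name below are placeholders:)

  curl "http://source-host:8983/solr/mycollection/cdcr?action=QUEUES"
  curl "http://source-host:8983/solr/mycollection/cdcr?action=ERRORS"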

On Wed, Feb 20, 2019 at 8:29 AM ypriverol  wrote:

> Hi:
>
> I'm using the CDCR feature of Solr 7.1. My source SolrCloud cluster has 3
> shards and the target similarly has 3 shards. When we create both clusters and
> push to the source and then enable CDCR, the data is transferred nicely to the
> target. If we start adding records, everything is fine.
>
> However, we have deleted ALL records in the source and started adding them all
> again with our pipelines (Spring Solr). Interestingly, all records appear in the
> target but not in the source. We have even stopped CDCR and the data continues
> to be transferred to the target and does not appear in the source, even though
> we are 100% sure we are inserting into the source.
>
> Any ideas?
>
> Regards
> Yasset
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


dynamic field issue

2019-02-21 Thread Midas A
Hi All,

How many dynamic fields can we create in Solr? Is there any limitation?
Can indexing dynamic fields increase heap memory on the server?


Regards,
Midas


Re: dynamic field issue

2019-02-21 Thread Shawn Heisey

On 2/21/2019 8:01 AM, Midas A wrote:

How many dynamic field we can create in solr ?. is there any limitation ?
Is indexing dynamic field can increase heap memory on server .


At the Lucene level, there is absolutely no difference between a 
standard field and a dynamic field.  The difference in Solr is how the 
field is defined, nothing more.


Lucene has no hard limitations on the number of fields you can create, 
but the more you have the larger your index will probably be.  Larger 
indexes perform slower than smaller ones and require more resources like 
memory.


Thanks,
Shawn


Re: Trying to enable HTTP gzip compression

2019-02-21 Thread Walter Underwood
Years ago we did some testing with HTTP compression for search results with the 
Ultraseek search engine. It wasn’t faster. It was sometimes slower.

Once you have enough RAM, search is a CPU-limited problem. HTTP compression 
uses more CPU to save network bandwidth. But search isn’t limited by network 
bandwidth, so this uses more of the bottleneck resource (CPU) to reduce usage 
of a plentiful resource (network bandwidth).

Look at the amount of data going in and out of your nodes. I bet it is far 
below the maximum.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 21, 2019, at 6:07 AM, Jörn Franke  wrote:
> 
> You could also change the responsewriter from json to javabin to improve 
> performance. 
> Or increase network bandwidth. Then often people fetch more from solr than 
> they need. There is a huge saving potential. Increasing the cores for https 
> encryption can sometimes help.
> 
> Compression also leads to other issues (performance but potentially also 
> security wise).
> 
>> Am 21.02.2019 um 12:03 schrieb Luthien Dulk :
>> 
>> hi all,
>> 
>> I was wondering if anyone could point me in the right direction. 
>> 
>> I am looking into whether enabling Gzip HTTP compression for our Solr 
>> clusters (all running Solr 6.6.5) would help performance; my problem is that 
>> I can’t figure out how to do that.
>> 
>> Our infrastructure setup is like this: our applications are running on a 
>> Cloud Foundry PAAS environment, but our Solr clusters run elsewhere. 
>> Communication between applications and Solr clusters is secured by firewalls 
>> on every Solr machine (we do have a Socks Proxy set up in the CF 
>> environment, but unfortunately we can't use that for Solr because of the 
>> incompatibility between Zookeeper and Java Nio I/O - much to the chagrin of 
>> our sysadmin).
>> 
>> We think that HTTP compression might be very interesting for us because of 
>> the hight volume of traffic between two separate environments.
>> 
>> Here’s what I found out so far: 
>> 
>> (re. config changes in Solr’s embedded Jetty)
>> - I’m aware that this is mostly a matter of configuring Jetty;
>> - it seems that this should preferably be set in the solr-jetty-context.xml 
>> file;
>> - this seems to relate to enabling Jetty's “GzipHandler”
>> 
>> (re. gzip ‘module’ activation ..?)
>> - it puzzles me that 
>> https://aroratimus.blogspot.com/2017/08/jettyserver-9.html mentions that 
>> Jetty’s GzipHandler should be enabled using two files not found in Solr's 
>> embedded Jetty: server/etc/jetty-gzip.xml and server/modules/gzip.mod (they 
>> are available when installing Jetty separately though);
>> - apparently, Jetty's Gzip module should be activated by adding 
>> —add-to-start=gzip to the server startup command. For the embedded Jetty in 
>> Solr, it seems that this would require changing the solr startup script
>> 
>> (re. changes in Solr client)
>> - the calling application should add the HTTP Accept-Encoding: gzip, deflate 
>> ( according to 
>> https://menelic.com/2015/12/04/deploying-solrcloud-across-multiple-data-centers-dc-performance/
>>  )
>> 
>> 
>> I wonder, has anyone ever got this working? In particular:
>> 
>> - is that gzip ‘module’ activation necessary? That would seem a bit 
>> far-fetched, because it involves files not found in the Solr installation 
>> and possibly hacking the Solr startup script;
>> - what did you add to solr-jetty-context.xml in order to enable gzip 
>> compression?
>> 
>> 
>> I suppose that situations with high volumes of external network traffic 
>> between Solr and Client must be quite rare. Otherwise I’d think that a 
>> feature that potentially offers such obvious benefits (one of the pages 
>> above mentions a drop of 75% of network traffic and a 60% faster response 
>> time) would have been turned into an “enable http compression yes/no” 
>> setting by now :)
>> 
>> Anyhow, we’re stuck with it … I hope I can get it working.
>> 
>> 
>> Thank in advance for any advice!
>> 
>> Luthien
>> Api developer
>> Europeana.eu
>> 
>> 
>> 
>> 
>> 
>> 



Re: Spring Boot Solr+ Kerberos+ Ambari

2019-02-21 Thread Furkan KAMACI
Hi,

You can also check here:
https://community.hortonworks.com/articles/15159/securing-solr-collections-with-ranger-kerberos.html
On the other hand, we have a section for Solr Kerberos in the documentation:
https://lucene.apache.org/solr/guide/6_6/kerberos-authentication-plugin.html
For any Ambari-specific questions, you can ask them at this forum:
https://community.hortonworks.com/topics/forum.html

Kind Regards,
Furkan KAMACI

On Thu, Feb 21, 2019 at 1:43 PM Rushikesh Garadade <
rushikeshgarad...@gmail.com> wrote:

> Hi Furkan,
> I think the link you provided is for ranger audit setting, please correct
> me if wrong?
>
> I use HDP 2.6.5. which has Solr 5.6
>
> Thank you,
> Rushikesh Garadade
>
>
> On Thu, Feb 21, 2019, 2:57 PM Furkan KAMACI 
> wrote:
>
> > Hi Rushikesh,
> >
> > Did you check here:
> >
> >
> https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_security/content/solr_ranger_configure_solrcloud_kerberos.html
> >
> > By the way, which versions do you use?
> >
> > Kind Regards,
> > Furkan KAMACI
> >
> > On Thu, Feb 21, 2019 at 11:41 AM Rushikesh Garadade <
> > rushikeshgarad...@gmail.com> wrote:
> >
> > > Hi All,
> > >
> > > I am trying to set Kerberos for Solr which is installed on Hortonworks
> > > Ambari.
> > >
> > > Q1. Is Ranger a mandatory component for Solr Kerberos configuration on
> > > ambari.?
> > >
> > > I am getting little confused with documents available on internet for
> > this.
> > > I tried to do without ranger but not getting any success.
> > >
> > > If is there any good document for the same, please let me know.
> > >
> > > Thanks,
> > > Rushikesh Garadade.
> > >
> >
>


Re: Querying on sum of child documents

2019-02-21 Thread flatmind
Hello Mikhail - Thanks for your help with the query; I was able to get the
result. But when I try to add another condition to it, it does not show any
results, even though there are matching records in the index. When I execute
those queries individually, I do get a response.

This is the index that we have and it has 2 parent records with few children

 "response":{"numFound":2,"start":0,"docs":[ { "id":"one",
"th_content":["this resume belongs to php developer"], "th_is_parent":1,
"_version_":1626065661717381120, "_childDocuments_":[ { "id":"doc1***one",
"th_recent_exp":["Wokred as java developer"], "th_experience":2,
"th_is_parent":2, "_version_":1626065661717381120}, { "id":"doc2***one",
"th_recent_exp":["experience in java"], "th_experience":2, "th_is_parent":2,
"_version_":1626065661717381120}, { "id":"doc3***one",
"th_recent_exp":["junior software developer"], "th_experience":1,
"th_is_parent":2, "_version_":1626065661717381120}]}, { "id":"two",
"th_content":["this resume belongs to php developer"], "th_is_parent":1,
"_version_":1626065856446332928, "_childDocuments_":[ { "id":"doc1***two",
"th_recent_exp":["Wokred as php developer"], "th_experience":2,
"th_is_parent":2, "_version_":1626065856446332928}, { "id":"doc2***two",
"th_recent_exp":["experience in php"], "th_experience":2, "th_is_parent":2,
"_version_":1626065856446332928}, { "id":"doc3***two",
"th_recent_exp":["junior software developer"], "th_experience":1,
"th_is_parent":2, "_version_":1626065856446332928}]}] }}

When I use the below query either in the Q or FQ, I am able to see 1 record
in the search results. But when I try to add another condition, it is not
giving anything. Please find the below working query and the other one with
one more clause added. Seems that I might be missing a small syntactical
thing that combines both queries to get the results. Appreciate if you could
help here.

*Working Query
===*
{!frange l=4}{!parent which=th_is_parent:1 score=total
v=$chq}&chq=+th_is_parent:2^=0 AND th_recent_exp:php^=0 AND
{!func}th_experience

*Adding another condition to the above query like below
===*
{!frange l=4}{!parent which=th_is_parent:1 score=total
v=$chq}&chq=+th_is_parent:2^=0 AND th_recent_exp:*php*^=0 AND
{!func}th_experience *AND* {!frange l=4}{!parent which=th_is_parent:1
score=total
v=$chq}&chq=+th_is_parent:2^=0 AND th_recent_exp:*java*^=0 AND
{!func}th_experience
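
Purely as a syntax sketch (whether this matches the intended scoring still needs
confirming), one way to keep the two child-side conditions separate is to give
each its own parameter and reference them from two filters; the parameter names
phpq and javaq are arbitrary:

  fq={!frange l=4}{!parent which=th_is_parent:1 score=total v=$phpq}
  fq={!frange l=4}{!parent which=th_is_parent:1 score=total v=$javaq}
  phpq=+th_is_parent:2^=0 AND th_recent_exp:php^=0 AND {!func}th_experience
  javaq=+th_is_parent:2^=0 AND th_recent_exp:java^=0 AND {!func}th_experience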

Thank You




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Trying to enable HTTP gzip compression

2019-02-21 Thread Jörn Franke
You could also change the response writer from json to javabin to improve 
performance. 
Or increase network bandwidth. Also, people often fetch more from Solr than they 
need; there is a huge saving potential there. Increasing the number of CPU cores 
available for HTTPS encryption can sometimes help.

Compression also leads to other issues (performance but potentially also 
security wise).

> Am 21.02.2019 um 12:03 schrieb Luthien Dulk :
> 
> hi all,
> 
> I was wondering if anyone could point me in the right direction. 
> 
> I am looking into whether enabling Gzip HTTP compression for our Solr 
> clusters (all running Solr 6.6.5) would help performance; my problem is that 
> I can’t figure out how to do that.
> 
> Our infrastructure setup is like this: our applications are running on a 
> Cloud Foundry PAAS environment, but our Solr clusters run elsewhere. 
> Communication between applications and Solr clusters is secured by firewalls 
> on every Solr machine (we do have a Socks Proxy set up in the CF environment, 
> but unfortunately we can't use that for Solr because of the incompatibility 
> between Zookeeper and Java Nio I/O - much to the chagrin of our sysadmin).
> 
> We think that HTTP compression might be very interesting for us because of 
> the hight volume of traffic between two separate environments.
> 
> Here’s what I found out so far: 
> 
> (re. config changes in Solr’s embedded Jetty)
> - I’m aware that this is mostly a matter of configuring Jetty;
> - it seems that this should preferably be set in the solr-jetty-context.xml 
> file;
> - this seems to relate to enabling Jetty's “GzipHandler”
> 
> (re. gzip ‘module’ activation ..?)
> - it puzzles me that 
> https://aroratimus.blogspot.com/2017/08/jettyserver-9.html mentions that 
> Jetty’s GzipHandler should be enabled using two files not found in Solr's 
> embedded Jetty: server/etc/jetty-gzip.xml and server/modules/gzip.mod (they 
> are available when installing Jetty separately though);
> - apparently, Jetty's Gzip module should be activated by adding 
> —add-to-start=gzip to the server startup command. For the embedded Jetty in 
> Solr, it seems that this would require changing the solr startup script
> 
> (re. changes in Solr client)
> - the calling application should add the HTTP Accept-Encoding: gzip, deflate 
> ( according to 
> https://menelic.com/2015/12/04/deploying-solrcloud-across-multiple-data-centers-dc-performance/
>  )
> 
> 
> I wonder, has anyone ever got this working? In particular:
> 
> - is that gzip ‘module’ activation necessary? That would seem a bit 
> far-fetched, because it involves files not found in the Solr installation and 
> possibly hacking the Solr startup script;
> - what did you add to solr-jetty-context.xml in order to enable gzip 
> compression?
> 
> 
> I suppose that situations with high volumes of external network traffic 
> between Solr and Client must be quite rare. Otherwise I’d think that a 
> feature that potentially offers such obvious benefits (one of the pages above 
> mentions a drop of 75% of network traffic and a 60% faster response time) 
> would have been turned into an “enable http compression yes/no” setting by 
> now :)
> 
> Anyhow, we’re stuck with it … I hope I can get it working.
> 
> 
> Thank in advance for any advice!
> 
> Luthien
> Api developer
> Europeana.eu
> 
> 
> 
> 
> 
> 


Trying to enable HTTP gzip compression

2019-02-21 Thread Luthien Dulk
hi all,

I was wondering if anyone could point me in the right direction. 

I am looking into whether enabling Gzip HTTP compression for our Solr clusters 
(all running Solr 6.6.5) would help performance; my problem is that I can’t 
figure out how to do that.

Our infrastructure setup is like this: our applications are running on a Cloud 
Foundry PAAS environment, but our Solr clusters run elsewhere. 
Communication between applications and Solr clusters is secured by firewalls on 
every Solr machine (we do have a Socks Proxy set up in the CF environment, but 
unfortunately we can't use that for Solr because of the incompatibility between 
Zookeeper and Java Nio I/O - much to the chagrin of our sysadmin).

We think that HTTP compression might be very interesting for us because of the 
high volume of traffic between two separate environments.

Here’s what I found out so far: 

(re. config changes in Solr’s embedded Jetty)
- I’m aware that this is mostly a matter of configuring Jetty;
- it seems that this should preferably be set in the solr-jetty-context.xml 
file;
- this seems to relate to enabling Jetty's “GzipHandler”

(re. gzip ‘module’ activation ..?)
- it puzzles me that https://aroratimus.blogspot.com/2017/08/jettyserver-9.html 
mentions that Jetty’s GzipHandler should be enabled using two files not found 
in Solr's embedded Jetty: server/etc/jetty-gzip.xml and server/modules/gzip.mod 
(they are available when installing Jetty separately though);
- apparently, Jetty's Gzip module should be activated by adding 
—add-to-start=gzip to the server startup command. For the embedded Jetty in 
Solr, it seems that this would require changing the solr startup script

(re. changes in Solr client)
- the calling application should add the HTTP Accept-Encoding: gzip, deflate ( 
according to 
https://menelic.com/2015/12/04/deploying-solrcloud-across-multiple-data-centers-dc-performance/
 )


I wonder, has anyone ever got this working? In particular:

- is that gzip ‘module’ activation necessary? That would seem a bit 
far-fetched, because it involves files not found in the Solr installation and 
possibly hacking the Solr startup script;
- what did you add to solr-jetty-context.xml in order to enable gzip 
compression?


I suppose that situations with high volumes of external network traffic between 
Solr and Client must be quite rare. Otherwise I’d think that a feature that 
potentially offers such obvious benefits (one of the pages above mentions a 
drop of 75% of network traffic and a 60% faster response time) would have been 
turned into an “enable http compression yes/no” setting by now :)

Anyhow, we’re stuck with it … I hope I can get it working.
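
For what it's worth, here is a sketch of what wiring in Jetty's GzipHandler might
look like (untested; the class and setters are from the Jetty 9.x javadocs, and
exactly which file under server/etc Solr picks it up from may need experimenting):

  <!-- e.g. in server/etc/jetty.xml, wrapping the server's handler chain -->
  <Configure id="Server" class="org.eclipse.jetty.server.Server">
    <Call name="insertHandler">
      <Arg>
        <New class="org.eclipse.jetty.server.handler.gzip.GzipHandler">
          <Set name="minGzipSize">2048</Set>
          <Call name="addIncludedMimeTypes">
            <Arg><Array type="String">
              <Item>application/json</Item>
              <Item>application/xml</Item>
              <Item>text/plain</Item>
            </Array></Arg>
          </Call>
        </New>
      </Arg>
    </Call>
  </Configure>

and then test from the client side, which has to send the Accept-Encoding header:

  curl --compressed -H "Accept-Encoding: gzip" \
    "http://solrhost:8983/solr/mycollection/select?q=*:*&wt=json"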


Thanks in advance for any advice!

Luthien
Api developer
Europeana.eu








Re: Spring Boot Solr+ Kerberos+ Ambari

2019-02-21 Thread Rushikesh Garadade
Hi Furkan,
I think the link you provided is for the Ranger audit settings; please correct
me if I'm wrong.

I use HDP 2.6.5, which has Solr 5.6.

Thank you,
Rushikesh Garadade


On Thu, Feb 21, 2019, 2:57 PM Furkan KAMACI  wrote:

> Hi Rushikesh,
>
> Did you check here:
>
> https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_security/content/solr_ranger_configure_solrcloud_kerberos.html
>
> By the way, which versions do you use?
>
> Kind Regards,
> Furkan KAMACI
>
> On Thu, Feb 21, 2019 at 11:41 AM Rushikesh Garadade <
> rushikeshgarad...@gmail.com> wrote:
>
> > Hi All,
> >
> > I am trying to set Kerberos for Solr which is installed on Hortonworks
> > Ambari.
> >
> > Q1. Is Ranger a mandatory component for Solr Kerberos configuration on
> > ambari.?
> >
> > I am getting little confused with documents available on internet for
> this.
> > I tried to do without ranger but not getting any success.
> >
> > If is there any good document for the same, please let me know.
> >
> > Thanks,
> > Rushikesh Garadade.
> >
>


TLOG replica, updateHandler errors in metrics, no logs

2019-02-21 Thread Markus Jelsma
Hello,

We are moving some replica's to TLOG, one collection runs 7.5, the others 7.7. 
When indexing, we see UPDATE.updateHandler.errors increment for each document 
being indexed, there is nothing in the logs.

Is this a known issue? 

Thanks,
Markus


Re: [lucene > nori ] special characters issue

2019-02-21 Thread Furkan KAMACI
Hi,

Could you give some more information about your configuration? Also, check
here for how to debug the reason:
https://lucene.apache.org/solr/guide/7_6/analysis-screen.html

Kind Regards,
Furkan KAMACI

On Tue, Feb 12, 2019 at 11:34 AM 유정인  wrote:

>
> Hi I'm using the "nori" analyzer.
>
> I'm not sure whether this is an error or intentional behaviour.
>
> All special characters are filtered.
>
> Special characters stored in the dictionary are also filtered.
>
> How do I print special characters?
>
>


Re: Spring Boot Solr+ Kerberos+ Ambari

2019-02-21 Thread Furkan KAMACI
Hi Rushikesh,

Did you check here:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_security/content/solr_ranger_configure_solrcloud_kerberos.html

By the way, which versions do you use?

Kind Regards,
Furkan KAMACI

On Thu, Feb 21, 2019 at 11:41 AM Rushikesh Garadade <
rushikeshgarad...@gmail.com> wrote:

> Hi All,
>
> I am trying to set Kerberos for Solr which is installed on Hortonworks
> Ambari.
>
> Q1. Is Ranger a mandatory component for Solr Kerberos configuration on
> ambari.?
>
> I am getting little confused with documents available on internet for this.
> I tried to do without ranger but not getting any success.
>
> If is there any good document for the same, please let me know.
>
> Thanks,
> Rushikesh Garadade.
>


Spring Boot Solr+ Kerberos+ Ambari

2019-02-21 Thread Rushikesh Garadade
Hi All,

I am trying to set Kerberos for Solr which is installed on Hortonworks
Ambari.

Q1. Is Ranger a mandatory component for Solr Kerberos configuration on
Ambari?

I am getting a little confused by the documents available on the internet for this.
I tried to do it without Ranger but have not had any success.

If is there any good document for the same, please let me know.

Thanks,
Rushikesh Garadade.


Re: Upload Synonym to Solr Cloud

2019-02-21 Thread Jan Høydahl
Or use managed resource 
https://lucene.apache.org/solr/guide/6_6/managed-resources.html#ManagedResources-Synonyms
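
A sketch of both routes (the collection, configset and file paths are only
examples): the Managed Resources way is a REST call, provided the field type
uses the managed synonym filter, while Erick's "bin/solr zk cp" uploads the
file directly into ZooKeeper:

  # Managed Resources API (field type must use the managed synonym filter)
  curl -X PUT -H "Content-type: application/json" \
    "http://localhost:8983/solr/mycollection/schema/analysis/synonyms/english" \
    -d '{"mad":["angry","upset"]}'

  # or copy the file into the collection's configset in ZooKeeper
  # (reload the collection afterwards so the change is picked up)
  bin/solr zk cp file:/local/path/synonyms.txt zk:/configs/myconfig/synonyms.txt -z localhost:2181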

Jan Høydahl

> 21. feb. 2019 kl. 01:50 skrev Erick Erickson :
> 
> bin/solr zk -help
> particularly
> bin/solr zk cp
> 
>> On Feb 20, 2019, at 4:00 PM, Rathor, Piyush (US - Philadelphia) 
>>  wrote:
>> 
>> I am new to solr.
>> Need command to upload synonym.txt to solr cloud.
>> 
>> Thanks & Regards
>> 
>> 
>