Re: Can solr index replacement character

2020-12-01 Thread Erick Erickson
Solr handles UTF-8, so it should be able to. The problem you'll have is
getting the UTF-8 character through all the various transport
encodings, i.e. if you try to search from a browser, you need to encode
it so the browser passes it through. If you search through SolrJ, it needs
to be encoded at that level. If you use cURL, it needs yet another encoding….
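For example, going through SolrJ the client takes care of the wire encoding for
you, as long as the character is correct in the Java string itself. A minimal
sketch (the collection name "mycollection" and field name "content" are made up
for illustration):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FindReplacementChar {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
      // U+FFFD written as a Java escape so the source-file encoding doesn't matter
      SolrQuery q = new SolrQuery("content:\"\uFFFD\"");
      QueryResponse rsp = client.query(q);
      System.out.println("hits: " + rsp.getResults().getNumFound());
    }
  }
}

From a browser or cURL, by contrast, you'd have to get the percent-encoded UTF-8
bytes of U+FFFD (%EF%BF%BD) into the q parameter yourself. Whether the query then
matches also depends on the field's analysis: a plain string field keeps the
character, while some tokenizer or filter chains may strip it.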

> On Dec 1, 2020, at 12:30 AM, Eran Buchnick  wrote:
> 
> Hi community,
> During integration tests with new data source I have noticed weird scenario
> where replacement character can't be searched, though, seems to be stored.
> I mean, honestly, I don't want that irrelevant data stored in my index but
> I wondered if solr can index replacement character (U+FFFD �) as string, if
> so, how to search it?
> And in general, is there any built-in char filtration?!
> 
> Thanks



Can solr index replacement character

2020-11-30 Thread Eran Buchnick
Hi community,
During integration tests with a new data source I have noticed a weird scenario
where the replacement character can't be searched, though it seems to be stored.
I mean, honestly, I don't want that irrelevant data stored in my index, but
I wondered if Solr can index the replacement character (U+FFFD �) as a string, and if
so, how to search for it?
And in general, is there any built-in char filtration?!

Thanks


Re: How to forcefully open new searcher, in case when there is no change in Solr index

2020-08-10 Thread Erick Erickson
Are you also posting the same question as Akshay Murarka?
If so, please do not do this; use one e-mail address.


Would in-place updates serve your use case better? See: 
https://lucene.apache.org/solr/guide/8_1/updating-parts-of-documents.html
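For example (a sketch only; the collection, zkHost, and the "popularity" field are
illustrative, and for a true in-place update the field must be single-valued,
numeric, non-indexed, non-stored, and docValues=true):

import java.util.Collections;
import java.util.Optional;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class InPlaceUpdateSketch {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient.Builder(
        Collections.singletonList("localhost:9983"), Optional.empty()).build()) {
      client.setDefaultCollection("mycollection");

      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-1");  // uniqueKey of the existing document
      // Atomic "set" on a docValues-only numeric field is executed in place
      doc.addField("popularity", Collections.singletonMap("set", 42L));

      client.add(doc);
      client.commit();  // the commit makes the new value visible to searchers
    }
  }
}

That way the per-document value can change without re-sending the whole document.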

> On Aug 10, 2020, at 8:17 AM, raj.yadav  wrote:
> 
> I have a use case where none of the document in my solr index is changing but
> I still want to open a new searcher through the curl api. 
> 
> On executing the below curl command 
> curl
> "XXX.XX.XX.XXX:9744/solr/mycollection/update?openSearcher=true&commit=true"
> it doesn't open a new searcher. 
> 
> Below is what I get in logs
> 2020-08-10 09:32:22.696 INFO  (qtp297786644-6824) [c:mycollection
> s:shard1_1_0 r:core_node6 x:mycollection_shard1_1_0_replica1]
> o.a.s.u.DirectUpdateHandler2 start
> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> 2020-08-10 09:32:22.696 INFO  (qtp297786644-6819) [c:mycollection
> s:shard1_0_1 r:core_node5 x:mycollection_shard1_0_1_replica1]
> o.a.s.u.DirectUpdateHandler2 start
> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> 2020-08-10 09:32:22.696 INFO  (qtp297786644-6829) [c:mycollection
> s:shard1_0_0 r:core_node4 x:mycollection_shard1_0_0_replica1]
> o.a.s.u.DirectUpdateHandler2 start
> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> 2020-08-10 09:32:22.696 INFO  (qtp297786644-6824) [c:mycollection
> s:shard1_1_0 r:core_node6 x:mycollection_shard1_1_0_replica1]
> o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
> 2020-08-10 09:32:22.696 INFO  (qtp297786644-6819) [c:mycollection
> s:shard1_0_1 r:core_node5 x:mycollection_shard1_0_1_replica1]
> o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
> 2020-08-10 09:32:22.696 INFO  (qtp297786644-6766) [c:mycollection
> s:shard1_1_1 r:core_node7 x:mycollection_shard1_1_1_replica1]
> o.a.s.u.DirectUpdateHandler2 start
> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> 2020-08-10 09:32:22.696 INFO  (qtp297786644-6829) [c:mycollection
> s:shard1_0_0 r:core_node4 x:mycollection_shard1_0_0_replica1]
> o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
> 2020-08-10 09:32:22.696 INFO  (qtp297786644-6766) [c:mycollection
> s:shard1_1_1 r:core_node7 x:mycollection_shard1_1_1_replica1]
> o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
> 2020-08-10 09:32:22.697 INFO  (qtp297786644-6824) [c:mycollection
> s:shard1_1_0 r:core_node6 x:mycollection_shard1_1_0_replica1]
> o.a.s.c.SolrCore SolrIndexSearcher has not changed - not re-opening:
> org.apache.solr.search.SolrIndexSearcher
> 2020-08-10 09:32:22.697 INFO  (qtp297786644-6819) [c:mycollection
> s:shard1_0_1 r:core_node5 x:mycollection_shard1_0_1_replica1]
> o.a.s.c.SolrCore SolrIndexSearcher has not changed - not re-opening:
> org.apache.solr.search.SolrIndexSearcher
> 2020-08-10 09:32:22.697 INFO  (qtp297786644-6829) [c:mycollection
> s:shard1_0_0 r:core_node4 x:mycollection_shard1_0_0_replica1]
> o.a.s.c.SolrCore SolrIndexSearcher has not changed - not re-opening:
> org.apache.solr.search.SolrIndexSearcher
> 2020-08-10 09:32:22.697 INFO  (qtp297786644-6824) [c:mycollection
> s:shard1_1_0 r:core_node6 x:mycollection_shard1_1_0_replica1]
> o.a.s.u.DirectUpdateHandler2 end_commit_flush
> 2020-08-10 09:32:22.697 INFO  (qtp297786644-6819) [c:mycollection
> s:shard1_0_1 r:core_node5 x:mycollection_shard1_0_1_replica1]
> o.a.s.u.DirectUpdateHandler2 end_commit_flush
> 2020-08-10 09:32:22.697 INFO  (qtp297786644-6829) [c:mycollection
> s:shard1_0_0 r:core_node4 x:mycollection_shard1_0_0_replica1]
> o.a.s.u.DirectUpdateHandler2 end_commit_flush
> 
> I don't want to do a complete reload of my collection.
> Is there any parameter that can be used to forcefully open a new searcher
> every time I do a commit with openSearcher=true. 
> 
> In our collection there are few ExternalFileField type and changes in the
> external file is not getting reflected on issuing commits (using the curl
> command mentioned above).
> 
> Thanks in advance for the help
> 
> 
> 
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html



How to forcefully open new searcher, in case when there is no change in Solr index

2020-08-10 Thread raj.yadav
I have a use case where none of the documents in my Solr index are changing, but
I still want to open a new searcher through the curl API. 

On executing the below curl command 
curl
"XXX.XX.XX.XXX:9744/solr/mycollection/update?openSearcher=true&commit=true"
it doesn't open a new searcher. 

Below is what I get in logs
2020-08-10 09:32:22.696 INFO  (qtp297786644-6824) [c:mycollection
s:shard1_1_0 r:core_node6 x:mycollection_shard1_1_0_replica1]
o.a.s.u.DirectUpdateHandler2 start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
2020-08-10 09:32:22.696 INFO  (qtp297786644-6819) [c:mycollection
s:shard1_0_1 r:core_node5 x:mycollection_shard1_0_1_replica1]
o.a.s.u.DirectUpdateHandler2 start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
2020-08-10 09:32:22.696 INFO  (qtp297786644-6829) [c:mycollection
s:shard1_0_0 r:core_node4 x:mycollection_shard1_0_0_replica1]
o.a.s.u.DirectUpdateHandler2 start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
2020-08-10 09:32:22.696 INFO  (qtp297786644-6824) [c:mycollection
s:shard1_1_0 r:core_node6 x:mycollection_shard1_1_0_replica1]
o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
2020-08-10 09:32:22.696 INFO  (qtp297786644-6819) [c:mycollection
s:shard1_0_1 r:core_node5 x:mycollection_shard1_0_1_replica1]
o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
2020-08-10 09:32:22.696 INFO  (qtp297786644-6766) [c:mycollection
s:shard1_1_1 r:core_node7 x:mycollection_shard1_1_1_replica1]
o.a.s.u.DirectUpdateHandler2 start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
2020-08-10 09:32:22.696 INFO  (qtp297786644-6829) [c:mycollection
s:shard1_0_0 r:core_node4 x:mycollection_shard1_0_0_replica1]
o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
2020-08-10 09:32:22.696 INFO  (qtp297786644-6766) [c:mycollection
s:shard1_1_1 r:core_node7 x:mycollection_shard1_1_1_replica1]
o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
2020-08-10 09:32:22.697 INFO  (qtp297786644-6824) [c:mycollection
s:shard1_1_0 r:core_node6 x:mycollection_shard1_1_0_replica1]
o.a.s.c.SolrCore SolrIndexSearcher has not changed - not re-opening:
org.apache.solr.search.SolrIndexSearcher
2020-08-10 09:32:22.697 INFO  (qtp297786644-6819) [c:mycollection
s:shard1_0_1 r:core_node5 x:mycollection_shard1_0_1_replica1]
o.a.s.c.SolrCore SolrIndexSearcher has not changed - not re-opening:
org.apache.solr.search.SolrIndexSearcher
2020-08-10 09:32:22.697 INFO  (qtp297786644-6829) [c:mycollection
s:shard1_0_0 r:core_node4 x:mycollection_shard1_0_0_replica1]
o.a.s.c.SolrCore SolrIndexSearcher has not changed - not re-opening:
org.apache.solr.search.SolrIndexSearcher
2020-08-10 09:32:22.697 INFO  (qtp297786644-6824) [c:mycollection
s:shard1_1_0 r:core_node6 x:mycollection_shard1_1_0_replica1]
o.a.s.u.DirectUpdateHandler2 end_commit_flush
2020-08-10 09:32:22.697 INFO  (qtp297786644-6819) [c:mycollection
s:shard1_0_1 r:core_node5 x:mycollection_shard1_0_1_replica1]
o.a.s.u.DirectUpdateHandler2 end_commit_flush
2020-08-10 09:32:22.697 INFO  (qtp297786644-6829) [c:mycollection
s:shard1_0_0 r:core_node4 x:mycollection_shard1_0_0_replica1]
o.a.s.u.DirectUpdateHandler2 end_commit_flush

I don't want to do a complete reload of my collection.
Is there any parameter that can be used to forcefully open a new searcher
every time I do a commit with openSearcher=true?

In our collection there are a few ExternalFileField type fields, and changes in the
external file are not getting reflected on issuing commits (using the curl
command mentioned above).

Thanks in advance for the help



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr index size has increased in solr 7.7.2

2020-04-15 Thread David Hastings
I wouldn't worry about the index size until you get above a half terabyte or
so. Adding docValues and other features means you sacrifice things that
don't matter, like size. Memory and SSDs are cheap.

On Wed, Apr 15, 2020 at 1:21 PM Rajdeep Sahoo 
wrote:

> Hi all
> We are migrating from solr 4.6 to solr 7.7.2.
> In solr 4.6 the size was 2.5 gb but here in solr 7.7.2 the solr index size
> is showing 6.8 gb with the same no of documents. Is it expected behavior or
> any suggestions how to optimize the size.
>


Solr index size has increased in solr 7.7.2

2020-04-15 Thread Rajdeep Sahoo
Hi all
We are migrating from Solr 4.6 to Solr 7.7.2.
In Solr 4.6 the index size was 2.5 GB, but here in Solr 7.7.2 the index size
is showing 6.8 GB with the same number of documents. Is this expected behavior, or are there
any suggestions on how to optimize the size?


Re: offline Solr index creation

2020-02-13 Thread Erick Erickson
Indexing rates scale pretty linearly with the number of shards, so one
way to increase throughput is to simply create a collection with
more shards. For the initial bulk-indexing operations, you can 
go with a 1-replica-per-shard scenario then ADDREPLICA if you need
to build things out.

However… that may leave you with more shards than you really want, but
that’s usually not an impediment.

The MapReduceIndexerTool uses something called the embedded solr server,
so it’s really using Solr under the covers.

All that said, I'm not yet convinced you need to go there. How do you
know that you're really driving Solr hard? Are you pegging all the CPUs on
all your Solr nodes while indexing? Very often I see "slow indexing" turn out to be the
result of the collection process not being able to feed Solr docs fast
enough. So here are a couple of things to look at:

1> Are your CPUs on the Solr nodes running flat out? If not, you need to
work on your ingestion process. Perhaps parallelize it on the client side
so you have multiple threads throwing docs at Solr (see the sketch after <2> below). 

2>  Comment out the bit in your SolrJ program where you call
CloudSolrClient.add(doclist). If that doesn’t change the rate you can
process your docs, then you’re spending all your time on the client
side.
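To illustrate 1>, here's a rough sketch of what I mean by parallelizing the client
side (thread count, batch size, and all the names are made up, not recommendations):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ParallelIndexer {
  public static void main(String[] args) throws Exception {
    CloudSolrClient client = new CloudSolrClient.Builder(
        Collections.singletonList("zk1:2181"), Optional.empty()).build();
    client.setDefaultCollection("mycollection");

    int threads = 8;
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      final int threadId = t;
      pool.submit(() -> {
        List<SolrInputDocument> batch = new ArrayList<>();
        // In real life each thread reads its own slice of the input files.
        for (int i = 0; i < 100_000; i++) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", threadId + "-" + i);
          doc.addField("body_txt", "one parsed json record");
          batch.add(doc);
          if (batch.size() == 1_000) {   // send batches, never one doc at a time
            client.add(batch);
            batch.clear();
          }
        }
        if (!batch.isEmpty()) client.add(batch);
        return null;                      // note: no commit per batch
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
    client.commit();   // one commit at the end, or just let autoCommit handle it
    client.close();
  }
}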

Also, a sanity check: you're not committing after every batch or anything
else like that, right? Speaking of autocommit, I'd set autoCommit in my solrconfig
to fire every, say, 60 seconds with openSearcher=true and leave
it at that until it's proven you need something different.

You also haven’t told us about your topology. How many shards? How many
machines? I pretty much guarantee you won’t be able to fit all that data on a
single shard...

Best,
Erick

> On Feb 13, 2020, at 8:17 PM, vivek chaurasiya  wrote:
> 
> Hi there,
> 
> We are using AWS EMR as our big data processing cluster. We have like 3TB
> of text files where each line denotes a json record which I want to be
> indexed into Solr.
> 
> I have tried this by batching them and pushing to Solr index using
> SolrJClient. But I feel thats really slow.
> 
> My doubt is 2 fold:
> 
> 1. Is there a ready-to-use tool which can be used to create a Solr index
> offline and store in say S3 or somewhere.
> 2. That offline solr index file if possible in (1), how can i push it to a
> live Solr cluster?
> 
> 
> I found this tool:
> https://docs.cloudera.com/documentation/enterprise/latest/topics/search_mapreduceindexertool.html
> 
> but its really cumbersome to use and looks like at the time of creating
> offline index you need to put in shard/schema information.
> 
> Some suggestions would be greatly appreciated.
> 
> -Vivek



offline Solr index creation

2020-02-13 Thread vivek chaurasiya
Hi there,

We are using AWS EMR as our big data processing cluster. We have about 3TB
of text files where each line denotes a JSON record which I want to be
indexed into Solr.

I have tried this by batching them and pushing them to the Solr index using
the SolrJ client. But I feel that's really slow.

My doubt is two-fold:

1. Is there a ready-to-use tool which can be used to create a Solr index
offline and store it in, say, S3 or somewhere?
2. If the offline Solr index file from (1) is possible, how can I push it to a
live Solr cluster?


I found this tool:
https://docs.cloudera.com/documentation/enterprise/latest/topics/search_mapreduceindexertool.html

but it's really cumbersome to use, and it looks like at the time of creating the
offline index you need to put in shard/schema information.

Some suggestions would be greatly appreciated.

-Vivek


Re: solr index data from hdfs with error

2019-12-20 Thread Erick Erickson
Morphlines support was removed from Solr in Solr 6.6, see: 
https://issues.apache.org/jira/browse/SOLR-9221

So I don't think anyone here will be very conversant with the details. I vaguely 
recall that this process added an ID field by default, but it's been a very 
long time since I looked. Do check whether you have UUIDUpdateProcessorFactory in 
your solrconfig.xml file; it automatically adds a field to a document if the document 
doesn't have one, and it usually defaults to "id".

Sorry I can’t be more help,
Erick

> On Dec 20, 2019, at 10:17 AM, bennis  wrote:
> 
> Hello
> I am new in using Solr and I need your help.
> I have data on HDFS that I  need to index with Solr.
> 
> I) My data looks like that, it is saved on hdfs  :
> ID_METIER_PCS_ESE,CD_PCS_ESE_1,LB_PCS_ESE_1,CD_PCS_ESE_2,LB_PCS_ESE_2,CD_PCS_ESE_3,LB_PCS_ESE_3,DT_DEB,DT_FIN,TS_TEC_INSERT,TS_TEC_UPDATE
> 37,3,Cadres et professions intellectuelles supérieures,35,Professions de
> l'information, des arts et des spectacles,353a,Directeurs de journaux,
> administrateurs de presse, directeurs d'éditions (littéraire, musicale,
> audiovisuelle et multimédia),01/01/70,31/12/99,08/01/19 18:13:42,274272000,
> 
> it is located here :
> ${GEOBI_NAMENODE}/user/bdatadev2/work/tmp/tmp_TD_METIER_PCS_ESE
> 
> II) I made solr-morphline.conf :
> 
> *
> SOLR_LOCATOR : {
>  # Name of solr collection
>  collection : oracle_table_test_DEV2 
> 
>  # ZooKeeper ensemble
>  zkHost : "eufrtopbdt003.randstaddta.gis:2182/solr"
> }
> 
> morphlines : [
>  {
>id : morphline1
>importCommands : ["org.kitesdk.**"]
> 
>commands : [
>  {
>readCSV {
>  separator : ","
>  # This columns should map the one configured in SolR and are
> expected in this position inside CSV
>  columns :
> [ID_METIER_PCS_ESE,CD_PCS_ESE_1,LB_PCS_ESE_1,CD_PCS_ESE_2,LB_PCS_ESE_2,CD_PCS_ESE_3,LB_PCS_ESE_3,DT_DEB,DT_FIN,TS_TEC_INSERT,TS_TEC_UPDATE]
>  ignoreFirstLine : true
>  commentPrefix : ""
>  trim : true
>  charset : UTF-8
>}
>  }
> 
>  {
>sanitizeUnknownSolrFields {
>  # Location from which to fetch Solr schema
>  solrLocator : ${SOLR_LOCATOR}
>}
>  }
> 
>  # log the record at DEBUG level to SLF4J
>  { logDebug { format : "output record: {}", args : ["@{}"] } }
> 
>  # load the record into a Solr server or MapReduce Reducer
>  {
>loadSolr {
>  solrLocator : ${SOLR_LOCATOR}
>}
>  }
> 
>]
>  }
> ]
> 
> *
> 
> 
> III) and finally my schema.xml is the following, I modified only the part to
> define FIELDS :
> *
> [The remainder of the quoted message contained the schema.xml field and fieldType
> definitions, but the XML markup was stripped by the mailing-list archive and is not
> recoverable; only the uniqueKey, ID_METIER_PCS_ESE, survives.]

solr index data from hdfs with error

2019-12-20 Thread bennis
Hello
I am new to using Solr and I need your help.
I have data on HDFS that I need to index with Solr.

I) My data looks like this; it is saved on HDFS:
ID_METIER_PCS_ESE,CD_PCS_ESE_1,LB_PCS_ESE_1,CD_PCS_ESE_2,LB_PCS_ESE_2,CD_PCS_ESE_3,LB_PCS_ESE_3,DT_DEB,DT_FIN,TS_TEC_INSERT,TS_TEC_UPDATE
37,3,Cadres et professions intellectuelles supérieures,35,Professions de
l'information, des arts et des spectacles,353a,Directeurs de journaux,
administrateurs de presse, directeurs d'éditions (littéraire, musicale,
audiovisuelle et multimédia),01/01/70,31/12/99,08/01/19 18:13:42,274272000,

it is located here :
${GEOBI_NAMENODE}/user/bdatadev2/work/tmp/tmp_TD_METIER_PCS_ESE

II) I made solr-morphline.conf :

*
SOLR_LOCATOR : {
  # Name of solr collection
  collection : oracle_table_test_DEV2 

  # ZooKeeper ensemble
  zkHost : "eufrtopbdt003.randstaddta.gis:2182/solr"
}

morphlines : [
  {
id : morphline1
importCommands : ["org.kitesdk.**"]

commands : [
  {
readCSV {
  separator : ","
  # This columns should map the one configured in SolR and are
expected in this position inside CSV
  columns :
[ID_METIER_PCS_ESE,CD_PCS_ESE_1,LB_PCS_ESE_1,CD_PCS_ESE_2,LB_PCS_ESE_2,CD_PCS_ESE_3,LB_PCS_ESE_3,DT_DEB,DT_FIN,TS_TEC_INSERT,TS_TEC_UPDATE]
  ignoreFirstLine : true
  commentPrefix : ""
  trim : true
  charset : UTF-8
}
  }

  {
sanitizeUnknownSolrFields {
  # Location from which to fetch Solr schema
  solrLocator : ${SOLR_LOCATOR}
}
  }

  # log the record at DEBUG level to SLF4J
  { logDebug { format : "output record: {}", args : ["@{}"] } }

  # load the record into a Solr server or MapReduce Reducer
  {
loadSolr {
  solrLocator : ${SOLR_LOCATOR}
}
  }

]
  }
]

*


III) and finally my schema.xml is the following, I modified only the part to
define FIELDS :
*

[The schema.xml field and fieldType definitions that followed were stripped of their
XML markup by the mailing-list archive and are not recoverable; only the uniqueKey,
ID_METIER_PCS_ESE, survives.]


Re: About Snapshot API and Backup for Solr Index

2019-11-24 Thread Paras Lehana
Hey Kaya,

Are you not able to restore with the same restore backup command?

http://localhost:8983/solr/gettingstarted/replication?command=restore&name=backup_name


Replace backup_name with the snapshot name.
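
(For reference, and assuming the stock ReplicationHandler setup, the backup itself is
taken with the matching command:
http://localhost:8983/solr/gettingstarted/replication?command=backup&name=backup_name )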

On Thu, 21 Nov 2019 at 16:23, Kayak28  wrote:

> I was not clear in the last email.
> I mean "For me, it is impossible to "backup" or "restore" Solr's index by
> taking a snapshot."
>
> If I make you confuse, I am sorry about that.
>
> Sincerely,
> Kaya Ota
>
> 2019年11月21日(木) 19:50 Kayak28 :
>
> > Hello, Community Members:
> >
> > I am using Solr 7.7.4
> > I have a question about a Snapshot API.
> >
> >
> https://lucene.apache.org/solr/guide/7_4/making-and-restoring-backups.html#create-snapshot-api
> >
> > I have tested basic of snapshot APIs, create snapshot, list snapshot,
> > delete snapshot.
> >
> > As far as I know, when I do:
> > - create a snapshot: create a binary file (snapshot_N where n is
> identical
> > to segment_N) that contains a path of the index.
> > - the file is created under data/snapshot_metadata directory.
> >
> > - list snapshot: return JSON, containing all snapshot data which show
> > segment generation and path to the index.
> > - delete snapshot: delete a snapshot data from snapshot_N.
> >
> > For me, it is impossible to "backup" or "restore" Solr's index.
> >
> > So, my questions are:
> >
> > - How snapshot APIs are related to "backup" or "restore"?
> > - or more basically, when should I use snapshot API?
> > - Is there any way to make a "backup" without consuming a double-size of
> > the index? (I am asking because if I use backup API, it will copy the
> > entire index)
> > - what is the cheapest way to make a backup for Solr?
> >
> > If you help me out one of these questions or give me any clue,
> > I will really appreciate.
> >
> > Sincerely,
> > Kaya Ota
> >
> >
> >
> >
> >
> >
> >
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: About Snapshot API and Backup for Solr Index

2019-11-21 Thread Kayak28
I was not clear in the last email.
I mean "For me, it is impossible to "backup" or "restore" Solr's index by
taking a snapshot."

If I made you confused, I am sorry about that.

Sincerely,
Kaya Ota

2019年11月21日(木) 19:50 Kayak28 :

> Hello, Community Members:
>
> I am using Solr 7.7.4
> I have a question about a Snapshot API.
>
> https://lucene.apache.org/solr/guide/7_4/making-and-restoring-backups.html#create-snapshot-api
>
> I have tested basic of snapshot APIs, create snapshot, list snapshot,
> delete snapshot.
>
> As far as I know, when I do:
> - create a snapshot: create a binary file (snapshot_N where n is identical
> to segment_N) that contains a path of the index.
> - the file is created under data/snapshot_metadata directory.
>
> - list snapshot: return JSON, containing all snapshot data which show
> segment generation and path to the index.
> - delete snapshot: delete a snapshot data from snapshot_N.
>
> For me, it is impossible to "backup" or "restore" Solr's index.
>
> So, my questions are:
>
> - How snapshot APIs are related to "backup" or "restore"?
> - or more basically, when should I use snapshot API?
> - Is there any way to make a "backup" without consuming a double-size of
> the index? (I am asking because if I use backup API, it will copy the
> entire index)
> - what is the cheapest way to make a backup for Solr?
>
> If you help me out one of these questions or give me any clue,
> I will really appreciate.
>
> Sincerely,
> Kaya Ota
>
>
>
>
>
>
>


About Snapshot API and Backup for Solr Index

2019-11-21 Thread Kayak28
Hello, Community Members:

I am using Solr 7.7.4
I have a question about a Snapshot API.
https://lucene.apache.org/solr/guide/7_4/making-and-restoring-backups.html#create-snapshot-api

I have tested the basics of the snapshot APIs: create snapshot, list snapshot,
delete snapshot.

As far as I know, when I do:
- create a snapshot: it creates a binary file (snapshot_N, where N is identical
to segment_N) that contains a path to the index.
- the file is created under the data/snapshot_metadata directory.

- list snapshot: it returns JSON containing all snapshot data, which shows the
segment generation and path to the index.
- delete snapshot: it deletes the snapshot data for snapshot_N.

For me, it is impossible to "backup" or "restore" Solr's index.

So, my questions are:

- How are the snapshot APIs related to "backup" or "restore"?
- Or, more basically, when should I use the snapshot API?
- Is there any way to make a "backup" without consuming double the size of
the index? (I am asking because if I use the backup API, it will copy the
entire index.)
- What is the cheapest way to make a backup for Solr?

If you can help me out with any of these questions or give me any clue,
I will really appreciate it.

Sincerely,
Kaya Ota


Re: Delete documents from the Solr index using SolrJ

2019-11-05 Thread Erick Erickson
OK, you have two options:

1.1> do NOT construct IDs with the version. Have two separate fields: id (which 
is the <uniqueKey> in your schema) and a _separate_ field called tracking (note: 
there's already a _version_ field, with underscores, by default for optimistic 
locking; do not use that).

1.2> Index the new version of the doc with the exact same ID and a new version 
and a new “tracking” value

Solr will replace the old version with the new version based on the ID.

Second:
Before you re-add the doc, issue a delete-by-query that identifies the 
document, something like q=id:123*
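
A rough SolrJ sketch of both routes (the client, collection, field names, and values
are illustrative only):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ReindexDoc {
  // Route 1: keep the same id; re-adding the doc simply replaces the old version.
  static void overwrite(SolrClient client, String collection) throws Exception {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "123");           // same <uniqueKey> as before
    doc.addField("tracking", "123.2");   // separate, illustrative "tracking" field
    doc.addField("content", "new text of the document");
    client.add(collection, doc);
  }

  // Route 2: if the ids really must differ, delete the old versions first.
  static void deleteOldThenAdd(SolrClient client, String collection) throws Exception {
    client.deleteByQuery(collection, "id:123*");   // matches 123, 123.1, ...
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "123.2");
    doc.addField("content", "new text of the document");
    client.add(collection, doc);
  }
}

Either way the change only becomes visible after the next (soft) commit.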

_How_ you determine that there is a new version of the doc you need to index is 
outside of Solr, you have to do that yourself.

Best,
Erick

> On Nov 5, 2019, at 3:56 AM, Khare, Kushal (MIND) 
>  wrote:
> 
> Well, I cannot still completely relate to the solutions by you guys, am 
> looking into it as how could I achieve that with my application. Thanks !
> One thing, that I want to know is how to avoid full re-indexing, that is, 
> what I need is I don’t want that Solr index all the data every time some docs 
> are added, instead I want it to update it, that is index only newly added 
> docs. I hope this is possible, but how ?
> Because, currently I am using SolrJ  and it re-index complete data each time.
> 
> -Original Message-
> From: Peter Lancaster [mailto:peter.lancas...@findmypast.com]
> Sent: 04 November 2019 21:35
> To: solr-user@lucene.apache.org
> Subject: RE: Delete documents from the Solr index using SolrJ
> 
> You can delete documents in SolrJ by using deleteByQuery. Using this you can 
> delete any number of documents from your index or all your documents 
> depending on the query you specify as the parameter. How you use it is down 
> to your application.
> 
> You haven't said if your application performs a full re-index, but if so you 
> might find it useful to index a version number for your data which you 
> increment each time you perform the full indexing. Then you can increment 
> version, re-index data, delete data for old version number.
> 
> 
> -Original Message-
> From: Khare, Kushal (MIND) [mailto:kushal.kh...@mind-infotech.com]
> Sent: 04 November 2019 15:03
> To: solr-user@lucene.apache.org
> Subject: [EXTERNAL] RE: Delete documents from the Solr index using SolrJ
> 
> Thanks!
> Actually am working on a Java web application using SolrJ for Solr search.
> The users would actually be uploading/editing/deleting the docs. What have 
> done is defined a location/directory where the docs would be stored and 
> passed that location for indexing.
> So, I am quite confused how to carry on with the solution that you proposed. 
> Please guide !
> 
> -Original Message-
> From: David Hastings [mailto:hastings.recurs...@gmail.com]
> Sent: 04 November 2019 20:10
> To: solr-user@lucene.apache.org
> Subject: Re: Delete documents from the Solr index using SolrJ
> 
> delete them by query would do the trick unless im missing something 
> significant in what youre trying to do here. you can just pass in an xml
> command:
> '<delete><query>".$kill_query."</query></delete>'
> 
> On Mon, Nov 4, 2019 at 9:37 AM Khare, Kushal (MIND) < 
> kushal.kh...@mind-infotech.com> wrote:
> 
>> In my case, id won't be same.
>> Suppose, I have a doc with id : 20
>> Now, it's newer version would be either 20.1 or 22 What in this case?
>> -Original Message-
>> From: David Hastings [mailto:hastings.recurs...@gmail.com]
>> Sent: 04 November 2019 20:04
>> To: solr-user@lucene.apache.org
>> Subject: Re: Delete documents from the Solr index using SolrJ
>> 
>> when you add a new document using the same "id" value as another it
>> just over writes it
>> 
>> On Mon, Nov 4, 2019 at 9:30 AM Khare, Kushal (MIND) <
>> kushal.kh...@mind-infotech.com> wrote:
>> 
>>> Could you please let me know how to achieve that ?
>>> 
>>> 
>>> -Original Message-
>>> From: Jörn Franke [mailto:jornfra...@gmail.com]
>>> Sent: 04 November 2019 19:59
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Delete documents from the Solr index using SolrJ
>>> 
>>> I don’t understand why it is not possible.
>>> 
>>> However why don’t you simply overwrite the existing document instead
>>> of
>>> add+delete
>>> 
>>>> Am 04.11.2019 um 15:12 schrieb Khare, Kushal (MIND) <
>>> kushal.kh...@mind-infotech.com>:
>>>> 
>>>> Hello mates!
>>>> I want to know how we can delete the documents from the Solr index .
>>> Suppose for my system, I have a document that has bee

RE: Delete documents from the Solr index using SolrJ

2019-11-05 Thread Khare, Kushal (MIND)
Well, I still cannot completely relate to the solutions from you guys; I am looking 
into how I could achieve that with my application. Thanks!
One thing that I want to know is how to avoid full re-indexing. That is, I don't 
want Solr to index all the data every time some docs are added; instead I want it 
to update the index, that is, index only the newly added docs. I hope this is 
possible, but how?
Because currently I am using SolrJ and it re-indexes the complete data each time.

-Original Message-
From: Peter Lancaster [mailto:peter.lancas...@findmypast.com]
Sent: 04 November 2019 21:35
To: solr-user@lucene.apache.org
Subject: RE: Delete documents from the Solr index using SolrJ

You can delete documents in SolrJ by using deleteByQuery. Using this you can 
delete any number of documents from your index or all your documents depending 
on the query you specify as the parameter. How you use it is down to your 
application.

You haven't said if your application performs a full re-index, but if so you 
might find it useful to index a version number for your data which you 
increment each time you perform the full indexing. Then you can increment 
version, re-index data, delete data for old version number.


-Original Message-
From: Khare, Kushal (MIND) [mailto:kushal.kh...@mind-infotech.com]
Sent: 04 November 2019 15:03
To: solr-user@lucene.apache.org
Subject: [EXTERNAL] RE: Delete documents from the Solr index using SolrJ

Thanks!
Actually am working on a Java web application using SolrJ for Solr search.
The users would actually be uploading/editing/deleting the docs. What have done 
is defined a location/directory where the docs would be stored and passed that 
location for indexing.
So, I am quite confused how to carry on with the solution that you proposed. 
Please guide !

-Original Message-
From: David Hastings [mailto:hastings.recurs...@gmail.com]
Sent: 04 November 2019 20:10
To: solr-user@lucene.apache.org
Subject: Re: Delete documents from the Solr index using SolrJ

delete them by query would do the trick unless im missing something significant 
in what youre trying to do here. you can just pass in an xml
command:
'<delete><query>".$kill_query."</query></delete>'

On Mon, Nov 4, 2019 at 9:37 AM Khare, Kushal (MIND) < 
kushal.kh...@mind-infotech.com> wrote:

> In my case, id won't be same.
> Suppose, I have a doc with id : 20
> Now, it's newer version would be either 20.1 or 22 What in this case?
> -Original Message-
> From: David Hastings [mailto:hastings.recurs...@gmail.com]
> Sent: 04 November 2019 20:04
> To: solr-user@lucene.apache.org
> Subject: Re: Delete documents from the Solr index using SolrJ
>
> when you add a new document using the same "id" value as another it
> just over writes it
>
> On Mon, Nov 4, 2019 at 9:30 AM Khare, Kushal (MIND) <
> kushal.kh...@mind-infotech.com> wrote:
>
> > Could you please let me know how to achieve that ?
> >
> >
> > -Original Message-
> > From: Jörn Franke [mailto:jornfra...@gmail.com]
> > Sent: 04 November 2019 19:59
> > To: solr-user@lucene.apache.org
> > Subject: Re: Delete documents from the Solr index using SolrJ
> >
> > I don’t understand why it is not possible.
> >
> > However why don’t you simply overwrite the existing document instead
> > of
> > add+delete
> >
> > > Am 04.11.2019 um 15:12 schrieb Khare, Kushal (MIND) <
> > kushal.kh...@mind-infotech.com>:
> > >
> > > Hello mates!
> > > I want to know how we can delete the documents from the Solr index .
> > Suppose for my system, I have a document that has been indexed, now
> > its newer version is into use, so I want to use the latest one, for
> > that I want the previous one to be deleted from the index.
> > > Kindly help me a way out !
> > > I went through many articles and blogs, got the way (methods) for
> > deleting , but not actually, how to do it, because it's not possible
> > to delete every time by passing id's in around 50,000 doc system.
> > > Please suggest!

Re: Delete documents from the Solr index using SolrJ

2019-11-04 Thread Erick Erickson
What Walter said. If you require displaying the version number in the UI, put 
that in a separate field.

BTW, Delete-by-query can be expensive for various arcane reasons if you’re 
using SolrCloud.

> On Nov 4, 2019, at 11:08 AM, Walter Underwood  wrote:
> 
> If it is the same document, why are you changing the ID? Use the same ID and 
> you are done. You won’t need to delete previous versions.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Nov 4, 2019, at 8:37 AM, Khare, Kushal (MIND) 
>>  wrote:
>> 
>> In my case, id won't be same.
>> Suppose, I have a doc with id : 20
>> Now, it's newer version would be either 20.1 or 22
>> What in this case?
>> -Original Message-
>> From: David Hastings [mailto:hastings.recurs...@gmail.com]
>> Sent: 04 November 2019 20:04
>> To: solr-user@lucene.apache.org
>> Subject: Re: Delete documents from the Solr index using SolrJ
>> 
>> when you add a new document using the same "id" value as another it just 
>> over writes it
>> 
>> On Mon, Nov 4, 2019 at 9:30 AM Khare, Kushal (MIND) < 
>> kushal.kh...@mind-infotech.com> wrote:
>> 
>>> Could you please let me know how to achieve that ?
>>> 
>>> 
>>> -Original Message-
>>> From: Jörn Franke [mailto:jornfra...@gmail.com]
>>> Sent: 04 November 2019 19:59
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Delete documents from the Solr index using SolrJ
>>> 
>>> I don’t understand why it is not possible.
>>> 
>>> However why don’t you simply overwrite the existing document instead
>>> of
>>> add+delete
>>> 
>>>> Am 04.11.2019 um 15:12 schrieb Khare, Kushal (MIND) <
>>> kushal.kh...@mind-infotech.com>:
>>>> 
>>>> Hello mates!
>>>> I want to know how we can delete the documents from the Solr index .
>>> Suppose for my system, I have a document that has been indexed, now
>>> its newer version is into use, so I want to use the latest one, for
>>> that I want the previous one to be deleted from the index.
>>>> Kindly help me a way out !
>>>> I went through many articles and blogs, got the way (methods) for
>>> deleting , but not actually, how to do it, because it's not possible
>>> to delete every time by passing id's in around 50,000 doc system.
>>>> Please suggest!



Re: Delete documents from the Solr index using SolrJ

2019-11-04 Thread Walter Underwood
If it is the same document, why are you changing the ID? Use the same ID and 
you are done. You won’t need to delete previous versions.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Nov 4, 2019, at 8:37 AM, Khare, Kushal (MIND) 
>  wrote:
> 
> In my case, id won't be same.
> Suppose, I have a doc with id : 20
> Now, it's newer version would be either 20.1 or 22
> What in this case?
> -Original Message-
> From: David Hastings [mailto:hastings.recurs...@gmail.com]
> Sent: 04 November 2019 20:04
> To: solr-user@lucene.apache.org
> Subject: Re: Delete documents from the Solr index using SolrJ
> 
> when you add a new document using the same "id" value as another it just over 
> writes it
> 
> On Mon, Nov 4, 2019 at 9:30 AM Khare, Kushal (MIND) < 
> kushal.kh...@mind-infotech.com> wrote:
> 
>> Could you please let me know how to achieve that ?
>> 
>> 
>> -Original Message-
>> From: Jörn Franke [mailto:jornfra...@gmail.com]
>> Sent: 04 November 2019 19:59
>> To: solr-user@lucene.apache.org
>> Subject: Re: Delete documents from the Solr index using SolrJ
>> 
>> I don’t understand why it is not possible.
>> 
>> However why don’t you simply overwrite the existing document instead
>> of
>> add+delete
>> 
>>> Am 04.11.2019 um 15:12 schrieb Khare, Kushal (MIND) <
>> kushal.kh...@mind-infotech.com>:
>>> 
>>> Hello mates!
>>> I want to know how we can delete the documents from the Solr index .
>> Suppose for my system, I have a document that has been indexed, now
>> its newer version is into use, so I want to use the latest one, for
>> that I want the previous one to be deleted from the index.
>>> Kindly help me a way out !
>>> I went through many articles and blogs, got the way (methods) for
>> deleting , but not actually, how to do it, because it's not possible
>> to delete every time by passing id's in around 50,000 doc system.
>>> Please suggest!



RE: Delete documents from the Solr index using SolrJ

2019-11-04 Thread Peter Lancaster
You can delete documents in SolrJ by using deleteByQuery. Using this you can 
delete any number of documents from your index or all your documents depending 
on the query you specify as the parameter. How you use it is down to your 
application.

You haven't said if your application performs a full re-index, but if so you 
might find it useful to index a version number for your data which you 
increment each time you perform the full indexing. Then you can increment 
version, re-index data, delete data for old version number.
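
A minimal sketch of that pattern in SolrJ (the "index_version" field name and the
version arithmetic are purely illustrative):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.common.SolrInputDocument;

public class VersionedReindex {
  static void fullReindex(SolrClient client, String collection,
                          Iterable<SolrInputDocument> docs, long newVersion) throws Exception {
    // 1. Re-index everything, stamping each doc with the new version number.
    for (SolrInputDocument doc : docs) {
      doc.setField("index_version", newVersion);
      client.add(collection, doc);
    }
    client.commit(collection);

    // 2. Anything still carrying an older version number is stale, so delete it.
    client.deleteByQuery(collection, "index_version:[* TO " + (newVersion - 1) + "]");
    client.commit(collection);
  }
}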


-Original Message-
From: Khare, Kushal (MIND) [mailto:kushal.kh...@mind-infotech.com]
Sent: 04 November 2019 15:03
To: solr-user@lucene.apache.org
Subject: [EXTERNAL] RE: Delete documents from the Solr index using SolrJ

Thanks!
Actually am working on a Java web application using SolrJ for Solr search.
The users would actually be uploading/editing/deleting the docs. What have done 
is defined a location/directory where the docs would be stored and passed that 
location for indexing.
So, I am quite confused how to carry on with the solution that you proposed. 
Please guide !

-Original Message-
From: David Hastings [mailto:hastings.recurs...@gmail.com]
Sent: 04 November 2019 20:10
To: solr-user@lucene.apache.org
Subject: Re: Delete documents from the Solr index using SolrJ

delete them by query would do the trick unless im missing something significant 
in what youre trying to do here. you can just pass in an xml
command:
'<delete><query>".$kill_query."</query></delete>'

On Mon, Nov 4, 2019 at 9:37 AM Khare, Kushal (MIND) < 
kushal.kh...@mind-infotech.com> wrote:

> In my case, id won't be same.
> Suppose, I have a doc with id : 20
> Now, it's newer version would be either 20.1 or 22 What in this case?
> -Original Message-
> From: David Hastings [mailto:hastings.recurs...@gmail.com]
> Sent: 04 November 2019 20:04
> To: solr-user@lucene.apache.org
> Subject: Re: Delete documents from the Solr index using SolrJ
>
> when you add a new document using the same "id" value as another it
> just over writes it
>
> On Mon, Nov 4, 2019 at 9:30 AM Khare, Kushal (MIND) <
> kushal.kh...@mind-infotech.com> wrote:
>
> > Could you please let me know how to achieve that ?
> >
> >
> > -Original Message-
> > From: Jörn Franke [mailto:jornfra...@gmail.com]
> > Sent: 04 November 2019 19:59
> > To: solr-user@lucene.apache.org
> > Subject: Re: Delete documents from the Solr index using SolrJ
> >
> > I don’t understand why it is not possible.
> >
> > However why don’t you simply overwrite the existing document instead
> > of
> > add+delete
> >
> > > Am 04.11.2019 um 15:12 schrieb Khare, Kushal (MIND) <
> > kushal.kh...@mind-infotech.com>:
> > >
> > > Hello mates!
> > > I want to know how we can delete the documents from the Solr index .
> > Suppose for my system, I have a document that has been indexed, now
> > its newer version is into use, so I want to use the latest one, for
> > that I want the previous one to be deleted from the index.
> > > Kindly help me a way out !
> > > I went through many articles and blogs, got the way (methods) for
> > deleting , but not actually, how to do it, because it's not possible
> > to delete every time by passing id's in around 50,000 doc system.
> > > Please suggest!

RE: Delete documents from the Solr index using SolrJ

2019-11-04 Thread Khare, Kushal (MIND)
Thanks!
Actually I am working on a Java web application using SolrJ for Solr search.
The users would actually be uploading/editing/deleting the docs. What I have done 
is define a location/directory where the docs would be stored and pass that 
location for indexing.
So, I am quite confused about how to carry on with the solution that you proposed. 
Please guide!

-Original Message-
From: David Hastings [mailto:hastings.recurs...@gmail.com]
Sent: 04 November 2019 20:10
To: solr-user@lucene.apache.org
Subject: Re: Delete documents from the Solr index using SolrJ

delete them by query would do the trick unless im missing something significant 
in what youre trying to do here. you can just pass in an xml
command:
'<delete><query>".$kill_query."</query></delete>'

On Mon, Nov 4, 2019 at 9:37 AM Khare, Kushal (MIND) < 
kushal.kh...@mind-infotech.com> wrote:

> In my case, id won't be same.
> Suppose, I have a doc with id : 20
> Now, it's newer version would be either 20.1 or 22 What in this case?
> -Original Message-
> From: David Hastings [mailto:hastings.recurs...@gmail.com]
> Sent: 04 November 2019 20:04
> To: solr-user@lucene.apache.org
> Subject: Re: Delete documents from the Solr index using SolrJ
>
> when you add a new document using the same "id" value as another it
> just over writes it
>
> On Mon, Nov 4, 2019 at 9:30 AM Khare, Kushal (MIND) <
> kushal.kh...@mind-infotech.com> wrote:
>
> > Could you please let me know how to achieve that ?
> >
> >
> > -Original Message-
> > From: Jörn Franke [mailto:jornfra...@gmail.com]
> > Sent: 04 November 2019 19:59
> > To: solr-user@lucene.apache.org
> > Subject: Re: Delete documents from the Solr index using SolrJ
> >
> > I don’t understand why it is not possible.
> >
> > However why don’t you simply overwrite the existing document instead
> > of
> > add+delete
> >
> > > Am 04.11.2019 um 15:12 schrieb Khare, Kushal (MIND) <
> > kushal.kh...@mind-infotech.com>:
> > >
> > > Hello mates!
> > > I want to know how we can delete the documents from the Solr index .
> > Suppose for my system, I have a document that has been indexed, now
> > its newer version is into use, so I want to use the latest one, for
> > that I want the previous one to be deleted from the index.
> > > Kindly help me a way out !
> > > I went through many articles and blogs, got the way (methods) for
> > deleting , but not actually, how to do it, because it's not possible
> > to delete every time by passing id's in around 50,000 doc system.
> > > Please suggest!

Re: Delete documents from the Solr index using SolrJ

2019-11-04 Thread David Hastings
Delete them by query would do the trick unless I'm missing something
significant in what you're trying to do here. You can just pass in an XML
command:
'<delete><query>".$kill_query."</query></delete>'

On Mon, Nov 4, 2019 at 9:37 AM Khare, Kushal (MIND) <
kushal.kh...@mind-infotech.com> wrote:

> In my case, id won't be same.
> Suppose, I have a doc with id : 20
> Now, it's newer version would be either 20.1 or 22
> What in this case?
> -Original Message-
> From: David Hastings [mailto:hastings.recurs...@gmail.com]
> Sent: 04 November 2019 20:04
> To: solr-user@lucene.apache.org
> Subject: Re: Delete documents from the Solr index using SolrJ
>
> when you add a new document using the same "id" value as another it just
> over writes it
>
> On Mon, Nov 4, 2019 at 9:30 AM Khare, Kushal (MIND) <
> kushal.kh...@mind-infotech.com> wrote:
>
> > Could you please let me know how to achieve that ?
> >
> >
> > -Original Message-
> > From: Jörn Franke [mailto:jornfra...@gmail.com]
> > Sent: 04 November 2019 19:59
> > To: solr-user@lucene.apache.org
> > Subject: Re: Delete documents from the Solr index using SolrJ
> >
> > I don’t understand why it is not possible.
> >
> > However why don’t you simply overwrite the existing document instead
> > of
> > add+delete
> >
> > > Am 04.11.2019 um 15:12 schrieb Khare, Kushal (MIND) <
> > kushal.kh...@mind-infotech.com>:
> > >
> > > Hello mates!
> > > I want to know how we can delete the documents from the Solr index .
> > Suppose for my system, I have a document that has been indexed, now
> > its newer version is into use, so I want to use the latest one, for
> > that I want the previous one to be deleted from the index.
> > > Kindly help me a way out !
> > > I went through many articles and blogs, got the way (methods) for
> > deleting , but not actually, how to do it, because it's not possible
> > to delete every time by passing id's in around 50,000 doc system.
> > > Please suggest!


RE: Delete documents from the Solr index using SolrJ

2019-11-04 Thread Khare, Kushal (MIND)
In my case, the id won't be the same.
Suppose I have a doc with id 20.
Now, its newer version would have id 20.1 or 22.
What happens in this case?
-Original Message-
From: David Hastings [mailto:hastings.recurs...@gmail.com]
Sent: 04 November 2019 20:04
To: solr-user@lucene.apache.org
Subject: Re: Delete documents from the Solr index using SolrJ

when you add a new document using the same "id" value as another it just over 
writes it

On Mon, Nov 4, 2019 at 9:30 AM Khare, Kushal (MIND) < 
kushal.kh...@mind-infotech.com> wrote:

> Could you please let me know how to achieve that ?
>
>
> -Original Message-
> From: Jörn Franke [mailto:jornfra...@gmail.com]
> Sent: 04 November 2019 19:59
> To: solr-user@lucene.apache.org
> Subject: Re: Delete documents from the Solr index using SolrJ
>
> I don’t understand why it is not possible.
>
> However why don’t you simply overwrite the existing document instead
> of
> add+delete
>
> > Am 04.11.2019 um 15:12 schrieb Khare, Kushal (MIND) <
> kushal.kh...@mind-infotech.com>:
> >
> > Hello mates!
> > I want to know how we can delete the documents from the Solr index .
> Suppose for my system, I have a document that has been indexed, now
> its newer version is into use, so I want to use the latest one, for
> that I want the previous one to be deleted from the index.
> > Kindly help me a way out !
> > I went through many articles and blogs, got the way (methods) for
> deleting , but not actually, how to do it, because it's not possible
> to delete every time by passing id's in around 50,000 doc system.
> > Please suggest!
> >


RE: Delete documents from the Solr index using SolrJ

2019-11-04 Thread Khare, Kushal (MIND)
Basically, what I need is to refresh the index. Suppose, in a directory, I have 
4 docs that have been indexed, so my search works on those 4.
Now, when I delete one of them from the directory, re-index and search, that 
deleted document is still being returned in the search results.
Hope I have made it a bit more clear now.
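
In SolrJ terms, what I am after is roughly this (a rough sketch only, assuming the
document id is the original file path, which may not match every schema): after
re-indexing, delete the ids whose files are gone, then commit.

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocument;

public class RemoveStaleDocs {
  public static void main(String[] args) throws Exception {
    // Placeholder core URL; the id field is assumed to hold the source file path.
    try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
      SolrQuery q = new SolrQuery("*:*");
      q.setFields("id");
      q.setRows(10000); // fine for a small index; use cursorMark paging for large ones
      List<String> stale = new ArrayList<>();
      for (SolrDocument doc : client.query(q).getResults()) {
        String id = (String) doc.getFieldValue("id");
        if (!Files.exists(Paths.get(id))) { // source file no longer on disk
          stale.add(id);
        }
      }
      if (!stale.isEmpty()) {
        client.deleteById(stale); // remove documents whose source files are gone
        client.commit();
      }
    }
  }
}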

-Original Message-
From: Khare, Kushal (MIND) [mailto:kushal.kh...@mind-infotech.com]
Sent: 04 November 2019 20:00
To: solr-user@lucene.apache.org
Subject: RE: Delete documents from the Solr index using SolrJ

Could you please let me know how to achieve that ?


-Original Message-
From: Jörn Franke [mailto:jornfra...@gmail.com]
Sent: 04 November 2019 19:59
To: solr-user@lucene.apache.org
Subject: Re: Delete documents from the Solr index using SolrJ

I don’t understand why it is not possible.

However why don’t you simply overwrite the existing document instead of 
add+delete

> Am 04.11.2019 um 15:12 schrieb Khare, Kushal (MIND) 
> :
>
> Hello mates!
> I want to know how we can delete the documents from the Solr index . Suppose 
> for my system, I have a document that has been indexed, now its newer version 
> is into use, so I want to use the latest one, for that I want the previous 
> one to be deleted from the index.
> Kindly help me a way out !
> I went through many articles and blogs, got the way (methods) for deleting , 
> but not actually, how to do it, because it's not possible to delete every 
> time by passing id's in around 50,000 doc system.
> Please suggest!
>


Re: Delete documents from the Solr index using SolrJ

2019-11-04 Thread David Hastings
when you add a new document using the same "id" value as another it just
overwrites it
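
In SolrJ that is just re-adding a document with the same uniqueKey; a small sketch
(field names, core URL and values are placeholders):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class OverwriteExample {
  public static void main(String[] args) throws Exception {
    try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "20"); // same id as the previously indexed version
      doc.addField("title", "Newer version of the document");
      client.add(doc);  // replaces the old document with id=20, no separate delete needed
      client.commit();
    }
  }
}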

On Mon, Nov 4, 2019 at 9:30 AM Khare, Kushal (MIND) <
kushal.kh...@mind-infotech.com> wrote:

> Could you please let me know how to achieve that ?
>
>
> -Original Message-
> From: Jörn Franke [mailto:jornfra...@gmail.com]
> Sent: 04 November 2019 19:59
> To: solr-user@lucene.apache.org
> Subject: Re: Delete documents from the Solr index using SolrJ
>
> I don’t understand why it is not possible.
>
> However why don’t you simply overwrite the existing document instead of
> add+delete
>
> > Am 04.11.2019 um 15:12 schrieb Khare, Kushal (MIND) <
> kushal.kh...@mind-infotech.com>:
> >
> > Hello mates!
> > I want to know how we can delete the documents from the Solr index .
> Suppose for my system, I have a document that has been indexed, now its
> newer version is into use, so I want to use the latest one, for that I want
> the previous one to be deleted from the index.
> > Kindly help me a way out !
> > I went through many articles and blogs, got the way (methods) for
> deleting , but not actually, how to do it, because it's not possible to
> delete every time by passing id's in around 50,000 doc system.
> > Please suggest!
> >


RE: Delete documents from the Solr index using SolrJ

2019-11-04 Thread Khare, Kushal (MIND)
Could you please let me know how to achieve that ?


-Original Message-
From: Jörn Franke [mailto:jornfra...@gmail.com]
Sent: 04 November 2019 19:59
To: solr-user@lucene.apache.org
Subject: Re: Delete documents from the Solr index using SolrJ

I don’t understand why it is not possible.

However why don’t you simply overwrite the existing document instead of 
add+delete

> Am 04.11.2019 um 15:12 schrieb Khare, Kushal (MIND) 
> :
>
> Hello mates!
> I want to know how we can delete the documents from the Solr index . Suppose 
> for my system, I have a document that has been indexed, now its newer version 
> is into use, so I want to use the latest one, for that I want the previous 
> one to be deleted from the index.
> Kindly help me a way out !
> I went through many articles and blogs, got the way (methods) for deleting , 
> but not actually, how to do it, because it's not possible to delete every 
> time by passing id's in around 50,000 doc system.
> Please suggest!
>


Re: Delete documents from the Solr index using SolrJ

2019-11-04 Thread Jörn Franke
I don’t understand why it is not possible.

However why don’t you simply overwrite the existing document instead of 
add+delete

> Am 04.11.2019 um 15:12 schrieb Khare, Kushal (MIND) 
> :
> 
> Hello mates!
> I want to know how we can delete the documents from the Solr index . Suppose 
> for my system, I have a document that has been indexed, now its newer version 
> is into use, so I want to use the latest one, for that I want the previous 
> one to be deleted from the index.
> Kindly help me a way out !
> I went through many articles and blogs, got the way (methods) for deleting , 
> but not actually, how to do it, because it's not possible to delete every 
> time by passing id's in around 50,000 doc system.
> Please suggest!
> 


Delete documents from the Solr index using SolrJ

2019-11-04 Thread Khare, Kushal (MIND)
Hello mates!
I want to know how we can delete documents from the Solr index. Suppose for my 
system, I have a document that has been indexed, and now its newer version is in 
use; I want to use the latest one, so I want the previous one to be deleted from 
the index.
Kindly help me find a way out!
I went through many articles and blogs and found the methods for deleting, but not 
how to actually do it, because it's not possible to delete every time by passing 
ids in a system of around 50,000 docs.
Please suggest!





Re: Solr index

2019-08-08 Thread Dario Rigolin
Do you know that your Solr is open to the internet? It's better to filter
the port, or at least not post the full address here...

On Thu, Aug 8, 2019 at 15:58 HTMLServices.it <
i...@htmlservices.it> wrote:

> Hi everyone
> I installed Solr on a test server (centos 7) to get the fastest searches
> on dovecot, Solr and new for me and I think I didn't understand how it
> works perfectly.
> I installed following the official guide on the dovecot wiki:
> https://wiki2.dovecot.org/Plugins/FTS/Solr
> but I can't get it to work properly.
>
> This is my installation that I made public provisionally without a
> password:
> http://5.39.2.59:8987/solr/#/
> (I changed port because the default one was busy)
>
> I believe that the index is not created, should it be created
> automatically? or did I do something wrong?
>
> if I run one of these two commands as a guide
> curl http://5.39.2.59:8987/solr/dovecot/update?optimize=true
> curl http://5.39.2.59:8987/solr/dovecot/update?commit=true
> I get
>
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader">
> <int name="status">0</int>
> <int name="QTime">2</int>
> </lst>
> </response>
>
> this is right? have I forgotten or am I wrong?
>
> Excuse the stupid questions but I'm seeing Solr for the first time
> thank you all
>


-- 

Dario Rigolin
Comperio srl - CTO
Mobile: +39 347 7232652 - Office: +39 0425 471482
Skype: dario.rigolin


Solr index

2019-08-08 Thread HTMLServices.it

Hi everyone
I installed Solr on a test server (CentOS 7) to get faster searches 
on dovecot. Solr is new for me and I don't think I understand how it 
works perfectly yet.
I installed it following the official guide on the dovecot wiki: 
https://wiki2.dovecot.org/Plugins/FTS/Solr

but I can't get it to work properly.

This is my installation that I made public provisionally without a password:
http://5.39.2.59:8987/solr/#/
(I changed the port because the default one was busy)

I believe that the index has not been created. Should it be created 
automatically, or did I do something wrong?


if I run one of these two commands from the guide
curl http://5.39.2.59:8987/solr/dovecot/update?optimize=true
curl http://5.39.2.59:8987/solr/dovecot/update?commit=true
I get

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
   <int name="status">0</int>
   <int name="QTime">2</int>
</lst>
</response>

Is this right? Have I forgotten something, or am I wrong?

Excuse the stupid questions but I'm seeing Solr for the first time
thank you all
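
If it helps to diagnose, a rough SolrJ sketch to check how many documents the
dovecot core actually holds (host/port as above; just an illustration, not
something from the dovecot guide) would be:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class CountDovecotDocs {
  public static void main(String[] args) throws Exception {
    // Assumes the dovecot core on the non-default port mentioned above.
    try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8987/solr/dovecot").build()) {
      SolrQuery q = new SolrQuery("*:*");
      q.setRows(0); // only the count is needed, not the documents
      long numFound = client.query(q).getResults().getNumFound();
      System.out.println("Documents in index: " + numFound);
    }
  }
}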


Re: Encrypting Solr Index

2019-06-25 Thread Jörn Franke
Maybe in this scenario a Secure Enclave could make sense (e.g. Intel SGX)?

The scenario that you describe looks like MIT CryptDB, e.g. 
https://css.csail.mit.edu/cryptdb/



> Am 25.06.2019 um 21:05 schrieb Tim Casey :
> 
> My two cents worth of comment,
> 
> For our local lucene indexes we use AES encryption.  We encrypt the blocks
> on the way out, decrypt on the way in.
> We are using a C version of lucene, not the java version.  But, I suspect
> the same methodology could be applied.  This assumes the data at rest is
> the attack vector for discovering what is in the invertible index.  But
> allows for the indexing/querying to be done in the clear.  This would allow
> for stemming and the like.
> 
> If you have an attack vector in which the indexing/querying are not
> trusted, then you have a whole different set of problems.
> 
> To do stemming, you need a homomorphic encryption scheme which would allow
> per character/byte queries.  This is different type of attack vector than
> the on-disk encryption.  To me, this implies the query system itself is
> untrusted and you are indexing/querying encrypted content.  The first
> "thing" people are going to try  is to hash a token into a 256bit value
> which becomes the indexable token value.  This leads to the lack of
> stemming from above comments.  Depending on how keys are handled and hashes
> are generated you can run out of token space in the various underlying
> lucene indexes because you have more than 2 million tokens.
> 
> 
> 
>> On Tue, Jun 25, 2019 at 10:21 AM Ahuja, Sakshi  wrote:
>> 
>> I am actually looking for the best option so currently doing research on
>> it.
>> For Window's FS encryption I didn't find a way to use different
>> Username/Password. It by default takes window's username/password to
>> encrypt and decrypt.
>> 
>> I tried bitlocker too for creating encrypted virtual directory (Which
>> allows me to use different credentials) and to keep Solr Index in that but
>> somehow Solr Admin was unable to access Index from that encrypted
>> directory. Not sure how that is working.
>> 
>> If you have any idea on that- will wok for me. Thanks!
>> 
>> -Original Message-
>> From: Jörn Franke 
>> Sent: Tuesday, June 25, 2019 12:47 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Encrypting Solr Index
>> 
>> Why does FS encryption does not serve your use case?
>> 
>> Can’t you apply it also for backups etc?
>> 
>>> Am 25.06.2019 um 17:32 schrieb Ahuja, Sakshi :
>>> 
>>> Hi,
>>> 
>>> I am using solr 6.6 and want to encrypt index for security reasons. I
>> have tried Windows FS encryption option that works but want to know if solr
>> has some inbuilt feature to encrypt index or any good way to encrypt solr
>> index?
>>> 
>>> Thanks,
>>> Sakshi
>> 


Re: Encrypting Solr Index

2019-06-25 Thread Tim Casey
My two cents worth of comment,

For our local lucene indexes we use AES encryption.  We encrypt the blocks
on the way out, decrypt on the way in.
We are using a C version of lucene, not the java version.  But, I suspect
the same methodology could be applied.  This assumes the data at rest is
the attack vector for discovering what is in the inverted index, but it
allows for the indexing/querying to be done in the clear.  This would allow
for stemming and the like.

If you have an attack vector in which the indexing/querying are not
trusted, then you have a whole different set of problems.

To do stemming, you need a homomorphic encryption scheme which would allow
per character/byte queries.  This is different type of attack vector than
the on-disk encryption.  To me, this implies the query system itself is
untrusted and you are indexing/querying encrypted content.  The first
"thing" people are going to try  is to hash a token into a 256bit value
which becomes the indexable token value.  This leads to the lack of
stemming from above comments.  Depending on how keys are handled and hashes
are generated you can run out of token space in the various underlying
lucene indexes because you have more than 2 million tokens.
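
To make that concrete, a rough sketch of that kind of hashed-token analysis as a
custom Lucene filter (not something Lucene or Solr ship with; just an illustration
of the trade-off, and the same filter would have to run at query time):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/** Replaces each token with its SHA-256 hex digest so plaintext terms never reach the index. */
public final class HashingTokenFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final MessageDigest digest;

  public HashingTokenFilter(TokenStream input) throws NoSuchAlgorithmException {
    super(input);
    this.digest = MessageDigest.getInstance("SHA-256");
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    byte[] hash = digest.digest(termAtt.toString().getBytes(StandardCharsets.UTF_8));
    StringBuilder hex = new StringBuilder(hash.length * 2);
    for (byte b : hash) {
      hex.append(String.format("%02x", b));
    }
    // Exact-match queries still work if the same filter runs at query time;
    // stemming, wildcards and range queries do not.
    termAtt.setEmpty().append(hex);
    return true;
  }
}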



On Tue, Jun 25, 2019 at 10:21 AM Ahuja, Sakshi  wrote:

> I am actually looking for the best option so currently doing research on
> it.
> For Window's FS encryption I didn't find a way to use different
> Username/Password. It by default takes window's username/password to
> encrypt and decrypt.
>
> I tried bitlocker too for creating encrypted virtual directory (Which
> allows me to use different credentials) and to keep Solr Index in that but
> somehow Solr Admin was unable to access Index from that encrypted
> directory. Not sure how that is working.
>
> If you have any idea on that- will wok for me. Thanks!
>
> -Original Message-
> From: Jörn Franke 
> Sent: Tuesday, June 25, 2019 12:47 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Encrypting Solr Index
>
> Why does FS encryption does not serve your use case?
>
> Can’t you apply it also for backups etc?
>
> > Am 25.06.2019 um 17:32 schrieb Ahuja, Sakshi :
> >
> > Hi,
> >
> > I am using solr 6.6 and want to encrypt index for security reasons. I
> have tried Windows FS encryption option that works but want to know if solr
> has some inbuilt feature to encrypt index or any good way to encrypt solr
> index?
> >
> > Thanks,
> > Sakshi
>


RE: Encrypting Solr Index

2019-06-25 Thread Ahuja, Sakshi
I am actually looking for the best option, so I am currently doing research on it.  
For Windows FS encryption I didn't find a way to use a different 
username/password. By default it takes the Windows username/password to encrypt 
and decrypt.

I tried BitLocker too, creating an encrypted virtual directory (which allows me 
to use different credentials) and keeping the Solr index in it, but somehow the Solr 
Admin was unable to access the index from that encrypted directory. Not sure how 
that works. 

If you have any idea on that - it will work for me. Thanks!

-Original Message-
From: Jörn Franke  
Sent: Tuesday, June 25, 2019 12:47 PM
To: solr-user@lucene.apache.org
Subject: Re: Encrypting Solr Index

Why does FS encryption does not serve your use case?

Can’t you apply it also for backups etc?

> Am 25.06.2019 um 17:32 schrieb Ahuja, Sakshi :
> 
> Hi,
> 
> I am using solr 6.6 and want to encrypt index for security reasons. I have 
> tried Windows FS encryption option that works but want to know if solr has 
> some inbuilt feature to encrypt index or any good way to encrypt solr index?
> 
> Thanks,
> Sakshi


Re: Encrypting Solr Index

2019-06-25 Thread Jörn Franke
Why does FS encryption not serve your use case?

Can’t you apply it also for backups etc?

> Am 25.06.2019 um 17:32 schrieb Ahuja, Sakshi :
> 
> Hi,
> 
> I am using solr 6.6 and want to encrypt index for security reasons. I have 
> tried Windows FS encryption option that works but want to know if solr has 
> some inbuilt feature to encrypt index or any good way to encrypt solr index?
> 
> Thanks,
> Sakshi


Re: Encrypting Solr Index

2019-06-25 Thread Erick Erickson
This is a recurring issue. The Hitachi solution will encrypt individual 
_tokens_ in the index, even with different keys for different users. However, 
the price is functionality.

Take wildcards. The Hitachi solution doesn’t solve this, the problem is 
basically intractable. Consider the words run, running, runner, and runs. A 
search for run* has to match all those words, and an encryption algorithm that 
encodes the first three letters identically is trivially breakable.

People do as you are doing and put the index on an encrypting filesystem if 
encryption-at-rest is sufficient. My personal take is that if a hacker has 
unrestricted access to the memory on your Solr servers and could read the 
unencrypted index, Solr is only one of many problems you have.

Best,
Erick

> On Jun 25, 2019, at 8:40 AM, Alexandre Rafalovitch  wrote:
> 
> No index encryption in the box. I am aware of a commercial solution but no
> details on how good or what the price is:
> https://www.hitachi-solutions.com/securesearch/
> 
> Regards,
>Alex
> 
> On Tue, Jun 25, 2019, 11:32 AM Ahuja, Sakshi,  wrote:
> 
>> Hi,
>> 
>> I am using solr 6.6 and want to encrypt index for security reasons. I have
>> tried Windows FS encryption option that works but want to know if solr has
>> some inbuilt feature to encrypt index or any good way to encrypt solr index?
>> 
>> Thanks,
>> Sakshi
>> 



Re: Encrypting Solr Index

2019-06-25 Thread Alexandre Rafalovitch
No index encryption in the box. I am aware of a commercial solution but no
details on how good or what the price is:
https://www.hitachi-solutions.com/securesearch/

Regards,
Alex

On Tue, Jun 25, 2019, 11:32 AM Ahuja, Sakshi,  wrote:

> Hi,
>
> I am using solr 6.6 and want to encrypt index for security reasons. I have
> tried Windows FS encryption option that works but want to know if solr has
> some inbuilt feature to encrypt index or any good way to encrypt solr index?
>
> Thanks,
> Sakshi
>


Encrypting Solr Index

2019-06-25 Thread Ahuja, Sakshi
Hi,

I am using solr 6.6 and want to encrypt index for security reasons. I have 
tried Windows FS encryption option that works but want to know if solr has some 
inbuilt feature to encrypt index or any good way to encrypt solr index?

Thanks,
Sakshi


Re: Solr index slow response

2019-03-19 Thread Walter Underwood
A sharded index will go faster, because the indexing workload is split among 
the machines.

A 5 Mbyte batch for indexing seems a little large, but it may be OK. Increase 
the client threads until you get CPU around 80%. 
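
One way to get that kind of client-side parallelism from SolrJ is
ConcurrentUpdateSolrClient; a rough sketch (queue size, thread count, URL and
field names are placeholders to tune for your own setup):

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ParallelIndexer {
  public static void main(String[] args) throws Exception {
    // 8 sender threads and a queue of 1000 docs are placeholders; tune until CPU sits around 80%.
    try (ConcurrentUpdateSolrClient client =
             new ConcurrentUpdateSolrClient.Builder("http://localhost:8983/solr/mycollection")
                 .withQueueSize(1000)
                 .withThreadCount(8)
                 .build()) {
      for (int i = 0; i < 100_000; i++) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", Integer.toString(i));
        doc.addField("body_t", "document body " + i);
        client.add(doc); // returns quickly; background threads stream the batches
      }
      client.blockUntilFinished(); // wait for the queue to drain
      client.commit();
    }
  }
}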

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Mar 19, 2019, at 8:53 AM, Aaron Yingcai Sun  wrote:
> 
> Hello, Walter,
> 
> Thanks for the hint. it looks like the size matters, our documents size are 
> not fixed, there are many small documents, such as 59KB perl 10 documents, 
> the response time is around 10ms which is pretty good, there I could let it 
> send bigger batch still I get reasonable response time.
> 
> 
> I will try with Solr Could cluster, maybe get better speed there.
> 
> 
> //Aaron
> 
> 
> From: Walter Underwood 
> Sent: Tuesday, March 19, 2019 3:29:17 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr index slow response
> 
> Indexing is CPU bound. If you have enough RAM, SSD disks, and enough client 
> threads, you should be able to drive CPU to over 90%.
> 
> Start with two client threads per CPU. That allows one thread to be sending 
> data over the network while another is waiting for Solr to process the batch.
> 
> A couple of years ago, I was indexing a million docs per minute into a Solr 
> Cloud cluster. I think that was four shards on instances with 16 CPUs, so it 
> was 64 CPUs available for indexing. That was with Java 8, G1GC, and 8 GB of 
> heap.
> 
> Your document are averaging about 50 kbytes, which is pretty big. Our 
> documents average about 3.5 kbytes. A lot of the indexing work is handling 
> the text, so those larger documents would be at least 10X slower than ours.
> 
> Are you doing atomic updates? That would slow things down a lot.
> 
> If you want to use G1GC, use the configuration I sent earlier.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Mar 19, 2019, at 7:05 AM, Bernd Fehling  
>> wrote:
>> 
>> Isn't there somthing about largePageTables which must be enabled
>> in JAVA and also supported by OS for such huge heaps?
>> 
>> Just a guess.
>> 
>> Am 19.03.19 um 15:01 schrieb Jörn Franke:
>>> It could be an issue with jdk 8 that may not be suitable for such large 
>>> heaps. Have more nodes with smaller heaps (eg 31 gb)
>>>> Am 18.03.2019 um 11:47 schrieb Aaron Yingcai Sun :
>>>> 
>>>> Hello, Solr!
>>>> 
>>>> 
>>>> We are having some performance issue when try to send documents for solr 
>>>> to index. The repose time is very slow and unpredictable some time.
>>>> 
>>>> 
>>>> Solr server is running on a quit powerful server, 32 cpus, 400GB RAM, 
>>>> while 300 GB is reserved for solr, while this happening, cpu usage is 
>>>> around 30%, mem usage is 34%.  io also look ok according to iotop. SSD 
>>>> disk.
>>>> 
>>>> 
>>>> Our application send 100 documents to solr per request, json encoded. the 
>>>> size is around 5M each time. some times the response time is under 1 
>>>> seconds, some times could be 300 seconds, the slow response happens very 
>>>> often.
>>>> 
>>>> 
>>>> "Soft AutoCommit: disabled", "Hard AutoCommit: if uncommited for 
>>>> 360ms; if 100 uncommited docs"
>>>> 
>>>> 
>>>> There are around 100 clients sending those documents at the same time, but 
>>>> each for the client is blocking call which wait the http response then 
>>>> send the next one.
>>>> 
>>>> 
>>>> I tried to make the number of documents smaller in one request, such as 
>>>> 20, but  still I see slow response time to time, like 80 seconds.
>>>> 
>>>> 
>>>> Would you help to give some hint how improve the response time?  solr does 
>>>> not seems very loaded, there must be a way to make the response faster.
>>>> 
>>>> 
>>>> BRs
>>>> 
>>>> //Aaron
>>>> 
>>>> 
>>>> 
> 



Re: Solr index slow response

2019-03-19 Thread Aaron Yingcai Sun
Hello, Walter,

Thanks for the hint. It looks like the size matters. Our document sizes are not 
fixed; there are many small documents, such as 59KB per 10 documents, and there the 
response time is around 10ms, which is pretty good, so I could let it send 
bigger batches and still get a reasonable response time.


I will try with a Solr Cloud cluster, maybe I will get better speed there.


//Aaron


From: Walter Underwood 
Sent: Tuesday, March 19, 2019 3:29:17 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr index slow response

Indexing is CPU bound. If you have enough RAM, SSD disks, and enough client 
threads, you should be able to drive CPU to over 90%.

Start with two client threads per CPU. That allows one thread to be sending 
data over the network while another is waiting for Solr to process the batch.

A couple of years ago, I was indexing a million docs per minute into a Solr 
Cloud cluster. I think that was four shards on instances with 16 CPUs, so it 
was 64 CPUs available for indexing. That was with Java 8, G1GC, and 8 GB of 
heap.

Your document are averaging about 50 kbytes, which is pretty big. Our documents 
average about 3.5 kbytes. A lot of the indexing work is handling the text, so 
those larger documents would be at least 10X slower than ours.

Are you doing atomic updates? That would slow things down a lot.

If you want to use G1GC, use the configuration I sent earlier.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Mar 19, 2019, at 7:05 AM, Bernd Fehling  
> wrote:
>
> Isn't there somthing about largePageTables which must be enabled
> in JAVA and also supported by OS for such huge heaps?
>
> Just a guess.
>
> Am 19.03.19 um 15:01 schrieb Jörn Franke:
>> It could be an issue with jdk 8 that may not be suitable for such large 
>> heaps. Have more nodes with smaller heaps (eg 31 gb)
>>> Am 18.03.2019 um 11:47 schrieb Aaron Yingcai Sun :
>>>
>>> Hello, Solr!
>>>
>>>
>>> We are having some performance issue when try to send documents for solr to 
>>> index. The repose time is very slow and unpredictable some time.
>>>
>>>
>>> Solr server is running on a quit powerful server, 32 cpus, 400GB RAM, while 
>>> 300 GB is reserved for solr, while this happening, cpu usage is around 30%, 
>>> mem usage is 34%.  io also look ok according to iotop. SSD disk.
>>>
>>>
>>> Our application send 100 documents to solr per request, json encoded. the 
>>> size is around 5M each time. some times the response time is under 1 
>>> seconds, some times could be 300 seconds, the slow response happens very 
>>> often.
>>>
>>>
>>> "Soft AutoCommit: disabled", "Hard AutoCommit: if uncommited for 360ms; 
>>> if 100 uncommited docs"
>>>
>>>
>>> There are around 100 clients sending those documents at the same time, but 
>>> each for the client is blocking call which wait the http response then send 
>>> the next one.
>>>
>>>
>>> I tried to make the number of documents smaller in one request, such as 20, 
>>> but  still I see slow response time to time, like 80 seconds.
>>>
>>>
>>> Would you help to give some hint how improve the response time?  solr does 
>>> not seems very loaded, there must be a way to make the response faster.
>>>
>>>
>>> BRs
>>>
>>> //Aaron
>>>
>>>
>>>



Re: Solr index slow response

2019-03-19 Thread Emir Arnautović
The fact that it is happening with a single client suggests that it is not 
about concurrency. If it is happening equally frequently, I would assume it is 
about the bulks - they might appear the same but might be significantly different 
from Solr’s POV. Is it always an update or an append? If it is an append, maybe try to 
isolate a bulk that is taking longer, repeat the same bulk multiple 
times, and see if it is always slow.
Maybe try taking a thread dump while processing a slow bulk; it might show you 
some pointers to where Solr is spending time. Or maybe even try sending 
single-doc bulks and see if some documents are significantly heavier than others.
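
Something like this rough SolrJ sketch could be used for the single-document timing
(core URL and the 1000 ms threshold are just placeholders):

import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class PerDocTiming {
  // Sends documents one at a time and reports the ones that are unusually slow.
  static void timeDocs(List<SolrInputDocument> docs) throws Exception {
    try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
      for (SolrInputDocument doc : docs) {
        long start = System.nanoTime();
        client.add(doc);
        long millis = (System.nanoTime() - start) / 1_000_000;
        if (millis > 1000) { // arbitrary threshold for "heavy" documents
          System.out.println("Slow doc " + doc.getFieldValue("id") + ": " + millis + " ms");
        }
      }
    }
  }
}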

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 19 Mar 2019, at 13:22, Aaron Yingcai Sun  wrote:
> 
> Yes, the same behavior even with a single thread client. The following page 
> says "In general, adding many documents per update request is faster than one 
> per update request."  but in reality, add many documents per request result 
> in much longer response time, it's not liner, response time of 100 docs per 
> request  is bigger than (the response time of 10 docs per request) * 10.
> 
> 
> https://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor
> 
> 
> 
> 
> 
> 
> From: Emir Arnautović 
> Sent: Tuesday, March 19, 2019 1:00:19 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr index slow response
> 
> If you start indexing with just a single thread/client, do you still see slow 
> bulks?
> 
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> 
> 
> 
>> On 19 Mar 2019, at 12:54, Aaron Yingcai Sun  wrote:
>> 
>> "QTime" value is from the solr rest api response, extracted from the 
>> http/json payload.  The "Request time" is what I measured from client side, 
>> it's almost the same value as QTime, just some milliseconds difference.  I 
>> could provide tcpdump to prove that it is really solr slow response.
>> 
>> Those long response time is not really spikes, it's constantly happening, 
>> almost half of the request has such long delay.  The more document added in 
>> one request the more delay it has.
>> 
>> 
>> From: Emir Arnautović 
>> Sent: Tuesday, March 19, 2019 12:30:33 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr index slow response
>> 
>> Just to add different perspective here: how do you send documents to Solr? 
>> Are those log lines from your client? Maybe it is not Solr that is slow. 
>> Could it be network or client itself. If you have some dry run on client, 
>> maybe try running it without Solr to eliminate client from the suspects.
>> 
>> Do you observe similar spikes when you run indexing with less concurrent 
>> clients?
>> 
>> It is really hard to pinpoint the issue without looking at some monitoring 
>> tool.
>> 
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> 
>> 
>> 
>>> On 19 Mar 2019, at 09:17, Aaron Yingcai Sun  wrote:
>>> 
>>> We have around 80 million documents to index, total index size around 3TB,  
>>> I guess I'm not the first one to work with this big amount of data. with 
>>> such slow response time, the index process would take around 2 weeks. While 
>>> the system resource is not very loaded, there must be a way to speed it up.
>>> 
>>> 
>>> To Walter, I don't see why G1GC would improve this, we only do index, no 
>>> query in the background. There is no memory constraint.  it's more feel 
>>> like some internal thread are blocking each other.
>>> 
>>> 
>>> I used to run with more documents in one request, that give much worse 
>>> response time, 300 documents in one request could end up 20 minutes 
>>> response time, now I changed to max 10 documents in one request, still many 
>>> response time around 30 seconds, while some of them are very fast( ~100 
>>> ms).  How come there are such big differe

Re: Solr index slow response

2019-03-19 Thread Michael Gibney
I'll second Emir's suggestion to try disabling swap. "I doubt swap would
affect it since there is such huge free memory." -- sounds reasonable, but
has not been my experience, and the stats you sent indicate that swap is in
fact being used. Also, note that in many cases setting vm.swappiness=0 is
not equivalent to disabling swap (i.e., swapoff -a). If you're inclined to
try disabling swap, verify that it's successfully disabled by checking (and
re-checking) actual swap usage (that may sound obvious or trivial, but
relying on possibly-incorrect assumptions related to amount of free memory,
swappiness, etc. can be misleading). Good luck!

On Tue, Mar 19, 2019 at 10:29 AM Walter Underwood 
wrote:

> Indexing is CPU bound. If you have enough RAM, SSD disks, and enough
> client threads, you should be able to drive CPU to over 90%.
>
> Start with two client threads per CPU. That allows one thread to be
> sending data over the network while another is waiting for Solr to process
> the batch.
>
> A couple of years ago, I was indexing a million docs per minute into a
> Solr Cloud cluster. I think that was four shards on instances with 16 CPUs,
> so it was 64 CPUs available for indexing. That was with Java 8, G1GC, and 8
> GB of heap.
>
> Your document are averaging about 50 kbytes, which is pretty big. Our
> documents average about 3.5 kbytes. A lot of the indexing work is handling
> the text, so those larger documents would be at least 10X slower than ours.
>
> Are you doing atomic updates? That would slow things down a lot.
>
> If you want to use G1GC, use the configuration I sent earlier.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Mar 19, 2019, at 7:05 AM, Bernd Fehling <
> bernd.fehl...@uni-bielefeld.de> wrote:
> >
> > Isn't there somthing about largePageTables which must be enabled
> > in JAVA and also supported by OS for such huge heaps?
> >
> > Just a guess.
> >
> > Am 19.03.19 um 15:01 schrieb Jörn Franke:
> >> It could be an issue with jdk 8 that may not be suitable for such large
> heaps. Have more nodes with smaller heaps (eg 31 gb)
> >>> Am 18.03.2019 um 11:47 schrieb Aaron Yingcai Sun :
> >>>
> >>> Hello, Solr!
> >>>
> >>>
> >>> We are having some performance issue when try to send documents for
> solr to index. The repose time is very slow and unpredictable some time.
> >>>
> >>>
> >>> Solr server is running on a quit powerful server, 32 cpus, 400GB RAM,
> while 300 GB is reserved for solr, while this happening, cpu usage is
> around 30%, mem usage is 34%.  io also look ok according to iotop. SSD disk.
> >>>
> >>>
> >>> Our application send 100 documents to solr per request, json encoded.
> the size is around 5M each time. some times the response time is under 1
> seconds, some times could be 300 seconds, the slow response happens very
> often.
> >>>
> >>>
> >>> "Soft AutoCommit: disabled", "Hard AutoCommit: if uncommited for
> 360ms; if 100 uncommited docs"
> >>>
> >>>
> >>> There are around 100 clients sending those documents at the same time,
> but each for the client is blocking call which wait the http response then
> send the next one.
> >>>
> >>>
> >>> I tried to make the number of documents smaller in one request, such
> as 20, but  still I see slow response time to time, like 80 seconds.
> >>>
> >>>
> >>> Would you help to give some hint how improve the response time?  solr
> does not seems very loaded, there must be a way to make the response faster.
> >>>
> >>>
> >>> BRs
> >>>
> >>> //Aaron
> >>>
> >>>
> >>>
>
>


Re: Solr index slow response

2019-03-19 Thread Walter Underwood
Indexing is CPU bound. If you have enough RAM, SSD disks, and enough client 
threads, you should be able to drive CPU to over 90%.

Start with two client threads per CPU. That allows one thread to be sending 
data over the network while another is waiting for Solr to process the batch.

A couple of years ago, I was indexing a million docs per minute into a Solr 
Cloud cluster. I think that was four shards on instances with 16 CPUs, so it 
was 64 CPUs available for indexing. That was with Java 8, G1GC, and 8 GB of 
heap.

Your documents are averaging about 50 kbytes, which is pretty big. Our documents 
average about 3.5 kbytes. A lot of the indexing work is handling the text, so 
those larger documents would be at least 10X slower than ours.

Are you doing atomic updates? That would slow things down a lot.

If you want to use G1GC, use the configuration I sent earlier.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Mar 19, 2019, at 7:05 AM, Bernd Fehling  
> wrote:
> 
> Isn't there somthing about largePageTables which must be enabled
> in JAVA and also supported by OS for such huge heaps?
> 
> Just a guess.
> 
> Am 19.03.19 um 15:01 schrieb Jörn Franke:
>> It could be an issue with jdk 8 that may not be suitable for such large 
>> heaps. Have more nodes with smaller heaps (eg 31 gb)
>>> Am 18.03.2019 um 11:47 schrieb Aaron Yingcai Sun :
>>> 
>>> Hello, Solr!
>>> 
>>> 
>>> We are having some performance issue when try to send documents for solr to 
>>> index. The repose time is very slow and unpredictable some time.
>>> 
>>> 
>>> Solr server is running on a quit powerful server, 32 cpus, 400GB RAM, while 
>>> 300 GB is reserved for solr, while this happening, cpu usage is around 30%, 
>>> mem usage is 34%.  io also look ok according to iotop. SSD disk.
>>> 
>>> 
>>> Our application send 100 documents to solr per request, json encoded. the 
>>> size is around 5M each time. some times the response time is under 1 
>>> seconds, some times could be 300 seconds, the slow response happens very 
>>> often.
>>> 
>>> 
>>> "Soft AutoCommit: disabled", "Hard AutoCommit: if uncommited for 360ms; 
>>> if 100 uncommited docs"
>>> 
>>> 
>>> There are around 100 clients sending those documents at the same time, but 
>>> each for the client is blocking call which wait the http response then send 
>>> the next one.
>>> 
>>> 
>>> I tried to make the number of documents smaller in one request, such as 20, 
>>> but  still I see slow response time to time, like 80 seconds.
>>> 
>>> 
>>> Would you help to give some hint how improve the response time?  solr does 
>>> not seems very loaded, there must be a way to make the response faster.
>>> 
>>> 
>>> BRs
>>> 
>>> //Aaron
>>> 
>>> 
>>> 



Re: Solr index slow response

2019-03-19 Thread Bernd Fehling

Isn't there something about largePageTables which must be enabled
in Java and also supported by the OS for such huge heaps?

Just a guess.

On 19.03.19 at 15:01, Jörn Franke wrote:

It could be an issue with jdk 8 that may not be suitable for such large heaps. 
Have more nodes with smaller heaps (eg 31 gb)


On 18.03.2019 at 11:47, Aaron Yingcai Sun wrote:

Hello, Solr!


We are having some performance issue when try to send documents for solr to 
index. The repose time is very slow and unpredictable some time.


Solr server is running on a quit powerful server, 32 cpus, 400GB RAM, while 300 
GB is reserved for solr, while this happening, cpu usage is around 30%, mem 
usage is 34%.  io also look ok according to iotop. SSD disk.


Our application send 100 documents to solr per request, json encoded. the size 
is around 5M each time. some times the response time is under 1 seconds, some 
times could be 300 seconds, the slow response happens very often.


"Soft AutoCommit: disabled", "Hard AutoCommit: if uncommited for 360ms; if 
100 uncommited docs"


There are around 100 clients sending those documents at the same time, but each 
for the client is blocking call which wait the http response then send the next 
one.


I tried to make the number of documents smaller in one request, such as 20, but 
 still I see slow response time to time, like 80 seconds.


Would you help to give some hint how improve the response time?  solr does not 
seems very loaded, there must be a way to make the response faster.


BRs

//Aaron





Re: Solr index slow response

2019-03-19 Thread Jörn Franke
It could be an issue with JDK 8, which may not be suitable for such large heaps. 
Use more nodes with smaller heaps (e.g. 31 GB).

> Am 18.03.2019 um 11:47 schrieb Aaron Yingcai Sun :
> 
> Hello, Solr!
> 
> 
> We are having some performance issue when try to send documents for solr to 
> index. The repose time is very slow and unpredictable some time.
> 
> 
> Solr server is running on a quit powerful server, 32 cpus, 400GB RAM, while 
> 300 GB is reserved for solr, while this happening, cpu usage is around 30%, 
> mem usage is 34%.  io also look ok according to iotop. SSD disk.
> 
> 
> Our application send 100 documents to solr per request, json encoded. the 
> size is around 5M each time. some times the response time is under 1 seconds, 
> some times could be 300 seconds, the slow response happens very often.
> 
> 
> "Soft AutoCommit: disabled", "Hard AutoCommit: if uncommited for 360ms; 
> if 100 uncommited docs"
> 
> 
> There are around 100 clients sending those documents at the same time, but 
> each for the client is blocking call which wait the http response then send 
> the next one.
> 
> 
> I tried to make the number of documents smaller in one request, such as 20, 
> but  still I see slow response time to time, like 80 seconds.
> 
> 
> Would you help to give some hint how improve the response time?  solr does 
> not seems very loaded, there must be a way to make the response faster.
> 
> 
> BRs
> 
> //Aaron
> 
> 
> 


Re: Solr index slow response

2019-03-19 Thread Chris Ulicny
Do you know what is causing the 30% CPU usage? That seems awfully high for
Solr when only indexing (unless it is iowait), but it could be expected based
on any custom update processors and analyzers you may have.

Are you putting all of the documents into a single core or multiple? Also
are you using SATA or PCIe solid state drives?

We have a similar index situation. About 65 million documents that take up
about 4.5TB for a single copy of the index. We've never tried to put all of
the documents into a single solr core before, so I don't know how well that
would work.

On Tue, Mar 19, 2019 at 8:28 AM Aaron Yingcai Sun  wrote:

> Yes, the same behavior even with a single thread client. The following
> page says "In general, adding many documents per update request is faster
> than one per update request."  but in reality, add many documents per
> request result in much longer response time, it's not liner, response time
> of 100 docs per request  is bigger than (the response time of 10 docs per
> request) * 10.
>
>
> https://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor
>
>
>
>
>
> 
> From: Emir Arnautović 
> Sent: Tuesday, March 19, 2019 1:00:19 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr index slow response
>
> If you start indexing with just a single thread/client, do you still see
> slow bulks?
>
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 19 Mar 2019, at 12:54, Aaron Yingcai Sun  wrote:
> >
> > "QTime" value is from the solr rest api response, extracted from the
> http/json payload.  The "Request time" is what I measured from client side,
> it's almost the same value as QTime, just some milliseconds difference.  I
> could provide tcpdump to prove that it is really solr slow response.
> >
> > Those long response time is not really spikes, it's constantly
> happening, almost half of the request has such long delay.  The more
> document added in one request the more delay it has.
> >
> > 
> > From: Emir Arnautović 
> > Sent: Tuesday, March 19, 2019 12:30:33 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Solr index slow response
> >
> > Just to add different perspective here: how do you send documents to
> Solr? Are those log lines from your client? Maybe it is not Solr that is
> slow. Could it be network or client itself. If you have some dry run on
> client, maybe try running it without Solr to eliminate client from the
> suspects.
> >
> > Do you observe similar spikes when you run indexing with less concurrent
> clients?
> >
> > It is really hard to pinpoint the issue without looking at some
> monitoring tool.
> >
> > Emir
> > --
> > Monitoring - Log Management - Alerting - Anomaly Detection
> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >
> >
> >
> >> On 19 Mar 2019, at 09:17, Aaron Yingcai Sun  wrote:
> >>
> >> We have around 80 million documents to index, total index size around
> 3TB,  I guess I'm not the first one to work with this big amount of data.
> with such slow response time, the index process would take around 2 weeks.
> While the system resource is not very loaded, there must be a way to speed
> it up.
> >>
> >>
> >> To Walter, I don't see why G1GC would improve this, we only do index,
> no query in the background. There is no memory constraint.  it's more feel
> like some internal thread are blocking each other.
> >>
> >>
> >> I used to run with more documents in one request, that give much worse
> response time, 300 documents in one request could end up 20 minutes
> response time, now I changed to max 10 documents in one request, still many
> response time around 30 seconds, while some of them are very fast( ~100
> ms).  How come there are such big difference? the documents size does not
> have such big difference.
> >>
> >>
> >> I just want to speed it up since nothing seems to be overloaded.  Are
> there any other faster way to index such big amount of data?
> >>
> >>
> >> BRs
> >>
> >> //Aaron
> >>
> >&

Re: Solr index slow response

2019-03-19 Thread Aaron Yingcai Sun
Yes, the same behavior even with a single-threaded client. The following page 
says "In general, adding many documents per update request is faster than one 
per update request."  But in reality, adding many documents per request results in 
a much longer response time, and it's not linear: the response time of 100 docs per 
request is bigger than (the response time of 10 docs per request) * 10.


https://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor






From: Emir Arnautović 
Sent: Tuesday, March 19, 2019 1:00:19 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr index slow response

If you start indexing with just a single thread/client, do you still see slow 
bulks?

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 19 Mar 2019, at 12:54, Aaron Yingcai Sun  wrote:
>
> "QTime" value is from the solr rest api response, extracted from the 
> http/json payload.  The "Request time" is what I measured from client side, 
> it's almost the same value as QTime, just some milliseconds difference.  I 
> could provide tcpdump to prove that it is really solr slow response.
>
> Those long response time is not really spikes, it's constantly happening, 
> almost half of the request has such long delay.  The more document added in 
> one request the more delay it has.
>
> 
> From: Emir Arnautović 
> Sent: Tuesday, March 19, 2019 12:30:33 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr index slow response
>
> Just to add different perspective here: how do you send documents to Solr? 
> Are those log lines from your client? Maybe it is not Solr that is slow. 
> Could it be network or client itself. If you have some dry run on client, 
> maybe try running it without Solr to eliminate client from the suspects.
>
> Do you observe similar spikes when you run indexing with less concurrent 
> clients?
>
> It is really hard to pinpoint the issue without looking at some monitoring 
> tool.
>
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
>> On 19 Mar 2019, at 09:17, Aaron Yingcai Sun  wrote:
>>
>> We have around 80 million documents to index, total index size around 3TB,  
>> I guess I'm not the first one to work with this big amount of data. with 
>> such slow response time, the index process would take around 2 weeks. While 
>> the system resource is not very loaded, there must be a way to speed it up.
>>
>>
>> To Walter, I don't see why G1GC would improve this, we only do index, no 
>> query in the background. There is no memory constraint.  it's more feel like 
>> some internal thread are blocking each other.
>>
>>
>> I used to run with more documents in one request, that give much worse 
>> response time, 300 documents in one request could end up 20 minutes response 
>> time, now I changed to max 10 documents in one request, still many response 
>> time around 30 seconds, while some of them are very fast( ~100 ms).  How 
>> come there are such big difference? the documents size does not have such 
>> big difference.
>>
>>
>> I just want to speed it up since nothing seems to be overloaded.  Are there 
>> any other faster way to index such big amount of data?
>>
>>
>> BRs
>>
>> //Aaron
>>
>> 
>> From: Walter Underwood 
>> Sent: Monday, March 18, 2019 4:59:20 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr index slow response
>>
>> Solr is not designed to have consistent response times for updates. You are 
>> expecting Solr to do something that it does not do.
>>
>> About Xms and Xmx, the JVM will continue to allocate memory until it hits 
>> the max. After it hits the max, it will start to collect garbage. A smaller 
>> Xms just wastes time doing allocations after the JVM is running. Avoid that 
>> by making Xms and Xms the same.
>>
>> We run all of our JVMs with 8 GB of heap and the G1 collector. You probably 
>> do not need more than 8 GB unless you are doing high-cardinality facets or 
>> some other memory-hungry querying.
>>
>> The first step would be to use a good configurati

Re: Solr index slow response

2019-03-19 Thread Emir Arnautović
If you start indexing with just a single thread/client, do you still see slow 
bulks?

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 19 Mar 2019, at 12:54, Aaron Yingcai Sun  wrote:
> 
> "QTime" value is from the solr rest api response, extracted from the 
> http/json payload.  The "Request time" is what I measured from client side, 
> it's almost the same value as QTime, just some milliseconds difference.  I 
> could provide tcpdump to prove that it is really solr slow response.
> 
> Those long response time is not really spikes, it's constantly happening, 
> almost half of the request has such long delay.  The more document added in 
> one request the more delay it has.
> 
> 
> From: Emir Arnautović 
> Sent: Tuesday, March 19, 2019 12:30:33 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr index slow response
> 
> Just to add different perspective here: how do you send documents to Solr? 
> Are those log lines from your client? Maybe it is not Solr that is slow. 
> Could it be network or client itself. If you have some dry run on client, 
> maybe try running it without Solr to eliminate client from the suspects.
> 
> Do you observe similar spikes when you run indexing with less concurrent 
> clients?
> 
> It is really hard to pinpoint the issue without looking at some monitoring 
> tool.
> 
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> 
> 
> 
>> On 19 Mar 2019, at 09:17, Aaron Yingcai Sun  wrote:
>> 
>> We have around 80 million documents to index, total index size around 3TB,  
>> I guess I'm not the first one to work with this big amount of data. with 
>> such slow response time, the index process would take around 2 weeks. While 
>> the system resource is not very loaded, there must be a way to speed it up.
>> 
>> 
>> To Walter, I don't see why G1GC would improve this, we only do index, no 
>> query in the background. There is no memory constraint.  it's more feel like 
>> some internal thread are blocking each other.
>> 
>> 
>> I used to run with more documents in one request, that give much worse 
>> response time, 300 documents in one request could end up 20 minutes response 
>> time, now I changed to max 10 documents in one request, still many response 
>> time around 30 seconds, while some of them are very fast( ~100 ms).  How 
>> come there are such big difference? the documents size does not have such 
>> big difference.
>> 
>> 
>> I just want to speed it up since nothing seems to be overloaded.  Are there 
>> any other faster way to index such big amount of data?
>> 
>> 
>> BRs
>> 
>> //Aaron
>> 
>> 
>> From: Walter Underwood 
>> Sent: Monday, March 18, 2019 4:59:20 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr index slow response
>> 
>> Solr is not designed to have consistent response times for updates. You are 
>> expecting Solr to do something that it does not do.
>> 
>> About Xms and Xmx, the JVM will continue to allocate memory until it hits 
>> the max. After it hits the max, it will start to collect garbage. A smaller 
>> Xms just wastes time doing allocations after the JVM is running. Avoid that 
>> by making Xms and Xms the same.
>> 
>> We run all of our JVMs with 8 GB of heap and the G1 collector. You probably 
>> do not need more than 8 GB unless you are doing high-cardinality facets or 
>> some other memory-hungry querying.
>> 
>> The first step would be to use a good configuration. We start our Java 8 
>> JVMs with these parameters:
>> 
>> SOLR_HEAP=8g
>> # Use G1 GC  -- wunder 2017-01-23
>> # Settings from https://wiki.apache.org/solr/ShawnHeisey
>> GC_TUNE=" \
>> -XX:+UseG1GC \
>> -XX:+ParallelRefProcEnabled \
>> -XX:G1HeapRegionSize=8m \
>> -XX:MaxGCPauseMillis=200 \
>> -XX:+UseLargePages \
>> -XX:+AggressiveOpts \
>> “
>> 
>> Use SSD for disks, with total space about 3X as big as the expected index 
>> size.
>> 
>> Have RAM not used by Solr or the OS that is equal to the expected index size.
>> 
>> After that, let’s figure out what the real requirement is. If you must have 
>> consistent response times for update requests, you’ll need to do that 
>> outside of Solr. But if you need high data import rates, we can probably 

Re: Solr index slow response

2019-03-19 Thread Aaron Yingcai Sun
"QTime" value is from the solr rest api response, extracted from the http/json 
payload.  The "Request time" is what I measured from client side, it's almost 
the same value as QTime, just some milliseconds difference.  I could provide 
tcpdump to prove that it is really solr slow response.

Those long response time is not really spikes, it's constantly happening, 
almost half of the request has such long delay.  The more document added in one 
request the more delay it has.
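
(For context, QTime is the value Solr itself reports in the response header of
each update request; an illustrative response body looks roughly like:

{"responseHeader":{"status":0,"QTime":1405}}

where QTime is the server-side processing time in milliseconds.)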


From: Emir Arnautović 
Sent: Tuesday, March 19, 2019 12:30:33 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr index slow response

Just to add different perspective here: how do you send documents to Solr? Are 
those log lines from your client? Maybe it is not Solr that is slow. Could it 
be network or client itself. If you have some dry run on client, maybe try 
running it without Solr to eliminate client from the suspects.

Do you observe similar spikes when you run indexing with less concurrent 
clients?

It is really hard to pinpoint the issue without looking at some monitoring tool.

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 19 Mar 2019, at 09:17, Aaron Yingcai Sun  wrote:
>
> We have around 80 million documents to index, total index size around 3TB,  I 
> guess I'm not the first one to work with this big amount of data. with such 
> slow response time, the index process would take around 2 weeks. While the 
> system resource is not very loaded, there must be a way to speed it up.
>
>
> To Walter, I don't see why G1GC would improve this, we only do index, no 
> query in the background. There is no memory constraint.  it's more feel like 
> some internal thread are blocking each other.
>
>
> I used to run with more documents in one request, that give much worse 
> response time, 300 documents in one request could end up 20 minutes response 
> time, now I changed to max 10 documents in one request, still many response 
> time around 30 seconds, while some of them are very fast( ~100 ms).  How come 
> there are such big difference? the documents size does not have such big 
> difference.
>
>
> I just want to speed it up since nothing seems to be overloaded.  Are there 
> any other faster way to index such big amount of data?
>
>
> BRs
>
> //Aaron
>
> ________
> From: Walter Underwood 
> Sent: Monday, March 18, 2019 4:59:20 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr index slow response
>
> Solr is not designed to have consistent response times for updates. You are 
> expecting Solr to do something that it does not do.
>
> About Xms and Xmx, the JVM will continue to allocate memory until it hits the 
> max. After it hits the max, it will start to collect garbage. A smaller Xms 
> just wastes time doing allocations after the JVM is running. Avoid that by 
> making Xms and Xms the same.
>
> We run all of our JVMs with 8 GB of heap and the G1 collector. You probably 
> do not need more than 8 GB unless you are doing high-cardinality facets or 
> some other memory-hungry querying.
>
> The first step would be to use a good configuration. We start our Java 8 JVMs 
> with these parameters:
>
> SOLR_HEAP=8g
> # Use G1 GC  -- wunder 2017-01-23
> # Settings from https://wiki.apache.org/solr/ShawnHeisey
> GC_TUNE=" \
> -XX:+UseG1GC \
> -XX:+ParallelRefProcEnabled \
> -XX:G1HeapRegionSize=8m \
> -XX:MaxGCPauseMillis=200 \
> -XX:+UseLargePages \
> -XX:+AggressiveOpts \
> “
>
> Use SSD for disks, with total space about 3X as big as the expected index 
> size.
>
> Have RAM not used by Solr or the OS that is equal to the expected index size.
>
> After that, let’s figure out what the real requirement is. If you must have 
> consistent response times for update requests, you’ll need to do that outside 
> of Solr. But if you need high data import rates, we can probably help.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>> On Mar 18, 2019, at 8:31 AM, Aaron Yingcai Sun  wrote:
>>
>> Hello, Chris
>>
>>
>> Thanks for the tips.  So I tried to set it as you suggested, not see too 
>> much improvement.  Since I don't need it visible immediately, softCommit is 
>> disabled totally.
>>
>> The slow response is happening every few seconds,  if it happens hourly I 
>> would suspect the hourly auto-commit.  But it happen much more frequently.  
>> I don't see any CPU/RAM/NETWORK IO/DISK IO bottleneck on OS level.  It just 
>> looks like solr server is blocking intern

Re: Solr index slow response

2019-03-19 Thread Emir Arnautović
Just to add different perspective here: how do you send documents to Solr? Are 
those log lines from your client? Maybe it is not Solr that is slow. Could it 
be network or client itself. If you have some dry run on client, maybe try 
running it without Solr to eliminate client from the suspects.

Do you observe similar spikes when you run indexing with less concurrent 
clients?

It is really hard to pinpoint the issue without looking at some monitoring tool.

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 19 Mar 2019, at 09:17, Aaron Yingcai Sun  wrote:
> 
> We have around 80 million documents to index, total index size around 3TB,  I 
> guess I'm not the first one to work with this big amount of data. with such 
> slow response time, the index process would take around 2 weeks. While the 
> system resource is not very loaded, there must be a way to speed it up.
> 
> 
> To Walter, I don't see why G1GC would improve this, we only do index, no 
> query in the background. There is no memory constraint.  it's more feel like 
> some internal thread are blocking each other.
> 
> 
> I used to run with more documents in one request, that give much worse 
> response time, 300 documents in one request could end up 20 minutes response 
> time, now I changed to max 10 documents in one request, still many response 
> time around 30 seconds, while some of them are very fast( ~100 ms).  How come 
> there are such big difference? the documents size does not have such big 
> difference.
> 
> 
> I just want to speed it up since nothing seems to be overloaded.  Are there 
> any other faster way to index such big amount of data?
> 
> 
> BRs
> 
> //Aaron
> 
> 
> From: Walter Underwood 
> Sent: Monday, March 18, 2019 4:59:20 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr index slow response
> 
> Solr is not designed to have consistent response times for updates. You are 
> expecting Solr to do something that it does not do.
> 
> About Xms and Xmx, the JVM will continue to allocate memory until it hits the 
> max. After it hits the max, it will start to collect garbage. A smaller Xms 
> just wastes time doing allocations after the JVM is running. Avoid that by 
> making Xms and Xms the same.
> 
> We run all of our JVMs with 8 GB of heap and the G1 collector. You probably 
> do not need more than 8 GB unless you are doing high-cardinality facets or 
> some other memory-hungry querying.
> 
> The first step would be to use a good configuration. We start our Java 8 JVMs 
> with these parameters:
> 
> SOLR_HEAP=8g
> # Use G1 GC  -- wunder 2017-01-23
> # Settings from https://wiki.apache.org/solr/ShawnHeisey
> GC_TUNE=" \
> -XX:+UseG1GC \
> -XX:+ParallelRefProcEnabled \
> -XX:G1HeapRegionSize=8m \
> -XX:MaxGCPauseMillis=200 \
> -XX:+UseLargePages \
> -XX:+AggressiveOpts \
> “
> 
> Use SSD for disks, with total space about 3X as big as the expected index 
> size.
> 
> Have RAM not used by Solr or the OS that is equal to the expected index size.
> 
> After that, let’s figure out what the real requirement is. If you must have 
> consistent response times for update requests, you’ll need to do that outside 
> of Solr. But if you need high data import rates, we can probably help.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Mar 18, 2019, at 8:31 AM, Aaron Yingcai Sun  wrote:
>> 
>> Hello, Chris
>> 
>> 
>> Thanks for the tips.  So I tried to set it as you suggested, not see too 
>> much improvement.  Since I don't need it visible immediately, softCommit is 
>> disabled totally.
>> 
>> The slow response is happening every few seconds,  if it happens hourly I 
>> would suspect the hourly auto-commit.  But it happen much more frequently.  
>> I don't see any CPU/RAM/NETWORK IO/DISK IO bottleneck on OS level.  It just 
>> looks like solr server is blocking internally itself.
>> 
>> 
>> <   ${solr.autoCommit.maxTime:360}
>> ---
>>> ${solr.autoCommit.maxTime:15000}
>> 16c16
>> <   true
>> ---
>>> false
>> 
>> 
>> 
>> 190318-162811.610-189982 DBG1:doc_count: 10 , doc_size: 539  KB, Res code: 
>> 200, QTime: 1405 ms, Request time: 1407 ms.
>> 190318-162811.636-189968 DBG1:doc_count: 10 , doc_size: 465  KB, Res code: 
>> 200, QTime: 1357 ms, Request time: 1360 ms.
>> 190318-162811.732-189968 DBG1:doc_count: 10 , doc_size: 473  KB, Res code: 
>> 200, QTime: 90 ms, Request

Re: Solr index slow response

2019-03-19 Thread Aaron Yingcai Sun
We have around 80 million documents to index, with a total index size of around
3TB, so I guess I'm not the first one to work with this amount of data. With such
slow response times, the indexing process would take around 2 weeks. Since the
system resources are not heavily loaded, there must be a way to speed it up.


To Walter: I don't see why G1GC would improve this; we only index, with no queries
running in the background. There is no memory constraint.  It feels more like some
internal threads are blocking each other.


I used to run with more documents in one request, and that gave much worse
response times: 300 documents in one request could end up taking 20 minutes. I
have now changed to a maximum of 10 documents per request, and many response
times are still around 30 seconds, while some are very fast (~100 ms).  Why is
there such a big difference? The document sizes do not differ that much.


I just want to speed it up, since nothing seems to be overloaded.  Is there any
faster way to index this amount of data?


BRs

//Aaron


From: Walter Underwood 
Sent: Monday, March 18, 2019 4:59:20 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr index slow response

Solr is not designed to have consistent response times for updates. You are 
expecting Solr to do something that it does not do.

About Xms and Xmx, the JVM will continue to allocate memory until it hits the 
max. After it hits the max, it will start to collect garbage. A smaller Xms 
just wastes time doing allocations after the JVM is running. Avoid that by 
making Xms and Xms the same.

We run all of our JVMs with 8 GB of heap and the G1 collector. You probably do 
not need more than 8 GB unless you are doing high-cardinality facets or some 
other memory-hungry querying.

The first step would be to use a good configuration. We start our Java 8 JVMs 
with these parameters:

SOLR_HEAP=8g
# Use G1 GC  -- wunder 2017-01-23
# Settings from https://wiki.apache.org/solr/ShawnHeisey
GC_TUNE=" \
-XX:+UseG1GC \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=8m \
-XX:MaxGCPauseMillis=200 \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
“

Use SSD for disks, with total space about 3X as big as the expected index size.

Have RAM not used by Solr or the OS that is equal to the expected index size.

After that, let’s figure out what the real requirement is. If you must have 
consistent response times for update requests, you’ll need to do that outside 
of Solr. But if you need high data import rates, we can probably help.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Mar 18, 2019, at 8:31 AM, Aaron Yingcai Sun  wrote:
>
> Hello, Chris
>
>
> Thanks for the tips.  So I tried to set it as you suggested, not see too much 
> improvement.  Since I don't need it visible immediately, softCommit is 
> disabled totally.
>
> The slow response is happening every few seconds,  if it happens hourly I 
> would suspect the hourly auto-commit.  But it happen much more frequently.  I 
> don't see any CPU/RAM/NETWORK IO/DISK IO bottleneck on OS level.  It just 
> looks like solr server is blocking internally itself.
>
>
> <   ${solr.autoCommit.maxTime:360}
> ---
>>  ${solr.autoCommit.maxTime:15000}
> 16c16
> <   true
> ---
>>  false
>
>
>
> 190318-162811.610-189982 DBG1:doc_count: 10 , doc_size: 539  KB, Res code: 
> 200, QTime: 1405 ms, Request time: 1407 ms.
> 190318-162811.636-189968 DBG1:doc_count: 10 , doc_size: 465  KB, Res code: 
> 200, QTime: 1357 ms, Request time: 1360 ms.
> 190318-162811.732-189968 DBG1:doc_count: 10 , doc_size: 473  KB, Res code: 
> 200, QTime: 90 ms, Request time: 92 ms.
> 190318-162811.995-189981 DBG1:doc_count: 10 , doc_size: 610  KB, Res code: 
> 200, QTime: 5306 ms, Request time: 5308 ms.
> 190318-162814.873-190003 DBG1:doc_count: 10 , doc_size: 508  KB, Res code: 
> 200, QTime: 4775 ms, Request time: 4777 ms.
> 190318-162814.889-189972 DBG1:doc_count: 10 , doc_size: 563  KB, Res code: 
> 200, QTime: 20222 ms, Request time: 20224 ms.
> 190318-162814.975-191817 DBG1:doc_count: 10 , doc_size: 539  KB, Res code: 
> 200, QTime: 27732 ms, Request time: 27735 ms.
> 190318-162814.975-189958 DBG1:doc_count: 10 , doc_size: 616  KB, Res code: 
> 200, QTime: 28106 ms, Request time: 28109 ms.
> 190318-162814.975-190004 DBG1:doc_count: 10 , doc_size: 473  KB, Res code: 
> 200, QTime: 16703 ms, Request time: 16706 ms.
> 190318-162814.982-189963 DBG1:doc_count: 10 , doc_size: 540  KB, Res code: 
> 200, QTime: 28216 ms, Request time: 28218 ms.
> 190318-162814.988-190007 DBG1:doc_count: 10 , doc_size: 673  KB, Res code: 
> 200, QTime: 28133 ms, Request time: 28136 ms.
> 190318-162814.993-189962 DBG1:doc_count: 10 , doc_size: 631  KB, Res code: 
> 200, QTime: 27909 ms, Reques

Re: Solr index slow response

2019-03-18 Thread Walter Underwood
2816.354-189972 DBG1:doc_count: 10 , doc_size: 631  KB, Res code: 
> 200, QTime: 130 ms, Request time: 132 ms.
> 190318-162816.471-189972 DBG1:doc_count: 10 , doc_size: 563  KB, Res code: 
> 200, QTime: 111 ms, Request time: 113 ms.
> 190318-162816.586-189972 DBG1:doc_count: 10 , doc_size: 554  KB, Res code: 
> 200, QTime: 110 ms, Request time: 111 ms.
> 190318-162816.716-189972 DBG1:doc_count: 10 , doc_size: 590  KB, Res code: 
> 200, QTime: 124 ms, Request time: 125 ms.
> 190318-162820.494-189972 DBG1:doc_count: 10 , doc_size: 583  KB, Res code: 
> 200, QTime: 3772 ms, Request time: 3774 ms.
> 190318-162820.574-189953 DBG1:doc_count: 10 , doc_size: 550  KB, Res code: 
> 200, QTime: 32802 ms, Request time: 32804 ms.
> 190318-162820.609-189991 DBG1:doc_count: 10 , doc_size: 586  KB, Res code: 
> 200, QTime: 33787 ms, Request time: 33790 ms.
> 190318-162820.621-189976 DBG1:doc_count: 10 , doc_size: 572  KB, Res code: 
> 200, QTime: 33397 ms, Request time: 33400 ms.
> 190318-162820.622-189985 DBG1:doc_count: 10 , doc_size: 498  KB, Res code: 
> 200, QTime: 32917 ms, Request time: 32919 ms.
> 190318-162820.987-190005 DBG1:doc_count: 10 , doc_size: 629  KB, Res code: 
> 200, QTime: 22207 ms, Request time: 22209 ms.
> 190318-162821.028-189979 DBG1:doc_count: 10 , doc_size: 584  KB, Res code: 
> 200, QTime: 22800 ms, Request time: 22802 ms.
> 190318-162821.056-189948 DBG1:doc_count: 10 , doc_size: 670  KB, Res code: 
> 200, QTime: 34193 ms, Request time: 34195 ms.
> 190318-162821.062-189983 DBG1:doc_count: 10 , doc_size: 675  KB, Res code: 
> 200, QTime: 22250 ms, Request time: 22252 ms.
> 
> 
> 
> 
> BRs
> 
> //Aaron
> 
> 
> From: Chris Ulicny 
> Sent: Monday, March 18, 2019 2:54:25 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr index slow response
> 
> One other thing to look at besides the heap is your commit settings. We've
> experienced something similar, and changing commit settings alleviated the
> issue.
> 
> Are you opening a search on every hardcommit? If so, you might want to
> reconsider and use the softcommit for the hourly creation of a new searcher.
> 
> The hardcommit interval should be much lower. Probably something on the
> order of seconds (15000) instead of hours (currently 360). When the
> hard commit fires, numerous merges might be firing off in the background
> due to the volume of documents you are indexing, which might explain the
> periodic bad response times shown in your logs.
> 
> It would depend on your specific scenario, but here's our setup. During
> long periods of constant indexing of documents to a staging collection (~2
> billion documents), we have following commit settings
> 
> softcommit: 360ms (for periodic validation of data, since it's not in
> production)
> hardcommit: openSearcher -> false, 15000ms (no document limit)
> 
> This makes the documents available for searching every hour, but doesn't
> result in the large bursts of IO due to the infrequent hard commits.
> 
> For more info, Erick Erickson has a great write up:
> https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> 
> Best,
> Chris
> 
> On Mon, Mar 18, 2019 at 9:36 AM Aaron Yingcai Sun  wrote:
> 
>> Hi, Emir,
>> 
>> My system used to run with max 32GB, the response time is bad as well.
>> swap is set to 4GB, there 3.2 free, I doubt swap would affect it since
>> there is such huge free memory.
>> 
>> I could try to with set Xms and Xmx to the same value, but I doubt how
>> much would that change the response time.
>> 
>> 
>> BRs
>> 
>> //Aaron
>> 
>> 
>> From: Emir Arnautović 
>> Sent: Monday, March 18, 2019 2:19:19 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr index slow response
>> 
>> Hi Aaron,
>> Without looking too much into numbers, my bet would be that it is large
>> heap that is causing issues. I would decrease is significantly (<30GB) and
>> see if it is enough for your max load. Also, disable swap or reduce
>> swappiness to min.
>> 
>> In any case, you should install some monitoring tool that would help you
>> do better analysis when you run into problems. One such tool is our
>> monitoring solution: https://sematext.com/spm
>> 
>> HTH,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> 
>> 
>> 
>>> On 18 Mar 2019, at 13:14, Aaron Yingcai Sun  wrote:
>>

Re: Solr index slow response

2019-03-18 Thread Aaron Yingcai Sun
Hello, Chris


Thanks for the tips.  I tried the settings you suggested, but did not see much
improvement.  Since I don't need documents visible immediately, softCommit is
disabled entirely.

The slow responses happen every few seconds; if they happened hourly I would
suspect the hourly auto-commit, but they happen much more frequently.  I don't
see any CPU/RAM/network IO/disk IO bottleneck at the OS level.  It just looks like
the Solr server is blocking internally.


<   <maxTime>${solr.autoCommit.maxTime:360}</maxTime>
---
>   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
16c16
<   <openSearcher>true</openSearcher>
---
>   <openSearcher>false</openSearcher>



190318-162811.610-189982 DBG1:doc_count: 10 , doc_size: 539  KB, Res code: 200, 
QTime: 1405 ms, Request time: 1407 ms.
190318-162811.636-189968 DBG1:doc_count: 10 , doc_size: 465  KB, Res code: 200, 
QTime: 1357 ms, Request time: 1360 ms.
190318-162811.732-189968 DBG1:doc_count: 10 , doc_size: 473  KB, Res code: 200, 
QTime: 90 ms, Request time: 92 ms.
190318-162811.995-189981 DBG1:doc_count: 10 , doc_size: 610  KB, Res code: 200, 
QTime: 5306 ms, Request time: 5308 ms.
190318-162814.873-190003 DBG1:doc_count: 10 , doc_size: 508  KB, Res code: 200, 
QTime: 4775 ms, Request time: 4777 ms.
190318-162814.889-189972 DBG1:doc_count: 10 , doc_size: 563  KB, Res code: 200, 
QTime: 20222 ms, Request time: 20224 ms.
190318-162814.975-191817 DBG1:doc_count: 10 , doc_size: 539  KB, Res code: 200, 
QTime: 27732 ms, Request time: 27735 ms.
190318-162814.975-189958 DBG1:doc_count: 10 , doc_size: 616  KB, Res code: 200, 
QTime: 28106 ms, Request time: 28109 ms.
190318-162814.975-190004 DBG1:doc_count: 10 , doc_size: 473  KB, Res code: 200, 
QTime: 16703 ms, Request time: 16706 ms.
190318-162814.982-189963 DBG1:doc_count: 10 , doc_size: 540  KB, Res code: 200, 
QTime: 28216 ms, Request time: 28218 ms.
190318-162814.988-190007 DBG1:doc_count: 10 , doc_size: 673  KB, Res code: 200, 
QTime: 28133 ms, Request time: 28136 ms.
190318-162814.993-189962 DBG1:doc_count: 10 , doc_size: 631  KB, Res code: 200, 
QTime: 27909 ms, Request time: 27912 ms.
190318-162814.996-191818 DBG1:doc_count: 10 , doc_size: 529  KB, Res code: 200, 
QTime: 28172 ms, Request time: 28174 ms.
190318-162815.056-189986 DBG1:doc_count: 10 , doc_size: 602  KB, Res code: 200, 
QTime: 11375 ms, Request time: 11378 ms.
190318-162815.100-189988 DBG1:doc_count: 10 , doc_size: 530  KB, Res code: 200, 
QTime: 8663 ms, Request time: 8666 ms.
190318-162815.275-189980 DBG1:doc_count: 10 , doc_size: 553  KB, Res code: 200, 
QTime: 27526 ms, Request time: 27528 ms.
190318-162815.283-189997 DBG1:doc_count: 10 , doc_size: 600  KB, Res code: 200, 
QTime: 27529 ms, Request time: 27535 ms.
190318-162815.318-189961 DBG1:doc_count: 10 , doc_size: 661  KB, Res code: 200, 
QTime: 16617 ms, Request time: 16621 ms.
190318-162815.484-189952 DBG1:doc_count: 10 , doc_size: 549  KB, Res code: 200, 
QTime: 11707 ms, Request time: 11711 ms.
190318-162815.485-189993 DBG1:doc_count: 10 , doc_size: 618  KB, Res code: 200, 
QTime: 28315 ms, Request time: 28321 ms.
190318-162816.216-189972 DBG1:doc_count: 10 , doc_size: 493  KB, Res code: 200, 
QTime: 1320 ms, Request time: 1322 ms.
190318-162816.354-189972 DBG1:doc_count: 10 , doc_size: 631  KB, Res code: 200, 
QTime: 130 ms, Request time: 132 ms.
190318-162816.471-189972 DBG1:doc_count: 10 , doc_size: 563  KB, Res code: 200, 
QTime: 111 ms, Request time: 113 ms.
190318-162816.586-189972 DBG1:doc_count: 10 , doc_size: 554  KB, Res code: 200, 
QTime: 110 ms, Request time: 111 ms.
190318-162816.716-189972 DBG1:doc_count: 10 , doc_size: 590  KB, Res code: 200, 
QTime: 124 ms, Request time: 125 ms.
190318-162820.494-189972 DBG1:doc_count: 10 , doc_size: 583  KB, Res code: 200, 
QTime: 3772 ms, Request time: 3774 ms.
190318-162820.574-189953 DBG1:doc_count: 10 , doc_size: 550  KB, Res code: 200, 
QTime: 32802 ms, Request time: 32804 ms.
190318-162820.609-189991 DBG1:doc_count: 10 , doc_size: 586  KB, Res code: 200, 
QTime: 33787 ms, Request time: 33790 ms.
190318-162820.621-189976 DBG1:doc_count: 10 , doc_size: 572  KB, Res code: 200, 
QTime: 33397 ms, Request time: 33400 ms.
190318-162820.622-189985 DBG1:doc_count: 10 , doc_size: 498  KB, Res code: 200, 
QTime: 32917 ms, Request time: 32919 ms.
190318-162820.987-190005 DBG1:doc_count: 10 , doc_size: 629  KB, Res code: 200, 
QTime: 22207 ms, Request time: 22209 ms.
190318-162821.028-189979 DBG1:doc_count: 10 , doc_size: 584  KB, Res code: 200, 
QTime: 22800 ms, Request time: 22802 ms.
190318-162821.056-189948 DBG1:doc_count: 10 , doc_size: 670  KB, Res code: 200, 
QTime: 34193 ms, Request time: 34195 ms.
190318-162821.062-189983 DBG1:doc_count: 10 , doc_size: 675  KB, Res code: 200, 
QTime: 22250 ms, Request time: 22252 ms.




BRs

//Aaron


From: Chris Ulicny 
Sent: Monday, March 18, 2019 2:54:25 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr index slow response

One other thing to look at besides the heap is your commit settings. We've
ex

Re: Solr index slow response

2019-03-18 Thread Chris Ulicny
One other thing to look at besides the heap is your commit settings. We've
experienced something similar, and changing commit settings alleviated the
issue.

Are you opening a searcher on every hardcommit? If so, you might want to
reconsider and use the softcommit for the hourly creation of a new searcher.

The hardcommit interval should be much lower. Probably something on the
order of seconds (15000) instead of hours (currently 360). When the
hard commit fires, numerous merges might be firing off in the background
due to the volume of documents you are indexing, which might explain the
periodic bad response times shown in your logs.

It would depend on your specific scenario, but here's our setup. During
long periods of constant indexing of documents to a staging collection (~2
billion documents), we have the following commit settings:

softcommit: 360ms (for periodic validation of data, since it's not in
production)
hardcommit: openSearcher -> false, 15000ms (no document limit)

This makes the documents available for searching every hour, but doesn't
result in the large bursts of IO that infrequent hard commits would cause.
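
In solrconfig.xml terms, a sketch of that kind of setup would look roughly like
this (the values are illustrative, not a recommendation for any specific workload):

<autoCommit>
  <maxTime>15000</maxTime>            <!-- hard commit every 15 seconds -->
  <openSearcher>false</openSearcher>  <!-- flush to disk without opening a new searcher -->
</autoCommit>
<autoSoftCommit>
  <maxTime>3600000</maxTime>          <!-- soft commit hourly: makes documents visible to searches -->
</autoSoftCommit>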

For more info, Erick Erickson has a great write up:
https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Chris

On Mon, Mar 18, 2019 at 9:36 AM Aaron Yingcai Sun  wrote:

> Hi, Emir,
>
> My system used to run with max 32GB, the response time is bad as well.
> swap is set to 4GB, there 3.2 free, I doubt swap would affect it since
> there is such huge free memory.
>
> I could try to with set Xms and Xmx to the same value, but I doubt how
> much would that change the response time.
>
>
> BRs
>
> //Aaron
>
> 
> From: Emir Arnautović 
> Sent: Monday, March 18, 2019 2:19:19 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr index slow response
>
> Hi Aaron,
> Without looking too much into numbers, my bet would be that it is large
> heap that is causing issues. I would decrease is significantly (<30GB) and
> see if it is enough for your max load. Also, disable swap or reduce
> swappiness to min.
>
> In any case, you should install some monitoring tool that would help you
> do better analysis when you run into problems. One such tool is our
> monitoring solution: https://sematext.com/spm
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 18 Mar 2019, at 13:14, Aaron Yingcai Sun  wrote:
> >
> > Hello, Emir,
> >
> > Thanks for the reply, this is the solr version and heap info, standalone
> single solr server. I don't have monitor tool connected. only look at
> 'top', has not seen cpu spike so far, when the slow response happens, cpu
> usage is not high at all, around 30%.
> >
> >
> > # curl 'http://.../solr/admin/info/system?wt=json&indent=true'
> > {
> >  "responseHeader":{
> >"status":0,
> >"QTime":27},
> >  "mode":"std",
> >  "solr_home":"/ardome/solr",
> >  "lucene":{
> >"solr-spec-version":"6.5.1",
> >"solr-impl-version":"6.5.1 cd1f23c63abe03ae650c75ec8ccb37762806cc75 -
> jimczi - 2017-04-21 12:23:42",
> >"lucene-spec-version":"6.5.1",
> >"lucene-impl-version":"6.5.1 cd1f23c63abe03ae650c75ec8ccb37762806cc75
> - jimczi - 2017-04-21 12:17:15"},
> >  "jvm":{
> >"version":"1.8.0_144 25.144-b01",
> >"name":"Oracle Corporation Java HotSpot(TM) 64-Bit Server VM",
> >"spec":{
> >  "vendor":"Oracle Corporation",
> >  "name":"Java Platform API Specification",
> >  "version":"1.8"},
> >"jre":{
> >  "vendor":"Oracle Corporation",
> >  "version":"1.8.0_144"},
> >"vm":{
> >  "vendor":"Oracle Corporation",
> >  "name":"Java HotSpot(TM) 64-Bit Server VM",
> >  "version":"25.144-b01"},
> >"processors":32,
> >"memory":{
> >  "free":"69.1 GB",
> >  "total":"180.2 GB",
> >  "max":"266.7 GB",
> >  "used":"111 GB (%41.6)",
> >  "raw":{
> >"free":74238728336,

Re: Solr index slow response

2019-03-18 Thread Emir Arnautović
4GB of swap on a 400GB machine does not make much sense, so disable it. Even with
4GB, some pages might get swapped out, and if those are Solr pages, it will affect
Solr.

Setting Xms and Xmx to the same value will not solve your issue, but you will
avoid heap resizes once the heap grows beyond Xms.

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 18 Mar 2019, at 14:36, Aaron Yingcai Sun  wrote:
> 
> Hi, Emir,
> 
> My system used to run with max 32GB, the response time is bad as well.  swap 
> is set to 4GB, there 3.2 free, I doubt swap would affect it since there is 
> such huge free memory.
> 
> I could try to with set Xms and Xmx to the same value, but I doubt how much 
> would that change the response time.
> 
> 
> BRs
> 
> //Aaron
> 
> 
> From: Emir Arnautović 
> Sent: Monday, March 18, 2019 2:19:19 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr index slow response
> 
> Hi Aaron,
> Without looking too much into numbers, my bet would be that it is large heap 
> that is causing issues. I would decrease is significantly (<30GB) and see if 
> it is enough for your max load. Also, disable swap or reduce swappiness to 
> min.
> 
> In any case, you should install some monitoring tool that would help you do 
> better analysis when you run into problems. One such tool is our monitoring 
> solution: https://sematext.com/spm
> 
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> 
> 
> 
>> On 18 Mar 2019, at 13:14, Aaron Yingcai Sun  wrote:
>> 
>> Hello, Emir,
>> 
>> Thanks for the reply, this is the solr version and heap info, standalone 
>> single solr server. I don't have monitor tool connected. only look at 'top', 
>> has not seen cpu spike so far, when the slow response happens, cpu usage is 
>> not high at all, around 30%.
>> 
>> 
>> # curl 'http://.../solr/admin/info/system?wt=json&indent=true'
>> {
>> "responseHeader":{
>>   "status":0,
>>   "QTime":27},
>> "mode":"std",
>> "solr_home":"/ardome/solr",
>> "lucene":{
>>   "solr-spec-version":"6.5.1",
>>   "solr-impl-version":"6.5.1 cd1f23c63abe03ae650c75ec8ccb37762806cc75 - 
>> jimczi - 2017-04-21 12:23:42",
>>   "lucene-spec-version":"6.5.1",
>>   "lucene-impl-version":"6.5.1 cd1f23c63abe03ae650c75ec8ccb37762806cc75 - 
>> jimczi - 2017-04-21 12:17:15"},
>> "jvm":{
>>   "version":"1.8.0_144 25.144-b01",
>>   "name":"Oracle Corporation Java HotSpot(TM) 64-Bit Server VM",
>>   "spec":{
>> "vendor":"Oracle Corporation",
>> "name":"Java Platform API Specification",
>> "version":"1.8"},
>>   "jre":{
>> "vendor":"Oracle Corporation",
>> "version":"1.8.0_144"},
>>   "vm":{
>> "vendor":"Oracle Corporation",
>> "name":"Java HotSpot(TM) 64-Bit Server VM",
>> "version":"25.144-b01"},
>>   "processors":32,
>>   "memory":{
>> "free":"69.1 GB",
>> "total":"180.2 GB",
>> "max":"266.7 GB",
>> "used":"111 GB (%41.6)",
>> "raw":{
>>   "free":74238728336,
>>   "total":193470136320,
>>   "max":286331502592,
>>   "used":119231407984,
>>   "used%":41.64103736566334}},
>>   "jmx":{
>> 
>> "bootclasspath":"/usr/java/jdk1.8.0_144/jre/lib/resources.jar:/usr/java/jdk1.8.0_144/jre/lib/rt.jar:/usr/java/jdk1.8.0_144/jre/lib/sunrsasign.jar:/usr/java/jdk1.8.0_144/jre/lib/jsse.jar:/usr/java/jdk1.8.0_144/jre/lib/jce.jar:/usr/java/jdk1.8.0_144/jre/lib/charsets.jar:/usr/java/jdk1.8.0_144/jre/lib/jfr.jar:/usr/java/jdk1.8.0_144/jre/classes",
>> "classpath":"...",
>> "commandLineArgs":["-Xms100G",
>>   "-Xmx300G",
>>   "-DSTOP.PORT=8079",
>>   "-DSTOP.KEY=..",
>>   "-Dsolr.solr.home=..",
>>   &qu

Re: Solr index slow response

2019-03-18 Thread Emir Arnautović
 , doc_size: 672  KB, Res code: 
> 200, QTime: 124 ms, Request time: 125 ms.
> 190318-142655.210-160208 DBG1:doc_count: 10 , doc_size: 605  KB, Res code: 
> 200, QTime: 108 ms, Request time: 110 ms.
> 190318-142655.304-160208 DBG1:doc_count: 10 , doc_size: 481  KB, Res code: 
> 200, QTime: 89 ms, Request time: 90 ms.
> 190318-142655.410-160208 DBG1:doc_count: 10 , doc_size: 468  KB, Res code: 
> 200, QTime: 101 ms, Request time: 102 ms.
> 
> 
> 
> From: Toke Eskildsen 
> Sent: Monday, March 18, 2019 2:13:12 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr index slow response
> 
> On Mon, 2019-03-18 at 10:47 +, Aaron Yingcai Sun wrote:
>> Solr server is running on a quit powerful server, 32 cpus, 400GB RAM,
>> while 300 GB is reserved for solr, [...]
> 
> 300GB for Solr sounds excessive.
> 
>> Our application send 100 documents to solr per request, json encoded.
>> the size is around 5M each time. some times the response time is
>> under 1 seconds, some times could be 300 seconds, the slow response
>> happens very often.
>> ...
>> There are around 100 clients sending those documents at the same
>> time, but each for the client is blocking call which wait the http
>> response then send the next one.
> 
> 100 clients * 5MB/batch = at most 500MB. Or maybe you meant 100 clients
> * 100 documents * 5MB/document = at most 50GB? Either way it is a long
> way from 300GB and the stats you provide further down the thread
> indicates that you are overprovisioning quite a lot:
> 
> "memory":{
>  "free":"69.1 GB",
>  "total":"180.2 GB",
>  "max":"266.7 GB",
>  "used":"111 GB (%41.6)",
> 
> Intermittent slow response times are a known effect of having large
> Java heaps, due to stop-the-world garbage collections.
> 
> Try dialing Xmx _way_ down: If your batches are only 5MB each, try
> Xmx=20g or less. I know that the stats above says that Solr uses 111GB,
> but the JVM has a tendency to expand the heap quite a lot when it is
> getting hammered. If you want to check beforehand, you can see how much
> memeory is freed from full GCs in the GC-log.
> 
> - Toke Eskildsen, Royal Danish Library
> 
> 



Re: Solr index slow response

2019-03-18 Thread Aaron Yingcai Sun
Hi, Emir,

My system used to run with a 32GB maximum, and the response time was bad as well.
Swap is set to 4GB, with 3.2GB free; I doubt swap affects it, since there is so
much free memory.

I could try setting Xms and Xmx to the same value, but I doubt how much that
would change the response time.


BRs

//Aaron


From: Emir Arnautović 
Sent: Monday, March 18, 2019 2:19:19 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr index slow response

Hi Aaron,
Without looking too much into numbers, my bet would be that it is large heap 
that is causing issues. I would decrease is significantly (<30GB) and see if it 
is enough for your max load. Also, disable swap or reduce swappiness to min.

In any case, you should install some monitoring tool that would help you do 
better analysis when you run into problems. One such tool is our monitoring 
solution: https://sematext.com/spm

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 18 Mar 2019, at 13:14, Aaron Yingcai Sun  wrote:
>
> Hello, Emir,
>
> Thanks for the reply, this is the solr version and heap info, standalone 
> single solr server. I don't have monitor tool connected. only look at 'top', 
> has not seen cpu spike so far, when the slow response happens, cpu usage is 
> not high at all, around 30%.
>
>
> # curl 'http://.../solr/admin/info/system?wt=json&indent=true'
> {
>  "responseHeader":{
>"status":0,
>"QTime":27},
>  "mode":"std",
>  "solr_home":"/ardome/solr",
>  "lucene":{
>"solr-spec-version":"6.5.1",
>"solr-impl-version":"6.5.1 cd1f23c63abe03ae650c75ec8ccb37762806cc75 - 
> jimczi - 2017-04-21 12:23:42",
>"lucene-spec-version":"6.5.1",
>"lucene-impl-version":"6.5.1 cd1f23c63abe03ae650c75ec8ccb37762806cc75 - 
> jimczi - 2017-04-21 12:17:15"},
>  "jvm":{
>"version":"1.8.0_144 25.144-b01",
>"name":"Oracle Corporation Java HotSpot(TM) 64-Bit Server VM",
>"spec":{
>  "vendor":"Oracle Corporation",
>  "name":"Java Platform API Specification",
>  "version":"1.8"},
>"jre":{
>  "vendor":"Oracle Corporation",
>  "version":"1.8.0_144"},
>"vm":{
>  "vendor":"Oracle Corporation",
>  "name":"Java HotSpot(TM) 64-Bit Server VM",
>  "version":"25.144-b01"},
>"processors":32,
>"memory":{
>  "free":"69.1 GB",
>  "total":"180.2 GB",
>  "max":"266.7 GB",
>  "used":"111 GB (%41.6)",
>  "raw":{
>"free":74238728336,
>"total":193470136320,
>"max":286331502592,
>"used":119231407984,
>"used%":41.64103736566334}},
>"jmx":{
>  
> "bootclasspath":"/usr/java/jdk1.8.0_144/jre/lib/resources.jar:/usr/java/jdk1.8.0_144/jre/lib/rt.jar:/usr/java/jdk1.8.0_144/jre/lib/sunrsasign.jar:/usr/java/jdk1.8.0_144/jre/lib/jsse.jar:/usr/java/jdk1.8.0_144/jre/lib/jce.jar:/usr/java/jdk1.8.0_144/jre/lib/charsets.jar:/usr/java/jdk1.8.0_144/jre/lib/jfr.jar:/usr/java/jdk1.8.0_144/jre/classes",
>  "classpath":"...",
>  "commandLineArgs":["-Xms100G",
>"-Xmx300G",
>"-DSTOP.PORT=8079",
>"-DSTOP.KEY=..",
>"-Dsolr.solr.home=..",
>"-Djetty.port=8983"],
>  "startTime":"2019-03-18T09:35:27.892Z",
>  "upTimeMS":9258422}},
>  "system":{
>"name":"Linux",
>"arch":"amd64",
>"availableProcessors":32,
>"systemLoadAverage":14.72,
>"version":"3.0.101-311.g08a8a9d-default",
>"committedVirtualMemorySize":2547960700928,
>"freePhysicalMemorySize":4530696192,
>"freeSwapSpaceSize":3486846976,
>"processCpuLoad":0.3257436126790475,
>"processCpuTime":9386945000,
>"systemCpuLoad":0.3279781055816521,
>"totalPhysicalMemorySize":406480175104,
>"totalSwapSpaceSize":4302303232,
>"maxFileDescriptorCount&

Re: Solr index slow response

2019-03-18 Thread Aaron Yingcai Sun
I'm a bit confused: why would a large heap size make it slower?  Doesn't that give
it enough room so that it is not busy doing GC all the time?

My HTTP/JSON request contains 100 documents; the total size of the 100
documents is around 5M, and there are ~100 clients sending those requests
continuously.

Previously the JVM was set to a 32 GB maximum and the speed was even worse; now
it is running with min 100GB, max 300GB, and it uses around 100GB.


This page suggests using a smaller number of documents per request:
https://wiki.apache.org/solr/SolrPerformanceProblems

SolrPerformanceProblems - Solr 
Wiki<https://wiki.apache.org/solr/SolrPerformanceProblems>
wiki.apache.org
General information. There is a performance bug that makes *everything* slow in 
versions 6.4.0 and 6.4.1. The problem is fixed in 6.4.2. It is described by 
SOLR-10130.This is highly version specific, so if you are not running one of 
the affected versions, don't worry about it.

So I tried to reduce the number, but I still get lots of large QTime values in the responses:

190318-142652.695-160214 DBG1:doc_count: 10 , doc_size: 609  KB, Res code: 200, 
QTime: 47918 ms, Request time: 47921 ms.
190318-142652.704-160179 DBG1:doc_count: 10 , doc_size: 568  KB, Res code: 200, 
QTime: 36919 ms, Request time: 36922 ms.
190318-142652.780-160197 DBG1:doc_count: 10 , doc_size: 609  KB, Res code: 200, 
QTime: 36082 ms, Request time: 36084 ms.
190318-142652.859-160200 DBG1:doc_count: 10 , doc_size: 569  KB, Res code: 200, 
QTime: 36880 ms, Request time: 36882 ms.
190318-142653.131-160148 DBG1:doc_count: 10 , doc_size: 608  KB, Res code: 200, 
QTime: 37222 ms, Request time: 37224 ms.
190318-142653.154-160211 DBG1:doc_count: 10 , doc_size: 541  KB, Res code: 200, 
QTime: 37241 ms, Request time: 37243 ms.
190318-142653.223-163490 DBG1:doc_count: 10 , doc_size: 589  KB, Res code: 200, 
QTime: 37174 ms, Request time: 37176 ms.
190318-142653.359-160154 DBG1:doc_count: 10 , doc_size: 592  KB, Res code: 200, 
QTime: 37008 ms, Request time: 37011 ms.
190318-142653.497-163491 DBG1:doc_count: 10 , doc_size: 583  KB, Res code: 200, 
QTime: 24828 ms, Request time: 24830 ms.
190318-142653.987-160208 DBG1:doc_count: 10 , doc_size: 669  KB, Res code: 200, 
QTime: 23900 ms, Request time: 23902 ms.
190318-142654.114-160208 DBG1:doc_count: 10 , doc_size: 544  KB, Res code: 200, 
QTime: 121 ms, Request time: 122 ms.
190318-142654.233-160208 DBG1:doc_count: 10 , doc_size: 536  KB, Res code: 200, 
QTime: 113 ms, Request time: 115 ms.
190318-142654.354-160208 DBG1:doc_count: 10 , doc_size: 598  KB, Res code: 200, 
QTime: 116 ms, Request time: 117 ms.
190318-142654.466-160208 DBG1:doc_count: 10 , doc_size: 546  KB, Res code: 200, 
QTime: 107 ms, Request time: 108 ms.
190318-142654.586-160208 DBG1:doc_count: 10 , doc_size: 566  KB, Res code: 200, 
QTime: 114 ms, Request time: 115 ms.
190318-142654.687-160208 DBG1:doc_count: 10 , doc_size: 541  KB, Res code: 200, 
QTime: 96 ms, Request time: 98 ms.
190318-142654.768-160208 DBG1:doc_count: 10 , doc_size: 455  KB, Res code: 200, 
QTime: 75 ms, Request time: 77 ms.
190318-142654.870-160208 DBG1:doc_count: 10 , doc_size: 538  KB, Res code: 200, 
QTime: 97 ms, Request time: 98 ms.
190318-142654.967-160208 DBG1:doc_count: 10 , doc_size: 539  KB, Res code: 200, 
QTime: 92 ms, Request time: 93 ms.
190318-142655.096-160208 DBG1:doc_count: 10 , doc_size: 672  KB, Res code: 200, 
QTime: 124 ms, Request time: 125 ms.
190318-142655.210-160208 DBG1:doc_count: 10 , doc_size: 605  KB, Res code: 200, 
QTime: 108 ms, Request time: 110 ms.
190318-142655.304-160208 DBG1:doc_count: 10 , doc_size: 481  KB, Res code: 200, 
QTime: 89 ms, Request time: 90 ms.
190318-142655.410-160208 DBG1:doc_count: 10 , doc_size: 468  KB, Res code: 200, 
QTime: 101 ms, Request time: 102 ms.



From: Toke Eskildsen 
Sent: Monday, March 18, 2019 2:13:12 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr index slow response

On Mon, 2019-03-18 at 10:47 +, Aaron Yingcai Sun wrote:
> Solr server is running on a quit powerful server, 32 cpus, 400GB RAM,
> while 300 GB is reserved for solr, [...]

300GB for Solr sounds excessive.

> Our application send 100 documents to solr per request, json encoded.
> the size is around 5M each time. some times the response time is
> under 1 seconds, some times could be 300 seconds, the slow response
> happens very often.
> ...
> There are around 100 clients sending those documents at the same
> time, but each for the client is blocking call which wait the http
> response then send the next one.

100 clients * 5MB/batch = at most 500MB. Or maybe you meant 100 clients
* 100 documents * 5MB/document = at most 50GB? Either way it is a long
way from 300GB and the stats you provide further down the thread
indicates that you are overprovisioning quite a lot:

"memory":{
  "free":"69.1 GB",
  "total":"180.2 GB",
  "

Re: Solr index slow response

2019-03-18 Thread Emir Arnautović
One more thing - it is considered a good practice to use the same value for Xmx 
and Xms.
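
(For example, with the stock solr.in.sh startup script that would be something
like the following; the 8g value is only a placeholder, not a sizing recommendation:

SOLR_JAVA_MEM="-Xms8g -Xmx8g"
)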

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 18 Mar 2019, at 14:19, Emir Arnautović  
> wrote:
> 
> Hi Aaron,
> Without looking too much into numbers, my bet would be that it is large heap 
> that is causing issues. I would decrease is significantly (<30GB) and see if 
> it is enough for your max load. Also, disable swap or reduce swappiness to 
> min.
> 
> In any case, you should install some monitoring tool that would help you do 
> better analysis when you run into problems. One such tool is our monitoring 
> solution: https://sematext.com/spm
> 
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> 
> 
> 
>> On 18 Mar 2019, at 13:14, Aaron Yingcai Sun  wrote:
>> 
>> Hello, Emir,
>> 
>> Thanks for the reply, this is the solr version and heap info, standalone 
>> single solr server. I don't have monitor tool connected. only look at 'top', 
>> has not seen cpu spike so far, when the slow response happens, cpu usage is 
>> not high at all, around 30%.
>> 
>> 
>> # curl 'http://.../solr/admin/info/system?wt=json&indent=true'
>> {
>> "responseHeader":{
>>   "status":0,
>>   "QTime":27},
>> "mode":"std",
>> "solr_home":"/ardome/solr",
>> "lucene":{
>>   "solr-spec-version":"6.5.1",
>>   "solr-impl-version":"6.5.1 cd1f23c63abe03ae650c75ec8ccb37762806cc75 - 
>> jimczi - 2017-04-21 12:23:42",
>>   "lucene-spec-version":"6.5.1",
>>   "lucene-impl-version":"6.5.1 cd1f23c63abe03ae650c75ec8ccb37762806cc75 - 
>> jimczi - 2017-04-21 12:17:15"},
>> "jvm":{
>>   "version":"1.8.0_144 25.144-b01",
>>   "name":"Oracle Corporation Java HotSpot(TM) 64-Bit Server VM",
>>   "spec":{
>> "vendor":"Oracle Corporation",
>> "name":"Java Platform API Specification",
>> "version":"1.8"},
>>   "jre":{
>> "vendor":"Oracle Corporation",
>> "version":"1.8.0_144"},
>>   "vm":{
>> "vendor":"Oracle Corporation",
>> "name":"Java HotSpot(TM) 64-Bit Server VM",
>> "version":"25.144-b01"},
>>   "processors":32,
>>   "memory":{
>> "free":"69.1 GB",
>> "total":"180.2 GB",
>> "max":"266.7 GB",
>> "used":"111 GB (%41.6)",
>> "raw":{
>>   "free":74238728336,
>>   "total":193470136320,
>>   "max":286331502592,
>>   "used":119231407984,
>>   "used%":41.64103736566334}},
>>   "jmx":{
>> 
>> "bootclasspath":"/usr/java/jdk1.8.0_144/jre/lib/resources.jar:/usr/java/jdk1.8.0_144/jre/lib/rt.jar:/usr/java/jdk1.8.0_144/jre/lib/sunrsasign.jar:/usr/java/jdk1.8.0_144/jre/lib/jsse.jar:/usr/java/jdk1.8.0_144/jre/lib/jce.jar:/usr/java/jdk1.8.0_144/jre/lib/charsets.jar:/usr/java/jdk1.8.0_144/jre/lib/jfr.jar:/usr/java/jdk1.8.0_144/jre/classes",
>> "classpath":"...",
>> "commandLineArgs":["-Xms100G",
>>   "-Xmx300G",
>>   "-DSTOP.PORT=8079",
>>   "-DSTOP.KEY=..",
>>   "-Dsolr.solr.home=..",
>>   "-Djetty.port=8983"],
>> "startTime":"2019-03-18T09:35:27.892Z",
>> "upTimeMS":9258422}},
>> "system":{
>>   "name":"Linux",
>>   "arch":"amd64",
>>   "availableProcessors":32,
>>   "systemLoadAverage":14.72,
>>   "version":"3.0.101-311.g08a8a9d-default",
>>   "committedVirtualMemorySize":2547960700928,
>>   "freePhysicalMemorySize":4530696192,
>>   "freeSwapSpaceSize":3486846976,
>>   "processCpuLoad":0.3257436126790475,
>>   "processCpuTime":9386945000,
>>   "systemCpuLoad":0.3279781055

Re: Solr index slow response

2019-03-18 Thread Emir Arnautović
Hi Aaron,
Without looking too much into the numbers, my bet would be that it is the large
heap that is causing issues. I would decrease it significantly (<30GB) and see if
that is enough for your max load. Also, disable swap or reduce swappiness to the minimum.
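
(A minimal sketch of reducing swappiness on Linux, assuming root access; the
value and the persistence mechanism depend on the distribution:

sysctl -w vm.swappiness=1                      # apply immediately
echo 'vm.swappiness = 1' >> /etc/sysctl.conf   # keep the setting across reboots
)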

In any case, you should install some monitoring tool that would help you do 
better analysis when you run into problems. One such tool is our monitoring 
solution: https://sematext.com/spm

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 18 Mar 2019, at 13:14, Aaron Yingcai Sun  wrote:
> 
> Hello, Emir,
> 
> Thanks for the reply, this is the solr version and heap info, standalone 
> single solr server. I don't have monitor tool connected. only look at 'top', 
> has not seen cpu spike so far, when the slow response happens, cpu usage is 
> not high at all, around 30%.
> 
> 
> # curl 'http://.../solr/admin/info/system?wt=json&indent=true'
> {
>  "responseHeader":{
>"status":0,
>"QTime":27},
>  "mode":"std",
>  "solr_home":"/ardome/solr",
>  "lucene":{
>"solr-spec-version":"6.5.1",
>"solr-impl-version":"6.5.1 cd1f23c63abe03ae650c75ec8ccb37762806cc75 - 
> jimczi - 2017-04-21 12:23:42",
>"lucene-spec-version":"6.5.1",
>"lucene-impl-version":"6.5.1 cd1f23c63abe03ae650c75ec8ccb37762806cc75 - 
> jimczi - 2017-04-21 12:17:15"},
>  "jvm":{
>"version":"1.8.0_144 25.144-b01",
>"name":"Oracle Corporation Java HotSpot(TM) 64-Bit Server VM",
>"spec":{
>  "vendor":"Oracle Corporation",
>  "name":"Java Platform API Specification",
>  "version":"1.8"},
>"jre":{
>  "vendor":"Oracle Corporation",
>  "version":"1.8.0_144"},
>"vm":{
>  "vendor":"Oracle Corporation",
>  "name":"Java HotSpot(TM) 64-Bit Server VM",
>  "version":"25.144-b01"},
>"processors":32,
>"memory":{
>  "free":"69.1 GB",
>  "total":"180.2 GB",
>  "max":"266.7 GB",
>  "used":"111 GB (%41.6)",
>  "raw":{
>"free":74238728336,
>"total":193470136320,
>"max":286331502592,
>"used":119231407984,
>"used%":41.64103736566334}},
>"jmx":{
>  
> "bootclasspath":"/usr/java/jdk1.8.0_144/jre/lib/resources.jar:/usr/java/jdk1.8.0_144/jre/lib/rt.jar:/usr/java/jdk1.8.0_144/jre/lib/sunrsasign.jar:/usr/java/jdk1.8.0_144/jre/lib/jsse.jar:/usr/java/jdk1.8.0_144/jre/lib/jce.jar:/usr/java/jdk1.8.0_144/jre/lib/charsets.jar:/usr/java/jdk1.8.0_144/jre/lib/jfr.jar:/usr/java/jdk1.8.0_144/jre/classes",
>  "classpath":"...",
>  "commandLineArgs":["-Xms100G",
>"-Xmx300G",
>"-DSTOP.PORT=8079",
>"-DSTOP.KEY=..",
>"-Dsolr.solr.home=..",
>"-Djetty.port=8983"],
>  "startTime":"2019-03-18T09:35:27.892Z",
>  "upTimeMS":9258422}},
>  "system":{
>"name":"Linux",
>"arch":"amd64",
>"availableProcessors":32,
>"systemLoadAverage":14.72,
>"version":"3.0.101-311.g08a8a9d-default",
>"committedVirtualMemorySize":2547960700928,
>"freePhysicalMemorySize":4530696192,
>"freeSwapSpaceSize":3486846976,
>"processCpuLoad":0.3257436126790475,
>"processCpuTime":9386945000,
>"systemCpuLoad":0.3279781055816521,
>"totalPhysicalMemorySize":406480175104,
>"totalSwapSpaceSize":4302303232,
>"maxFileDescriptorCount":32768,
>"openFileDescriptorCount":385,
>"uname":"Linux ... 3.0.101-311.g08a8a9d-default #1 SMP Wed Dec 14 10:15:37 
> UTC 2016 (08a8a9d) x86_64 x86_64 x86_64 GNU/Linux\n",
>"uptime":" 13:09pm  up 5 days 21:23,  7 users,  load average: 14.72, 
> 12.28, 11.48\n"}}
> 
> 
> 
> 
> 
> From: Emir Arnautović 
> Sent: Monday, March 18, 2019 12:10:3

Re: Solr index slow response

2019-03-18 Thread Toke Eskildsen
On Mon, 2019-03-18 at 10:47 +, Aaron Yingcai Sun wrote:
> Solr server is running on a quit powerful server, 32 cpus, 400GB RAM,
> while 300 GB is reserved for solr, [...]

300GB for Solr sounds excessive.

> Our application send 100 documents to solr per request, json encoded.
> the size is around 5M each time. some times the response time is
> under 1 seconds, some times could be 300 seconds, the slow response
> happens very often.
> ...
> There are around 100 clients sending those documents at the same
> time, but each for the client is blocking call which wait the http
> response then send the next one.

100 clients * 5MB/batch = at most 500MB. Or maybe you meant 100 clients
* 100 documents * 5MB/document = at most 50GB? Either way it is a long
way from 300GB, and the stats you provide further down the thread
indicate that you are overprovisioning quite a lot:

"memory":{
  "free":"69.1 GB",
  "total":"180.2 GB",
  "max":"266.7 GB",
  "used":"111 GB (%41.6)",

Intermittent slow response times are a known effect of having large
Java heaps, due to stop-the-world garbage collections. 

Try dialing Xmx _way_ down: if your batches are only 5MB each, try
Xmx=20g or less. I know that the stats above say that Solr uses 111GB,
but the JVM has a tendency to expand the heap quite a lot when it is
getting hammered. If you want to check beforehand, you can see how much
memory is freed by full GCs in the GC-log.
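
(As a sketch, on the Java 8 JVM shown earlier a GC log like that can be produced
with flags along these lines; the log path is a placeholder:

-Xloggc:/path/to/solr_gc.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCApplicationStoppedTime
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=9
-XX:GCLogFileSize=20M

The "Full GC" entries then show heap occupancy before and after each collection,
i.e. how much memory was actually freed.)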

- Toke Eskildsen, Royal Danish Library




Re: Solr index slow response

2019-03-18 Thread Aaron Yingcai Sun
Hello, Emir,

Thanks for the reply. Here are the Solr version and heap info; this is a
standalone single Solr server. I don't have a monitoring tool connected, only
'top', and I have not seen a CPU spike so far: when the slow responses happen,
CPU usage is not high at all, around 30%.


# curl 'http://.../solr/admin/info/system?wt=json&indent=true'
{
  "responseHeader":{
"status":0,
"QTime":27},
  "mode":"std",
  "solr_home":"/ardome/solr",
  "lucene":{
"solr-spec-version":"6.5.1",
"solr-impl-version":"6.5.1 cd1f23c63abe03ae650c75ec8ccb37762806cc75 - 
jimczi - 2017-04-21 12:23:42",
"lucene-spec-version":"6.5.1",
"lucene-impl-version":"6.5.1 cd1f23c63abe03ae650c75ec8ccb37762806cc75 - 
jimczi - 2017-04-21 12:17:15"},
  "jvm":{
"version":"1.8.0_144 25.144-b01",
"name":"Oracle Corporation Java HotSpot(TM) 64-Bit Server VM",
"spec":{
  "vendor":"Oracle Corporation",
  "name":"Java Platform API Specification",
  "version":"1.8"},
"jre":{
  "vendor":"Oracle Corporation",
  "version":"1.8.0_144"},
"vm":{
  "vendor":"Oracle Corporation",
  "name":"Java HotSpot(TM) 64-Bit Server VM",
  "version":"25.144-b01"},
"processors":32,
"memory":{
  "free":"69.1 GB",
  "total":"180.2 GB",
  "max":"266.7 GB",
  "used":"111 GB (%41.6)",
  "raw":{
"free":74238728336,
"total":193470136320,
"max":286331502592,
"used":119231407984,
"used%":41.64103736566334}},
"jmx":{
  
"bootclasspath":"/usr/java/jdk1.8.0_144/jre/lib/resources.jar:/usr/java/jdk1.8.0_144/jre/lib/rt.jar:/usr/java/jdk1.8.0_144/jre/lib/sunrsasign.jar:/usr/java/jdk1.8.0_144/jre/lib/jsse.jar:/usr/java/jdk1.8.0_144/jre/lib/jce.jar:/usr/java/jdk1.8.0_144/jre/lib/charsets.jar:/usr/java/jdk1.8.0_144/jre/lib/jfr.jar:/usr/java/jdk1.8.0_144/jre/classes",
  "classpath":"...",
  "commandLineArgs":["-Xms100G",
"-Xmx300G",
"-DSTOP.PORT=8079",
"-DSTOP.KEY=..",
"-Dsolr.solr.home=..",
"-Djetty.port=8983"],
  "startTime":"2019-03-18T09:35:27.892Z",
  "upTimeMS":9258422}},
  "system":{
"name":"Linux",
"arch":"amd64",
"availableProcessors":32,
"systemLoadAverage":14.72,
"version":"3.0.101-311.g08a8a9d-default",
"committedVirtualMemorySize":2547960700928,
"freePhysicalMemorySize":4530696192,
"freeSwapSpaceSize":3486846976,
"processCpuLoad":0.3257436126790475,
"processCpuTime":9386945000,
"systemCpuLoad":0.3279781055816521,
"totalPhysicalMemorySize":406480175104,
"totalSwapSpaceSize":4302303232,
"maxFileDescriptorCount":32768,
"openFileDescriptorCount":385,
"uname":"Linux ... 3.0.101-311.g08a8a9d-default #1 SMP Wed Dec 14 10:15:37 
UTC 2016 (08a8a9d) x86_64 x86_64 x86_64 GNU/Linux\n",
"uptime":" 13:09pm  up 5 days 21:23,  7 users,  load average: 14.72, 12.28, 
11.48\n"}}





From: Emir Arnautović 
Sent: Monday, March 18, 2019 12:10:30 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr index slow response

Hi Aaron,
Which version of Solr? How did you configure your heap? Is it standalone Solr 
or SolrCloud? A single server? Do you use some monitoring tool? Do you see some 
spikes, pauses or CPU usage is constant?

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 18 Mar 2019, at 11:47, Aaron Yingcai Sun  wrote:
>
> Hello, Solr!
>
>
> We are having some performance issue when try to send documents for solr to 
> index. The repose time is very slow and unpredictable some time.
>
>
> Solr server is running on a quit powerful server, 32 cpus, 400GB RAM, while 
> 300 GB is reserved for solr, while this happening, cpu usage is around 30%, 
> mem usage is 34%.  io also look ok according to iotop. SSD disk.
>
>
> Our application send 100 documents to solr per request, json encoded. the 
> size is around 5M each time. some times the response time is under 1 seconds, 
> some times could be 300 seconds, the slow response happens very often.
>
>
> "Soft AutoCommit: disabled", "Hard AutoCommit: if uncommited for 360ms; 
> if 100 uncommited docs"
>
>
> There are around 100 clients sending those documents at the same time, but 
> each for the client is blocking call which wait the http response then send 
> the next one.
>
>
> I tried to make the number of documents smaller in one request, such as 20, 
> but  still I see slow response time to time, like 80 seconds.
>
>
> Would you help to give some hint how improve the response time?  solr does 
> not seems very loaded, there must be a way to make the response faster.
>
>
> BRs
>
> //Aaron
>
>
>



Re: Solr index slow response

2019-03-18 Thread Emir Arnautović
Hi Aaron,
Which version of Solr? How did you configure your heap? Is it standalone Solr
or SolrCloud? A single server? Do you use some monitoring tool? Do you see
spikes or pauses, or is CPU usage constant?

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 18 Mar 2019, at 11:47, Aaron Yingcai Sun  wrote:
> 
> Hello, Solr!
> 
> 
> We are having some performance issue when try to send documents for solr to 
> index. The repose time is very slow and unpredictable some time.
> 
> 
> Solr server is running on a quit powerful server, 32 cpus, 400GB RAM, while 
> 300 GB is reserved for solr, while this happening, cpu usage is around 30%, 
> mem usage is 34%.  io also look ok according to iotop. SSD disk.
> 
> 
> Our application send 100 documents to solr per request, json encoded. the 
> size is around 5M each time. some times the response time is under 1 seconds, 
> some times could be 300 seconds, the slow response happens very often.
> 
> 
> "Soft AutoCommit: disabled", "Hard AutoCommit: if uncommited for 360ms; 
> if 100 uncommited docs"
> 
> 
> There are around 100 clients sending those documents at the same time, but 
> each for the client is blocking call which wait the http response then send 
> the next one.
> 
> 
> I tried to make the number of documents smaller in one request, such as 20, 
> but  still I see slow response time to time, like 80 seconds.
> 
> 
> Would you help to give some hint how improve the response time?  solr does 
> not seems very loaded, there must be a way to make the response faster.
> 
> 
> BRs
> 
> //Aaron
> 
> 
> 



Solr index slow response

2019-03-18 Thread Aaron Yingcai Sun
Hello, Solr!


We are having some performance issues when trying to send documents to Solr for
indexing. The response time is very slow and unpredictable at times.


Solr server is running on quite a powerful machine: 32 cpus, 400GB RAM, while 300
GB is reserved for solr. While this is happening, cpu usage is around 30%, mem
usage is 34%.  io also looks ok according to iotop. SSD disk.


Our application sends 100 documents to solr per request, json encoded. The size
is around 5M each time. Sometimes the response time is under 1 second, sometimes
it can be 300 seconds; the slow responses happen very often.


"Soft AutoCommit: disabled", "Hard AutoCommit: if uncommited for 360ms; if
100 uncommited docs"


There are around 100 clients sending those documents at the same time, but each
client makes a blocking call that waits for the http response before sending the
next one.


I tried to make the number of documents in one request smaller, such as 20, but
I still see slow response times now and then, like 80 seconds.


Would you help give some hints on how to improve the response time?  Solr does not
seem very loaded, so there must be a way to make the responses faster.


BRs

//Aaron





Re: Solr Index Size after reindex

2019-02-14 Thread David Hastings
The other thing I would be curious about is your reindexing process: do
you clear out the entire index beforehand? If so, perhaps there is content
missing/moved.

On Thu, Feb 14, 2019 at 11:07 AM Erick Erickson 
wrote:

> Basically, this is not possible ;). Therefore there's something I
> don't understand
>
> There's nothing anywhere except what's in the index. By that I mean that
> _if_
> you copy an index (the data directory and children) from one place to
> another,
> that's all there is. No information about what's in the index is stored
> anywhere
> else. So there are a couple of possibilities I see:
>
> 1> Your rsync isn't doing what you think. By that I mean that "somehow" it
> isn't
> copying segments (perhaps with the same name, although the size and time
> checks would make it extremely unlikely to skip one). What happens if
> you _delete_ the data index on your target system first?
>
> 2> I'm not entirely sure what happens if there are multiple
> "segments_n" files. in
> the index. That file "points" to all the current segments. From a strictly
> theoretical standpoint, my _guess_ is that Lucene chooses the one with the
> highest "_n" value. So if you have multiple ones of those, it would be
> interesting
> to know,
>
> 3> Has Solr been restarted (or at least the core reloaded) on the target?
>
> So here's the experiment I'd run:
> 1> shut down the Solr running on the target
> 2> delete the data dir.
> 3> restart Solr and verify that you have zero docs. This will recreate
> the data dir and verify that that Solr instance is pointing where you
> think it is as a sanity check.
> 4> stop Solr again on the target.
> 5> do a hard commit on the source.
> 6> get a long listing "ls -l" on your source index. This should be a
> lot of files like _0.tim, _0.fdt, _1.tim, _1.fdt etc .
> 7> do your rsync. You should _not_ be indexing to the source at this time.
> 8> start Solr on the target.
> 9> check the target again. Assuming that you have _not_ been adding
> any documents to the source system during the rsync, I'd be stunned if
> there were any differences.
> 10> If there are incorrect counts or other anomalies:
> 10.1> double-check your rsync. Is it really getting the files from your
> source?
> 10.2> compare the long listing from your index you took in <6> with
> the target. Are all files identical size-wise? Are there any files on
> the target that are not on the source and vice-versa? If there are
> differences, that would explain your issues and would point to your
> rsync process being messed up.
>
> If the index directories are identical on the source and target and
> you _still_ see differences then there's an alternate reality that we
> occupy ;).
>
> And the Alfresco folks would probably be the ones to contact.
>
> Best,
> Erick
>
>
>
> On Wed, Feb 13, 2019 at 11:28 PM Mathieu Menard
>  wrote:
> >
> > Hello Andrea,
> >
> > I'm really sorry for the delay of my answer but I beed more information
> before answer you.
> >
> > Yes 5.365.213 is the numDocs you got just after the sync and yes
> 4.537.651 is the numDocs you got in the staging server after the reindexing
> and the colleague who realized the rsync confirm that it has been entirely
> completed.
> >
> > I don't see any transaction not completed that normaly means that the
> indexation is completed. That's why I don't understand the difference.
> >
> > Kind Regards
> >
> > Matthieu
> >
> > Original Message-
> > From: Andrea Gazzarini [mailto:a.gazzar...@sease.io]
> > Sent: samedi 9 février 2019 16:56
> > To: solr-user@lucene.apache.org
> > Subject: Re: Solr Index Size after reindex
> >
> > Yes, those numbers are different and that should explain the different
> size. I think you should be able to find some information in the Alfresco
> or Solr log. There must be a reason about the missing content.
> > For example, are those numbers coming from two comparable snapshots? In
> other words, I imagine that at a given moment X you rsync-ed the two servers
> >
> >   * 5.365.213 is the numDocs you got just after the sync, isn't it?
> >   * 4.537.651 is the numDocs you got in the staging server after the
> > reindexing isn't it? Are you sure the whole reindexing is completed?
> >
> > MaxDocs is the number of documents you have in the index including the
> deleted docs not yet cleared by a merge. In the console you should also see
> the "Deleted 

Re: Solr Index Size after reindex

2019-02-14 Thread Erick Erickson
Basically, this is not possible ;). Therefore there's something I
don't understand

There's nothing anywhere except what's in the index. By that I mean that _if_
you copy an index (the data directory and children) from one place to another,
that's all there is. No information about what's in the index is stored anywhere
else. So there are a couple of possibilities I see:

1> Your rsync isn't doing what you think. By that I mean that "somehow" it isn't
copying segments (perhaps with the same name, although the size and time
checks would make it extremely unlikely to skip one). What happens if
you _delete_ the data index on your target system first?

2> I'm not entirely sure what happens if there are multiple "segments_n" files in
the index. That file "points" to all the current segments. From a strictly
theoretical standpoint, my _guess_ is that Lucene chooses the one with the
highest "_n" value. So if you have multiple ones of those, it would be
interesting to know.

3> Has Solr been restarted (or at least the core reloaded) on the target?

So here's the experiment I'd run:
1> shut down the Solr running on the target
2> delete the data dir.
3> restart Solr and verify that you have zero docs. This will recreate
the data dir and verify that that Solr instance is pointing where you
think it is as a sanity check.
4> stop Solr again on the target.
5> do a hard commit on the source.
6> get a long listing "ls -l" on your source index. This should be a
lot of files like _0.tim, _0.fdt, _1.tim, _1.fdt etc .
7> do your rsync. You should _not_ be indexing to the source at this time.
8> start Solr on the target.
9> check the target again. Assuming that you have _not_ been adding
any documents to the source system during the rsync, I'd be stunned if
there were any differences.
10> If there are incorrect counts or other anomalies:
10.1> double-check your rsync. Is it really getting the files from your source?
10.2> compare the long listing from your index you took in <6> with
the target. Are all files identical size-wise? Are there any files on
the target that are not on the source and vice-versa? If there are
differences, that would explain your issues and would point to your
rsync process being messed up.

If the index directories are identical on the source and target and
you _still_ see differences then there's an alternate reality that we
occupy ;).

And the Alfresco folks would probably be the ones to contact.

Best,
Erick
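
Here is a rough shell sketch of steps 5-9 above. The host name, paths, service
name and core name are placeholders, not values taken from this thread:

# step 5: hard commit on the source so the on-disk index is current
curl 'http://source-host:8983/solr/mycore/update?commit=true'

# step 6: record the source index file names and sizes for later comparison
ssh source-host "ls -l /var/solr/data/mycore/data/index" | awk 'NR>1 {print $5, $9}' | sort > src-index.txt

# steps 1-4 and 7: with Solr stopped on the target, wipe its data dir and copy the
# whole data directory; -c compares checksums, --delete removes stray files
sudo service solr stop
rm -rf /var/solr/data/mycore/data
rsync -avc --delete source-host:/var/solr/data/mycore/data/ /var/solr/data/mycore/data/

# steps 8-10: start Solr again and diff the listings; they should be identical
sudo service solr start
ls -l /var/solr/data/mycore/data/index | awk 'NR>1 {print $5, $9}' | sort > tgt-index.txt
diff src-index.txt tgt-index.txt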



On Wed, Feb 13, 2019 at 11:28 PM Mathieu Menard
 wrote:
>
> Hello Andrea,
>
> I'm really sorry for the delay of my answer but I beed more information 
> before answer you.
>
> Yes 5.365.213 is the numDocs you got just after the sync and yes 4.537.651 is 
> the numDocs you got in the staging server after the reindexing and the 
> colleague who realized the rsync confirm that it has been entirely completed.
>
> I don't see any transaction not completed that normaly means that the 
> indexation is completed. That's why I don't understand the difference.
>
> Kind Regards
>
> Matthieu
>
> Original Message-
> From: Andrea Gazzarini [mailto:a.gazzar...@sease.io]
> Sent: samedi 9 février 2019 16:56
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Index Size after reindex
>
> Yes, those numbers are different and that should explain the different size. 
> I think you should be able to find some information in the Alfresco or Solr 
> log. There must be a reason about the missing content.
> For example, are those numbers coming from two comparable snapshots? In other 
> words, I imagine that at a given moment X you rsync-ed the two servers
>
>   * 5.365.213 is the numDocs you got just after the sync, isn't it?
>   * 4.537.651 is the numDocs you got in the staging server after the
> reindexing isn't it? Are you sure the whole reindexing is completed?
>
> MaxDocs is the number of documents you have in the index including the 
> deleted docs not yet cleared by a merge. In the console you should also see 
> the "Deleted docs" count which should be equal to (maxdocs - numdocs)
>
> Ciao
>
> Andrea
>
> On 08/02/2019 15:53, Mathieu Menard wrote:
> >
> > Hi Andrea,
> >
> > I've checked this information and here is the result:
> >
> >
> >
> > PRODUCTION
> >
> >
> >
> > STAGING
> >
> > *numDocs*
> >
> >
> >
> > 5.365.213
> >
> >
> >
> > 4.537.651
> >
> > *MaxDoc*
> >
> >
> >
> > 5.845.469
> >
> >
> >
> > 5.129.556
> >
> > It seems that there is more th

RE: Solr Index Size after reindex

2019-02-13 Thread Mathieu Menard
Hello Andrea,

I'm really sorry for the delay in my answer, but I needed more information before
answering you.

Yes, 5.365.213 is the numDocs we got just after the sync, and yes, 4.537.651 is the
numDocs we got on the staging server after the reindexing; the colleague who
performed the rsync confirms that it completed entirely.

I don't see any uncompleted transactions, which normally means that the indexing is
complete. That's why I don't understand the difference.

Kind Regards

Matthieu

Original Message-
From: Andrea Gazzarini [mailto:a.gazzar...@sease.io] 
Sent: samedi 9 février 2019 16:56
To: solr-user@lucene.apache.org
Subject: Re: Solr Index Size after reindex

Yes, those numbers are different and that should explain the different size. I 
think you should be able to find some information in the Alfresco or Solr log. 
There must be a reason about the missing content. 
For example, are those numbers coming from two comparable snapshots? In other 
words, I imagine that at a given moment X you rsync-ed the two servers

  * 5.365.213 is the numDocs you got just after the sync, isn't it?
  * 4.537.651 is the numDocs you got in the staging server after the
reindexing isn't it? Are you sure the whole reindexing is completed?

MaxDocs is the number of documents you have in the index including the deleted 
docs not yet cleared by a merge. In the console you should also see the 
"Deleted docs" count which should be equal to (maxdocs - numdocs)

Ciao

Andrea

On 08/02/2019 15:53, Mathieu Menard wrote:
>
> Hi Andrea,
>
> I've checked this information and here is the result:
>
>   
>
> PRODUCTION
>
>   
>
> STAGING
>
> *numDocs*
>
>   
>
> 5.365.213
>
>   
>
> 4.537.651
>
> *MaxDoc*
>
>   
>
> 5.845.469
>
>   
>
> 5.129.556
>
> It seems that there is more than 800.00 docs in PRODUCTION that will 
> explain the size of indexes more important. But there is a thing that 
> I don't understand, we have copied the DB and the contenstore the 
> numDocs for the two environments should be the same no?
>
> Could you also explain me the meaning of the maxDocs value pleases?
>
> Thanks
>
> Matthieu
>
> *From:*Andrea Gazzarini [mailto:a.gazzar...@sease.io]
> *Sent:* vendredi 8 février 2019 14:54
> *To:* solr-user@lucene.apache.org
> *Subject:* Re: Solr Index Size after reindex
>
> Hi Mathieu,
> what about the docs in the two infrastructures? Do they have the same 
> numbers (numdocs / maxdocs)? Any meaningful message (error or not) in 
> log files?
>
> Andrea
>
> On 08/02/2019 14:19, Mathieu Menard wrote:
>
> Hello,
>
> I would like to have your point of view about an observation we
> have made on our two alfresco install (Production and Staging
> environment) and more specifically on the size of our solr indexes
> on these two environments.
>
> Regularly we do a rsync between the Production and the Staging
> environment, we make a copy of the Alfresco's DB and a copy of the
> entire contenstore after that we reindex all the alfresco content.
>
> We have noticed that for the production environment we have 19 Gb
> of indexes while in the staging we have "only" 11. Gb of indexes.
> We have some difficulties to understand this difference because we
> assume that the indexes optimization in the same for a full
> reindex or for the normal use of solr.
>
> I've verified the configuration between the two solr instances and
> I don't see any differences could you help me to better understand
>  this phenomenon.
>
> Here you can find some information about our two environment, if
> you need more details, I will give you as soon as possible:
>
>   
>
> PRODUCTION
>
>   
>
> STAGING
>
> Alfresco version
>
>   
>
> 5.1.1.4
>
>   
>
> 5.1.1.4
>
> Solr Version
>
>   
>
>   
>
> Java version
>
>   
>
>   
>
> Linux Machine
>
>   
>
> See Staging_caracteristics.txt file in attachment
>
>   
>
> See Staging_caracteristics.txt file in attachment
>
> Please let me know if you any other information I will sent it to
> you rapidly.
>
> Kind Regards
>
> Matthieu
>


Re: Solr Index Size after reindex

2019-02-09 Thread Andrea Gazzarini
Yes, those numbers are different and that should explain the different 
size. I think you should be able to find some information in the 
Alfresco or Solr log. There must be a reason for the missing content. 
For example, are those numbers coming from two comparable snapshots? In 
other words, I imagine that at a given moment X you rsync-ed the two servers:


 * 5.365.213 is the numDocs you got just after the sync, isn't it?
 * 4.537.651 is the numDocs you got in the staging server after the
   reindexing isn't it? Are you sure the whole reindexing is completed?

MaxDocs is the number of documents you have in the index including the 
deleted docs not yet cleared by a merge. In the console you should also 
see the "Deleted docs" count which should be equal to (maxdocs - numdocs)


Ciao

Andrea

On 08/02/2019 15:53, Mathieu Menard wrote:


Hi Andrea,

I’ve checked this information and here is the result:



            PRODUCTION     STAGING
numDocs     5.365.213      4.537.651
MaxDoc      5.845.469      5.129.556

It seems that there are more than 800.000 more docs in PRODUCTION, which would 
explain the larger index size. But there is a thing that I don't understand: we 
have copied the DB and the content store, so the numDocs for the two environments 
should be the same, no?


Could you also explain to me the meaning of the maxDocs value, please?

Thanks

Matthieu

*From:*Andrea Gazzarini [mailto:a.gazzar...@sease.io]
*Sent:* vendredi 8 février 2019 14:54
*To:* solr-user@lucene.apache.org
*Subject:* Re: Solr Index Size after reindex

Hi Mathieu,
what about the docs in the two infrastructures? Do they have the same 
numbers (numdocs / maxdocs)? Any meaningful message (error or not) in 
log files?


Andrea

On 08/02/2019 14:19, Mathieu Menard wrote:

Hello,

I would like to have your point of view about an observation we
have made on our two alfresco install (Production and Staging
environment) and more specifically on the size of our solr indexes
on these two environments.

Regularly we do a rsync between the Production and the Staging
environment, we make a copy of the Alfresco’s DB and a copy of the
entire contenstore after that we reindex all the alfresco content.

We have noticed that for the production environment we have 19 Gb
of indexes while in the staging we have “only” 11. Gb of indexes.
We have some difficulties to understand this difference because we
assume that the indexes optimization in the same for a full
reindex or for the normal use of solr.

I’ve verified the configuration between the two solr instances and
I don’t see any differences could you help me to better understand
 this phenomenon.

Here you can find some information about our two environment, if
you need more details, I will give you as soon as possible:



                   PRODUCTION                                 STAGING

Alfresco version   5.1.1.4                                     5.1.1.4
Solr version       (screenshot attachment)                     (screenshot attachment)
Java version       (screenshot attachment)                     (screenshot attachment)
Linux machine      See Staging_caracteristics.txt attachment   See Staging_caracteristics.txt attachment

Please let me know if you need any other information; I will send it to
you quickly.

Kind Regards

Matthieu



RE: Solr Index Size after reindex

2019-02-08 Thread Mathieu Menard
Hi Andrea,

I've checked this information and here is the result:



            PRODUCTION     STAGING

numDocs     5.365.213      4.537.651
MaxDoc      5.845.469      5.129.556


It seems that there are more than 800.000 more docs in PRODUCTION, which would 
explain the larger index size. But there is a thing that I don't understand: we have 
copied the DB and the content store, so the numDocs for the two environments should 
be the same, no?

Could you also explain to me the meaning of the maxDocs value, please?

Thanks

Matthieu


From: Andrea Gazzarini [mailto:a.gazzar...@sease.io]
Sent: vendredi 8 février 2019 14:54
To: solr-user@lucene.apache.org
Subject: Re: Solr Index Size after reindex

Hi Mathieu,
what about the docs in the two infrastructures? Do they have the same numbers 
(numdocs / maxdocs)? Any meaningful message (error or not) in log files?

Andrea
On 08/02/2019 14:19, Mathieu Menard wrote:
Hello,

I would like to have your point of view about an observation we have made on our 
two Alfresco installs (Production and Staging environments), and more specifically 
on the size of our Solr indexes in these two environments.

Regularly we do an rsync between the Production and the Staging environments: we 
make a copy of Alfresco's DB and a copy of the entire content store, and after that 
we reindex all the Alfresco content.

We have noticed that for the production environment we have 19 GB of indexes while 
in staging we have "only" 11. GB of indexes. We have some difficulty understanding 
this difference, because we assume that the index optimization is the same for a 
full reindex as for the normal use of Solr.

I've verified the configuration between the two Solr instances and I don't see any 
differences; could you help me to better understand this phenomenon?

Here you can find some information about our two environments; if you need more 
details, I will give them as soon as possible:



                   PRODUCTION                                 STAGING

Alfresco version   5.1.1.4                                     5.1.1.4
Solr version       (screenshot attachment)                     (screenshot attachment)
Java version       (screenshot attachment)                     (screenshot attachment)
Linux machine      See Staging_caracteristics.txt attachment   See Staging_caracteristics.txt attachment


Please let me know if you need any other information; I will send it to you quickly.

Kind Regards

Matthieu




Re: Solr Index Size after reindex

2019-02-08 Thread Andrea Gazzarini

Hi Mathieu,
what about the docs in the two infrastructures? Do they have the same 
numbers (numdocs / maxdocs)? Any meaningful message (error or not) in 
log files?


Andrea

On 08/02/2019 14:19, Mathieu Menard wrote:


Hello,

I would like to have your point of view about an observation we have 
made on our two alfresco install (Production and Staging environment) 
and more specifically on the size of our solr indexes on these two 
environments.


Regularly we do a rsync between the Production and the Staging 
environment, we make a copy of the Alfresco’s DB and a copy of the 
entire contenstore after that we reindex all the alfresco content.


We have noticed that for the production environment we have 19 Gb of 
indexes while in the staging we have “only” 11. Gb of indexes. We have 
some difficulties to understand this difference because we assume that 
the indexes optimization in the same for a full reindex or for the 
normal use of solr.


I’ve verified the configuration between the two solr instances and I 
don’t see any differences could you help me to better understand  this 
phenomenon.


Here you can find some information about our two environment, if you 
need more details, I will give you as soon as possible:




                   PRODUCTION                                 STAGING

Alfresco version   5.1.1.4                                     5.1.1.4
Solr version       (screenshot attachment)                     (screenshot attachment)
Java version       (screenshot attachment)                     (screenshot attachment)
Linux machine      See Staging_caracteristics.txt attachment   See Staging_caracteristics.txt attachment

Please let me know if you need any other information; I will send it to you 
quickly.


Kind Regards

Matthieu





Re: Solr index writing to s3

2019-01-17 Thread Mikhail Khludnev
There is some experience with backing up to S3 in
https://issues.apache.org/jira/browse/SOLR-9952; IIRC, it lacks
performance.
Jörn, it's beside the point here, but strictly speaking S3 consistency might be
enough, since S3 provides read-after-write consistency for PUTs and the Lucene
index writer is append-only.

On Thu, Jan 17, 2019 at 10:15 AM Jörn Franke  wrote:

> This is not a requirement. This is a statement to a problem where there
> could be other solutions. s3 is only eventually consistent and I am not
> sure Solr works properly in this case. You may also need to check the S3
> consistency to be applied.
>
> > Am 16.01.2019 um 19:39 schrieb Naveen M :
> >
> > hi,
> >
> > My requirement is to write the index data into S3, we have solr installed
> > on aws instances. Please let me know if there is any documentation on how
> > to achieve writing the index data to s3.
> >
> > Thanks
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Solr index writing to s3

2019-01-16 Thread Jörn Franke
This is not a requirement; it is a statement of a problem for which there could 
be other solutions. S3 is only eventually consistent and I am not sure Solr 
works properly in that case. You may also need to check which S3 consistency 
model applies.

> Am 16.01.2019 um 19:39 schrieb Naveen M :
> 
> hi,
> 
> My requirement is to write the index data into S3, we have solr installed
> on aws instances. Please let me know if there is any documentation on how
> to achieve writing the index data to s3.
> 
> Thanks


Re: Solr index writing to s3

2019-01-16 Thread Hendrik Haddorp
Theoretically you should be able to use the HDFS backend, which you can 
configure to use s3. Last time I tried that it did however not work for 
some reason. Here is an example for that, which also seems to have 
ultimately failed: 
https://community.plm.automation.siemens.com/t5/Developer-Space/Running-Solr-on-S3/td-p/449360


On 16.01.2019 19:39, Naveen M wrote:

hi,

My requirement is to write the index data into S3, we have solr installed
on aws instances. Please let me know if there is any documentation on how
to achieve writing the index data to s3.

Thanks





Solr index writing to s3

2019-01-16 Thread Naveen M
hi,

My requirement is to write the index data into S3, we have solr installed
on aws instances. Please let me know if there is any documentation on how
to achieve writing the index data to s3.

Thanks


[solr-index-update] solr update Is there a "literal.field_name" feature in json?

2018-12-16 Thread 유정인
Hello,

Solr's CSV update handler has a "literal.field_name" feature.

Does the JSON update handler have a similar feature?

I could not find one.

Thank you.



Re: Solr Index Data will be delete if state.json did not exists

2018-12-14 Thread Jan Høydahl
I would use the Backup/Restore API
https://lucene.apache.org/solr/guide/7_5/making-and-restoring-backups.html 


Alternatively, you could create collection B, using same configset as A, stop 
solr, copy the data folder and start again

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
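
For reference, a hedged sketch of what that could look like with the Collections
API; the backup name and location below are made up, and the location must be a
path that every node in the cluster can reach:

# back up collection A to a shared location
curl 'http://localhost:8983/solr/admin/collections?action=BACKUP&collection=A&name=A-snapshot&location=/mnt/solr-backups'

# restore the backup into a new collection B (RESTORE creates B for you)
curl 'http://localhost:8983/solr/admin/collections?action=RESTORE&collection=B&name=A-snapshot&location=/mnt/solr-backups'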

> 14. des. 2018 kl. 07:46 skrev Lei Wang :
> 
> Hi guys,
> 
> Currently I am running a 2 nodes cloud of Solr 7.5, I already have a
> collection named A and it worked fine with 20GB index Data, while I want to
> create a collection named B and want to copy index data from A.
> So in Solr5.5, I just copy index folder from A and renamed to B. restart
> solr cluster, collection B will be register successfully to solr, and
> related data will be pushed to zookeeper(leader info etc).
> In Solr7.5, I assume because of
> https://issues.apache.org/jira/browse/SOLR-12066, index folder B will be
> deleted since no state.json info about collection B can be found in
> zookeeper,
> 
> So my question is what should I do If I want B can register to solr cluster
> successfully other than folder be deleted?
> 
> I have tried to set legacyCloud to true, and B can be registered to Solr
> cloud successfully.  collection B status data will be pushed into
> /clusterstate.json
> 
> ,
> I have to call MIGRATESTATEFORMAT first then remove legacyCloud or set it
> to false.
> 
> So if they is any other solutions for this case?
> 
> Looking forward your response.
> 
> Thanks,
> Lyle



Solr Index Data will be delete if state.json did not exists

2018-12-13 Thread Lei Wang
Hi guys,

Currently I am running a 2-node SolrCloud cluster on Solr 7.5. I already have a
collection named A and it works fine with 20GB of index data; now I want to
create a collection named B and copy the index data from A.
In Solr 5.5, I would just copy the index folder from A, rename it to B and restart
the Solr cluster; collection B would then be registered successfully in Solr, and
the related data (leader info etc.) would be pushed to ZooKeeper.
In Solr 7.5, I assume because of
https://issues.apache.org/jira/browse/SOLR-12066, index folder B will be
deleted, since no state.json info about collection B can be found in
ZooKeeper.

So my question is: what should I do if I want B to register with the Solr cluster
successfully, rather than having its folder deleted?

I have tried setting legacyCloud to true, and B can then be registered to the Solr
cloud successfully; collection B's status data will be pushed into
/clusterstate.json, and I have to call MIGRATESTATEFORMAT first and then remove
legacyCloud or set it to false.

So is there any other solution for this case?

Looking forward to your response.

Thanks,
Lyle


Re: [solr-index]Can I do a lot of analysis on one field at the time of indexing?

2018-12-13 Thread Walter Underwood
I’m heading out on vacation for about a week and half, not sure I’ll have time.

Start with the discussion in this mail thread.

http://lucene.472066.n3.nabble.com/Running-an-analyzer-chain-in-an-update-request-processor-td4384207.html
 
<http://lucene.472066.n3.nabble.com/Running-an-analyzer-chain-in-an-update-request-processor-td4384207.html>

But I would also think about other ways to do it.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Dec 13, 2018, at 9:37 PM, 유정인  wrote:
> 
> WalterUnderwood, thank you for your reply.
> 
> If you can afford the time, can you give us a specific sample of the proposed 
> method?
> 
> Thank you.
> 
> -Original Message-
> From: Walter Underwood  
> Sent: Friday, December 14, 2018 12:11 PM
> To: solr-user@lucene.apache.org
> Subject: Re: [solr-index]Can I do a lot of analysis on one field at the time 
> of indexing?
> 
> Right, no feature that does that for you.
> 
> You should be able to code that with an update request processor script.
> You can fetch an analyzer chain, run it, add the results to a field, then do 
> that again.
> 
> I have one that runs a chain with minhash then saves the hex values of the 
> hashes to a field.
> 
> It is fussy, but doable.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Dec 13, 2018, at 6:55 PM, Erick Erickson  wrote:
>> 
>> In a word, "no". A field can have exactly one tokenizer, and there are 
>> no conditional filters. You can copyField to multiple individual 
>> fields and treat each one of those differently, i.e. copy from title 
>> to title1, title2 etc. where each one has a different analysis chain.
>> 
>> Best,
>> Erick
>> On Thu, Dec 13, 2018 at 5:21 PM 유정인  wrote:
>>> 
>>> Hello
>>> 
>>> I have a question about index schemas.
>>> 
>>> 1) Can I do various analysis on one field?
>>> For example, you can analyze the 'title' field with multiple 
>>> tokenizers, and merge the analysis into a single field.
>>> 
>>> 2) You can collect multiple fields in one field using 'copyField' function.
>>> However, several fields have different data attributes (eg, category 
>>> fields, text fields, etc.) _) At this time, I would like to analyze 
>>> each field differently.
>>> 
>>> Do you have these features in version 7.5? Is there any kind of 
>>> shortcut to do these similar functions?
>>> 
>>> Thank you for your advice.
> 
> 



RE: [solr-index]Can I do a lot of analysis on one field at the time of indexing?

2018-12-13 Thread 유정인
WalterUnderwood, thank you for your reply.

If you can afford the time, can you give us a specific sample of the proposed 
method?

Thank you.

-Original Message-
From: Walter Underwood  
Sent: Friday, December 14, 2018 12:11 PM
To: solr-user@lucene.apache.org
Subject: Re: [solr-index]Can I do a lot of analysis on one field at the time of 
indexing?

Right, no feature that does that for you.

You should be able to code that with an update request processor script.
You can fetch an analyzer chain, run it, add the results to a field, then do 
that again.

I have one that runs a chain with minhash then saves the hex values of the 
hashes to a field.

It is fussy, but doable.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Dec 13, 2018, at 6:55 PM, Erick Erickson  wrote:
> 
> In a word, "no". A field can have exactly one tokenizer, and there are 
> no conditional filters. You can copyField to multiple individual 
> fields and treat each one of those differently, i.e. copy from title 
> to title1, title2 etc. where each one has a different analysis chain.
> 
> Best,
> Erick
> On Thu, Dec 13, 2018 at 5:21 PM 유정인  wrote:
>> 
>> Hello
>> 
>> I have a question about index schemas.
>> 
>> 1) Can I do various analysis on one field?
>> For example, you can analyze the 'title' field with multiple 
>> tokenizers, and merge the analysis into a single field.
>> 
>> 2) You can collect multiple fields in one field using 'copyField' function.
>> However, several fields have different data attributes (eg, category 
>> fields, text fields, etc.) _) At this time, I would like to analyze 
>> each field differently.
>> 
>> Do you have these features in version 7.5? Is there any kind of 
>> shortcut to do these similar functions?
>> 
>> Thank you for your advice.




Re: [solr-index]Can I do a lot of analysis on one field at the time of indexing?

2018-12-13 Thread Walter Underwood
Right, no feature that does that for you.

You should be able to code that with an update request processor script.
You can fetch an analyzer chain, run it, add the results to a field, then
do that again.

I have one that runs a chain with minhash then saves the hex values of
the hashes to a field.

It is fussy, but doable.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Dec 13, 2018, at 6:55 PM, Erick Erickson  wrote:
> 
> In a word, "no". A field can have exactly one tokenizer, and there are
> no conditional filters. You can copyField to multiple individual
> fields and treat each one of those differently, i.e. copy from title
> to title1, title2 etc. where each one has a different analysis chain.
> 
> Best,
> Erick
> On Thu, Dec 13, 2018 at 5:21 PM 유정인  wrote:
>> 
>> Hello
>> 
>> I have a question about index schemas.
>> 
>> 1) Can I do various analysis on one field?
>> For example, you can analyze the 'title' field with multiple tokenizers,
>> and merge the analysis into a single field.
>> 
>> 2) You can collect multiple fields in one field using 'copyField' function.
>> However, several fields have different data attributes (eg, category
>> fields, text fields, etc.) _)
>> At this time, I would like to analyze each field differently.
>> 
>> Do you have these features in version 7.5? Is there any kind of shortcut to
>> do these similar functions?
>> 
>> Thank you for your advice.



Re: [solr-index]Can I do a lot of analysis on one field at the time of indexing?

2018-12-13 Thread Erick Erickson
In a word, "no". A field can have exactly one tokenizer, and there are
no conditional filters. You can copyField to multiple individual
fields and treat each one of those differently, i.e. copy from title
to title1, title2 etc. where each one has a different analysis chain.

Best,
Erick
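
If the collection uses a managed schema, the extra fields and copies can also be
wired up through the Schema API. The field and type names below are just an
example, not something from this thread:

# create the extra field (its fieldType carries the alternate analysis chain),
# then copy title into it at index time
curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field":      {"name":"title1", "type":"text_variant1", "stored":false},
  "add-copy-field": {"source":"title", "dest":"title1"}
}' 'http://localhost:8983/solr/mycollection/schema'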
On Thu, Dec 13, 2018 at 5:21 PM 유정인  wrote:
>
> Hello
>
> I have a question about index schemas.
>
> 1) Can I do various analysis on one field?
> For example, you can analyze the 'title' field with multiple tokenizers,
> and merge the analysis into a single field.
>
> 2) You can collect multiple fields in one field using 'copyField' function.
> However, several fields have different data attributes (eg, category
> fields, text fields, etc.) _)
> At this time, I would like to analyze each field differently.
>
> Do you have these features in version 7.5? Is there any kind of shortcut to
> do these similar functions?
>
> Thank you for your advice.
>
>
>


[solr-index]Can I do a lot of analysis on one field at the time of indexing?

2018-12-13 Thread 유정인
Hello

I have a question about index schemas.

1) Can I run several kinds of analysis on one field?
For example, could the 'title' field be analyzed with multiple tokenizers,
with the results merged into a single field?

2) You can collect multiple fields into one field using the 'copyField' function.
However, the source fields have different data attributes (e.g. category
fields, text fields, etc.).
In that case, I would like to analyze each field differently.

Are these features available in version 7.5? Is there any kind of shortcut for
doing something similar?

Thank you for your advice.

 



Re: Moving Solr index from Staging to Production

2018-11-28 Thread Toke Eskildsen
Arunan Sugunakumar  wrote:

> https://lucene.apache.org/solr/guide/6_6/making-and-restoring-backups.html

We (also?) prefer to keep our stage/build setup separate from production. 
Backup + restore works well for us. It is very fast, as it is basically just 
copying the segment files.

- Toke Eskildsen


Re: Moving Solr index from Staging to Production

2018-11-28 Thread David Hastings
you just set up the solr install on the production server as a slave to
your current install and hit the replicate button from the admin interface
on the production server
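
A one-off pull can also be triggered from the production box with the replication
handler; the host and core names below are placeholders:

# on the production (slave) node: fetch the index once from the staging (master) core
curl 'http://prod-host:8983/solr/mycore/replication?command=fetchindex&masterUrl=http://staging-host:8983/solr/mycore/replication'

# check progress / result
curl 'http://prod-host:8983/solr/mycore/replication?command=details'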

On Wed, Nov 28, 2018 at 1:34 PM Arunan Sugunakumar 
wrote:

> Hi,
>
> I have deployed Solr 7.2 in a staging server in standalone mode. I want to
> move it to the production server.
>
> I would like to know whether I need to run the indexing process again or is
> there any easier way to move the existing index?
>
> I went through this documentation but I couldn't figure out whether it is
> the right way.
> https://lucene.apache.org/solr/guide/6_6/making-and-restoring-backups.html
>
> Thanks in advance!!
>
> Regards
> Arunan
>


Moving Solr index from Staging to Production

2018-11-28 Thread Arunan Sugunakumar
Hi,

I have deployed Solr 7.2 in a staging server in standalone mode. I want to
move it to the production server.

I would like to know whether I need to run the indexing process again or is
there any easier way to move the existing index?

I went through this documentation but I couldn't figure out whether it is
the right way.
https://lucene.apache.org/solr/guide/6_6/making-and-restoring-backups.html

Thanks in advance!!

Regards
Arunan


Solr index size affected by duplication

2018-11-18 Thread sagandhi
Hi,

This is a sample doc -

parent
shirt

child
Red
XL
6

Red
XL
6


The parent doc represents an item/object and the nested docs contain
extended properties of the object in the parent doc.
While searching, the nested docs are filtered out to get a proper result count,
which required duplicating the nested doc fields in the parent doc.

This duplication of fields has resulted in a huge Solr index size, and I am
planning to get rid of the duplicates and use block join for the nested doc fields.
This has caused another serious problem: if the value I am searching for is present
only in a nested doc, no results are found (nested docs are filtered out as a rule;
this used to work before because even if the nested doc is filtered out, the parent
doc, which duplicates the fields, is still returned).

I have come up with 2 approaches to solve this.
1. Include global field while indexing:
For each field in nested doc add the corresponding value in global field in
the parent doc.

parent

child
Red
XL
6

Red
XL
6


2. Use a new copy field:
The fields in nested doc have unique name patterns from other fields so I
can easily create another copy field that contains only the nested doc
fields.
Now while querying, I use block-join on this copy field along with the
existing global field like so -

global:(red) OR {!parent which=doc_type:parent}c_global:(red)

Add this in schema:


3. I came across another approach/hack accidentally.
I had modified the existing schema to remove duplicate parent fields but the
data I used for reindexing contained the duplicate parent fields.
So the global field contains values from both parent and nested field. But
the indexed doc itself will skip the parent doc fields as the schema doesn't
have them.
I was able to search for nested doc field values, and the total index size
was less than the above two.

Can someone please suggest which is the better option and why?

Thanks!
Soham
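
For anyone reading along, this is roughly how approach 2 would be issued over
HTTP; the collection name is a placeholder, --data-urlencode takes care of escaping
the {!parent} local-params syntax, and the fq mirrors the "nested docs are filtered
out" rule described above:

curl 'http://localhost:8983/solr/mycollection/select' \
  --data-urlencode 'q=global:(red) OR {!parent which=doc_type:parent}c_global:(red)' \
  --data-urlencode 'fq=doc_type:parent' \
  --data-urlencode 'rows=10'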



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: SOLR Index Time Running Optimization

2018-09-26 Thread Walter Underwood
How long does the query take when it is run directly, without Solr?

For our DIH queries, Solr was not the slow part. It took 90 minutes
directly or with DIH. With our big cluster, I’ve seen indexing rates of
one million docs per minute.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Sep 26, 2018, at 9:44 AM, Jan Høydahl  wrote:
> 
> With DIH you are doing indexing single-threaded. You should be able to 
> configure multiple DIH's on the same collection and then partition the data 
> between them, issuing slightly different SQL to each. But I don't exactly 
> know what that would look like.
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
>> 26. sep. 2018 kl. 14:30 skrev Susheel Kumar :
>> 
>> Also are you using Solr data import? That will be much slower compare to if
>> you write our own little indexer which does indexing in batches and with
>> multiple threads.
>> 
>> On Wed, Sep 26, 2018 at 8:00 AM Vincenzo D'Amore  wrote:
>> 
>>> Hi, I know this is the shortest way but, had you tried to add more core or
>>> CPU to your solr instances? How big is you collection in terms of GB and
>>> number of documents?
>>> 
>>> Ciao,
>>> Vincenzo
>>> 
>>> 
 On 26 Sep 2018, at 08:36, Krizelle Mae Hernandez <
>>> krizellemae.marti...@sas.com> wrote:
 
 Hi.
 
 Our SOLR currently is running approximately 39hours for Full and Delta
>>> Import. I would like to ask for your assistance on how can we shorten the
>>> 39hours run time in any possible solution?
 For SOLR version, we are using solr 5.3.1.
 
 Regards,
 Krizelle Mae M. Hernandez
>>> 
> 



Re: SOLR Index Time Running Optimization

2018-09-26 Thread Jan Høydahl
With DIH you are doing indexing single-threaded. You should be able to 
configure multiple DIH's on the same collection and then partition the data 
between them, issuing slightly different SQL to each. But I don't exactly know 
what that would look like.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
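
A hedged sketch of triggering such partitioned imports, assuming solrconfig.xml
defines two DIH handlers (/dataimport1 and /dataimport2 -- names made up here)
whose SQL selects disjoint slices of the data:

# each full-import call returns immediately and runs in the background on the
# server, so the two partitions are indexed in parallel; clean=false keeps one
# import from wiping the documents indexed by the other
curl 'http://localhost:8983/solr/mycore/dataimport1?command=full-import&clean=false'
curl 'http://localhost:8983/solr/mycore/dataimport2?command=full-import&clean=false'

# poll the progress of either handler
curl 'http://localhost:8983/solr/mycore/dataimport1?command=status'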

> 26. sep. 2018 kl. 14:30 skrev Susheel Kumar :
> 
> Also are you using Solr data import? That will be much slower compare to if
> you write our own little indexer which does indexing in batches and with
> multiple threads.
> 
> On Wed, Sep 26, 2018 at 8:00 AM Vincenzo D'Amore  wrote:
> 
>> Hi, I know this is the shortest way but, had you tried to add more core or
>> CPU to your solr instances? How big is you collection in terms of GB and
>> number of documents?
>> 
>> Ciao,
>> Vincenzo
>> 
>> 
>>> On 26 Sep 2018, at 08:36, Krizelle Mae Hernandez <
>> krizellemae.marti...@sas.com> wrote:
>>> 
>>> Hi.
>>> 
>>> Our SOLR currently is running approximately 39hours for Full and Delta
>> Import. I would like to ask for your assistance on how can we shorten the
>> 39hours run time in any possible solution?
>>> For SOLR version, we are using solr 5.3.1.
>>> 
>>> Regards,
>>> Krizelle Mae M. Hernandez
>> 



Re: SOLR Index Time Running Optimization

2018-09-26 Thread Susheel Kumar
Also, are you using Solr's data import handler? That will be much slower compared
to writing your own little indexer which does the indexing in batches and with
multiple threads.
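
The batching idea expressed with plain curl (the core name and document fields are
made up); a multi-threaded indexer would simply run several of these posts
concurrently, each with its own batch:

# send a batch of documents in one request instead of one document per request
curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/mycore/update' --data-binary '[
    {"id":"1", "title":"first doc"},
    {"id":"2", "title":"second doc"},
    {"id":"3", "title":"third doc"}
  ]'

# commit once at the end rather than after every batch
curl 'http://localhost:8983/solr/mycore/update?commit=true'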

On Wed, Sep 26, 2018 at 8:00 AM Vincenzo D'Amore  wrote:

> Hi, I know this is the shortest way but, had you tried to add more core or
> CPU to your solr instances? How big is you collection in terms of GB and
> number of documents?
>
> Ciao,
> Vincenzo
>
>
> > On 26 Sep 2018, at 08:36, Krizelle Mae Hernandez <
> krizellemae.marti...@sas.com> wrote:
> >
> > Hi.
> >
> > Our SOLR currently is running approximately 39hours for Full and Delta
> Import. I would like to ask for your assistance on how can we shorten the
> 39hours run time in any possible solution?
> > For SOLR version, we are using solr 5.3.1.
> >
> > Regards,
> > Krizelle Mae M. Hernandez
>


Re: SOLR Index Time Running Optimization

2018-09-26 Thread Vincenzo D'Amore
Hi, I know this is the shortest route, but have you tried adding more cores or CPUs 
to your Solr instances? How big is your collection in terms of GB and number of 
documents?

Ciao,
Vincenzo


> On 26 Sep 2018, at 08:36, Krizelle Mae Hernandez 
>  wrote:
> 
> Hi.
> 
> Our SOLR currently is running approximately 39hours for Full and Delta 
> Import. I would like to ask for your assistance on how can we shorten the 
> 39hours run time in any possible solution?
> For SOLR version, we are using solr 5.3.1.
> 
> Regards,
> Krizelle Mae M. Hernandez


SOLR Index Time Running Optimization

2018-09-25 Thread Krizelle Mae Hernandez
Hi.

Our SOLR full and delta imports currently take approximately 39 hours to run. I 
would like to ask for your assistance: how can we shorten the 39-hour run time, by 
any possible solution?
For the SOLR version, we are using Solr 5.3.1.

Regards,
Krizelle Mae M. Hernandez


Re: Solr index clearing

2018-09-25 Thread Jan Høydahl
Hi,

Solr does not do anything automatically, so I think this is a question for the 
Nutch community - http://nutch.apache.org/mailing_lists.html

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 24. sep. 2018 kl. 20:06 skrev Bineesh :
> 
> Team,
> 
> We use Solr 7.3.1 and Nutch 1.15.
> 
> I created two collections in Solr, and data was successfully indexed from Nutch
> after crawling. Upon indexing the third collection in Solr, I see that the first
> collection's indexed data is automatically cleared. Please suggest.
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Solr index clearing

2018-09-24 Thread Bineesh
Team,

We use Solr 7.3.1 and Nutch 1.15.

I created two collections in Solr, and data was successfully indexed from Nutch
after crawling. Upon indexing the third collection in Solr, I see that the first
collection's indexed data is automatically cleared. Please suggest.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr Index Issues

2018-09-10 Thread Walter Underwood
Every time you see "Expected mime type application/octet-stream but got 
text/html” from SolrJ,
it means that Solr returned an error. Look for an error in the Solr logs at the 
same time as the
SolrJ message.

It could be any error, which is why we can’t help more. After you know the Solr 
error, we might
be able to help.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Sep 10, 2018, at 7:45 AM, Erick Erickson  wrote:
> 
> It would be best to ask on the Nutch mailing list, this list doesn't
> have very many people who know _how_ Nutch uses Solr though.
> 
> Best,
> Erick
> On Sun, Sep 9, 2018 at 11:47 PM Bineesh  wrote:
>> 
>> Hi Team,
>> 
>> We are using Nutch 1.15 and Solr 6.6.3
>> 
>> We tried crawling one of the URL and and noticed issues while indexing data
>> to solr.Below is the capture from logs
>> 
>> Caused by:
>> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
>> from server at http://localhost:8983/solr/nutch: Expected mime type
>> application/octet-stream but got text/html. 
>> 
>> Here in the log i see collection name is nutch but the actual collection
>> name i created is Nutch1.15_Test
>> 
>> Given below is the command used for crawling
>> 
>> bin/nutch solrindex http://10.150.17.32:8983/solr/Nutch1.15_Test
>> crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
>> 
>> 
>> Please suggest any workarounds if available. Thank you
>> 
>> 
>> 
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Solr Index Issues

2018-09-10 Thread Erick Erickson
It would be best to ask on the Nutch mailing list, this list doesn't
have very many people who know _how_ Nutch uses Solr though.

Best,
Erick
On Sun, Sep 9, 2018 at 11:47 PM Bineesh  wrote:
>
> Hi Team,
>
> We are using Nutch 1.15 and Solr 6.6.3
>
> We tried crawling one of the URL and and noticed issues while indexing data
> to solr.Below is the capture from logs
>
> Caused by:
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
> from server at http://localhost:8983/solr/nutch: Expected mime type
> application/octet-stream but got text/html. 
>
> Here in the log i see collection name is nutch but the actual collection
> name i created is Nutch1.15_Test
>
> Given below is the command used for crawling
>
> bin/nutch solrindex http://10.150.17.32:8983/solr/Nutch1.15_Test
> crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
>
>
> Please suggest any workarounds if available. Thank you
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Solr Index Issues

2018-09-09 Thread Bineesh
Hi Team,

We are using Nutch 1.15 and Solr 6.6.3.

We tried crawling one of the URLs and noticed issues while indexing the data
into Solr. Below is the capture from the logs:

Caused by:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://localhost:8983/solr/nutch: Expected mime type
application/octet-stream but got text/html. 

Here in the log I see the collection name is nutch, but the actual collection
name I created is Nutch1.15_Test.

Given below is the command used for crawling

bin/nutch solrindex http://10.150.17.32:8983/solr/Nutch1.15_Test
crawl/crawldb -linkdb crawl/linkdb crawl/segments/*


Please suggest any workarounds if available. Thank you



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: can you migrate solr index files from osx to linux

2018-02-07 Thread Jeff Dyke
I forgot to report back on this.  For anyone who runs into it: you need
the entire data directory, not just the index directory; at least that's
what made it work for me.

On Thu, Feb 1, 2018 at 9:52 PM, Erick Erickson 
wrote:

> I think SCP will be fine. Shawn's comment is probably the issue.
>
> Best,
> Erick
>
> On Thu, Feb 1, 2018 at 4:34 PM, Shawn Heisey  wrote:
> > On 2/1/2018 4:32 PM, Jeff Dyke wrote:
> >> I just created a tar file, actually a tar.gz file and scp'd to a
> server, at
> >> first i was worried that the gzip caused issues, but as i mentioned no
> >> errors on start up, and i thought i would see some.  @Erick, how would
> you
> >> recommend.  This is going to be less of an issue b/c i need to build the
> >> index programmatically anyway, but would be nice to know if only for
> >> curiosity.  Perhaps making a replication backup and then restoring on
> the
> >> new server would be better.  In the middle of other things now, will
> try a
> >> few of those, plus some other ideas.
> >
> > I think the problem is that you're copying the index files into
> > ${instanceDir}/data and not ${instanceDir}/data/index.  The index
> > directory is what Solr is actually going to use.
> >
> > Delete everything that already exists in the index directory before
> > putting the files in there.
> >
> > You probably don't need to do a full restart, you could probably just
> > reload the core.
> >
> > Thanks,
> > Shawn
>


Re: can you migrate solr index files from osx to linux

2018-02-01 Thread Erick Erickson
I think SCP will be fine. Shawn's comment is probably the issue.

Best,
Erick

On Thu, Feb 1, 2018 at 4:34 PM, Shawn Heisey  wrote:
> On 2/1/2018 4:32 PM, Jeff Dyke wrote:
>> I just created a tar file, actually a tar.gz file and scp'd to a server, at
>> first i was worried that the gzip caused issues, but as i mentioned no
>> errors on start up, and i thought i would see some.  @Erick, how would you
>> recommend.  This is going to be less of an issue b/c i need to build the
>> index programmatically anyway, but would be nice to know if only for
>> curiosity.  Perhaps making a replication backup and then restoring on the
>> new server would be better.  In the middle of other things now, will try a
>> few of those, plus some other ideas.
>
> I think the problem is that you're copying the index files into
> ${instanceDir}/data and not ${instanceDir}/data/index.  The index
> directory is what Solr is actually going to use.
>
> Delete everything that already exists in the index directory before
> putting the files in there.
>
> You probably don't need to do a full restart, you could probably just
> reload the core.
>
> Thanks,
> Shawn


Re: can you migrate solr index files from osx to linux

2018-02-01 Thread Shawn Heisey
On 2/1/2018 4:32 PM, Jeff Dyke wrote:
> I just created a tar file, actually a tar.gz file and scp'd to a server, at
> first i was worried that the gzip caused issues, but as i mentioned no
> errors on start up, and i thought i would see some.  @Erick, how would you
> recommend.  This is going to be less of an issue b/c i need to build the
> index programmatically anyway, but would be nice to know if only for
> curiosity.  Perhaps making a replication backup and then restoring on the
> new server would be better.  In the middle of other things now, will try a
> few of those, plus some other ideas.

I think the problem is that you're copying the index files into
${instanceDir}/data and not ${instanceDir}/data/index.  The index
directory is what Solr is actually going to use.

Delete everything that already exists in the index directory before
putting the files in there.

You probably don't need to do a full restart, you could probably just
reload the core.

Thanks,
Shawn
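
A sketch of that sequence for a hypothetical core named issuers (core name and paths are assumptions):

# Empty the index directory, drop the copied segment files into data/index
# (not data/), fix ownership, then reload the core instead of restarting Solr.
rm -rf /var/solr/data/issuers/data/index/*
cp -a /tmp/index_from_osx/. /var/solr/data/issuers/data/index/
chown -R solr:solr /var/solr/data/issuers/data/index
curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=issuers"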


Re: can you migrate solr index files from osx to linux

2018-02-01 Thread Jeff Dyke
I just created a tar file (actually a tar.gz file) and scp'd it to a server; at
first I was worried that the gzip caused issues, but as I mentioned there were no
errors on startup, and I thought I would see some.  @Erick, how would you
recommend doing it?  This is going to be less of an issue because I need to build
the index programmatically anyway, but it would be nice to know, if only out of
curiosity.  Perhaps making a replication backup and then restoring on the
new server would be better.  In the middle of other things now, will try a
few of those, plus some other ideas.
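
For the replication-backup route, the ReplicationHandler's backup and restore commands would look roughly like this (hosts, core name, and snapshot name are assumptions):

# On the source core: take a named snapshot and poll until it completes.
curl "http://oldhost:8983/solr/issuers/replication?command=backup&name=migration"
curl "http://oldhost:8983/solr/issuers/replication?command=details"
# Copy the resulting snapshot.migration directory into the new core's data
# directory, then restore it and check the status.
curl "http://newhost:8983/solr/issuers/replication?command=restore&name=migration"
curl "http://newhost:8983/solr/issuers/replication?command=restorestatus"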

On Thu, Feb 1, 2018 at 4:49 PM, Erick Erickson 
wrote:

> One note, be _very_ sure you copy in binary mode..
>
> On Thu, Feb 1, 2018 at 1:33 PM, Shawn Heisey  wrote:
> > On 2/1/2018 12:56 PM, Jeff Dyke wrote:
> >> That's exactly what i thought as well.  The only difference and i can
> try
> >> to downgrade OSX is 7.2, and i grabbed 7.2.1 for install on Ubuntu.  I
> >> didn't think a minor point release would matter.
> >>
> >> solr@stagingsolr01:~/data/issuers/data$ ls -1
> >> 981552
> >> index
> >> _mg8.dii
> >> _mg8.dim
> >> _mg8.fdt
> >> _mg8.fdx
> >> _mg8.fnm
> >> _mg8_Lucene50_0.doc
> >> _mg8_Lucene50_0.pos
> >> _mg8_Lucene50_0.tim
> >> _mg8_Lucene50_0.tip
> >> _mg8_Lucene70_0.dvd
> >> _mg8_Lucene70_0.dvm
> >> _mg8.nvd
> >> _mg8.nvm
> >> _mg8.si
> >
> > It's almost a guarantee that the index format in 7.2.0 and 7.2.1 will be
> > identical.  I don't think I've ever heard of an instance where a point
> > release changed the index format.  It does change in minor releases
> > sometimes, though.  If the format were incompatible, there would be
> > errors in the log.
> >
> > Those filenames look correct ... but it looks to me like they are in
> > ${instanceDir}/data ... they would normally be found in the index
> > directory inside the data directory.
> >
> > Thanks,
> > Shawn
> >
>


Re: can you migrate solr index files from osx to linux

2018-02-01 Thread Erick Erickson
One note, be _very_ sure you copy in binary mode..
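
scp and rsync are both binary-safe; the classic way to get bitten is FTP in ASCII mode. One way to prove the copy is byte-identical afterwards (paths are assumptions; on macOS the stock equivalent of sha256sum is shasum -a 256):

# On the source, from the core's data directory:
( cd /usr/local/var/solr/issuers/data && find index -type f -exec sha256sum {} + ) > /tmp/index.sha256
# Copy /tmp/index.sha256 across, then verify on the destination from the matching directory:
( cd /var/solr/data/issuers/data && sha256sum -c /tmp/index.sha256 )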

On Thu, Feb 1, 2018 at 1:33 PM, Shawn Heisey  wrote:
> On 2/1/2018 12:56 PM, Jeff Dyke wrote:
>> That's exactly what i thought as well.  The only difference and i can try
>> to downgrade OSX is 7.2, and i grabbed 7.2.1 for install on Ubuntu.  I
>> didn't think a minor point release would matter.
>>
>> solr@stagingsolr01:~/data/issuers/data$ ls -1
>> 981552
>> index
>> _mg8.dii
>> _mg8.dim
>> _mg8.fdt
>> _mg8.fdx
>> _mg8.fnm
>> _mg8_Lucene50_0.doc
>> _mg8_Lucene50_0.pos
>> _mg8_Lucene50_0.tim
>> _mg8_Lucene50_0.tip
>> _mg8_Lucene70_0.dvd
>> _mg8_Lucene70_0.dvm
>> _mg8.nvd
>> _mg8.nvm
>> _mg8.si
>
> It's almost a guarantee that the index format in 7.2.0 and 7.2.1 will be
> identical.  I don't think I've ever heard of an instance where a point
> release changed the index format.  It does change in minor releases
> sometimes, though.  If the format were incompatible, there would be
> errors in the log.
>
> Those filenames look correct ... but it looks to me like they are in
> ${instanceDir}/data ... they would normally be found in the index
> directory inside the data directory.
>
> Thanks,
> Shawn
>


Re: can you migrate solr index files from osx to linux

2018-02-01 Thread Shawn Heisey
On 2/1/2018 12:56 PM, Jeff Dyke wrote:
> That's exactly what i thought as well.  The only difference and i can try
> to downgrade OSX is 7.2, and i grabbed 7.2.1 for install on Ubuntu.  I
> didn't think a minor point release would matter.
>
> solr@stagingsolr01:~/data/issuers/data$ ls -1
> 981552
> index
> _mg8.dii
> _mg8.dim
> _mg8.fdt
> _mg8.fdx
> _mg8.fnm
> _mg8_Lucene50_0.doc
> _mg8_Lucene50_0.pos
> _mg8_Lucene50_0.tim
> _mg8_Lucene50_0.tip
> _mg8_Lucene70_0.dvd
> _mg8_Lucene70_0.dvm
> _mg8.nvd
> _mg8.nvm
> _mg8.si

It's almost a guarantee that the index format in 7.2.0 and 7.2.1 will be
identical.  I don't think I've ever heard of an instance where a point
release changed the index format.  It does change in minor releases
sometimes, though.  If the format were incompatible, there would be
errors in the log.

Those filenames look correct ... but it looks to me like they are in
${instanceDir}/data ... they would normally be found in the index
directory inside the data directory.

Thanks,
Shawn
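
If there is any doubt about whether copied segments are readable, Lucene's CheckIndex tool can be pointed at the index directory. A sketch; the jar path is an assumption and the exact lucene-core version varies by Solr release:

# Read-only check of the copied index; it reports per-segment status and any corruption.
java -cp server/solr-webapp/webapp/WEB-INF/lib/lucene-core-7.2.1.jar \
     org.apache.lucene.index.CheckIndex /var/solr/data/issuers/data/index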



Re: can you migrate solr index files from osx to linux

2018-02-01 Thread Jeff Dyke
That's exactly what I thought as well.  The only difference (and I can try
downgrading to match) is that OSX is on 7.2, while I grabbed 7.2.1 for the
install on Ubuntu.  I didn't think a minor point release would matter.

solr@stagingsolr01:~/data/issuers/data$ ls -1
981552
index
_mg8.dii
_mg8.dim
_mg8.fdt
_mg8.fdx
_mg8.fnm
_mg8_Lucene50_0.doc
_mg8_Lucene50_0.pos
_mg8_Lucene50_0.tim
_mg8_Lucene50_0.tip
_mg8_Lucene70_0.dvd
_mg8_Lucene70_0.dvm
_mg8.nvd
_mg8.nvm
_mg8.si
_mir.cfe
_mir.cfs
_mir.si
_mjc.dii
_mjc.dim
_mjc.fdt
_mjc.fdx
_mjc.fnm
_mjc_Lucene50_0.doc
_mjc_Lucene50_0.pos
_mjc_Lucene50_0.tim
_mjc_Lucene50_0.tip
_mjc_Lucene70_0.dvd
_mjc_Lucene70_0.dvm
_mjc.nvd
_mjc.nvm
_mjc.si
_mm3.dii
_mm3.dim
_mm3.fdt
_mm3.fdx
_mm3.fnm
_mm3_Lucene50_0.doc
_mm3_Lucene50_0.pos
_mm3_Lucene50_0.tim
_mm3_Lucene50_0.tip
_mm3_Lucene70_0.dvd
_mm3_Lucene70_0.dvm
_mm3.nvd
_mm3.nvm
_mm3.si
_mp6.dii
_mp6.dim
_mp6.fdt
_mp6.fdx
_mp6.fnm
_mp6_Lucene50_0.doc
_mp6_Lucene50_0.pos
_mp6_Lucene50_0.tim
_mp6_Lucene50_0.tip
_mp6_Lucene70_0.dvd
_mp6_Lucene70_0.dvm
_mp6.nvd
_mp6.nvm
_mp6.si
_mre.cfe
_mre.cfs
_mre.si
_mro.cfe
_mro.cfs
_mro.si
_mry.cfe
_mry.cfs
_mry.si
_ms8.cfe
_ms8.cfs
_ms8.si
_msi.cfe
_msi.cfs
_msi.si
_mss.cfe
_mss.cfs
_mss.si
_msu.dii
_msu.dim
_msu.fdt
_msu.fdx
_msu.fnm
_msu_Lucene50_0.doc
_msu_Lucene50_0.pos
_msu_Lucene50_0.tim
_msu_Lucene50_0.tip
_msu_Lucene70_0.dvd
_msu_Lucene70_0.dvm
_msu.nvd
_msu.nvm
_msu.si
_mt2.cfe
_mt2.cfs
_mt2.si
_mta.dii
_mta.dim
_mta.fdt
_mta.fdx
_mta.fnm
_mta_Lucene50_0.doc
_mta_Lucene50_0.pos
_mta_Lucene50_0.tim
_mta_Lucene50_0.tip
_mta_Lucene70_0.dvd
_mta_Lucene70_0.dvm
_mta.nvd
_mta.nvm
_mta.si
_mtc.cfe
_mtc.cfs
_mtc.si
_mtm.cfe
_mtm.cfs
_mtm.si
_mtn.dii
_mtn.dim
_mtn.fdt
_mtn.fdx
_mtn.fnm
_mtn_Lucene50_0.doc
_mtn_Lucene50_0.pos
_mtn_Lucene50_0.tim
_mtn_Lucene50_0.tip
_mtn_Lucene70_0.dvd
_mtn_Lucene70_0.dvm
_mtn.nvd
_mtn.nvm
_mtn.si
_mto.dii
_mto.dim
_mto.fdt
_mto.fdx
_mto.fnm
_mto_Lucene50_0.doc
_mto_Lucene50_0.pos
_mto_Lucene50_0.tim
_mto_Lucene50_0.tip
_mto_Lucene70_0.dvd
_mto_Lucene70_0.dvm
_mto.nvd
_mto.nvm
_mto.si
segments_kll
snapshot_metadata
tlog
write.lock

Thanks,
Jeff

On Thu, Feb 1, 2018 at 1:37 PM, Shawn Heisey  wrote:

> On 2/1/2018 11:14 AM, Jeff Dyke wrote:
>
>> I've been developing locally on OSX and am now going through the process
>> of
>> automating the installation on AWS Ubuntu.  I have created a core, added
>> my
>> fields and then untarred the data directory on my Ubuntu instance,
>> restarted solr (to hopefully reindex), but no documents are seen.
>> Nor are any errors thrown in the logs or at startup.
>>
>> Given the case sensitivity differences between OSX and Linux, could that
>> be
>> a problem?  Are there further steps required, or is just not possible.
>> Granted i'm going to programmatically rebuild the index, but wanted to
>> start here.
>>
>
> The Lucene index format is not determined by the operating system or the
> hardware architecture at all.  Some of the index filenames include one
> uppercase letter in recent versions.  You should be able to copy the index
> directory from any OS to any other OS and have it work, as long as the
> source/destination Solr versions are compatible and the filenames are
> correct.
>
> What version of Solr?  On the Linux side, what are the filenames you see
> in data/index?
>
> Here's a directory listing from one of mine, running Solr 6.6.2. The
> directory layout is nonstandard because I have customized dataDir in the
> core.properties file.
>
> root@bigindy5:/index/solr6/data/data/inc_1/index# ls -1
> _64na.fdt
> _64na.fdx
> _64na.fnm
> _64na_Lucene50_0.doc
> _64na_Lucene50_0.pos
> _64na_Lucene50_0.tim
> _64na_Lucene50_0.tip
> _64na_Lucene54_0.dvd
> _64na_Lucene54_0.dvm
> _64na.nvd
> _64na.nvm
> _64na.si
> _64na.tvd
> _64na.tvx
> _64nb.fdt
> _64nb.fdx
> _64nb.fnm
> _64nb_Lucene50_0.doc
> _64nb_Lucene50_0.pos
> _64nb_Lucene50_0.tim
> _64nb_Lucene50_0.tip
> _64nb_Lucene54_0.dvd
> _64nb_Lucene54_0.dvm
> _64nb.nvd
> _64nb.nvm
> _64nb.si
> _64nb.tvd
> _64nb.tvx
> _64nc.fdt
> _64nc.fdx
> _64nc.fnm
> _64nc_Lucene50_0.doc
> _64nc_Lucene50_0.pos
> _64nc_Lucene50_0.tim
> _64nc_Lucene50_0.tip
> _64nc_Lucene54_0.dvd
> _64nc_Lucene54_0.dvm
> _64nc.nvd
> _64nc.nvm
> _64nc.si
> _64nc.tvd
> _64nc.tvx
> _64nd.fdt
> _64nd.fdx
> _64nd.fnm
> _64nd_Lucene50_0.doc
> _64nd_Lucene50_0.pos
> _64nd_Lucene50_0.tim
> _64nd_Lucene50_0.tip
> _64nd_Lucene54_0.dvd
> _64nd_Lucene54_0.dvm
> _64nd.nvd
> _64nd.nvm
> _64nd.si
> _64nd.tvd
> _64nd.tvx
> _64ne.fdt
> _64ne.fdx
> _64ne.fnm
> _64ne_Lucene50_0.doc
> _64ne_Lucene50_0.pos
> _64ne_Lucene50_0.tim
> _64ne_Lucene50_0.tip
> _64ne_Lucene54_0.dvd
> _64ne_Lucene54_0.dvm
> _64ne.nvd
> _64ne.nvm
> _64ne.si
> _64ne.tvd
> _64ne.tvx
> _64nf.fdt
> _64nf.fdx
> _64nf.fnm
> _64nf_Lucene50_0.doc
> _64nf_Lucene50_0.pos
> _64nf_Lucene50_0.tim
> _64nf_Lucene50_0.tip
> _64nf_Lucene54_0.dvd
> _64nf_Lucene54_0.dvm
> _64nf.nvd
> _64nf.nvm
> _64nf.si
> _64nf.tvd
> _64nf.tvx
> _64ng.fdt
> _64ng.fdx
> _64ng.fnm
> _64ng_Lucene50_0.doc
> _64ng_Lucene50
