Re: Need an advice for architecture.

2018-07-19 Thread Chris Hostetter


: FWIW: I used the script below to build myself 3.8 million documents, with 
: 300 "text fields" consisting of anywhere from 1-10 "words" (integers 
: between 1 and 200)

Whoops ... forgot to post the script...


#!/usr/bin/perl

use strict;
use warnings;

my $num_docs           = 3_800_000;
my $max_words_in_field = 10;
my $words_in_vocab     = 200;
my $num_fields         = 300;

# CSV header: id plus one "*_t" dynamic text field per column
print "id";
print ",${_}_t" for 1 .. $num_fields;
print "\n";

while ( $num_docs-- ) {
    print "$num_docs";    # uniqueKey
    for ( 1 .. $num_fields ) {
        # 1-10 "words", each an integer drawn from the vocab
        my $words_in_field = int( rand($max_words_in_field) );
        print ",\"";
        print int( rand($words_in_vocab) ) . " " for 0 .. $words_in_field;
        print "\"";
    }
    print "\n";
}




Re: Need an advice for architecture.

2018-07-19 Thread Chris Hostetter


: SQL DB 4M documents with up to 5000 metadata fields each document [2xXeon
: 2.1Ghz, 32GB RAM]
: Actual Solr: 1 Core version 4.6, 3.8M documents, schema has 300 metadata
: fields to import, size 3.6GB [2xXeon 2.4Ghz, 32GB RAM]
: (atm we need 35h to build the index and about 24h for a mass update which
: affects the production)

The first question I have is why you are using a version of Solr that's 
almost 5 years old.

The second question you should consider is what your indexing process 
looks like, and whether it's multithreaded or not, and if the bottleneck 
is your network/DB.

The third question to consider is your solr configuration / schema: how 
complex the solr side indexing process is -- ie: are these 300 fields all 
TextFields with complex analyzers?

FWIW: I used the script below to build myself 3.8 million documents, with 
300 "text fields" consisting of anywhere from 1-10 "words" (integers 
between 1 and 200)

The resulting CSV file was 24GB, and using a simple curl command to index 
with a single client thread (and a single solr thread) against the solr 
7.4 running with the sample techproducts configs took less than 2 hours on 
my laptop (less CPU & half as much RAM compared to your server) while I 
was doing other stuff.
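For reference, a sketch of what that run looks like end to end. The file names here are made up, and it assumes a local Solr 7.4 started with the techproducts example (bin/solr start -e techproducts); the tiny CSV below just stands in for the real 3.8M-doc, 300-field file:

```shell
# Tiny stand-in for the generated CSV: 3 docs, 3 "*_t" text fields,
# same shape as the Perl script's output.
printf 'id,1_t,2_t,3_t\n' > docs.csv
for id in 3 2 1; do
  printf '%s,"%s %s","%s","%s %s"\n' "$id" \
    "$((RANDOM % 200))" "$((RANDOM % 200))" "$((RANDOM % 200))" \
    "$((RANDOM % 200))" "$((RANDOM % 200))" >> docs.csv
done
head -1 docs.csv

# One curl call, one client thread, streaming the whole file to Solr
# (uncomment with Solr running; "techproducts" is the example collection):
# curl 'http://localhost:8983/solr/techproducts/update?commit=true' \
#   -H 'Content-Type: application/csv' --data-binary @docs.csv
```

The point is that a single streaming HTTP request is enough; there is no need for one request per document.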

(I would bet your current indexing speed has very little to do with Solr 
and is largely a factor of your source DB and how you are sending the data 
to Solr.)


-Hoss
http://www.lucidworks.com/


Re: Document Count Difference Between Solr Versions 4.7 and 7.3

2018-07-19 Thread Chris Hostetter



: I performed a bulk reindex against one of our larger databases for the first
: time using solr 7.3. The document count was substantially less (like at
: least 15% less) than our most recent bulk reindex from the previous solr 4.7
: server. I will perform a more careful analysis, but I am assuming the
: document count should not be different against the same database, even
: accounting for the schema updates required for going from 4.7 to 7.3.

Was the exact same source data used in both cases?  ... you mentioned "most 
recent bulk reindex" but it's not clear if the source data changed since 
that last index job.

what does your bulk indexing code look like? does it log errors from solr?

were there any errors in the solr logs?


-Hoss
http://www.lucidworks.com/


RE: SOLR 7.2.1 on SLES 11?

2018-07-19 Thread Lichte, Lucas R - DHS (Tek Systems)
Welp, that didn't go spectacularly.  All the OpenSuSE SLES 11 downloads are 
RPM, both source and compiled.  Non-relocatable.  I did attempt to rebuild, but 
it choked on the following dependencies:

audit-devel is needed by bash-4.3-286.1.x86_64
fdupes is needed by bash-4.3-286.1.x86_64
patchutils is needed by bash-4.3-286.1.x86_64

If I can find a repository for them I can throw that into Zypper, but thus far 
I've failed.  Anyone out there have any suggestions?

-Original Message-
From: Lichte, Lucas R - DHS (Tek Systems) 
[mailto:lucas.lic...@dhs.wisconsin.gov] 
Sent: Wednesday, July 11, 2018 3:12 PM
To: solr-user@lucene.apache.org
Subject: RE: SOLR 7.2.1 on SLES 11?

Thanks for the heads-up on that bug; it looks like we'll be doing some script 
editing either way.  I think option 1 is the most popular with the team at this 
point, but I'll take the temperature and see how people feel.

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Wednesday, July 11, 2018 2:04 PM
To: solr-user@lucene.apache.org
Subject: Re: SOLR 7.2.1 on SLES 11?

On 7/11/2018 12:09 PM, Lichte, Lucas R - DHS (Tek Systems) wrote:
> Hello, we're trying to get SOLR 7.2.1 running on SLES 11 but we hit issues 
> with BASH 3 and the ${distro_string,,} at the beginning of the 
> install_solr_service.sh.  We're just trying to get this upgraded without 
> tossing out the old DB servers so we can keep the content team happy and move 
> on to redesigning the environment.  We're wondering if anyone else has hit 
> this, and if they have any lessons learned.
>
> As we see it, there's a few options:
>
> 1.   Install OpenSUSE BASH 4, maybe in /opt
>
> 2.   Update the lowercase method to something from BASH 3 ( pipe to tr?)
>
> 3.   Do this by hand without the install_solr_service.sh
>
> 4.   Build new Redhat servers, migrate the DB and nuke these things.

Both bash 4 and SLES 11 are more than nine years old.  Upgrades are
definitely recommended.

The option that might be fastest is the second one you've presented --
changing anything in the scripts that requires bash 4 so it's compatible
with bash 3.  If you're comfortable with modifying a shell script in
this way, this is a good option.
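For example, the bash-4-only lowercasing in install_solr_service.sh can be replaced with a pipe to tr, exactly as option 2 suggests (the sample value below is made up):

```shell
# bash 4 only:   distro_string="${distro_string,,}"
# bash 3 compatible replacement:
distro_string="SLES 11"
distro_string=$(printf '%s' "$distro_string" | tr '[:upper:]' '[:lower:]')
echo "$distro_string"
```

This prints "sles 11" and behaves the same on bash 3 and bash 4.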

The first option is probably a little bit safer -- install bash 4, and
make sure that this is the version used when installing and when
starting Solr.  That could be a PATH adjustment, or changing the shebang
in each script.

There is another issue you're going to need to deal with on SLES.  A fix
for this issue has not been committed to the source repository:

https://secure-web.cisco.com/1t8VBNgY_sYJsqMF0W7q4JFwbT7oK6SKtn6P7g6r3FhhNbrIOZEfCoZsmsAi3v22fJ1oXP7lOSwU6SNv1nCeY9u6V-zUCAYo6hVkHGu78vrtg3CJ8vy0AUnEkx0qsrV_tlSOejpFw2cFEYcYHllu8JO6rFCBDVOlGU-vEnR59YvzuL38hOD3qg62rO_i-g-JrT2BRLaZeieXUwhOUBmr85Ucz7nPlLxDSr935AXGdPQvoZmPurfOlY2Q0HFTG9fetjkv0Q0lOSefrwM5h1wR3cQ/https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FSOLR-11853

Thanks,
Shawn




Re: CDCR documentation typo

2018-07-19 Thread Erick Erickson
Thanks, but I think that section has been reworked; that typo isn't in
the current documentation. It's doubtful that we'll re-release that
reference guide.

Best,
Erick

On Thu, Jul 19, 2018 at 3:14 AM, Yair Yotam  wrote:
> Hi,
>
> CDCR documentation page for v 7.1:
> https://lucene.apache.org/solr/guide/7_1/cross-data-center-replication-cdcr.html
>
> Contains a typo in "real world" scenario section - solrconfig.xml:
> Target & Source should be lowercase.
>
> Using this configuration as a reference will result in a generic,
> non-informative exception.
>
> Regards,
> Yair


Re: CDCR documentation typo

2018-07-19 Thread Alexandre Rafalovitch
Thank you for sharing this with others. For documentation, it looks
like it had been refactored and fixed already:
https://lucene.apache.org/solr/guide/7_4/cdcr-config.html

Regards,
   Alex.

On 19 July 2018 at 06:14, Yair Yotam  wrote:
> Hi,
>
> CDCR documentation page for v 7.1:
> https://lucene.apache.org/solr/guide/7_1/cross-data-center-replication-cdcr.html
>
> Contains a typo in "real world" scenario section - solrconfig.xml:
> Target & Source should be lowercase.
>
> Using this configuration as a reference will result in a generic,
> non-informative exception.
>
> Regards,
> Yair


Re: Need an advice for architecture.

2018-07-19 Thread Walter Underwood
Are you doing a commit after every document? Is the index on local disk?

That is very slow indexing. With four shards and smaller documents, we can 
index about a million documents per minute.
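If you are committing per document, letting Solr batch commits itself is the usual fix. A sketch of typical solrconfig.xml autoCommit settings; the intervals here are illustrative, not a recommendation for your workload:

```xml
<!-- hard commit: flush to disk regularly, without opening a new searcher -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<!-- soft commit: controls when new documents become visible to queries -->
<autoSoftCommit>
  <maxTime>120000</maxTime>
</autoSoftCommit>
```

With this in place the indexing client should not send commit=true on every request.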

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jul 19, 2018, at 1:28 AM, Emir Arnautović  
> wrote:
> 
> Hi Francois,
> If I got your numbers right, you are indexing on a single server and the 
> indexing rate is ~31 doc/s. I would first check whether something is wrong 
> with the indexing logic. You should check where the bottleneck is: do you 
> read documents from the DB fast enough, do you batch documents…
> Assuming you cannot get a better rate than 30 doc/s and that the bottleneck 
> is Solr, then in order to finish in 6h you need to parallelise indexing on 
> Solr by splitting the index across ~6 servers, for an overall indexing rate 
> of ~180 doc/s.
> 
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> 
> 
> 
>> On 19 Jul 2018, at 09:59, servus01  wrote:
>> 
>> I would like to ask what your recommendations are for a new, performant Solr
>> architecture. 
>> 
>> SQL DB 4M documents with up to 5000 metadata fields each document [2xXeon
>> 2.1Ghz, 32GB RAM]
>> Actual Solr: 1 Core version 4.6, 3.8M documents, schema has 300 metadata
>> fields to import, size 3.6GB [2xXeon 2.4Ghz, 32GB RAM]
>> (atm we need 35h to build the index and about 24h for a mass update which
>> affects the production)
>> 
>> Building the index should be less than 6h. Sometimes we change some of the
>> metadata fields, which affects most of the documents, and therefore a
>> mass update / reindex is necessary. A reindex of about 6h (overnight) is
>> also OK, but it should not impact user queries. Anyway, any faster indexing
>> is very welcome. We will have max. 20-30 concurrent users.
>> 
>> So I asked myself: how many nodes, shards, replicas, etc.? Could someone
>> please give me a recommendation for a fast, working architecture. 
>> 
>> really appreciate this, best
>> 
>> Francois 
>> 
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> 



Re: Document Count Difference Between Solr Versions 4.7 and 7.3

2018-07-19 Thread David Hastings
Monitor the logging on the admin interface while indexing.  Also make sure
to add a commit when done, so the docs are in the collection before you
compare the document counts.

On Thu, Jul 19, 2018 at 10:30 AM, THADC 
wrote:

> Hi,
>
> I performed a bulk reindex against one of our larger databases for the
> first
> time using solr 7.3. The document count was substantially less (like at
> least 15% less) than our most recent bulk reindex from the previous solr 4.7
> server. I will perform a more careful analysis, but I am assuming the
> document count should not be different against the same database, even
> accounting for the schema updates required for going from 4.7 to 7.3.
>
> Any response appreciated. Thank you.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Document Count Difference Between Solr Versions 4.7 and 7.3

2018-07-19 Thread THADC
Hi,

I performed a bulk reindex against one of our larger databases for the first
time using solr 7.3. The document count was substantially less (like at
least 15% less) than our most recent bulk reindex from the previous solr 4.7
server. I will perform a more careful analysis, but I am assuming the
document count should not be different against the same database, even
accounting for the schema updates required for going from 4.7 to 7.3.

Any response appreciated. Thank you.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr Nodes Killed During a ReIndexing Process on New VMs Out of Memory Error

2018-07-19 Thread THADC
Thanks, made the heap size considerably larger and it's fine now. Thank you



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


CDCR documentation typo

2018-07-19 Thread Yair Yotam
Hi,

CDCR documentation page for v 7.1:
https://lucene.apache.org/solr/guide/7_1/cross-data-center-replication-cdcr.html

Contains a typo in "real world" scenario section - solrconfig.xml:
Target & Source should be lowercase.

Using this configuration as a reference will result in a generic,
non-informative exception.

Regards,
Yair


Problem in QueryElevationComponent with solr 7.4.0

2018-07-19 Thread nc-tech-user
Hello.


We are using solr 6.6.2 and want to upgrade it to version 7.4.0.

But we have a problem with the QueryElevationComponent when adding the 
parameters "elevateIds=..." and "fl=[elevated]".


Example query: 
/solr/products/select?omitHeader=true&elevateIds=1,2,3,4,5&q=*:*&start=0&rows=20&fl=id,[elevated]&forceElevation=true&fq=category_1_id_is:123&enableElevation=true

and in the response we get an HTTP 500 error with the following stack trace:


java.lang.AssertionError: Expected an IndexableField but got: class java.lang.String
    at org.apache.solr.response.transform.BaseEditorialTransformer.getKey(BaseEditorialTransformer.java:72)
    at org.apache.solr.response.transform.BaseEditorialTransformer.transform(BaseEditorialTransformer.java:52)
    at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:123)
    at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:59)
    at org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:276)
    at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:162)
    at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:209)
    at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:325)
    at org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:120)
    at org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:71)
    at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
    at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:787)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:524)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.Server.handle(Server.java:531)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
    at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:760)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:678)
    at java.lang.Thread.run(Thread.java:748)


Configuration in solrconfig.xml of the select request handler is

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
  </lst>
  <arr name="components">
    <str>query</str>
    <str>facet</str>
    <str>stats</str>
    <str>debug</str>
    <str>spellcheck</str>
    <str>elevator</str>
  </arr>
</requestHandler>

And the elevator component config is

<searchComponent name="elevator" class="solr.QueryElevationComponent">
  <str name="queryFieldType">string</str>
  <str name="config-file">elevate.xml</str>
</searchComponent>




Re: Need an advice for architecture.

2018-07-19 Thread Emir Arnautović
Hi Francois,
If I got your numbers right, you are indexing on a single server and the indexing 
rate is ~31 doc/s. I would first check whether something is wrong with the indexing 
logic. You should check where the bottleneck is: do you read documents from the DB 
fast enough, do you batch documents…
Assuming you cannot get a better rate than 30 doc/s and that the bottleneck is Solr, 
then in order to finish in 6h you need to parallelise indexing on Solr by 
splitting the index across ~6 servers, for an overall indexing rate of ~180 doc/s.
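A quick back-of-the-envelope check of those numbers (integer shell arithmetic, so the results round down; the exact 6-hour target is ~176 doc/s, consistent with the ~180 doc/s figure):

```shell
docs=3800000
echo "$(( docs / (35 * 3600) )) doc/s now"      # current 35h build
echo "$(( docs / (6 * 3600) )) doc/s needed"    # 6h target
```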

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 19 Jul 2018, at 09:59, servus01  wrote:
> 
> I would like to ask what your recommendations are for a new, performant Solr
> architecture. 
> 
> SQL DB 4M documents with up to 5000 metadata fields each document [2xXeon
> 2.1Ghz, 32GB RAM]
> Actual Solr: 1 Core version 4.6, 3.8M documents, schema has 300 metadata
> fields to import, size 3.6GB [2xXeon 2.4Ghz, 32GB RAM]
> (atm we need 35h to build the index and about 24h for a mass update which
> affects the production)
> 
> Building the index should be less than 6h. Sometimes we change some of the
> metadata fields, which affects most of the documents, and therefore a
> mass update / reindex is necessary. A reindex of about 6h (overnight) is
> also OK, but it should not impact user queries. Anyway, any faster indexing
> is very welcome. We will have max. 20-30 concurrent users.
> 
> So I asked myself: how many nodes, shards, replicas, etc.? Could someone
> please give me a recommendation for a fast, working architecture. 
> 
> really appreciate this, best
> 
> Francois 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Need an advice for architecture.

2018-07-19 Thread servus01
I would like to ask what your recommendations are for a new, performant Solr
architecture. 

SQL DB 4M documents with up to 5000 metadata fields each document [2xXeon
2.1Ghz, 32GB RAM]
Actual Solr: 1 Core version 4.6, 3.8M documents, schema has 300 metadata
fields to import, size 3.6GB [2xXeon 2.4Ghz, 32GB RAM]
(atm we need 35h to build the index and about 24h for a mass update which
affects the production)

Building the index should be less than 6h. Sometimes we change some of the
metadata fields, which affects most of the documents, and therefore a
mass update / reindex is necessary. A reindex of about 6h (overnight) is
also OK, but it should not impact user queries. Anyway, any faster indexing
is very welcome. We will have max. 20-30 concurrent users.

So I asked myself: how many nodes, shards, replicas, etc.? Could someone
please give me a recommendation for a fast, working architecture. 

really appreciate this, best

Francois 
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html