Re: slow indexing when keys are various

2017-04-11 Thread moscovig
Hi all

We have changed all the Solr configs and commit parameters that Shawn
mentioned, but still: when inserting the same 300 documents from 20 threads
we see no latency, whereas inserting 300 different docs from 20 threads is
very slow, and CPU/RAM/disk/network metrics all stay low.

I am wondering if the problem might be related to the fact that when
inserting 300 different docs from each thread, the key is the only field
that varies while the other fields are identical. Could many identical
values in the other fields, across different keys, be causing the latency?

As for latency related to doc routing, I don't see how it could affect us.
Could ZooKeeper become a bottleneck here?

Thanks!
Gilad




--


Re: Using BasicAuth with SolrJ Code

2017-04-11 Thread Noble Paul
Can you paste the stack trace here?

On Tue, Apr 11, 2017 at 1:19 PM, Zheng Lin Edwin Yeo
 wrote:
> I found from StackOverflow  that we should declare it this way:
> http://stackoverflow.com/questions/43335419/using-basicauth-with-solrj-code
>
>
> SolrRequest req = new QueryRequest(new SolrQuery("*:*")); // create a new request object
> req.setBasicAuthCredentials(userName, password);
> solrClient.request(req);
>
> Is that correct?
>
> For this, the NullPointerException no longer occurs, but SolrJ is still not
> able to authenticate. I'm still getting error code 401 even after putting in
> this code.
>
> Any advice on where in the SolrJ code we should place this?
>
> Regards,
> Edwin
>
>
> On 10 April 2017 at 23:50, Zheng Lin Edwin Yeo  wrote:
>
>> Hi,
>>
>> I have just set up the Basic Authentication Plugin in Solr 6.4.2 on
>> SolrCloud, and I am trying to modify my SolrJ code so that the code can go
>> through the authentication and do the indexing.
>>
>> I tried using the following code from the Solr documentation:
>> https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin
>>
>> SolrRequest req ;//create a new request object
>> req.setBasicAuthCredentials(userName, password);
>> solrClient.request(req);
>>
>> However, the code complains that req is not initialized.
>>
>> If I initialize it, it will be initialized as null:
>>
>> SolrRequest req = null; // create a new request object
>> req.setBasicAuthCredentials(userName, password);
>> solrClient.request(req);
>>
>> This causes a NullPointerException:
>> Exception in thread "main" java.lang.NullPointerException
>>
>> How should we structure this code so that the error can be prevented?
>>
>> Regards,
>> Edwin
>>
>>



-- 
-
Noble Paul
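
For reference, a minimal self-contained sketch of this per-request approach (the URL, collection name and credentials are illustrative placeholders, not values from this thread). req.process(client) executes the request and parses the response, while client.request(req) would return the raw NamedList instead:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.client.solrj.response.QueryResponse;

public class BasicAuthQueryExample {
    public static void main(String[] args) throws Exception {
        // Illustrative URL and credentials -- replace with your own.
        SolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycollection").build();

        // Build the request first, then attach the Basic Auth credentials to it.
        QueryRequest req = new QueryRequest(new SolrQuery("*:*"));
        req.setBasicAuthCredentials("myUser", "myPassword");

        // Execute the request; the credentials travel with this request only.
        QueryResponse rsp = req.process(client);
        System.out.println("numFound = " + rsp.getResults().getNumFound());

        client.close();
    }
}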


Re: Solr 6.4. Can't index MS Visio vsdx files

2017-04-11 Thread Gytis Mikuciunas
When will 1.15 be released? Maybe you have some beta version I could
test :)

SAX sounds interesting, and from the info I found on Google it could solve
my issues.

On Tue, Apr 11, 2017 at 10:48 PM, Allison, Timothy B. 
wrote:

> It depends.  We've been trying to make parsers more, erm, flexible, but
> there are some problems from which we cannot recover.
>
> Tl;dr there isn't a short answer.  :(
>
> My sense is that DIH/ExtractingDocumentHandler is intended to get people
> up and running with Solr easily but it is not really a great idea for
> production.  See Erick's gem: https://lucidworks.com/2012/
> 02/14/indexing-with-solrj/
>
> As for the Tika portion... at the very least, Tika _shouldn't_ cause the
> ingesting process to crash.  At most, it should fail at the file level and
> not cause greater havoc.  In practice, if you're processing millions of
> files from the wild, you'll run into bad behavior and need to defend
> against permanent hangs, oom, memory leaks.
>
> Also, at the least, if there's an exception with an embedded file, Tika
> should catch it and keep going with the rest of the file.  If this doesn't
> happen let us know!  We are aware that some types of embedded file stream
> problems were causing parse failures on the entire file, and we now catch
> those in Tika 1.15-SNAPSHOT and don't let them percolate up through the
> parent file (they're reported in the metadata though).
>
> Specifically for your stack traces:
>
> For your initial problem with the missing class exceptions -- I thought we
> used to catch those in docx and log them.  I haven't been able to track
> this down, though.  I can look more if you have a need.
>
> For "Caused by: org.apache.poi.POIXMLException: Invalid 'Row_Type' name
> 'PolylineTo' ", this problem might go away if we implemented a pure SAX
> parser for vsdx.  We just did this for docx and pptx (coming in 1.15) and
> these are more robust to variation because they aren't requiring a match
> with the ooxml schema.  I haven't looked much at vsdx, but that _might_
> help.
>
> For "TODO Support v5 Pointers", this isn't supported and would require
> contributions.  However, I agree that POI shouldn't throw a Runtime
> exception.  Perhaps open an issue in POI, or maybe we should catch this
> special example at the Tika level?
>
> For "Caused by: java.lang.ArrayIndexOutOfBoundsException:", the POI team
> _might_ be able to modify the parser to ignore a stream if there's an
> exception, but that's often a sign that something needs to be fixed with
> the parser.  In short, the solution will come from POI.
>
> Best,
>
>  Tim
>
> -Original Message-
> From: Gytis Mikuciunas [mailto:gyt...@gmail.com]
> Sent: Tuesday, April 11, 2017 1:56 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Solr 6.4. Can't index MS Visio vsdx files
>
> Thanks for your responses.
> Is there any possibility to ignore parsing errors and continue indexing?
> Because right now Solr/Tika stops parsing the whole document if it finds any
> exception.
>
> On Apr 11, 2017 19:51, "Allison, Timothy B."  wrote:
>
> > You might want to drop a note to the dev or user's list on Apache POI.
> >
> > I'm not extremely familiar with the vsd(x) portion of our code base.
> >
> > The first item ("PolylineTo") may be caused by a mismatch btwn your
> > doc and the ooxml spec.
> >
> > The second item appears to be an unsupported feature.
> >
> > The third item may be an area for improvement within our codebase...I
> > can't tell just from the stacktrace.
> >
> > You'll probably get more helpful answers over on POI.  Sorry, I can't
> > help with this...
> >
> > Best,
> >
> >Tim
> >
> > P.S.
> > >  3.1. ooxml-schemas-1.3.jar instead of poi-ooxml-schemas-3.15.jar
> >
> > You shouldn't need both. Ooxml-schemas-1.3.jar should be a super set
> > of poi-ooxml-schemas-3.15.jar
> >
> >
> >
>


NonRepeatableRequestException Error during indexing after setting up Basic Authentication

2017-04-11 Thread Zheng Lin Edwin Yeo
Hi,

I'm getting an error when indexing with SolrJ after setting up Basic
Authentication, using the following code.

Credentials defaultcreds = new UsernamePasswordCredentials("id", "password");
appendAuthentication(defaultcreds, "BASIC", solr);

private static void appendAuthentication(Credentials credentials, String authPolicy, SolrClient solrClient) {
//  if (isHttpSolrClient(solrClient)) {
        HttpSolrClient httpSolrClient = (HttpSolrClient) solrClient;

//      if (credentials != null && StringUtils.isNotBlank(authPolicy)
//              && assertHttpClientInstance(httpSolrClient.getHttpClient())) {
        AbstractHttpClient httpClient = (AbstractHttpClient) httpSolrClient.getHttpClient();
        httpClient.getCredentialsProvider().setCredentials(new AuthScope(AuthScope.ANY), credentials);
        httpClient.getParams().setParameter(AuthPNames.TARGET_AUTH_PREF, Arrays.asList(authPolicy));
//      }
//  }
}


This is the error message that I got:

org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:8983/edm/chinaSapSo
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:624)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:279)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:268)
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149)
    at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:106)
    at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:71)
    at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:85)
    at testing.indexing(testing.java:2848)
    at testing.main(testing.java:265)
Caused by: org.apache.http.client.ClientProtocolException
    at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:839)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:515)
    ... 8 more
Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity.
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:662)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486)
    at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:835)
    ... 11 more


The error occurs at the point where I add the documents to Solr:

solr.add(batch);


This is how "solr" is defined:

static SolrClient solr;
solr = new HttpSolrClient( SOLR_URL );


What could be the reason for this? Is there anything wrong with my code?
I'm using SolrCloud on Solr 6.4.2.

Regards,
Edwin
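
One thing worth trying, sketched below with purely illustrative values: attach the credentials to the UpdateRequest itself via setBasicAuthCredentials() instead of configuring them on the underlying AbstractHttpClient. SolrJ then sends the Authorization header with the first attempt, which should avoid the 401-challenge-and-retry cycle that fails on the non-repeatable POST entity.

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class BasicAuthIndexSketch {
    public static void main(String[] args) throws Exception {
        // Illustrative URL and credentials -- replace with your own values.
        SolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycollection").build();

        List<SolrInputDocument> batch = new ArrayList<>();
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        batch.add(doc);

        // Credentials go on the request, not on the shared HttpClient.
        UpdateRequest update = new UpdateRequest();
        update.setBasicAuthCredentials("myUser", "myPassword");
        update.add(batch);
        update.setCommitWithin(10000); // let Solr commit within 10s instead of a separate commit call
        update.process(solr);

        solr.close();
    }
}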


Re: Long GC pauses while reading Solr docs using Cursor approach

2017-04-11 Thread Walter Underwood
JVM version? We’re running v8 update 121 with the G1 collector and it is 
working really well. We also have an 8GB heap.

Graph your heap usage. You’ll see a sawtooth shape, where it grows, then there 
is a major GC. The maximum of the base of the sawtooth is the working set of 
heap that your Solr installation needs. Set the heap to that value, plus a 
gigabyte or so. We run with a 2GB eden (new space) because so much of Solr’s 
allocations have a lifetime of one request. So, the base of the sawtooth, plus 
a gigabyte breathing room, plus two more for eden. That should work.

I don’t set all the ratios and stuff. When we were running CMS, I set a size
for the heap and a size for the new space. Done. With G1, I don’t even get
that fussy.
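
For illustration only (these numbers are made up, not a recommendation): if the base of the sawtooth were about 5GB, that sizing rule would come out to roughly

-Xms8g -Xmx8g    (5GB working set + ~1GB headroom + 2GB eden)
-Xmn2g           (a fixed-size new space for the short-lived per-request garbage)
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC

for the CMS case; with G1 you would typically drop the -Xmn line and let the collector size the young generation itself.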

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Apr 11, 2017, at 8:22 PM, Shawn Heisey  wrote:
> 
> On 4/11/2017 2:56 PM, Chetas Joshi wrote:
>> I am using Solr (5.5.0) on HDFS. SolrCloud of 80 nodes. Solr collection
>> with number of shards = 80 and replication Factor=2
>> 
>> Solr JVM heap size = 20 GB
>> solr.hdfs.blockcache.enabled = true
>> solr.hdfs.blockcache.direct.memory.allocation = true
>> MaxDirectMemorySize = 25 GB
>> 
>> I am querying a solr collection with index size = 500 MB per core.
> 
> I see that you and I have traded messages before on the list.
> 
> How much total system memory is there per server?  How many of these
> 500MB cores are on each server?  How many docs are in a 500MB core?  The
> answers to these questions may affect the other advice that I give you.
> 
>> The off-heap (25 GB) is huge so that it can load the entire index.
> 
> I still know very little about how HDFS handles caching and memory.  You
> want to be sure that as much data as possible from your indexes is
> sitting in local memory on the server.
> 
>> Using cursor approach (number of rows = 100K), I read 2 fields (Total 40
>> bytes per solr doc) from the Solr docs that satisfy the query. The docs are 
>> sorted by "id" and then by those 2 fields.
>> 
>> I am not able to understand why the heap memory is getting full and Full
>> GCs are consecutively running with long GC pauses (> 30 seconds). I am
>> using CMS GC.
> 
> A 20GB heap is quite large.  Do you actually need it to be that large? 
> If you graph JVM heap usage over a long period of time, what are the low
> points in the graph?
> 
> A result containing 100K docs is going to be pretty large, even with a
> limited number of fields.  It is likely to be several megabytes.  It
> will need to be entirely built in the heap memory before it is sent to
> the client -- both as Lucene data structures (which will probably be
> much larger than the actual response due to Java overhead) and as the
> actual response format.  Then it will be garbage as soon as the response
> is done.  Repeat this enough times, and you're going to go through even
> a 20GB heap pretty fast, and need a full GC.  Full GCs on a 20GB heap
> are slow.
> 
> You could try switching to G1, as long as you realize that you're going
> against advice from Lucene experts but honestly, I do not expect
> this to really help, because you would probably still need full GCs due
> to the rate that garbage is being created.  If you do try it, I would
> strongly recommend the latest Java 8, either Oracle or OpenJDK.  Here's
> my wiki page where I discuss this:
> 
> https://wiki.apache.org/solr/ShawnHeisey#G1_.28Garbage_First.29_Collector
> 
> Reducing the heap size (which may not be possible -- need to know the
> answer to the question about memory graphing) and reducing the number of
> rows per query are the only quick solutions I can think of.
> 
> Thanks,
> Shawn
> 



Re: Long GC pauses while reading Solr docs using Cursor approach

2017-04-11 Thread Shawn Heisey
On 4/11/2017 2:56 PM, Chetas Joshi wrote:
> I am using Solr (5.5.0) on HDFS. SolrCloud of 80 nodes. Solr collection
> with number of shards = 80 and replication Factor=2
>
> Solr JVM heap size = 20 GB
> solr.hdfs.blockcache.enabled = true
> solr.hdfs.blockcache.direct.memory.allocation = true
> MaxDirectMemorySize = 25 GB
>
> I am querying a solr collection with index size = 500 MB per core.

I see that you and I have traded messages before on the list.

How much total system memory is there per server?  How many of these
500MB cores are on each server?  How many docs are in a 500MB core?  The
answers to these questions may affect the other advice that I give you.

> The off-heap (25 GB) is huge so that it can load the entire index.

I still know very little about how HDFS handles caching and memory.  You
want to be sure that as much data as possible from your indexes is
sitting in local memory on the server.

> Using cursor approach (number of rows = 100K), I read 2 fields (Total 40
> bytes per solr doc) from the Solr docs that satisfy the query. The docs are 
> sorted by "id" and then by those 2 fields.
>
> I am not able to understand why the heap memory is getting full and Full
> GCs are consecutively running with long GC pauses (> 30 seconds). I am
> using CMS GC.

A 20GB heap is quite large.  Do you actually need it to be that large? 
If you graph JVM heap usage over a long period of time, what are the low
points in the graph?

A result containing 100K docs is going to be pretty large, even with a
limited number of fields.  It is likely to be several megabytes.  It
will need to be entirely built in the heap memory before it is sent to
the client -- both as Lucene data structures (which will probably be
much larger than the actual response due to Java overhead) and as the
actual response format.  Then it will be garbage as soon as the response
is done.  Repeat this enough times, and you're going to go through even
a 20GB heap pretty fast, and need a full GC.  Full GCs on a 20GB heap
are slow.

You could try switching to G1, as long as you realize that you're going
against advice from Lucene experts but honestly, I do not expect
this to really help, because you would probably still need full GCs due
to the rate that garbage is being created.  If you do try it, I would
strongly recommend the latest Java 8, either Oracle or OpenJDK.  Here's
my wiki page where I discuss this:

https://wiki.apache.org/solr/ShawnHeisey#G1_.28Garbage_First.29_Collector

Reducing the heap size (which may not be possible -- need to know the
answer to the question about memory graphing) and reducing the number of
rows per query are the only quick solutions I can think of.

Thanks,
Shawn
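
To illustrate the smaller-rows route, here is a minimal SolrJ cursorMark loop that pages through the matches a few thousand at a time; the ZooKeeper string, collection name and field names are placeholders, not values from this thread:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorPagingSketch {
    public static void main(String[] args) throws Exception {
        CloudSolrClient solr = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181");
        solr.setDefaultCollection("mycollection");

        SolrQuery q = new SolrQuery("*:*");
        q.setRows(1000);                                // a small page instead of 100K rows
        q.setFields("id", "fieldA", "fieldB");          // only the fields actually needed
        q.setSort(SolrQuery.SortClause.asc("id"));      // cursors require a sort on the uniqueKey

        String cursorMark = CursorMarkParams.CURSOR_MARK_START;
        boolean done = false;
        while (!done) {
            q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
            QueryResponse rsp = solr.query(q);
            for (SolrDocument doc : rsp.getResults()) {
                // process the two fields here
            }
            String next = rsp.getNextCursorMark();
            done = cursorMark.equals(next);             // no new cursor means we are finished
            cursorMark = next;
        }
        solr.close();
    }
}

Each page then stays small, so the response objects become short-lived young-generation garbage instead of piling up toward a full GC.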



Re: Deleting a field in schema.xml, reindex needed?

2017-04-11 Thread Shawn Heisey
On 4/11/2017 2:19 PM, Scruggs, Matt wrote:
> I’m updating our schema.xml file with 1 change: deleting a field. 
>
> Do I need to re-index all of my documents in Solr, or can I simply reload my 
> collection config by calling:
>
> http://mysolrhost:8000/solr/admin/collections?action=RELOAD&name=mycollection

Deleting a field won't require a reindex, but any data in that field
will remain in your index until you do.  This probably can affect the
performance of the index, but unless you're running with insufficient
resources, you may not even notice.

Thanks,
Shawn



Re: Expiry of Basic Authentication Plugin

2017-04-11 Thread Zheng Lin Edwin Yeo
Hi Jordi,

Thanks for the advice.

Regards,
Edwin


On 11 April 2017 at 18:27, Jordi Domingo Borràs 
wrote:

> Browsers retain basic auth information. You have to close the browser or
> clear the browsing history. You can also change the user's password on the
> server side.
>
> Best
>
> On Tue, Apr 11, 2017 at 7:18 AM, Zheng Lin Edwin Yeo  >
> wrote:
>
> > Does anyone have any idea if the authentication will expire automatically?
> > Mine has already been authenticated for more than 20 hours, and it has not
> > auto logged out yet.
> >
> > Regards,
> > Edwin
> >
> > On 11 April 2017 at 00:19, Zheng Lin Edwin Yeo 
> > wrote:
> >
> > > Hi,
> > >
> > > Would like to check, after I have entered the authentication to access
> > > Solr with Basic Authentication Plugin, will the authentication be
> expired
> > > automatically after a period of time?
> > >
> > > I'm using SolrCloud on Solr 6.4.2
> > >
> > > Regards,
> > > Edwin
> > >
> >
>


Re: Deleting a field in schema.xml, reindex needed?

2017-04-11 Thread Walter Underwood
When I have done this, it has been in multiple steps:

1. Change the indexing so that no data is going to that field.
2. Reindex, so the field is empty.
3. Remove the field from the schema.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Apr 11, 2017, at 3:10 PM, Markus Jelsma  wrote:
> 
> Hi - We did this on one occasion and Solr started complaining in the logs
> about a field that is present but not defined. We thought the problem would
> go away within 30 days - the window within which every document is reindexed
> or deleted - but it did not, for some reason. Forcing a merge did not solve
> the warnings, although I thought it should.
> 
> So we decided to delete everything, reindex into a standby index and bring
> that one back online. But don't worry, they are just warnings; everything
> worked well and nothing failed. Maybe our 30-day reindexing strategy failed
> at that point, or I didn't wait for exactly 30 days.
> 
> Regards,
> Markus
> 
> 
> 
> -Original message-
>> From:Scruggs, Matt 
>> Sent: Tuesday 11th April 2017 23:59
>> To: solr-user@lucene.apache.org
>> Subject: Deleting a field in schema.xml, reindex needed?
>> 
>> I’m updating our schema.xml file with 1 change: deleting a field. 
>> 
>> Do I need to re-index all of my documents in Solr, or can I simply reload my 
>> collection config by calling:
>> 
>> http://mysolrhost:8000/solr/admin/collections?action=RELOAD&name=mycollection
>> 
>> 
>> Thanks,
>> Matt
>> 
>> 



RE: Deleting a field in schema.xml, reindex needed?

2017-04-11 Thread Markus Jelsma
Hi - We did this on one occasion and Solr started complaining in the logs about 
a field that is present but not defined. We thought the problem would go away 
within 30 days - the window within which every document is reindexed or deleted - 
but it did not, for some reason. Forcing a merge did not solve the warnings, 
although I thought it should.

So we decided to delete everything, reindex into a standby index and bring that 
one back online. But don't worry, they are just warnings; everything worked well 
and nothing failed. Maybe our 30-day reindexing strategy failed at that point, or 
I didn't wait for exactly 30 days.

Regards,
Markus

 
 
-Original message-
> From:Scruggs, Matt 
> Sent: Tuesday 11th April 2017 23:59
> To: solr-user@lucene.apache.org
> Subject: Deleting a field in schema.xml, reindex needed?
> 
> I’m updating our schema.xml file with 1 change: deleting a field. 
> 
> Do I need to re-index all of my documents in Solr, or can I simply reload my 
> collection config by calling:
> 
> http://mysolrhost:8000/solr/admin/collections?action=RELOAD&name=mycollection
> 
> 
> Thanks,
> Matt
> 
> 


Deleting a field in schema.xml, reindex needed?

2017-04-11 Thread Scruggs, Matt
I’m updating our schema.xml file with 1 change: deleting a field. 

Do I need to re-index all of my documents in Solr, or can I simply reload my 
collection config by calling:

http://mysolrhost:8000/solr/admin/collections?action=RELOAD&name=mycollection


Thanks,
Matt



RE: CommonGrams

2017-04-11 Thread Markus Jelsma
Hi - I cannot think of any real drawback right away, but you can probably 
expect a slightly differently ordered MLT response. It should not be a problem if 
you select enough terms for the MLT lookup.

Regards,
Markus

 
 
-Original message-
> From:David Hastings 
> Sent: Tuesday 11th April 2017 22:18
> To: solr-user@lucene.apache.org
> Subject: CommonGrams
> 
> Hi, I was wondering if there are any known drawbacks to using the CommonGrams
> factory with regard to features such as "more like this".
> 


Re: SolrJ appears to have problems with Docker Toolbox

2017-04-11 Thread Shawn Heisey
On 4/8/2017 6:42 PM, Mike Thomsen wrote:
> I'm running two nodes of SolrCloud in Docker on Windows using Docker
> Toolbox.  The problem I am having is that Docker Toolbox runs inside of a
> VM and so it has an internal network inside the VM that is not accessible
> to the Docker Toolbox VM's host OS. If I go to the VM's IP which is
> 192.168.99.100, I can load the admin UI and do basic operations that are
> written to go against that IP and port (like querying, schema editor,
> manually adding documents, etc.)
>
> However, when I try to run code that uses SolrJ to add documents, it fails
> because the ZK configuration has the IPs for the internal Docker network
> which is 172.X.Y.Z. If I log into the Toolbox VM and run the Java code
> from there, it works just fine. From the host OS, it doesn't.

SolrCloud and the CloudSolrClient class from SolrJ will have issues if
each instance registers with zookeeper using addresses that are not
reachable from other Solr instances AND from clients.  In situations
where there are both external and internal addresses, each SolrCloud
node must be configured to register with Zookeeper using the external
address or name, and the networking must be set up so clients and other
Solr instances can communicate using that address.  See the "host"
parameter here:

https://cwiki.apache.org/confluence/display/solr/Parameter+Reference

If you are also translating TCP ports, you probably need to define the
hostPort parameter as well as the host parameter.
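
For instance (illustrative values only): setting SOLR_HOST=192.168.99.100 in
bin/solr.in.sh makes each node register the Toolbox VM's externally reachable
address in ZooKeeper instead of the internal 172.x address; if Docker publishes
a different external port, the hostPort value in solr.xml needs to carry that
external port as well.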

By default, SolrCloud does the best it can to detect the address and
port it registers in Zookeeper.  When translation is involved or the
machine has more than one NIC, that often results in incorrect
information in Zookeeper.

If you change the registration information for existing nodes in an
existing cloud, you may find yourself in a situation where you need to
manually edit the zookeeper database to remove information about the
incorrect addresses that were registered before.  If you can do so,
setting up a new cloud from scratch with a fresh ZK ensemble (or a
different chroot within the existing ensemble) may be the best plan.

Thanks,
Shawn



Re: Dynamic schema memory consumption

2017-04-11 Thread Dorian Hoxha
Here is a small snippet that I copy-pasted from Shawn Heisey (who is a core
contributor, I think; he's good):

> One thing to note:  SolrCloud begins to have performance issues when the
> number of collections in the cloud reaches the low hundreds.  It's not
> going to scale very well with a collection per user or per mailbox
> unless there aren't very many users.  There are people looking into how
> to scale better, but this hasn't really gone anywhere yet.  Here's one
> issue about it, with a lot of very dense comments:
>
> https://issues.apache.org/jira/browse/SOLR-7191


On Tue, Apr 11, 2017 at 9:11 PM, Dorian Hoxha 
wrote:

> And this overhead depends on what? I mean, if I create an empty collection
>> will it take up much heap size  just for "being there" ?
>
> Yes. You can search on elastic-search/solr/lucene mailing lists and see
> that it's true. But nobody has `empty` collections, so yours will have a
> schema and some data/segments and translog.
>
> On Tue, Apr 11, 2017 at 7:41 PM, jpereira  wrote:
>
>> The way the data is spread across the cluster is not really uniform. Most
>> of
>> shards have way lower than 50GB; I would say about 15% of the total shards
>> have more than 50GB.
>>
>>
>> Dorian Hoxha wrote
>> > Each shard is a lucene index which has a lot of overhead.
>>
>> And this overhead depends on what? I mean, if I create an empty collection
>> will it take up much heap size  just for "being there" ?
>>
>>
>> Dorian Hoxha wrote
>> > I don't know about static/dynamic memory-issue though.
>>
>> I could not find anything related in the docs or the mailing list either,
>> but I'm still not ready to discard this suspicion...
>>
>> Again, thx for your time
>>
>>
>>
>> --
>>
>
>


Long GC pauses while reading Solr docs using Cursor approach

2017-04-11 Thread Chetas Joshi
Hello,

I am using Solr (5.5.0) on HDFS. SolrCloud of 80 nodes. Solr collection
with number of shards = 80 and replication factor = 2.

Solr JVM heap size = 20 GB
solr.hdfs.blockcache.enabled = true
solr.hdfs.blockcache.direct.memory.allocation = true
MaxDirectMemorySize = 25 GB

I am querying a solr collection with index size = 500 MB per core.

The off-heap (25 GB) is huge so that it can load the entire index.

Using cursor approach (number of rows = 100K), I read 2 fields (Total 40
bytes per solr doc) from the Solr docs that satisfy the query. The docs are
sorted by "id" and then by those 2 fields.

I am not able to understand why the heap memory is getting full and Full
GCs are consecutively running with long GC pauses (> 30 seconds). I am
using CMS GC.

-XX:NewRatio=3 \
-XX:SurvivorRatio=4 \
-XX:TargetSurvivorRatio=90 \
-XX:MaxTenuringThreshold=8 \
-XX:+UseConcMarkSweepGC \
-XX:+UseParNewGC \
-XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 \
-XX:+CMSScavengeBeforeRemark \
-XX:PretenureSizeThreshold=64m \
-XX:+UseCMSInitiatingOccupancyOnly \
-XX:CMSInitiatingOccupancyFraction=50 \
-XX:CMSMaxAbortablePrecleanTime=6000 \
-XX:+CMSParallelRemarkEnabled \
-XX:+ParallelRefProcEnabled


Please guide me in debugging the heap usage issue.


Thanks!


Re: simple matches not catching at query time

2017-04-11 Thread Mikhail Khludnev
John,

Here I mean a query that matches a doc which is expected to be matched
by the problem query.
https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-TheexplainOtherParameter

On Tue, Apr 11, 2017 at 11:32 PM, John Blythe  wrote:

> first off, i don't think i have a full handle on the import of what is
> outputted by the debugger.
>
> that said, if "...PhraseQuery(manufacturer_split_syn:\"vendor vendor\")"
> is
> matching against `vendor_coolmed | coolmed | vendor`, then 'vendor' should
> match. the query analyzer is keywordtokenizer, pattern replacement
> (replaces all non-alphanumeric with underscores), checks for synonyms (the
> underscores are my way around the multi term synonym problem), then
> worddelimiter is used to blow out the underscores and generate word parts
> ("vendor_vendor" => 'vendor' 'vendor'), stop filter, lower case, stem.
>
> in your mentioned strategy, what is the "id:" representative of?
>
> thanks!
>
> --
> *John Blythe*
> Product Manager & Lead Developer
>
> 251.605.3071 | j...@curvolabs.com
> www.curvolabs.com
>
> 58 Adams Ave
> Evansville, IN 47713
>
> On Tue, Apr 11, 2017 at 4:12 PM, Mikhail Khludnev  wrote:
>
> > John,
> >
> > How do you suppose to match any of "parsed_filter_queries":["
> > MultiPhraseQuery(manufacturer_syn_both:\"(vendor_vendor_us vendor)
> > vendor\")", "PhraseQuery(manufacturer_split_syn:\"vendor vendor\")"
> > against
> > vendor_coolmed | coolmed | vendor ?
> >
> > I just can't see any chance to match them.
> >
> > One possible strategy is pick the simplest filter query, put it as a main
> > query.
> > Then pass &explainOther=id: and share the explanation.
> >
> >
> >
> > On Tue, Apr 11, 2017 at 8:57 PM, John Blythe  wrote:
> >
> > > hi, erick.
> > >
> > > appreciate the feedback.
> > >
> > > 1> i'm sending the terms to solr enquoted
> > > 2> i'd thought that at one point and reran the indexing. i _had_ had
> two
> > of
> > > the fields not indexed, but this represented one pass (same analyzer)
> > from
> > > two diff source fields while 2 or 3 of the other 4 fields _were_
> seeming
> > as
> > > if they should match. maybe just need to do this for said sanity at
> this
> > > point lol
> > > 3> i'm using dismax, no mm param set
> > >
> > > some further context:
> > >
> > > i'm querying something like this: ...fq=manufacturer:("VENDOR:VENDOR
> > US")
> > > OR manufacturer_syn:("VENDOR:VENDOR US")...
> > >
> > > The indexed value is: "Vendor"
> > >
> > > the output of field 1 in the Analysis tab would be:
> > > *index*: vendor_coolmed | coolmed | vendor
> > > *query*: vendor_vendor_coolmed | vendor | vendor
> > >
> > > the other field (and a couple other, related ones, actually) have
> similar
> > > situations where I see a clear match (as well as get the confirmation
> of
> > it
> > > when switching to the old UI and seeing the highlighting) yet get no
> > > results in my actual query.
> > >
> > > a further note. when i get the query debugging enabled I can see this
> in
> > > the output:
> > > "filter_queries":["manufacturer_syn_both:\"Vendor:Vendor US\"",
> > > "manufacturer_split_syn:(\"Vendor:Vendor US\")"],
> > > "parsed_filter_queries":["
> > > MultiPhraseQuery(manufacturer_syn_both:\"(vendor_vendor_us vendor)
> > > vendor\")", "PhraseQuery(manufacturer_split_syn:\"vendor
> vendor\")"],...
> > >
> > > It looks as if the parsed query is wrapped in quotes even after having
> > been
> > > parsed, so while the correct tokens, i.e. "vendor", are present to
> match
> > > against the indexed value, the fact that the entire parsed derivative
> of
> > > the initial query is sent to match (if that's indeed what's happening)
> > > won't actually get any hits. Yet if I remove the quotes when sending
> over
> > > to query then the parsing doesn't get to a point of having any
> > > worthwhile/matching tokens to begin with.
> > >
> > > one last thing: i've attempted with just "vendor" being sent over to
> help
> > > remove complexity and, once more, i see Analysis chain functioning just
> > > fine but the query itself getting 0 hits.
> > >
> > > think TermComponents is the best option at this point or something else
> > > given the above filler info?
> > >
> > > --
> > > *John Blythe*
> > > Product Manager & Lead Developer
> > >
> > > 251.605.3071 | j...@curvolabs.com
> > > www.curvolabs.com
> > >
> > > 58 Adams Ave
> > > Evansville, IN 47713
> > >
> > > On Tue, Apr 11, 2017 at 1:20 PM, Erick Erickson <
> erickerick...@gmail.com
> > >
> > > wrote:
> > >
> > > > &debug=query is your friend. There are several issues that often trip
> > > > people up:
> > > >
> > > > 1> The analysis tab pre-supposes that what you put in the boxes gets
> > > > all the way to the field in question. Trivial example:
> > > > I put (without quotes) "erick erickson" in the "name" field in the
> > > > analysis page and see that it gets tokenized correctly. But the query
> > > > "name:erick erickson" actually gets parsed at a higher level into
> 

Re: simple matches not catching at query time

2017-04-11 Thread John Blythe
first off, i don't think i have a full handle on the import of what is
outputted by the debugger.

that said, if "...PhraseQuery(manufacturer_split_syn:\"vendor vendor\")" is
matching against `vendor_coolmed | coolmed | vendor`, then 'vendor' should
match. the query analyzer is keywordtokenizer, pattern replacement
(replaces all non-alphanumeric with underscores), checks for synonyms (the
underscores are my way around the multi term synonym problem), then
worddelimiter is used to blow out the underscores and generate word parts
("vendor_vendor" => 'vendor' 'vendor'), stop filter, lower case, stem.

in your mentioned strategy, what is the "id:" representative of?

thanks!

-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713

On Tue, Apr 11, 2017 at 4:12 PM, Mikhail Khludnev  wrote:

> John,
>
> How do you suppose to match any of "parsed_filter_queries":["
> MultiPhraseQuery(manufacturer_syn_both:\"(vendor_vendor_us vendor)
> vendor\")", "PhraseQuery(manufacturer_split_syn:\"vendor vendor\")"
> against
> vendor_coolmed | coolmed | vendor ?
>
> I just can't see any chance to match them.
>
> One possible strategy is pick the simplest filter query, put it as a main
> query.
> Then pass &explainOther=id: and share the explanation.
>
>
>
> On Tue, Apr 11, 2017 at 8:57 PM, John Blythe  wrote:
>
> > hi, erick.
> >
> > appreciate the feedback.
> >
> > 1> i'm sending the terms to solr enquoted
> > 2> i'd thought that at one point and reran the indexing. i _had_ had two
> of
> > the fields not indexed, but this represented one pass (same analyzer)
> from
> > two diff source fields while 2 or 3 of the other 4 fields _were_ seeming
> as
> > if they should match. maybe just need to do this for said sanity at this
> > point lol
> > 3> i'm using dismax, no mm param set
> >
> > some further context:
> >
> > i'm querying something like this: ...fq=manufacturer:("VENDOR:VENDOR
> US")
> > OR manufacturer_syn:("VENDOR:VENDOR US")...
> >
> > The indexed value is: "Vendor"
> >
> > the output of field 1 in the Analysis tab would be:
> > *index*: vendor_coolmed | coolmed | vendor
> > *query*: vendor_vendor_coolmed | vendor | vendor
> >
> > the other field (and a couple other, related ones, actually) have similar
> > situations where I see a clear match (as well as get the confirmation of
> it
> > when switching to the old UI and seeing the highlighting) yet get no
> > results in my actual query.
> >
> > a further note. when i get the query debugging enabled I can see this in
> > the output:
> > "filter_queries":["manufacturer_syn_both:\"Vendor:Vendor US\"",
> > "manufacturer_split_syn:(\"Vendor:Vendor US\")"],
> > "parsed_filter_queries":["
> > MultiPhraseQuery(manufacturer_syn_both:\"(vendor_vendor_us vendor)
> > vendor\")", "PhraseQuery(manufacturer_split_syn:\"vendor vendor\")"],...
> >
> > It looks as if the parsed query is wrapped in quotes even after having
> been
> > parsed, so while the correct tokens, i.e. "vendor", are present to match
> > against the indexed value, the fact that the entire parsed derivative of
> > the initial query is sent to match (if that's indeed what's happening)
> > won't actually get any hits. Yet if I remove the quotes when sending over
> > to query then the parsing doesn't get to a point of having any
> > worthwhile/matching tokens to begin with.
> >
> > one last thing: i've attempted with just "vendor" being sent over to help
> > remove complexity and, once more, i see Analysis chain functioning just
> > fine but the query itself getting 0 hits.
> >
> > think TermComponents is the best option at this point or something else
> > given the above filler info?
> >
> > --
> > *John Blythe*
> > Product Manager & Lead Developer
> >
> > 251.605.3071 | j...@curvolabs.com
> > www.curvolabs.com
> >
> > 58 Adams Ave
> > Evansville, IN 47713
> >
> > On Tue, Apr 11, 2017 at 1:20 PM, Erick Erickson  >
> > wrote:
> >
> > > &debug=query is your friend. There are several issues that often trip
> > > people up:
> > >
> > > 1> The analysis tab pre-supposes that what you put in the boxes gets
> > > all the way to the field in question. Trivial example:
> > > I put (without quotes) "erick erickson" in the "name" field in the
> > > analysis page and see that it gets tokenized correctly. But the query
> > > "name:erick erickson" actually gets parsed at a higher level into
> > > name:erick default_search_field:erickson. See the discussion at:
> > > SOLR-9185
> > >
> > > 2> what you think is in your indexed field isn't really. Can happen if
> > > you changed your analysis chain but didn't totally re-index. Can
> > > happen because one of the parts of the analysis chain works
> > > differently than you expect (WordDelimiterFilterFactory, for instance,
> > > has a ton of options that can alter the tokens emitted). The
> > > TermsComponent will let you examine the terms actually _in_ the index
> > > that you search on. You stated that the 

CommonGrams

2017-04-11 Thread David Hastings
Hi, I was wondering if there are any known drawbacks to using the CommonGrams
factory with regard to features such as "more like this".


Re: simple matches not catching at query time

2017-04-11 Thread Mikhail Khludnev
John,

How do you expect to match any of "parsed_filter_queries":["
MultiPhraseQuery(manufacturer_syn_both:\"(vendor_vendor_us vendor)
vendor\")", "PhraseQuery(manufacturer_split_syn:\"vendor vendor\")"
against
vendor_coolmed | coolmed | vendor ?

I just can't see any chance to match them.

One possible strategy is to pick the simplest filter query and put it as the main
query.
Then pass &explainOther=id: and share the explanation.
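
For example, with the simplest filter query promoted to q and a hypothetical
document id (collection name and id are placeholders), the request could look like:

http://localhost:8983/solr/mycollection/select?q=manufacturer_split_syn:"Vendor:Vendor US"&explainOther=id:12345&debugQuery=true&wt=json

The debug section of the response then explains how that specific document
scores, or fails to score, against the main query.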



On Tue, Apr 11, 2017 at 8:57 PM, John Blythe  wrote:

> hi, erick.
>
> appreciate the feedback.
>
> 1> i'm sending the terms to solr enquoted
> 2> i'd thought that at one point and reran the indexing. i _had_ had two of
> the fields not indexed, but this represented one pass (same analyzer) from
> two diff source fields while 2 or 3 of the other 4 fields _were_ seeming as
> if they should match. maybe just need to do this for said sanity at this
> point lol
> 3> i'm using dismax, no mm param set
>
> some further context:
>
> i'm querying something like this: ...fq=manufacturer:("VENDOR:VENDOR US")
> OR manufacturer_syn:("VENDOR:VENDOR US")...
>
> The indexed value is: "Vendor"
>
> the output of field 1 in the Analysis tab would be:
> *index*: vendor_coolmed | coolmed | vendor
> *query*: vendor_vendor_coolmed | vendor | vendor
>
> the other field (and a couple other, related ones, actually) have similar
> situations where I see a clear match (as well as get the confirmation of it
> when switching to the old UI and seeing the highlighting) yet get no
> results in my actual query.
>
> a further note. when i get the query debugging enabled I can see this in
> the output:
> "filter_queries":["manufacturer_syn_both:\"Vendor:Vendor US\"",
> "manufacturer_split_syn:(\"Vendor:Vendor US\")"],
> "parsed_filter_queries":["
> MultiPhraseQuery(manufacturer_syn_both:\"(vendor_vendor_us vendor)
> vendor\")", "PhraseQuery(manufacturer_split_syn:\"vendor vendor\")"],...
>
> It looks as if the parsed query is wrapped in quotes even after having been
> parsed, so while the correct tokens, i.e. "vendor", are present to match
> against the indexed value, the fact that the entire parsed derivative of
> the initial query is sent to match (if that's indeed what's happening)
> won't actually get any hits. Yet if I remove the quotes when sending over
> to query then the parsing doesn't get to a point of having any
> worthwhile/matching tokens to begin with.
>
> one last thing: i've attempted with just "vendor" being sent over to help
> remove complexity and, once more, i see Analysis chain functioning just
> fine but the query itself getting 0 hits.
>
> think TermComponents is the best option at this point or something else
> given the above filler info?
>
> --
> *John Blythe*
> Product Manager & Lead Developer
>
> 251.605.3071 | j...@curvolabs.com
> www.curvolabs.com
>
> 58 Adams Ave
> Evansville, IN 47713
>
> On Tue, Apr 11, 2017 at 1:20 PM, Erick Erickson 
> wrote:
>
> > &debug=query is your friend. There are several issues that often trip
> > people up:
> >
> > 1> The analysis tab pre-supposes that what you put in the boxes gets
> > all the way to the field in question. Trivial example:
> > I put (without quotes) "erick erickson" in the "name" field in the
> > analysis page and see that it gets tokenized correctly. But the query
> > "name:erick erickson" actually gets parsed at a higher level into
> > name:erick default_search_field:erickson. See the discussion at:
> > SOLR-9185
> >
> > 2> what you think is in your indexed field isn't really. Can happen if
> > you changed your analysis chain but didn't totally re-index. Can
> > happen because one of the parts of the analysis chain works
> > differently than you expect (WordDelimiterFilterFactory, for instance,
> > has a ton of options that can alter the tokens emitted). The
> > TermsComponent will let you examine the terms actually _in_ the index
> > that you search on. You stated that the analysis page shows you what
> > you expect, so this is a sanity check.
> >
> > 3> You're using edismax and setting some parameter, mm=100% is a
> > favorite and it's having this effect.
> >
> > So add debug=query and provide a sample document (or just a field) and
> > the schema definition for the field in question if you're still
> > stumped.
> >
> > Best,
> > Erick
> >
> > On Tue, Apr 11, 2017 at 8:35 AM, John Blythe  wrote:
> > > hi everyone.
> > >
> > > i recently wrote in ('analysis matching, query not') but never heard
> back
> > > so wanted to follow up. i'm at my wit's end currently. i have several
> > > fields that are showing matches in the analysis tab. when i dumb down
> the
> > > string sent over to query it still gives me issues in some field cases.
> > >
> > > any thoughts on how to debug to figure out wtf is going on here would
> be
> > > greatly appreciated. the use case is straightforward and the solution
> > > should be as well, so i'm at a loss as to how in the world i'm having
> > > issues w this.
> > >
> > > can provide any amount of contextualizin

RE: Solr 6.4. Can't index MS Visio vsdx files

2017-04-11 Thread Allison, Timothy B.
It depends.  We've been trying to make parsers more, erm, flexible, but there 
are some problems from which we cannot recover.

Tl;dr there isn't a short answer.  :(

My sense is that DIH/ExtractingDocumentHandler is intended to get people up and 
running with Solr easily but it is not really a great idea for production.  See 
Erick's gem: https://lucidworks.com/2012/02/14/indexing-with-solrj/ 
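
A minimal sketch of that SolrJ-based route (collection URL, field names and directory are illustrative; it assumes tika-core and tika-parsers on the classpath): each file is parsed in its own try/catch, so a document that Tika or POI cannot handle is logged and skipped instead of aborting the whole run.

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

public class TikaSolrJSketch {
    public static void main(String[] args) throws Exception {
        SolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycollection").build();
        AutoDetectParser parser = new AutoDetectParser();

        try (Stream<Path> files = Files.walk(Paths.get("/data/docs"))) {
            files.filter(Files::isRegularFile).forEach(path -> {
                try (InputStream in = Files.newInputStream(path)) {
                    BodyContentHandler handler = new BodyContentHandler(-1); // no write limit
                    Metadata metadata = new Metadata();
                    parser.parse(in, handler, metadata, new ParseContext());

                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", path.toString());
                    doc.addField("content", handler.toString());
                    solr.add(doc);
                } catch (Exception e) {
                    // One bad file is logged and skipped; indexing continues.
                    System.err.println("Skipping " + path + ": " + e);
                }
            });
        }
        solr.commit();
        solr.close();
    }
}

For truly hostile input you would still want the parse wrapped in a timeout in a separate thread or child process, per the caveats that follow.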

As for the Tika portion... at the very least, Tika _shouldn't_ cause the 
ingesting process to crash.  At most, it should fail at the file level and not 
cause greater havoc.  In practice, if you're processing millions of files from 
the wild, you'll run into bad behavior and need to defend against permanent 
hangs, oom, memory leaks.

Also, at the least, if there's an exception with an embedded file, Tika should 
catch it and keep going with the rest of the file.  If this doesn't happen let 
us know!  We are aware that some types of embedded file stream problems were 
causing parse failures on the entire file, and we now catch those in Tika 
1.15-SNAPSHOT and don't let them percolate up through the parent file (they're 
reported in the metadata though).

Specifically for your stack traces:

For your initial problem with the missing class exceptions -- I thought we used 
to catch those in docx and log them.  I haven't been able to track this down, 
though.  I can look more if you have a need.

For "Caused by: org.apache.poi.POIXMLException: Invalid 'Row_Type' name 
'PolylineTo' ", this problem might go away if we implemented a pure SAX parser 
for vsdx.  We just did this for docx and pptx (coming in 1.15) and these are 
more robust to variation because they aren't requiring a match with the ooxml 
schema.  I haven't looked much at vsdx, but that _might_ help.

For "TODO Support v5 Pointers", this isn't supported and would require 
contributions.  However, I agree that POI shouldn't throw a Runtime exception.  
Perhaps open an issue in POI, or maybe we should catch this special example at 
the Tika level?

For "Caused by: java.lang.ArrayIndexOutOfBoundsException:", the POI team 
_might_ be able to modify the parser to ignore a stream if there's an 
exception, but that's often a sign that something needs to be fixed with the 
parser.  In short, the solution will come from POI.

Best,

 Tim

-Original Message-
From: Gytis Mikuciunas [mailto:gyt...@gmail.com] 
Sent: Tuesday, April 11, 2017 1:56 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr 6.4. Can't index MS Visio vsdx files

Thanks for your responses.
Is there any possibility to ignore parsing errors and continue indexing?
Because right now Solr/Tika stops parsing the whole document if it finds any exception.

On Apr 11, 2017 19:51, "Allison, Timothy B."  wrote:

> You might want to drop a note to the dev or user's list on Apache POI.
>
> I'm not extremely familiar with the vsd(x) portion of our code base.
>
> The first item ("PolylineTo") may be caused by a mismatch btwn your 
> doc and the ooxml spec.
>
> The second item appears to be an unsupported feature.
>
> The third item may be an area for improvement within our codebase...I 
> can't tell just from the stacktrace.
>
> You'll probably get more helpful answers over on POI.  Sorry, I can't 
> help with this...
>
> Best,
>
>Tim
>
> P.S.
> >  3.1. ooxml-schemas-1.3.jar instead of poi-ooxml-schemas-3.15.jar
>
> You shouldn't need both. Ooxml-schemas-1.3.jar should be a super set 
> of poi-ooxml-schemas-3.15.jar
>
>
>


Re: Dynamic schema memory consumption

2017-04-11 Thread Dorian Hoxha
>
> And this overhead depends on what? I mean, if I create an empty collection
> will it take up much heap size  just for "being there" ?

Yes. You can search on elastic-search/solr/lucene mailing lists and see
that it's true. But nobody has `empty` collections, so yours will have a
schema and some data/segments and translog.

On Tue, Apr 11, 2017 at 7:41 PM, jpereira  wrote:

> The way the data is spread across the cluster is not really uniform. Most
> of
> shards have way lower than 50GB; I would say about 15% of the total shards
> have more than 50GB.
>
>
> Dorian Hoxha wrote
> > Each shard is a lucene index which has a lot of overhead.
>
> And this overhead depends on what? I mean, if I create an empty collection
> will it take up much heap size  just for "being there" ?
>
>
> Dorian Hoxha wrote
> > I don't know about static/dynamic memory-issue though.
>
> I could not find anything related in the docs or the mailing list either,
> but I'm still not ready to discard this suspicion...
>
> Again, thx for your time
>
>
>
> --
>


Re: simple matches not catching at query time

2017-04-11 Thread John Blythe
hi, erick.

appreciate the feedback.

1> i'm sending the terms to solr enquoted
2> i'd thought that at one point and reran the indexing. i _had_ had two of
the fields not indexed, but this represented one pass (same analyzer) from
two diff source fields while 2 or 3 of the other 4 fields _were_ seeming as
if they should match. maybe just need to do this for said sanity at this
point lol
3> i'm using dismax, no mm param set

some further context:

i'm querying something like this: ...fq=manufacturer:("VENDOR:VENDOR US")
OR manufacturer_syn:("VENDOR:VENDOR US")...

The indexed value is: "Vendor"

the output of field 1 in the Analysis tab would be:
*index*: vendor_coolmed | coolmed | vendor
*query*: vendor_vendor_coolmed | vendor | vendor

the other field (and a couple other, related ones, actually) have similar
situations where I see a clear match (as well as get the confirmation of it
when switching to the old UI and seeing the highlighting) yet get no
results in my actual query.

a further note. when i get the query debugging enabled I can see this in
the output:
"filter_queries":["manufacturer_syn_both:\"Vendor:Vendor US\"",
"manufacturer_split_syn:(\"Vendor:Vendor US\")"], "parsed_filter_queries":["
MultiPhraseQuery(manufacturer_syn_both:\"(vendor_vendor_us vendor)
vendor\")", "PhraseQuery(manufacturer_split_syn:\"vendor vendor\")"],...

It looks as if the parsed query is wrapped in quotes even after having been
parsed, so while the correct tokens, i.e. "vendor", are present to match
against the indexed value, the fact that the entire parsed derivative of
the initial query is sent to match (if that's indeed what's happening)
won't actually get any hits. Yet if I remove the quotes when sending over
to query then the parsing doesn't get to a point of having any
worthwhile/matching tokens to begin with.

one last thing: i've attempted with just "vendor" being sent over to help
remove complexity and, once more, i see Analysis chain functioning just
fine but the query itself getting 0 hits.

think TermComponents is the best option at this point or something else
given the above filler info?

-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713

On Tue, Apr 11, 2017 at 1:20 PM, Erick Erickson 
wrote:

> &debug=query is your friend. There are several issues that often trip
> people up:
>
> 1> The analysis tab pre-supposes that what you put in the boxes gets
> all the way to the field in question. Trivial example:
> I put (without quotes) "erick erickson" in the "name" field in the
> analysis page and see that it gets tokenized correctly. But the query
> "name:erick erickson" actually gets parsed at a higher level into
> name:erick default_search_field:erickson. See the discussion at:
> SOLR-9185
>
> 2> what you think is in your indexed field isn't really. Can happen if
> you changed your analysis chain but didn't totally re-index. Can
> happen because one of the parts of the analysis chain works
> differently than you expect (WordDelimiterFilterFactory, for instance,
> has a ton of options that can alter the tokens emitted). The
> TermsComponent will let you examine the terms actually _in_ the index
> that you search on. You stated that the analysis page shows you what
> you expect, so this is a sanity check.
>
> 3> You're using edismax and setting some parameter, mm=100% is a
> favorite and it's having this effect.
>
> So add debug=query and provide a sample document (or just a field) and
> the schema definition for the field in question if you're still
> stumped.
>
> Best,
> Erick
>
> On Tue, Apr 11, 2017 at 8:35 AM, John Blythe  wrote:
> > hi everyone.
> >
> > i recently wrote in ('analysis matching, query not') but never heard back
> > so wanted to follow up. i'm at my wit's end currently. i have several
> > fields that are showing matches in the analysis tab. when i dumb down the
> > string sent over to query it still gives me issues in some field cases.
> >
> > any thoughts on how to debug to figure out wtf is going on here would be
> > greatly appreciated. the use case is straightforward and the solution
> > should be as well, so i'm at a loss as to how in the world i'm having
> > issues w this.
> >
> > can provide any amount of contextualizing information you need, just let
> me
> > know what could be beneficial.
> >
> > best,
> >
> > john
>


RE: Solr 6.4. Can't index MS Visio vsdx files

2017-04-11 Thread Gytis Mikuciunas
Thanks for your responses.
Is there any possibility to ignore parsing errors and continue indexing?
Because right now Solr/Tika stops parsing the whole document if it finds any exception.

On Apr 11, 2017 19:51, "Allison, Timothy B."  wrote:

> You might want to drop a note to the dev or user's list on Apache POI.
>
> I'm not extremely familiar with the vsd(x) portion of our code base.
>
> The first item ("PolylineTo") may be caused by a mismatch btwn your doc
> and the ooxml spec.
>
> The second item appears to be an unsupported feature.
>
> The third item may be an area for improvement within our codebase...I
> can't tell just from the stacktrace.
>
> You'll probably get more helpful answers over on POI.  Sorry, I can't help
> with this...
>
> Best,
>
>Tim
>
> P.S.
> >  3.1. ooxml-schemas-1.3.jar instead of poi-ooxml-schemas-3.15.jar
>
> You shouldn't need both. Ooxml-schemas-1.3.jar should be a super set of
> poi-ooxml-schemas-3.15.jar
>
>
>


Re: Dynamic schema memory consumption

2017-04-11 Thread jpereira
The way the data is spread across the cluster is not really uniform. Most of
the shards have well under 50GB; I would say about 15% of the total shards
have more than 50GB.


Dorian Hoxha wrote
> Each shard is a lucene index which has a lot of overhead. 

And this overhead depends on what? I mean, if I create an empty collection,
will it take up much heap just for "being there"?


Dorian Hoxha wrote
> I don't know about static/dynamic memory-issue though.

I could not find anything related in the docs or the mailing list either,
but I'm still not ready to discard this suspicion...

Again, thx for your time



--


Re: Grouped Result sort issue

2017-04-11 Thread Erick Erickson
Skimming, I don't think this is inconsistent. First, I assume that
you're OK with the second example; it's this one that seems odd to you:

sort=score asc
group.sort=score desc

You're telling Solr to return the highest scoring doc in each group.
However, you're asking to order the _groups_ by the ascending score of
_any_ doc in each group (i.e. the group containing the lowest-scoring doc
comes first), not just by the doc(s) returned. These are two separate
things.

 "groupValue":"63",
  "doclist":{"numFound":143,

My bet is that the 143rd doc in this group has a lower score
than any document returned in any group.

To verify:
Specify:
sort=score asc
group.sort=score asc

My bet: The ordering of the groups will be the same as
sort=score asc
group.sort=score desc

It's just that the doc returned will be the lowest scoring doc.

Best,
Erick



On Tue, Apr 11, 2017 at 8:16 AM, Eric Cartman  wrote:
> I modified and cleaned up the previous query. As you can see, the sorting
> of the first query is a bit odd.
>
> Using parameters
> sort=score asc
> group.sort=score desc
>
> http://localhost:8983/solr/mcontent.ph_post/select?=&fl=*,score&group.field=partnerId&group.limit=1&group.main=false&group.ngroups=true&group.sort=score
> desc&group=true&indent=on&q=text:cars&rows=5000&sort=score
> asc&start=0&wt=json&omitHeader=true
>
> {
>   "grouped":{
> "partnerId":{
>   "matches":8681,
>   "ngroups":10,
>   "groups":[{
>   "groupValue":"63",
>   "doclist":{"numFound":143,"start":0,"maxScore":0.48749906,"docs":[
>   {
> "postId":"26317",
> "score":0.48749906}]
>   }},
> {
>   "groupValue":"64",
>   "doclist":{"numFound":144,"start":0,"maxScore":0.34190965,"docs":[
>   {
> "postId":"25549",
> "score":0.34190965}]
>   }},
> {
>   "groupValue":"28",
>   "doclist":{"numFound":2023,"start":0,"maxScore":0.6838193,"docs":[
>   {
> "postId":"31447",
> "score":0.6838193}]
>   }},
> {
>   "groupValue":"23",
>   "doclist":{"numFound":3539,"start":0,"maxScore":0.6223264,"docs":[
>   {
> "postId":"15053",
> "score":0.6223264}]
>   }},
> {
>   "groupValue":"25",
>   "doclist":{"numFound":2651,"start":0,"maxScore":0.9381923,"docs":[
>   {
> "postId":"21199",
> "score":0.9381923}]
>   }},
> {
>   "groupValue":"61",
>   "doclist":{"numFound":160,"start":0,"maxScore":0.66007686,"docs":[
>   {
> "postId":"8730",
> "score":0.66007686}]
>   }},
> {
>   "groupValue":"141",
>   "doclist":{"numFound":9,"start":0,"maxScore":0.5074051,"docs":[
>   {
> "postId":"34406",
> "score":0.5074051}]
>   }},
> {
>   "groupValue":"142",
>   "doclist":{"numFound":9,"start":0,"maxScore":0.22002561,"docs":[
>   {
> "postId":"35000",
> "score":0.22002561}]
>   }},
> {
>   "groupValue":"189",
>   "doclist":{"numFound":1,"start":0,"maxScore":0.09951033,"docs":[
>   {
> "postId":"33971",
> "score":0.09951033}]
>   }},
> {
>   "groupValue":"40",
>   "doclist":{"numFound":2,"start":0,"maxScore":0.3283673,"docs":[
>   {
> "postId":"30142",
> "score":0.3283673}]
>   }}]}}}
>
> Using parameters
> sort=score desc
> group.sort=score desc
>
> http://localhost:8983/solr/mcontent.ph_post/select?=&fl=*,score&group.field=partnerId&group.limit=1&group.main=false&group.ngroups=true&group.sort=score
> desc&group=true&indent=on&q=text:cars&rows=5000&sort=score
> desc&start=0&wt=json&omitHeader=true
> {
>   "grouped":{
> "partnerId":{
>   "matches":8681,
>   "ngroups":10,
>   "groups":[{
>   "groupValue":"25",
>   "doclist":{"numFound":2651,"start":0,"maxScore":0.9381923,"docs":[
>   {
> "postId":"21199",
> "score":0.9381923}]
>   }},
> {
>   "groupValue":"28",
>   "doclist":{"numFound":2023,"start":0,"maxScore":0.6838193,"docs":[
>   {
> "postId":"31447",
> "score":0.6838193}]
>   }},
> {
>   "groupValue":"61",
>   "doclist":{"numFound":160,"start":0,"maxScore":0.66007686,"docs":[
>   {
> "postId":"8730",
> "score":0.66007686}]
>   }},
> {
>   "groupValue":"23",
>   "doclist":{"numFound":3539,"start":0,"maxScore":0.6223264,"docs":[
>   {
> "p

Re: simple matches not catching at query time

2017-04-11 Thread Erick Erickson
&debug=query is your friend. There are several issues that often trip people up:

1> The analysis tab pre-supposes that what you put in the boxes gets
all the way to the field in question. Trivial example:
I put (without quotes) "erick erickson" in the "name" field in the
analysis page and see that it gets tokenized correctly. But the query
"name:erick erickson" actually gets parsed at a higher level into
name:erick default_search_field:erickson. See the discussion at:
SOLR-9185

2> what you think is in your indexed field isn't really. Can happen if
you changed your analysis chain but didn't totally re-index. Can
happen because one of the parts of the analysis chain works
differently than you expect (WordDelimiterFilterFactory, for instance,
has a ton of options that can alter the tokens emitted). The
TermsComponent will let you examine the terms actually _in_ the index
that you search on. You stated that the analysis page shows you what
you expect, so this is a sanity check.

3> You're using edismax and setting some parameter (mm=100% is a
favorite) that's having this effect.

So add debug=query and provide a sample document (or just a field) and
the schema definition for the field in question if you're still
stumped.
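
For instance (hypothetical core name and query; adjust to your setup):

  curl 'http://localhost:8983/solr/mycore/select?q=name:erick+erickson&debug=query&wt=json'

The "debug" section of the response shows "rawquerystring" and
"parsedquery"; comparing the two usually makes it obvious which field each
term actually landed on. The TermsComponent check from 2> looks something
like this, assuming a /terms handler is configured:

  curl 'http://localhost:8983/solr/mycore/terms?terms.fl=name&terms.prefix=erick'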

Best,
Erick

On Tue, Apr 11, 2017 at 8:35 AM, John Blythe  wrote:
> hi everyone.
>
> i recently wrote in ('analysis matching, query not') but never heard back
> so wanted to follow up. i'm at my wit's end currently. i have several
> fields that are showing matches in the analysis tab. when i dumb down the
> string sent over to query it still gives me issues in some field cases.
>
> any thoughts on how to debug to figure out wtf is going on here would be
> greatly appreciated. the use case is straightforward and the solution
> should be as well, so i'm at a loss as to how in the world i'm having
> issues w this.
>
> can provide any amount of contextualizing information you need, just let me
> know what could be beneficial.
>
> best,
>
> john


RE: Solr 6.4. Can't index MS Visio vsdx files

2017-04-11 Thread Allison, Timothy B.
You might want to drop a note to the dev or user's list on Apache POI.

I'm not extremely familiar with the vsd(x) portion of our code base.

The first item ("PolylineTo") may be caused by a mismatch btwn your doc and the 
ooxml spec.

The second item appears to be an unsupported feature.

The third item may be an area for improvement within our codebase...I can't 
tell just from the stacktrace.

You'll probably get more helpful answers over on POI.  Sorry, I can't help with 
this...

Best,

   Tim

P.S.
>  3.1. ooxml-schemas-1.3.jar instead of poi-ooxml-schemas-3.15.jar

You shouldn't need both. ooxml-schemas-1.3.jar should be a superset of 
poi-ooxml-schemas-3.15.jar.




Re: SolrJ appears to have problems with Docker Toolbox

2017-04-11 Thread Vincenzo D'Amore
Ok :)

But if you have time, have a look at my project https://github.com/freedev/
solrcloud-zookeeper-docker

The project builds a couple of docker instances (solr - zookeeper) or a
cluster with 6 nodes.

Then you just have to put the IP addresses of your VM in your hosts file
and you can play with it.
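
For example (illustrative values only - the container names depend on how
you started them, and 192.168.99.100 is just the usual Docker Toolbox VM
address), the host OS would get hosts entries along these lines:

  # /etc/hosts (or C:\Windows\System32\drivers\etc\hosts on Windows)
  192.168.99.100  solr-1 solr-2 solr-3 zoo-1 zoo-2 zoo-3

so that the node names ZooKeeper hands back to SolrJ resolve to the VM's
address instead of the internal Docker network.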



On Tue, Apr 11, 2017 at 6:06 PM, Mike Thomsen 
wrote:

> Thanks. I think I'll take a look at that. I decided to just build a big
> vagrant-managed desktop VM to let me run Ubuntu on my company machine, so I
> expect that this pain point may be largely gone soon.
>
> On Mon, Apr 10, 2017 at 12:31 PM, Vincenzo D'Amore 
> wrote:
>
> > Hi Mike
> >
> > disclaimer I'm the author of https://github.com/freedev/
> > solrcloud-zookeeper-docker
> >
> > I had same problem when I tried to create a cluster SolrCloud with
> docker,
> > just because the docker instances were referred by ip addresses I cannot
> > access with SolrJ.
> >
> > I avoided this problem referring each docker instance via a hostname
> > instead of ip address.
> >
> > Docker-compose is a great help to have a network where your docker
> > instances can be resolved using their names.
> >
> > I'll suggest to take a look at my project, in particular at the
> > docker-compose.yml used to start a SolrCloud cluster (3 Solr nodes with a
> > zookeeper ensemble of 3):
> >
> > https://raw.githubusercontent.com/freedev/solrcloud-
> > zookeeper-docker/master/
> > solrcloud-3-nodes-zookeeper-ensemble/docker-compose.yml
> >
> > Ok, I know, it sounds too much create a SolrCloud into a single VM, I did
> > it just to understand how Solr works... :)
> >
> > Once you've build your SolrCloud Docker network, you can map the name of
> > your docker instances externally, for example in your private network or
> in
> > your hosts file.
> >
> > In other words, given a Docker Solr instance named solr-1, in the docker
> > network the instance named solr-1 has a docker ip address that cannot be
> > used outside the VM.
> >
> > So when you use SolrJ client on your computer you must have into
> /etc/hosts
> > an entry solr-1 that points to the ip address your VM (the public network
> > interface where the docker instance is mapped).
> >
> > Hope you understand... :)
> >
> > Cheers,
> > Vincenzo
> >
> >
> > On Sun, Apr 9, 2017 at 2:42 AM, Mike Thomsen 
> > wrote:
> >
> > > I'm running two nodes of SolrCloud in Docker on Windows using Docker
> > > Toolbox.  The problem I am having is that Docker Toolbox runs inside
> of a
> > > VM and so it has an internal network inside the VM that is not
> accessible
> > > to the Docker Toolbox VM's host OS. If I go to the VM's IP which is
> > > 192.168.99.100, I can load the admin UI and do basic operations that
> are
> > > written to go against that IP and port (like querying, schema editor,
> > > manually adding documents, etc.)
> > >
> > > However, when I try to run code that uses SolrJ to add documents, it
> > fails
> > > because the ZK configuration has the IPs for the internal Docker
> network
> > > which is 172.X.Y..Z. If I log into the toolbox VM and run the Java code
> > > from there, it works just fine. From the host OS, doesn't.
> > >
> > > Anyone have any ideas on how to get around this? If I rewrite the
> > indexing
> > > code to do a manual JSON POST to the update handler on one of the
> nodes,
> > it
> > > does work just fine, but that leaves me not using SolrJ.
> > >
> > > Thanks,
> > >
> > > Mike
> > >
> >
> >
> >
> > --
> > Vincenzo D'Amore
> > email: v.dam...@gmail.com
> > skype: free.dev
> > mobile: +39 349 8513251
> >
>



-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Invoking a SearchHandler inside a Solr Plugin

2017-04-11 Thread Max Bridgewater
I am looking for best practices for when a search component in one handler
needs to invoke another handler, say /basic. So far, I have this working
prototype:

public void process(ResponseBuilder rb) throws IOException {
  SolrQueryResponse response = new SolrQueryResponse();
  ModifiableSolrParams params = new ModifiableSolrParams();
  params.add("defType", "lucene")
        .add("fl", "product_id")
        .add("wt", "json")
        .add("df", "competitor_product_titles")
        .add("echoParams", "explicit")
        .add("q", rb.req.getParams().get("q"));
  SolrQueryRequest request = new LocalSolrQueryRequest(rb.req.getCore(), params);
  SolrRequestHandler hdlr = rb.req.getCore().getRequestHandler("/basic");
  rb.req.getCore().execute(hdlr, request, response);
  DocList docList = ((ResultContext) response.getValues().get("response")).docs;
  // Do some crazy stuff with the result
}


My concerns:

1) What is a clean way to read the /basic handler's default parameters
from solrconfig.xml and use them in LocalSolrQueryRequest().
2) Is there a better way to accomplish this task overall?


Thanks,
Max.


Re: Dynamic schema memory consumption

2017-04-11 Thread Dorian Hoxha
What I'm suggesting is that you should aim for at most 50GB per shard of
data. How much is it currently?
Each shard is a lucene index which has a lot of overhead. If you can, try
to have 20x-50x-100x fewer shards than you currently do and you'll see lower
heap requirements. I don't know about static/dynamic memory-issue though.

On Tue, Apr 11, 2017 at 6:09 PM, jpereira  wrote:

> Dorian Hoxha wrote
> > Isn't 18K lucene-indexes (1 for each shard, not counting the replicas) a
> > little too much for 3TB of data ?
> > Something like 0.167GB for each shard ?
> > Isn't that too much overhead (i've mostly worked with es but still lucene
> > underneath) ?
>
> I don't have only 3TB , I have 3TB in two tier2 machines, the whole cluster
> is 12 TB :) So what I was trying to explain was this:
>
> NODES A & B
> 3TB per machine , 36 collections * 12 shards (432 indexes) , average heap
> footprint of 60GB
>
> NODES C & D - at first
> ~725GB per machine, 4 collections * 12 shards (48 indexes) , average heap
> footprint of 12GB
>
> NODES C & D - after addding 220GB schemaless data
> ~1TB per machine, 46 collections * 12 shards (552 indexes),  average heap
> footprint of 48GB
>
> So, what you are suggesting is that the culprit for the bump in heap
> footprint is the new collections?
>
>
> Dorian Hoxha wrote
> > Also you should change the heap 32GB->30GB so you're guaranteed to get
> > pointer compression. I think you should have no need to increase it more
> > than this, since most things have moved to out-of-heap stuff, like
> > docValues etc.
>
> I was forced to raise the heap size because the memory requirements
> dramatically raised, hence this post :)
>
> Thanks
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Dynamic-schema-memory-consumption-tp4329184p4329345.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Dynamic schema memory consumption

2017-04-11 Thread jpereira
Dorian Hoxha wrote
> Isn't 18K lucene-indexes (1 for each shard, not counting the replicas) a
> little too much for 3TB of data ?
> Something like 0.167GB for each shard ?
> Isn't that too much overhead (i've mostly worked with es but still lucene
> underneath) ?

I don't have only 3TB , I have 3TB in two tier2 machines, the whole cluster
is 12 TB :) So what I was trying to explain was this:

NODES A & B
3TB per machine , 36 collections * 12 shards (432 indexes) , average heap
footprint of 60GB

NODES C & D - at first
~725GB per machine, 4 collections * 12 shards (48 indexes) , average heap
footprint of 12GB

NODES C & D - after adding 220GB schemaless data
~1TB per machine, 46 collections * 12 shards (552 indexes),  average heap
footprint of 48GB

So, what you are suggesting is that the culprit for the bump in heap
footprint is the new collections?


Dorian Hoxha wrote
> Also you should change the heap 32GB->30GB so you're guaranteed to get 
> pointer compression. I think you should have no need to increase it more 
> than this, since most things have moved to out-of-heap stuff, like 
> docValues etc. 

I was forced to raise the heap size because the memory requirements
dramatically raised, hence this post :)

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Dynamic-schema-memory-consumption-tp4329184p4329345.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrJ appears to have problems with Docker Toolbox

2017-04-11 Thread Mike Thomsen
Thanks. I think I'll take a look at that. I decided to just build a big
vagrant-managed desktop VM to let me run Ubuntu on my company machine, so I
expect that this pain point may be largely gone soon.

On Mon, Apr 10, 2017 at 12:31 PM, Vincenzo D'Amore 
wrote:

> Hi Mike
>
> disclaimer I'm the author of https://github.com/freedev/
> solrcloud-zookeeper-docker
>
> I had same problem when I tried to create a cluster SolrCloud with docker,
> just because the docker instances were referred by ip addresses I cannot
> access with SolrJ.
>
> I avoided this problem referring each docker instance via a hostname
> instead of ip address.
>
> Docker-compose is a great help to have a network where your docker
> instances can be resolved using their names.
>
> I'll suggest to take a look at my project, in particular at the
> docker-compose.yml used to start a SolrCloud cluster (3 Solr nodes with a
> zookeeper ensemble of 3):
>
> https://raw.githubusercontent.com/freedev/solrcloud-
> zookeeper-docker/master/
> solrcloud-3-nodes-zookeeper-ensemble/docker-compose.yml
>
> Ok, I know, it sounds too much create a SolrCloud into a single VM, I did
> it just to understand how Solr works... :)
>
> Once you've build your SolrCloud Docker network, you can map the name of
> your docker instances externally, for example in your private network or in
> your hosts file.
>
> In other words, given a Docker Solr instance named solr-1, in the docker
> network the instance named solr-1 has a docker ip address that cannot be
> used outside the VM.
>
> So when you use SolrJ client on your computer you must have into /etc/hosts
> an entry solr-1 that points to the ip address your VM (the public network
> interface where the docker instance is mapped).
>
> Hope you understand... :)
>
> Cheers,
> Vincenzo
>
>
> On Sun, Apr 9, 2017 at 2:42 AM, Mike Thomsen 
> wrote:
>
> > I'm running two nodes of SolrCloud in Docker on Windows using Docker
> > Toolbox.  The problem I am having is that Docker Toolbox runs inside of a
> > VM and so it has an internal network inside the VM that is not accessible
> > to the Docker Toolbox VM's host OS. If I go to the VM's IP which is
> > 192.168.99.100, I can load the admin UI and do basic operations that are
> > written to go against that IP and port (like querying, schema editor,
> > manually adding documents, etc.)
> >
> > However, when I try to run code that uses SolrJ to add documents, it
> fails
> > because the ZK configuration has the IPs for the internal Docker network
> > which is 172.X.Y..Z. If I log into the toolbox VM and run the Java code
> > from there, it works just fine. From the host OS, doesn't.
> >
> > Anyone have any ideas on how to get around this? If I rewrite the
> indexing
> > code to do a manual JSON POST to the update handler on one of the nodes,
> it
> > does work just fine, but that leaves me not using SolrJ.
> >
> > Thanks,
> >
> > Mike
> >
>
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251
>


simple matches not catching at query time

2017-04-11 Thread John Blythe
hi everyone.

i recently wrote in ('analysis matching, query not') but never heard back
so wanted to follow up. i'm at my wit's end currently. i have several
fields that are showing matches in the analysis tab. when i dumb down the
string sent over to query it still gives me issues in some field cases.

any thoughts on how to debug to figure out wtf is going on here would be
greatly appreciated. the use case is straightforward and the solution
should be as well, so i'm at a loss as to how in the world i'm having
issues w this.

can provide any amount of contextualizing information you need, just let me
know what could be beneficial.

best,

john


Re: Grouped Result sort issue

2017-04-11 Thread Eric Cartman
I modified and cleaned the previous query. As you can see the first query
sorting is a bit odd.

Using parameters
sort=score asc
group.sort=score desc

http://localhost:8983/solr/mcontent.ph_post/select?=&fl=*,score&group.field=partnerId&group.limit=1&group.main=false&group.ngroups=true&group.sort=score
desc&group=true&indent=on&q=text:cars&rows=5000&sort=score
asc&start=0&wt=json&omitHeader=true

{
  "grouped":{
"partnerId":{
  "matches":8681,
  "ngroups":10,
  "groups":[{
  "groupValue":"63",
  "doclist":{"numFound":143,"start":0,"maxScore":0.48749906,"docs":[
  {
"postId":"26317",
"score":0.48749906}]
  }},
{
  "groupValue":"64",
  "doclist":{"numFound":144,"start":0,"maxScore":0.34190965,"docs":[
  {
"postId":"25549",
"score":0.34190965}]
  }},
{
  "groupValue":"28",
  "doclist":{"numFound":2023,"start":0,"maxScore":0.6838193,"docs":[
  {
"postId":"31447",
"score":0.6838193}]
  }},
{
  "groupValue":"23",
  "doclist":{"numFound":3539,"start":0,"maxScore":0.6223264,"docs":[
  {
"postId":"15053",
"score":0.6223264}]
  }},
{
  "groupValue":"25",
  "doclist":{"numFound":2651,"start":0,"maxScore":0.9381923,"docs":[
  {
"postId":"21199",
"score":0.9381923}]
  }},
{
  "groupValue":"61",
  "doclist":{"numFound":160,"start":0,"maxScore":0.66007686,"docs":[
  {
"postId":"8730",
"score":0.66007686}]
  }},
{
  "groupValue":"141",
  "doclist":{"numFound":9,"start":0,"maxScore":0.5074051,"docs":[
  {
"postId":"34406",
"score":0.5074051}]
  }},
{
  "groupValue":"142",
  "doclist":{"numFound":9,"start":0,"maxScore":0.22002561,"docs":[
  {
"postId":"35000",
"score":0.22002561}]
  }},
{
  "groupValue":"189",
  "doclist":{"numFound":1,"start":0,"maxScore":0.09951033,"docs":[
  {
"postId":"33971",
"score":0.09951033}]
  }},
{
  "groupValue":"40",
  "doclist":{"numFound":2,"start":0,"maxScore":0.3283673,"docs":[
  {
"postId":"30142",
"score":0.3283673}]
  }}]}}}

Using parameters
sort=score desc
group.sort=score desc

http://localhost:8983/solr/mcontent.ph_post/select?=&fl=*,score&group.field=partnerId&group.limit=1&group.main=false&group.ngroups=true&group.sort=score
desc&group=true&indent=on&q=text:cars&rows=5000&sort=score
desc&start=0&wt=json&omitHeader=true
{
  "grouped":{
"partnerId":{
  "matches":8681,
  "ngroups":10,
  "groups":[{
  "groupValue":"25",
  "doclist":{"numFound":2651,"start":0,"maxScore":0.9381923,"docs":[
  {
"postId":"21199",
"score":0.9381923}]
  }},
{
  "groupValue":"28",
  "doclist":{"numFound":2023,"start":0,"maxScore":0.6838193,"docs":[
  {
"postId":"31447",
"score":0.6838193}]
  }},
{
  "groupValue":"61",
  "doclist":{"numFound":160,"start":0,"maxScore":0.66007686,"docs":[
  {
"postId":"8730",
"score":0.66007686}]
  }},
{
  "groupValue":"23",
  "doclist":{"numFound":3539,"start":0,"maxScore":0.6223264,"docs":[
  {
"postId":"15053",
"score":0.6223264}]
  }},
{
  "groupValue":"141",
  "doclist":{"numFound":9,"start":0,"maxScore":0.5074051,"docs":[
  {
"postId":"34406",
"score":0.5074051}]
  }},
{
  "groupValue":"63",
  "doclist":{"numFound":143,"start":0,"maxScore":0.48749906,"docs":[
  {
"postId":"26317",
"score":0.48749906}]
  }},
{
  "groupValue":"64",
  "doclist":{"numFound":144,"start":0,"maxScore":0.34190965,"docs":[
  {
"postId":"25549",
"score":0.34190965}]
  }},
{
  "groupValue":"40",
  "doclist":{"numFound":2,"start":0,"maxScore":0.3283673,"docs":[
  {
"postId":"30142",
"score":0.3283673}]
  }},
{
  "groupValue":"142",
  "doclist":{"numFound":9,"start":0,"maxScore":0.22002561,"docs":[
  {
"postId":"35000",
"score":0.22002561}]
  }},
   

Re: Solr/Velocity doesn't show full field value

2017-04-11 Thread Erik Hatcher
#field() is defined in _macros.vm as this monstrosity:

# TODO: make this parameterized fully, no context sensitivity
#macro(field $f)
  #if($response.response.highlighting.get($docId).get($f).get(0))
#set($pad = "")
  #foreach($v in $response.response.highlighting.get($docId).get($f))
$pad$v##  #TODO: $esc.html() or maybe make that optional?
#set($pad = " ... ")
  #end
  #else
$esc.html($display.list($doc.getFieldValues($f), ", "))
  #end
#end
Basically that’s saying if there is highlighting returned for the specified 
field, then render it, otherwise render the full field value.  
$doc.getFieldValue() won’t ever work with highlighting - it’s the raw returned 
field value (or empty, potentially) - highlighting has to be looked up 
separately and that’s what the #field() macro tries to do - make it look a bit 
more seamless and slick, to just do #field(“field_name”).  But it does rely on 
highlighting working - so try the json or xml response until you get the 
highlighting configured as needed.
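
One guess at why only ~100 characters show up (just a guess - the config
excerpt below doesn't show all the parameters): the highlighter's default
fragment size is 100 characters. Asking for whole-field fragments in the
/browse defaults would look roughly like:

  <str name="hl">on</str>
  <str name="hl.fl">LONG_TEXT</str>
  <str name="hl.fragsize">0</str>  <!-- 0 = no fragmenting, return the whole field value -->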

Erik


> On Apr 11, 2017, at 6:14 AM, Hamso  wrote:
> 
> Hey guys,
> I have a problem:
> 
> In Velocity:
> 
> *Beschreibung:*#field('LONG_TEXT')
> 
> In Solr the field "LONG_TEXT" dont show everything only the first ~90-110
> characters.
> But if I set "$doc.getFieldValue('LONG_TEXT')" in the Velocity file, then he
> show me everything whats inside in the field "LONG_TEXT".
> But there is one problem, if I use "$doc.getFieldValue('LONG_TEXT')" instead
> of #field('LONG_TEXT'), the highlight doesnt work.
> Can someone please help me, why #field('LONG_TEXT') doesnt show everthing
> whats inside the field, or why highlighting with
> "$doc.getFieldValue('LONG_TEXT')" doesnt work.
> 
> Schema.xml:
> 
>   />
> 
> positionIncrementGap="100">
>
>  
>   ignoreCase="true"/>
>  
>   maxGramSize="500"/>
> 
>
>  
>   ignoreCase="true"/>
>   ignoreCase="true" synonyms="synonyms.txt"/>
>  
>
> 
> 
> solrconfig only in /browse:
> 
>   on
>   LONG_TEXT
>   true
>   html
>   
>   
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Velocity-dont-show-full-field-value-tp4329290.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Grouped Result sort issue

2017-04-11 Thread Erick Erickson
the group.sort spec is specified twice in the URL

group.sort=score desc&
group.sort=score desc

Is there a chance that during testing you only changed _one_ of them so you had

group.sort=score desc&
group.sort=score asc

? I think the last one should win. Shot in the dark.

Best,
Erick

On Tue, Apr 11, 2017 at 3:23 AM, alessandro.benedetti
 wrote:
> To be fair the second result seems consistent with the Solr grouping logic :
>
> *First Query results (Suspicious)*
> 1) group.sort= score desc -> select the group head as you have 1 doc per
> group( the head will be the top scoring doc per group)
> 2) sort=score asc -> sort the groups by the score of the head ascending ( so
> the final resulting groups should be ascending in score)
>
>
> *Second Query results ( CORRECT)*
> 1) group.sort= score desc -> select the group head as you have 1 doc per
> group( the head will be the top scoring doc per group)
> 2) sort -> sort the groups by the score of the head ( so the final resulting
> groups are sorted descending)
>
> Are we sure the the sort is expected to sort the groups after the grouping
> happened ?
> I need to check the internals but I agree the current behaviour is not
> intuitive.
>
> Cheers
>
>
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Grouped-Result-sort-issue-tp4329255p4329292.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Filtering results by minimum relevancy score

2017-04-11 Thread Dorian Hoxha
Can't the filter be used in cases where you're paginating in a
sharded scenario?
So if you do limit=10, offset=10, each shard will return 20 docs,
while if you do limit=10, _score <= last_page.min_score, each shard will
return only 10 docs (they will still score all docs, but merging will be
faster).

Makes sense?
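
(For reference, the function-query route mentioned in the thread looks
roughly like this - a sketch only, with a made-up cutoff value:

  q=some query&fq={!frange l=0.5}query($q)

i.e. keep only documents whose score for $q is at least 0.5. An absolute
cutoff is fragile, though, since scores aren't comparable across queries.)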

On Tue, Apr 11, 2017 at 12:49 PM, alessandro.benedetti  wrote:

> Can i ask what is the final requirement here ?
> What are you trying to do ?
>  - just display less results ?
> you can easily do at search client time, cutting after a certain amount
> - make search faster returning less results ?
> This is not going to work, as you need to score all of them as Erick
> explained.
>
> Function query ( as Mikhail specified) will run on a per document basis (
> if
> I am correct), so if your idea was to speed up the things, this is not
> going
> to work.
>
> It makes much more sense to refine your system to improve relevancy if your
> concern is to have more relevant docs.
> If your concern is just to not show that many pages, you can limit that
> client side.
>
>
>
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Filtering-results-by-minimum-relevancy-score-
> tp4329180p4329295.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Solr/Velocity doesn't show full field value

2017-04-11 Thread Hamso
Hey guys,
I have a problem:

In Velocity:

*Beschreibung:*#field('LONG_TEXT')

In Solr the field "LONG_TEXT" doesn't show everything, only the first ~90-110
characters.
But if I use "$doc.getFieldValue('LONG_TEXT')" in the Velocity file, it
shows me everything that is inside the field "LONG_TEXT".
There is one problem, though: if I use "$doc.getFieldValue('LONG_TEXT')" instead
of #field('LONG_TEXT'), the highlighting doesn't work.
Can someone please help me understand why #field('LONG_TEXT') doesn't show everything
that is inside the field, or why highlighting with
"$doc.getFieldValue('LONG_TEXT')" doesn't work?

Schema.xml:

  



  
  
  
  
 

  
  
  
  



solrconfig only in /browse:

   on
   LONG_TEXT
   true
   html
   
   





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Velocity-dont-show-full-field-value-tp4329290.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 6.4. Can't index MS Visio vsdx files

2017-04-11 Thread Gytis Mikuciunas
Hi,

history:
1. we're using single core Solr 6.4 instance on windows server (windows
server 2012 R2 standard),
2. Java 8 (build 1.8.0_121-b13).
3. as a workaround for earlier issues with visio files, we have in
solr-6.4.0\contrib\extraction\lib:
  3.1. ooxml-schemas-1.3.jar instead of poi-ooxml-schemas-3.15.jar
  3.2. curvesapi-1.03.jar

This workaround solved many parsing issues with Visio files. However, we still
have some other parsing issues left with a bunch of Visio files.

Could you propose a solution for us on how to fix them?


errors similar to these:

{
"responseHeader": {
"status": 500,
"QTime": 155
},
"error": {
"msg": "org.apache.tika.exception.TikaException: Unexpected
RuntimeException from
org.apache.tika.parser.microsoft.ooxml.OOXMLParser@3c9f695c",
"code": 500,
"trace": "org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from
org.apache.tika.parser.microsoft.ooxml.OOXMLParser@3c9f695c\r\n\tat
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:234)\r\n\tat
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)\r\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166)\r\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:2306)\r\n\tat
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)\r\n\tat
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)\r\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)\r\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)\r\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\r\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)\r\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\r\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)\r\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\r\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)\r\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\r\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\r\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\r\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\r\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:534)\r\n\tat
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)\r\n\tat
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)\r\n\tat
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\r\n\tat
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\r\n\tat
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\r\n\tat
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)\r\n\tat
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)\r\n\tat
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)\r\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)\r\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)\r\n\tat
java.lang.Thread.run(Unknown Source)\r\nCaused by:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from
org.apache.tika.parser.microsoft.ooxml.OOXMLParser@3c9f695c\r\n\tat
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)\r\n\tat
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)\r\n\tat
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)\r\n\tat
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)\r\n\t...
32 more\r\nCaused by: org.apache.poi.POIXMLException:
/visio/masters/masters.xml: /visio/masters/master50.xml: :
Invalid 'Row_Type' name 'PolylineTo'\r\n\tat
org.apache.poi.xdgf.exceptions.XDGFException.wrap(XDGFException.java:43)\r\n\tat
org.apache.poi.xdgf.usermodel.XDGFMasters.onDocumentRead(XDGFMasters.java:107)\r\n\tat
org.apache.poi.xdgf.usermodel.XmlVisioDocument.onDocumentRead(XmlVisioDocument.java:106)\r\n\tat
org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:190)\r\n\tat
org.apache.poi.xdgf.usermodel.XmlVisioDocument.(XmlVisioDocume

Re: Filtering results by minimum relevancy score

2017-04-11 Thread alessandro.benedetti
Can I ask what the final requirement is here?
What are you trying to do?
 - Just display fewer results?
You can easily do that at search-client time, cutting after a certain amount.
 - Make search faster by returning fewer results?
This is not going to work, as you need to score all of them, as Erick
explained.

Function query (as Mikhail specified) will run on a per-document basis (if
I am correct), so if your idea was to speed things up, this is not going
to work.

It makes much more sense to refine your system to improve relevancy if your
concern is to have more relevant docs.
If your concern is just to not show that many pages, you can limit that
client side.






-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Filtering-results-by-minimum-relevancy-score-tp4329180p4329295.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Index size keeps fluctuating, becomes ~4x normal size.

2017-04-11 Thread Toke Eskildsen
On Mon, 2017-04-10 at 13:27 +0530, Himanshu Sachdeva wrote:
> Thanks for your time and quick response. As you said, I changed our
> logging level from SEVERE to INFO and indeed found the performance
> warning *Overlapping onDeckSearchers=2* in the logs.

If you only see it occasionally, it is probably not a problem. If you
see it often, that means that you are re-opening at a high rate,
relative to the time it takes for a searcher to be ready.

Since each searcher holds a lock on the files it searches, and you have
multiple concurrent open searchers on a volatile index, that helps
explain the index size fluctuations.

Each searcher also requires heap, which might explain why you get Out
Of Memory errors.

This all boils down to avoiding (too many) overlapping warming
searchers.

* Reduce your auto-warm if it is high
* Prolong the time between searcher-opening commits
* Check that you have docValues on fields that you facet or group on
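
For reference, the relevant solrconfig.xml knobs look roughly like this
(placeholder values, not recommendations - autoCommit/autoSoftCommit live
in <updateHandler>, the rest in the <query> section):

  <maxWarmingSearchers>2</maxWarmingSearchers>

  <autoCommit>
    <maxTime>60000</maxTime>            <!-- hard commit: flush to disk -->
    <openSearcher>false</openSearcher>  <!-- don't open a new searcher on hard commits -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>30000</maxTime>            <!-- controls how often new searchers are opened -->
  </autoSoftCommit>

  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>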

> I am considering limiting the *maxWarmingSearchers* count in
> configuration but want to be sure that nothing breaks in production
> in case simultaneous commits do happen afterwards.

That is one way of doing it, but it does not help you pinpoint where
your problem is. 

> What would happen if we set *maxWarmingSearchers* count to 1 and make
> simultaneous commit from different endpoints? I understand that solr
> will prevent opening a new searcher for the second commit but is that
> all there is to it? Does it mean solr will serve stale data( i.e.
> send stale data to the slaves) ignoring the changes from the second
> commit? [...]

Sorry, I am not that familiar with the details of master-slave-setups.
-- 
Toke Eskildsen, Royal Danish Library


Re: Expiry of Basic Authentication Plugin

2017-04-11 Thread Jordi Domingo Borràs
Browsers retain basic auth information. You have to close the browser or clear
its browsing history. You can also change the user's password on the server side.

Best

On Tue, Apr 11, 2017 at 7:18 AM, Zheng Lin Edwin Yeo 
wrote:

> Anyone has any idea if the authentication will expired automatically? Mine
> has already been authenticated for more than 20 hours, and it has not auto
> logged out yet.
>
> Regards,
> Edwin
>
> On 11 April 2017 at 00:19, Zheng Lin Edwin Yeo 
> wrote:
>
> > Hi,
> >
> > Would like to check, after I have entered the authentication to access
> > Solr with Basic Authentication Plugin, will the authentication be expired
> > automatically after a period of time?
> >
> > I'm using SolrCloud on Solr 6.4.2
> >
> > Regards,
> > Edwin
> >
>


Re: Grouped Result sort issue

2017-04-11 Thread alessandro.benedetti
To be fair the second result seems consistent with the Solr grouping logic :

*First Query results (Suspicious)*
1) group.sort= score desc -> select the group head as you have 1 doc per
group( the head will be the top scoring doc per group)
2) sort=score asc -> sort the groups by the score of the head ascending ( so
the final resulting groups should be ascending in score)


*Second Query results ( CORRECT)*
1) group.sort= score desc -> select the group head as you have 1 doc per
group( the head will be the top scoring doc per group)
2) sort -> sort the groups by the score of the head ( so the final resulting
groups are sorted descending)

Are we sure that the sort is expected to sort the groups after the grouping
happened ?
I need to check the internals but I agree the current behaviour is not
intuitive.

Cheers





-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Grouped-Result-sort-issue-tp4329255p4329292.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Dynamic schema memory consumption

2017-04-11 Thread Dorian Hoxha
Also you should change the heap 32GB->30GB so you're guaranteed to get
pointer compression. I think you should have no need to increase it more
than this, since most things have moved to out-of-heap stuff, like
docValues etc.
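
(For the record: the heap is normally set via SOLR_HEAP in solr.in.sh /
solr.in.cmd, e.g. SOLR_HEAP="30g", or with bin/solr start -m 30g. Whether
compressed oops are actually in effect can be checked with something like

  java -Xmx30g -XX:+PrintFlagsFinal -version | grep UseCompressedOops

The values above are illustrative.)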

On Tue, Apr 11, 2017 at 12:07 PM, Dorian Hoxha 
wrote:

> Isn't 18K lucene-indexes (1 for each shard, not counting the replicas) a
> little too much for 3TB of data ?
> Something like 0.167GB for each shard ?
> Isn't that too much overhead (i've mostly worked with es but still lucene
> underneath) ?
>
> Can't you use 1/100 the current number of collections ?
>
>
> On Mon, Apr 10, 2017 at 5:22 PM, jpereira  wrote:
>
>> Hello guys,
>>
>> I manage a Solr cluster and I am experiencing some problems with dynamic
>> schemas.
>>
>> The cluster has 16 nodes and 1500 collections with 12 shards per
>> collection
>> and 2 replicas per shard. The nodes can be divided in 2 major tiers:
>>  - tier1 is composed of 12 machines with 4 physical cores (8 virtual),
>> 32GB
>> ram and 4TB ssd; these are used mostly for direct queries and data
>> exports;
>>  - tier2 is composed of 4 machines with 20 physical cores (40 virtual),
>> 128GB and 4TB ssd; these are mostly for aggregation queries (facets)
>>
>> The problem I am experiencing is that when using dynamic schemas, the Solr
>> heap size rises dramatically.
>>
>> I have two tier2 machines (lets call them A and B) running one Solr
>> instance
>> each with 96GB heap size, with 36 collections totaling 3TB of mainly
>> fixed-schema (55GB schemaless) data indexed in each machine, and the heap
>> consumption is on average 60GB (it peaks at around 80GB and drops to
>> around
>> 40GB after a GC run).
>>
>> On the other tier2 machines (C and D) I was running one Solr instance on
>> each machine with 32GB heap size and 4 fixed schema collections with about
>> 725GB of data indexed in each machine, which took up about 12GB of heap
>> size. Recently I added 46 collections to these machines with about 220Gb
>> of
>> data. In order to do this I was forced to raise the heap size to 64GB and
>> after indexing everything now the machines have an averaged consumption of
>> 48GB (!!!) (max ~55GB, after GC runs ~37GB)
>>
>> I also noticed that when indexed fixed schema data the CPU utilization is
>> also dramatically lower. I have around 100 workers indexing fixed schema
>> data with and CPU utilization rate of about 10%, while I have only one
>> worker for schemaless data with a CPU utilization cost of about 20%.
>>
>> So, I have a two big questions here:
>> 1. Is this dramatic rise in resources consumption when using dynamic
>> fields
>> "normal"?
>> 2. Is there a way to lower the memory requirements? If so, how?
>>
>> Thanks for your time!
>>
>>
>>
>> --
>> View this message in context: http://lucene.472066.n3.nabble
>> .com/Dynamic-schema-memory-consumption-tp4329184.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>


Re: Dynamic schema memory consumption

2017-04-11 Thread Dorian Hoxha
Isn't 18K lucene-indexes (1 for each shard, not counting the replicas) a
little too much for 3TB of data ?
Something like 0.167GB for each shard ?
Isn't that too much overhead (i've mostly worked with es but still lucene
underneath) ?

Can't you use 1/100 the current number of collections ?
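
(Just to illustrate the direction - a new collection with far fewer shards
would be created with something like

  http://host:8983/solr/admin/collections?action=CREATE&name=merged_data&numShards=12&replicationFactor=2&collection.configName=myconf

names and numbers here are made up; migrating the existing data into it is
a separate exercise.)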

On Mon, Apr 10, 2017 at 5:22 PM, jpereira  wrote:

> Hello guys,
>
> I manage a Solr cluster and I am experiencing some problems with dynamic
> schemas.
>
> The cluster has 16 nodes and 1500 collections with 12 shards per collection
> and 2 replicas per shard. The nodes can be divided in 2 major tiers:
>  - tier1 is composed of 12 machines with 4 physical cores (8 virtual), 32GB
> ram and 4TB ssd; these are used mostly for direct queries and data exports;
>  - tier2 is composed of 4 machines with 20 physical cores (40 virtual),
> 128GB and 4TB ssd; these are mostly for aggregation queries (facets)
>
> The problem I am experiencing is that when using dynamic schemas, the Solr
> heap size rises dramatically.
>
> I have two tier2 machines (lets call them A and B) running one Solr
> instance
> each with 96GB heap size, with 36 collections totaling 3TB of mainly
> fixed-schema (55GB schemaless) data indexed in each machine, and the heap
> consumption is on average 60GB (it peaks at around 80GB and drops to around
> 40GB after a GC run).
>
> On the other tier2 machines (C and D) I was running one Solr instance on
> each machine with 32GB heap size and 4 fixed schema collections with about
> 725GB of data indexed in each machine, which took up about 12GB of heap
> size. Recently I added 46 collections to these machines with about 220Gb of
> data. In order to do this I was forced to raise the heap size to 64GB and
> after indexing everything now the machines have an averaged consumption of
> 48GB (!!!) (max ~55GB, after GC runs ~37GB)
>
> I also noticed that when indexed fixed schema data the CPU utilization is
> also dramatically lower. I have around 100 workers indexing fixed schema
> data with and CPU utilization rate of about 10%, while I have only one
> worker for schemaless data with a CPU utilization cost of about 20%.
>
> So, I have a two big questions here:
> 1. Is this dramatic rise in resources consumption when using dynamic fields
> "normal"?
> 2. Is there a way to lower the memory requirements? If so, how?
>
> Thanks for your time!
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble
> .com/Dynamic-schema-memory-consumption-tp4329184.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>