Re: /export handler to stream data using CloudSolrStream: JSONParse Exception

2016-10-20 Thread Joel Bernstein
I suspect this is a bug with improperly escaped JSON. SOLR-7441
resolved this issue, and the fix was released in Solr 6.0.

There have been a large number of improvements, bug fixes, new features and
much better error handling in Solr 6 Streaming Expressions.

Joel Bernstein
http://joelsolr.blogspot.com/
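
For reference, a rough sketch of the read loop in question using SolrJ's
streaming API (the CloudSolrStream constructor details differ slightly between
5.x and 6.x, and the zkHost, collection and query values below are placeholders):

import java.util.HashMap;
import java.util.Map;

import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.CloudSolrStream;

public class ExportStreamSketch {
    public static void main(String[] args) throws Exception {
        Map<String, String> params = new HashMap<>();
        params.put("q", "*:*");                      // placeholder query
        params.put("fl", "uuid,space,timestamp");
        params.put("sort", "timestamp asc");         // /export requires an explicit sort
        params.put("qt", "/export");                 // stream from the export handler

        // zkHost and collection name below are placeholders
        CloudSolrStream stream =
            new CloudSolrStream("zk1:2181,zk2:2181/solr", "collection1", params);
        try {
            stream.open();
            while (true) {
                Tuple tuple = stream.read();
                if (tuple.EOF) {
                    break;
                }
                String uuid = tuple.getString("uuid");
                Long timestamp = tuple.getLong("timestamp");
                // process the tuple here
            }
        } finally {
            stream.close();
        }
    }
}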

On Thu, Oct 20, 2016 at 5:49 PM, Chetas Joshi 
wrote:

> Hello,
>
> I am using /export handler to stream data using CloudSolrStream.
>
> I am using fl=uuid,space,timestamp where uuid and space are Strings and
> timestamp is long. My query (q=...) is not on these fields.
>
> While reading the results from the Solr cloud, I get the following errors
>
> org.noggit.JSONParser$ParseException: Expected ',' or '}':
> char=5,position=110938
> BEFORE='uuid":"0lG99s8vyaKB2I/I","space":"uuid","timestamp":1 5'
> AFTER='DB6
> 474294954},{"uuid":"0lG99sHT8P5e'
>
>
> Or (For a different query
>
>
> org.noggit.JSONParser$ParseException: Expected ',' or '}':
> char=",position=122528
> BEFORE=':1475618674},{"uuid":"Whz991tX6P4beuhp","space": 3076 "'
> AFTER='uuid","timestamp":1476131442},{"uui'
>
>
> Now what are the possible reasons of me getting this error?
>
>
> Is this related to some kind of data corruption?
>
>
> What are some of the things (possibly some characters in String) that JSON
> will have hard time parsing?
>
>
> The Solr version I use is 5.5.0
>
>
> Thanks
>
>
> Chetas.
>


Re: For TTL, does expirationFieldName need to be indexed?

2016-10-20 Thread Chetas Joshi
You just need to have indexed=true. It will use the inverted index to
delete the expired documents. You don't need stored=true as all the info
required by the DocExpirationUpdateProcessorFactory to delete a document is
there in the inverted index.

On Thu, Oct 20, 2016 at 4:26 PM, Brent  wrote:

> Thanks for the reply.
>
> Follow up:
> Do I need to have the field stored? While I don't need to ever look at the
> field's original contents, I'm guessing that the
> DocExpirationUpdateProcessorFactory does, so that would mean I need to
> have
> stored=true as well, correct?
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/For-TTL-does-expirationFieldName-need-to-
> be-indexed-tp4301522p4302386.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: For TTL, does expirationFieldName need to be indexed?

2016-10-20 Thread Brent
Thanks for the reply.

Follow up:
Do I need to have the field stored? While I don't need to ever look at the
field's original contents, I'm guessing that the
DocExpirationUpdateProcessorFactory does, so that would mean I need to have
stored=true as well, correct? 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/For-TTL-does-expirationFieldName-need-to-be-indexed-tp4301522p4302386.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Load balancing with solr cloud

2016-10-20 Thread Garth Grimm
No matter where you send the update to initially, it will get sent to the 
leader of the shard first.  The leader parses it to ensure it can be 
indexed, then it will send it to all the replicas in parallel.  The replicas 
will do their parsing and report back that they have persisted the data to 
their tlogs.  Once the leader hears back from all the replicas, the leader will 
reply back that the update is complete, and your client will receive its HTTP 
response on the transaction.

At least that's the general case flow.

So it really won't matter how your load balancing is handled above the cloud.  
All the work is done the same way, with the leader having to do slightly more 
work than the replicas.

If you can manage to initially send all the updates to the correct leader, you 
can skip one hop before the work starts, which may buy you a small performance 
boost compared to randomly picking a node to send the request to.  But you'll 
need to be taxing the cloud pretty heavily before that difference becomes too 
noticeable.
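
A ZooKeeper-aware SolrJ client does exactly that hop-skipping; a minimal
sketch, assuming SolrJ on the classpath (the ZooKeeper ensemble string,
collection name and field values are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

public class CloudClientSketch {
    public static void main(String[] args) throws Exception {
        // Reads cluster state from ZooKeeper and sends each update
        // straight to the leader of the shard it belongs to.
        CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181/solr");
        client.setDefaultCollection("collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        client.add(doc);
        client.commit();

        // Queries are spread across healthy replicas by the client.
        QueryResponse rsp = client.query(new SolrQuery("*:*"));
        System.out.println("found: " + rsp.getResults().getNumFound());

        client.close();
    }
}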

-Original Message-
From: Sadheera Vithanage [mailto:sadhee...@gmail.com] 
Sent: Thursday, October 20, 2016 5:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Load balancing with solr cloud

Thank you very much John and Garth,

I've tested it out and it works fine, I can send the updates to any of the solr 
nodes.

If I am not using a zookeeper aware client and If I direct all my queries (read 
queries) always to the leader of the solr instances,does it automatically load 
balance between the replicas?

Or do I have to hit each instance in a round robin way and have the load 
balanced through the code?

Please advise the best way to do so..

Thank you very much again..



On Fri, Oct 21, 2016 at 9:18 AM, Garth Grimm < 
garthgr...@averyranchconsulting.com> wrote:

> Actually, zookeeper really won't participate in the update process at all.
>
> If you're using a "zookeeper aware" client like SolrJ, the SolrJ 
> library will read the cloud configuration from zookeeper, but will 
> send all the updates to the leader of the shard that the document is meant to 
> go to.
>
> If you're not using a "zookeeper aware" client, you can send the 
> update to any of the solr nodes, and they will evaluate the cloud 
> configuration information they've already received from zookeeper, and 
> then forward the document to leader of the shard that will handle the 
> document update.
>
> In general, Zookeeper really only provides the cloud configuration 
> information once (at most) during all the updates, the actual document 
> update only gets sent to solr nodes.  There's definitely no need to 
> distribute load between zookeepers for this situation.
>
> Regards,
> Garth Grimm
>
> -Original Message-
> From: Sadheera Vithanage [mailto:sadhee...@gmail.com]
> Sent: Thursday, October 20, 2016 5:11 PM
> To: solr-user@lucene.apache.org
> Subject: Load balancing with solr cloud
>
> Hi again Experts,
>
> I have a question related to load balancing in solr cloud.
>
> If we have 3 zookeeper nodes and 3 solr instances (1 leader, 2 
> secondary replicas and 1 shard), when the traffic comes in the primary 
> zookeeper server will be hammered, correct?
>
> I understand (or is it wrong) that zookeeper will load balance between 
> solr nodes but if we want to distribute the load between zookeeper 
> nodes as well, what is the best approach.
>
> Cost is a concern for us too.
>
> Thank you very much, in advance.
>
> --
> Regards
>
> Sadheera Vithanage
>



--
Regards

Sadheera Vithanage


Re: Load balancing with solr cloud

2016-10-20 Thread Sadheera Vithanage
Thank you very much John and Garth,

I've tested it out and it works fine, I can send the updates to any of the
solr nodes.

If I am not using a ZooKeeper-aware client and I direct all my queries
(read queries) always to the leader of the Solr instances, does it
automatically load balance between the replicas?

Or do I have to hit each instance in a round robin way and have the load
balanced through the code?

Please advise the best way to do so..

Thank you very much again..
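
One option when a ZooKeeper-aware client is not available is SolrJ's simple
round-robin client, which spreads requests across the listed nodes and skips
nodes that stop responding; a rough sketch with placeholder URLs:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.LBHttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class RoundRobinQuerySketch {
    public static void main(String[] args) throws Exception {
        // Round-robins queries across the listed Solr nodes.
        LBHttpSolrClient lb = new LBHttpSolrClient(
                "http://solr1:8983/solr/collection1",
                "http://solr2:8983/solr/collection1",
                "http://solr3:8983/solr/collection1");

        QueryResponse rsp = lb.query(new SolrQuery("*:*"));
        System.out.println("found: " + rsp.getResults().getNumFound());

        lb.close();
    }
}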



On Fri, Oct 21, 2016 at 9:18 AM, Garth Grimm <
garthgr...@averyranchconsulting.com> wrote:

> Actually, zookeeper really won't participate in the update process at all.
>
> If you're using a "zookeeper aware" client like SolrJ, the SolrJ library
> will read the cloud configuration from zookeeper, but will send all the
> updates to the leader of the shard that the document is meant to go to.
>
> If you're not using a "zookeeper aware" client, you can send the update to
> any of the solr nodes, and they will evaluate the cloud configuration
> information they've already received from zookeeper, and then forward the
> document to leader of the shard that will handle the document update.
>
> In general, Zookeeper really only provides the cloud configuration
> information once (at most) during all the updates, the actual document
> update only gets sent to solr nodes.  There's definitely no need to
> distribute load between zookeepers for this situation.
>
> Regards,
> Garth Grimm
>
> -Original Message-
> From: Sadheera Vithanage [mailto:sadhee...@gmail.com]
> Sent: Thursday, October 20, 2016 5:11 PM
> To: solr-user@lucene.apache.org
> Subject: Load balancing with solr cloud
>
> Hi again Experts,
>
> I have a question related to load balancing in solr cloud.
>
> If we have 3 zookeeper nodes and 3 solr instances (1 leader, 2 secondary
> replicas and 1 shard), when the traffic comes in the primary zookeeper
> server will be hammered, correct?
>
> I understand (or is it wrong) that zookeeper will load balance between
> solr nodes but if we want to distribute the load between zookeeper nodes as
> well, what is the best approach.
>
> Cost is a concern for us too.
>
> Thank you very much, in advance.
>
> --
> Regards
>
> Sadheera Vithanage
>



-- 
Regards

Sadheera Vithanage


RE: Load balancing with solr cloud

2016-10-20 Thread Garth Grimm
Actually, zookeeper really won't participate in the update process at all.

If you're using a "zookeeper aware" client like SolrJ, the SolrJ library will 
read the cloud configuration from zookeeper, but will send all the updates to 
the leader of the shard that the document is meant to go to.

If you're not using a "zookeeper aware" client, you can send the update to any 
of the solr nodes, and they will evaluate the cloud configuration information 
they've already received from zookeeper, and then forward the document to 
leader of the shard that will handle the document update.

In general, Zookeeper really only provides the cloud configuration information 
once (at most) during all the updates, the actual document update only gets 
sent to solr nodes.  There's definitely no need to distribute load between 
zookeepers for this situation.

Regards,
Garth Grimm

-Original Message-
From: Sadheera Vithanage [mailto:sadhee...@gmail.com] 
Sent: Thursday, October 20, 2016 5:11 PM
To: solr-user@lucene.apache.org
Subject: Load balancing with solr cloud

Hi again Experts,

I have a question related to load balancing in solr cloud.

If we have 3 zookeeper nodes and 3 solr instances (1 leader, 2 secondary 
replicas and 1 shard), when the traffic comes in the primary zookeeper server 
will be hammered, correct?

I understand (or is it wrong) that zookeeper will load balance between solr 
nodes but if we want to distribute the load between zookeeper nodes as well, 
what is the best approach.

Cost is a concern for us too.

Thank you very much, in advance.

--
Regards

Sadheera Vithanage


Re: Load balancing with solr cloud

2016-10-20 Thread John Bickerstaff
Others on the list are more expert, but I think your #1 Zookeeper will not
get hammered.

As I understand it, Solr itself (the leader) will handle farming out the
work to the other two Solr nodes.

The amount of traffic on the Zookeeper instances should be minimal.

Now - could your SolrCloud of 3 nodes get hammered too hard to keep up?
Yes, but I believe you'd see Solr go down due to overload before you would
ever see any trouble with Zookeeper because of load.

I will be corrected shortly  if I got any of that wrong...

On Thu, Oct 20, 2016 at 4:11 PM, Sadheera Vithanage 
wrote:

> Hi again Experts,
>
> I have a question related to load balancing in solr cloud.
>
> If we have 3 zookeeper nodes and 3 solr instances (1 leader, 2 secondary
> replicas and 1 shard), when the traffic comes in the primary zookeeper
> server will be hammered, correct?
>
> I understand (or is it wrong) that zookeeper will load balance between solr
> nodes but if we want to distribute the load between zookeeper nodes as
> well, what is the best approach.
>
> Cost is a concern for us too.
>
> Thank you very much, in advance.
>
> --
> Regards
>
> Sadheera Vithanage
>


Load balancing with solr cloud

2016-10-20 Thread Sadheera Vithanage
Hi again Experts,

I have a question related to load balancing in solr cloud.

If we have 3 zookeeper nodes and 3 solr instances (1 leader, 2 secondary
replicas and 1 shard), when the traffic comes in, the primary zookeeper
server will be hammered, correct?

I understand (or is it wrong?) that zookeeper will load balance between solr
nodes, but if we want to distribute the load between zookeeper nodes as
well, what is the best approach?

Cost is a concern for us too.

Thank you very much, in advance.

-- 
Regards

Sadheera Vithanage


/export handler to stream data using CloudSolrStream: JSONParse Exception

2016-10-20 Thread Chetas Joshi
Hello,

I am using /export handler to stream data using CloudSolrStream.

I am using fl=uuid,space,timestamp where uuid and space are Strings and
timestamp is long. My query (q=...) is not on these fields.

While reading the results from the Solr cloud, I get the following errors

org.noggit.JSONParser$ParseException: Expected ',' or '}':
char=5,position=110938
BEFORE='uuid":"0lG99s8vyaKB2I/I","space":"uuid","timestamp":1 5' AFTER='DB6
474294954},{"uuid":"0lG99sHT8P5e'


Or, for a different query:


org.noggit.JSONParser$ParseException: Expected ',' or '}':
char=",position=122528
BEFORE=':1475618674},{"uuid":"Whz991tX6P4beuhp","space": 3076 "'
AFTER='uuid","timestamp":1476131442},{"uui'


Now, what are the possible reasons for getting this error?


Is this related to some kind of data corruption?


What are some of the things (possibly some characters in a String) that the JSON
parser will have a hard time with?


The Solr version I use is 5.5.0


Thanks


Chetas.


Re: indexing - offline

2016-10-20 Thread Rallavagu

Thanks Evan for the quick response.

On 10/20/16 10:19 AM, Tom Evans wrote:

On Thu, Oct 20, 2016 at 5:38 PM, Rallavagu  wrote:

Solr 5.4.1 cloud with embedded jetty

Looking for some ideas around offline indexing where an independent node
will be indexed offline (not in the cloud) and added to the cloud to become
leader so other cloud nodes will get replicated. Wonder if this is possible
without interrupting the live service. Thanks.


How we do this, to reindex collection "foo":

1) First, collection "foo" should be an alias to the real collection,
eg "foo_1" aliased to "foo"
2) Have a node "node_i" in the cluster that is used for indexing. It
doesn't hold any shards of any collections
So, a node is part of the cluster but hosts no collections? How can we add a 
node to the cloud without it actively participating?



3) Use collections API to create collection "foo_2", with however many
shards required, but all placed on "node_i"
4) Index "foo_2" with new data with DIH or direct indexing to "node_1".
5) Use collections API to expand "foo_2" to all the nodes/replicas
that it should be on
Could you please point me to documentation on how to do this? I am 
referring to this doc 
https://cwiki.apache.org/confluence/display/solr/Collections+API. But 
it has many options and, honestly, I am not sure which one would be useful in 
this case.


Thanks


6) Remove "foo_2" from "node_i"
7) Verify contents of "foo_2" are correct
8) Use collections API to change alias for "foo" to "foo_2"
9) Remove "foo_1" collection once happy

This avoids indexing overwhelming the performance of the cluster (or
any nodes in the cluster that receive queries), and can be performed
with zero downtime or config changes on the clients.

Cheers

Tom



Re: (solrcloud) Importing documents into "implicit" router

2016-10-20 Thread John Bickerstaff
This may help?
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud

On Thu, Oct 20, 2016 at 12:09 PM, Customer 
wrote:

> Hey,
>
> I hope you all are doing well..
>
> I got a router with "router.name=implicit" with couple of shards (lets
> call them shardA and shardB) and got a mysql table ready to import for
> testing purposes. So for example I want to load half of the data to shardA
> and the rest - to the shardB. Question is - how I can do that ? I thought
> this is something to add to the RESTful call when doing import for example
> like curl -m 9 "http://localhost:8983/solr/te
> stIMPLICIT2/dataimport?=command=full-import=shardA , but looks
> like I was wrong.
>
> Thanks
>


Re: (solrcloud) Importing documents into "implicit" router

2016-10-20 Thread John Bickerstaff
More specifically, this bit from that page seems like it might be of
interest:

If you created the collection and defined the "implicit" router at the time
of creation, you can additionally define a router.field parameter to use a
field from each document to identify a shard where the document belongs. If
the field specified is missing in the document, however, the document will
be rejected. You could also use the _route_ parameter to name a specific
shard.
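
As a rough illustration of the _route_ approach from SolrJ (the ZooKeeper
address, shard and field names are placeholders, and this assumes the
collection was created with the implicit router):

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class ImplicitRouteSketch {
    public static void main(String[] args) throws Exception {
        CloudSolrClient client = new CloudSolrClient("zk1:2181/solr");
        client.setDefaultCollection("testIMPLICIT2");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "row-1");
        doc.addField("title", "goes to shardA");

        // With router.name=implicit, _route_ names the target shard explicitly.
        UpdateRequest req = new UpdateRequest();
        req.setParam("_route_", "shardA");
        req.add(doc);
        req.process(client);

        client.commit();
        client.close();
    }
}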

On Thu, Oct 20, 2016 at 12:12 PM, John Bickerstaff  wrote:

> This may help?  https://cwiki.apache.org/confluence/display/solr/
> Shards+and+Indexing+Data+in+SolrCloud
>
> On Thu, Oct 20, 2016 at 12:09 PM, Customer 
> wrote:
>
>> Hey,
>>
>> I hope you all are doing well..
>>
>> I got a router with "router.name=implicit" with couple of shards (lets
>> call them shardA and shardB) and got a mysql table ready to import for
>> testing purposes. So for example I want to load half of the data to shardA
>> and the rest - to the shardB. Question is - how I can do that ? I thought
>> this is something to add to the RESTful call when doing import for example
>> like curl -m 9 "http://localhost:8983/solr/te
>> stIMPLICIT2/dataimport?=command=full-import=shardA , but looks
>> like I was wrong.
>>
>> Thanks
>>
>
>


(solrcloud) Importing documents into "implicit" router

2016-10-20 Thread Customer

Hey,

I hope you all are doing well..

I have a collection using "router.name=implicit" with a couple of shards (let's 
call them shardA and shardB) and a MySQL table ready to import for 
testing purposes. So, for example, I want to load half of the data to 
shardA and the rest to shardB. The question is: how can I do that? I 
thought this was something to add to the RESTful call when doing the import, 
for example curl -m 9 
"http://localhost:8983/solr/testIMPLICIT2/dataimport?=command=full-import=shardA 
, but it looks like I was wrong.


Thanks


Re: Result Grouping vs. Collapsing Query Parser -- Can one be deprecated?

2016-10-20 Thread Jeff Wartes
I’ll also mention the choice to improve processing speed by allocating more 
memory, which increases the importance of GC tuning. This bit me when I tried 
using it on a larger index. 
https://issues.apache.org/jira/browse/SOLR-9125

I don’t know if the result grouping feature shares the same issue. Probably.
I actually never bothered trying it, since the comments I’d read made it seem 
like a non-starter.


On 10/19/16, 4:34 PM, "Joel Bernstein"  wrote:

Also as you consider using collapse you'll want to keep in mind the feature
compromises that were made to achieve the higher performance:

1) Collapse does not directly support faceting. It simply collapses the
results and the faceting components compute facets on the collapsed result
set. Grouping has direct support for faceting, which can be slow, but it
has options other than just computing facets on the collapsed result set.

2) Originally collapse only supported selecting group heads with min/max
value of a numeric field. It did not support using the sort parameter for
selecting the group head. Recently the sort parameter was added to
collapse, but this likely is not nearly as fast as using the min/max for
selecting group heads.



Joel Bernstein
http://joelsolr.blogspot.com/
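
For context, a rough sketch of what a collapse (plus optional expand) request
could look like from SolrJ; the core URL and field names are placeholders:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CollapseSketch {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");

        SolrQuery q = new SolrQuery("some query");
        // Collapse to one document per groupId; max=price picks the group head
        // by the highest value of a numeric field (the fast path described above).
        q.addFilterQuery("{!collapse field=groupId max=price}");
        // Optionally pull back the collapsed members of each group.
        q.set("expand", "true");

        QueryResponse rsp = client.query(q);
        System.out.println("group heads: " + rsp.getResults().getNumFound());

        client.close();
    }
}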

On Wed, Oct 19, 2016 at 7:20 PM, Joel Bernstein  wrote:

> Originally collapsing was designed with a very small feature set and one
> goal in mind: High performance collapsing on high cardinality fields. To
> avoid having to compromise on that goal, it was developed as a separate
> feature.
>
> The trick in combining grouping and collapsing into one feature, is to do
> it in a way that does not hurt the original performance goal of collapse.
> Otherwise we'll be back to just have slow grouping.
>
> Perhaps the new API's that are being worked could have a facade over
> grouping and collapsing so they would share the same API.
>
>
>
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, Oct 19, 2016 at 6:51 PM, Mike Lissner  wrote:
>
>> Hi all,
>>
>> I've had a rotten day today because of Solr. I want to share my experience
>> and perhaps see if we can do something to fix this particular situation in
>> the future.
>> the future.
>>
>> Solr currently has two ways to get grouped results (so far!). You can
>> either use Result Grouping or you can use the Collapsing Query Parser.
>> Result grouping seems like the obvious way to go. It's well documented,
>> the
>> parameters are clear, it doesn't use a bunch of weird syntax (ie,
>> {!collapse blah=foo}), and it uses the feature name from SQL (so it comes
>> up in Google).
>>
>> OTOH, if you use faceting with result grouping, which I imagine many
>> people
>> do, you get terrible performance. In our case it went from subsecond to
>> 10-120 seconds for big queries. Insanely bad.
>>
>> Collapsing Query Parser looks like a good way forward for us, and we'll be
>> investigating that, but it uses the Expand component that our library
>> doesn't support, to say nothing of the truly bizarre syntax. So this will
>> be a fair amount of effort to switch.
>>
>> I'm curious if there is anything we can do to clean up this situation.
>> What
>> I'd really like to do is:
>>
>> 1. Put a HUGE warning on the Result Grouping docs directing people away
>> from the feature if they plan to use faceting (or perhaps directing them
>> away no matter what?)
>>
>> 2. Work towards eliminating one or the other of these features. They're
>> nearly completely compatible, except for their syntax and performance. The
>> collapsing query parser apparently was only written because the result
>> grouping had such bad performance -- In other words, it doesn't exist to
>> provide unique features, it exists to be faster than the old way. Maybe we
>> can get rid of one or the other of these, taking the best parts from each
>> (syntax from Result Grouping, and performance from Collapse Query Parser)?
>>
>> Thanks,
>>
>> Mike
>>
>> PS -- For some extra context, I want to share some other reasons this is
>> frustrating:
>>
>> 1. I just spent a week upgrading a third-party library so it would support
>> grouped results, and another week implementing the feature in our code
>> with
>> tests and everything. That was a waste.
>> 2. It's hard to notice performance issues until after you deploy to a big
>> data environment. This creates a bad situation for users until you detect
>> it and revert the new features.
>> 3. The documentation *could* say something about the fact that a new
>> feature was 

Re: indexing - offline

2016-10-20 Thread Tom Evans
On Thu, Oct 20, 2016 at 5:38 PM, Rallavagu  wrote:
> Solr 5.4.1 cloud with embedded jetty
>
> Looking for some ideas around offline indexing where an independent node
> will be indexed offline (not in the cloud) and added to the cloud to become
> leader so other cloud nodes will get replicated. Wonder if this is possible
> without interrupting the live service. Thanks.

How we do this, to reindex collection "foo":

1) First, collection "foo" should be an alias to the real collection,
eg "foo_1" aliased to "foo"
2) Have a node "node_i" in the cluster that is used for indexing. It
doesn't hold any shards of any collections
3) Use collections API to create collection "foo_2", with however many
shards required, but all placed on "node_i"
4) Index "foo_2" with new data with DIH or direct indexing to "node_1".
5) Use collections API to expand "foo_2" to all the nodes/replicas
that it should be on
6) Remove "foo_2" from "node_i"
7) Verify contents of "foo_2" are correct
8) Use collections API to change alias for "foo" to "foo_2"
9) Remove "foo_1" collection once happy

This avoids indexing overwhelming the performance of the cluster (or
any nodes in the cluster that receive queries), and can be performed
with zero downtime or config changes on the clients.

Cheers

Tom
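
The Collections API actions behind steps 3, 5, 6, 8 and 9 could look roughly
like the calls below (host, node and collection names are placeholders; check
the parameters against the reference guide for your Solr version):

import java.net.HttpURLConnection;
import java.net.URL;

public class OfflineReindexCalls {

    // Issue a Collections API call and print the HTTP status.
    static void call(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        System.out.println(url + " -> HTTP " + conn.getResponseCode());
        conn.disconnect();
    }

    public static void main(String[] args) throws Exception {
        String solr = "http://node_i:8983/solr";

        // 3) create foo_2 with all shards pinned to the indexing node
        call(solr + "/admin/collections?action=CREATE&name=foo_2"
                + "&numShards=2&replicationFactor=1"
                + "&collection.configName=foo_conf&createNodeSet=node_i:8983_solr");

        // ... index into foo_2 here (DIH or direct indexing) ...

        // 5) expand foo_2 onto the serving nodes, one ADDREPLICA per shard/node
        call(solr + "/admin/collections?action=ADDREPLICA&collection=foo_2"
                + "&shard=shard1&node=node_a:8983_solr");

        // 6) remove the replicas hosted on the indexing node
        call(solr + "/admin/collections?action=DELETEREPLICA&collection=foo_2"
                + "&shard=shard1&replica=core_node1");

        // 8) point the "foo" alias at the new collection
        call(solr + "/admin/collections?action=CREATEALIAS&name=foo&collections=foo_2");

        // 9) drop the old collection once everything checks out
        call(solr + "/admin/collections?action=DELETE&name=foo_1");
    }
}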


Soft commit from curl

2016-10-20 Thread Michal Danilák
Does the following command issue soft commit or hard commit?

curl http://localhost:8984/solr/update?softCommit=true -H "Content-Type:
text/xml" --data-binary ''

How can I find out which commit was triggered? Can I see it somewhere in the logs?

Thanks.
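
For comparison, an explicit soft commit from SolrJ would look roughly like the
sketch below (the core URL is a placeholder). The commit entry Solr writes to
its log normally includes a softCommit=true/false flag, which is one way to
verify which kind of commit actually ran.

import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class SoftCommitSketch {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://localhost:8984/solr/collection1");

        // commit(waitFlush, waitSearcher, softCommit) -- the last flag
        // requests a soft commit instead of a hard commit.
        client.commit(true, true, true);

        client.close();
    }
}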


indexing - offline

2016-10-20 Thread Rallavagu

Solr 5.4.1 cloud with embedded jetty

Looking for some ideas around offline indexing where an independent node 
will be indexed offline (not in the cloud) and then added to the cloud to 
become the leader, so the other cloud nodes will replicate from it. Wondering 
if this is possible without interrupting the live service. Thanks.


Re: Memory Issue with SnapPuller

2016-10-20 Thread Jihwan Kim
This is also visible in the jvisualvm screenshot.
This exception occurred at 2:55PM and 3:40PM, and the OOME occurred at 3:41PM.
SnapPuller  - java.lang.InterruptedException
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
at java.util.concurrent.FutureTask.get(FutureTask.java:191)
at SnapPuller.openNewSearcherAndUpdateCommitPoint(SnapPuller.java:680)

On Thu, Oct 20, 2016 at 10:14 AM, Jihwan Kim  wrote:

> Good points.
> I am able to create this with periodic snap puller and only one http
> request.  When I load the Solr on tomcat, the initial memory usage was
> between 600M to 800 M.  First time, I used 1.5 G and then increased the
> heap to 3.5G.  (When I said 'triple', I meant comparing to the initial
> memory consumption)
>
> OK... Shall we focus on my second question: When the core reload happens
> successfully (no matter it throws the exception or not), does Solr need to
> call the openNewSearcherAndUpdateCommitPoint method?  I think this
> openNewSearcherAndUpdateCommitPoint method tries to open a new searcher
> on the old SolrCore.
>
> Thank you!
>
>
> On Thu, Oct 20, 2016 at 9:55 AM, Erick Erickson 
> wrote:
>
>> You say you tripled the memory. Up to what? Tripling from 500M t0 1.5G
>> isn't likely enough, tripling from 6G to 18G is something else
>> again
>>
>> You can take a look through any of the memory profilers and try to
>> catch the objects (and where they're being allocated). The second is
>> to look at the stack trace (presuming you don't have an OOM killer
>> script running) and perhaps triangulate that way.
>>
>> Best,
>> Erick
>>
>> On Thu, Oct 20, 2016 at 11:44 AM, Jihwan Kim  wrote:
>> > Thank you Shawn. I understand the two options.
>> > After my own testing with a smaller heap, I increased my heap size more
>> than
>> > triple, but OOME happens again with my testing cases under the
>> controlled
>> > thread process. Increased heap size just delayed the OOME.
>> >
>> > Can you provide a feedback on my second question:  When the core reload
>> > happens successfully (no matter it throws the exception or not), does
>> Solr
>> > need to call the openNewSearcherAndUpdateCommitPoint method?
>> >
>> > As I described on my previous email, a thread created from
>> > openNewSearcherAndUpdateCommitPoint method hangs and cause a high CPU
>> usage
>> > and a slow response time.  Attached image is the thread hung.
>> >
>> >
>> >
>> > On Thu, Oct 20, 2016 at 9:29 AM, Shawn Heisey 
>> wrote:
>> >>
>> >> On 10/20/2016 8:44 AM, Jihwan Kim wrote:
>> >> > We are using Solr 4.10.4 and experiencing out of memory exception. It
>> >> > seems the problem is cause by the following code & scenario.
>> >>
>> >> When you get an OutOfMemoryError exception that tells you there's not
>> >> enough heap space, the place where the exception happens is frequently
>> >> unrelated to the actual source of the problem.  Also, unless the
>> >> programmer engages in extraordinary effort, encountering OOME will
>> cause
>> >> program behavior to become completely unpredictable.  Most of Solr has
>> >> *NOT* had the benefit of extraordinary effort to handle OOME
>> gracefully.
>> >>
>> >> Before continuing with troubleshooting of SnapPuller, you're going to
>> >> need to fix the OOME error.  When you run out of memory, that is likely
>> >> to be the CAUSE of any errors you're seeing, not a symptom.
>> >>
>> >> There are exactly two ways to deal with OOME:  Increase the max heap,
>> or
>> >> take steps to reduce the amount of heap required.  Increasing the heap
>> >> is the easiest option, and typically the first step.  Sometimes it's
>> the
>> >> ONLY option.
>> >>
>> >> https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
>> >>
>> >> Thanks,
>> >> Shawn
>> >>
>> >
>>
>
>


Re: Memory Issue with SnapPuller

2016-10-20 Thread Jihwan Kim
Good points.
I am able to reproduce this with the periodic snap puller and only one HTTP
request.  When I load Solr on Tomcat, the initial memory usage was
between 600M and 800M.  The first time, I used 1.5G and then increased the
heap to 3.5G.  (When I said 'triple', I meant compared to the initial
memory consumption.)

OK... Shall we focus on my second question: when the core reload happens
successfully (no matter whether it throws the exception or not), does Solr need to
call the openNewSearcherAndUpdateCommitPoint method?  I think this
openNewSearcherAndUpdateCommitPoint method tries to open a new searcher on
the old SolrCore.

Thank you!


On Thu, Oct 20, 2016 at 9:55 AM, Erick Erickson 
wrote:

> You say you tripled the memory. Up to what? Tripling from 500M t0 1.5G
> isn't likely enough, tripling from 6G to 18G is something else
> again
>
> You can take a look through any of the memory profilers and try to
> catch the objects (and where they're being allocated). The second is
> to look at the stack trace (presuming you don't have an OOM killer
> script running) and perhaps triangulate that way.
>
> Best,
> Erick
>
> On Thu, Oct 20, 2016 at 11:44 AM, Jihwan Kim  wrote:
> > Thank you Shawn. I understand the two options.
> > After my own testing with a smaller heap, I increased my heap size more
> than
> > triple, but OOME happens again with my testing cases under the controlled
> > thread process. Increased heap size just delayed the OOME.
> >
> > Can you provide a feedback on my second question:  When the core reload
> > happens successfully (no matter it throws the exception or not), does
> Solr
> > need to call the openNewSearcherAndUpdateCommitPoint method?
> >
> > As I described on my previous email, a thread created from
> > openNewSearcherAndUpdateCommitPoint method hangs and cause a high CPU
> usage
> > and a slow response time.  Attached image is the thread hung.
> >
> >
> >
> > On Thu, Oct 20, 2016 at 9:29 AM, Shawn Heisey 
> wrote:
> >>
> >> On 10/20/2016 8:44 AM, Jihwan Kim wrote:
> >> > We are using Solr 4.10.4 and experiencing out of memory exception. It
> >> > seems the problem is cause by the following code & scenario.
> >>
> >> When you get an OutOfMemoryError exception that tells you there's not
> >> enough heap space, the place where the exception happens is frequently
> >> unrelated to the actual source of the problem.  Also, unless the
> >> programmer engages in extraordinary effort, encountering OOME will cause
> >> program behavior to become completely unpredictable.  Most of Solr has
> >> *NOT* had the benefit of extraordinary effort to handle OOME gracefully.
> >>
> >> Before continuing with troubleshooting of SnapPuller, you're going to
> >> need to fix the OOME error.  When you run out of memory, that is likely
> >> to be the CAUSE of any errors you're seeing, not a symptom.
> >>
> >> There are exactly two ways to deal with OOME:  Increase the max heap, or
> >> take steps to reduce the amount of heap required.  Increasing the heap
> >> is the easiest option, and typically the first step.  Sometimes it's the
> >> ONLY option.
> >>
> >> https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
> >>
> >> Thanks,
> >> Shawn
> >>
> >
>


Re: registered User

2016-10-20 Thread Erick Erickson
Done, thanks!

On Thu, Oct 20, 2016 at 4:56 AM, kult.n...@googlemail.com
 wrote:
> Hi,
>
> please add my User "NilsFaupel" of the Solr Wiki to the ContributorsGroup.
>
> Regards
>
> Nils


Re: Memory Issue with SnapPuller

2016-10-20 Thread Erick Erickson
You say you tripled the memory. Up to what? Tripling from 500M to 1.5G
isn't likely enough; tripling from 6G to 18G is something else
again...

You can take a look through any of the memory profilers and try to
catch the objects (and where they're being allocated). The second is
to look at the stack trace (presuming you don't have an OOM killer
script running) and perhaps triangulate that way.

Best,
Erick

On Thu, Oct 20, 2016 at 11:44 AM, Jihwan Kim  wrote:
> Thank you Shawn. I understand the two options.
> After my own testing with a smaller heap, I increased my heap size more than
> triple, but OOME happens again with my testing cases under the controlled
> thread process. Increased heap size just delayed the OOME.
>
> Can you provide a feedback on my second question:  When the core reload
> happens successfully (no matter it throws the exception or not), does Solr
> need to call the openNewSearcherAndUpdateCommitPoint method?
>
> As I described on my previous email, a thread created from
> openNewSearcherAndUpdateCommitPoint method hangs and cause a high CPU usage
> and a slow response time.  Attached image is the thread hung.
>
>
>
> On Thu, Oct 20, 2016 at 9:29 AM, Shawn Heisey  wrote:
>>
>> On 10/20/2016 8:44 AM, Jihwan Kim wrote:
>> > We are using Solr 4.10.4 and experiencing out of memory exception. It
>> > seems the problem is cause by the following code & scenario.
>>
>> When you get an OutOfMemoryError exception that tells you there's not
>> enough heap space, the place where the exception happens is frequently
>> unrelated to the actual source of the problem.  Also, unless the
>> programmer engages in extraordinary effort, encountering OOME will cause
>> program behavior to become completely unpredictable.  Most of Solr has
>> *NOT* had the benefit of extraordinary effort to handle OOME gracefully.
>>
>> Before continuing with troubleshooting of SnapPuller, you're going to
>> need to fix the OOME error.  When you run out of memory, that is likely
>> to be the CAUSE of any errors you're seeing, not a symptom.
>>
>> There are exactly two ways to deal with OOME:  Increase the max heap, or
>> take steps to reduce the amount of heap required.  Increasing the heap
>> is the easiest option, and typically the first step.  Sometimes it's the
>> ONLY option.
>>
>> https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
>>
>> Thanks,
>> Shawn
>>
>


Re: Memory Issue with SnapPuller

2016-10-20 Thread Jihwan Kim
Thank you Shawn. I understand the two options.
After my own testing with a smaller heap, I increased my heap size to more
than triple, but the OOME happens again with my test cases under the
controlled thread process. The increased heap size just delayed the OOME.

Can you provide feedback on my second question: when the core reload
happens successfully (no matter whether it throws the exception or not), does Solr
need to call the openNewSearcherAndUpdateCommitPoint method?

As I described in my previous email, a thread created from the
openNewSearcherAndUpdateCommitPoint method hangs and causes high CPU usage
and slow response times.  The attached image shows the hung thread.



On Thu, Oct 20, 2016 at 9:29 AM, Shawn Heisey  wrote:

> On 10/20/2016 8:44 AM, Jihwan Kim wrote:
> > We are using Solr 4.10.4 and experiencing out of memory exception. It
> > seems the problem is cause by the following code & scenario.
>
> When you get an OutOfMemoryError exception that tells you there's not
> enough heap space, the place where the exception happens is frequently
> unrelated to the actual source of the problem.  Also, unless the
> programmer engages in extraordinary effort, encountering OOME will cause
> program behavior to become completely unpredictable.  Most of Solr has
> *NOT* had the benefit of extraordinary effort to handle OOME gracefully.
>
> Before continuing with troubleshooting of SnapPuller, you're going to
> need to fix the OOME error.  When you run out of memory, that is likely
> to be the CAUSE of any errors you're seeing, not a symptom.
>
> There are exactly two ways to deal with OOME:  Increase the max heap, or
> take steps to reduce the amount of heap required.  Increasing the heap
> is the easiest option, and typically the first step.  Sometimes it's the
> ONLY option.
>
> https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
>
> Thanks,
> Shawn
>
>


Re: Memory Issue with SnapPuller

2016-10-20 Thread Shawn Heisey
On 10/20/2016 8:44 AM, Jihwan Kim wrote:
> We are using Solr 4.10.4 and experiencing out of memory exception. It
> seems the problem is cause by the following code & scenario. 

When you get an OutOfMemoryError exception that tells you there's not
enough heap space, the place where the exception happens is frequently
unrelated to the actual source of the problem.  Also, unless the
programmer engages in extraordinary effort, encountering OOME will cause
program behavior to become completely unpredictable.  Most of Solr has
*NOT* had the benefit of extraordinary effort to handle OOME gracefully.

Before continuing with troubleshooting of SnapPuller, you're going to
need to fix the OOME error.  When you run out of memory, that is likely
to be the CAUSE of any errors you're seeing, not a symptom.

There are exactly two ways to deal with OOME:  Increase the max heap, or
take steps to reduce the amount of heap required.  Increasing the heap
is the easiest option, and typically the first step.  Sometimes it's the
ONLY option.

https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

Thanks,
Shawn



Re: Memory Issue with SnapPuller

2016-10-20 Thread Jihwan Kim
A little more about "At certain timing, this method also throw "
SnapPuller  - java.lang.InterruptedException
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
at java.util.concurrent.FutureTask.get(FutureTask.java:191)
at SnapPuller.openNewSearcherAndUpdateCommitPoint(SnapPuller.java:680)

This is a less confident scenario.
The old core didn't complete the close() method during the reloadCore.
Then, it executed the openNewSearcherAndUpdateCommitPoint method.  Now, an
HTTP request, as an example, finished its processing and called the SolrCore
close() method.  refCount is 0, and it goes through all the other processing in the close()


On Thu, Oct 20, 2016 at 8:44 AM, Jihwan Kim  wrote:

> Hi,
> We are using Solr 4.10.4 and experiencing out of memory exception.  It
> seems the problem is cause by the following code & scenario.
>
> This is the last part of a fetchLastIndex method in SnapPuller.java
>
> // we must reload the core after we open the IW back up
> if (reloadCore) {
>   reloadCore();
> }
>
> if (successfulInstall) {
>   if (isFullCopyNeeded) {
> // let the system know we are changing dir's and the old one
> // may be closed
> if (indexDir != null) {
>   LOG.info("removing old index directory " + indexDir);
>   core.getDirectoryFactory().doneWithDirectory(indexDir);
>   core.getDirectoryFactory().remove(indexDir);
> }
>   }
>   if (isFullCopyNeeded) {
> solrCore.getUpdateHandler().newIndexWriter(isFullCopyNeeded);
>   }
>
>   openNewSearcherAndUpdateCommitPoint(isFullCopyNeeded);
> }
>
> Inside the reloadCore, it create a new core, register it, and try to close
> the current/old core.  When the closing old core process goes normal, it
> throws an exception "SnapPull failed :org.apache.solr.common.SolrException:
> Index fetch failed Caused by java.lang.RuntimeException: Interrupted while
> waiting for core reload to finish Caused by Caused by: java.lang.
> InterruptedException."
>
> Despite this exception, the process seems OK because it just terminate the
> SnapPuller thread but all other threads that process the closing go well.
>
> *Now, the problem is when the close() method called during the reloadCore
> doesn't really close the core.*
> This is the beginning of the close() method.
> public void close() {
> int count = refCount.decrementAndGet();
> if (count > 0) return; // close is called often, and only actually
> closes if nothing is using it.
> if (count < 0) {
>log.error("Too many close [count:{}] on {}. Please report this
> exception to solr-user@lucene.apache.org", count, this );
>assert false : "Too many closes on SolrCore";
>return;
> }
> log.info(logid+" CLOSING SolrCore " + this);
>
> When a HTTP Request is executing, the refCount is greater than 1. So, when
> the old core is trying to be closed during the core reload, the if (count >
> 0) condition simply return this method.
>
> Then, fetchLastIndex method in SnapPuller processes next code and execute "
> openNewSearcherAndUpdateCommitPoint".  If you look at this method, it
> tries to open a new searcher of the solrCore which is referenced during the
> SnapPuller constructor and I believe this one points to the old core.  At
> certain timing, this method also throw
> SnapPuller  - java.lang.InterruptedException
> at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
> at java.util.concurrent.FutureTask.get(FutureTask.java:191)
> at SnapPuller.openNewSearcherAndUpdateCommitPoint(SnapPuller.java:680)
>
> After this exception, things start to go bad.
>
> *In summary, I have two questions.*
> 1. Can you confirm this memory / thread issue?
> 2. When the core reload happens successfully (no matter it throws the
> exception or not), does Solr need to call the 
> openNewSearcherAndUpdateCommitPoint
> method?
>
> Thanks.
>


Re: Memory Issue with SnapPuller

2016-10-20 Thread Jihwan Kim
Sorry, wrong button was clicked.

A little more about "At certain timing, this method also throw "
SnapPuller  - java.lang.InterruptedException
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
at java.util.concurrent.FutureTask.get(FutureTask.java:191)
at SnapPuller.openNewSearcherAndUpdateCommitPoint(SnapPuller.java:680)

This is a scenario (with less confidence).
The old core didn't complete the close() method because of the refCount
during the reloadCore.  Then, it executed the
openNewSearcherAndUpdateCommitPoint
method.  Now, an HTTP request, as an example, finished its processing and called
the SolrCore close() method.  refCount reaches 0, and it goes through all the other processing in
the close() method of the SolrCore.
In this case, the InterruptedException can be thrown in
openNewSearcherAndUpdateCommitPoint.  After that, I noticed one thread
that executes a newSearcher process hangs, and CPU usage remains high.
We are also using a large external field file.



On Thu, Oct 20, 2016 at 9:11 AM, Jihwan Kim  wrote:

> A little more about "At certain timing, this method also throw "
> SnapPuller  - java.lang.InterruptedException
> at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
> at java.util.concurrent.FutureTask.get(FutureTask.java:191)
> at SnapPuller.openNewSearcherAndUpdateCommitPoint(SnapPuller.java:680)
>
> This is less confident scenario.
> The old core didn't complete the close() method during the reloadCore.
> Then, it execute the openNewSearcherAndUpdateCommitPoint method.  Now, a
> http request, as an example, finished a process and called the SolrCore
> close() method.  refCount is 0. and go to all other process in the close()
>
>
> On Thu, Oct 20, 2016 at 8:44 AM, Jihwan Kim  wrote:
>
>> Hi,
>> We are using Solr 4.10.4 and experiencing out of memory exception.  It
>> seems the problem is cause by the following code & scenario.
>>
>> This is the last part of a fetchLastIndex method in SnapPuller.java
>>
>> // we must reload the core after we open the IW back up
>> if (reloadCore) {
>>   reloadCore();
>> }
>>
>> if (successfulInstall) {
>>   if (isFullCopyNeeded) {
>> // let the system know we are changing dir's and the old one
>> // may be closed
>> if (indexDir != null) {
>>   LOG.info("removing old index directory " + indexDir);
>>   core.getDirectoryFactory().doneWithDirectory(indexDir);
>>   core.getDirectoryFactory().remove(indexDir);
>> }
>>   }
>>   if (isFullCopyNeeded) {
>> solrCore.getUpdateHandler().newIndexWriter(isFullCopyNeeded);
>>   }
>>
>>   openNewSearcherAndUpdateCommitPoint(isFullCopyNeeded);
>> }
>>
>> Inside the reloadCore, it create a new core, register it, and try to
>> close the current/old core.  When the closing old core process goes normal,
>> it throws an exception "SnapPull failed 
>> :org.apache.solr.common.SolrException:
>> Index fetch failed Caused by java.lang.RuntimeException: Interrupted while
>> waiting for core reload to finish Caused by Caused by:
>> java.lang.InterruptedException."
>>
>> Despite this exception, the process seems OK because it just terminate
>> the SnapPuller thread but all other threads that process the closing go
>> well.
>>
>> *Now, the problem is when the close() method called during the reloadCore
>> doesn't really close the core.*
>> This is the beginning of the close() method.
>> public void close() {
>> int count = refCount.decrementAndGet();
>> if (count > 0) return; // close is called often, and only
>> actually closes if nothing is using it.
>> if (count < 0) {
>>log.error("Too many close [count:{}] on {}. Please report this
>> exception to solr-user@lucene.apache.org", count, this );
>>assert false : "Too many closes on SolrCore";
>>return;
>> }
>> log.info(logid+" CLOSING SolrCore " + this);
>>
>> When a HTTP Request is executing, the refCount is greater than 1. So,
>> when the old core is trying to be closed during the core reload, the if
>> (count > 0) condition simply return this method.
>>
>> Then, fetchLastIndex method in SnapPuller processes next code and execute
>> "openNewSearcherAndUpdateCommitPoint".  If you look at this method, it
>> tries to open a new searcher of the solrCore which is referenced during the
>> SnapPuller constructor and I believe this one points to the old core.  At
>> certain timing, this method also throw
>> SnapPuller  - java.lang.InterruptedException
>> at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
>> at java.util.concurrent.FutureTask.get(FutureTask.java:191)
>> at SnapPuller.openNewSearcherAndUpdateCommitPoint(
>> SnapPuller.java:680)
>>
>> After this exception, things start to go bad.
>>
>> *In summary, I have two questions.*
>> 1. Can 

Memory Issue with SnapPuller

2016-10-20 Thread Jihwan Kim
Hi,
We are using Solr 4.10.4 and experiencing an out-of-memory exception.  It
seems the problem is caused by the following code & scenario.

This is the last part of a fetchLastIndex method in SnapPuller.java

// we must reload the core after we open the IW back up
if (reloadCore) {
  reloadCore();
}

if (successfulInstall) {
  if (isFullCopyNeeded) {
// let the system know we are changing dir's and the old one
// may be closed
if (indexDir != null) {
  LOG.info("removing old index directory " + indexDir);
  core.getDirectoryFactory().doneWithDirectory(indexDir);
  core.getDirectoryFactory().remove(indexDir);
}
  }
  if (isFullCopyNeeded) {
solrCore.getUpdateHandler().newIndexWriter(isFullCopyNeeded);
  }

  openNewSearcherAndUpdateCommitPoint(isFullCopyNeeded);
}

Inside reloadCore, it creates a new core, registers it, and tries to close
the current/old core.  When the old-core closing process goes normally, it
throws an exception "SnapPull failed :org.apache.solr.common.SolrException:
Index fetch failed Caused by java.lang.RuntimeException: Interrupted while
waiting for core reload to finish Caused by Caused by:
java.lang.InterruptedException."

Despite this exception, the process seems OK because it just terminates the
SnapPuller thread, but all the other threads that process the closing go well.

*Now, the problem is when the close() method called during the reloadCore
doesn't really close the core.*
This is the beginning of the close() method.
public void close() {
int count = refCount.decrementAndGet();
if (count > 0) return; // close is called often, and only actually
closes if nothing is using it.
if (count < 0) {
   log.error("Too many close [count:{}] on {}. Please report this
exception to solr-user@lucene.apache.org", count, this );
   assert false : "Too many closes on SolrCore";
   return;
}
log.info(logid+" CLOSING SolrCore " + this);

When an HTTP request is executing, the refCount is greater than 1. So, when
the old core is being closed during the core reload, the if (count >
0) condition simply returns from this method.

Then, the fetchLastIndex method in SnapPuller processes the next code and executes
"openNewSearcherAndUpdateCommitPoint".  If you look at this method, it
tries to open a new searcher on the solrCore that is referenced in the
SnapPuller constructor, and I believe this one points to the old core.  At a
certain timing, this method also throws
SnapPuller  - java.lang.InterruptedException
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
at java.util.concurrent.FutureTask.get(FutureTask.java:191)
at SnapPuller.openNewSearcherAndUpdateCommitPoint(SnapPuller.java:680)

After this exception, things start to go bad.

*In summary, I have two questions.*
1. Can you confirm this memory / thread issue?
2. When the core reload happens successfully (no matter it throws the
exception or not), does Solr need to call the
openNewSearcherAndUpdateCommitPoint method?

Thanks.


Re: Facet behavior

2016-10-20 Thread Yonik Seeley
On Thu, Oct 20, 2016 at 8:45 AM, Bastien Latard | MDPI AG
 wrote:
> Hi Yonik,
>
> Thanks for your answer!
> I'm not quite I understood everything...please, see my comments below.
>
>
>> On Wed, Oct 19, 2016 at 6:23 AM, Bastien Latard | MDPI AG
>>  wrote:
>>>
>>> I just had a question about facets.
>>> *==> Is the facet run on all documents (to pre-process/cache the data) or
>>> only on returned documents?*
>>
>> Yes ;-)
>>
>> There are sometimes per-field data structures that are cached to
>> support faceting.  This can make the first facet request after a new
>> searcher take longer.  Unless you're using docValues, then the cost is
>> much less.
>
> So how to force it to use docValues? Simply:
>  docValues="true" />
> Are there other advantage/inconvenient?

You probably still want the field indexed as well... that supports
fast filtering by specific values (fq=my_field:value1)
without having to do a complete column scan.

>> Then there are per-request data structures (like a count array) that
>> are O(field_cardinality) and not O(matching_docs).
>> But then for default field-cache faceting, the actual counting part is
>> O(matching_docs).
>> So yes, at the end of  the day we only facet on the matching
>> documents... but what the total field looks like certainly matters.
>
> This would only be like that if I would use docValues, right?

If docvalues aren't indexed, then they are built in memory (or
something like them) before they are used.

-Yonik

> If I have such field declaration (dedicated field for facet-- without
> stemming), what would be the best setting?
>  required="false" multiValued="true" />
>
> Kind regards,
> Bastien
>
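
As a small illustration of the request side in SolrJ (core URL and field names
are placeholders), faceting on a dedicated docValues field while still
filtering on its indexed terms:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FacetSketch {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");

        SolrQuery q = new SolrQuery("*:*");
        q.setRows(0);
        q.setFacet(true);
        q.addFacetField("journal_facet");          // docValues field used only for faceting
        q.addFilterQuery("journal_facet:value1");  // fast term filter uses the inverted index

        QueryResponse rsp = client.query(q);
        for (FacetField.Count c : rsp.getFacetField("journal_facet").getValues()) {
            System.out.println(c.getName() + " -> " + c.getCount());
        }
        client.close();
    }
}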


group.facet fails when facet on double field

2016-10-20 Thread karel braeckman
Hi,

We are trying to upgrade from Solr 4.8 to Solr 6.2.

This query:

?q=*%3A*=0=2=json=true=true=mediaObjectId=true=rating=true

is returning the following error:

null:org.apache.solr.common.SolrException: Exception during facet.field: rating
at 
org.apache.solr.request.SimpleFacets.lambda$getFacetFieldCounts$0(SimpleFacets.java:739)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.solr.request.SimpleFacets$2.execute(SimpleFacets.java:672)
...
Caused by: java.lang.IllegalStateException: unexpected docvalues type
NUMERIC for field 'mediaObjectId' (expected=SORTED). Re-index with
correct docvalues type.
at org.apache.lucene.index.DocValues.checkField(DocValues.java:212)
at org.apache.lucene.index.DocValues.getSorted(DocValues.java:264)
at 
org.apache.lucene.search.grouping.term.TermGroupFacetCollector$SV.doSetNextReader(TermGroupFacetCollector.java:128)
...


The same query without the group.facet=true option does not give an
error. On Solr 4.8 the query did not give problems.


The relevant fields are configured as follows:




Am I doing anything wrong, or do you have any suggestions on what to try next?


Best regards

Karel Braeckman


Re: Facet behavior

2016-10-20 Thread Bastien Latard | MDPI AG

Hi Yonik,

Thanks for your answer!
I'm not quite sure I understood everything... please see my comments below.



On Wed, Oct 19, 2016 at 6:23 AM, Bastien Latard | MDPI AG
 wrote:

I just had a question about facets.
*==> Is the facet run on all documents (to pre-process/cache the data) or
only on returned documents?*

Yes ;-)

There are sometimes per-field data structures that are cached to
support faceting.  This can make the first facet request after a new
searcher take longer.  Unless you're using docValues, then the cost is
much less.

So how do I force it to use docValues? Simply:
docValues="true" />

Are there any other advantages/inconveniences?


Then there are per-request data structures (like a count array) that
are O(field_cardinality) and not O(matching_docs).
But then for default field-cache faceting, the actual counting part is
O(matching_docs).
So yes, at the end of  the day we only facet on the matching
documents... but what the total field looks like certainly matters.

This would only be like that if I used docValues, right?

If I have such a field declaration (a dedicated field for faceting, without 
stemming), what would be the best setting?
stored="true" required="false" multiValued="true" />


Kind regards,
Bastien



Filter result of facting query

2016-10-20 Thread Davide Isoardi
Hi all,

I need to filter the result of a faceting query by range.
E.G.:
q=*%3A*=id=json=0=true=client=10=false=true

I have this result, but I would like to keep only the values in a specific range (from 5 to 
9). In this case it would return only "RoundTeam",703461 and 
"Hootsuite",575569.

{
  "responseHeader":{
"status":0,
"QTime":286,
"params":{
  "q":"*:*",
  "facet.limit":"10",
  "df":"id",
  "facet.field":"client",
  "indent":"true",
  "facet.missing":"false",
  "rows":"0",
  "wt":"json",
  "facet":"true"}},
  "response":{"numFound":14869526,"start":0,"maxScore":1.0,"docs":[]
  },
  "facet_counts":{
"facet_queries":{},
"facet_fields":{
  "client":[
"Twitter",6650927,
"IFTTT",926574,
"RoundTeam",703461,
"Hootsuite",575569,
"TwitterFeed",431527,
"Buffer",431332,
"DeliverIT",382915,
"TweetDeck",323392,
"BigData TweetBot",296099,
"LinkedIn",128610]},
"facet_dates":{},
"facet_ranges":{},
"facet_intervals":{},
"facet_heatmaps":{}}}

Do you have any idea how to obtain this result?

Thanks in advance
Davide Isoardi
eCube S.r.l.
isoa...@ecubecenter.it
http://www.ecubecenter.it
Tel.  +390113999301
Mobile +393288204915
Fax. +390113999309


Notice pursuant to Privacy Legislative Decree no. 196/2003
ECUBE processes personal data as specified on the "Privacy Policy" page 
available at http://www.ecubecenter.it/privacy.pdf. The information contained 
in this message is intended exclusively for the indicated recipient(s). If you 
have received this message in error, please kindly notify us by e-mail 
(i...@ecubecenter.it) and delete the 
message received in error, as any other use is illegitimate and unlawful.
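
As far as I know there is no facet parameter for an upper bound on counts;
facet.mincount covers the lower end, and the upper end has to be filtered on
the client side. A rough SolrJ sketch under that assumption (core URL, field
name and bounds are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FacetCountRangeSketch {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");

        long lower = 500_000L;   // placeholder bounds
        long upper = 900_000L;

        SolrQuery q = new SolrQuery("*:*");
        q.setRows(0);
        q.setFacet(true);
        q.addFacetField("client");
        q.setFacetMinCount((int) lower);   // server-side lower bound on counts
        q.setFacetLimit(-1);               // return all buckets so none are cut off

        QueryResponse rsp = client.query(q);
        for (FacetField.Count c : rsp.getFacetField("client").getValues()) {
            if (c.getCount() <= upper) {   // upper bound applied client-side
                System.out.println(c.getName() + " -> " + c.getCount());
            }
        }
        client.close();
    }
}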




registered User

2016-10-20 Thread kult.n...@googlemail.com
Hi,

please add my User "NilsFaupel" of the Solr Wiki to the ContributorsGroup.

Regards

Nils