Re: Get groups which has the number of elements greater than...

2014-12-16 Thread andreic9203
Hello again lboutros,

Pivot faceting seems to return only the pivot field values with their
associated counts. That's fine, but it's not what I want.
Do you know a way to also return the documents themselves? See, for
example, the wiki section on Pivot (Decision Tree) Faceting.

After the "count" tag, is it possible to return the documents that were
counted, with all their fields and all their data?

Thank you again,
Andrei



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Get-groups-which-has-the-number-of-elements-greater-than-tp4174352p4174686.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Join in SOLR

2014-12-16 Thread Rajesh
Thanks Mikhail. As per what you have mentioned, can I get a list of sub
entities with this new Zipper join? With the existing DIH I'm getting a
list for the individual fields of the sub entities.

Also, I've not found the DIH 5 jar anywhere. Is it still in development?
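For reference, my reading of SOLR-4799 is that the Zipper join would be configured along these lines in DIH -- both queries sorted by the join key; the table and column names here are invented:

```xml
<entity name="parent" processor="SqlEntityProcessor"
        query="SELECT id, name FROM parent ORDER BY id">
  <!-- join="zipper" streams both sorted result sets and merges them in one
       pass, instead of running one child query per parent row -->
  <entity name="child" processor="SqlEntityProcessor" join="zipper"
          query="SELECT parent_id, detail FROM child ORDER BY parent_id"
          where="parent_id=parent.id"/>
</entity>
```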




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Join-in-SOLR-tp4173930p4174679.html
Sent from the Solr - User mailing list archive at Nabble.com.


questions about BlockJoinParentQParser

2014-12-16 Thread Michael Sokolov
I'm trying to use BJPQP and ran into a few little gotchas that I'd like 
to share with y'all in case you have any advice.


First I ran into an NPE that probably should be handled better - maybe 
just an exception with a better message.  The framework I'm working in 
makes it slightly annoying to use localParams syntax (I have to bypass 
special-character escaping in the client framework), so I thought I'd 
set defType=parent and pass the "which" as a global parameter, but if 
you try this you get an NPE since the QParser expects "which" to be 
passed in as a local param.  It's probably not really a sane use case 
since you have to use localParams to get your default qparser 
instantiated in that case anyway, so why would you do it?  Still - it 
would be good to report a clearer exception to the user.


Then I got my query working, but results were coming back in a funky 
order.  I realized that the child doc scores were being thrown away -- 
BJPQP is hard-coded to use ScoreMode.None.  So then I went to subclass 
the QParser (and plugin) to override the score mode -- createQuery is 
protected, which would seem to make this convenient to do, but the 
class itself (BlockJoinParentQParser) is package-private.  Then I 
thought I'd just put my class in the same package, but this fails at 
runtime since it's loaded by a different class loader. Argh.  I would 
have to copy and fork the whole class to get this working.


I guess that's what I'll do, but this should be easier.  Am I missing 
something?  Is there another way to get a scoring ToParentBlockJoinQuery 
in Solr?
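For the record, the fork boils down to something like this -- an untested sketch of a QParserPlugin that builds the ToParentBlockJoinQuery directly with a scoring ScoreMode, instead of going through the package-private BlockJoinParentQParser (the class name and the ScoreMode.Max choice are mine):

```java
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryWrapperFilter;
import org.apache.lucene.search.join.FixedBitSetCachingWrapperFilter;
import org.apache.lucene.search.join.ScoreMode;
import org.apache.lucene.search.join.ToParentBlockJoinQuery;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.apache.solr.search.SyntaxError;

public class ScoringParentQParserPlugin extends QParserPlugin {
  @Override
  public void init(NamedList args) {}

  @Override
  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      @Override
      public Query parse() throws SyntaxError {
        // "which" selects the parent documents, as in {!parent which=...}
        Query parents = subQuery(localParams.get("which"), null).getQuery();
        Query children = subQuery(qstr, null).getQuery();
        // ScoreMode.Max keeps the best child score instead of ScoreMode.None
        return new ToParentBlockJoinQuery(children,
            new FixedBitSetCachingWrapperFilter(new QueryWrapperFilter(parents)),
            ScoreMode.Max);
      }
    };
  }
}
```

Registered in solrconfig.xml with a queryParser element and used as {!myparser which=...} in place of {!parent which=...}.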


Thanks

-Mike


Re: Partial match autosuggest (match a word occurring anywhere in a field)

2014-12-16 Thread Ahmet Arslan
Hi BBrani,

Yes, it is possible. Create another field, say edgytext_partial, and use a 
whitespace tokenizer this time.
Then query on both edgytext and edgytext_partial; you can even apply 
different boosts.
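Schematically, something like this (an untested sketch; the type and field names are made up, and the edgytext side stays whatever you already have):

```xml
<fieldType name="text_partial" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- whitespace tokenizer keeps each word, so any word can prefix-match -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- edge n-grams make "gol" match "Gold" at index time -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="edgytext_partial" type="text_partial" indexed="true" stored="false"/>
<copyField source="title_new" dest="edgytext_partial"/>
```

A query would then hit both fields, e.g. q=edgytext:(iphone gold)^2 OR edgytext_partial:(iphone gold).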

Ahmet

 



On Wednesday, December 17, 2014 2:44 AM, bbarani  wrote:
Hi,

I am trying to figure out a way to implement partial-match autosuggest, but
it doesn't work in some cases.

When I search for iphone 5s, I am able to see the below results.

title_new:Apple iPhone 5s - 16GB - Gold

but when I search for iphone gold (in the title_new field), I am not able to
see the above result. Is there a way to implement full partial match
(occurring anywhere in a field)?


Please find below my fieldtype configuration for title_new:

[fieldType definition stripped by the mailing-list archive]

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Partial-match-autosuggest-match-a-word-occurring-anywhere-in-a-field-tp4174660.html
Sent from the Solr - User mailing list archive at Nabble.com.


Partial match autosuggest (match a word occurring anywhere in a field)

2014-12-16 Thread bbarani
Hi,

I am trying to figure out a way to implement partial-match autosuggest, but
it doesn't work in some cases.

When I search for iphone 5s, I am able to see the below results.

title_new:Apple iPhone 5s - 16GB - Gold

but when I search for iphone gold (in the title_new field), I am not able to
see the above result. Is there a way to implement full partial match
(occurring anywhere in a field)?


Please find below my fieldtype configuration for title_new:

[fieldType definition stripped by the mailing-list archive]


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Partial-match-autosuggest-match-a-word-occurring-anywhere-in-a-field-tp4174660.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Identical query returning different aggregate results

2014-12-16 Thread Erick Erickson
Wow, advancing senility... _I'm_ actually the person who committed that fix...

Siiigh.

On Tue, Dec 16, 2014 at 5:38 PM, David Smith
 wrote:
> Chris,
>
> Yes, your suggestion worked.  Changing the parameter in my query from
>
> ...f.eventDate.facet.mincount=1...
>
>
> to
>
> ...f.eventDate.facet.mincount=0...
>
>
> worked around the problem. And I agree that SOLR-6154 describes what I 
> observed almost exactly.  Once 5.0 is available, I'll test this again with 
> "mincount=1".
>
> Thanks everyone for your help! It is very much appreciated.
>
> Regards,
> David
>
>  On Tuesday, December 16, 2014 4:38 PM, Chris Hostetter 
>  wrote:
>
>
>
> sounds like this bug...
>
> https://issues.apache.org/jira/browse/SOLR-6154
>
> ...in which case it has nothing to do with your use of multiple
> collections; it's just dependent on whether or not the first node to
> respond happens to have a doc in every "range bucket" .. any bucket
> missing (because of your mincount=1) from the first core to
> respond is then ignored in the responses from the subsequent cores.
>
> The workaround is to set mincount=0 for your facet ranges.
>
>
>
> : Date: Tue, 16 Dec 2014 17:17:05 + (UTC)
> : From: David Smith 
> : Reply-To: solr-user@lucene.apache.org, David Smith 
> : To: Solr-user 
> : Subject: Identical query returning different aggregate results
> :
> : I have a prototype SolrCloud 4.10.2 setup with 13 collections (of 1 
> replica, 1 shard each) and a separate 1-node Zookeeper 3.4.6.
> : The very first app test case I wrote is failing intermittently in this 
> environment, when I only have 4 documents ingested into the cloud.
> : I dug in and found when I query against multiple collections, using the 
> "collection=" parameter, the aggregates I request are correct about 50% of 
> the time.  The other 50% of the time, the aggregate returned by Solr is not 
> correct. Note this is for the identical query.  In other words, I can run the 
> same query multiple times in a row, and get different answers.
> :
> : The simplest version of the query that still exhibits the odd behavior is 
> as follows:
> : 
> http://192.168.59.103:8985/solr/query_handler/query?facet.range=eventDate&f.eventDate.facet.range.end=2014-12-31T23:59:59.999Z&f.eventDate.facet.range.gap=%2B1DAY&fl=eventDate,id&start=0&collection=2014_04,2014_03&rows=10&f.eventDate.facet.range.start=2014-01-01T00:00:00.000Z&q=*:*&f.eventDate.facet.mincount=1&facet=true
> :
> : When it SUCCEEDS, the aggregate correctly appears like this:
> :
> :   "facet_counts":{"facet_queries":{},"facet_fields":{},
> "facet_dates":{},"facet_ranges":{  "eventDate":{"counts":[
>   "2014-04-01T00:00:00Z",3],"gap":"+1DAY",
> "start":"2014-01-01T00:00:00Z","end":"2015-01-01T00:00:00Z"}},
> "facet_intervals":{}}}
> :
> : When it FAILS, note that the counts[] array is empty:
> :   "facet_counts":{"facet_queries":{},"facet_fields":{},
> "facet_dates":{},"facet_ranges":{  "eventDate":{"counts":[],  
>   "gap":"+1DAY","start":"2014-01-01T00:00:00Z",
> "end":"2015-01-01T00:00:00Z"}},"facet_intervals":{}}}
> :
> : If I further simplify the query, by removing range options or reducing to 
> one (1) collection name, then the problem goes away.
> :
> : The solr logs are clean at INFO level, and there is no substantive 
> difference in log output when the query succeeds vs fails, leaving me stumped 
> where to look next.  Suggestions welcome.
> : Regards,
> : David
> :
> :
> :
> :
> :
>
> -Hoss
> http://www.lucidworks.com/
>
>


Re: Solr Node Resource allocation.

2014-12-16 Thread Erick Erickson
Identifying the "lease used" cores... no tools integrated with Solr
that I know of.

But once you do figure out what machines to use, the collections API CREATE
command has a createNodeSet which will put the new collection on the
specified nodes.

And the ADDREPLICA command also allows you to specify a node parameter when you
add one.
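For example (host and node names invented here; node names use the live_nodes format, host:port_solr):

```
http://localhost:8983/solr/admin/collections?action=CREATE&name=mycoll&numShards=2&replicationFactor=2&maxShardsPerNode=1&createNodeSet=192.168.1.11:8983_solr,192.168.1.12:8983_solr,192.168.1.13:8983_solr,192.168.1.14:8983_solr

http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1&node=192.168.1.15:8983_solr
```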

Best,
Erick

On Tue, Dec 16, 2014 at 5:07 PM, Elan Palani  wrote:
>
> Hello..
>
> Let’s assume I have a SolrCloud cluster with 10 nodes. Most of the time I
> want a new collection created with 2 shards and 2 replicas, on 4 different
> nodes. Using CloudSolrServer we could pick the nodes randomly, but is
> there a way to pick and choose the set of nodes based on
> their resource usage/allocation (CPU, memory, disk, collection size, # of
> docs, etc.)? Any suggestions or tools available?
>
> Basically I am trying to identify the least used nodes for creating cores 
> with some rules..
>
> Thanks in Advance.
>
> Elan


Re: Identical query returning different aggregate results

2014-12-16 Thread David Smith
Chris,

Yes, your suggestion worked.  Changing the parameter in my query from 

...f.eventDate.facet.mincount=1...


to

...f.eventDate.facet.mincount=0...


worked around the problem. And I agree that SOLR-6154 describes what I observed 
almost exactly.  Once 5.0 is available, I'll test this again with "mincount=1".

Thanks everyone for your help! It is very much appreciated.

Regards,
David 

 On Tuesday, December 16, 2014 4:38 PM, Chris Hostetter 
 wrote:
   

 
sounds like this bug...

https://issues.apache.org/jira/browse/SOLR-6154

...in which case it has nothing to do with your use of multiple 
collections; it's just dependent on whether or not the first node to 
respond happens to have a doc in every "range bucket" .. any bucket 
missing (because of your mincount=1) from the first core to 
respond is then ignored in the responses from the subsequent cores.

The workaround is to set mincount=0 for your facet ranges.



: Date: Tue, 16 Dec 2014 17:17:05 + (UTC)
: From: David Smith 
: Reply-To: solr-user@lucene.apache.org, David Smith 
: To: Solr-user 
: Subject: Identical query returning different aggregate results
: 
: I have a prototype SolrCloud 4.10.2 setup with 13 collections (of 1 replica, 
1 shard each) and a separate 1-node Zookeeper 3.4.6.  
: The very first app test case I wrote is failing intermittently in this 
environment, when I only have 4 documents ingested into the cloud.
: I dug in and found when I query against multiple collections, using the 
"collection=" parameter, the aggregates I request are correct about 50% of the 
time.  The other 50% of the time, the aggregate returned by Solr is not 
correct. Note this is for the identical query.  In other words, I can run the 
same query multiple times in a row, and get different answers.
: 
: The simplest version of the query that still exhibits the odd behavior is as 
follows:
: 
http://192.168.59.103:8985/solr/query_handler/query?facet.range=eventDate&f.eventDate.facet.range.end=2014-12-31T23:59:59.999Z&f.eventDate.facet.range.gap=%2B1DAY&fl=eventDate,id&start=0&collection=2014_04,2014_03&rows=10&f.eventDate.facet.range.start=2014-01-01T00:00:00.000Z&q=*:*&f.eventDate.facet.mincount=1&facet=true
: 
: When it SUCCEEDS, the aggregate correctly appears like this:
: 
:   "facet_counts":{    "facet_queries":{},    "facet_fields":{},    
"facet_dates":{},    "facet_ranges":{      "eventDate":{        "counts":[      
    "2014-04-01T00:00:00Z",3],        "gap":"+1DAY",        
"start":"2014-01-01T00:00:00Z",        "end":"2015-01-01T00:00:00Z"}},    
"facet_intervals":{}}}
: 
: When it FAILS, note that the counts[] array is empty:
:   "facet_counts":{    "facet_queries":{},    "facet_fields":{},    
"facet_dates":{},    "facet_ranges":{      "eventDate":{        "counts":[],    
    "gap":"+1DAY",        "start":"2014-01-01T00:00:00Z",        
"end":"2015-01-01T00:00:00Z"}},    "facet_intervals":{}}}
: 
: If I further simplify the query, by removing range options or reducing to one 
(1) collection name, then the problem goes away.
: 
: The solr logs are clean at INFO level, and there is no substantive difference 
in log output when the query succeeds vs fails, leaving me stumped where to 
look next.  Suggestions welcome.
: Regards,
: David
: 
: 
: 
: 
: 

-Hoss
http://www.lucidworks.com/

   

Solr Node Resource allocation.

2014-12-16 Thread Elan Palani

Hello.. 

Let’s assume I have a SolrCloud cluster with 10 nodes. Most of the time I want 
a new collection created with 2 shards and 2 replicas, on 4 different nodes. 
Using CloudSolrServer we could pick the nodes randomly, but is 
there a way to pick and choose the set of nodes based on 
their resource usage/allocation (CPU, memory, disk, collection size, # of docs, 
etc.)? Any suggestions or tools available?

Basically I am trying to identify the least used nodes for creating cores with 
some rules.

Thanks in Advance.

Elan

Re: Identical query returning different aggregate results

2014-12-16 Thread Chris Hostetter

sounds like this bug...

https://issues.apache.org/jira/browse/SOLR-6154

...in which case it has nothing to do with your use of multiple 
collections; it's just dependent on whether or not the first node to 
respond happens to have a doc in every "range bucket" .. any bucket 
missing (because of your mincount=1) from the first core to 
respond is then ignored in the responses from the subsequent cores.

The workaround is to set mincount=0 for your facet ranges.
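Roughly, the bad merge behaves like this toy sketch (an illustration of the symptom, not Solr's actual code):

```java
import java.util.*;

public class FacetMergeSketch {
    // The first shard response to arrive defines the set of range buckets;
    // buckets it omitted (pruned by facet.mincount=1) are ignored in every
    // subsequent shard response.
    static Map<String, Integer> buggyMerge(List<Map<String, Integer>> shardResponses) {
        Map<String, Integer> merged = new LinkedHashMap<>(shardResponses.get(0));
        for (Map<String, Integer> later : shardResponses.subList(1, shardResponses.size())) {
            for (Map.Entry<String, Integer> bucket : later.entrySet()) {
                // Only buckets already seen in the first response survive.
                merged.computeIfPresent(bucket.getKey(), (k, v) -> v + bucket.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Integer> emptyCollection = new LinkedHashMap<>(); // no docs
        Map<String, Integer> fullCollection = new LinkedHashMap<>();
        fullCollection.put("2014-04-01T00:00:00Z", 3);

        // Whichever core answers first decides the outcome:
        System.out.println(buggyMerge(Arrays.asList(fullCollection, emptyCollection))); // {2014-04-01T00:00:00Z=3}
        System.out.println(buggyMerge(Arrays.asList(emptyCollection, fullCollection))); // {}
    }
}
```

With mincount=0 every core reports the full bucket list, so the first response always contains every bucket and the merge no longer depends on response order.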



: Date: Tue, 16 Dec 2014 17:17:05 + (UTC)
: From: David Smith 
: Reply-To: solr-user@lucene.apache.org, David Smith 
: To: Solr-user 
: Subject: Identical query returning different aggregate results
: 
: I have a prototype SolrCloud 4.10.2 setup with 13 collections (of 1 replica, 
1 shard each) and a separate 1-node Zookeeper 3.4.6.  
: The very first app test case I wrote is failing intermittently in this 
environment, when I only have 4 documents ingested into the cloud.
: I dug in and found when I query against multiple collections, using the 
"collection=" parameter, the aggregates I request are correct about 50% of the 
time.  The other 50% of the time, the aggregate returned by Solr is not 
correct. Note this is for the identical query.  In other words, I can run the 
same query multiple times in a row, and get different answers.
: 
: The simplest version of the query that still exhibits the odd behavior is as 
follows:
: 
http://192.168.59.103:8985/solr/query_handler/query?facet.range=eventDate&f.eventDate.facet.range.end=2014-12-31T23:59:59.999Z&f.eventDate.facet.range.gap=%2B1DAY&fl=eventDate,id&start=0&collection=2014_04,2014_03&rows=10&f.eventDate.facet.range.start=2014-01-01T00:00:00.000Z&q=*:*&f.eventDate.facet.mincount=1&facet=true
: 
: When it SUCCEEDS, the aggregate correctly appears like this:
: 
:   "facet_counts":{    "facet_queries":{},    "facet_fields":{},    
"facet_dates":{},    "facet_ranges":{      "eventDate":{        "counts":[      
    "2014-04-01T00:00:00Z",3],        "gap":"+1DAY",        
"start":"2014-01-01T00:00:00Z",        "end":"2015-01-01T00:00:00Z"}},    
"facet_intervals":{}}}
: 
: When it FAILS, note that the counts[] array is empty:
:   "facet_counts":{    "facet_queries":{},    "facet_fields":{},    
"facet_dates":{},    "facet_ranges":{      "eventDate":{        "counts":[],    
    "gap":"+1DAY",        "start":"2014-01-01T00:00:00Z",        
"end":"2015-01-01T00:00:00Z"}},    "facet_intervals":{}}}
: 
: If I further simplify the query, by removing range options or reducing to one 
(1) collection name, then the problem goes away.
: 
: The solr logs are clean at INFO level, and there is no substantive difference 
in log output when the query succeeds vs fails, leaving me stumped where to 
look next.  Suggestions welcome.
: Regards,
: David
: 
: 
: 
: 
: 

-Hoss
http://www.lucidworks.com/

splitshard the collection time out:900s

2014-12-16 Thread Randy Castro
Hello,
I'm experiencing the exact same issue.  Unfortunately I'm using Solr 4.7 so the 
async call is not available to me.  The only thing I could find in the log is 
the following entry:

solr.log.1:INFO  - 2014-12-16 21:49:02.783; 
org.apache.solr.handler.admin.CollectionsHandler; Splitting shard : 
shard=shardset2_1_1_0_0_0&action=SPLITSHARD&collection=records

But there's no indication in the clusterstate that it is progressing (I don't 
see any child shards with the status "constructing"), and eventually it times 
out with the error below. Anything else I can try or check?



status: 500
QTime: 900011


splitshard the collection time out:900s

org.apache.solr.common.SolrException: splitshard the collection time out:900s
	at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:252)
	at org.apache.solr.handler.admin.CollectionsHandler.handleSplitShardAction(CollectionsHandler.java:484)
	at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:165)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:732)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:277)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
	at org.eclipse.jetty.server.Server.handle(Server.java:368)
	at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
	at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
	at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
	at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
	at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
	at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Thread.java:744)




Randy Castro | System Administrator | PeopleFinders.com | office (916) 266.0594 
| ra...@peoplefinders.com



Re: OutOfMemoryError

2014-12-16 Thread Trilok Prithvi
Shawn, looks like the JVM bump did the trick. Thanks!

On Tue, Dec 16, 2014 at 10:39 AM, Trilok Prithvi 
wrote:
>
> Thanks Shawn. We will increase the JVM to 4GB and see how it performs.
>
> Alexandre,
> Our queries are simple (with a strdist() function in almost all of the
> queries). No facets or sorts.
> But we do a lot of data loads. We index data a lot (several documents,
> ranging from 10 - 10 documents) and we upload data throughout the day.
> Basically, we are heavy on indexing and querying (simple queries) at the
> same time.
>
>
>
> On Tue, Dec 16, 2014 at 10:17 AM, Alexandre Rafalovitch <
> arafa...@gmail.com> wrote:
>>
>> What's your queries look like? Especially FQs, facets, sort, etc. All
>> of those things require caches of various sorts.
>>
>> Regards,
>>Alex.
>> Personal: http://www.outerthoughts.com/ and @arafalov
>> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
>> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>>
>>
>> On 16 December 2014 at 11:55, Trilok Prithvi 
>> wrote:
>> > We are getting OOME pretty often (every hour or so). We are restarting
>> > nodes to keep up with it.
>> >
>> > Here is our setup:
>> > SolrCloud 4.10.2 (2 shards, 2 replicas) with 3 zookeepers.
>> >
>> > Each node has:
>> > 16GB RAM
>> > 2GB JVM (Xmx 2048, Xms 1024)
>> > ~100 Million documents (split among 2 shards - ~50M on each shard)
>> > Solr Core is about ~16GB of data on each node.
>> >
>> > *Physical Memory is almost always 99% full.*
>> >
>> >
>> > The commit setup is as follows:
>> >
>> >   > > "dir">${solr.ulog.dir:}  
>> 30> > maxTime> 10 false > > autoCommit>  5000 
>> > > updateHandler>
>> > Rest of the solrconfig.xml setup is all default.
>> >
>> > Some of the errors that we see on Solr ADMIN Logging is as follows:
>> >
>> > java.lang.OutOfMemoryError: Java heap space
>> >
>> > org.apache.solr.common.SolrException: no servers hosting shard:
>> >
>> > org.apache.http.TruncatedChunkException: Truncated chunk ( expected
>> > size: 8192; actual size: 7222)
>> >
>> >
>> > Please let me know if you need anymore information.
>> >
>> >
>> > Thanks!
>>
>


Re: Identical query returning different aggregate results

2014-12-16 Thread Erick Erickson
Ah, OK. I didn't get that when I read your first e-mail...

Hmmm, this is still a puzzle then. Tail the respective Solr logs; you _should_
see the sub-query go to each of them, and the sub-query _should_
carry along all of the faceting information. Or this might just be a flat-out bug...

Best,
Erick

On Tue, Dec 16, 2014 at 2:46 PM, David Smith
 wrote:
> Hi Erick,
> Thanks for your reply.
> My test environment only has one shard and one replica per collection.  So, I 
> think there is no possibility of replicas getting out of sync.  Here is how I 
> create each (month-based) collection:
> http://192.168.59.103:8983/solr/admin/collections?action=CREATE&name=2014_01&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=main_conf
> http://192.168.59.103:8983/solr/admin/collections?action=CREATE&name=2014_02&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=main_conf
> http://192.168.59.103:8983/solr/admin/collections?action=CREATE&name=2014_03&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=main_conf
> ...etc, etc...
>
> Still, I think you are on to something.  I had already noticed that querying 
> one collection at a time works.  For example, if I change my query 
> oh-so-slightly from this:
>
> "collection=2014_04,2014_03"
>
> to this
>
> "...collection=2014_04"
>
> Then, the results are correct 100% of the time. I think substantively this is 
> the same as specifying the name of the shard since, again, in my test 
> environment I only have one shard per collection anyway.
> I should mention that the "2014_03" collection is empty.  0 documents.  All 3 
> documents which satisfy the facet range are in the "2014_04" collection.  So, 
> it's a real head-scratcher that introducing that collection name into the 
> query makes the results misbehave.
> Kind regards,
> David
>  On Tuesday, December 16, 2014 2:25 PM, Erick Erickson 
>  wrote:
>
>
>  bq: Facet counts include deleted documents until the segments merge
>
> Whoa! Facet counts do _not_ require segment merging to be accurate.
> What merging does is remove the _term_ information associated with
> deleted documents, and removes their contribution to the TF/IDF
> scores.
>
> David:
> Hmmm, what happens if you direct the query not only to a single
> collection, but to a single shard? Add &distrib=false to the query and
> point it to each of your replicas. (one collection at a time). The
> expectation is that each replica for a slice within a collection has
> identical documents.
>
> One possibility is that somehow your shards are out of sync on a
> collection. So the internal load balancing that happens sometimes
> sends the query to one replica and sometime to another. 2 replicas
> (leader and follower) and 50% failure, coincidence?
>
> That just bumps the question up another level of course, the next
> question is _why_ is the shard out of sync. So in that case I'd issue
> a commit to all the collections on the off chance that somehow that
> didn't happen and try again (very low probability that this is the
> root cause, but you never know).
>
> but it sure sounds like one replica doesn't agree with another, so the
> above will give us place to look.
>
> Best,
> Erick
>
>
>
> On Tue, Dec 16, 2014 at 12:12 PM, David Smith
>  wrote:
>> Alex,
>> Good suggestion, but in this case, no.  This example is from a cleanroom 
>> type test environment where the collections have very recently been created, 
>> there are only 4 documents total across all collections, and no deletes
>> have been issued.
>> Kind regards,
>> David
>>
>>
>>  On Tuesday, December 16, 2014 12:01 PM, Alexandre Rafalovitch 
>>  wrote:
>>
>>
>>  Facet counts include deleted documents until the segments merge. Could that
>> be an issue?
>>
>> Regards,
>>Alex
>> On 16/12/2014 12:18 pm, "David Smith"  wrote:
>>
>>> I have a prototype SolrCloud 4.10.2 setup with 13 collections (of 1
>>> replica, 1 shard each) and a separate 1-node Zookeeper 3.4.6.
>>> The very first app test case I wrote is failing intermittently in this
>>> environment, when I only have 4 documents ingested into the cloud.
>>> I dug in and found when I query against multiple collections, using the
>>> "collection=" parameter, the aggregates I request are correct about 50% of
>>> the time.  The other 50% of the time, the aggregate returned by Solr is not
>>> correct. Note this is for the identical query.  In other words, I can run
>>> the same query multiple times in a row, and get different answers.
>>>
>>> The simplest version of the query that still exhibits the odd behavior is
>>> as follows:
>>>
>>> http://192.168.59.103:8985/solr/query_handler/query?facet.range=eventDate&f.eventDate.facet.range.end=2014-12-31T23:59:59.999Z&f.eventDate.facet.range.gap=%2B1DAY&fl=eventDate,id&start=0&collection=2014_04,2014_03&rows=10&f.eventDate.facet.range.start=2014-01-01T00:00:00.000Z&q=*:*&f.eventDate.facet.mincount=1&facet=true
>>>
>>> When it SUCCEEDS, t

Re: Identical query returning different aggregate results

2014-12-16 Thread David Smith
Hi Erick,
Thanks for your reply.
My test environment only has one shard and one replica per collection.  So, I 
think there is no possibility of replicas getting out of sync.  Here is how I 
create each (month-based) collection:
http://192.168.59.103:8983/solr/admin/collections?action=CREATE&name=2014_01&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=main_conf
http://192.168.59.103:8983/solr/admin/collections?action=CREATE&name=2014_02&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=main_conf
http://192.168.59.103:8983/solr/admin/collections?action=CREATE&name=2014_03&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=main_conf
...etc, etc...

Still, I think you are on to something.  I had already noticed that querying 
one collection at a time works.  For example, if I change my query 
oh-so-slightly from this:

"collection=2014_04,2014_03"

to this

"...collection=2014_04"

Then, the results are correct 100% of the time. I think substantively this is 
the same as specifying the name of the shard since, again, in my test 
environment I only have one shard per collection anyway.
I should mention that the "2014_03" collection is empty.  0 documents.  All 3 
documents which satisfy the facet range are in the "2014_04" collection.  So, 
it's a real head-scratcher that introducing that collection name into the query 
makes the results misbehave.
Kind regards,
David
 On Tuesday, December 16, 2014 2:25 PM, Erick Erickson 
 wrote:
   

 bq: Facet counts include deleted documents until the segments merge

Whoa! Facet counts do _not_ require segment merging to be accurate.
What merging does is remove the _term_ information associated with
deleted documents, and removes their contribution to the TF/IDF
scores.

David:
Hmmm, what happens if you direct the query not only to a single
collection, but to a single shard? Add &distrib=false to the query and
point it to each of your replicas. (one collection at a time). The
expectation is that each replica for a slice within a collection has
identical documents.

One possibility is that somehow your shards are out of sync on a
collection. So the internal load balancing that happens sometimes
sends the query to one replica and sometime to another. 2 replicas
(leader and follower) and 50% failure, coincidence?

That just bumps the question up another level of course, the next
question is _why_ is the shard out of sync. So in that case I'd issue
a commit to all the collections on the off chance that somehow that
didn't happen and try again (very low probability that this is the
root cause, but you never know).

but it sure sounds like one replica doesn't agree with another, so the
above will give us place to look.

Best,
Erick



On Tue, Dec 16, 2014 at 12:12 PM, David Smith
 wrote:
> Alex,
> Good suggestion, but in this case, no.  This example is from a cleanroom type 
> test environment where the collections have very recently been created, there 
> are only 4 documents total across all collections, and no deletes have been 
> issued.
> Kind regards,
> David
>
>
>      On Tuesday, December 16, 2014 12:01 PM, Alexandre Rafalovitch 
> wrote:
>
>
>  Facet counts include deleted documents until the segments merge. Could that
> be an issue?
>
> Regards,
>    Alex
> On 16/12/2014 12:18 pm, "David Smith"  wrote:
>
>> I have a prototype SolrCloud 4.10.2 setup with 13 collections (of 1
>> replica, 1 shard each) and a separate 1-node Zookeeper 3.4.6.
>> The very first app test case I wrote is failing intermittently in this
>> environment, when I only have 4 documents ingested into the cloud.
>> I dug in and found when I query against multiple collections, using the
>> "collection=" parameter, the aggregates I request are correct about 50% of
>> the time.  The other 50% of the time, the aggregate returned by Solr is not
>> correct. Note this is for the identical query.  In other words, I can run
>> the same query multiple times in a row, and get different answers.
>>
>> The simplest version of the query that still exhibits the odd behavior is
>> as follows:
>>
>> http://192.168.59.103:8985/solr/query_handler/query?facet.range=eventDate&f.eventDate.facet.range.end=2014-12-31T23:59:59.999Z&f.eventDate.facet.range.gap=%2B1DAY&fl=eventDate,id&start=0&collection=2014_04,2014_03&rows=10&f.eventDate.facet.range.start=2014-01-01T00:00:00.000Z&q=*:*&f.eventDate.facet.mincount=1&facet=true
>>
>> When it SUCCEEDS, the aggregate correctly appears like this:
>>
>>  "facet_counts":{    "facet_queries":{},    "facet_fields":{},
>> "facet_dates":{},    "facet_ranges":{      "eventDate":{        "counts":[
>>        "2014-04-01T00:00:00Z",3],        "gap":"+1DAY",
>> "start":"2014-01-01T00:00:00Z",        "end":"2015-01-01T00:00:00Z"}},
>> "facet_intervals":{}}}
>>
>> When it FAILS, note that the counts[] array is empty:
>>  "facet_counts":{    "facet_queries":{},    "facet_fields":{},
>> "facet_dates":{},    "facet_ranges":{      "eventDate":{
>> "counts":[],        "gap":"+1DAY",        "start":"2014-01-01T00:00:00Z",
>>      "end":"2015-01-01T00:00:00Z"}},    "facet_intervals":{}}}

Re: SolrCloud Collection creation timeout

2014-12-16 Thread E S J
Hi Shanaka,

Try out this,

http://<HOST>:<PORT>/solr/admin/collections?action=CREATE&name=<COLLECTION NAME>&replicationFactor=<# OF REPLICATION>&numShards=<# OF SHARDS>&collection.configName=<CONFIG NAME>&maxShardsPerNode=<# OF MAX SHARDS>&wt=json&indent=2

ex :
http://solr1.internal:7070/solr/admin/collections?action=CREATE&name=c-ins&replicationFactor=4&numShards=4&collection.configName=default&maxShardsPerNode=4&wt=json&indent=2
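
For scripting this, the CREATE call can be assembled from its parameters. A minimal Python sketch (the host, port, and parameter values mirror the example above and are placeholders, not a fixed client API):

```python
from urllib.parse import urlencode

def create_collection_url(host, port, name, config_name,
                          num_shards, replication_factor,
                          max_shards_per_node):
    # Collections API CREATE call, Solr 4.x style.
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "collection.configName": config_name,
        "maxShardsPerNode": max_shards_per_node,
        "wt": "json",
        "indent": 2,
    }
    return "http://%s:%d/solr/admin/collections?%s" % (host, port, urlencode(params))

# Mirrors the example URL above (hostname from that example; adjust to yours):
url = create_collection_url("solr1.internal", 7070, "c-ins", "default", 4, 4, 4)
```

Building the query string with urlencode avoids the escaping mistakes that are easy to make when pasting these long URLs by hand.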


If you keep getting the Solr timeout, try creating the collection from the
Solr Admin UI and see whether you can achieve the same. If that's also
failing, connect to the ZK server and check that it's connected to Solr
correctly. Have you registered the Solr configurations with ZK (using zkcli.sh)?

Thanks,
Shanaka J


On 16 December 2014 at 16:44, Shanaka Munasinghe 
wrote:
>
> Hi All
>
> Im using solrcloud 4.10.2 with zookeepr 3.4.6. there are 3 zk servers and
> 4 solr instances. deployed on weblogic 12c server. zookeeper and solr
> servers are running without any issue. all 3 zk servers are separate
> physical servers. i have tested this configuration in locally using virtual
> servers and its working fine. but in the production servers everytime im
> getting below timeout exception when issue any of the collection API
> commands. please help
>
> command
>
> http://10.52.133.59:61011/solr/admin/collections?action=CREATE&name=en-collection&numShards=1&replicationFactor=4&collection.configName=conf_en
>
>
>
>
> Exception
>
> null:org.apache.solr.common.SolrException: createcollection the collection
> time out:180s
>
>at
> org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:368)
>
>at
> org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:320)
>
>at
> org.apache.solr.handler.admin.CollectionsHandler.handleCreateAction(CollectionsHandler.java:486)
>
>at
> org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:148)
>
>at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>
>at
> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:729)
>
>at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:267)
>
>at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
>
>at
> weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:79)
>
>at
> weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.wrapRun(WebAppServletContext.java:3367)
>
>at
> weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:)
>
>at
> weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
>
>at
> weblogic.security.service.SecurityManager.runAs(SecurityManager.java:120)
>
>at
> weblogic.servlet.provider.WlsSubjectHandle.run(WlsSubjectHandle.java:57)
>
>at
> weblogic.servlet.internal.WebAppServletContext.doSecuredExecute(WebAppServletContext.java:2220)
>
>at
> weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2146)
>
>at
> weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2124)
>
>at
> weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1564)
>
>at
> weblogic.servlet.provider.ContainerSupportProviderImpl$WlsRequestExecutor.run(ContainerSupportProviderImpl.java:254)
>
>at weblogic.work.ExecuteThread.execute(ExecuteThread.java:295)
>
>at weblogic.work.ExecuteThread.run(ExecuteThread.java:254)
>
>
> Thanks & Best Regards.
> Shanaka Munasinghe
>
>
>
> --
> Virtusa was recently featured in Everest Group's PEAK Matrix for Banking
> Application Outsourcing,
> Life Sciences IT Outsourcing and Healthcare Payer Industry IT
> Outsourcing,Forrester Research's
> report on major mid-sized offshore IT services vendors, 2013 Forbes List
> of 100 Best
> Public Companies In America with revenue less than $1B and won the 2013
> Frost & Sullivan
> Customer Value Leadership Award for System Integration for CEM in
> Healthcare.
>
>
> --
> This message, including any attachments, contains confidential information
> intended for a specific individual and purpose, and is intended for the
> addressee
> only. Any unauthorized disclosure, use, dissemination, copying, or
> distribution of
> this message or any of its attachments or the information contained in
> this e-mail,
> or the taking of any action based on it, is strictly prohibited. If you
> are not the
> inte

Re: Identical query returning different aggregate results

2014-12-16 Thread Erick Erickson
bq: Facet counts include deleted documents until the segments merge

Whoa! Facet counts do _not_ require segment merging to be accurate.
What merging does is remove the _term_ information associated with
deleted documents, and removes their contribution to the TF/IDF
scores.

David:
Hmmm, what happens if you direct the query not only to a single
collection, but to a single shard? Add &distrib=false to the query and
point it to each of your replicas. (one collection at a time). The
expectation is that each replica for a slice within a collection has
identical documents.

One possibility is that somehow your shards are out of sync on a
collection. So the internal load balancing that happens sometimes
sends the query to one replica and sometime to another. 2 replicas
(leader and follower) and 50% failure, coincidence?

That just bumps the question up another level of course, the next
question is _why_ is the shard out of sync. So in that case I'd issue
a commit to all the collections on the off chance that somehow that
didn't happen and try again (very low probability that this is the
root cause, but you never know).

but it sure sounds like one replica doesn't agree with another, so the
above will give us a place to look.

Best,
Erick



On Tue, Dec 16, 2014 at 12:12 PM, David Smith
 wrote:
> Alex,
> Good suggestion, but in this case, no.  This example is from a cleanroom-type
> test environment where the collections were very recently created; there
> are only 4 documents total across all collections, and no deletes have been
> issued.
> Kind regards,
> David
>
>
>  On Tuesday, December 16, 2014 12:01 PM, Alexandre Rafalovitch 
>  wrote:
>
>
>  Facet counts include deleted documents until the segments merge. Could that
> be an issue?
>
> Regards,
> Alex
> On 16/12/2014 12:18 pm, "David Smith"  wrote:
>
>> I have a prototype SolrCloud 4.10.2 setup with 13 collections (of 1
>> replica, 1 shard each) and a separate 1-node Zookeeper 3.4.6.
>> The very first app test case I wrote is failing intermittently in this
>> environment, when I only have 4 documents ingested into the cloud.
>> I dug in and found when I query against multiple collections, using the
>> "collection=" parameter, the aggregates I request are correct about 50% of
>> the time.  The other 50% of the time, the aggregate returned by Solr is not
>> correct. Note this is for the identical query.  In other words, I can run
>> the same query multiple times in a row, and get different answers.
>>
>> The simplest version of the query that still exhibits the odd behavior is
>> as follows:
>>
>> http://192.168.59.103:8985/solr/query_handler/query?facet.range=eventDate&f.eventDate.facet.range.end=2014-12-31T23:59:59.999Z&f.eventDate.facet.range.gap=%2B1DAY&fl=eventDate,id&start=0&collection=2014_04,2014_03&rows=10&f.eventDate.facet.range.start=2014-01-01T00:00:00.000Z&q=*:*&f.eventDate.facet.mincount=1&facet=true
>>
>> When it SUCCEEDS, the aggregate correctly appears like this:
>>
>>  "facet_counts":{"facet_queries":{},"facet_fields":{},
>> "facet_dates":{},"facet_ranges":{  "eventDate":{"counts":[
>>"2014-04-01T00:00:00Z",3],"gap":"+1DAY",
>> "start":"2014-01-01T00:00:00Z","end":"2015-01-01T00:00:00Z"}},
>> "facet_intervals":{}}}
>>
>> When it FAILS, note that the counts[] array is empty:
>>  "facet_counts":{"facet_queries":{},"facet_fields":{},
>> "facet_dates":{},"facet_ranges":{  "eventDate":{
>> "counts":[],"gap":"+1DAY","start":"2014-01-01T00:00:00Z",
>>  "end":"2015-01-01T00:00:00Z"}},"facet_intervals":{}}}
>>
>> If I further simplify the query, by removing range options or reducing to
>> one (1) collection name, then the problem goes away.
>>
>> The solr logs are clean at INFO level, and there is no substantive
>> difference in log output when the query succeeds vs fails, leaving me
>> stumped where to look next.  Suggestions welcome.
>> Regards,
>> David
>>
>>
>>
>>
>>
>
>


Re: splitshard the collection time out:900s

2014-12-16 Thread Anshum Gupta
As Joseph mentioned, the shard split is still running in the background.

In case it fails (it shouldn't), if you're running Solr 4.8 or newer, I
would recommend using async calls for long-running Collections API
calls, as they have an accompanying REQUESTSTATUS API call that gets you a
confirmation of task completion/failure.
e.g.
http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1&async=1000

REQUESTSTATUS Call:
http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=1000
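
A sketch of that submit-then-poll pattern in Python. The base URL is a placeholder, and the JSON shape read from the REQUESTSTATUS response (a status/state pair) is an assumption about the usual response format, so treat this as illustrative rather than exact:

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "http://localhost:8983/solr/admin/collections"  # placeholder node

def splitshard_async_url(collection, shard, request_id):
    # Submit the long-running SPLITSHARD as an async task.
    return BASE + "?" + urlencode({"action": "SPLITSHARD",
                                   "collection": collection,
                                   "shard": shard,
                                   "async": request_id,
                                   "wt": "json"})

def requeststatus_url(request_id):
    # Check on the async task by its request id.
    return BASE + "?" + urlencode({"action": "REQUESTSTATUS",
                                   "requestid": request_id,
                                   "wt": "json"})

def wait_for_completion(request_id, poll_seconds=30):
    # Poll REQUESTSTATUS until the task reports a terminal state.
    while True:
        with urlopen(requeststatus_url(request_id)) as resp:
            state = json.load(resp).get("status", {}).get("state")
        if state in ("completed", "failed"):
            return state
        time.sleep(poll_seconds)
```

The point of the async id is exactly this: the HTTP call returns immediately, so no REST timeout fires while the split itself keeps running.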



On Tue, Dec 16, 2014 at 10:37 AM, Joseph Obernberger <
joseph.obernber...@gmail.com> wrote:
>
> Shard splits can take a long time - the 900 seconds is just the REST
> timeout.  The split is still taking place.
>
> On Tue, Dec 16, 2014 at 12:43 PM, Trilok Prithvi  >
> wrote:
> >
> > Sorry... I sent without explaining the situation.
> >
> > We did splitshard:
> >
> >
> solr/admin/collections?action=SPLITSHARD&collection=anotherCollection&shard=shard1
> > and we got the above error.
> >
> > Any idea?
> >
> >
> >
> > On Tue, Dec 16, 2014 at 10:41 AM, Trilok Prithvi <
> trilok.prit...@gmail.com
> > >
> > wrote:
> > >
> > > 
> > > 
> > > 500
> > > 900395
> > > 
> > > 
> > > splitshard the collection time out:900s
> > > 
> > > org.apache.solr.common.SolrException: splitshard the collection time
> > > out:900s at
> > >
> >
> org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:368)
> > > at
> > >
> >
> org.apache.solr.handler.admin.CollectionsHandler.handleSplitShardAction(CollectionsHandler.java:606)
> > > at
> > >
> >
> org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:172)
> > > at
> > >
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > > at
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:729)
> > > at
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:267)
> > > at
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
> > > at
> > >
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
> > > at
> > >
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> > > at
> > >
> >
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
> > > at
> > >
> >
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
> > > at
> > >
> >
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
> > > at
> > >
> >
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
> > > at
> > >
> org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
> > > at
> > >
> >
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
> > > at
> > >
> >
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
> > > at
> > >
> >
> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040)
> > > at
> > >
> >
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
> > > at
> > >
> >
> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:313)
> > > at
> > >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > > at
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > at java.lang.Thread.run(Thread.java:745)
> > > 
> > > 500
> > > 
> > > 
> > >
> >
>


-- 
Anshum Gupta
http://about.me/anshumgupta


Re: splitshard the collection time out:900s

2014-12-16 Thread Joseph Obernberger
Shard splits can take a long time - the 900 seconds is just the REST
timeout.  The split is still taking place.

On Tue, Dec 16, 2014 at 12:43 PM, Trilok Prithvi 
wrote:
>
> Sorry... I sent without explaining the situation.
>
> We did splitshard:
>
> solr/admin/collections?action=SPLITSHARD&collection=anotherCollection&shard=shard1
> and we got the above error.
>
> Any idea?
>
>
>
> On Tue, Dec 16, 2014 at 10:41 AM, Trilok Prithvi  >
> wrote:
> >
> > 
> > 
> > 500
> > 900395
> > 
> > 
> > splitshard the collection time out:900s
> > 
> > org.apache.solr.common.SolrException: splitshard the collection time
> > out:900s at
> >
> org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:368)
> > at
> >
> org.apache.solr.handler.admin.CollectionsHandler.handleSplitShardAction(CollectionsHandler.java:606)
> > at
> >
> org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:172)
> > at
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > at
> >
> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:729)
> > at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:267)
> > at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
> > at
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
> > at
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> > at
> >
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
> > at
> >
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
> > at
> >
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
> > at
> >
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
> > at
> > org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
> > at
> >
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
> > at
> >
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
> > at
> >
> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040)
> > at
> >
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
> > at
> >
> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:313)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at java.lang.Thread.run(Thread.java:745)
> > 
> > 500
> > 
> > 
> >
>


Re: Identical query returning different aggregate results

2014-12-16 Thread David Smith
Alex,
Good suggestion, but in this case, no.  This example is from a cleanroom-type
test environment where the collections were very recently created; there
are only 4 documents total across all collections, and no deletes have been
issued.
Kind regards,
David
 

 On Tuesday, December 16, 2014 12:01 PM, Alexandre Rafalovitch 
 wrote:
   

 Facet counts include deleted documents until the segments merge. Could that
be an issue?

Regards,
    Alex
On 16/12/2014 12:18 pm, "David Smith"  wrote:

> I have a prototype SolrCloud 4.10.2 setup with 13 collections (of 1
> replica, 1 shard each) and a separate 1-node Zookeeper 3.4.6.
> The very first app test case I wrote is failing intermittently in this
> environment, when I only have 4 documents ingested into the cloud.
> I dug in and found when I query against multiple collections, using the
> "collection=" parameter, the aggregates I request are correct about 50% of
> the time.  The other 50% of the time, the aggregate returned by Solr is not
> correct. Note this is for the identical query.  In other words, I can run
> the same query multiple times in a row, and get different answers.
>
> The simplest version of the query that still exhibits the odd behavior is
> as follows:
>
> http://192.168.59.103:8985/solr/query_handler/query?facet.range=eventDate&f.eventDate.facet.range.end=2014-12-31T23:59:59.999Z&f.eventDate.facet.range.gap=%2B1DAY&fl=eventDate,id&start=0&collection=2014_04,2014_03&rows=10&f.eventDate.facet.range.start=2014-01-01T00:00:00.000Z&q=*:*&f.eventDate.facet.mincount=1&facet=true
>
> When it SUCCEEDS, the aggregate correctly appears like this:
>
>  "facet_counts":{    "facet_queries":{},    "facet_fields":{},
> "facet_dates":{},    "facet_ranges":{      "eventDate":{        "counts":[
>        "2014-04-01T00:00:00Z",3],        "gap":"+1DAY",
> "start":"2014-01-01T00:00:00Z",        "end":"2015-01-01T00:00:00Z"}},
> "facet_intervals":{}}}
>
> When it FAILS, note that the counts[] array is empty:
>  "facet_counts":{    "facet_queries":{},    "facet_fields":{},
> "facet_dates":{},    "facet_ranges":{      "eventDate":{
> "counts":[],        "gap":"+1DAY",        "start":"2014-01-01T00:00:00Z",
>      "end":"2015-01-01T00:00:00Z"}},    "facet_intervals":{}}}
>
> If I further simplify the query, by removing range options or reducing to
> one (1) collection name, then the problem goes away.
>
> The solr logs are clean at INFO level, and there is no substantive
> difference in log output when the query succeeds vs fails, leaving me
> stumped where to look next.  Suggestions welcome.
> Regards,
> David
>
>
>
>
>

   

Re: Identical query returning different aggregate results

2014-12-16 Thread Alexandre Rafalovitch
Facet counts include deleted documents until the segments merge. Could that
be an issue?

Regards,
 Alex
On 16/12/2014 12:18 pm, "David Smith"  wrote:

> I have a prototype SolrCloud 4.10.2 setup with 13 collections (of 1
> replica, 1 shard each) and a separate 1-node Zookeeper 3.4.6.
> The very first app test case I wrote is failing intermittently in this
> environment, when I only have 4 documents ingested into the cloud.
> I dug in and found when I query against multiple collections, using the
> "collection=" parameter, the aggregates I request are correct about 50% of
> the time.  The other 50% of the time, the aggregate returned by Solr is not
> correct. Note this is for the identical query.  In other words, I can run
> the same query multiple times in a row, and get different answers.
>
> The simplest version of the query that still exhibits the odd behavior is
> as follows:
>
> http://192.168.59.103:8985/solr/query_handler/query?facet.range=eventDate&f.eventDate.facet.range.end=2014-12-31T23:59:59.999Z&f.eventDate.facet.range.gap=%2B1DAY&fl=eventDate,id&start=0&collection=2014_04,2014_03&rows=10&f.eventDate.facet.range.start=2014-01-01T00:00:00.000Z&q=*:*&f.eventDate.facet.mincount=1&facet=true
>
> When it SUCCEEDS, the aggregate correctly appears like this:
>
>   "facet_counts":{"facet_queries":{},"facet_fields":{},
> "facet_dates":{},"facet_ranges":{  "eventDate":{"counts":[
> "2014-04-01T00:00:00Z",3],"gap":"+1DAY",
> "start":"2014-01-01T00:00:00Z","end":"2015-01-01T00:00:00Z"}},
> "facet_intervals":{}}}
>
> When it FAILS, note that the counts[] array is empty:
>   "facet_counts":{"facet_queries":{},"facet_fields":{},
> "facet_dates":{},"facet_ranges":{  "eventDate":{
> "counts":[],"gap":"+1DAY","start":"2014-01-01T00:00:00Z",
>   "end":"2015-01-01T00:00:00Z"}},"facet_intervals":{}}}
>
> If I further simplify the query, by removing range options or reducing to
> one (1) collection name, then the problem goes away.
>
> The solr logs are clean at INFO level, and there is no substantive
> difference in log output when the query succeeds vs fails, leaving me
> stumped where to look next.  Suggestions welcome.
> Regards,
> David
>
>
>
>
>


Re: splitshard the collection time out:900s

2014-12-16 Thread Trilok Prithvi
Sorry... I sent without explaining the situation.

We did splitshard:
solr/admin/collections?action=SPLITSHARD&collection=anotherCollection&shard=shard1
and we got the above error.

Any idea?



On Tue, Dec 16, 2014 at 10:41 AM, Trilok Prithvi 
wrote:
>
> 
> 
> 500
> 900395
> 
> 
> splitshard the collection time out:900s
> 
> org.apache.solr.common.SolrException: splitshard the collection time
> out:900s at
> org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:368)
> at
> org.apache.solr.handler.admin.CollectionsHandler.handleSplitShardAction(CollectionsHandler.java:606)
> at
> org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:172)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> at
> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:729)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:267)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
> at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
> at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
> at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
> at
> org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
> at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
> at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
> at
> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040)
> at
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
> at
> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:313)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 
> 500
> 
> 
>


splitshard the collection time out:900s

2014-12-16 Thread Trilok Prithvi


<response>
  <lst name="responseHeader">
    <int name="status">500</int>
    <int name="QTime">900395</int>
  </lst>
  <lst name="error">
    <str name="msg">splitshard the collection time out:900s</str>
    <str name="trace">org.apache.solr.common.SolrException: splitshard the collection time out:900s
      at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:368)
      at org.apache.solr.handler.admin.CollectionsHandler.handleSplitShardAction(CollectionsHandler.java:606)
      at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:172)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
      at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:729)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:267)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
      at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
      at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
      at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
      at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
      at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
      at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
      at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
      at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
      at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
      at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040)
      at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
      at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:313)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:745)</str>
    <int name="code">500</int>
  </lst>
</response>




Re: OutOfMemoryError

2014-12-16 Thread Trilok Prithvi
Thanks Shawn. We will increase the JVM to 4GB and see how it performs.

Alexandre,
Our queries are simple (with strdist() function in almost all the queries).
No facets, or sorts.
But we do a lot of data loads. We index data a lot (several documents,
ranging from 10 - 10 documents) and we upload data throughout the day.
Basically, we are heavy on indexing and querying (simple queries) at the
same time.
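
Under that kind of continuous load it usually helps to batch the uploads and lean on commitWithin rather than committing per upload, so the searcher isn't reopened constantly. A rough sketch (the field names, batch size, and commitWithin value are made up for illustration):

```python
import json
from urllib.parse import urlencode

def update_batches(docs, commit_within_ms=10000, batch_size=1000):
    # Split docs into batched /update payloads. commitWithin lets Solr
    # schedule the commit instead of the client forcing one per upload,
    # which keeps searcher churn (and heap pressure) down.
    suffix = "/update?" + urlencode({"commitWithin": commit_within_ms,
                                     "wt": "json"})
    return [(suffix, json.dumps(docs[i:i + batch_size]))
            for i in range(0, len(docs), batch_size)]

docs = [{"id": str(n), "name_s": "doc %d" % n} for n in range(2500)]
batches = update_batches(docs)  # three payloads: 1000, 1000, 500 docs
```

Each (suffix, body) pair would then be POSTed to a collection's /update handler; the point is that only three requests and one scheduled commit cover 2500 documents.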



On Tue, Dec 16, 2014 at 10:17 AM, Alexandre Rafalovitch 
wrote:
>
> What's your queries look like? Especially FQs, facets, sort, etc. All
> of those things require caches of various sorts.
>
> Regards,
>Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>
> On 16 December 2014 at 11:55, Trilok Prithvi 
> wrote:
> > We are getting OOME pretty often (every hour or so). We are restarting
> > nodes to keep up with it.
> >
> > Here is our setup:
> > SolrCloud 4.10.2 (2 shards, 2 replicas) with 3 zookeepers.
> >
> > Each node has:
> > 16GB RAM
> > 2GB JVM (Xmx 2048, Xms 1024)
> > ~100 Million documents (split among 2 shards - ~50M on each shard)
> > Solr Core is about ~16GB of data on each node.
> >
> > *Physical Memory is almost always 99% full.*
> >
> >
> > The commit setup is as follows:
> >
> > <updateHandler>
> >   <updateLog>
> >     <str name="dir">${solr.ulog.dir:}</str>
> >   </updateLog>
> >   <autoCommit>
> >     <maxTime>30</maxTime>
> >     <maxDocs>10</maxDocs>
> >     <openSearcher>false</openSearcher>
> >   </autoCommit>
> >   <autoSoftCommit>
> >     <maxTime>5000</maxTime>
> >   </autoSoftCommit>
> > </updateHandler>
> > Rest of the solrconfig.xml setup is all default.
> >
> > Some of the errors that we see on Solr ADMIN Logging is as follows:
> >
> > java.lang.OutOfMemoryError: Java heap space
> >
> > org.apache.solr.common.SolrException: no servers hosting shard:
> >
> > org.apache.http.TruncatedChunkException: Truncated chunk ( expected
> > size: 8192; actual size: 7222)
> >
> >
> > Please let me know if you need anymore information.
> >
> >
> > Thanks!
>


ANNOUNCE: CFP and Travel Assistance now open for ApacheCon North America 2015

2014-12-16 Thread Chris Hostetter


(NOTE: cross posted to several lucene lists, if you have replies, please 
confine them to general@lucene)


-- Forwarded message --

In case you've missed it:

- ApacheCon North America returns to Austin, Texas, 13-17 April 2015 
http://apachecon.com/

- Call for Papers open until 1 February --submissions and presentation 
guidelines 
http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp

- Become involved with the program selection process --check out 
http://s.apache.org/60N

- Applications accepted for Apache Travel Assistance --deadline is 6 February! 
http://www.apache.org/travel/


We look forward to seeing you in Austin!



Re: OutOfMemoryError

2014-12-16 Thread Alexandre Rafalovitch
What's your queries look like? Especially FQs, facets, sort, etc. All
of those things require caches of various sorts.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 16 December 2014 at 11:55, Trilok Prithvi  wrote:
> We are getting OOME pretty often (every hour or so). We are restarting
> nodes to keep up with it.
>
> Here is our setup:
> SolrCloud 4.10.2 (2 shards, 2 replicas) with 3 zookeepers.
>
> Each node has:
> 16GB RAM
> 2GB JVM (Xmx 2048, Xms 1024)
> ~100 Million documents (split among 2 shards - ~50M on each shard)
> Solr Core is about ~16GB of data on each node.
>
> *Physical Memory is almost always 99% full.*
>
>
> The commit setup is as follows:
>
> <updateHandler>
>   <updateLog>
>     <str name="dir">${solr.ulog.dir:}</str>
>   </updateLog>
>   <autoCommit>
>     <maxTime>30</maxTime>
>     <maxDocs>10</maxDocs>
>     <openSearcher>false</openSearcher>
>   </autoCommit>
>   <autoSoftCommit>
>     <maxTime>5000</maxTime>
>   </autoSoftCommit>
> </updateHandler>
> Rest of the solrconfig.xml setup is all default.
>
> Some of the errors that we see on Solr ADMIN Logging is as follows:
>
> java.lang.OutOfMemoryError: Java heap space
>
> org.apache.solr.common.SolrException: no servers hosting shard:
>
> org.apache.http.TruncatedChunkException: Truncated chunk ( expected
> size: 8192; actual size: 7222)
>
>
> Please let me know if you need anymore information.
>
>
> Thanks!


Identical query returning different aggregate results

2014-12-16 Thread David Smith
I have a prototype SolrCloud 4.10.2 setup with 13 collections (of 1 replica, 1 
shard each) and a separate 1-node Zookeeper 3.4.6.  
The very first app test case I wrote is failing intermittently in this 
environment, when I only have 4 documents ingested into the cloud.
I dug in and found when I query against multiple collections, using the 
"collection=" parameter, the aggregates I request are correct about 50% of the 
time.  The other 50% of the time, the aggregate returned by Solr is not 
correct. Note this is for the identical query.  In other words, I can run the 
same query multiple times in a row, and get different answers.

The simplest version of the query that still exhibits the odd behavior is as 
follows:
http://192.168.59.103:8985/solr/query_handler/query?facet.range=eventDate&f.eventDate.facet.range.end=2014-12-31T23:59:59.999Z&f.eventDate.facet.range.gap=%2B1DAY&fl=eventDate,id&start=0&collection=2014_04,2014_03&rows=10&f.eventDate.facet.range.start=2014-01-01T00:00:00.000Z&q=*:*&f.eventDate.facet.mincount=1&facet=true

When it SUCCEEDS, the aggregate correctly appears like this:

  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{},
    "facet_dates":{},
    "facet_ranges":{
      "eventDate":{
        "counts":[
          "2014-04-01T00:00:00Z",3],
        "gap":"+1DAY",
        "start":"2014-01-01T00:00:00Z",
        "end":"2015-01-01T00:00:00Z"}},
    "facet_intervals":{}}}

When it FAILS, note that the counts[] array is empty:
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{},
    "facet_dates":{},
    "facet_ranges":{
      "eventDate":{
        "counts":[],
        "gap":"+1DAY",
        "start":"2014-01-01T00:00:00Z",
        "end":"2015-01-01T00:00:00Z"}},
    "facet_intervals":{}}}

If I further simplify the query, by removing range options or reducing to one 
(1) collection name, then the problem goes away.

The solr logs are clean at INFO level, and there is no substantive difference 
in log output when the query succeeds vs fails, leaving me stumped where to 
look next.  Suggestions welcome.
Regards,
David
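
For what it's worth, the failing query string can be rebuilt deterministically from its pieces, which makes the intermittent behavior easier to script against. A Python sketch mirroring the parameters in the URL above:

```python
from urllib.parse import urlencode

def facet_range_query(field, start, end, gap, collections):
    # Assemble the multi-collection range-facet query string shown above.
    return urlencode({
        "q": "*:*",
        "rows": 10,
        "start": 0,
        "fl": "%s,id" % field,
        "collection": ",".join(collections),
        "facet": "true",
        "facet.range": field,
        "f.%s.facet.range.start" % field: start,
        "f.%s.facet.range.end" % field: end,
        "f.%s.facet.range.gap" % field: gap,
        "f.%s.facet.mincount" % field: 1,
    })

qs = facet_range_query("eventDate",
                       "2014-01-01T00:00:00.000Z",
                       "2014-12-31T23:59:59.999Z",
                       "+1DAY",
                       ["2014_04", "2014_03"])
```

Issuing the generated query in a loop and diffing the facet_ranges counts would make the 50% failure rate easy to demonstrate reproducibly.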






Re: OutOfMemoryError

2014-12-16 Thread Shawn Heisey

On 12/16/2014 9:55 AM, Trilok Prithvi wrote:

We are getting OOME pretty often (every hour or so). We are restarting
nodes to keep up with it.

Here is our setup:
SolrCloud 4.10.2 (2 shards, 2 replicas) with 3 zookeepers.

Each node has:
16GB RAM
2GB JVM (Xmx 2048, Xms 1024)
~100 Million documents (split among 2 shards - ~50M on each shard)
Solr Core is about ~16GB of data on each node.

*Physical Memory is almost always 99% full.*


I'm pretty sure that a 2GB heap will simply not be big enough for 100 
million documents.  The fact that you can get it to function for even an 
hour is pretty amazing.


If you can upgrade the memory beyond 16GB, you should ... and you'll 
need to increase your Java heap.  I would use 4GB as a starting point.


http://wiki.apache.org/solr/SolrPerformanceProblems#How_much_heap_space_do_I_need.3F

It's completely normal for physical memory to be full.  The OS uses 
available memory for disk caching.


http://en.wikipedia.org/wiki/Page_cache

Thanks,
Shawn



OutOfMemoryError

2014-12-16 Thread Trilok Prithvi
We are getting OOME pretty often (every hour or so). We are restarting
nodes to keep up with it.

Here is our setup:
SolrCloud 4.10.2 (2 shards, 2 replicas) with 3 zookeepers.

Each node has:
16GB RAM
2GB JVM (Xmx 2048, Xms 1024)
~100 Million documents (split among 2 shards - ~50M on each shard)
Solr Core is about ~16GB of data on each node.

*Physical Memory is almost always 99% full.*


The commit setup is as follows:

  ${solr.ulog.dir:}   30 10 false   5000  
Rest of the solrconfig.xml setup is all default.

Some of the errors that we see on Solr ADMIN Logging is as follows:

java.lang.OutOfMemoryError: Java heap space

org.apache.solr.common.SolrException: no servers hosting shard:

org.apache.http.TruncatedChunkException: Truncated chunk ( expected
size: 8192; actual size: 7222)


Please let me know if you need anymore information.


Thanks!


ApacheCon 2015 (April) UIMA Track

2014-12-16 Thread Marshall Schor
We are planning a UIMA Track at the next ApacheCon conference (being held in
Austin Texas, April 13-17th, 2015).

Topics / areas where talks are solicited include:

   - UIMA itself (including its subprojects), new features, directions, etc.,
which could be of interest to people using UIMA.
   - Interesting Applications built using UIMA, including how UIMA is of
benefit, of interest to people exploring UIMA and its uses
   - Demos of UIMA tooling, to appeal to users who want to become more effective
in their unstructured analysis work
   - Connections with other Apache projects - what they are and how they're
using UIMA in their analytics
   - Experiences with UIMA scaleout - to share with others who are scaling out
UIMA pipelines

The official call for papers is here:
http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp.
Please consider submitting talks; the deadline is February 1st.  

Questions? Please email d...@uima.apache.org

-Marshall Schor, for the Apache UIMA project



Re: WordBreakSolrSpellChecker Usage

2014-12-16 Thread Matt Mongeau
James,

Thanks so much, you were spot on and it's great to understand why I was
getting the results I was. Solving this has been a breath of fresh air and
I appreciate greatly the advice and assistance you have given!

- Matt

On Tue, Dec 16, 2014 at 9:24 AM, Dyer, James 
wrote:
>
> Matt,
>
> Seeing the response, my guess is you have "point" in your index, and that
> it has a higher frequency than "rockpoint".  By default the spellchecker
> will never try to correct something that exists in your index.  Adding
> "spellcheck.onlyMorePopular=true" might help, but only if the correction
> has a higher frequency than the original.  Try using
> "spellcheck.alternativeTermCount=n" instead of
> "spellcheck.onlyMorePopular=true".  See
> http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.alternativeTermCount
> for more information.
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
> -Original Message-
> From: Matt Mongeau [mailto:halogenandto...@gmail.com]
> Sent: Monday, December 15, 2014 10:23 AM
> To: solr-user@lucene.apache.org
> Subject: Re: WordBreakSolrSpellChecker Usage
>
> I think you were right about maxChanges; that does seem to get rid of the
> ridiculous values. However I don't seem to be getting anything reasonable.
> Most variations look something like:
>
>
> http://localhost:8982/solr/development/select?q=Rock+point&fq=type%3ACompany&wt=ruby&indent=true&defType=edismax&qf=name_text&stopwords=true&lowercaseOperators=true&spellcheck=true&spellcheck.count=20&spellcheck.onlyMorePopular=true&spellcheck.extendedResults=true&spellcheck.collate=true&spellcheck.maxCollations=1&spellcheck.maxCollationTries=10&spellcheck.accuracy=0.5
>
> {
>   'responseHeader'=>{
> 'status'=>0,
> 'QTime'=>20},
>   'response'=>{'numFound'=>0,'start'=>0,'docs'=>[]
>   },
>   'spellcheck'=>{
> 'suggestions'=>[
>   'rock',{
> 'numFound'=>5,
> 'startOffset'=>0,
> 'endOffset'=>4,
> 'origFreq'=>3,
> 'suggestion'=>[{
> 'word'=>'rocky',
> 'freq'=>3},
>   {
> 'word'=>'brook',
> 'freq'=>6},
>   {
> 'word'=>'york',
> 'freq'=>460},
>   {
> 'word'=>'oak',
> 'freq'=>7},
>   {
> 'word'=>'boca',
> 'freq'=>3}]},
>   'correctlySpelled',false]}}
>
>
> I'm going to post both my solrconfig.xml and schema.xml because maybe
> I'm just doing something crazy. They can both be found here:
> https://gist.github.com/halogenandtoast/76fd5dcfae1c4edeba30
>
>
> On Thu, Dec 11, 2014 at 1:19 PM, Dyer, James  >
> wrote:
> >
> > Matt,
> >
> > There is no exact number here, but I would think most people would want
> > "count" to be maybe 10-20.  Increasing this incurs a very small
> performance
> > penalty for each term it generates suggestions for, but you probably
> won't
> > notice a difference.  For "maxCollationTries", 5 is a reasonable number
> but
> > you might see improved collations if this is also perhaps 10.  With this
> > one, you get a much larger performance penalty, but only when it needs to
> > try more combinations to return the "maxCollations".  In your case you
> have
> > this at 5 also, right?  I would reduce this to the maximum number of
> > re-written queries your application or users is actually going to use.
> In
> > a lot of cases, 1 is the right number here.  This would improve
> performance
> > for you in some cases.
> >
> > Possibly the reason “Rock point” > “Rockpoint” is failing is because you
> > have "maxChanges" set to 10.  This tells it you are willing for it to
> break
> > a word into 10 separate parts, or to combine up to 10 adjacent words into
> > 1.  Having taken a quick glance at the code, I think what is happening is
> > it is trying things like "r ock p oint" and "r o ck p o int", etc and
> never
> > getting to your intended result.  In a typical scenario I would set
> > "maxChanges" to 1-3, and often 1 is probably the most appropriate value
> > here.
> >
> > James Dyer
> > Ingram Content Group
> > (615) 213-4311
> >
> >
> > -Original Message-
> > From: Matt Mongeau [mailto:halogenandto...@gmail.com]
> > Sent: Thursday, December 11, 2014 11:34 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: WordBreakSolrSpellChecker Usage
> >
> > Is there a suggested value for this. I bumped them up to 20 and still
> > nothing has seemed to change.
> >
> > On Thu, Dec 11, 2014 at 9:42 AM, Dyer, James <
> james.d...@ingramcontent.com
> > >
> > wrote:
> >
> > > My first guess here, is seeing it works some of the time but not
> others,
> > > is that these values are too low:
> > >
> > > 5
> > > 5
> > >
> > > You know spellcheck.count is too low if the suggestion you want is not
> in
> > > the "suggestions" part of the response, but increasing it makes it get
> > > included.
> > >
> > > You know that spellcheck.maxCollationTries is too low if it exists in
> > > "suggestions" but it is not getting sugg

Re: Solr join not working in slorCloud env

2014-12-16 Thread Erick Erickson
Joins are not supported when the various cores are not
on the same node, see:

https://wiki.apache.org/solr/DistributedSearch, the line:
Doesn't support Join -- (see https://issues.apache.org/jira/browse/LUCENE-3759)


Best,
Erick

On Mon, Dec 15, 2014 at 7:19 PM, ArnabK  wrote:
> For Eg: I have 2 shards in solr cloud environment, My query is like
> http://localhost:8983/solr/B/select?wt=json&indent=true&q=*:*&fq={!join
> from=id to=uid fromIndex=A}type:xxx
>
> Now what happens is that while retrieving data from collection B, the query
> only searches the shard which returned data for the join query {!join from=id
> to=uid fromIndex=A}type:xxx
>
> meaning that if the document is found in shard 2 for collection A, the query
> only searches shard 2 for collection B as well.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-join-not-working-in-slorCloud-env-tp4174455.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: My new lemmatizer interfers with the highlighter

2014-12-16 Thread Erlend Garåsen


Thanks Ahmet,

I think I have solved the problem, but I didn't replace the line you 
suggested. Instead I added the createToken method with 
AttributeSource.State as a parameter and overrode the reset method. I 
cannot reproduce the problem anymore.


BTW, what's the purpose of AttributeSource.State? Perhaps that alone has 
solved the problem.


Erlend

On 15.12.14 16:13, Ahmet Arslan wrote:

Hi Erlend,

I have written a similar token filter. Please see :

https://github.com/iorixxx/lucene-solr-analysis-turkish/blob/master/src/main/java/org/apache/lucene/analysis/tr/Zemberek2DeasciifyFilterFactory.java

replace

final String[] values = stemmer.stem(tokenTerm);

with

stack = stemmer.stem(tokenTerm);

Ahmet




On Monday, December 15, 2014 4:53 PM, Michael Sokolov 
 wrote:
Well I think your first step should be finding a reproducible test case
and encoding it as a unit test.  But I suspect ultimately the fix will
be something to do with positionIncrement ...

-Mike


On 12/15/2014 09:08 AM, Erlend Garåsen wrote:

On 15.12.14 14:11, Michael Sokolov wrote:

I'm not sure, but is it necessary to set positionIncAttr to 1 when there
are *not* any lemmas found?  I think the usual pattern is to call
clearAttributes() at the start of incrementToken


It is set to 0 only if there are stems/lemmas found:
if (!terms.isEmpty()) {
   positionAttr.setPositionIncrement(0);

The terms list will only contain entries if there are lemmas found.

But maybe I should empty this list before I return true, just like this?

if (!terms.isEmpty()) {
   termAtt.setEmpty().append(terms.poll());
   positionAttr.setPositionIncrement(0);
   terms.clear();
   return true;
} else if ...





Re: All documents indexed into the same shard despite different prefix in id field

2014-12-16 Thread Will Miller
Thanks Chris...

I changed the test and assigned a unique number to each document as the prefix 
and the documents did index across the two shards. I then increased the data 
set to include documents from all 6 expected shard keys and I do see them being 
indexed across both shards. I was just lucky to have started testing with 3 
different prefixes that happened to index into the same shard. 

-Will


From: Chris Hostetter 
Sent: Monday, December 15, 2014 6:45 PM
To: solr-user@lucene.apache.org
Subject: Re: All documents indexed into the same shard despite different prefix 
in id field

: ?I have a SolrCloud cluster with two servers and I created a collection using 
two shards with this command:
...
: There were 230 documents in the set I indexed and there were 3 different 
prefixes (RM!, WW! and BH!) but all were routed into the same shard. Is there 
anything I can do to debug this further?

I'm not really a math expert but...

If you have N (2) shards, and a single prefix ("RM") there is a 100%
chance that that prefix will hash into 1 of those N=2 shards.

For a 2nd prefix ("WW") there is a 1/N (1/2) chance that it will hash into
the same shard as your first prefix ("RM").

Likewise, there is a 1/N (1/2) chance that any other prefix ("BH") will
hash into the same shard as your first prefix ("RM").

Which means there is a 25% (1/2 * 1/2 = 1/4) chance that 3 randomly
selected prefixes will all hash to the same shard.

(In general, if you have N shards, and P # of unique prefixes, then the
odds that they all wind up in the same shard is going to be:
"(1/N)**(P-1)")

So I suspect you just got unlucky with the 3 prefixes you happened to try in
your small test.
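Hoss's arithmetic can be sketched in a couple of lines (a hypothetical helper, assuming the hash distributes prefixes uniformly across shards):

```python
# Probability that P distinct routing prefixes all hash to the same one of
# N shards, i.e. (1/N) ** (P - 1) under a uniform hash.
def same_shard_probability(n_shards: int, n_prefixes: int) -> float:
    return (1.0 / n_shards) ** (n_prefixes - 1)

print(same_shard_probability(2, 3))  # 0.25 -- 2 shards, 3 prefixes, as above
print(same_shard_probability(2, 6))  # 0.03125 -- far less likely with 6
```

Which matches Will's follow-up: once all six prefixes were indexed, the collision he saw with three became very unlikely.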






-Hoss
http://www.lucidworks.com/


RE: WordBreakSolrSpellChecker Usage

2014-12-16 Thread Dyer, James
Matt,

Seeing the response, my guess is you have "point" in your index, and that it 
has a higher frequency than "rockpoint".  By default the spellchecker will 
never try to correct something that exists in your index.  Adding 
"spellcheck.onlyMorePopular=true" might help, but only if the correction has a 
higher frequency than the original.  Try using 
"spellcheck.alternativeTermCount=n" instead of 
"spellcheck.onlyMorePopular=true".  See 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.alternativeTermCount 
for more information.
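The parameter change suggested above can be sketched as query assembly (the host, core name, and the value 5 are illustrative assumptions, not a confirmed configuration):

```python
from urllib.parse import urlencode

# Build the select URL with spellcheck.alternativeTermCount in place of
# spellcheck.onlyMorePopular, per the advice above.
params = {
    "q": "Rock point",
    "defType": "edismax",
    "qf": "name_text",
    "spellcheck": "true",
    "spellcheck.count": 20,
    "spellcheck.alternativeTermCount": 5,  # value is an assumption
    "spellcheck.collate": "true",
    "spellcheck.maxCollations": 1,
    "spellcheck.maxCollationTries": 10,
}
url = "http://localhost:8982/solr/development/select?" + urlencode(params)
print(url)
```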

James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Matt Mongeau [mailto:halogenandto...@gmail.com] 
Sent: Monday, December 15, 2014 10:23 AM
To: solr-user@lucene.apache.org
Subject: Re: WordBreakSolrSpellChecker Usage

I think you were right about maxChanges; that does seem to get rid of the
ridiculous values. However I don't seem to be getting anything reasonable.
Most variations look something like:

http://localhost:8982/solr/development/select?q=Rock+point&fq=type%3ACompany&wt=ruby&indent=true&defType=edismax&qf=name_text&stopwords=true&lowercaseOperators=true&spellcheck=true&spellcheck.count=20&spellcheck.onlyMorePopular=true&spellcheck.extendedResults=true&spellcheck.collate=true&spellcheck.maxCollations=1&spellcheck.maxCollationTries=10&spellcheck.accuracy=0.5

{
  'responseHeader'=>{
'status'=>0,
'QTime'=>20},
  'response'=>{'numFound'=>0,'start'=>0,'docs'=>[]
  },
  'spellcheck'=>{
'suggestions'=>[
  'rock',{
'numFound'=>5,
'startOffset'=>0,
'endOffset'=>4,
'origFreq'=>3,
'suggestion'=>[{
'word'=>'rocky',
'freq'=>3},
  {
'word'=>'brook',
'freq'=>6},
  {
'word'=>'york',
'freq'=>460},
  {
'word'=>'oak',
'freq'=>7},
  {
'word'=>'boca',
'freq'=>3}]},
  'correctlySpelled',false]}}


I'm going to post both my solrconfig.xml and schema.xml because maybe
I'm just doing something crazy. They can both be found here:
https://gist.github.com/halogenandtoast/76fd5dcfae1c4edeba30


On Thu, Dec 11, 2014 at 1:19 PM, Dyer, James 
wrote:
>
> Matt,
>
> There is no exact number here, but I would think most people would want
> "count" to be maybe 10-20.  Increasing this incurs a very small performance
> penalty for each term it generates suggestions for, but you probably won't
> notice a difference.  For "maxCollationTries", 5 is a reasonable number but
> you might see improved collations if this is also perhaps 10.  With this
> one, you get a much larger performance penalty, but only when it need to
> try more combinations to return the "maxCollations".  In your case you have
> this at 5 also, right?  I would reduce this to the maximum number of
> re-written queries your application or users is actually going to use.  In
> a lot of cases, 1 is the right number here.  This would improve performance
> for you in some cases.
>
> Possibly the reason “Rock point” > “Rockpoint” is failing is because you
> have "maxChanges" set to 10.  This tells it you are willing for it to break
> a word into 10 separate parts, or to combine up to 10 adjacent words into
> 1.  Having taken a quick glance at the code, I think what is happening is
> it is trying things like "r ock p oint" and "r o ck p o int", etc and never
> getting to your intended result.  In a typical scenario I would set
> "maxChanges" to 1-3, and often 1 is probably the most appropriate value
> here.
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
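The search-space blow-up described in the quoted explanation above can be illustrated with a rough count of possible word breaks (an upper-bound sketch, not the WordBreak checker's exact algorithm):

```python
from math import comb

# Number of ways to split a word of length L into at most m non-empty
# contiguous pieces: sum over k = 1..min(m, L) of C(L-1, k-1), since each
# k-way split chooses k-1 of the L-1 gaps between letters.
def break_candidates(word: str, max_changes: int) -> int:
    length = len(word)
    return sum(comb(length - 1, k - 1)
               for k in range(1, min(max_changes, length) + 1))

print(break_candidates("rockpoint", 1))   # 1   -- only the word itself
print(break_candidates("rockpoint", 3))   # 37  -- 1 + 8 + 28
print(break_candidates("rockpoint", 10))  # 256 -- every possible split
```

So maxChanges=10 lets the checker consider every split of a nine-letter word, which is why candidates like "r ock p oint" crowd out the intended "rock point".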
>
>
> -Original Message-
> From: Matt Mongeau [mailto:halogenandto...@gmail.com]
> Sent: Thursday, December 11, 2014 11:34 AM
> To: solr-user@lucene.apache.org
> Subject: Re: WordBreakSolrSpellChecker Usage
>
> Is there a suggested value for this. I bumped them up to 20 and still
> nothing has seemed to change.
>
> On Thu, Dec 11, 2014 at 9:42 AM, Dyer, James  >
> wrote:
>
> > My first guess here, is seeing it works some of the time but not others,
> > is that these values are too low:
> >
> > 5
> > 5
> >
> > You know spellcheck.count is too low if the suggestion you want is not in
> > the "suggestions" part of the response, but increasing it makes it get
> > included.
> >
> > You know that spellcheck.maxCollationTries is too low if it exists in
> > "suggestions" but it is not getting suggested in the "collation" section.
> >
> > James Dyer
> > Ingram Content Group
> > (615) 213-4311
> >
> >
> > -Original Message-
> > From: Matt Mongeau [mailto:halogenandto...@gmail.com]
> > Sent: Wednesday, December 10, 2014 12:43 PM
> > To: solr-user@lucene.apache.org
> > Subject: Fwd: WordBreakSolrSpellChecker Usage
> >
> > If I have my search component setup like this
> > https://gist.github.com/halogenandtoast/cf9f296d01527080f18c and I have
> an
> > entry for “Rockpoint” shouldn’t “Rock point” generate suggestions?
> >
> > This doesn't seem 

Re: first time user

2014-12-16 Thread Jack Krupansky
I believe the solution is simply that you need to become much more familiar 
with the capabilities of the tools that you are using. Asking for a specific 
solution isn't necessarily the best approach - it runs the risk of what we 
can "an XY problem", where you are asking us one thing, but the real problem 
is further upstream and hasn't been fully expressed. My model is to give you 
a lot of examples and you can decide for yourself which best exemplifies 
what you are trying to do. And to give more detail on the features of Solr.


-- Jack Krupansky
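For the semicolon question discussed in this thread, one pre-processing sketch (assuming an outside script is acceptable; the file contents and helper name are illustrative) that rewrites a semicolon-delimited file so every field is quoted and embedded semicolons stay inside their field:

```python
import csv
import io

# Re-emit semicolon-delimited data with every field quoted, so a semicolon
# inside a text field cannot be mistaken for a delimiter.
def quote_semicolon_csv(raw: str) -> str:
    rows = csv.reader(io.StringIO(raw), delimiter=";")
    out = io.StringIO()
    writer = csv.writer(out, delimiter=";", quoting=csv.QUOTE_ALL,
                        lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()

raw = 'id1;plain text\nid2;"text; with a semicolon"\n'
print(quote_semicolon_csv(raw))
# "id1";"plain text"
# "id2";"text; with a semicolon"
```

Note that Solr's CSV update handler also takes `separator` and `encapsulator` request parameters, so pointing it at the raw file with `separator=%3B` may work without any pre-processing at all.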

-Original Message- 
From: onyourmark

Sent: Tuesday, December 16, 2014 7:28 AM
To: solr-user@lucene.apache.org
Subject: Re: first time user

Thanks Jack. Can I ask, does it give a solution to my problem of the
semicolons in the text and as delimiters?

Bill

On Tue, Dec 16, 2014 at 9:19 PM, Jack Krupansky-2 [via Lucene] <
ml-node+s472066n4174529...@n3.nabble.com> wrote:


My Solr Deep Dive e-book has full details and lots of examples for CSV
indexing:

http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html

-- Jack Krupansky

-Original Message-
From: Alexandre Rafalovitch
Sent: Tuesday, December 16, 2014 12:12 AM
To: solr-user
Subject: Re: first time user

Just test a manual example yourself. Much easier. I am pretty sure Solr
can't read your mind which particular semicolon is which use case.

Worse though, I can't remember how smart it is about quotes either.

Easier to test than to guess.

Regards,
Alex
On 15/12/2014 7:28 pm, "onyourmark" <[hidden email]
> wrote:

> Hi Alex, thank you for the response and information. In your opinion,
data
> is
> stored in semicolon delimited files and some of the fields in the data
are
> text and may on occasion have semicolons in them, will it be possible
for
> solr to index the data properly by itself or will I have to use some
> outside
> scripting language like python to enclose all text with quotation marks?
> Thanks again.
>
>
>
> --
> View this message in context:
>
http://lucene.472066.n3.nabble.com/first-time-user-tp4174121p4174449.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>










--
View this message in context: 
http://lucene.472066.n3.nabble.com/first-time-user-tp4174121p4174531.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: first time user

2014-12-16 Thread onyourmark
Thanks Jack. Can I ask, does it give a solution to my problem of the
semicolons in the text and as delimiters?

Bill

On Tue, Dec 16, 2014 at 9:19 PM, Jack Krupansky-2 [via Lucene] <
ml-node+s472066n4174529...@n3.nabble.com> wrote:
>
> My Solr Deep Dive e-book has full details and lots of examples for CSV
> indexing:
>
> http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html
>
> -- Jack Krupansky
>
> -Original Message-
> From: Alexandre Rafalovitch
> Sent: Tuesday, December 16, 2014 12:12 AM
> To: solr-user
> Subject: Re: first time user
>
> Just test a manual example yourself. Much easier. I am pretty sure Solr
> can't read your mind which particular semicolon is which use case.
>
> Worse though, I can't remember how smart it is about quotes either.
>
> Easier to test than to guess.
>
> Regards,
> Alex
> On 15/12/2014 7:28 pm, "onyourmark" <[hidden email]
> > wrote:
>
> > Hi Alex, thank you for the response and information. In your opinion,
> data
> > is
> > stored in semicolon delimited files and some of the fields in the data
> are
> > text and may on occasion have semicolons in them, will it be possible
> for
> > solr to index the data properly by itself or will I have to use some
> > outside
> > scripting language like python to enclose all text with quotation marks?
> > Thanks again.
> >
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/first-time-user-tp4174121p4174449.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>
>
>
>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/first-time-user-tp4174121p4174531.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: first time user

2014-12-16 Thread Jack Krupansky
My Solr Deep Dive e-book has full details and lots of examples for CSV 
indexing:

http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html

-- Jack Krupansky

-Original Message- 
From: Alexandre Rafalovitch

Sent: Tuesday, December 16, 2014 12:12 AM
To: solr-user
Subject: Re: first time user

Just test a manual example yourself. Much easier. I am pretty sure Solr
can't read your mind about which particular semicolon is which use case.

Worse though, I can't remember how smart it is about quotes either.

Easier to test than to guess.

Regards,
   Alex
On 15/12/2014 7:28 pm, "onyourmark"  wrote:


Hi Alex, thank you for the response and information. In my case, the data is
stored in semicolon-delimited files, and some of the fields are text that may
on occasion contain semicolons. In your opinion, will it be possible for Solr
to index the data properly by itself, or will I have to use some outside
scripting language like Python to enclose all text in quotation marks?
Thanks again.



--
View this message in context:
http://lucene.472066.n3.nabble.com/first-time-user-tp4174121p4174449.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Solr hangs on distributed updates

2014-12-16 Thread Peter Keegan
> As of 4.10, commits/optimize etc are executed in parallel.
Excellent - thanks.

On Tue, Dec 16, 2014 at 6:51 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:
>
> On Tue, Dec 16, 2014 at 11:34 AM, Peter Keegan 
> wrote:
> >
> > > A distributed update is streamed to all available replicas in parallel.
> >
> > Hmm, that's not what I'm seeing with 4.6.1, as I tail the logs on leader
> > and replicas. Mark Miller comments on this last May:
> >
> >
> >
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201404.mbox/%3CetPan.534d8d6d.74b0dc51.13a79@airmetal.local%3E
> >
> >
> Yes, sorry I didn't notice that you are on 4.6.1. This was changed in 4.10
> with https://issues.apache.org/jira/browse/SOLR-6264
>
> As of 4.10, commits/optimize etc are executed in parallel.
>
>
> > On Mon, Dec 15, 2014 at 8:11 PM, Shalin Shekhar Mangar <
> > shalinman...@gmail.com> wrote:
> > >
> > > On Mon, Dec 15, 2014 at 8:41 PM, Peter Keegan 
> > > wrote:
> > > >
> > > > If a timeout occurs, does the distributed update then go to the next
> > > > replica?
> > > >
> > >
> > > A distributed update is streamed to all available replicas in parallel.
> > >
> > >
> > > >
> > > > On Fri, Dec 12, 2014 at 3:42 PM, Shalin Shekhar Mangar <
> > > > shalinman...@gmail.com> wrote:
> > > > >
> > > > > Sorry I should have specified. These timeouts go inside the <solrcloud>
> > > > > section and apply for inter-shard update requests only. The socket
> > and
> > > > > connection timeout inside the shardHandlerFactory section apply for
> > > > > inter-shard search requests.
> > > > >
> > > > > On Fri, Dec 12, 2014 at 8:38 PM, Peter Keegan <
> > peterlkee...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Btw, are the following timeouts still supported in solr.xml, and
> do
> > > > they
> > > > > > only apply to distributed search?
> > > > > >
> > > > > >   <shardHandlerFactory name="shardHandlerFactory"
> > > > > > class="HttpShardHandlerFactory">
> > > > > > <int name="socketTimeout">${socketTimeout:0}</int>
> > > > > > <int name="connTimeout">${connTimeout:0}</int>
> > > > > >   </shardHandlerFactory>
> > > > > >
> > > > > > Thanks,
> > > > > > Peter
> > > > > >
> > > > > > On Fri, Dec 12, 2014 at 3:14 PM, Peter Keegan <
> > > peterlkee...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > No, I wasn't aware of these. I will give that a try. If I stop
> > the
> > > > Solr
> > > > > > > jetty service manually, things recover fine, but the hang
> occurs
> > > > when I
> > > > > > > 'stop' or 'terminate' the EC2 instance. The Zookeeper leader
> > > reports
> > > > a
> > > > > > > 15-sec timeout from the stopped node, and expires the session,
> > but
> > > > the
> > > > > > Solr
> > > > > > > leader never gets notified. This seems like a bug in ZK.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Peter
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Dec 12, 2014 at 2:43 PM, Shalin Shekhar Mangar <
> > > > > > > shalinman...@gmail.com> wrote:
> > > > > > >
> > > > > > >> Do you have distribUpdateConnTimeout and
> distribUpdateSoTimeout
> > > set
> > > > to
> > > > > > >> reasonable values in your solr.xml? These are the timeouts
> used
> > > for
> > > > > > >> inter-shard update requests.
> > > > > > >>
> > > > > > >> On Fri, Dec 12, 2014 at 2:20 PM, Peter Keegan <
> > > > peterlkee...@gmail.com
> > > > > >
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> > We are running SolrCloud in AWS and using their auto scaling
> > > > groups
> > > > > to
> > > > > > >> spin
> > > > > > >> > up new Solr replicas when CPU utilization exceeds a
> threshold
> > > for
> > > > a
> > > > > > >> period
> > > > > > >> > of time. All is well until the replicas are terminated when
> > CPU
> > > > > > >> utilization
> > > > > > >> > falls below another threshold. What happens is that index
> > > updates
> > > > > sent
> > > > > > >> to
> > > > > > >> > the Solr leader hang forever in both the Solr leader and the
> > > SolrJ
> > > > > > >> client
> > > > > > >> > app. Searches work fine.  Here are 2 thread stack traces
> from
> > > the
> > > > > Solr
> > > > > > >> > leader and 2 from the client app:
> > > > > > >> >
> > > > > > >> > 1) Solr-leader thread doing a distributed commit:
> > > > > > >> >
> > > > > > >> > Thread 23527: (state = IN_NATIVE)
> > > > > > >> >  -
> > > java.net.SocketInputStream.socketRead0(java.io.FileDescriptor,
> > > > > > >> byte[],
> > > > > > >> > int, int, int) @bci=0 (Compiled frame; information may be
> > > > imprecise)
> > > > > > >> >  - java.net.SocketInputStream.read(byte[], int, int, int)
> > > @bci=79,
> > > > > > >> line=150
> > > > > > >> > (Compiled frame)
> > > > > > >> >  - java.net.SocketInputStream.read(byte[], int, int)
> @bci=11,
> > > > > line=121
> > > > > > >> > (Compiled frame)
> > > > > > >> >  -
> > > org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer()
> > > > > > >> @bci=71,
> > > > > > >> > line=166 (Compiled frame)
> > > > > > >> >  - org.apache.http.impl.io.SocketInputBuffer.fillBuffer()
> > > @bci=1,
> > > > > > >> line=90
> > > > > > >> > (Compiled frame)
> > > > > > >> >  -
> > > > > > >> >
> > > > > 

Re: Solr hangs on distributed updates

2014-12-16 Thread Shalin Shekhar Mangar
On Tue, Dec 16, 2014 at 11:34 AM, Peter Keegan 
wrote:
>
> > A distributed update is streamed to all available replicas in parallel.
>
> Hmm, that's not what I'm seeing with 4.6.1, as I tail the logs on leader
> and replicas. Mark Miller comments on this last May:
>
>
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201404.mbox/%3CetPan.534d8d6d.74b0dc51.13a79@airmetal.local%3E
>
>
Yes, sorry I didn't notice that you are on 4.6.1. This was changed in 4.10
with https://issues.apache.org/jira/browse/SOLR-6264

As of 4.10, commits/optimize etc are executed in parallel.


> On Mon, Dec 15, 2014 at 8:11 PM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
> >
> > On Mon, Dec 15, 2014 at 8:41 PM, Peter Keegan 
> > wrote:
> > >
> > > If a timeout occurs, does the distributed update then go to the next
> > > replica?
> > >
> >
> > A distributed update is streamed to all available replicas in parallel.
> >
> >
> > >
> > > On Fri, Dec 12, 2014 at 3:42 PM, Shalin Shekhar Mangar <
> > > shalinman...@gmail.com> wrote:
> > > >
> > > > Sorry I should have specified. These timeouts go inside the <solrcloud>
> > > > section and apply for inter-shard update requests only. The socket
> and
> > > > connection timeout inside the shardHandlerFactory section apply for
> > > > inter-shard search requests.
> > > >
> > > > On Fri, Dec 12, 2014 at 8:38 PM, Peter Keegan <
> peterlkee...@gmail.com>
> > > > wrote:
> > > >
> > > > > Btw, are the following timeouts still supported in solr.xml, and do
> > > they
> > > > > only apply to distributed search?
> > > > >
> > > > >   <shardHandlerFactory name="shardHandlerFactory"
> > > > > class="HttpShardHandlerFactory">
> > > > > <int name="socketTimeout">${socketTimeout:0}</int>
> > > > > <int name="connTimeout">${connTimeout:0}</int>
> > > > >   </shardHandlerFactory>
> > > > >
> > > > > Thanks,
> > > > > Peter
> > > > >
> > > > > On Fri, Dec 12, 2014 at 3:14 PM, Peter Keegan <
> > peterlkee...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > No, I wasn't aware of these. I will give that a try. If I stop
> the
> > > Solr
Re: Solr hangs on distributed updates

2014-12-16 Thread Peter Keegan
> A distributed update is streamed to all available replicas in parallel.

Hmm, that's not what I'm seeing with 4.6.1, as I tail the logs on the leader
and replicas. Mark Miller commented on this last May:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201404.mbox/%3CetPan.534d8d6d.74b0dc51.13a79@airmetal.local%3E
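
For anyone searching the archives later: the distribUpdateConnTimeout /
distribUpdateSoTimeout settings Shalin mentions below live in solr.xml. A
minimal sketch, assuming the <solrcloud> section of the 4.x solr.xml format
(the timeout values here are placeholders, not recommendations):

```xml
<solr>
  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">${jetty.port:8983}</int>
    <!-- inter-shard *update* request timeouts, in ms; 0 means no timeout -->
    <int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:60000}</int>
    <int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:600000}</int>
  </solrcloud>
</solr>
```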

On Mon, Dec 15, 2014 at 8:11 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:
>
> On Mon, Dec 15, 2014 at 8:41 PM, Peter Keegan 
> wrote:
> >
> > If a timeout occurs, does the distributed update then go to the next
> > replica?
> >
>
> A distributed update is streamed to all available replicas in parallel.
>
>
> >
> > On Fri, Dec 12, 2014 at 3:42 PM, Shalin Shekhar Mangar <
> > shalinman...@gmail.com> wrote:
> > >
> > > Sorry, I should have specified. These timeouts go inside the <solrcloud>
> > > section and apply to inter-shard update requests only. The socket and
> > > connection timeout inside the shardHandlerFactory section apply for
> > > inter-shard search requests.
> > >
> > > On Fri, Dec 12, 2014 at 8:38 PM, Peter Keegan 
> > > wrote:
> > >
> > > > Btw, are the following timeouts still supported in solr.xml, and do
> > they
> > > > only apply to distributed search?
> > > >
> > > >   <shardHandlerFactory name="shardHandlerFactory"
> > > >                        class="HttpShardHandlerFactory">
> > > >     <int name="socketTimeout">${socketTimeout:0}</int>
> > > >     <int name="connTimeout">${connTimeout:0}</int>
> > > >   </shardHandlerFactory>
> > > >
> > > > Thanks,
> > > > Peter
> > > >
> > > > On Fri, Dec 12, 2014 at 3:14 PM, Peter Keegan <
> peterlkee...@gmail.com>
> > > > wrote:
> > > >
> > > > > No, I wasn't aware of these. I will give that a try. If I stop the
> > Solr
> > > > > jetty service manually, things recover fine, but the hang occurs
> > when I
> > > > > 'stop' or 'terminate' the EC2 instance. The Zookeeper leader
> reports
> > a
> > > > > 15-sec timeout from the stopped node, and expires the session, but
> > the
> > > > Solr
> > > > > leader never gets notified. This seems like a bug in ZK.
> > > > >
> > > > > Thanks,
> > > > > Peter
> > > > >
> > > > >
> > > > > On Fri, Dec 12, 2014 at 2:43 PM, Shalin Shekhar Mangar <
> > > > > shalinman...@gmail.com> wrote:
> > > > >
> > > > >> Do you have distribUpdateConnTimeout and distribUpdateSoTimeout
> set
> > to
> > > > >> reasonable values in your solr.xml? These are the timeouts used
> for
> > > > >> inter-shard update requests.
> > > > >>
> > > > >> On Fri, Dec 12, 2014 at 2:20 PM, Peter Keegan <
> > peterlkee...@gmail.com
> > > >
> > > > >> wrote:
> > > > >>
> > > > >> > We are running SolrCloud in AWS and using their auto scaling
> > groups
> > > to
> > > > >> spin
> > > > >> > up new Solr replicas when CPU utilization exceeds a threshold
> for
> > a
> > > > >> period
> > > > >> > of time. All is well until the replicas are terminated when CPU
> > > > >> utilization
> > > > >> > falls below another threshold. What happens is that index
> updates
> > > sent
> > > > >> to
> > > > >> > the Solr leader hang forever in both the Solr leader and the
> SolrJ
> > > > >> client
> > > > >> > app. Searches work fine.  Here are 2 thread stack traces from
> the
> > > Solr
> > > > >> > leader and 2 from the client app:
> > > > >> >
> > > > >> > 1) Solr-leader thread doing a distributed commit:
> > > > >> >
> > > > >> > Thread 23527: (state = IN_NATIVE)
> > > > >> >  -
> java.net.SocketInputStream.socketRead0(java.io.FileDescriptor,
> > > > >> byte[],
> > > > >> > int, int, int) @bci=0 (Compiled frame; information may be
> > imprecise)
> > > > >> >  - java.net.SocketInputStream.read(byte[], int, int, int)
> @bci=79,
> > > > >> line=150
> > > > >> > (Compiled frame)
> > > > >> >  - java.net.SocketInputStream.read(byte[], int, int) @bci=11,
> > > line=121
> > > > >> > (Compiled frame)
> > > > >> >  -
> org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer()
> > > > >> @bci=71,
> > > > >> > line=166 (Compiled frame)
> > > > >> >  - org.apache.http.impl.io.SocketInputBuffer.fillBuffer()
> @bci=1,
> > > > >> line=90
> > > > >> > (Compiled frame)
> > > > >> >  -
> > > > >> >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(org.apache.http.util.CharArrayBuffer)
> > > > >> > @bci=137, line=281 (Compiled frame)
> > > > >> >  -
> > > > >> >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(org.apache.http.io.SessionInputBuffer)
> > > > >> > @bci=16, line=92 (Compiled frame)
> > > > >> >  -
> > > > >> >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(org.apache.http.io.SessionInputBuffer)
> > > > >> > @bci=2, line=61 (Compiled frame)
> > > > >> >  - org.apache.http.impl.io.AbstractMessageParser.parse()
> @bci=38,
> > > > >> line=254
> > > > >> > (Compiled frame)
> > > > >> >  -
> > > > >> >
> > > > >>
> > > >
> > org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader()
> > > > >> > @bci=8, line=289 (Compiled frame)
> > > > >> >  -
> > > > >> >
> > > > >>
> > > >
> > org.apache.http.impl.conn.DefaultC

Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page

2014-12-16 Thread Ere Maijala
Do you have the jts libraries (e.g. jts-1.13.jar) in Solr's classpath 
(quoting from https://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4 
"it needs to be in WEB-INF/lib in Solr's war file, basically")?
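
For reference, the kind of field type that pulls in JTS looks roughly like the
example on that wiki page; the names and values below are illustrative, not
taken from your schema:

```xml
<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
    spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
    distErrPct="0.025" maxDistErr="0.000009" units="degrees" />
```

As soon as spatialContextFactory points at the JTS factory, the jts jar must be
loadable by the webapp classloader, hence the WEB-INF/lib requirement.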


--Ere

13.12.2014, 1.54, solr-user wrote:

I did find out the cause of my problems.  Turns out the problem wasn't due to
the solrconfig.xml file; it was in the schema.xml file

I spent a fair bit of time making my solrconfig closer to the default
solrconfig.xml in the Solr download; when that didn't get rid of the error I
went back to the only other file we had that was different.

Turns out the line that was causing the problem was the middle line in this
location_rpt fieldtype definition:

 <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
     spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
     distErrPct="0.025" maxDistErr="0.000009" units="degrees" />

The spatialContextFactory line caused the core not to load, even though no
error/warning messages were shown.

I missed that extra line somehow; mea culpa.

Anyhow, I really appreciate the responses/help I got on this issue.  many
thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602p4174118.html
Sent from the Solr - User mailing list archive at Nabble.com.




--
Ere Maijala
Kansalliskirjasto / The National Library of Finland


Re: problem with tutorial

2014-12-16 Thread Tomoko Uchida
Hi,

First, check the "solr.log" file in the directory where Solr is running
(there may be JVM stack traces in it); you might be able to find clues there.
If you cannot solve the problem, post to the mailing list again with your
exact command (including options) and the stack trace from the log.

Thanks,
Tomoko

2014-12-16 16:45 GMT+09:00 Xin Cai :
>
> hi Everyone
> I am a complete noob when it comes to Solr, and when I try to follow the
> tutorial and run Solr I get the error message:
>
> "Waiting to see Solr listening on port 8983 [-]  Still not seeing Solr
> listening on 8983 after 30 seconds!"
>
> I did some googling and all I found were instructions for removing grep
> commands, which doesn't sound right to me... I have checked my ports and
> currently I don't have any service listening on port 8983, and my firewall
> is not on, so I am not sure what is happening. Any help would be
> appreciated. Thanks
>
> Xin Cai
>