date:20160111

Re: solrcloud -How to delete a doc at a specific shard

2016-01-11 Thread elvis鱼人

i mean i changed zookeeper server from one ip address to the other ip
address.
it is too hard understand,what do you mean "lots"?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/solrcloud-How-to-delete-a-doc-at-a-specific-shard-tp4249354p4250078.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: solrcloud -How to delete a doc at a specific shard

2016-01-11 Thread elvis鱼人

try config core.properties



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solrcloud-How-to-delete-a-doc-at-a-specific-shard-tp4249354p4250079.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: solrcloud -How to delete a doc at a specific shard

2016-01-11 Thread vidya

Hi

I am new to solr and having a doubt on how one can know that a particular
shard is in that particular node or ip address.

Thanks in advance



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solrcloud-How-to-delete-a-doc-at-a-specific-shard-tp4249354p4250072.html
Sent from the Solr - User mailing list archive at Nabble.com.

solr in action - multiple language content in one field

2016-01-11 Thread vidya

Hi

I have gone through solr in action 14th chapter which tells - "searching
content in multiple languages" . But i have a doubt that when i put
documents in solr web UI, it recognises every language and gives me the
result when queried for it. What exactly did they depict in that chapter.
can't solr recognise and process all languages at a time?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-in-action-multiple-language-content-in-one-field-tp4250071.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr /export handler is exporting only unique values from multivalued field?

2016-01-11 Thread Alok Bhandari

Thanks Joel.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-export-handler-is-exporting-only-unique-values-from-multivalued-field-tp4249986p4250067.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: solrcloud -How to delete a doc at a specific shard

2016-01-11 Thread Erick Erickson

OK, what exactly do you mean you "changed zookeeper"? If you went in and
reassigned IP addresses to nodes then all bets are off.

So do you have just a single (or a few) docs that are dups or lots? And by
"lots", I'm thinking if all the duplicate IDs are documents that have been
indexed since you "changed zookeeper", then I suspect that's the root and
you should recreate your collection and re-index it.

Manually editing Zookeeper nodes is a last resort, something you definitely
should _not_ do without a _very_ good reason.

Also, is it possible that some of your docs somehow gained a non-printing
character in the ID (and it's a string)? I'm thinking leading or trailing spaces
here.

Best,
Erick

On Mon, Jan 11, 2016 at 7:32 PM, elvis鱼人  wrote:
> hi Erick ,i really want to know too.
> i remembered change zookeeper,may be correlated with it
>
> shard1:
>   192.168.100.210:7001-leader
>   192.168.100.211:7001-replica
>
> shard2:
>   192.168.100.211:7002:leader
>   192.168.100.212:7001:replica
>
> shard3:
>   192.168.100.210:7002:leader
>   192.168.100.212:7002:replica
> It happens in shard1
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/solrcloud-How-to-delete-a-doc-at-a-specific-shard-tp4249354p4250058.html
> Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solrcloud for Java 1.6

2016-01-11 Thread Zap Org

hello shawn yes it is written in a style readble with java 7 what i have
done is, altered the syntax into java 6 and then compile and it is 100%
working. if you need, i can send you the jar.

On Fri, Jan 8, 2016 at 11:59 AM, Shawn Heisey  wrote:

> On 1/7/2016 10:59 PM, Zap Org wrote:
> > i have solr5.0.0 installed in one machine with JVM specs java 1.7 32GB
> RAM
> > now from another machine i am sending query request to this machine with
> > specs JAVA 1.6
> > so what really happening solr5.0.0 (1.7) is communicating with
> > solrj5.0.0(1.6)
>
> The system with SolrJ 5.0.0 must have a newer Java version installed.
> It is simply not possible to run Solr or SolrJ 5.0.0 with Java 6.
>
> If you attempt to run version 4.8 or later with Java 6, you'll get an
> error that looks like this:
>
> Exception in thread "main" java.lang.UnsupportedClassVersionError:
> org/apache/solr/xxx/yyy: Unsupported major.minor version 51.0
>
> Here is some information about what the specific major.minor version
> number means:
>
> http://stackoverflow.com/a/11432195
>
> Just for giggles, I attempted to compile SolrJ (which is a standalone
> module not dependent on the rest of Solr) in the branch_5x checkout with
> the build script changed to produce jars compatible with version 1.6.
> It would not compile when the target was set to 1.6, because the way the
> code is written *requires* Java 7, and has since version 4.8.
>
> Thanks,
> Shawn
>
>

Re: Solr 5.3.1 ArrayIndexOutOfBoundsException while running a query

2016-01-11 Thread Erick Erickson

The Solr logs should have a much more complete stack trace if you can
locate them.

1G of memory is very little for any serious Solr. I'm assuming you
restarted Solr after the
OOM, but Java isn't entirely reliable after an OOM.

FWIW,
Erick

On Mon, Jan 11, 2016 at 6:34 PM, Kelly, Frank  wrote:

> Using Solr 5.3.1 in Solr Cloud mode deployed on AWS (each Solr instance
> has –Xmx 1024m and the server has 8GB of RAM)
>
> Am getting a 500 error running a query via the UI
>
> Looking in the logs I just see this with no stack trace
>
> 2016-01-12 02:04:22.181 ERROR (qtp59559151-7313)
> [c:qa_us-east-1_here_account18 s:shard2 r:core_node4
> x:qa_us-east-1_here_account18_shard2_replica2] o.a.s.s.SolrDispatchFilter
> null:java.lang.ArrayIndexOutOfBoundsException
>
>
> 2016-01-12 02:04:22.201 ERROR (qtp59559151-7267)
> [c:qa_us-east-1_here_account18 s:shard2 r:core_node4
> x:qa_us-east-1_here_account18_shard2_replica2] o.a.s.c.SolrCore
> java.lang.ArrayIndexOutOfBoundsException
>
>
>
>
> I have examined the cluster state and there are no degraded shards.
> I am suspicious that an OutOfMemoryError earlier in the day is still
> occurring but I have not seen one recently
>
> solr_log_20160111_1510:java.lang.OutOfMemoryError: Java heap space
>
> When I inspect the service status it seems to say I am OK for memory
>
> [solr@qa-solr-us-east-1-4454eff7 logs]$ sudo service solr status
>
>
> Found 1 Solr nodes:
>
>
> Solr process 9641 running on port 8983
>
> {
>
>   "solr_home":"/solr/data/",
>
>   "version":"5.3.1 1703449 - noble - 2015-09-17 01:48:15",
>
>   "startTime":"2016-01-11T15:10:14.113Z",
>
>   "uptime":"0 days, 11 hours, 22 minutes, 53 seconds",
>
>   "memory":"665.3 MB (%67.8) of 981.4 MB",
>
>   "cloud":{
>
> "ZooKeeper":"ec2-54-173-143-61.compute-1.amazonaws.com:2181,
> ec2-52-90-213-206.compute-1.amazonaws.com:2181,
> ec2-52-90-108-64.compute-1.amazonaws.com:2181/solr",
>
> "liveNodes":"3",
>
> "collections":"56"}}
>
> Any thoughts or suggestions on how to debug further (ideally get the stack
> trace)?
>
> Thanks!
>
> -Frank
>
> *Frank Kelly*
>
> Principal Software Engineer
>
> Predictive Analytics Team (SCBE/HAC/CDA)
>
>
> *HERE *
>
> 5 Wayside Rd, Burlington, MA 01803, USA
>
> *42° 29' 7" N 71° 11' 32” W*
>
>
>    
> 
>   
>
>

Re: Solr search and index rate optimization

2016-01-11 Thread Zap Org

thanks for replying currently my machine specs are
32 GB RAM
4 core processor
windows server 2008 64bit
500 GB HD
16 GB swap memorey

now the already running machine with cpu usage not more than 10% already
consumed all the RAM and now started to use swap memorey what my guess is
my server will chok when swap memorey will end. i am only running solr and
ZK instances there any wild idea what is happening and why memorey
consumption is too high.
all the field cache and query caches are set to 1GB in solrconfig and along
with serving queries i am running delta after every 15 minute.

On Fri, Jan 8, 2016 at 3:40 PM, Toke Eskildsen 
wrote:

> On Fri, 2016-01-08 at 10:55 +0500, Zap Org wrote:
> > i wanted to ask that i need to index after evey 15 min with hard commit
> > (real time records) and currently have 5 zookeeper instances and 2 solr
> > instances in one machine serving 200 users with 32GB RAM. whereas i
> wanted
> > to serve more than 10,000 users so what should be my machine specs and
> what
> > should be my architecture for this much serve rate along with index rate.
>
> It depends on your system and if we were forced to guess, our guess
> would be very loose.
>
>
> Fortunately you do have a running system with real queries: Make a copy
> on two similar machines (you will probably need more hardware anyway)
> and simulate growing traffic, measuring response times at appropriate
> points: 200 users, 500, 1000, 2000 etc.
>
> If you are very lucky, your current system scales all the way. If not,
> you should have enough data to make an educated guess of the amount of
> machines you need. You should have at least 3 measuring point to
> extrapolate from as scaling is not always linear.
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>

Re: Solr search and index rate optimization

2016-01-11 Thread Zap Org

hello dear thanks for replying it means 3 ZK instances are more than enough
in my case

On Fri, Jan 8, 2016 at 10:07 PM, Erick Erickson 
wrote:

> Here's a longer form of Toke's answer:
>
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> BTW, on the surface, having 5 ZK nodes isn't doing you any real good.
> Zookeeper isn't really involved in serving queries or handling
> updates, it's purpose is to have the state of the cluster (nodes up,
> recovering, down, etc) and notify Solr listeners when that state
> changes. There's no good reason to have 5 with a small cluster and by
> "small" I mean < 100s of nodes.
>
> Best,
> Erick
>
> On Fri, Jan 8, 2016 at 2:40 AM, Toke Eskildsen 
> wrote:
> > On Fri, 2016-01-08 at 10:55 +0500, Zap Org wrote:
> >> i wanted to ask that i need to index after evey 15 min with hard commit
> >> (real time records) and currently have 5 zookeeper instances and 2 solr
> >> instances in one machine serving 200 users with 32GB RAM. whereas i
> wanted
> >> to serve more than 10,000 users so what should be my machine specs and
> what
> >> should be my architecture for this much serve rate along with index
> rate.
> >
> > It depends on your system and if we were forced to guess, our guess
> > would be very loose.
> >
> >
> > Fortunately you do have a running system with real queries: Make a copy
> > on two similar machines (you will probably need more hardware anyway)
> > and simulate growing traffic, measuring response times at appropriate
> > points: 200 users, 500, 1000, 2000 etc.
> >
> > If you are very lucky, your current system scales all the way. If not,
> > you should have enough data to make an educated guess of the amount of
> > machines you need. You should have at least 3 measuring point to
> > extrapolate from as scaling is not always linear.
> >
> > - Toke Eskildsen, State and University Library, Denmark
> >
> >
>

Solr 5.3.1 ArrayIndexOutOfBoundsException while running a query

2016-01-11 Thread Kelly, Frank

Using Solr 5.3.1 in Solr Cloud mode deployed on AWS (each Solr instance has 
-Xmx 1024m and the server has 8GB of RAM)

Am getting a 500 error running a query via the UI

Looking in the logs I just see this with no stack trace

2016-01-12 02:04:22.181 ERROR (qtp59559151-7313) [c:qa_us-east-1_here_account18 
s:shard2 r:core_node4 x:qa_us-east-1_here_account18_shard2_replica2] 
o.a.s.s.SolrDispatchFilter null:java.lang.ArrayIndexOutOfBoundsException


2016-01-12 02:04:22.201 ERROR (qtp59559151-7267) [c:qa_us-east-1_here_account18 
s:shard2 r:core_node4 x:qa_us-east-1_here_account18_shard2_replica2] 
o.a.s.c.SolrCore java.lang.ArrayIndexOutOfBoundsException



I have examined the cluster state and there are no degraded shards.
I am suspicious that an OutOfMemoryError earlier in the day is still occurring 
but I have not seen one recently

solr_log_20160111_1510:java.lang.OutOfMemoryError: Java heap space

When I inspect the service status it seems to say I am OK for memory


[solr@qa-solr-us-east-1-4454eff7 logs]$ sudo service solr status


Found 1 Solr nodes:


Solr process 9641 running on port 8983

{

  "solr_home":"/solr/data/",

  "version":"5.3.1 1703449 - noble - 2015-09-17 01:48:15",

  "startTime":"2016-01-11T15:10:14.113Z",

  "uptime":"0 days, 11 hours, 22 minutes, 53 seconds",

  "memory":"665.3 MB (%67.8) of 981.4 MB",

  "cloud":{


"ZooKeeper":"ec2-54-173-143-61.compute-1.amazonaws.com:2181,ec2-52-90-213-206.compute-1.amazonaws.com:2181,ec2-52-90-108-64.compute-1.amazonaws.com:2181/solr",

"liveNodes":"3",

"collections":"56"}}

Any thoughts or suggestions on how to debug further (ideally get the stack 
trace)?

Thanks!

-Frank
[cid:372B00C7-E162-45EC-BA3E-64BC5D098016]
Frank Kelly
Principal Software Engineer
Predictive Analytics Team (SCBE/HAC/CDA)






HERE
5 Wayside Rd, Burlington, MA 01803, USA
42° 29' 7" N 71° 11' 32" W

[cid:2A590471-E354-4B0F-9393-C445DD4303A7]  
[cid:0346BFC6-3B91-40BD-8E7E-3D5E549E5516]    
[cid:16238AC6-6A21-4C17-8990-07095FCED69A] 
[cid:5773C34E-F274-464C-810C-B113E6299A56] 

[cid:15654391-21A7-4012-8A18-A4589C8AEE4A]

Re: solrcloud -How to delete a doc at a specific shard

2016-01-11 Thread elvis鱼人

hi Erick ,i really want to know too.
i remembered change zookeeper,may be correlated with it

shard1: 
  192.168.100.210:7001-leader 
  192.168.100.211:7001-replica 

shard2: 
  192.168.100.211:7002:leader 
  192.168.100.212:7001:replica 

shard3: 
  192.168.100.210:7002:leader 
  192.168.100.212:7002:replica
It happens in shard1



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solrcloud-How-to-delete-a-doc-at-a-specific-shard-tp4249354p4250058.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr UIMA Custom Annotator PEAR file installation on Linux

2016-01-11 Thread techqnq


Hi, Tommaso Teofili: any help on this ^



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-UIMA-Custom-Annotator-PEAR-file-installation-on-Linux-tp4249302p4250054.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: multiple solr-config.xml files per core

2016-01-11 Thread techqnq

Thanks Eric, for confirming and putting it correctly in your response.
Appreciate your help!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/multiple-solr-config-xml-files-per-core-tp4250009p4250050.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: WArning in SolrCloud logs

2016-01-11 Thread Erick Erickson

Just show us the solrconfig.xml file, particularly anything referring
to replication,
it's easier than talking past each other.


Best,
Erick.

On Mon, Jan 11, 2016 at 12:18 PM, Gian Maria Ricci - aka Alkampfer
 wrote:
> Actually that is a collection I've created uploading into Zookeeper a 
> configuration I used for single node, with a replication handler activated to 
> backup the core. I did not send any master/slave config actually, I just 
> created the collection using collection API and the warning is immediately 
> there.
>
> --
> Gian Maria Ricci
> Cell: +39 320 0136949
>
>
>
> -Original Message-
> From: Alessandro Benedetti [mailto:abenede...@apache.org]
> Sent: lunedì 11 gennaio 2016 17:52
> To: solr-user@lucene.apache.org
> Subject: Re: WArning in SolrCloud logs
>
> To be honest it seems to me more a wrong usage of java environment variables.
> Is it possible you are sending the enable master/slave config to the node ?
> Strictly talking about the replication request handler, it is required for 
> SolrCloud ( there are scenarios where old style replication is still used) .
> But this supposes to happen automatically .
>
> Strip of code that cause the warning :
>
> if (enableMaster || enableSlave) {
>> if (core.getCoreDescriptor().getCoreContainer().getZkController() !=
>> null) { LOG.warn("SolrCloud is enabled for core " + core.getName() + "
>> but so is old-style replication. Make sure you" + " intend this
>> behavior, it usually indicates a mis-configuration. Master setting is
>> " +
>> Boolean.toString(enableMaster) + " and slave setting is " + Boolean.
>> toString(enableSlave));
>> }
>> }
>
>
> Cheers
>
> On 11 January 2016 at 15:08, Gian Maria Ricci - aka Alkampfer < 
> alkamp...@nablasoft.com> wrote:
>
>> I’ve configured three node in solrcloud, everything seems ok, but in
>> the log I see this kind of warning
>>
>>
>>
>> SolrCloud is enabled for core xxx_shard3_replica1 but so is old-style
>> replication. Make sure you intend this behavior, it usually indicates
>> a mis-configuration. Master setting is true and slave setting is false
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> What could be the reason? Is it possible that this happens because
>> solrconfig.xml used to create the collection has a replication handler
>> active?
>>
>> --
>> Gian Maria Ricci
>> Cell: +39 320 0136949
>>
>> [image:
>> https://ci5.googleusercontent.com/proxy/5oNMOYAeFXZ_LDKanNfoLRHC37mAZk
>> VVhkPN7QxMdA0K5JW2m0bm8azJe7oWZMNt8fKHNX1bzrUTd-kIyE40CmwT2Mlf8OI=s0-d
>> -e1-ft#http://www.codewrecks.com/files/signature/mvp.png]
>>  [image:
>> https://ci3.googleusercontent.com/proxy/f-unQbmk6NtkHFspO5Y6x4jlIf_xrm
>> GLUT3fU9y_7VUHSFUjLs7aUIMdZQYTh3eWIA0sBnvNX3WGXCU59chKXLuAHi2ArWdAcBcl
>> KA=s0-d-e1-ft#http://www.codewrecks.com/files/signature/linkedin.jpg]
>>  [image:
>> https://ci3.googleusercontent.com/proxy/gjapMzu3KEakBQUstx_-cN7gHJ_Gpc
>> IZNEPjCzOYMrPl-r1DViPE378qNAQyEWbXMTj6mcduIAGaApe9qHG1KN_hyFxQAIkdNSVT
>> =s0-d-e1-ft#http://www.codewrecks.com/files/signature/twitter.jpg]
>>  [image:
>> https://ci5.googleusercontent.com/proxy/iuDOD2sdaxRDvTwS8MO7-CcXchpNJX
>> 96uaWuvagoVLcjpAPsJi88XeOonE4vHT6udVimo7yL9ZtdrYueEfH7jXnudmi_Vvw=s0-d
>> -e1-ft#http://www.codewrecks.com/files/signature/rss.jpg]
>>  [image:
>> https://ci6.googleusercontent.com/proxy/EBJjfkBzcsSlAzlyR88y86YXcwaKfn
>> 3x7ydAObL1vtjJYclQr_l5TvrFx4PQ5qLNYW3yp7Ig66DJ-0tPJCDbDmYAFcamPQehwg=s
>> 0-d-e1-ft#http://www.codewrecks.com/files/signature/skype.jpg]
>>
>>
>>
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England

Re: multiple solr-config.xml files per core

2016-01-11 Thread Erick Erickson

bq: Can Solr Server have different/multiple solr-config.xml file per core?

Yes. Each separate core can (and usually does) have its own configs,
solrconfig.xml,
schema and the like.

Your question could be interpreted as asking  if you can have multiple
solrconfig.xml files in the
_same_ core, the answer to which is "no".

Best,
Erick

On Mon, Jan 11, 2016 at 12:19 PM, techqnq  wrote:
>
> I assume distinct solr-config.xml file is allowed for every solr core, but I
> got suspicious based upon the data size of the core. So thought to get my
> facts confirmed/corrected here:
>
> Q. Can Solr Server have different/multiple solr-config.xml file per core?
>
> Use Case:
> - For one core solr-config.xml file: it is configured with UIMA update
> processor i.e. "updateRequestProcessorChain"
> 
> 
>   uima
> 
>   
>
> - For second core solr-config-xml file: it is kept default/standard as it is
> (no uima update)
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/multiple-solr-config-xml-files-per-core-tp4250009.html
> Sent from the Solr - User mailing list archive at Nabble.com.

Re: Change leader in SolrCloud

2016-01-11 Thread Erick Erickson

bq:  It seems to me a huge wasting of resources.

How else would you guarantee consistency? Especially taking
in to account Lucene's write-once segments? Master/Slave
sidesteps the problem by moving entire, closed segments to the
slave, but as Shawn says if the master goes down the slaves
don't have _any_ docs from the not-closed segments.

Best,
Erick

On Mon, Jan 11, 2016 at 1:42 PM, Shawn Heisey  wrote:
> On 1/11/2016 1:23 PM, Gian Maria Ricci - aka Alkampfer wrote:
>> Ok, this imply that if I have X replica of a shard, the document is indexed 
>> X+1 times? one for each replica plus the leader shard? It seems to me a huge 
>> wasting of resources.
>>
>> In a Master/slave scenario indexing takes places only on master node, then 
>> slave replicates analyzed data.
>
> The leader *is* a replica.  So if you have a replicationFactor of three,
> you have three replicas for each shard.  For each shard, one of those
> replicas gets elected to be the leader.  You do not have a leader and
> two replicas.
>
> The above is perhaps extremely pedantic, but understanding how SolrCloud
> works requires understanding that being temporarily assigned the leader
> role does not change how the replica works, it just adds some additional
> coordination responsibilities.
>
> To answer your question, let's assume you build an index with
> replicationFactor=3.  No new replicas are added, and all machines are
> up.  In that situation, each document gets indexed a total of three times.
>
> In return for this additional complexity and resource usage, you don't
> have a single point of failure for indexing.  With master/slave
> replication, if your master goes down for any length of time, you must
> reconfigure all of your remaining Solr nodes to change the master.
> Chances are very good that you will experience downtime.
>
> Thanks,
> Shawn
>

Re: indexing rich data with solr 5.3

2016-01-11 Thread Erick Erickson

Looks like a bad file. Do you have any success using DIH on any files?

What happens if you just send that particular file throug the
 ExtractingRequestHandler?

Best,
Erick

On Mon, Jan 11, 2016 at 3:51 PM, kostali hassan
 wrote:
> such files msword and pdf donsnt indexing using *dataimoprt i have this
> error:*
>
> Full Import failed:java.lang.RuntimeException:
> java.lang.RuntimeException:
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable
> to read content Processing Document # 2
> at 
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270)
> at 
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
> at 
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
> at 
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
> Caused by: java.lang.RuntimeException:
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable
> to read content Processing Document # 2
> at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:416)
> at 
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
> at 
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
> ... 3 more
> Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
> Unable to read content Processing Document # 2
> at 
> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:70)
> at 
> org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:168)
> at 
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
> at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:475)
> at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:514)
> at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
> ... 5 more
> Caused by: org.apache.tika.exception.TikaException: Unexpected
> RuntimeException from
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser@188120
> at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:258)
> at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
> at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> at 
> org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:162)
> ... 9 more
> Caused by: org.apache.poi.openxml4j.exceptions.InvalidOperationException:
> Can't open the specified file:
> 'D:\solr\solr-5.3.1\server\tmp\apache-tika-121920532070319073.tmp'
> at org.apache.poi.openxml4j.opc.ZipPackage.(ZipPackage.java:112)
> at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:224)
> at 
> org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:69)
> at 
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
> at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
> ... 12 more
> Caused by: java.util.zip.ZipException: invalid END header (bad central
> directory offset)
> at java.util.zip.ZipFile.open(Native Method)
> at java.util.zip.ZipFile.(ZipFile.java:220)
> at java.util.zip.ZipFile.(ZipFile.java:150)
> at java.util.zip.ZipFile.(ZipFile.java:164)
> at 
> org.apache.poi.openxml4j.opc.internal.ZipHelper.openZipFile(ZipHelper.java:174)
> at org.apache.poi.openxml4j.opc.ZipPackage.(ZipPackage.java:110)
> ... 16 more

Re: Problems using MapReduceIndexerTool with multiple reducers

2016-01-11 Thread Erick Erickson

Hmm, it looks like you created your collection with the "implicit"
router. Does the same thing happen when you use the default
compositeId router?

Note, this should be OK with either, this is just to gather more info.

Other questions:
1> Are you running MRIT over Solr indexes that are actually hosted on HDFS?
2> Are you using the --go-live option?

Actually, can you show us the entire command you use to invoke MRIT?

Best,
Erick

On Mon, Jan 11, 2016 at 4:18 PM, Douglas Rapp  wrote:
> Hello,
>
> I am using Solr 4.10.4 in SolrCloud mode, but so far with only a single
> instance (so just a single shard - not very cloud-like..).
>
> I have been experimenting using the MapReduceIndexerTool to handle batch
> indexing of CSV files in HDFS. I got it working on a weaker single-node
> Hadoop test system, so I have been trying to do some performance testing on
> a 4-node Hadoop cluster (1 NameNode, 3 DataNode) with better hardware. The
> issue that I have come across is that the job will only finish successfully
> if I specify a single reducer (using the "--reducers 1" option upon
> invoking the tool).
>
> If the tool is invoked without specifying a number for mappers/reducers, it
> appears that it tries to utilize the maximum number available. In my case,
> it tries to use 16 mappers and 6 reducers. I have tried specifying many
> different combinations, and what I have found is that I can tweak the
> number of mappers to just about anything, but reducers must stay at "1" or
> else the job fails. Also explains why I never saw this pop up on the first
> system - looking closer at it, it defaults to only 1 reducer there. If I
> try to increase it, I get the same failure. When the job fails, I get the
> following stack trace:
>
> 6602 [main] WARN  org.apache.hadoop.mapred.YarnChild  - Exception running
> child : org.kitesdk.morphline.api.MorphlineRuntimeException:
> java.lang.IllegalStateException: No matching slice found! The slice seems
> unavailable. docRouterClass: org.apache.solr.common.cloud.ImplicitDocRouter
> at
> org.kitesdk.morphline.base.FaultTolerance.handleException(FaultTolerance.java:73)
> at
> org.apache.solr.hadoop.morphline.MorphlineMapRunner.map(MorphlineMapRunner.java:213)
> at
> org.apache.solr.hadoop.morphline.MorphlineMapper.map(MorphlineMapper.java:86)
> at
> org.apache.solr.hadoop.morphline.MorphlineMapper.map(MorphlineMapper.java:54)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.IllegalStateException: No matching slice found! The
> slice seems unavailable. docRouterClass:
> org.apache.solr.common.cloud.ImplicitDocRouter
> at
> org.apache.solr.hadoop.SolrCloudPartitioner.getPartition(SolrCloudPartitioner.java:120)
> at
> org.apache.solr.hadoop.SolrCloudPartitioner.getPartition(SolrCloudPartitioner.java:49)
> at
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:712)
> at
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
> at
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
> at
> org.apache.solr.hadoop.morphline.MorphlineMapper$MyDocumentLoader.load(MorphlineMapper.java:138)
> at
> org.apache.solr.morphlines.solr.LoadSolrBuilder$LoadSolr.doProcess(LoadSolrBuilder.java:129)
> at
> org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
> at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
> at
> org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
> at
> org.apache.solr.morphlines.solr.SanitizeUnknownSolrFieldsBuilder$SanitizeUnknownSolrFields.doProcess(SanitizeUnknownSolrFieldsBuilder.java:94)
> at
> org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
> at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
> at
> org.kitesdk.morphline.stdio.ReadCSVBuilder$ReadCSV.doProcess(ReadCSVBuilder.java:124)
> at
> org.kitesdk.morphline.stdio.AbstractParser.doProcess(AbstractParser.java:93)
> at
> org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
> at
> org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
> at
> org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
> at
>

Problems using MapReduceIndexerTool with multiple reducers

2016-01-11 Thread Douglas Rapp

Hello,

I am using Solr 4.10.4 in SolrCloud mode, but so far with only a single
instance (so just a single shard - not very cloud-like..).

I have been experimenting using the MapReduceIndexerTool to handle batch
indexing of CSV files in HDFS. I got it working on a weaker single-node
Hadoop test system, so I have been trying to do some performance testing on
a 4-node Hadoop cluster (1 NameNode, 3 DataNode) with better hardware. The
issue that I have come across is that the job will only finish successfully
if I specify a single reducer (using the "--reducers 1" option upon
invoking the tool).

If the tool is invoked without specifying a number for mappers/reducers, it
appears that it tries to utilize the maximum number available. In my case,
it tries to use 16 mappers and 6 reducers. I have tried specifying many
different combinations, and what I have found is that I can tweak the
number of mappers to just about anything, but reducers must stay at "1" or
else the job fails. Also explains why I never saw this pop up on the first
system - looking closer at it, it defaults to only 1 reducer there. If I
try to increase it, I get the same failure. When the job fails, I get the
following stack trace:

6602 [main] WARN  org.apache.hadoop.mapred.YarnChild  - Exception running
child : org.kitesdk.morphline.api.MorphlineRuntimeException:
java.lang.IllegalStateException: No matching slice found! The slice seems
unavailable. docRouterClass: org.apache.solr.common.cloud.ImplicitDocRouter
at
org.kitesdk.morphline.base.FaultTolerance.handleException(FaultTolerance.java:73)
at
org.apache.solr.hadoop.morphline.MorphlineMapRunner.map(MorphlineMapRunner.java:213)
at
org.apache.solr.hadoop.morphline.MorphlineMapper.map(MorphlineMapper.java:86)
at
org.apache.solr.hadoop.morphline.MorphlineMapper.map(MorphlineMapper.java:54)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.IllegalStateException: No matching slice found! The
slice seems unavailable. docRouterClass:
org.apache.solr.common.cloud.ImplicitDocRouter
at
org.apache.solr.hadoop.SolrCloudPartitioner.getPartition(SolrCloudPartitioner.java:120)
at
org.apache.solr.hadoop.SolrCloudPartitioner.getPartition(SolrCloudPartitioner.java:49)
at
org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:712)
at
org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at
org.apache.solr.hadoop.morphline.MorphlineMapper$MyDocumentLoader.load(MorphlineMapper.java:138)
at
org.apache.solr.morphlines.solr.LoadSolrBuilder$LoadSolr.doProcess(LoadSolrBuilder.java:129)
at
org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
at
org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
at
org.apache.solr.morphlines.solr.SanitizeUnknownSolrFieldsBuilder$SanitizeUnknownSolrFields.doProcess(SanitizeUnknownSolrFieldsBuilder.java:94)
at
org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
at
org.kitesdk.morphline.stdio.ReadCSVBuilder$ReadCSV.doProcess(ReadCSVBuilder.java:124)
at
org.kitesdk.morphline.stdio.AbstractParser.doProcess(AbstractParser.java:93)
at
org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
at
org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
at
org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
at
org.apache.solr.hadoop.morphline.MorphlineMapRunner.map(MorphlineMapRunner.java:201)
... 10 more

When I try searching online for "No matching slice found", the only results
I get back are of the source code.. I can't seem to find anything to lead
me in the right direction.

Looking at the MapReduceIndexerTool more closely, it says that when using
more than one reducer per output shard (so in my case, >1) it will utilize
the "mtree" merge algorithm to merge the results held among several
mini-shards. I'm guessing this might have something to do with it, but I
can't find any other information on how this might be further tweaked or
debug

indexing rich data with solr 5.3

2016-01-11 Thread kostali hassan

such files msword and pdf donsnt indexing using *dataimoprt i have this
error:*

Full Import failed:java.lang.RuntimeException:
java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable
to read content Processing Document # 2
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
Caused by: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable
to read content Processing Document # 2
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:416)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
Unable to read content Processing Document # 2
at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:70)
at 
org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:168)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:475)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:514)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
... 5 more
Caused by: org.apache.tika.exception.TikaException: Unexpected
RuntimeException from
org.apache.tika.parser.microsoft.ooxml.OOXMLParser@188120
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:258)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at 
org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:162)
... 9 more
Caused by: org.apache.poi.openxml4j.exceptions.InvalidOperationException:
Can't open the specified file:
'D:\solr\solr-5.3.1\server\tmp\apache-tika-121920532070319073.tmp'
at org.apache.poi.openxml4j.opc.ZipPackage.(ZipPackage.java:112)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:224)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:69)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
... 12 more
Caused by: java.util.zip.ZipException: invalid END header (bad central
directory offset)
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.(ZipFile.java:220)
at java.util.zip.ZipFile.(ZipFile.java:150)
at java.util.zip.ZipFile.(ZipFile.java:164)
at 
org.apache.poi.openxml4j.opc.internal.ZipHelper.openZipFile(ZipHelper.java:174)
at org.apache.poi.openxml4j.opc.ZipPackage.(ZipPackage.java:110)
... 16 more

Re: Possible Bug - MDC handling in org.apache.solr.common.util.ExecutorUtil.MDCAwareThreadPoolExecutor.execut e(Runnable)

2016-01-11 Thread Chris Hostetter


: Not sure I'm onboard with the first proposed solution, but yes, I'd open a
: JIRA issue to discuss.

we should standardize the context keys to use use fully 
qualified (org.apache.solr.*) java class name prefixes -- just like we do 
with the logger names themselves.

: 
: - Mark
: 
: On Mon, Jan 11, 2016 at 4:01 AM Konstantin Hollerith 
: wrote:
: 
: > Hi,
: >
: > I'm using SLF4J MDC to log additional Information in my WebApp. Some of my
: > MDC-Parameters even include Line-Breaks.
: > It seems, that Solr takes _all_ MDC parameters and puts them into the
: > Thread-Name, see
: >
: > 
org.apache.solr.common.util.ExecutorUtil.MDCAwareThreadPoolExecutor.execute(Runnable).
: >
: > When there is some logging of Solr, the log gets cluttered:
: >
: > [11.01.16 09:14:19:170 CET] 02a3 SystemOut O 09:14:19,169
: > [zkCallback-14-thread-1-processing-My
: > Custom
: > MDC
: > Parameter ROraqiFWaoXqP21gu4uLpMh SANDHO] WARN
: > common.cloud.ConnectionManager [session=ROraqiFWaoXqP21gu4uLpMh]
: > [user=SANDHO]: zkClient received AuthFailed
: >
: > (some of my MDC-Parameters are only active in Email-Logs and are not
: > included in the file-log)
: >
: > I think this is a Bug. Solr should only put its own MDC-Parameter into the
: > Thread-Name.
: >
: > Possible Solution: Since all (as far as i can check) invocations in Solr of
: > MDC.put uses a Prefix like "ConcurrentUpdateSolrClient" or
: > "CloudSolrClient" etc., it would be possible to put a check into
: > MDCAwareThreadPoolExecutor.execute(Runnable) that process only those
: > Prefixes.
: >
: > Should i open a Jira-Issue for this?
: >
: > Thanks,
: >
: > Konstantin
: >
: > Environment: JSF-Based App with WebSphrere 8.5, Solr 5.3.0, slf4j-1.7.12,
: > all jars are in WEB-INF/lib.
: >
: -- 
: - Mark
: about.me/markrmiller
: 

-Hoss
http://www.lucidworks.com/

Re: collapse filter query

2016-01-11 Thread Joel Bernstein

I went to go work on the issue and found it was already fixed 7 weeks ago.
The bug fix is available in Solr 5.4.

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Jan 11, 2016 at 3:12 PM, Susheel Kumar 
wrote:

> You can go to https://issues.apache.org/jira/browse/SOLR/ and create Jira
> ticket after signing in.
>
> Thanks,
> Susheel
>
> On Mon, Jan 11, 2016 at 2:15 PM, sara hajili 
> wrote:
>
> > Tnx.How I can create a jira ticket?
> > On Jan 11, 2016 10:42 PM, "Joel Bernstein"  wrote:
> >
> > > I believe this is a bug. I think the reason this is occurring is that
> you
> > > have an index segment with no values at all in the collapse field. If
> you
> > > could create a jira ticket for this I will look at resolving the issue.
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Mon, Jan 11, 2016 at 2:03 PM, sara hajili 
> > > wrote:
> > >
> > > > I am using solr 5.3.1
> > > > On Jan 11, 2016 10:30 PM, "Joel Bernstein" 
> wrote:
> > > >
> > > > > Which version of Solr are you using?
> > > > >
> > > > > Joel Bernstein
> > > > > http://joelsolr.blogspot.com/
> > > > >
> > > > > On Mon, Jan 11, 2016 at 6:39 AM, sara hajili <
> hajili.s...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > hi all
> > > > > > i have a MLT query and i wanna to use collapse filter query.
> > > > > > and i wanna to use collapse expand nullPolicy.
> > > > > > in this way when i used it :
> > > > > > {!collapse field=original_post_id nullPolicy=expand}
> > > > > > i got my appropriate result .
> > > > > > (in solr web UI)
> > > > > >
> > > > > > but in regular search handler "/select",when i used
> > > > > > {!collapse field=original_post_id nullPolicy=expand}
> > > > > > i got error:
> > > > > >
> > > > > > {
> > > > > >   "responseHeader":{
> > > > > > "status":500,
> > > > > > "QTime":2,
> > > > > > "params":{
> > > > > >   "q":"*:*",
> > > > > >   "indent":"true",
> > > > > >   "fq":"{!collapse field=original_post_id
> nullPolicy=expand}",
> > > > > >   "wt":"json"}},
> > > > > >   "error":{
> > > > > > "trace":"java.lang.NullPointerException\n\tat
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.search.CollapsingQParserPlugin$IntScoreCollector.finish(CollapsingQParserPlugin.java:763)\n\tat
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:211)\n\tat
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1678)\n\tat
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1497)\n\tat
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:555)\n\tat
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:522)\n\tat
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277)\n\tat
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)\n\tat
> > > > > > org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)\n\tat
> > > > > >
> > > >
> > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)\n\tat
> > > > > >
> > > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)\n\tat
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)\n\tat
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)\n\tat
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)\n\tat
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)\n\tat
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)\n\tat
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)\n\tat
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)\n\tat
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)\n\tat
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
> > > > > >
> > > > > >
> > > > >

Re: Change leader in SolrCloud

2016-01-11 Thread Shawn Heisey

On 1/11/2016 1:23 PM, Gian Maria Ricci - aka Alkampfer wrote:
> Ok, this imply that if I have X replica of a shard, the document is indexed 
> X+1 times? one for each replica plus the leader shard? It seems to me a huge 
> wasting of resources.  
>
> In a Master/slave scenario indexing takes places only on master node, then 
> slave replicates analyzed data.  

The leader *is* a replica.  So if you have a replicationFactor of three,
you have three replicas for each shard.  For each shard, one of those
replicas gets elected to be the leader.  You do not have a leader and
two replicas.

The above is perhaps extremely pedantic, but understanding how SolrCloud
works requires understanding that being temporarily assigned the leader
role does not change how the replica works, it just adds some additional
coordination responsibilities.

To answer your question, let's assume you build an index with
replicationFactor=3.  No new replicas are added, and all machines are
up.  In that situation, each document gets indexed a total of three times.

In return for this additional complexity and resource usage, you don't
have a single point of failure for indexing.  With master/slave
replication, if your master goes down for any length of time, you must
reconfigure all of your remaining Solr nodes to change the master. 
Chances are very good that you will experience downtime.

Thanks,
Shawn

RE: Change leader in SolrCloud

2016-01-11 Thread Gian Maria Ricci - aka Alkampfer

Ok, this imply that if I have X replica of a shard, the document is indexed X+1 
times? one for each replica plus the leader shard? It seems to me a huge 
wasting of resources.  

In a Master/slave scenario indexing takes places only on master node, then 
slave replicates analyzed data.  

--
Gian Maria Ricci
Cell: +39 320 0136949



-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: lunedì 11 gennaio 2016 19:03
To: solr-user 
Subject: Re: Change leader in SolrCloud

You have to assign the preferredLeader role first. You can do that node-by-node 
via ADDREPLICAPROP or have the system do it for you with BALANCESHARDUNIQUE.

As I said before, in SolrCloud the leader forwards the raw document to each 
follower. There is no pre-processing, analysis anything else done on the leader 
first.

Best,
Erick

On Mon, Jan 11, 2016 at 9:19 AM, Gian Maria Ricci - aka Alkampfer 
 wrote:
> Thanks.
>
> This arise a different question: when I index a document, it is assigned to 
> one of the three shard based on the value of the ID field. Actually indexing 
> a document is usually a CPU and RAM intensive work to parse text, tokenize, 
> etc. How this works in SolrCloud? I probably incorrectly assumed that the 
> indexing task is carried out by the shard leader, then data is propagated to 
> replica of that shard. This lead me to think that, having all three leader 
> shards in a node, it does not use other nodes to index data and performance 
> will suffer.
>
> I've tried to use REBALANCELEADERS but nothing changes (probably because 
> there are few shards).
>
> --
> Gian Maria Ricci
> Cell: +39 320 0136949
>
>
>
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: lunedì 11 gennaio 2016 17:49
> To: solr-user@lucene.apache.org
> Subject: Re: Change leader in SolrCloud
>
> On 1/11/2016 8:45 AM, Gian Maria Ricci - aka Alkampfer wrote:
>> Due to the different reboot times probably, I’ve noticed that upon 
>> reboot all three leader shards are on a single machine. I’m expecting 
>> shard leaders to be distributed evenly between machines, because if 
>> all shard leader are on a same machine, all new documents to index 
>> will be routed to the same machine, thus indexing load is not subdivided.
>
> You're looking for the REBALANCELEADERS functionality ... but because you 
> only have three nodes, the fact that one machine has the leaders for all 
> three shards is not really a problem.
>
> https://cwiki.apache.org/confluence/display/solr/Collections+API#Colle
> ctionsAPI-RebalanceLeaders
>
> This feature was added for a use case where there are hundreds of nodes and 
> hundreds of total shards, with the leader roles heavily concentrated on a 
> small number of nodes.  With REBALANCELEADERS, the leader roles can be spread 
> more evenly around the cluster.
>
> It is true that the shard leader does do a small amount of extra work, but 
> for a very small installation like yours, the overhead is nothing to be 
> concerned about.  You can do something about it if it bothers you, though.
>
> Thanks,
> Shawn
>

multiple solr-config.xml files per core

2016-01-11 Thread techqnq


I assume distinct solr-config.xml file is allowed for every solr core, but I
got suspicious based upon the data size of the core. So thought to get my
facts confirmed/corrected here:

Q. Can Solr Server have different/multiple solr-config.xml file per core?

Use Case: 
- For one core solr-config.xml file: it is configured with UIMA update
processor i.e. "updateRequestProcessorChain" 


  uima

  

- For second core solr-config-xml file: it is kept default/standard as it is
(no uima update)





--
View this message in context: 
http://lucene.472066.n3.nabble.com/multiple-solr-config-xml-files-per-core-tp4250009.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: WArning in SolrCloud logs

2016-01-11 Thread Gian Maria Ricci - aka Alkampfer

Actually that is a collection I've created uploading into Zookeeper a 
configuration I used for single node, with a replication handler activated to 
backup the core. I did not send any master/slave config actually, I just 
created the collection using collection API and the warning is immediately 
there.

--
Gian Maria Ricci
Cell: +39 320 0136949



-Original Message-
From: Alessandro Benedetti [mailto:abenede...@apache.org] 
Sent: lunedì 11 gennaio 2016 17:52
To: solr-user@lucene.apache.org
Subject: Re: WArning in SolrCloud logs

To be honest it seems to me more a wrong usage of java environment variables.
Is it possible you are sending the enable master/slave config to the node ?
Strictly talking about the replication request handler, it is required for 
SolrCloud ( there are scenarios where old style replication is still used) .
But this supposes to happen automatically .

Strip of code that cause the warning :

if (enableMaster || enableSlave) {
> if (core.getCoreDescriptor().getCoreContainer().getZkController() != 
> null) { LOG.warn("SolrCloud is enabled for core " + core.getName() + " 
> but so is old-style replication. Make sure you" + " intend this 
> behavior, it usually indicates a mis-configuration. Master setting is 
> " +
> Boolean.toString(enableMaster) + " and slave setting is " + Boolean.
> toString(enableSlave));
> }
> }


Cheers

On 11 January 2016 at 15:08, Gian Maria Ricci - aka Alkampfer < 
alkamp...@nablasoft.com> wrote:

> I’ve configured three node in solrcloud, everything seems ok, but in 
> the log I see this kind of warning
>
>
>
> SolrCloud is enabled for core xxx_shard3_replica1 but so is old-style 
> replication. Make sure you intend this behavior, it usually indicates 
> a mis-configuration. Master setting is true and slave setting is false
>
>
>
>
>
>
>
>
>
>
>
> What could be the reason? Is it possible that this happens because 
> solrconfig.xml used to create the collection has a replication handler 
> active?
>
> --
> Gian Maria Ricci
> Cell: +39 320 0136949
>
> [image:
> https://ci5.googleusercontent.com/proxy/5oNMOYAeFXZ_LDKanNfoLRHC37mAZk
> VVhkPN7QxMdA0K5JW2m0bm8azJe7oWZMNt8fKHNX1bzrUTd-kIyE40CmwT2Mlf8OI=s0-d
> -e1-ft#http://www.codewrecks.com/files/signature/mvp.png]
>  [image:
> https://ci3.googleusercontent.com/proxy/f-unQbmk6NtkHFspO5Y6x4jlIf_xrm
> GLUT3fU9y_7VUHSFUjLs7aUIMdZQYTh3eWIA0sBnvNX3WGXCU59chKXLuAHi2ArWdAcBcl
> KA=s0-d-e1-ft#http://www.codewrecks.com/files/signature/linkedin.jpg]
>  [image:
> https://ci3.googleusercontent.com/proxy/gjapMzu3KEakBQUstx_-cN7gHJ_Gpc
> IZNEPjCzOYMrPl-r1DViPE378qNAQyEWbXMTj6mcduIAGaApe9qHG1KN_hyFxQAIkdNSVT
> =s0-d-e1-ft#http://www.codewrecks.com/files/signature/twitter.jpg]
>  [image:
> https://ci5.googleusercontent.com/proxy/iuDOD2sdaxRDvTwS8MO7-CcXchpNJX
> 96uaWuvagoVLcjpAPsJi88XeOonE4vHT6udVimo7yL9ZtdrYueEfH7jXnudmi_Vvw=s0-d
> -e1-ft#http://www.codewrecks.com/files/signature/rss.jpg]
>  [image:
> https://ci6.googleusercontent.com/proxy/EBJjfkBzcsSlAzlyR88y86YXcwaKfn
> 3x7ydAObL1vtjJYclQr_l5TvrFx4PQ5qLNYW3yp7Ig66DJ-0tPJCDbDmYAFcamPQehwg=s
> 0-d-e1-ft#http://www.codewrecks.com/files/signature/skype.jpg]
>
>
>



--
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

Re: Kerberos ticket not renewing when storing index on Kerberized HDFS

2016-01-11 Thread Ishan Chattopadhyaya

Not sure how reliably renewals are taken care of in the context of
kerberized HDFS, but here's my 10-15 minute analysis.
Seems to me that the auto renewal thread is not spawned [0]. This relies on
kinit.
Not sure if having a login configuration with renewTGT is sufficient (which
seems to be passed in by default, unless there's a jaas config being
explicitly passed in with renewTGT=false). As per the last comments from
Devraj & Owen [1] kinit based logins have worked more reliably.

If you can rule out any setup issues, I suggest you file a JIRA and someone
who has worked on the HdfsDirectoryFactory would be able to suggest better.
Thanks,
Ishan

[0] -
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-common/2.7.1/org/apache/hadoop/security/UserGroupInformation.java#UserGroupInformation.spawnAutoRenewalThreadForUserCreds%28%29

[1] - https://issues.apache.org/jira/browse/HADOOP-6656

On Fri, Jan 8, 2016 at 10:21 PM, Andrew Bumstead <
andrew.bumst...@bigdatapartnership.com> wrote:

> Hello,
>
> I have Solr Cloud configured to stores its index files on a Kerberized HDFS
> (I followed documentation at
> https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS),
> and
> have been able to index some documents with the files being written to the
> HDFS as expected. However, it appears that some time after starting, Solr
> is unable to connect to HDFS as it no longer has a valid Kerberos TGT. The
> time-frame of this occurring is consistent with my default Kerberos ticket
> lifetime of 24 hours, so it appears as though Solr is not renewing its
> Kerberos ticket upon expiry. A restart of Solr resolves the issue again for
> 24 hours.
>
> Is there any configuration I can add to make Solr automatically renew its
> ticket or is this an issue with Solr?
>
> The following is the stack trace I am getting in Solr.
>
> java.io.IOException: Failed on local exception: java.io.IOException:
> Couldn't setup connection for solr/sandbox.hortonworks@hortonworks.com
> to sandbox.hortonworks.com/10.0.2.15:8020; Host Details : local host is: "
> sandbox.hortonworks.com/10.0.2.15"; destination host is: "
> sandbox.hortonworks.com":8020;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
> at org.apache.hadoop.ipc.Client.call(Client.java:1472)
> at org.apache.hadoop.ipc.Client.call(Client.java:1399)
> at
>
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy10.renewLease(Unknown Source)
> at
>
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:571)
> at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
> at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
>
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at
>
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy11.renewLease(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:879)
> at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:417)
> at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:442)
> at
> org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
> at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:298)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Couldn't setup connection for solr/
> sandbox.hortonworks@hortonworks.com to
> sandbox.hortonworks.com/10.0.2.15:8020
> at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:672)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
>
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at
>
> org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:643)
> at
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:730)
> at
> org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
> at org.apache.hadoop.ipc.Client.call(Client.java:1438)
> ... 16 more
> Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused
> by GSSException: No valid credentials provided (Mechanism level: Failed to
> find any Kerberos tgt)]
> at
>
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
> at
>
> org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:413)
> at
>
> org.apache.hadoop.ipc.Client$Connection

Re: collapse filter query

2016-01-11 Thread Susheel Kumar

You can go to https://issues.apache.org/jira/browse/SOLR/ and create Jira
ticket after signing in.

Thanks,
Susheel

On Mon, Jan 11, 2016 at 2:15 PM, sara hajili  wrote:

> Tnx.How I can create a jira ticket?
> On Jan 11, 2016 10:42 PM, "Joel Bernstein"  wrote:
>
> > I believe this is a bug. I think the reason this is occurring is that you
> > have an index segment with no values at all in the collapse field. If you
> > could create a jira ticket for this I will look at resolving the issue.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Mon, Jan 11, 2016 at 2:03 PM, sara hajili 
> > wrote:
> >
> > > I am using solr 5.3.1
> > > On Jan 11, 2016 10:30 PM, "Joel Bernstein"  wrote:
> > >
> > > > Which version of Solr are you using?
> > > >
> > > > Joel Bernstein
> > > > http://joelsolr.blogspot.com/
> > > >
> > > > On Mon, Jan 11, 2016 at 6:39 AM, sara hajili 
> > > > wrote:
> > > >
> > > > > hi all
> > > > > i have a MLT query and i wanna to use collapse filter query.
> > > > > and i wanna to use collapse expand nullPolicy.
> > > > > in this way when i used it :
> > > > > {!collapse field=original_post_id nullPolicy=expand}
> > > > > i got my appropriate result .
> > > > > (in solr web UI)
> > > > >
> > > > > but in regular search handler "/select",when i used
> > > > > {!collapse field=original_post_id nullPolicy=expand}
> > > > > i got error:
> > > > >
> > > > > {
> > > > >   "responseHeader":{
> > > > > "status":500,
> > > > > "QTime":2,
> > > > > "params":{
> > > > >   "q":"*:*",
> > > > >   "indent":"true",
> > > > >   "fq":"{!collapse field=original_post_id nullPolicy=expand}",
> > > > >   "wt":"json"}},
> > > > >   "error":{
> > > > > "trace":"java.lang.NullPointerException\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.search.CollapsingQParserPlugin$IntScoreCollector.finish(CollapsingQParserPlugin.java:763)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:211)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1678)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1497)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:555)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:522)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)\n\tat
> > > > > org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)\n\tat
> > > > >
> > >
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)\n\tat
> > > > >
> > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)\n\tat
>

Re: collapse filter query

2016-01-11 Thread Joel Bernstein

I'll create it later today and update this thread with the Jira number.

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Jan 11, 2016 at 2:15 PM, sara hajili  wrote:

> Tnx.How I can create a jira ticket?
> On Jan 11, 2016 10:42 PM, "Joel Bernstein"  wrote:
>
> > I believe this is a bug. I think the reason this is occurring is that you
> > have an index segment with no values at all in the collapse field. If you
> > could create a jira ticket for this I will look at resolving the issue.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Mon, Jan 11, 2016 at 2:03 PM, sara hajili 
> > wrote:
> >
> > > I am using solr 5.3.1
> > > On Jan 11, 2016 10:30 PM, "Joel Bernstein"  wrote:
> > >
> > > > Which version of Solr are you using?
> > > >
> > > > Joel Bernstein
> > > > http://joelsolr.blogspot.com/
> > > >
> > > > On Mon, Jan 11, 2016 at 6:39 AM, sara hajili 
> > > > wrote:
> > > >
> > > > > hi all
> > > > > i have a MLT query and i wanna to use collapse filter query.
> > > > > and i wanna to use collapse expand nullPolicy.
> > > > > in this way when i used it :
> > > > > {!collapse field=original_post_id nullPolicy=expand}
> > > > > i got my appropriate result .
> > > > > (in solr web UI)
> > > > >
> > > > > but in regular search handler "/select",when i used
> > > > > {!collapse field=original_post_id nullPolicy=expand}
> > > > > i got error:
> > > > >
> > > > > {
> > > > >   "responseHeader":{
> > > > > "status":500,
> > > > > "QTime":2,
> > > > > "params":{
> > > > >   "q":"*:*",
> > > > >   "indent":"true",
> > > > >   "fq":"{!collapse field=original_post_id nullPolicy=expand}",
> > > > >   "wt":"json"}},
> > > > >   "error":{
> > > > > "trace":"java.lang.NullPointerException\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.search.CollapsingQParserPlugin$IntScoreCollector.finish(CollapsingQParserPlugin.java:763)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:211)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1678)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1497)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:555)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:522)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)\n\tat
> > > > > org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)\n\tat
> > > > >
> > >
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)\n\tat
> > > > >
> > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)\n\tat
> > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)\n\tat

Re: collection configuration stored in Zoo Keeper with solrCloud

2016-01-11 Thread Erick Erickson

Do be a little careful though. The sample zookeeper config
that comes with an Apache install of Zookeeper defaults
to storing the data in /tmp/zookeeper which is _not_ a place
you want persistent data on *nix systems. Note, this is _not_
the default for embedded Zookeeper in Solr.

And the other thing to check, depending on the age of your
zookeeper is whether it purges snapshots...

Best,
Erick

On Mon, Jan 11, 2016 at 10:59 AM, Jeff Courtade  wrote:
> Yes its stored in the directories configured in zoo.cfg
>
> .Jeff Courtade
> M: 240.507.6116
> On Jan 11, 2016 1:16 PM, "Jim Shi"  wrote:
>
>> Hi, I have question regarding collection configurations stored Zoo Keeper
>> with solrCloud.
>> All collection configurations are stored at Zoo Keeper. What happens if
>> you want to restart all Zoo Keeper instances? Does the Zoo Keeper persists
>> data on disk and can restore all configurations from disk?

Re: collapse filter query

2016-01-11 Thread sara hajili

Tnx.How I can create a jira ticket?
On Jan 11, 2016 10:42 PM, "Joel Bernstein"  wrote:

> I believe this is a bug. I think the reason this is occurring is that you
> have an index segment with no values at all in the collapse field. If you
> could create a jira ticket for this I will look at resolving the issue.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, Jan 11, 2016 at 2:03 PM, sara hajili 
> wrote:
>
> > I am using solr 5.3.1
> > On Jan 11, 2016 10:30 PM, "Joel Bernstein"  wrote:
> >
> > > Which version of Solr are you using?
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Mon, Jan 11, 2016 at 6:39 AM, sara hajili 
> > > wrote:
> > >
> > > > hi all
> > > > i have a MLT query and i wanna to use collapse filter query.
> > > > and i wanna to use collapse expand nullPolicy.
> > > > in this way when i used it :
> > > > {!collapse field=original_post_id nullPolicy=expand}
> > > > i got my appropriate result .
> > > > (in solr web UI)
> > > >
> > > > but in regular search handler "/select",when i used
> > > > {!collapse field=original_post_id nullPolicy=expand}
> > > > i got error:
> > > >
> > > > {
> > > >   "responseHeader":{
> > > > "status":500,
> > > > "QTime":2,
> > > > "params":{
> > > >   "q":"*:*",
> > > >   "indent":"true",
> > > >   "fq":"{!collapse field=original_post_id nullPolicy=expand}",
> > > >   "wt":"json"}},
> > > >   "error":{
> > > > "trace":"java.lang.NullPointerException\n\tat
> > > >
> > > >
> > >
> >
> org.apache.solr.search.CollapsingQParserPlugin$IntScoreCollector.finish(CollapsingQParserPlugin.java:763)\n\tat
> > > >
> > > >
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:211)\n\tat
> > > >
> > > >
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1678)\n\tat
> > > >
> > > >
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1497)\n\tat
> > > >
> > > >
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:555)\n\tat
> > > >
> > > >
> > >
> >
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:522)\n\tat
> > > >
> > > >
> > >
> >
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277)\n\tat
> > > >
> > > >
> > >
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)\n\tat
> > > > org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)\n\tat
> > > >
> > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)\n\tat
> > > >
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)\n\tat
> > > >
> > > >
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)\n\tat
> > > >
> > > >
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)\n\tat
> > > >
> > > >
> > >
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)\n\tat
> > > >
> > > >
> > >
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)\n\tat
> > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
> > > >
> > > >
> > >
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)\n\tat
> > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)\n\tat
> > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)\n\tat
> > > >
> > > >
> > >
> >
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)\n\tat
> > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
> > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)\n\tat
> > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
> > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)\n\tat
> > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)\n\tat
> > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)\n\tat
> > > > org.eclipse.jetty.server.Server.handle(Server.java:499)\n\tat
> > > >
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)\n\tat
> > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)\n\tat
> > > >
> > > >
> > >
> >
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)\n\tat
> > > >
> > > >
> > >
> >
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)\n\tat
> > > >
> > > >
> > >
> >
> org.eclipse.jetty.util.thread.QueuedThr

Re: collapse filter query

2016-01-11 Thread Joel Bernstein

I believe this is a bug. I think the reason this is occurring is that you
have an index segment with no values at all in the collapse field. If you
could create a jira ticket for this I will look at resolving the issue.

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Jan 11, 2016 at 2:03 PM, sara hajili  wrote:

> I am using solr 5.3.1
> On Jan 11, 2016 10:30 PM, "Joel Bernstein"  wrote:
>
> > Which version of Solr are you using?
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Mon, Jan 11, 2016 at 6:39 AM, sara hajili 
> > wrote:
> >
> > > hi all
> > > i have a MLT query and i wanna to use collapse filter query.
> > > and i wanna to use collapse expand nullPolicy.
> > > in this way when i used it :
> > > {!collapse field=original_post_id nullPolicy=expand}
> > > i got my appropriate result .
> > > (in solr web UI)
> > >
> > > but in regular search handler "/select",when i used
> > > {!collapse field=original_post_id nullPolicy=expand}
> > > i got error:
> > >
> > > {
> > >   "responseHeader":{
> > > "status":500,
> > > "QTime":2,
> > > "params":{
> > >   "q":"*:*",
> > >   "indent":"true",
> > >   "fq":"{!collapse field=original_post_id nullPolicy=expand}",
> > >   "wt":"json"}},
> > >   "error":{
> > > "trace":"java.lang.NullPointerException\n\tat
> > >
> > >
> >
> org.apache.solr.search.CollapsingQParserPlugin$IntScoreCollector.finish(CollapsingQParserPlugin.java:763)\n\tat
> > >
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:211)\n\tat
> > >
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1678)\n\tat
> > >
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1497)\n\tat
> > >
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:555)\n\tat
> > >
> > >
> >
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:522)\n\tat
> > >
> > >
> >
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277)\n\tat
> > >
> > >
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)\n\tat
> > > org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)\n\tat
> > >
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)\n\tat
> > > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)\n\tat
> > >
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)\n\tat
> > >
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)\n\tat
> > >
> > >
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)\n\tat
> > >
> > >
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)\n\tat
> > >
> > >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
> > >
> > >
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)\n\tat
> > >
> > >
> >
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)\n\tat
> > >
> > >
> >
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)\n\tat
> > >
> > >
> >
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)\n\tat
> > >
> > >
> >
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
> > >
> > >
> >
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)\n\tat
> > >
> > >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
> > >
> > >
> >
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)\n\tat
> > >
> > >
> >
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)\n\tat
> > >
> > >
> >
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)\n\tat
> > > org.eclipse.jetty.server.Server.handle(Server.java:499)\n\tat
> > > org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)\n\tat
> > >
> > >
> >
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)\n\tat
> > >
> > >
> >
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)\n\tat
> > >
> > >
> >
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)\n\tat
> > >
> > >
> >
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)\n\tat
> > > java.lang.Thread.run(Thread.java:745)\n",
> > > "code":500}}
> > >
> > >
> > > but if i change nullpolicy to ignore or collapse
> > >
> > > no error happened.
> > >
> > >
> > > and i wondered when i used pysolr to create  a more like this query
> > >
> > > i get above error again.
> > >
> > > so it seems nullpolicy=expand work me just in more like this query in
> > > solr web UI
> > >
> > > and

Re: collapse filter query

2016-01-11 Thread sara hajili

I am using solr 5.3.1
On Jan 11, 2016 10:30 PM, "Joel Bernstein"  wrote:

> Which version of Solr are you using?
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, Jan 11, 2016 at 6:39 AM, sara hajili 
> wrote:
>
> > hi all
> > i have a MLT query and i wanna to use collapse filter query.
> > and i wanna to use collapse expand nullPolicy.
> > in this way when i used it :
> > {!collapse field=original_post_id nullPolicy=expand}
> > i got my appropriate result .
> > (in solr web UI)
> >
> > but in regular search handler "/select",when i used
> > {!collapse field=original_post_id nullPolicy=expand}
> > i got error:
> >
> > {
> >   "responseHeader":{
> > "status":500,
> > "QTime":2,
> > "params":{
> >   "q":"*:*",
> >   "indent":"true",
> >   "fq":"{!collapse field=original_post_id nullPolicy=expand}",
> >   "wt":"json"}},
> >   "error":{
> > "trace":"java.lang.NullPointerException\n\tat
> >
> >
> org.apache.solr.search.CollapsingQParserPlugin$IntScoreCollector.finish(CollapsingQParserPlugin.java:763)\n\tat
> >
> >
> org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:211)\n\tat
> >
> >
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1678)\n\tat
> >
> >
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1497)\n\tat
> >
> >
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:555)\n\tat
> >
> >
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:522)\n\tat
> >
> >
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277)\n\tat
> >
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)\n\tat
> > org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)\n\tat
> > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)\n\tat
> > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)\n\tat
> >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)\n\tat
> >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)\n\tat
> >
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)\n\tat
> >
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)\n\tat
> >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
> >
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)\n\tat
> >
> >
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)\n\tat
> >
> >
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)\n\tat
> >
> >
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)\n\tat
> >
> >
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
> >
> >
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)\n\tat
> >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
> >
> >
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)\n\tat
> >
> >
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)\n\tat
> >
> >
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)\n\tat
> > org.eclipse.jetty.server.Server.handle(Server.java:499)\n\tat
> > org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)\n\tat
> >
> >
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)\n\tat
> >
> >
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)\n\tat
> >
> >
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)\n\tat
> >
> >
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)\n\tat
> > java.lang.Thread.run(Thread.java:745)\n",
> > "code":500}}
> >
> >
> > but if i change nullpolicy to ignore or collapse
> >
> > no error happened.
> >
> >
> > and i wondered when i used pysolr to create  a more like this query
> >
> > i get above error again.
> >
> > so it seems nullpolicy=expand work me just in more like this query in
> > solr web UI
> >
> > and my question is how to solve my problem to use nullpolicy=expand in
> > pysolr?
> >
> > tnx
> >
>

Re: collapse filter query

2016-01-11 Thread Joel Bernstein

Which version of Solr are you using?

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Jan 11, 2016 at 6:39 AM, sara hajili  wrote:

> hi all
> i have a MLT query and i wanna to use collapse filter query.
> and i wanna to use collapse expand nullPolicy.
> in this way when i used it :
> {!collapse field=original_post_id nullPolicy=expand}
> i got my appropriate result .
> (in solr web UI)
>
> but in regular search handler "/select",when i used
> {!collapse field=original_post_id nullPolicy=expand}
> i got error:
>
> {
>   "responseHeader":{
> "status":500,
> "QTime":2,
> "params":{
>   "q":"*:*",
>   "indent":"true",
>   "fq":"{!collapse field=original_post_id nullPolicy=expand}",
>   "wt":"json"}},
>   "error":{
> "trace":"java.lang.NullPointerException\n\tat
>
> org.apache.solr.search.CollapsingQParserPlugin$IntScoreCollector.finish(CollapsingQParserPlugin.java:763)\n\tat
>
> org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:211)\n\tat
>
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1678)\n\tat
>
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1497)\n\tat
>
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:555)\n\tat
>
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:522)\n\tat
>
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277)\n\tat
>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)\n\tat
> org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)\n\tat
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)\n\tat
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)\n\tat
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)\n\tat
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)\n\tat
>
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)\n\tat
>
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)\n\tat
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
>
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)\n\tat
>
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)\n\tat
>
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)\n\tat
>
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)\n\tat
>
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
>
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)\n\tat
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
>
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)\n\tat
>
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)\n\tat
>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)\n\tat
> org.eclipse.jetty.server.Server.handle(Server.java:499)\n\tat
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)\n\tat
>
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)\n\tat
>
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)\n\tat
>
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)\n\tat
>
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)\n\tat
> java.lang.Thread.run(Thread.java:745)\n",
> "code":500}}
>
>
> but if i change nullpolicy to ignore or collapse
>
> no error happened.
>
>
> and i wondered when i used pysolr to create  a more like this query
>
> i get above error again.
>
> so it seems nullpolicy=expand work me just in more like this query in
> solr web UI
>
> and my question is how to solve my problem to use nullpolicy=expand in
> pysolr?
>
> tnx
>

Re: collection configuration stored in Zoo Keeper with solrCloud

2016-01-11 Thread Jeff Courtade

Yes its stored in the directories configured in zoo.cfg

.Jeff Courtade
M: 240.507.6116
On Jan 11, 2016 1:16 PM, "Jim Shi"  wrote:

> Hi, I have question regarding collection configurations stored Zoo Keeper
> with solrCloud.
> All collection configurations are stored at Zoo Keeper. What happens if
> you want to restart all Zoo Keeper instances? Does the Zoo Keeper persists
> data on disk and can restore all configurations from disk?

Re: Solr /export handler is exporting only unique values from multivalued field?

2016-01-11 Thread Joel Bernstein

Perhaps you can achieve what you're trying to do with a prefix on the data
so the sort is maintained and duplicates are not eliminated.

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Jan 11, 2016 at 1:42 PM, Joel Bernstein  wrote:

> The /export handler is using DocValues for export, which stores the
> multi-value fields as a sorted set. So the sorting is the expected
> behavior. If you have duplicates in the multi-value field this could
> account for the list being of different sizes.
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, Jan 11, 2016 at 1:25 PM, Alok Bhandari <
> alokomprakashbhand...@gmail.com> wrote:
>
>> Hello ,
>>
>> I am using solr /export handler to export search results and it is
>> performing well.
>> Today I faced an issue , actually there are 2 multivalued fields I am
>> fetching lets say  which holds list of items and  which
>> holds
>> list of sellers.
>>
>> here I am storing information such that seller for 1st item is 1st seller
>> from seller list ,
>> item1--1
>> item2--1
>> item3--2
>>
>> I am expecting these 2 lists of same size.
>> When I export I get 3 entries in item list but only 2 entries in seller
>> list. Also these entries are sorted so it is not giving me the expected
>> results. So I cant find seller for item1,item2
>>
>> This is priority for me and I am stuck , please can someone help.
>>
>> I am using solr 5.2
>>
>> Thanks,
>> Alok
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Solr-export-handler-is-exporting-only-unique-values-from-multivalued-field-tp4249986.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>

Re: Solr /export handler is exporting only unique values from multivalued field?

2016-01-11 Thread Joel Bernstein

The /export handler is using DocValues for export, which stores the
multi-value fields as a sorted set. So the sorting is the expected
behavior. If you have duplicates in the multi-value field this could
account for the list being of different sizes.



Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Jan 11, 2016 at 1:25 PM, Alok Bhandari <
alokomprakashbhand...@gmail.com> wrote:

> Hello ,
>
> I am using solr /export handler to export search results and it is
> performing well.
> Today I faced an issue , actually there are 2 multivalued fields I am
> fetching lets say  which holds list of items and  which holds
> list of sellers.
>
> here I am storing information such that seller for 1st item is 1st seller
> from seller list ,
> item1--1
> item2--1
> item3--2
>
> I am expecting these 2 lists of same size.
> When I export I get 3 entries in item list but only 2 entries in seller
> list. Also these entries are sorted so it is not giving me the expected
> results. So I cant find seller for item1,item2
>
> This is priority for me and I am stuck , please can someone help.
>
> I am using solr 5.2
>
> Thanks,
> Alok
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-export-handler-is-exporting-only-unique-values-from-multivalued-field-tp4249986.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

Re: collection configuration stored in Zoo Keeper with solrCloud

2016-01-11 Thread Shawn Heisey

On 1/11/2016 11:13 AM, Jim Shi wrote:
> Hi, I have question regarding collection configurations stored Zoo Keeper 
> with solrCloud.
> All collection configurations are stored at Zoo Keeper. What happens if you 
> want to restart all Zoo Keeper instances? Does the Zoo Keeper persists data 
> on disk and can restore all configurations from disk?

Typically Zookeeper does store its database on disk.  I do not know
whether there are config options that would change this behavior.  If
there are, then I hope that you would be aware that they have been used.

If you have at least three zookeeper nodes, then you can restart them
one at a time and as long as they come up properly and your SolrCloud
installation is properly configured to use all of the zookeeper nodes,
it will not experience any problems.  A zookeeper ensemble with fewer
than three nodes is not fault tolerant.  Solr's zkHost parameter must
include all of the zookeeper nodes in the ensemble.

For stability, we recommend using standalone zookeeper software
(separate from Solr), with typical zookeeper defaults.  The amount of
support this mailing list can provide for zookeeper is minimal, but we
will attempt to help with simple problems.  The zookeeper project has
its own mailing lists.

Thanks,
Shawn

Solr /export handler is exporting only unique values from multivalued field?

2016-01-11 Thread Alok Bhandari

Hello ,

I am using solr /export handler to export search results and it is
performing well.
Today I faced an issue , actually there are 2 multivalued fields I am
fetching lets say  which holds list of items and  which holds
list of sellers.

here I am storing information such that seller for 1st item is 1st seller
from seller list , 
item1--1
item2--1
item3--2 

I am expecting these 2 lists of same size.
When I export I get 3 entries in item list but only 2 entries in seller
list. Also these entries are sorted so it is not giving me the expected
results. So I cant find seller for item1,item2

This is priority for me and I am stuck , please can someone help.

I am using solr 5.2

Thanks,
Alok



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-export-handler-is-exporting-only-unique-values-from-multivalued-field-tp4249986.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: WArning in SolrCloud logs

2016-01-11 Thread Shawn Heisey

On 1/11/2016 8:08 AM, Gian Maria Ricci - aka Alkampfer wrote:
>
> I've configured three node in solrcloud, everything seems ok, but in
> the log I see this kind of warning
>
>  
>
> SolrCloud is enabled for core xxx_shard3_replica1 but so is old-style
> replication. Make sure you intend this behavior, it usually indicates
> a mis-configuration. Master setting is true and slave setting is false
>
>   
>
>  
>
>   
>
>  
>
>   
>
>  
>
>   
>
>  
>
>  
>
> What could be the reason? Is it possible that this happens because
> solrconfig.xml used to create the collection has a replication handler
> active?
>

The message is being logged because the solrconfig.xml file for the
collection includes an explicit replication handler definition, most
likely with a name of "/replication".  If your Solr version is new
enough, you do not need to define the replication handler in your
configuration -- it will be automatically configured.

The message is a warning.  If SolrCloud is working, then you can put it
on a list to look at later ... but be sure you DO look at it later, and
remove the explicit config once you're on a new enough version.  I think
5.0 includes the implicit handlers, so any 5.x version is new enough. 
The configs included with 5.x would not have this handler, so I think
you must be using configs based on examples for an older version of Solr.

Thanks,
Shawn

collection configuration stored in Zoo Keeper with solrCloud

2016-01-11 Thread Jim Shi

Hi, I have question regarding collection configurations stored Zoo Keeper with 
solrCloud.
All collection configurations are stored at Zoo Keeper. What happens if you 
want to restart all Zoo Keeper instances? Does the Zoo Keeper persists data on 
disk and can restore all configurations from disk?

Re: Change leader in SolrCloud

2016-01-11 Thread Erick Erickson

You have to assign the preferredLeader role first. You can do that
node-by-node via ADDREPLICAPROP or have the system do it for you with
BALANCESHARDUNIQUE.

As I said before, in SolrCloud the leader forwards the raw document to
each follower. There is no pre-processing, analysis anything else done
on the leader first.

Best,
Erick

On Mon, Jan 11, 2016 at 9:19 AM, Gian Maria Ricci - aka Alkampfer
 wrote:
> Thanks.
>
> This arise a different question: when I index a document, it is assigned to 
> one of the three shard based on the value of the ID field. Actually indexing 
> a document is usually a CPU and RAM intensive work to parse text, tokenize, 
> etc. How this works in SolrCloud? I probably incorrectly assumed that the 
> indexing task is carried out by the shard leader, then data is propagated to 
> replica of that shard. This lead me to think that, having all three leader 
> shards in a node, it does not use other nodes to index data and performance 
> will suffer.
>
> I've tried to use REBALANCELEADERS but nothing changes (probably because 
> there are few shards).
>
> --
> Gian Maria Ricci
> Cell: +39 320 0136949
>
>
>
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: lunedì 11 gennaio 2016 17:49
> To: solr-user@lucene.apache.org
> Subject: Re: Change leader in SolrCloud
>
> On 1/11/2016 8:45 AM, Gian Maria Ricci - aka Alkampfer wrote:
>> Due to the different reboot times probably, I’ve noticed that upon
>> reboot all three leader shards are on a single machine. I’m expecting
>> shard leaders to be distributed evenly between machines, because if
>> all shard leader are on a same machine, all new documents to index
>> will be routed to the same machine, thus indexing load is not subdivided.
>
> You're looking for the REBALANCELEADERS functionality ... but because you 
> only have three nodes, the fact that one machine has the leaders for all 
> three shards is not really a problem.
>
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RebalanceLeaders
>
> This feature was added for a use case where there are hundreds of nodes and 
> hundreds of total shards, with the leader roles heavily concentrated on a 
> small number of nodes.  With REBALANCELEADERS, the leader roles can be spread 
> more evenly around the cluster.
>
> It is true that the shard leader does do a small amount of extra work, but 
> for a very small installation like yours, the overhead is nothing to be 
> concerned about.  You can do something about it if it bothers you, though.
>
> Thanks,
> Shawn
>

Re: Pro and cons of using Solr Cloud vs standard Master Slave Replica

2016-01-11 Thread Shawn Heisey

On 1/11/2016 4:28 AM, Gian Maria Ricci - aka Alkampfer wrote:
> a customer need a comprehensive list of all pro and cons of using
> standard Master Slave replica VS using Solr Cloud. I’m interested
> especially in query performance consideration, because in this
> specific situation the rate of new documents is really slow, but the
> amount of data is about 50 millions of document, and the index size on
> disk for single core is about 30 GB.

The primary advantage to SolrCloud is that SolrCloud handles most of the
administrative and operational details for you automatically.

SolrCloud is a little more complicated to set up initially, because you
must worry about Zookeeper as well as Solr, but once it's properly set
up, there is no single point of failure.

> Such amount of data should be easily handled by a Master Slave replica
> with a  single core replicated on a certain number of slaves, but we
> need to evaluate also the option of SolrCloud, especially for fault
> tolerance.
>

Once you're beyond initial setup, fault tolerance with SolrCloud is much
easier than master/slave replication.  Switching a slave to a master is
possible, but the procedure is somewhat complicated.  SolrCloud does not
*have* masters, it is a true cluster.

With master/slave replication, the master handles all indexing, and the
finished index segments are copied to the slaves via HTTP, and the
slaves simply need to open them.  SolrCloud does indexing on all shard
replicas, nearly simultaneously.  Usually this is an advantage, not a
disadvantage, but in heavy indexing situations master/slave replication
*might* show better performance on the slaves.

Thanks,
Shawn

Re: Querying only replica's

2016-01-11 Thread Robert Brown


We won't be using SolrJ, etc. anytime soon unfortunately.

We'll be using a hardware load-balancer to send requests into the 
cloud/pool of servers.


The LB therefore needs to know when a node is down, otherwise a query 
wouldn't get anywhere.


The solr.PingRequestHandler is what I was after.




On 01/11/2016 05:16 PM, Alessandro Benedetti wrote:

mmm i think there is a misconception here :

On 10 January 2016 at 19:00, Robert Brown  wrote:


I'm thinking more about how the external load-balancer will know if a node
is down, as to take it out the pool of active servers to even attempt
sending a query to.


This is SolrCloud responsibility and in particular Zookeeper knows the
topology of the cluster.
A query will not reach a dead node.
You should use a SolrCloud aware client ( like the SolrJ one) .

If you want to use a different load-balancer because you don't like the
SolrCloud one, it will not be that easy, because the distribution of the
queries happens automatically.

Cheers


I could ping tho that just means the IP is alive.  I could configure the
load-balancer to actually try a query, but this may be (even a tiny)
performance hit.

Is there another recommended way of configuring external load-balancers to
know when a node is not accepting queries?




On 10/01/16 18:25, Erick Erickson wrote:


For health checks, you can go ahead and get the real IP addresses and
ping them directly if you care to Or just let Zookeeper do that
for you. One of the tasks of Zookeeper is pinging all the machines
with all the replicas and, if any of them are unreachable, telling the
rest of the cluster that that machine is down.

Best,
Erick

On Sun, Jan 10, 2016 at 5:19 AM, Robert Brown 
wrote:


Thanks Erick,

For the health-checks on the load-balancer side, would you recommend a
simple query, or is there a reliable ping or similar for this scenario?

Cheers,
Rob


On 09/01/16 23:44, Erick Erickson wrote:


bq: is it best/good to get the CLUSTERSTATUS via the collection API
and explicitly send queries to a replica to ensure I don't send
queries to the leaders of my collection

In a word _no_. SolrCloud is vastly different than the old
master/slave. In SolrCloud, each and every node (leader and replicas)
index all the docs and serve queries. The additional burden the leader
has is actually very small. There's absolutely no reason to _not_ use
the leader to serve queries.

As far as sending updates, there would be a _little_ benefit to
sending the updates directly to the leader, but _far_ more benefit in
using SolrJ. If you use SolrJ (and CloudSolrClient), then the
documents are split up on the _client_ and only the docs for a
particular shard are automatically sent to the leader for that shard.
Using SolrJ you can essentially scale indexing linearly with the
number of shards you have. Just using HTTP does not scale linearly.
Your particular app may not care, but in high-throughput situations
this can be significant.

So rather than spend time and effort sending updates directly to a
leader and have the leader then forward the docs to the correct shard,
I recommend investing the time in using SolrJ for updates rather than
sending updates to the leader over HTTP. Or just ignore the problem
and devote your efforts to something that are more valuable.

So in short:
1> just stick a load balancer in front of _all_ your Solr nodes for
queries. And note that there's an internal load balancer already in
Solr that routes things around anyway, although putting a load
balancer in front of your entire cluster makes it so there's not a
single point of failure.
2> Depending on your throughput needs, either
2a> use SolrJ to index
2b> don't worry about it and send updates through the load balancer as
well. There'll be an extra hop if you send updates to a replica, but
if that's significant you should be using SolrJ

As for 5.5, it's not at all clear that there _will_ be a 5.5. 5.4 was
just released in early December. There's usually a several month lag
between point releases and there's some agitation to start the 6.0
release process, so it's up in the air.


On Sat, Jan 9, 2016 at 12:04 PM, Robert Brown 
wrote:


Hi,

(btw, when is 5.5 due?  I see the docs reference it, but not the
download
page)

Anyway, I index and query Solr over HTTP (no SolrJ, etc.) - is it
best/good
to get the CLUSTERSTATUS via the collection API and explicitly send
queries
to a replica to ensure I don't send queries to the leaders of my
collection,
to improve performance?  Like-wise with sending updates directly to a
Leader?

My leaders will receive full updates of the entire collection once a
day,
so
I would assume if the leader is handling queries too, performance would
be
hit?

Is the CLUSTERSTATUS API the only way to do this btw without SolrJ,
etc.?
I
wasn't sure if ZooKeeper would be able to tell me also.

Do I also need to do anything to ensure the leaders are never sent
queries
from the replica's?

Does this all sound sane?

One of my collections i

RE: Change leader in SolrCloud

2016-01-11 Thread Gian Maria Ricci - aka Alkampfer

Thanks.

This arise a different question: when I index a document, it is assigned to one 
of the three shard based on the value of the ID field. Actually indexing a 
document is usually a CPU and RAM intensive work to parse text, tokenize, etc. 
How this works in SolrCloud? I probably incorrectly assumed that the indexing 
task is carried out by the shard leader, then data is propagated to replica of 
that shard. This lead me to think that, having all three leader shards in a 
node, it does not use other nodes to index data and performance will suffer. 

I've tried to use REBALANCELEADERS but nothing changes (probably because there 
are few shards).

--
Gian Maria Ricci
Cell: +39 320 0136949

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: lunedì 11 gennaio 2016 17:49
To: solr-user@lucene.apache.org
Subject: Re: Change leader in SolrCloud

On 1/11/2016 8:45 AM, Gian Maria Ricci - aka Alkampfer wrote:
> Due to the different reboot times probably, I’ve noticed that upon 
> reboot all three leader shards are on a single machine. I’m expecting 
> shard leaders to be distributed evenly between machines, because if 
> all shard leader are on a same machine, all new documents to index 
> will be routed to the same machine, thus indexing load is not subdivided.

You're looking for the REBALANCELEADERS functionality ... but because you only 
have three nodes, the fact that one machine has the leaders for all three 
shards is not really a problem.

https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RebalanceLeaders

This feature was added for a use case where there are hundreds of nodes and 
hundreds of total shards, with the leader roles heavily concentrated on a small 
number of nodes.  With REBALANCELEADERS, the leader roles can be spread more 
evenly around the cluster.

It is true that the shard leader does do a small amount of extra work, but for 
a very small installation like yours, the overhead is nothing to be concerned 
about.  You can do something about it if it bothers you, though.

Thanks,
Shawn

Re: Querying only replica's

2016-01-11 Thread Alessandro Benedetti

mmm i think there is a misconception here :

On 10 January 2016 at 19:00, Robert Brown  wrote:

> I'm thinking more about how the external load-balancer will know if a node
> is down, as to take it out the pool of active servers to even attempt
> sending a query to.
>
This is SolrCloud responsibility and in particular Zookeeper knows the
topology of the cluster.
A query will not reach a dead node.
You should use a SolrCloud aware client ( like the SolrJ one) .

If you want to use a different load-balancer because you don't like the
SolrCloud one, it will not be that easy, because the distribution of the
queries happens automatically.

Cheers

>
> I could ping tho that just means the IP is alive.  I could configure the
> load-balancer to actually try a query, but this may be (even a tiny)
> performance hit.
>
> Is there another recommended way of configuring external load-balancers to
> know when a node is not accepting queries?
>
>
>
>
> On 10/01/16 18:25, Erick Erickson wrote:
>
>> For health checks, you can go ahead and get the real IP addresses and
>> ping them directly if you care to Or just let Zookeeper do that
>> for you. One of the tasks of Zookeeper is pinging all the machines
>> with all the replicas and, if any of them are unreachable, telling the
>> rest of the cluster that that machine is down.
>>
>> Best,
>> Erick
>>
>> On Sun, Jan 10, 2016 at 5:19 AM, Robert Brown 
>> wrote:
>>
>>> Thanks Erick,
>>>
>>> For the health-checks on the load-balancer side, would you recommend a
>>> simple query, or is there a reliable ping or similar for this scenario?
>>>
>>> Cheers,
>>> Rob
>>>
>>>
>>> On 09/01/16 23:44, Erick Erickson wrote:
>>>
 bq: is it best/good to get the CLUSTERSTATUS via the collection API
 and explicitly send queries to a replica to ensure I don't send
 queries to the leaders of my collection

 In a word _no_. SolrCloud is vastly different than the old
 master/slave. In SolrCloud, each and every node (leader and replicas)
 index all the docs and serve queries. The additional burden the leader
 has is actually very small. There's absolutely no reason to _not_ use
 the leader to serve queries.

 As far as sending updates, there would be a _little_ benefit to
 sending the updates directly to the leader, but _far_ more benefit in
 using SolrJ. If you use SolrJ (and CloudSolrClient), then the
 documents are split up on the _client_ and only the docs for a
 particular shard are automatically sent to the leader for that shard.
 Using SolrJ you can essentially scale indexing linearly with the
 number of shards you have. Just using HTTP does not scale linearly.
 Your particular app may not care, but in high-throughput situations
 this can be significant.

 So rather than spend time and effort sending updates directly to a
 leader and have the leader then forward the docs to the correct shard,
 I recommend investing the time in using SolrJ for updates rather than
 sending updates to the leader over HTTP. Or just ignore the problem
 and devote your efforts to something that are more valuable.

 So in short:
 1> just stick a load balancer in front of _all_ your Solr nodes for
 queries. And note that there's an internal load balancer already in
 Solr that routes things around anyway, although putting a load
 balancer in front of your entire cluster makes it so there's not a
 single point of failure.
 2> Depending on your throughput needs, either
 2a> use SolrJ to index
 2b> don't worry about it and send updates through the load balancer as
 well. There'll be an extra hop if you send updates to a replica, but
 if that's significant you should be using SolrJ

 As for 5.5, it's not at all clear that there _will_ be a 5.5. 5.4 was
 just released in early December. There's usually a several month lag
 between point releases and there's some agitation to start the 6.0
 release process, so it's up in the air.

 On Sat, Jan 9, 2016 at 12:04 PM, Robert Brown 
 wrote:

> Hi,
>
> (btw, when is 5.5 due?  I see the docs reference it, but not the
> download
> page)
>
> Anyway, I index and query Solr over HTTP (no SolrJ, etc.) - is it
> best/good
> to get the CLUSTERSTATUS via the collection API and explicitly send
> queries
> to a replica to ensure I don't send queries to the leaders of my
> collection,
> to improve performance?  Like-wise with sending updates directly to a
> Leader?
>
> My leaders will receive full updates of the entire collection once a
> day,
> so
> I would assume if the leader is handling queries too, performance would
> be
> hit?
>
> Is the CLUSTERSTATUS API the only way to do this btw without SolrJ,
> etc.?
> I
> wasn't sure if ZooKeeper would be able to tell me also.
>
> Do I also need to

Re: Change leader in SolrCloud

2016-01-11 Thread Erick Erickson

Shawn is spot-on, here's a little bit of "color commentary"

bq: all new documents to index will be routed to the same machine,
thus indexing load is not subdivided

This is something of  a misconception. Indexing is always done on all
nodes, leaders and replicas alike
in SolrCloud. The leader is responsible for coordinating the
distribution of the raw documents to
the followers, _not_ forwarding the _indexed_ doc.

Best,
Erick

On Mon, Jan 11, 2016 at 8:48 AM, Shawn Heisey  wrote:
> On 1/11/2016 8:45 AM, Gian Maria Ricci - aka Alkampfer wrote:
>> Due to the different reboot times probably, I’ve noticed that upon
>> reboot all three leader shards are on a single machine. I’m expecting
>> shard leaders to be distributed evenly between machines, because if
>> all shard leader are on a same machine, all new documents to index
>> will be routed to the same machine, thus indexing load is not subdivided.
>
> You're looking for the REBALANCELEADERS functionality ... but because
> you only have three nodes, the fact that one machine has the leaders for
> all three shards is not really a problem.
>
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RebalanceLeaders
>
> This feature was added for a use case where there are hundreds of nodes
> and hundreds of total shards, with the leader roles heavily concentrated
> on a small number of nodes.  With REBALANCELEADERS, the leader roles can
> be spread more evenly around the cluster.
>
> It is true that the shard leader does do a small amount of extra work,
> but for a very small installation like yours, the overhead is nothing to
> be concerned about.  You can do something about it if it bothers you,
> though.
>
> Thanks,
> Shawn
>

Re: Change leader in SolrCloud

2016-01-11 Thread Shawn Heisey

On 1/11/2016 8:45 AM, Gian Maria Ricci - aka Alkampfer wrote:
> Due to the different reboot times probably, I’ve noticed that upon
> reboot all three leader shards are on a single machine. I’m expecting
> shard leaders to be distributed evenly between machines, because if
> all shard leader are on a same machine, all new documents to index
> will be routed to the same machine, thus indexing load is not subdivided.

You're looking for the REBALANCELEADERS functionality ... but because
you only have three nodes, the fact that one machine has the leaders for
all three shards is not really a problem.

https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RebalanceLeaders

This feature was added for a use case where there are hundreds of nodes
and hundreds of total shards, with the leader roles heavily concentrated
on a small number of nodes.  With REBALANCELEADERS, the leader roles can
be spread more evenly around the cluster.

It is true that the shard leader does do a small amount of extra work,
but for a very small installation like yours, the overhead is nothing to
be concerned about.  You can do something about it if it bothers you,
though.

Thanks,
Shawn

Re: WArning in SolrCloud logs

2016-01-11 Thread Alessandro Benedetti

To be honest it seems to me more a wrong usage of java environment
variables.
Is it possible you are sending the enable master/slave config to the node ?
Strictly talking about the replication request handler, it is required for
SolrCloud ( there are scenarios where old style replication is still used) .
But this supposes to happen automatically .

Strip of code that cause the warning :

if (enableMaster || enableSlave) {
> if (core.getCoreDescriptor().getCoreContainer().getZkController() != null)
> {
> LOG.warn("SolrCloud is enabled for core " + core.getName() + " but so is
> old-style replication. Make sure you" +
> " intend this behavior, it usually indicates a mis-configuration. Master
> setting is " +
> Boolean.toString(enableMaster) + " and slave setting is " + Boolean.
> toString(enableSlave));
> }
> }


Cheers

On 11 January 2016 at 15:08, Gian Maria Ricci - aka Alkampfer <
alkamp...@nablasoft.com> wrote:

> I’ve configured three node in solrcloud, everything seems ok, but in the
> log I see this kind of warning
>
>
>
> SolrCloud is enabled for core xxx_shard3_replica1 but so is old-style
> replication. Make sure you intend this behavior, it usually indicates a
> mis-configuration. Master setting is true and slave setting is false
>
>
>
>
>
>
>
>
>
>
>
> What could be the reason? Is it possible that this happens because
> solrconfig.xml used to create the collection has a replication handler
> active?
>
> --
> Gian Maria Ricci
> Cell: +39 320 0136949
>
> [image:
> https://ci5.googleusercontent.com/proxy/5oNMOYAeFXZ_LDKanNfoLRHC37mAZkVVhkPN7QxMdA0K5JW2m0bm8azJe7oWZMNt8fKHNX1bzrUTd-kIyE40CmwT2Mlf8OI=s0-d-e1-ft#http://www.codewrecks.com/files/signature/mvp.png]
>  [image:
> https://ci3.googleusercontent.com/proxy/f-unQbmk6NtkHFspO5Y6x4jlIf_xrmGLUT3fU9y_7VUHSFUjLs7aUIMdZQYTh3eWIA0sBnvNX3WGXCU59chKXLuAHi2ArWdAcBclKA=s0-d-e1-ft#http://www.codewrecks.com/files/signature/linkedin.jpg]
>  [image:
> https://ci3.googleusercontent.com/proxy/gjapMzu3KEakBQUstx_-cN7gHJ_GpcIZNEPjCzOYMrPl-r1DViPE378qNAQyEWbXMTj6mcduIAGaApe9qHG1KN_hyFxQAIkdNSVT=s0-d-e1-ft#http://www.codewrecks.com/files/signature/twitter.jpg]
>  [image:
> https://ci5.googleusercontent.com/proxy/iuDOD2sdaxRDvTwS8MO7-CcXchpNJX96uaWuvagoVLcjpAPsJi88XeOonE4vHT6udVimo7yL9ZtdrYueEfH7jXnudmi_Vvw=s0-d-e1-ft#http://www.codewrecks.com/files/signature/rss.jpg]
>  [image:
> https://ci6.googleusercontent.com/proxy/EBJjfkBzcsSlAzlyR88y86YXcwaKfn3x7ydAObL1vtjJYclQr_l5TvrFx4PQ5qLNYW3yp7Ig66DJ-0tPJCDbDmYAFcamPQehwg=s0-d-e1-ft#http://www.codewrecks.com/files/signature/skype.jpg]
>
>
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

Re: Change leader in SolrCloud

2016-01-11 Thread Alessandro Benedetti

This is an interesting point.
Unfortunately I am not sure it is possible to configure anywhere to avoid
the leaders co-location.
I think zookeeper ideally assumes each solr node is on a separate machine.
Curious to know if we can optimize the colocation though config.

Cheers

On 11 January 2016 at 15:45, Gian Maria Ricci - aka Alkampfer <
alkamp...@nablasoft.com> wrote:

> I’ve a test solrCloud installation consisting of Three CentOS machines,
> each one running one zookeeper node and one solr instance. I’ve created a
> collection with 3 shards and 2 replica per each shard, then, after some
> tests, rebooted all three machines.
>
>
>
> Due to the different reboot times probably, I’ve noticed that upon reboot
> all three leader shards are on a single machine. I’m expecting shard
> leaders to be distributed evenly between machines, because if all shard
> leader are on a same machine, all new documents to index will be routed to
> the same machine, thus indexing load is not subdivided.
>
>
>
> I’ve searched for COLLECTION API commands that can helps me to obtain
> this, but it seems that it is not possible. My question is, how can I be
> sure that leader shards will be distributed evenly across all machines?
>
>
>
> http://screencast.com/t/6jD6a0x8
>
>
>
> Thanks.
>
>
>
> --
> Gian Maria Ricci
> Cell: +39 320 0136949
>
> [image:
> https://ci5.googleusercontent.com/proxy/5oNMOYAeFXZ_LDKanNfoLRHC37mAZkVVhkPN7QxMdA0K5JW2m0bm8azJe7oWZMNt8fKHNX1bzrUTd-kIyE40CmwT2Mlf8OI=s0-d-e1-ft#http://www.codewrecks.com/files/signature/mvp.png]
>  [image:
> https://ci3.googleusercontent.com/proxy/f-unQbmk6NtkHFspO5Y6x4jlIf_xrmGLUT3fU9y_7VUHSFUjLs7aUIMdZQYTh3eWIA0sBnvNX3WGXCU59chKXLuAHi2ArWdAcBclKA=s0-d-e1-ft#http://www.codewrecks.com/files/signature/linkedin.jpg]
>  [image:
> https://ci3.googleusercontent.com/proxy/gjapMzu3KEakBQUstx_-cN7gHJ_GpcIZNEPjCzOYMrPl-r1DViPE378qNAQyEWbXMTj6mcduIAGaApe9qHG1KN_hyFxQAIkdNSVT=s0-d-e1-ft#http://www.codewrecks.com/files/signature/twitter.jpg]
>  [image:
> https://ci5.googleusercontent.com/proxy/iuDOD2sdaxRDvTwS8MO7-CcXchpNJX96uaWuvagoVLcjpAPsJi88XeOonE4vHT6udVimo7yL9ZtdrYueEfH7jXnudmi_Vvw=s0-d-e1-ft#http://www.codewrecks.com/files/signature/rss.jpg]
>  [image:
> https://ci6.googleusercontent.com/proxy/EBJjfkBzcsSlAzlyR88y86YXcwaKfn3x7ydAObL1vtjJYclQr_l5TvrFx4PQ5qLNYW3yp7Ig66DJ-0tPJCDbDmYAFcamPQehwg=s0-d-e1-ft#http://www.codewrecks.com/files/signature/skype.jpg]
>
>
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

Change leader in SolrCloud

2016-01-11 Thread Gian Maria Ricci - aka Alkampfer

I've a test solrCloud installation consisting of Three CentOS machines, each
one running one zookeeper node and one solr instance. I've created a
collection with 3 shards and 2 replica per each shard, then, after some
tests, rebooted all three machines.

 

Due to the different reboot times probably, I've noticed that upon reboot
all three leader shards are on a single machine. I'm expecting shard leaders
to be distributed evenly between machines, because if all shard leader are
on a same machine, all new documents to index will be routed to the same
machine, thus indexing load is not subdivided.

 

I've searched for COLLECTION API commands that can helps me to obtain this,
but it seems that it is not possible. My question is, how can I be sure that
leader shards will be distributed evenly across all machines? 

 

http://screencast.com/t/6jD6a0x8 

 

Thanks.

 

--
Gian Maria Ricci
Cell: +39 320 0136949

Re: Spellcheck response format differs between a single core and SolrCloud

2016-01-11 Thread Ryan Yacyshyn

That's solves the mystery. The single-core is running 4.10.1 and SolrCloud
on 5.3.1.

Thanks James.



On Mon, 11 Jan 2016 at 22:24 Dyer, James 
wrote:

> Ryan,
>
> The json response format changed for Solr 5.0.  See
> https://issues.apache.org/jira/browse/SOLR-3029 .  Is the single-core
> solr running a 4.x version with the cloud solr running 5.x ?  If they are
> both on the same major version, then we have a bug.
>
> James Dyer
> Ingram Content Group
>
>
> -Original Message-
> From: Ryan Yacyshyn [mailto:ryan.yacys...@gmail.com]
> Sent: Monday, January 11, 2016 12:32 AM
> To: solr-user@lucene.apache.org
> Subject: Spellcheck response format differs between a single core and
> SolrCloud
>
> Hello,
>
> I am using the spellcheck component for spelling suggestions and I've used
> the same configurations in two separate projects, the only difference is
> one project uses a single core and the other is a collection on SolrCloud
> with three shards. The single core has about 56K docs and the one on
> SolrCloud has 1M docs. Strangely, the format of the response is slightly
> different between the two and I'm not sure why (particularly the collations
> part). Was wondering if any can shed some light on this? Below is my
> configuration and the results I'm getting.
>
> This is in my "/select" searchHandler:
>
> 
> on
> false
> 5
> 2
> 5
> true
> true
> 5
> 3
>
> And my spellcheck component:
>
> 
> 
>   
>   
> default
> spelling
> solr.DirectSolrSpellChecker
> internal
> 0.5
> 2
> 1
> 5
> 4
> 0.01
>   
> 
>
> Examples of each output can be found here:
> https://gist.github.com/ryac/ceff8da00ec9f5b84106
>
> Thanks,
> Ryan
>

Re: [More Like This] Query building

2016-01-11 Thread Alessandro Benedetti

Hi guys,
the patch seems fine to me.
I didn't spend much more time on the code but I checked the tests and the
pre-commit checks.
It seems fine to me.
Let me know ,

Cheers

On 31 December 2015 at 18:40, Alessandro Benedetti 
wrote:

> https://issues.apache.org/jira/browse/LUCENE-6954
>
> First draft patch available, I will check better the tests new year !
>
> On 29 December 2015 at 13:43, Alessandro Benedetti 
> wrote:
>
>> Sure, I will proceed tomorrow with the Jira and the simple patch + tests.
>>
>> In the meantime let's try to collect some additional feedback.
>>
>> Cheers
>>
>> On 29 December 2015 at 12:43, Anshum Gupta 
>> wrote:
>>
>>> Feel free to create a JIRA and put up a patch if you can.
>>>
>>> On Tue, Dec 29, 2015 at 4:26 PM, Alessandro Benedetti <
>>> abenede...@apache.org
>>> > wrote:
>>>
>>> > Hi guys,
>>> > While I was exploring the way we build the More Like This query, I
>>> > discovered a part I am not convinced of :
>>> >
>>> >
>>> >
>>> > Let's see how we build the query :
>>> > org.apache.lucene.queries.mlt.MoreLikeThis#retrieveTerms(int)
>>> >
>>> > 1) we extract the terms from the interesting fields, adding them to a
>>> map :
>>> >
>>> > Map termFreqMap = new HashMap<>();
>>> >
>>> > *( we lose the relation field-> term, we don't know anymore where the
>>> term
>>> > was coming ! )*
>>> >
>>> > org.apache.lucene.queries.mlt.MoreLikeThis#createQueue
>>> >
>>> > 2) we build the queue that will contain the query terms, at this point
>>> we
>>> > connect again there terms to some field, but :
>>> >
>>> > ...
>>> >> // go through all the fields and find the largest document frequency
>>> >> String topField = fieldNames[0];
>>> >> int docFreq = 0;
>>> >> for (String fieldName : fieldNames) {
>>> >>   int freq = ir.docFreq(new Term(fieldName, word));
>>> >>   topField = (freq > docFreq) ? fieldName : topField;
>>> >>   docFreq = (freq > docFreq) ? freq : docFreq;
>>> >> }
>>> >> ...
>>> >
>>> >
>>> > We identify the topField as the field with the highest document
>>> frequency
>>> > for the term t .
>>> > Then we build the termQuery :
>>> >
>>> > queue.add(new ScoreTerm(word, *topField*, score, idf, docFreq, tf));
>>> >
>>> > In this way we lose a lot of precision.
>>> > Not sure why we do that.
>>> > I would prefer to keep the relation between terms and fields.
>>> > The MLT query can improve a lot the quality.
>>> > If i run the MLT on 2 fields : *description* and *facilities* for
>>> example.
>>> > It is likely I want to find documents with similar terms in the
>>> > description and similar terms in the facilities, without mixing up the
>>> > things and loosing the semantic of the terms.
>>> >
>>> > Let me know your opinion,
>>> >
>>> > Cheers
>>> >
>>> >
>>> > --
>>> > --
>>> >
>>> > Benedetti Alessandro
>>> > Visiting card : http://about.me/alessandro_benedetti
>>> >
>>> > "Tyger, tyger burning bright
>>> > In the forests of the night,
>>> > What immortal hand or eye
>>> > Could frame thy fearful symmetry?"
>>> >
>>> > William Blake - Songs of Experience -1794 England
>>> >
>>>
>>>
>>>
>>> --
>>> Anshum Gupta
>>>
>>
>>
>>
>> --
>> --
>>
>> Benedetti Alessandro
>> Visiting card : http://about.me/alessandro_benedetti
>>
>> "Tyger, tyger burning bright
>> In the forests of the night,
>> What immortal hand or eye
>> Could frame thy fearful symmetry?"
>>
>> William Blake - Songs of Experience -1794 England
>>
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

WArning in SolrCloud logs

2016-01-11 Thread Gian Maria Ricci - aka Alkampfer

I’ve configured three node in solrcloud, everything seems ok, but in the log I 
see this kind of warning

 


SolrCloud is enabled for core xxx_shard3_replica1 but so is old-style 
replication. Make sure you intend this behavior, it usually indicates a 
mis-configuration. Master setting is true and slave setting is false

 

 

 

 

 

What could be the reason? Is it possible that this happens because 
solrconfig.xml used to create the collection has a replication handler active?

--
Gian Maria Ricci
Cell: +39 320 0136949

Re: Solr has multiple log lines for single search

2016-01-11 Thread Mark Miller

Two of them are sub requests. They have params isShard=true and
distrib=false. The top level user query will not have distrib or isShard
because they default the other way.

- Mark

On Mon, Jan 11, 2016 at 6:30 AM Syed Mudasseer 
wrote:

> Hi,
> I have solr configured on cloud with the following details:
> Every collection has 3 shards andEach shard consists of 3 replicas.
> Whenever I search for any field in solr, having faceting and highlighting
> query checked,then I get more than 2 search logs stored in the log file.
> (sometimes, it goes up to 8 log lines).
> I am trying to get the search terms entered by user, but due to duplicate
> records I am not able to decide which query is more appropriate to parse.
> Here is an example of log lines(field search with faceting) gives me 3
> results in the log,
> INFO  - 2016-01-11 11:07:09.321; org.apache.solr.core.SolrCore;
> [mycollection_shard2_replica1] webapp=/solr path=/select
> params={f.ab_model.facet.limit=160&lowercaseOperators=true&facet=true&qf=description&distrib=false&hl.simple.pre=&wt=javabin&hl=false&version=2&rows=100&defType=edismax&NOW=1452510429317&shard.url=
> http://MyURL:8983/solr/mycollection_shard2_replica1/|http://MyURL:8983/solr/mycollection_shard2_replica3/|http://MyURL:8983/solr/mycollection_shard2_replica2/&fl=id&fl=score&df=search&start=0&q=MySearchTerm&f.ab_model.facet.mincount=0&_=9652510428630&hl.simple.post=&facet.field=ab_model&isShard=true&stopwords=true&fsv=true}
> hits=753 status=0 QTime=1
> INFO  - 2016-01-11 11:07:09.349; org.apache.solr.core.SolrCore;
> [mycollection_shard2_replica1] webapp=/solr path=/select
> params={lowercaseOperators=true&facet=false&ids=2547891056_HDR,3618199460_HDR,3618192453_HDR,3618277839_HDR,3618186992_HDR,3618081995_HDR,3618074192_HDR,3618189660_HDR,3618073929_HDR,3618078287_HDR,3618084580_HDR,3618075438_HDR,3618170375_HDR,3618195949_HDR,3618074030_HDR,3618085730_HDR,3618078288_HDR,3618072500_HDR,3618086961_HDR,3618170928_HDR,3618077108_HDR,3618074090_HDR,3618181279_HDR,3618188058_HDR,3618181018_HDR,3618199309_HDR,3618195610_HDR,3618281575_HDR,3618195568_HDR,3618080877_HDR,3618199114_HDR,3618199132_HDR,3618084030_HDR,3618280868_HDR,3618193086_HDR,3618275194_HDR,3618074917_HDR,3618195102_HDR,3618086958_HDR,3618084870_HDR,3618174630_HDR,3618075776_HDR,3618190529_HDR,3618192993_HDR,3618084217_HDR,3618176677_HDR,3618183612_HDR&qf=description&distrib=false&hl.simple.pre=&wt=javabin&hl=true&version=2&rows=100&defType=edismax&NOW=1452510429317&shard.url=
> http://MyURL:8983/solr/mycollection_shard2_replica1/|http://MyURL:8983/solr/mycollection_shard2_replica3/|http://MyURL:8983/solr/mycollection_shard2_replica2/&df=search&q=MySearchTerm&_=1452510428630&hl.simple.post=&facet.field=ab_model&isShard=true&stopwords=true}
> status=0 QTime=15
> INFO  - 2016-01-11 11:07:09.352; org.apache.solr.core.SolrCore;
> [mycollection_shard1_replica1] webapp=/solr path=/select
> params={lowercaseOperators=true&facet=true&indent=true&qf=description&hl.simple.pre=&wt=json&hl=true&defType=edismax&q=MySearchTerm&_=1452510428630&hl.simple.post=&facet.field=ab_model&stopwords=true}
> hits=2276 status=0 QTime=35
> If I have highlighted query checked, then I get more than 3 logs.
> So my question is which line is more appropriate to get the search query
> entered by User?or Should I consider all of the log lines?
>

-- 
- Mark
about.me/markrmiller

Re: Possible Bug - MDC handling in org.apache.solr.common.util.ExecutorUtil.MDCAwareThreadPoolExecutor.execute(Runnable)

2016-01-11 Thread Mark Miller

Not sure I'm onboard with the first proposed solution, but yes, I'd open a
JIRA issue to discuss.

- Mark

On Mon, Jan 11, 2016 at 4:01 AM Konstantin Hollerith 
wrote:

> Hi,
>
> I'm using SLF4J MDC to log additional Information in my WebApp. Some of my
> MDC-Parameters even include Line-Breaks.
> It seems, that Solr takes _all_ MDC parameters and puts them into the
> Thread-Name, see
>
> org.apache.solr.common.util.ExecutorUtil.MDCAwareThreadPoolExecutor.execute(Runnable).
>
> When there is some logging of Solr, the log gets cluttered:
>
> [11.01.16 09:14:19:170 CET] 02a3 SystemOut O 09:14:19,169
> [zkCallback-14-thread-1-processing-My
> Custom
> MDC
> Parameter ROraqiFWaoXqP21gu4uLpMh SANDHO] WARN
> common.cloud.ConnectionManager [session=ROraqiFWaoXqP21gu4uLpMh]
> [user=SANDHO]: zkClient received AuthFailed
>
> (some of my MDC-Parameters are only active in Email-Logs and are not
> included in the file-log)
>
> I think this is a Bug. Solr should only put its own MDC-Parameter into the
> Thread-Name.
>
> Possible Solution: Since all (as far as i can check) invocations in Solr of
> MDC.put uses a Prefix like "ConcurrentUpdateSolrClient" or
> "CloudSolrClient" etc., it would be possible to put a check into
> MDCAwareThreadPoolExecutor.execute(Runnable) that process only those
> Prefixes.
>
> Should i open a Jira-Issue for this?
>
> Thanks,
>
> Konstantin
>
> Environment: JSF-Based App with WebSphrere 8.5, Solr 5.3.0, slf4j-1.7.12,
> all jars are in WEB-INF/lib.
>
-- 
- Mark
about.me/markrmiller

RE: Spellcheck response format differs between a single core and SolrCloud

2016-01-11 Thread Dyer, James

Ryan,

The json response format changed for Solr 5.0.  See 
https://issues.apache.org/jira/browse/SOLR-3029 .  Is the single-core solr 
running a 4.x version with the cloud solr running 5.x ?  If they are both on 
the same major version, then we have a bug.

James Dyer
Ingram Content Group


-Original Message-
From: Ryan Yacyshyn [mailto:ryan.yacys...@gmail.com] 
Sent: Monday, January 11, 2016 12:32 AM
To: solr-user@lucene.apache.org
Subject: Spellcheck response format differs between a single core and SolrCloud

Hello,

I am using the spellcheck component for spelling suggestions and I've used
the same configurations in two separate projects, the only difference is
one project uses a single core and the other is a collection on SolrCloud
with three shards. The single core has about 56K docs and the one on
SolrCloud has 1M docs. Strangely, the format of the response is slightly
different between the two and I'm not sure why (particularly the collations
part). Was wondering if any can shed some light on this? Below is my
configuration and the results I'm getting.

This is in my "/select" searchHandler:


on
false
5
2
5
true
true
5
3

And my spellcheck component:



  
  
default
spelling
solr.DirectSolrSpellChecker
internal
0.5
2
1
5
4
0.01
  


Examples of each output can be found here:
https://gist.github.com/ryac/ceff8da00ec9f5b84106

Thanks,
Ryan

Re: how to search miilions of record in solr query

2016-01-11 Thread Mugeesh Husain

Thanks Erick,
 "You have to cache (or something) somewhere to make this work."-- Actually
they are not interested to use cache mechanism.

they dont need paging,they want only 10 records with 1 millions ID search in
background etc.

As of now i have implemented terms query parser but result performance is
not pretty well.


I am still looking solution or some suggestion in lucene side or solr side ?

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-search-miilions-of-record-in-solr-query-tp4248360p4249871.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Pro and cons of using Solr Cloud vs standard Master Slave Replica

2016-01-11 Thread Rahul Ramesh

Please have a look at this post

https://support.lucidworks.com/hc/en-us/articles/201298317-What-is-SolrCloud-And-how-does-it-compare-to-master-slave-

We dont use Master slave architecture, however we use solr cloud and
standalone solr for our documents.

Indexing is a bit slow in cloud when compared to Standalone. This is
because of replication I think. However you will get a faster query
response.

Solr Cloud also requires a slightly elaborate setup with Zookeepers
compared to master/slave or standalone.

However, once Solr cloud is setup, it runs very smoothly and you dont have
to worry about the performance / high availability.

Please check the post, a detailed analysis and comparison between the two
has been given.

-Rahul


On Mon, Jan 11, 2016 at 4:58 PM, Gian Maria Ricci - aka Alkampfer <
alkamp...@nablasoft.com> wrote:

> Hi guys,
>
>
>
> a customer need a comprehensive list of all pro and cons of using standard
> Master Slave replica VS using Solr Cloud. I’m interested especially in
> query performance consideration, because in this specific situation the
> rate of new documents is really slow, but the amount of data is about 50
> millions of document, and the index size on disk for single core is about
> 30 GB.
>
>
>
> Such amount of data should be easily handled by a Master Slave replica
> with a  single core replicated on a certain number of slaves, but we need
> to evaluate also the option of SolrCloud, especially for fault tolerance.
>
>
>
> I’ve googled around, but did not find anything really comprehensive, so
> I’m looking for real experience from you in Mailing List. J.
>
>
>
> Thanks in advance.
>
>
>
> --
> Gian Maria Ricci
> Cell: +39 320 0136949
>
> [image:
> https://ci5.googleusercontent.com/proxy/5oNMOYAeFXZ_LDKanNfoLRHC37mAZkVVhkPN7QxMdA0K5JW2m0bm8azJe7oWZMNt8fKHNX1bzrUTd-kIyE40CmwT2Mlf8OI=s0-d-e1-ft#http://www.codewrecks.com/files/signature/mvp.png]
>  [image:
> https://ci3.googleusercontent.com/proxy/f-unQbmk6NtkHFspO5Y6x4jlIf_xrmGLUT3fU9y_7VUHSFUjLs7aUIMdZQYTh3eWIA0sBnvNX3WGXCU59chKXLuAHi2ArWdAcBclKA=s0-d-e1-ft#http://www.codewrecks.com/files/signature/linkedin.jpg]
>  [image:
> https://ci3.googleusercontent.com/proxy/gjapMzu3KEakBQUstx_-cN7gHJ_GpcIZNEPjCzOYMrPl-r1DViPE378qNAQyEWbXMTj6mcduIAGaApe9qHG1KN_hyFxQAIkdNSVT=s0-d-e1-ft#http://www.codewrecks.com/files/signature/twitter.jpg]
>  [image:
> https://ci5.googleusercontent.com/proxy/iuDOD2sdaxRDvTwS8MO7-CcXchpNJX96uaWuvagoVLcjpAPsJi88XeOonE4vHT6udVimo7yL9ZtdrYueEfH7jXnudmi_Vvw=s0-d-e1-ft#http://www.codewrecks.com/files/signature/rss.jpg]
>  [image:
> https://ci6.googleusercontent.com/proxy/EBJjfkBzcsSlAzlyR88y86YXcwaKfn3x7ydAObL1vtjJYclQr_l5TvrFx4PQ5qLNYW3yp7Ig66DJ-0tPJCDbDmYAFcamPQehwg=s0-d-e1-ft#http://www.codewrecks.com/files/signature/skype.jpg]
>
>
>

collapse filter query

2016-01-11 Thread sara hajili

hi all
i have a MLT query and i wanna to use collapse filter query.
and i wanna to use collapse expand nullPolicy.
in this way when i used it :
{!collapse field=original_post_id nullPolicy=expand}
i got my appropriate result .
(in solr web UI)

but in regular search handler "/select",when i used
{!collapse field=original_post_id nullPolicy=expand}
i got error:

{
  "responseHeader":{
"status":500,
"QTime":2,
"params":{
  "q":"*:*",
  "indent":"true",
  "fq":"{!collapse field=original_post_id nullPolicy=expand}",
  "wt":"json"}},
  "error":{
"trace":"java.lang.NullPointerException\n\tat
org.apache.solr.search.CollapsingQParserPlugin$IntScoreCollector.finish(CollapsingQParserPlugin.java:763)\n\tat
org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:211)\n\tat
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1678)\n\tat
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1497)\n\tat
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:555)\n\tat
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:522)\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)\n\tat
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)\n\tat
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:499)\n\tat
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)\n\tat
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)\n\tat
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)\n\tat
java.lang.Thread.run(Thread.java:745)\n",
"code":500}}


but if i change nullpolicy to ignore or collapse

no error happened.


and i wondered when i used pysolr to create  a more like this query

i get above error again.

so it seems nullpolicy=expand work me just in more like this query in
solr web UI

and my question is how to solve my problem to use nullpolicy=expand in pysolr?

tnx

Re: Bad return type exception

2016-01-11 Thread Asanka Sanjaya Herath

Hi Shawn,

Thank you for your explanation. Yes, without Oozie the project runs
successfully.

On Mon, Jan 11, 2016 at 1:03 PM, Shawn Heisey  wrote:

> On 1/10/2016 11:56 PM, Asanka Sanjaya Herath wrote:
> > I tried to create a solr client using following code.
> >
> >  solrClient = new CloudSolrClient(zkHost);
> >  solrClient.setDefaultCollection(solrCollection);
> > 
> > Solr4j version:5.4.0
> >
> > Project built successfully but in run time I get following error. Any
> > help is appreciated.
> >
> > Main class [org.apache.oozie.action.hadoop.JavaMain], main() threw
> > exception, java.lang.VerifyError: Bad return type
> > Exception Details:
> >   Location:
>
> I'm guessing that this is happening because Oozie includes a dependency
> on commons-httpclient, or httpclient 3.x, and SolrJ 4.0 and later has a
> dependency on httpcomponents -- httpclient 4.x.  It is very likely that
> SolrJ is seeing the classes provided by the 3.x jars, and that the
> method signatures are incompatible with what SolrJ expects.
>
> It is very possible that you will be unable to make Oozie work with
> SolrJ 4.0 or later, and since you are using CloudSolrClient, you have no
> choice but the newer SolrJ version.
>
> I believe that the reason it compiles successfully is because the SolrJ
> code you are using does not expose anything having to do with HttpClient
> at all.  That interaction only happens deeper down, within SolrJ.
>
> Thanks,
> Shawn
>
>


-- 
Thanks,
Regards,
ASH

Solr has multiple log lines for single search

2016-01-11 Thread Syed Mudasseer

Hi,
I have solr configured on cloud with the following details:
Every collection has 3 shards andEach shard consists of 3 replicas.
Whenever I search for any field in solr, having faceting and highlighting query 
checked,then I get more than 2 search logs stored in the log file. (sometimes, 
it goes up to 8 log lines).
I am trying to get the search terms entered by user, but due to duplicate 
records I am not able to decide which query is more appropriate to parse.
Here is an example of log lines(field search with faceting) gives me 3 results 
in the log,
INFO  - 2016-01-11 11:07:09.321; org.apache.solr.core.SolrCore; 
[mycollection_shard2_replica1] webapp=/solr path=/select 
params={f.ab_model.facet.limit=160&lowercaseOperators=true&facet=true&qf=description&distrib=false&hl.simple.pre=&wt=javabin&hl=false&version=2&rows=100&defType=edismax&NOW=1452510429317&shard.url=http://MyURL:8983/solr/mycollection_shard2_replica1/|http://MyURL:8983/solr/mycollection_shard2_replica3/|http://MyURL:8983/solr/mycollection_shard2_replica2/&fl=id&fl=score&df=search&start=0&q=MySearchTerm&f.ab_model.facet.mincount=0&_=9652510428630&hl.simple.post=&facet.field=ab_model&isShard=true&stopwords=true&fsv=true}
 hits=753 status=0 QTime=1 
INFO  - 2016-01-11 11:07:09.349; org.apache.solr.core.SolrCore; 
[mycollection_shard2_replica1] webapp=/solr path=/select 
params={lowercaseOperators=true&facet=false&ids=2547891056_HDR,3618199460_HDR,3618192453_HDR,3618277839_HDR,3618186992_HDR,3618081995_HDR,3618074192_HDR,3618189660_HDR,3618073929_HDR,3618078287_HDR,3618084580_HDR,3618075438_HDR,3618170375_HDR,3618195949_HDR,3618074030_HDR,3618085730_HDR,3618078288_HDR,3618072500_HDR,3618086961_HDR,3618170928_HDR,3618077108_HDR,3618074090_HDR,3618181279_HDR,3618188058_HDR,3618181018_HDR,3618199309_HDR,3618195610_HDR,3618281575_HDR,3618195568_HDR,3618080877_HDR,3618199114_HDR,3618199132_HDR,3618084030_HDR,3618280868_HDR,3618193086_HDR,3618275194_HDR,3618074917_HDR,3618195102_HDR,3618086958_HDR,3618084870_HDR,3618174630_HDR,3618075776_HDR,3618190529_HDR,3618192993_HDR,3618084217_HDR,3618176677_HDR,3618183612_HDR&qf=description&distrib=false&hl.simple.pre=&wt=javabin&hl=true&version=2&rows=100&defType=edismax&NOW=1452510429317&shard.url=http://MyURL:8983/solr/mycollection_shard2_replica1/|http://MyURL:8983/solr/mycollection_shard2_replica3/|http://MyURL:8983/solr/mycollection_shard2_replica2/&df=search&q=MySearchTerm&_=1452510428630&hl.simple.post=&facet.field=ab_model&isShard=true&stopwords=true}
 status=0 QTime=15 
INFO  - 2016-01-11 11:07:09.352; org.apache.solr.core.SolrCore; 
[mycollection_shard1_replica1] webapp=/solr path=/select 
params={lowercaseOperators=true&facet=true&indent=true&qf=description&hl.simple.pre=&wt=json&hl=true&defType=edismax&q=MySearchTerm&_=1452510428630&hl.simple.post=&facet.field=ab_model&stopwords=true}
 hits=2276 status=0 QTime=35 
If I have highlighted query checked, then I get more than 3 logs.
So my question is which line is more appropriate to get the search query 
entered by User?or Should I consider all of the log lines?

Pro and cons of using Solr Cloud vs standard Master Slave Replica

2016-01-11 Thread Gian Maria Ricci - aka Alkampfer

Hi guys,

 

a customer need a comprehensive list of all pro and cons of using standard
Master Slave replica VS using Solr Cloud. I'm interested especially in query
performance consideration, because in this specific situation the rate of
new documents is really slow, but the amount of data is about 50 millions of
document, and the index size on disk for single core is about 30 GB. 

 

Such amount of data should be easily handled by a Master Slave replica with
a  single core replicated on a certain number of slaves, but we need to
evaluate also the option of SolrCloud, especially for fault tolerance.

 

I've googled around, but did not find anything really comprehensive, so I'm
looking for real experience from you in Mailing List. :).

 

Thanks in advance.

 

--
Gian Maria Ricci
Cell: +39 320 0136949

Possible Bug - MDC handling in org.apache.solr.common.util.ExecutorUtil.MDCAwareThreadPoolExecutor.execute(Runnable)

2016-01-11 Thread Konstantin Hollerith

Hi,

I'm using SLF4J MDC to log additional Information in my WebApp. Some of my
MDC-Parameters even include Line-Breaks.
It seems, that Solr takes _all_ MDC parameters and puts them into the
Thread-Name, see
org.apache.solr.common.util.ExecutorUtil.MDCAwareThreadPoolExecutor.execute(Runnable).

When there is some logging of Solr, the log gets cluttered:

[11.01.16 09:14:19:170 CET] 02a3 SystemOut O 09:14:19,169
[zkCallback-14-thread-1-processing-My
Custom
MDC
Parameter ROraqiFWaoXqP21gu4uLpMh SANDHO] WARN
common.cloud.ConnectionManager [session=ROraqiFWaoXqP21gu4uLpMh]
[user=SANDHO]: zkClient received AuthFailed

(some of my MDC-Parameters are only active in Email-Logs and are not
included in the file-log)

I think this is a Bug. Solr should only put its own MDC-Parameter into the
Thread-Name.

Possible Solution: Since all (as far as i can check) invocations in Solr of
MDC.put uses a Prefix like "ConcurrentUpdateSolrClient" or
"CloudSolrClient" etc., it would be possible to put a check into
MDCAwareThreadPoolExecutor.execute(Runnable) that process only those
Prefixes.

Should i open a Jira-Issue for this?

Thanks,

Konstantin

Environment: JSF-Based App with WebSphrere 8.5, Solr 5.3.0, slf4j-1.7.12,
all jars are in WEB-INF/lib.

65 matches

Mail list logo