Sorting on solr results

2013-12-04 Thread anuragwalia
Hi all,

Please provide me your idea for below problem.

I need to sort products on a webshop by price and then by position.

e.g. if we have three products (A, B, C), they need to be sorted by price asc
and then by position asc.

ID  Price   Position
A   10  3
B   10  2
C   20  5

The result should be sorted first by price, then by position.

Required Order of result :
B
A
C
While A and B have the same price, B's position (2) should rank before A's (3).
My result-set query as of now:
&@QueryTerm=*&OnlineFlag=1&@Sort.Price=0,position=0
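
For reference, in plain Solr request syntax (as opposed to the storefront query
syntax above) a two-level sort is expressed with a single sort parameter; the
field names price and position below are assumptions about the schema:

http://localhost:8983/solr/products/select?q=*:*&sort=price+asc,position+asc

The second sort field only breaks ties left by the first, which gives exactly
the B, A, C ordering described above.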

Please suggest your views for the same.

 







Re: Questions about commits and OOE

2013-12-04 Thread Mikhail Khludnev
On Wed, Dec 4, 2013 at 6:36 PM, OSMAN Metin wrote:

> During this massive update, we have sometimes a peak of active threads
> exceeding the limit of 8192 processes authorized for the user running the
> tomcat and zookeeper process.
> When this happens, every hardCommit is failing with an "OutOfMemory :
> unable to create native thread" message.
>

Hello,

Can you check with jstack what these threads are? If they are web container
threads, you need to limit the thread pool; if they are background merge
threads, you might need to configure the merge policy, etc.
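
A minimal sketch of that kind of check; the pid and the limits below are
assumptions, not your actual settings:

jstack -l <solr-pid> | grep -c '^"'      # how many threads the JVM has
jstack -l <solr-pid> | grep '^"' | less  # inspect what they are by name

If the dump is dominated by merge threads, the merge scheduler can be capped in
solrconfig.xml, for example:

<indexConfig>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxThreadCount">2</int>
    <int name="maxMergeCount">4</int>
  </mergeScheduler>
</indexConfig>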


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Re: facet.method=fcs vs facet.method=fc on solr slaves

2013-12-04 Thread Mikhail Khludnev
Hello Patrick,

Replication flushes the UnInvertedField cache, which is what fc uses, but it
doesn't harm Lucene's FieldCache, which is what fcs uses. You can check how much
time in millis is spent on UnInvertedField cache regeneration in INFO logs like
"UnInverted multi-valued field ,time=### ..."


On Thu, Dec 5, 2013 at 12:15 AM, Patrick O'Lone  wrote:

> Is there any advantage on a Solr slave to receive queries using
> facet.method=fcs instead of the default of facet.method=fc? Most of the
> segment files are unchanged between replication events - but I wasn't
> sure if replication would cause the unchanged segment field caches to be
> lost anyway.
> --
> Patrick O'Lone
> Director of Software Development
> TownNews.com
>
> E-mail ... pol...@townnews.com
> Phone  309-743-0809
> Fax .. 309-743-0830
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Re: SOLR 4 not utilizing multi CPU cores

2013-12-04 Thread Andrea Gazzarini
Hi, I did more or less the same but didn't get that behaviour... could you give
us more details?

Best,
Gazza
On 5 Dec 2013 06:54, "Salman Akram" 
wrote:

> Hi,
>
> We recently upgraded to SOLR 4.6 from SOLR 1.4.1. Overall the performance
> went down for large phrase queries. On some analysis we have seen that
> 1.4.1 utilized multiple cpu cores for such queries but SOLR 4.6 is only
> utilizing single cpu core. Any idea on what could be the reason?
>
> Note: We are not using SOLR Sharding.
>
> --
> Regards,
>
> Salman Akram
>


SOLR 4 not utilizing multi CPU cores

2013-12-04 Thread Salman Akram
Hi,

We recently upgraded to Solr 4.6 from Solr 1.4.1. Overall, the performance
went down for large phrase queries. From some analysis we have seen that
1.4.1 utilized multiple CPU cores for such queries, but Solr 4.6 is only
utilizing a single CPU core. Any idea on what the reason could be?

Note: We are not using SOLR Sharding.

-- 
Regards,

Salman Akram


Re: SOLR Master-Slave Repeater with Load balancer

2013-12-04 Thread Walter Underwood
Erick is right, you have been put in a terrible position.

You need to get agreement, in writing, that it is OK for search to go down when 
one server is out of service. This might be for scheduled maintenance or even a 
config update. When one server is down, search is down, period.

This requirement is like choosing a truck, but insisting that there is only 
budget for three tires.

You must, must, must communicate the risks associated with a two-server 
SolrCloud cluster.

wunder

On Dec 4, 2013, at 7:10 PM, Erick Erickson  wrote:

> bq:  but we have limitation on the number of servers that we can use due to
> budget
> concerns (limit is 2)
> 
> really, really, really push back to your project managers on this. So what
> you need 3 machines for a ZooKeeper quorum? The needs of ZK are quite
> light, they don't need a powerful machine. Your managers are saying "for
> want of spending $1,000 on a machine, which we will waste 10 times that
> paying engineers to set up an old-style system, we can't go with
> SolrCloud". You can run the ZooKeeper instances in a separate JVM on your
> two servers and have a cheap machine running ZK for the third instance if
> necessary.
> 
> Another rant finished.
> 
> Erick
> 
> 
> On Wed, Dec 4, 2013 at 6:07 PM, kondamudims  wrote:
> 
>> Hi Erick,
>> Thanks a lot for your explanation. We initially considered Solr Cloud but
>> we
>> have limitation on the number of servers that we can use due to budget
>> concerns (limit is 2) Solr Cloud requires minimum 3. I have tried out the
>> solution you suggested and so far its going well and we are not doing self
>> polling concept.
>> 
>> 
>> 
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/SOLR-Master-Slave-Repeater-with-Load-balancer-tp4103363p4105017.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 

--
Walter Underwood
wun...@wunderwood.org





Re: SOLR Master-Slave Repeater with Load balancer

2013-12-04 Thread Erick Erickson
bq:  but we have limitation on the number of servers that we can use due to
budget
concerns (limit is 2)

really, really, really push back to your project managers on this. So what if
you need 3 machines for a ZooKeeper quorum? The needs of ZK are quite
light, they don't need a powerful machine. Your managers are saying "for
want of spending $1,000 on a machine, which we will waste 10 times that
paying engineers to set up an old-style system, we can't go with
SolrCloud". You can run the ZooKeeper instances in a separate JVM on your
two servers and have a cheap machine running ZK for the third instance if
necessary.
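
For reference, a three-node ensemble needs only a minimal zoo.cfg on each box;
the hostnames and paths below are assumptions, not a recommendation for any
particular setup:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=solr-box-1:2888:3888
server.2=solr-box-2:2888:3888
server.3=cheap-third-box:2888:3888

Each node also needs a myid file in dataDir containing its own server number
(1, 2, or 3).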

Another rant finished.

Erick


On Wed, Dec 4, 2013 at 6:07 PM, kondamudims  wrote:

> Hi Erick,
> Thanks a lot for your explanation. We initially considered Solr Cloud but
> we
> have limitation on the number of servers that we can use due to budget
> concerns (limit is 2) Solr Cloud requires minimum 3. I have tried out the
> solution you suggested and so far its going well and we are not doing self
> polling concept.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SOLR-Master-Slave-Repeater-with-Load-balancer-tp4103363p4105017.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: a core for every user, lots of users... are there issues

2013-12-04 Thread Erick Erickson
I don't know of anyone who's tried and failed to combine transient cores
and SolrCloud. I also don't know of anyone who's tried and succeeded.

I'm saying that the transient core stuff has been thoroughly tested in
non-cloud mode. And people have been working with it for a couple of
releases now. I know of no a-priori reason it wouldn't work in SolrCloud.
But I haven't personally done it, nor do I know of anyone who has. It might
"just work", but the proof is in the pudding.

I've heard some scuttlebutt that the combination of SolrCloud and transient
cores is being, or will be soon, investigated. As in testing and writing
test cases. Being a pessimist by nature on these things, I suspect (but
don't know) that something will come up.

For instance, SolrCloud tries to keep track of all the states of all the
nodes. I _think_ (but don't know for sure) that this is just keeping
contact with the JVM, not particular cores. But what if there's something I
don't know about that pings the individual cores? That would keep them
constantly loading/unloading, which might crop up in unexpected ways. I've
got to emphasize that this is an unknown (at least to me), but an example
of something that could crop up. I'm sure there are other possibilities.

Or distributed updates. For that, every core on every node for a shard in
collectionX must process the update. So for updates, each and every core in
each and every shard might have to be loaded for the update to succeed if
the core is transient. Does this happen fast enough in all cases so a
timeout doesn't cause the update to fail? Or the node to be marked as down?
What about combining that with a heavy query load? I just don't know.

It's uncharted territory is all. I'd love it for you to volunteer to be the
first :). There's certainly committer interest in making this case work so
you wouldn't be left hanging all alone. If I were planning a product
though, I'd either treat the combination of transient cores and SolrCloud
as a R&D project or go with non-cloud mode until I had some reassurance
that transient cores and SolrCloud played nicely together.

All that said, I don't want to paint too bleak a picture. All the transient
core stuff is local to a particular node. SolrCloud and ZooKeeper shouldn't
be interested in the details. It _should_ "just work". It's just that I
can't point to any examples where that's been tried
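
For reference, a minimal sketch of how a transient core is typically declared in
core-discovery mode; the core name and cache size here are assumptions:

# core.properties for one per-user core
name=user_12345
transient=true
loadOnStartup=false

<!-- in solr.xml, inside the <solr> element: cap how many transient
     cores stay loaded at once -->
<int name="transientCacheSize">100</int>

None of this changes the SolrCloud questions above; it only controls how cores
are loaded and unloaded on a single node.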

Best,
Erick


On Wed, Dec 4, 2013 at 5:08 PM, hank williams  wrote:

> Oh my... when you say "I don't know anyone who's combined the two." do you
> mean that those that have tried have failed or that no one has gotten
> around to trying? It sounds like you are saying you have some specific
> knowledge that right now these wont work, otherwise you wouldnt say
> "committers
> will be addressing this sometime soon", right?
>
> I'm worried as we need to make a practical decision here and it sounds like
> maybe we should stick with solr for now... is that what you are saying?
>
>
> On Wed, Dec 4, 2013 at 5:01 PM, Erick Erickson  >wrote:
>
> > Hank:
> >
> > I should add that lots of cores and SolrCloud aren't guaranteed to play
> > nice together. I think some of the committers will be addressing this
> > sometime soon.
> >
> > I'm not saying that this will certainly fail, OTOH I don't know anyone
> > who's combined the two.
> >
> > Erick
> >
> >
> > On Wed, Dec 4, 2013 at 3:18 PM, hank williams  wrote:
> >
> > > Super helpful. Thanks.
> > >
> > >
> > > On Wed, Dec 4, 2013 at 2:53 PM, Shawn Heisey 
> wrote:
> > >
> > > > On 12/4/2013 12:34 PM, hank williams wrote:
> > > >
> > > >> Ok one more simple question. We just upgraded to 4.6 from 4.2. In
> 4.2
> > we
> > > >> were *trying* to use the rest API function "create" to create cores
> > > >> without
> > > >> having to manually mess with files on the server. Is this what
> > "create"
> > > >> was
> > > >> supposed to do? If so it was borken or we werent using it right. In
> > any
> > > >> case in 4.6 is that the right way to programmatically add cores in
> > > >> discovery mode?
> > > >>
> > > >
> > > > If you are NOT in SolrCloud mode, in order to create new cores, the
> > > config
> > > > files need to already exist on the disk.  This is the case with all
> > > > versions of Solr.
> > > >
> > > > If you're running in SolrCloud mode, the core is associated with a
> > > > collection.  Collections have a link to a config in zookeeper.  The
> > config
> > > > is not stored with the core on the disk.
> > > >
> > > > Thanks,
> > > > Shawn
> > > >
> > > >
> > >
> > >
> > > --
> > > blog: whydoeseverythingsuck.com
> > >
> >
>
>
>
> --
> blog: whydoeseverythingsuck.com
>


Re: how to increase each index file size

2013-12-04 Thread YouPeng Yang
Hi Erick

   Thanks for your reply.


Regards


2013/12/4 Erick Erickson 

> Why do you want to do this? Are you seeing performance problems?
> If not, I'd just ignore this problem, premature optimization and all that.
>
> If you _really_ want to do this, your segment files are closed every
> time you do a commit; openSearcher=true|false doesn't matter.
>
> BUT, the longer you go between commits, the bigger your transaction log
> will be, which may lead to other issues, particularly on restart. See:
>
> http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> The key is the section on truncating the tlog.
>
> And note the sizes of these segments will change as they're
> merged anyway.
>
> Best,
> Erick
>
>
> On Wed, Dec 4, 2013 at 4:42 AM, YouPeng Yang  >wrote:
>
> > Hi
> >   I'm using SolrCloud integrated with HDFS, and I found there are lots of
> > small index files.
> >   So I'd like to increase the index file size while doing a DIH
> > full-import. Any suggestions on how to achieve this goal?
> >
> >
> > Regards.
> >
>


Prioritize search returns by URL path?

2013-12-04 Thread Jim Glynn
We have a Telligent based community with Solr as the search engine. We want
to prioritize search returns from within the community by the type of
content: Wiki articles as most relevant, then blog posts, then Verified
answer and Suggested answer forum posts, then remaining forum posts. We have
also implemented a Helpful voting capability and would like to boost items
with more Helpful votes above those within their same category with fewer
votes.

Has anyone out there done something similar, or can someone suggest how to
do this? We're new to search engine tuning, so assume very little knowledge
on our part.
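
For concreteness, the kind of request we imagine (the field names contentType
and helpfulVotes are assumptions about our schema, and we have no idea yet
whether this is the right approach) would be something like:

http://host:8983/solr/community/select?defType=edismax&q=swimming
  &bq=contentType:wiki^10 contentType:blog^5 contentType:verified_answer^3 contentType:suggested_answer^2
  &boost=log(sum(helpfulVotes,1))

(spaces shown unencoded for readability), where bq adds an additive boost per
content type and boost multiplies the score by a function of the vote count.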

Thanks for your help!
JRG





Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Mark Miller
Keep in mind, there have been a *lot* of bug fixes since 4.3.1.

- Mark

On Dec 4, 2013, at 7:07 PM, Tim Vaillancourt  wrote:

> Hey all,
> 
> Now that I am getting correct results with "distrib=false", I've identified 
> that 1 of my nodes has just 1/3rd of the total data set and totally explains 
> the flapping in results. The fix for this is obvious (rebuild replica) but 
> the cause is less obvious.
> 
> There is definately more than one issue going on with this SolrCloud (but 1 
> down thanks to Chris' suggestion!), so I'm guessing the fact that 
> /clusterstate.json doesn't seem to get updated when nodes are brought down/up 
> is the reason why this replica remained in the distributed request chain 
> without recovering/re-replicating from leader.
> 
> I imagine my Zookeeper ensemble is having some problems unrelated to Solr 
> that is the real root cause.
> 
> Thanks!
> 
> Tim
> 
> On 04/12/13 03:00 PM, Tim Vaillancourt wrote:
>> Chris, this is extremely helpful and it's silly I didn't think of this 
>> sooner! Thanks a lot, this makes the situation make much more sense.
>> 
>> I will gather some proper data with your suggestion and get back to the 
>> thread shortly.
>> 
>> Thanks!!
>> 
>> Tim
>> 
>> On 04/12/13 02:57 PM, Chris Hostetter wrote:
>>> :
>>> : I may be incorrect here, but I assumed when querying a single core of a
>>> : SolrCloud collection, the SolrCloud routing is bypassed and I am talking
>>> : directly to a plain/non-SolrCloud core.
>>> 
>>> No ... every query received from a client by solr is handled by a single
>>> core -- if that core knows it's part of a SolrCloud collection then it
>>> will do a distributed search across a random replica from each shard in
>>> that collection.
>>> 
>>> If you want to bypass the distribute search logic, you have to say so
>>> explicitly...
>>> 
>>> To ask an arbitrary replica to only search itself add "distrib=false" to
>>> the request.
>>> 
>>> Alternatively: you can ask that only certain shard names (or certain
>>> explicit replicas) be included in a distribute request..
>>> 
>>> https://cwiki.apache.org/confluence/display/solr/Distributed+Requests
>>> 
>>> 
>>> 
>>> -Hoss
>>> http://www.lucidworks.com/



Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Tim Vaillancourt

Hey all,

Now that I am getting correct results with "distrib=false", I've 
identified that 1 of my nodes has just 1/3rd of the total data set and 
totally explains the flapping in results. The fix for this is obvious 
(rebuild replica) but the cause is less obvious.


There is definitely more than one issue going on with this SolrCloud 
(but 1 down thanks to Chris' suggestion!), so I'm guessing the fact that 
/clusterstate.json doesn't seem to get updated when nodes are brought 
down/up is the reason why this replica remained in the distributed 
request chain without recovering/re-replicating from leader.


I imagine my Zookeeper ensemble is having some problems unrelated to 
Solr that is the real root cause.


Thanks!

Tim

On 04/12/13 03:00 PM, Tim Vaillancourt wrote:
Chris, this is extremely helpful and it's silly I didn't think of this 
sooner! Thanks a lot, this makes the situation make much more sense.


I will gather some proper data with your suggestion and get back to 
the thread shortly.


Thanks!!

Tim

On 04/12/13 02:57 PM, Chris Hostetter wrote:

:
: I may be incorrect here, but I assumed when querying a single core 
of a
: SolrCloud collection, the SolrCloud routing is bypassed and I am 
talking

: directly to a plain/non-SolrCloud core.

No ... every query received from a client by solr is handled by a single
core -- if that core knows it's part of a SolrCloud collection then it
will do a distributed search across a random replica from each shard in
that collection.

If you want to bypass the distribute search logic, you have to say so
explicitly...

To ask an arbitrary replica to only search itself add "distrib=false" to
the request.

Alternatively: you can ask that only certain shard names (or certain
explicit replicas) be included in a distribute request..

https://cwiki.apache.org/confluence/display/solr/Distributed+Requests



-Hoss
http://www.lucidworks.com/


Re: SOLR Master-Slave Repeater with Load balancer

2013-12-04 Thread kondamudims
Hi Erick,
Thanks a lot for your explanation. We initially considered SolrCloud, but we
have a limitation on the number of servers that we can use due to budget
concerns (the limit is 2), and SolrCloud requires a minimum of 3. I have tried
out the solution you suggested; so far it's going well, and we are not using
the self-polling concept.





Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Tim Vaillancourt
Chris, this is extremely helpful and it's silly I didn't think of this 
sooner! Thanks a lot, this makes the situation make much more sense.


I will gather some proper data with your suggestion and get back to the 
thread shortly.


Thanks!!

Tim

On 04/12/13 02:57 PM, Chris Hostetter wrote:

:
: I may be incorrect here, but I assumed when querying a single core of a
: SolrCloud collection, the SolrCloud routing is bypassed and I am talking
: directly to a plain/non-SolrCloud core.

No ... every query received from a client by solr is handled by a single
core -- if that core knows it's part of a SolrCloud collection then it
will do a distributed search across a random replica from each shard in
that collection.

If you want to bypass the distribute search logic, you have to say so
explicitly...

To ask an arbitrary replica to only search itself add "distrib=false" to
the request.

Alternatively: you can ask that only certain shard names (or certain
explicit replicas) be included in a distribute request..

https://cwiki.apache.org/confluence/display/solr/Distributed+Requests



-Hoss
http://www.lucidworks.com/


Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Chris Hostetter
: 
: I may be incorrect here, but I assumed when querying a single core of a
: SolrCloud collection, the SolrCloud routing is bypassed and I am talking
: directly to a plain/non-SolrCloud core.

No ... every query received from a client by solr is handled by a single 
core -- if that core knows it's part of a SolrCloud collection then it 
will do a distributed search across a random replica from each shard in 
that collection.

If you want to bypass the distributed search logic, you have to say so 
explicitly...

To ask an arbitrary replica to only search itself add "distrib=false" to 
the request.

Alternatively, you can ask that only certain shard names (or certain 
explicit replicas) be included in a distributed request:

https://cwiki.apache.org/confluence/display/solr/Distributed+Requests
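
For example, reusing the host and core name from the curl calls earlier in this
thread, a per-replica (non-distributed) count looks like:

curl 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&distrib=false'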



-Hoss
http://www.lucidworks.com/


Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Tim Vaillancourt

Thanks Markus,

I'm not sure if I'm encountering the same issue. This JIRA mentions 10s 
of docs difference, I'm seeing differences in the multi-millions of 
docs, and even more strangely it very predictably flaps between a 123M 
value and an 87M value, a 30M+ doc difference.


Secondly, I'm not comparing values from 2 instances (Leader to Replica), 
I'm currently performing the same curl call to the same core directly 
and am seeing flapping results each time I perform the query, so this is 
currently happening within a single instance/core unless I am 
misunderstanding how to directly query a core.


Cheers,

Tim

On 04/12/13 02:46 PM, Markus Jelsma wrote:

https://issues.apache.org/jira/browse/SOLR-4260

Join the club Tim! Can you upgrade to trunk or incorporate the latest patches 
of related issues? You can fix it by trashing the bad node's data, although 
without multiple clusters it may be difficult to decide which node is bad.

We use the latest commits now (since tuesday) and are still waiting for it to 
happen again.

-Original message-

From:Tim Vaillancourt
Sent: Wednesday 4th December 2013 23:38
To: solr-user@lucene.apache.org
Subject: Re: Inconsistent numFound in SC when querying core directly

To add two more pieces of data:

1) This occurs with real, conditional queries as well (eg:
"q=key:timvaillancourt"), not just the "q=*:*" I provided in my email.
2) I've noticed when I bring a node of the SolrCloud down it is
remaining "state: active" in my /clusterstate.json - something is really
wrong with this cloud! Would a Zookeeper issue explain my varied results
when querying a core directly?

Thanks again!

Tim

On 04/12/13 02:17 PM, Tim Vaillancourt wrote:

Hey guys,

I'm looking into a strange issue on an unhealthy 4.3.1 SolrCloud with
3-node external Zookeeper and 1 collection (2 shards, 2 replicas).

Currently we are noticing inconsistent results from the SolrCloud when
performing the same simple /select query many times to our collection.
Almost every other query the numFound count (and the returned data)
jumps between two very different values.

Initially I suspected a replica in a shard of the collection was
inconsistent (and every other request hit that node) and started
performing the same /select query direct to the individual cores of
the SolrCloud collection on each instance, only to notice the same
problem - the count jumps between two very different values!

I may be incorrect here, but I assumed when querying a single core of
a SolrCloud collection, the SolrCloud routing is bypassed and I am
talking directly to a plain/non-SolrCloud core.

As you can see here, the count for 1 core of my SolrCloud collection
fluctuates wildly, and is only receiving updates and no deletes to
explain the jumps:

"solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep
numFound
   "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep
numFound
   "response":{"numFound":84739144,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep
numFound
   "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep
numFound
   "response":{"numFound":84771358,"start":0,"maxScore":1.0,"docs":[]"


Could anyone help me understand why the same /select query direct to a
single core would return inconsistent, flapping results if there are
no deletes issued in my app to cause such jumps? Am I incorrect in my
assumption that I am querying the core "directly"?

An interesting observation is when I do an /admin/cores call to see
the docCount of the core's index, it does not fluctuate, only the
query result.

That was hard to explain, hopefully someone has some insight! :)

Thanks!

Tim


RE: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Markus Jelsma
https://issues.apache.org/jira/browse/SOLR-4260

Join the club Tim! Can you upgrade to trunk or incorporate the latest patches 
of related issues? You can fix it by trashing the bad node's data, although 
without multiple clusters it may be difficult to decide which node is bad.

We use the latest commits now (since tuesday) and are still waiting for it to 
happen again.

-Original message-
> From:Tim Vaillancourt 
> Sent: Wednesday 4th December 2013 23:38
> To: solr-user@lucene.apache.org
> Subject: Re: Inconsistent numFound in SC when querying core directly
> 
> To add two more pieces of data:
> 
> 1) This occurs with real, conditional queries as well (eg: 
> "q=key:timvaillancourt"), not just the "q=*:*" I provided in my email.
> 2) I've noticed when I bring a node of the SolrCloud down it is 
> remaining "state: active" in my /clusterstate.json - something is really 
> wrong with this cloud! Would a Zookeeper issue explain my varied results 
> when querying a core directly?
> 
> Thanks again!
> 
> Tim
> 
> On 04/12/13 02:17 PM, Tim Vaillancourt wrote:
> > Hey guys,
> >
> > I'm looking into a strange issue on an unhealthy 4.3.1 SolrCloud with 
> > 3-node external Zookeeper and 1 collection (2 shards, 2 replicas).
> >
> > Currently we are noticing inconsistent results from the SolrCloud when 
> > performing the same simple /select query many times to our collection. 
> > Almost every other query the numFound count (and the returned data) 
> > jumps between two very different values.
> >
> > Initially I suspected a replica in a shard of the collection was 
> > inconsistent (and every other request hit that node) and started 
> > performing the same /select query direct to the individual cores of 
> > the SolrCloud collection on each instance, only to notice the same 
> > problem - the count jumps between two very different values!
> >
> > I may be incorrect here, but I assumed when querying a single core of 
> > a SolrCloud collection, the SolrCloud routing is bypassed and I am 
> > talking directly to a plain/non-SolrCloud core.
> >
> > As you can see here, the count for 1 core of my SolrCloud collection 
> > fluctuates wildly, and is only receiving updates and no deletes to 
> > explain the jumps:
> >
> > "solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
> > 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep
> >  
> > numFound
> >   "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]
> >
> > solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
> > 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep
> >  
> > numFound
> >   "response":{"numFound":84739144,"start":0,"maxScore":1.0,"docs":[]
> >
> > solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
> > 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep
> >  
> > numFound
> >   "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]
> >
> > solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
> > 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep
> >  
> > numFound
> >   "response":{"numFound":84771358,"start":0,"maxScore":1.0,"docs":[]"
> >
> >
> > Could anyone help me understand why the same /select query direct to a 
> > single core would return inconsistent, flapping results if there are 
> > no deletes issued in my app to cause such jumps? Am I incorrect in my 
> > assumption that I am querying the core "directly"?
> >
> > An interesting observation is when I do an /admin/cores call to see 
> > the docCount of the core's index, it does not fluctuate, only the 
> > query result.
> >
> > That was hard to explain, hopefully someone has some insight! :)
> >
> > Thanks!
> >
> > Tim
> 


Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Tim Vaillancourt

To add two more pieces of data:

1) This occurs with real, conditional queries as well (eg: 
"q=key:timvaillancourt"), not just the "q=*:*" I provided in my email.
2) I've noticed when I bring a node of the SolrCloud down it is 
remaining "state: active" in my /clusterstate.json - something is really 
wrong with this cloud! Would a Zookeeper issue explain my varied results 
when querying a core directly?


Thanks again!

Tim

On 04/12/13 02:17 PM, Tim Vaillancourt wrote:

Hey guys,

I'm looking into a strange issue on an unhealthy 4.3.1 SolrCloud with 
3-node external Zookeeper and 1 collection (2 shards, 2 replicas).


Currently we are noticing inconsistent results from the SolrCloud when 
performing the same simple /select query many times to our collection. 
Almost every other query the numFound count (and the returned data) 
jumps between two very different values.


Initially I suspected a replica in a shard of the collection was 
inconsistent (and every other request hit that node) and started 
performing the same /select query direct to the individual cores of 
the SolrCloud collection on each instance, only to notice the same 
problem - the count jumps between two very different values!


I may be incorrect here, but I assumed when querying a single core of 
a SolrCloud collection, the SolrCloud routing is bypassed and I am 
talking directly to a plain/non-SolrCloud core.


As you can see here, the count for 1 core of my SolrCloud collection 
fluctuates wildly, and is only receiving updates and no deletes to 
explain the jumps:


"solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep 
numFound

  "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep 
numFound

  "response":{"numFound":84739144,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep 
numFound

  "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep 
numFound

  "response":{"numFound":84771358,"start":0,"maxScore":1.0,"docs":[]"


Could anyone help me understand why the same /select query direct to a 
single core would return inconsistent, flapping results if there are 
no deletes issued in my app to cause such jumps? Am I incorrect in my 
assumption that I am querying the core "directly"?


An interesting observation is when I do an /admin/cores call to see 
the docCount of the core's index, it does not fluctuate, only the 
query result.


That was hard to explain, hopefully someone has some insight! :)

Thanks!

Tim


Re: Shouldn't fuzzy version of a solr query always return a super set of its not-fuzzy equivalent

2013-12-04 Thread Mhd Wrk
I'm using the Snowball stemmer and, you are correct, "swimming" has been stored
as "swim".

Should I wrap the Snowball filter in a multiterm analyzer?

Thanks
 On Dec 4, 2013 2:02 PM, "Jack Krupansky"  wrote:

> Ah... although the lower case filtering does get applied properly in a
> "multiterm" analysis scenario, stemming does not. What stemmer are you
> using? I suspect that "swimming" normally becomes "swim". Compare the debug
> output of the two queries.
>
> -- Jack Krupansky
>
> -Original Message- From: Mhd Wrk
> Sent: Wednesday, December 04, 2013 2:08 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Shouldn't fuzzy version of a solr query always return a super
> set of its not-fuzzy equivalent
>
> Debug shows that all terms are lowercased properly.
>
> Thanks
> On Dec 4, 2013 3:18 AM, "Erik Hatcher"  wrote:
>
>  Chances are you're not getting those fuzzy terms analyzed as you'd like.
>>  See debug (&debug=true) output to be sure.  Most likely the fuzzy terms
>> are not being lowercased.  See
>> http://wiki.apache.org/solr/MultitermQueryAnalysis for more details (this
>> applies to fuzzy, not just wildcard) terms too.
>>
>> Erik
>>
>>
>> On Dec 4, 2013, at 4:46 AM, Mhd Wrk  wrote:
>>
>> > I'm using the following query to do a fuzzy search on Solr 4.5.1 and am
>> > getting empty result.
>> >
>> > qt=standard&q=+(field1|en_CA|:Swimming~2 field1|en|:Swimming~2)
>> > +(field1|en_CA|:Goggle~1 field1|en|:Goggle~1) +(+startDate:[* TO
>> > 2013-12-04T00:23:00Z] -endDate:[* TO
>> > 2013-12-04T00:23:00Z])&start=0&rows=10&fl=id
>> >
>> > If I change it to a not fuzzy query by simply dropping tildes from the
>> > terms (see below) then it returns the expected result! Is this a bug?
>> > Shouldn't fuzzy version of a query always return a super set of its
>> > not-fuzzy equivalent?
>> >
>> > qt=standard&q=+(field1|en_CA|:Swimming field1|en|:Swimming)
>> > +(field1|en_CA|:Goggle field1|en|:Goggle) +(+startDate:[* TO
>> > 2013-12-04T00:23:00Z] -endDate:[* TO
>> > 2013-12-04T00:23:00Z])&start=0&rows=10&fl=id
>>
>>
>>
>


Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Tim Vaillancourt

Hey guys,

I'm looking into a strange issue on an unhealthy 4.3.1 SolrCloud with 
3-node external Zookeeper and 1 collection (2 shards, 2 replicas).


Currently we are noticing inconsistent results from the SolrCloud when 
performing the same simple /select query many times to our collection. 
Almost every other query the numFound count (and the returned data) 
jumps between two very different values.


Initially I suspected a replica in a shard of the collection was 
inconsistent (and every other request hit that node) and started 
performing the same /select query direct to the individual cores of the 
SolrCloud collection on each instance, only to notice the same problem - 
the count jumps between two very different values!


I may be incorrect here, but I assumed when querying a single core of a 
SolrCloud collection, the SolrCloud routing is bypassed and I am talking 
directly to a plain/non-SolrCloud core.


As you can see here, the count for 1 core of my SolrCloud collection 
fluctuates wildly, and is only receiving updates and no deletes to 
explain the jumps:


"solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep 
numFound

  "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep 
numFound

  "response":{"numFound":84739144,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep 
numFound

  "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep 
numFound

  "response":{"numFound":84771358,"start":0,"maxScore":1.0,"docs":[]"


Could anyone help me understand why the same /select query direct to a 
single core would return inconsistent, flapping results if there are no 
deletes issued in my app to cause such jumps? Am I incorrect in my 
assumption that I am querying the core "directly"?


An interesting observation is when I do an /admin/cores call to see the 
docCount of the core's index, it does not fluctuate, only the query result.


That was hard to explain, hopefully someone has some insight! :)

Thanks!

Tim


Re: a core for every user, lots of users... are there issues

2013-12-04 Thread hank williams
Oh my... when you say "I don't know anyone who's combined the two," do you
mean that those who have tried have failed, or that no one has gotten
around to trying? It sounds like you are saying you have some specific
knowledge that right now these won't work, otherwise you wouldn't say "committers
will be addressing this sometime soon", right?

I'm worried as we need to make a practical decision here and it sounds like
maybe we should stick with solr for now... is that what you are saying?


On Wed, Dec 4, 2013 at 5:01 PM, Erick Erickson wrote:

> Hank:
>
> I should add that lots of cores and SolrCloud aren't guaranteed to play
> nice together. I think some of the committers will be addressing this
> sometime soon.
>
> I'm not saying that this will certainly fail, OTOH I don't know anyone
> who's combined the two.
>
> Erick
>
>
> On Wed, Dec 4, 2013 at 3:18 PM, hank williams  wrote:
>
> > Super helpful. Thanks.
> >
> >
> > On Wed, Dec 4, 2013 at 2:53 PM, Shawn Heisey  wrote:
> >
> > > On 12/4/2013 12:34 PM, hank williams wrote:
> > >
> > >> Ok one more simple question. We just upgraded to 4.6 from 4.2. In 4.2
> we
> > >> were *trying* to use the rest API function "create" to create cores
> > >> without
> > >> having to manually mess with files on the server. Is this what
> "create"
> > >> was
> > >> supposed to do? If so it was borken or we werent using it right. In
> any
> > >> case in 4.6 is that the right way to programmatically add cores in
> > >> discovery mode?
> > >>
> > >
> > > If you are NOT in SolrCloud mode, in order to create new cores, the
> > config
> > > files need to already exist on the disk.  This is the case with all
> > > versions of Solr.
> > >
> > > If you're running in SolrCloud mode, the core is associated with a
> > > collection.  Collections have a link to a config in zookeeper.  The
> config
> > > is not stored with the core on the disk.
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
> >
> >
> > --
> > blog: whydoeseverythingsuck.com
> >
>



-- 
blog: whydoeseverythingsuck.com


Re: Shouldn't fuzzy version of a solr query always return a super set of its not-fuzzy equivalent

2013-12-04 Thread Jack Krupansky
Ah... although the lower case filtering does get applied properly in a 
"multiterm" analysis scenario, stemming does not. What stemmer are you 
using? I suspect that "swimming" normally becomes "swim". Compare the debug 
output of the two queries.
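
For reference, the kind of schema change being discussed is an explicit
multiterm analyzer on the field type. This is only a sketch; the field type name
and the rest of the chain are assumptions:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
  <analyzer type="multiterm">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Note the multiterm chain deliberately leaves the stemmer out, since stemmers are
not multi-term aware; that is exactly why "Swimming~2" never reaches the stemmed
"swim" while the non-fuzzy query does.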


-- Jack Krupansky

-Original Message- 
From: Mhd Wrk

Sent: Wednesday, December 04, 2013 2:08 PM
To: solr-user@lucene.apache.org
Subject: Re: Shouldn't fuzzy version of a solr query always return a super 
set of its not-fuzzy equivalent


Debug shows that all terms are lowercased properly.

Thanks
On Dec 4, 2013 3:18 AM, "Erik Hatcher"  wrote:


Chances are you're not getting those fuzzy terms analyzed as you'd like.
 See debug (&debug=true) output to be sure.  Most likely the fuzzy terms
are not being lowercased.  See
http://wiki.apache.org/solr/MultitermQueryAnalysis for more details (this
applies to fuzzy, not just wildcard) terms too.

Erik


On Dec 4, 2013, at 4:46 AM, Mhd Wrk  wrote:

> I'm using the following query to do a fuzzy search on Solr 4.5.1 and am
> getting empty result.
>
> qt=standard&q=+(field1|en_CA|:Swimming~2 field1|en|:Swimming~2)
> +(field1|en_CA|:Goggle~1 field1|en|:Goggle~1) +(+startDate:[* TO
> 2013-12-04T00:23:00Z] -endDate:[* TO
> 2013-12-04T00:23:00Z])&start=0&rows=10&fl=id
>
> If I change it to a not fuzzy query by simply dropping tildes from the
> terms (see below) then it returns the expected result! Is this a bug?
> Shouldn't fuzzy version of a query always return a super set of its
> not-fuzzy equivalent?
>
> qt=standard&q=+(field1|en_CA|:Swimming field1|en|:Swimming)
> +(field1|en_CA|:Goggle field1|en|:Goggle) +(+startDate:[* TO
> 2013-12-04T00:23:00Z] -endDate:[* TO
> 2013-12-04T00:23:00Z])&start=0&rows=10&fl=id






Re: a core for every user, lots of users... are there issues

2013-12-04 Thread Erick Erickson
Hank:

I should add that lots of cores and SolrCloud aren't guaranteed to play
nice together. I think some of the committers will be addressing this
sometime soon.

I'm not saying that this will certainly fail, OTOH I don't know anyone
who's combined the two.

Erick


On Wed, Dec 4, 2013 at 3:18 PM, hank williams  wrote:

> Super helpful. Thanks.
>
>
> On Wed, Dec 4, 2013 at 2:53 PM, Shawn Heisey  wrote:
>
> > On 12/4/2013 12:34 PM, hank williams wrote:
> >
> >> Ok one more simple question. We just upgraded to 4.6 from 4.2. In 4.2 we
> >> were *trying* to use the rest API function "create" to create cores
> >> without
> >> having to manually mess with files on the server. Is this what "create"
> >> was
> >> supposed to do? If so it was borken or we werent using it right. In any
> >> case in 4.6 is that the right way to programmatically add cores in
> >> discovery mode?
> >>
> >
> > If you are NOT in SolrCloud mode, in order to create new cores, the
> config
> > files need to already exist on the disk.  This is the case with all
> > versions of Solr.
> >
> > If you're running in SolrCloud mode, the core is associated with a
> > collection.  Collections have a link to a config in zookeeper.  The config
> > is not stored with the core on the disk.
> >
> > Thanks,
> > Shawn
> >
> >
>
>
> --
> blog: whydoeseverythingsuck.com
>


Re: Solr Stalls on Bulk indexing, no logs or errors

2013-12-04 Thread Erick Erickson
Wait, crashes? Or just stops accepting updates?

At any rate, this should be fixed in 4.6. If you
can dump a stack trace, we can identify whether this
is the same issue quickly. jstack is popular.

If queries are still being served, it's probably
not your commit settings; try searching the
JIRA list for "distributed deadlock". You should
find two JIRAs: one relevant to SolrJ by Joel
Bernstein (probably not the one you care about) and
one by Mark Miller that addresses this.

Best,
Erick


On Wed, Dec 4, 2013 at 3:19 PM, steven crichton wrote:

> Yes, I can continue to query after this importer goes down and while it is
> running.
>
> The bulk commit is done via a JSON handler in PHP. There are 121,000
> records that need to go into the index, so this is done in chunked MySQL
> retrieval calls of 5000 records each, parsing the data as required.
>
> workflow:
>
> get record
> create {add doc… } JSON
> Post to CORE/update/json
>
>
> I stopped doing a hard commit every 1000 records. To see if that was an
> issue.
>
>
> the auto commit settings are ::
>
> <autoCommit>
>   <maxDocs>${solr.autoCommit.MaxDocs:5000}</maxDocs>
>   <maxTime>${solr.autoCommit.MaxTime:24000}</maxTime>
> </autoCommit>
>
>
> I’ve pretty much worked out of the drupal schemas for SOLR 4
> https://drupal.org/project/apachesolr
>
> At one point I thought it could be malformed data, but even after reducing the
> records down to just the id and title, it crashes at the same point.
> That is, queries still work, but the import handler does nothing at all.
>
>
> Tomcat logs seem to indicate no major issues.
>
>
> There isn't some strange variable that sets an upper limit on the index size,
> is there?
>
> Regards,
> Steven
>
>
>
> On 4 Dec 2013, at 20:02, Erick Erickson [via Lucene] <
> ml-node+s472066n4104984...@n3.nabble.com> wrote:
>
> > There's a known issue with SolrCloud with multiple shards, but
> > you haven't told us whether you're using that. The test for
> > whether you're running in to that is whether you can continue
> > to _query_, just not update.
> >
> > But you need to tell us more about your setup. In particular
> > your commit settings (hard and soft), your solrconfig settings,
> > particularly around autowarming, how you're "bulk indexing",
> > SolrJ? DIH? a huge CSV file?
> >
> > Best,
> > Erick
> >
> >
> > On Wed, Dec 4, 2013 at 2:30 PM, steven crichton <[hidden email]>wrote:
> >
> > > I am finding with a bulk index using SOLR 4.3 on Tomcat, that when I
> reach
> > > 69578 records the server stops adding anything more.
> > >
> > > I've tried reducing the data sent to the bare minimum of fields and
> using
> > > ASC and DESC data to see if it could be a field issue.
> > >
> > > Is there anything I could look at for this? As I'm not finding anything
> > > similar noted before. Does tomcat have issues with closing connections
> that
> > > look like DDOS attacks? Or could it be related to too many commits in
> too
> > > short a time?
> > >
> > > Any help will be very greatly appreciated.
> > >
> > >
> > >
> > > --
> > > View this message in context:
> > >
> http://lucene.472066.n3.nabble.com/Solr-Stalls-on-Bulk-indexing-no-logs-or-errors-tp4104981.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> > >
> >
> >
> > If you reply to this email, your message will be added to the discussion
> below:
> >
> http://lucene.472066.n3.nabble.com/Solr-Stalls-on-Bulk-indexing-no-logs-or-errors-tp4104981p4104984.html
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Stalls-on-Bulk-indexing-no-logs-or-errors-tp4104981p4104990.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: starting up solr automatically

2013-12-04 Thread Eric Palmer
thanks greg

I got it starting, but the collection file is not available. I will use the
script that you gave the URL for and the env settings. Thanks


On Wed, Dec 4, 2013 at 4:26 PM, Greg Walters wrote:

> I almost forgot, you'll need a file to setup the environment a bit too:
>
> **
> JAVA_HOME=/usr/java/default
> JAVA_OPTIONS="-Xmx15g \
> -Xms15g \
> -XX:+PrintGCApplicationStoppedTime \
> -XX:+PrintGCDateStamps \
> -XX:+PrintGCDetails \
> -XX:+UseConcMarkSweepGC \
> -XX:+UseParNewGC \
> -XX:+UseTLAB \
> -XX:+CMSParallelRemarkEnabled \
> -XX:+CMSScavengeBeforeRemark \
> -XX:+UseCMSInitiatingOccupancyOnly \
> -XX:CMSInitiatingOccupancyFraction=50 \
> -XX:CMSWaitDuration=30 \
> -XX:GCTimeRatio=40 \
> -Xloggc:/tmp/solr45_gc.log \
> -Dbootstrap_conf=true \
> -Dbootstrap_confdir=/var/lib/answers/atlascloud/solr45/solr/wa-en-collection_1/conf/
> \
> -Dcollection.configName=wa-en-collection \
> -DzkHost= \
> -DnumShards= \
> -Dsolr.solr.home=/var/lib/answers/atlascloud/solr45/solr/ \
> -Dlog4j.configuration=file:///var/lib/answers/atlascloud/solr45/resources/log4j.properties
> \
> -Djetty.port=9101 \
> $JAVA_OPTIONS"
> JETTY_HOME=/var/lib/answers/atlascloud/solr45/
> JETTY_USER=tomcat
> JETTY_LOGS=/var/lib/answers/atlascloud/solr45/logs
> **
>
> On Dec 4, 2013, at 3:21 PM, Greg Walters  wrote:
>
> > I found the instructions and scripts on that page to be unclear and/or
> not work. Here's the script I've been using for solr 4.5.1:
> https://gist.github.com/gregwalters/7795791 Do note that you'll have to
> change a couple of paths to get things working correctly.
> >
> > Thanks,
> > Greg
> >
> > On Dec 4, 2013, at 3:15 PM, Eric Palmer  wrote:
> >
> >> Hey all,
> >>
> >> I'm pretty new to solr.  I'm installing it on an amazon linux (rpm
> based)
> >> ec2 instance and have it running. I even have nutch feeding it pages
> from
> >> a crawl. I'm very happy about that.
> >>
> >> I want solr to start on a reboot and am following the instructions at
> >> http://wiki.apache.org/solr/SolrJetty#Starting
> >>
> >> I'm using solr 4.5.1 and when I check the jetty version I get this
> >>
> >> java -jar start.jar --version
> >> Active Options: [default, *]
> >> Version Information on 17 entries in the classpath.
> >> Note: order presented here is how they would appear on the classpath.
> >> changes to the OPTIONS=[option,option,...] command line option will
> >> be reflected here.
> >> 0:(dir) | ${jetty.home}/resources
> >> 1: 8.1.10.v20130312 |
> ${jetty.home}/lib/jetty-xml-8.1.10.v20130312.jar
> >> 2:  3.0.0.v201112011016 | ${jetty.home}/lib/servlet-api-3.0.jar
> >> 3: 8.1.10.v20130312 |
> ${jetty.home}/lib/jetty-http-8.1.10.v20130312.jar
> >> 4: 8.1.10.v20130312 |
> >> ${jetty.home}/lib/jetty-continuation-8.1.10.v20130312.jar
> >> 5: 8.1.10.v20130312 |
> >> ${jetty.home}/lib/jetty-server-8.1.10.v20130312.jar
> >> 6: 8.1.10.v20130312 |
> >> ${jetty.home}/lib/jetty-security-8.1.10.v20130312.jar
> >> 7: 8.1.10.v20130312 |
> >> ${jetty.home}/lib/jetty-servlet-8.1.10.v20130312.jar
> >> 8: 8.1.10.v20130312 |
> >> ${jetty.home}/lib/jetty-webapp-8.1.10.v20130312.jar
> >> 9: 8.1.10.v20130312 |
> >> ${jetty.home}/lib/jetty-deploy-8.1.10.v20130312.jar
> >> 10:1.6.6 |
> ${jetty.home}/lib/ext/jcl-over-slf4j-1.6.6.jar
> >> 11:1.6.6 | ${jetty.home}/lib/ext/jul-to-slf4j-1.6.6.jar
> >> 12:   1.2.16 | ${jetty.home}/lib/ext/log4j-1.2.16.jar
> >> 13:1.6.6 | ${jetty.home}/lib/ext/slf4j-api-1.6.6.jar
> >> 14:1.6.6 | ${jetty.home}/lib/ext/slf4j-log4j12-1.6.6.jar
> >> 15: 8.1.10.v20130312 |
> ${jetty.home}/lib/jetty-util-8.1.10.v20130312.jar
> >> 16: 8.1.10.v20130312 |
> ${jetty.home}/lib/jetty-io-8.1.10.v20130312.jar
> >>
> >> the instructions reference a jetty.sh script for version 6 and a
> different
> >> one for 7. Does the version 7 one work with jetty 8? If not where can I
> get
> >> the one for version 8?
> >>
> >> BTW - this is just the standard install of solr from the gzip file.
> >>
> >> thanks in advance for your help.
> >>
> >> --
> >> Eric Palmer
> >> U of Richmond
> >
>
>


-- 
Eric Palmer


Re: starting up solr automatically

2013-12-04 Thread Greg Walters
I almost forgot, you'll need a file to setup the environment a bit too:

**
JAVA_HOME=/usr/java/default
JAVA_OPTIONS="-Xmx15g \
-Xms15g \
-XX:+PrintGCApplicationStoppedTime \
-XX:+PrintGCDateStamps \
-XX:+PrintGCDetails \
-XX:+UseConcMarkSweepGC \
-XX:+UseParNewGC \
-XX:+UseTLAB \
-XX:+CMSParallelRemarkEnabled \
-XX:+CMSScavengeBeforeRemark \
-XX:+UseCMSInitiatingOccupancyOnly \
-XX:CMSInitiatingOccupancyFraction=50 \
-XX:CMSWaitDuration=30 \
-XX:GCTimeRatio=40 \
-Xloggc:/tmp/solr45_gc.log \
-Dbootstrap_conf=true \
-Dbootstrap_confdir=/var/lib/answers/atlascloud/solr45/solr/wa-en-collection_1/conf/
 \
-Dcollection.configName=wa-en-collection \
-DzkHost= \
-DnumShards= \
-Dsolr.solr.home=/var/lib/answers/atlascloud/solr45/solr/ \
-Dlog4j.configuration=file:///var/lib/answers/atlascloud/solr45/resources/log4j.properties
 \
-Djetty.port=9101 \
$JAVA_OPTIONS"
JETTY_HOME=/var/lib/answers/atlascloud/solr45/
JETTY_USER=tomcat
JETTY_LOGS=/var/lib/answers/atlascloud/solr45/logs
**
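
For what it's worth, on an RPM-based distro like Amazon Linux the init script is
usually wired in with chkconfig once it's copied into place (assuming the script
carries the usual chkconfig header comments); the service name "solr" below is
an assumption:

sudo cp jetty.sh /etc/init.d/solr
sudo chkconfig --add solr
sudo chkconfig solr on
sudo service solr start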

On Dec 4, 2013, at 3:21 PM, Greg Walters  wrote:

> I found the instructions and scripts on that page to be unclear and/or not 
> work. Here's the script I've been using for solr 4.5.1: 
> https://gist.github.com/gregwalters/7795791 Do note that you'll have to 
> change a couple of paths to get things working correctly.
> 
> Thanks,
> Greg
> 
> On Dec 4, 2013, at 3:15 PM, Eric Palmer  wrote:
> 
>> Hey all,
>> 
>> I'm pretty new to solr.  I'm installing it on an amazon linux (rpm based)
>> ec2 instance and have it running. I even have nutch feeding it pages from
>> a crawl. I'm very happy about that.
>> 
>> I want solr to start on a reboot and am following the instructions at
>> http://wiki.apache.org/solr/SolrJetty#Starting
>> 
>> I'm using solr 4.5.1 and when I check the jetty version I get this
>> 
>> java -jar start.jar --version
>> Active Options: [default, *]
>> Version Information on 17 entries in the classpath.
>> Note: order presented here is how they would appear on the classpath.
>> changes to the OPTIONS=[option,option,...] command line option will
>> be reflected here.
>> 0:(dir) | ${jetty.home}/resources
>> 1: 8.1.10.v20130312 | ${jetty.home}/lib/jetty-xml-8.1.10.v20130312.jar
>> 2:  3.0.0.v201112011016 | ${jetty.home}/lib/servlet-api-3.0.jar
>> 3: 8.1.10.v20130312 | ${jetty.home}/lib/jetty-http-8.1.10.v20130312.jar
>> 4: 8.1.10.v20130312 |
>> ${jetty.home}/lib/jetty-continuation-8.1.10.v20130312.jar
>> 5: 8.1.10.v20130312 |
>> ${jetty.home}/lib/jetty-server-8.1.10.v20130312.jar
>> 6: 8.1.10.v20130312 |
>> ${jetty.home}/lib/jetty-security-8.1.10.v20130312.jar
>> 7: 8.1.10.v20130312 |
>> ${jetty.home}/lib/jetty-servlet-8.1.10.v20130312.jar
>> 8: 8.1.10.v20130312 |
>> ${jetty.home}/lib/jetty-webapp-8.1.10.v20130312.jar
>> 9: 8.1.10.v20130312 |
>> ${jetty.home}/lib/jetty-deploy-8.1.10.v20130312.jar
>> 10:1.6.6 | ${jetty.home}/lib/ext/jcl-over-slf4j-1.6.6.jar
>> 11:1.6.6 | ${jetty.home}/lib/ext/jul-to-slf4j-1.6.6.jar
>> 12:   1.2.16 | ${jetty.home}/lib/ext/log4j-1.2.16.jar
>> 13:1.6.6 | ${jetty.home}/lib/ext/slf4j-api-1.6.6.jar
>> 14:1.6.6 | ${jetty.home}/lib/ext/slf4j-log4j12-1.6.6.jar
>> 15: 8.1.10.v20130312 | ${jetty.home}/lib/jetty-util-8.1.10.v20130312.jar
>> 16: 8.1.10.v20130312 | ${jetty.home}/lib/jetty-io-8.1.10.v20130312.jar
>> 
>> the instructions reference a jetty.sh script for version 6 and a different
>> one for 7. Does the version 7 one work with jetty 8? If not where can I get
>> the one for version 8?
>> 
>> BTW - this is just the standard install of solr from the gzip file.
>> 
>> thanks in advance for your help.
>> 
>> -- 
>> Eric Palmer
>> U of Richmond
> 



Re: Querying for results

2013-12-04 Thread Rob Veliz
Follow-up: Would anyone very familiar with DIH be willing to jump on a side
thread with me and my developer to help troubleshoot some issues we're
having?  Please little r me at: robert [at] mavenbridge.com.  Thanks!




On Wed, Dec 4, 2013 at 1:14 PM, Rob Veliz  wrote:

> Hello,
>
> I am running Solr from Magento and using DIH to import/index data from 1
> other source (external).  I am trying to query for results...two questions:
>
> 1. The query I'm using runs against "fulltext_1_en" which is a specific
> shard created by the Magento deployment in solrconfig.xml.  Should I be
> using/querying from another field/store (e.g. not fulltext_1*) to get
> results from both Magento and the other data source?  How would I add the
> data from my DIH indexing to that specific shard so it was all in the same
> place?
>
> 2. OR do I need to add another shard to correspond to the DIH data
> elements?
>
> 3. OR is there something else I'm missing in trying to query for data from
> 2 sources?
>
> Thanks!
>
>
>


-- 
*Rob Veliz*, Founder | *Mavenbridge* | rob...@mavenbridge.com | M: +1 (206)
909 - 3490

Follow us at: http://twitter.com/mavenbridge


Re: starting up solr automatically

2013-12-04 Thread Greg Walters
I found the instructions and scripts on that page to be unclear and/or not 
working. Here's the script I've been using for Solr 4.5.1: 
https://gist.github.com/gregwalters/7795791 Do note that you'll have to change 
a couple of paths to get things working correctly.

Thanks,
Greg

On Dec 4, 2013, at 3:15 PM, Eric Palmer  wrote:

> Hey all,
> 
> I'm pretty new to solr.  I'm installing it on an amazon linux (rpm based)
> ec2 instance and have it running. I even have nutch feeding it pages from
> a crawl. I'm very happy about that.
> 
> I want solr to start on a reboot and am following the instructions at
> http://wiki.apache.org/solr/SolrJetty#Starting
> 
> I'm using solr 4.5.1 and when I check the jetty version I get this
> 
> java -jar start.jar --version
> Active Options: [default, *]
> Version Information on 17 entries in the classpath.
> Note: order presented here is how they would appear on the classpath.
>  changes to the OPTIONS=[option,option,...] command line option will
> be reflected here.
> 0:(dir) | ${jetty.home}/resources
> 1: 8.1.10.v20130312 | ${jetty.home}/lib/jetty-xml-8.1.10.v20130312.jar
> 2:  3.0.0.v201112011016 | ${jetty.home}/lib/servlet-api-3.0.jar
> 3: 8.1.10.v20130312 | ${jetty.home}/lib/jetty-http-8.1.10.v20130312.jar
> 4: 8.1.10.v20130312 |
> ${jetty.home}/lib/jetty-continuation-8.1.10.v20130312.jar
> 5: 8.1.10.v20130312 |
> ${jetty.home}/lib/jetty-server-8.1.10.v20130312.jar
> 6: 8.1.10.v20130312 |
> ${jetty.home}/lib/jetty-security-8.1.10.v20130312.jar
> 7: 8.1.10.v20130312 |
> ${jetty.home}/lib/jetty-servlet-8.1.10.v20130312.jar
> 8: 8.1.10.v20130312 |
> ${jetty.home}/lib/jetty-webapp-8.1.10.v20130312.jar
> 9: 8.1.10.v20130312 |
> ${jetty.home}/lib/jetty-deploy-8.1.10.v20130312.jar
> 10:1.6.6 | ${jetty.home}/lib/ext/jcl-over-slf4j-1.6.6.jar
> 11:1.6.6 | ${jetty.home}/lib/ext/jul-to-slf4j-1.6.6.jar
> 12:   1.2.16 | ${jetty.home}/lib/ext/log4j-1.2.16.jar
> 13:1.6.6 | ${jetty.home}/lib/ext/slf4j-api-1.6.6.jar
> 14:1.6.6 | ${jetty.home}/lib/ext/slf4j-log4j12-1.6.6.jar
> 15: 8.1.10.v20130312 | ${jetty.home}/lib/jetty-util-8.1.10.v20130312.jar
> 16: 8.1.10.v20130312 | ${jetty.home}/lib/jetty-io-8.1.10.v20130312.jar
> 
> the instructions reference a jetty.sh script for version 6 and a different
> one for 7. Does the version 7 one work with jetty 8? If not where can I get
> the one for version 8?
> 
> BTW - this is just the standard install of solr from the gzip file.
> 
> thanks in advance for your help.
> 
> -- 
> Eric Palmer
> U of Richmond



starting up solr automatically

2013-12-04 Thread Eric Palmer
Hey all,

I'm pretty new to solr.  I'm installing it on an amazon linux (rpm based)
 ec2 instance and have it running. I even have nutch feeding it pages from
a crawl. I'm very happy about that.

I want solr to start on a reboot and am following the instructions at
http://wiki.apache.org/solr/SolrJetty#Starting

I'm using solr 4.5.1 and when I check the jetty version I get this

java -jar start.jar --version
Active Options: [default, *]
Version Information on 17 entries in the classpath.
Note: order presented here is how they would appear on the classpath.
  changes to the OPTIONS=[option,option,...] command line option will
be reflected here.
 0:(dir) | ${jetty.home}/resources
 1: 8.1.10.v20130312 | ${jetty.home}/lib/jetty-xml-8.1.10.v20130312.jar
 2:  3.0.0.v201112011016 | ${jetty.home}/lib/servlet-api-3.0.jar
 3: 8.1.10.v20130312 | ${jetty.home}/lib/jetty-http-8.1.10.v20130312.jar
 4: 8.1.10.v20130312 |
${jetty.home}/lib/jetty-continuation-8.1.10.v20130312.jar
 5: 8.1.10.v20130312 |
${jetty.home}/lib/jetty-server-8.1.10.v20130312.jar
 6: 8.1.10.v20130312 |
${jetty.home}/lib/jetty-security-8.1.10.v20130312.jar
 7: 8.1.10.v20130312 |
${jetty.home}/lib/jetty-servlet-8.1.10.v20130312.jar
 8: 8.1.10.v20130312 |
${jetty.home}/lib/jetty-webapp-8.1.10.v20130312.jar
 9: 8.1.10.v20130312 |
${jetty.home}/lib/jetty-deploy-8.1.10.v20130312.jar
10:1.6.6 | ${jetty.home}/lib/ext/jcl-over-slf4j-1.6.6.jar
11:1.6.6 | ${jetty.home}/lib/ext/jul-to-slf4j-1.6.6.jar
12:   1.2.16 | ${jetty.home}/lib/ext/log4j-1.2.16.jar
13:1.6.6 | ${jetty.home}/lib/ext/slf4j-api-1.6.6.jar
14:1.6.6 | ${jetty.home}/lib/ext/slf4j-log4j12-1.6.6.jar
15: 8.1.10.v20130312 | ${jetty.home}/lib/jetty-util-8.1.10.v20130312.jar
16: 8.1.10.v20130312 | ${jetty.home}/lib/jetty-io-8.1.10.v20130312.jar

the instructions reference a jetty.sh script for version 6 and a different
one for 7. Does the version 7 one work with jetty 8? If not where can I get
the one for version 8?

BTW - this is just the standard install of solr from the gzip file.

thanks in advance for your help.

-- 
Eric Palmer
U of Richmond


Querying for results

2013-12-04 Thread Rob Veliz
Hello,

I am running Solr from Magento and using DIH to import/index data from 1
other source (external).  I am trying to query for results...two questions:

1. The query I'm using runs against "fulltext_1_en" which is a specific
shard created by the Magento deployment in solrconfig.xml.  Should I be
using/querying from another field/store (e.g. not fulltext_1*) to get
results from both Magento and the other data source?  How would I add the
data from my DIH indexing to that specific shard so it was all in the same
place?

2. OR do I need to add another shard to correspond to the DIH data elements?

3. OR is there something else I'm missing in trying to query for data from
2 sources?

Thanks!


Re: Solr Stalls on Bulk indexing, no logs or errors

2013-12-04 Thread steven crichton
Yes, I can continue to query after this importer goes down and while it is running.

The bulk commit is done via a JSON handler in PHP. There are 121,000 records 
that need to go into the index, so this is done in chunked MySQL retrieval 
calls of 5,000 records each, parsing the data as required. 

workflow:

get record
create {add doc… } JSON
Post to CORE/update/json
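
For comparison, a minimal SolrJ sketch of the same workflow might look like the
following (untested; the URL and field names are hypothetical, and it sends
5000-document batches with one explicit commit at the end):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
  public static void main(String[] args) throws Exception {
    // hypothetical core URL - adjust host/port/core to your setup
    HttpSolrServer server = new HttpSolrServer("http://localhost:8080/solr/collection1");

    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    for (int i = 0; i < 121000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-" + i);      // hypothetical fields
      doc.addField("title", "Title " + i);
      batch.add(doc);

      if (batch.size() == 5000) {          // send in 5000-doc chunks
        server.add(batch);
        batch.clear();
      }
    }
    if (!batch.isEmpty()) {
      server.add(batch);
    }
    server.commit();                       // one explicit commit at the end
    server.shutdown();
  }
}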


I stopped doing a hard commit every 1000 records to see if that was an issue.


The autoCommit settings are:

<autoCommit>
  <maxDocs>${solr.autoCommit.MaxDocs:5000}</maxDocs>
  <maxTime>${solr.autoCommit.MaxTime:24000}</maxTime>
</autoCommit>



I’ve pretty much worked from the Drupal schemas for Solr 4
https://drupal.org/project/apachesolr

At one point I thought it could be malformed data, but even after reducing the 
records down to just the id and title, it crashes at the same point. As 
in, the query still works but the import handler does nothing at all.


Tomcat logs seem to indicate no major issues.


There isn’t a strange variable somewhere that sets an upper index limit, is 
there?

Regards,
Steven



On 4 Dec 2013, at 20:02, Erick Erickson [via Lucene] 
 wrote:

> There's a known issue with SolrCloud with multiple shards, but 
> you haven't told us whether you're using that. The test for 
> whether you're running in to that is whether you can continue 
> to _query_, just not update. 
> 
> But you need to tell us more about our setup. In particular 
> hour commit settings (hard and soft), your solrconfig settings, 
> particularly around autowarming, how you're "bulk indexing", 
> SolrJ? DIH? a huge CSV file? 
> 
> Best, 
> Erick 
> 
> 
> On Wed, Dec 4, 2013 at 2:30 PM, steven crichton <[hidden email]>wrote: 
> 
> > I am finding with a bulk index using SOLR 4.3 on Tomcat, that when I reach 
> > 69578 records the server stops adding anything more. 
> > 
> > I've tried reducing the data sent to the bare minimum of fields and using 
> > ASC and DESC data to see if it could be a field issue. 
> > 
> > Is there anything I could look at for this? As I'm not finding anything 
> > similar noted before. Does tomcat have issues with closing connections that 
> > look like DDOS attacks? Or could it be related to too many commits in too 
> > short a time? 
> > 
> > Any help will be very greatly appreciated. 
> > 
> > 
> > 
> > -- 
> > View this message in context: 
> > http://lucene.472066.n3.nabble.com/Solr-Stalls-on-Bulk-indexing-no-logs-or-errors-tp4104981.html
> > Sent from the Solr - User mailing list archive at Nabble.com. 
> > 
> 
> 





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Stalls-on-Bulk-indexing-no-logs-or-errors-tp4104981p4104990.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: a core for every user, lots of users... are there issues

2013-12-04 Thread hank williams
Super helpful. Thanks.


On Wed, Dec 4, 2013 at 2:53 PM, Shawn Heisey  wrote:

> On 12/4/2013 12:34 PM, hank williams wrote:
>
>> Ok one more simple question. We just upgraded to 4.6 from 4.2. In 4.2 we
>> were *trying* to use the rest API function "create" to create cores
>> without
>> having to manually mess with files on the server. Is this what "create"
>> was
>> supposed to do? If so it was borken or we werent using it right. In any
>> case in 4.6 is that the right way to programmatically add cores in
>> discovery mode?
>>
>
> If you are NOT in SolrCloud mode, in order to create new cores, the config
> files need to already exist on the disk.  This is the case with all
> versions of Solr.
>
> If you're running in SolrCloud mode, the core is associated with a
> collection.  Collections have a link to aconfig in zookeeper.  The config
> is not stored with the core on the disk.
>
> Thanks,
> Shawn
>
>


-- 
blog: whydoeseverythingsuck.com


facet.method=fcs vs facet.method=fc on solr slaves

2013-12-04 Thread Patrick O'Lone
Is there any advantage on a Solr slave to receive queries using
facet.method=fcs instead of the default of facet.method=fc? Most of the
segment files are unchanged between replication events - but I wasn't
sure if replication would cause the unchanged segment field caches to be
lost anyway.
-- 
Patrick O'Lone
Director of Software Development
TownNews.com

E-mail ... pol...@townnews.com
Phone  309-743-0809
Fax .. 309-743-0830


Re: Tika not extracting content from ODT / ODS (open document / libreoffice) in Solr 4.2.1

2013-12-04 Thread Augusto Camarotti
Hello everybody,
 
First of all, sorry about my bad English.
 
To give an update on this bug, I may have found a solution for it, and I
would like opinions on it.
I have found out that Tika, when reading .odt files, returns more than one
document: the first for content.xml, which holds the actual content of the
file, and the second for styles.xml.
To test this, try modifying an .odt file to remove styles.xml; Solr should
then parse its contents normally.
When Solr receives the second document (styles.xml), it erases everything it
has read before. In general, styles.xml doesn't have any text in it, so Solr
ends up with just some spaces.
I modified a function inside 'SolrContentHandler.java' that erases the
content of the first document. I changed it to only append a space and not
erase any previous content, so it always accumulates every document Tika
returns to Solr.
I guess this behavior will still work for the existing cases, but I need
your opinion about this.
 
Here is the only modification I made in 'SolrContentHandler.java':

  @Override
  public void startDocument() throws SAXException {
    document.clear();
    //catchAllBuilder.setLength(0);
    // Augusto Camarotti - 28-11-2013
    // As Tika may parse more than one document in one file, I have to append
    // every document Tika parses for me, so I will only append a whitespace
    // and wait for new content every time. Otherwise, Solr would just get the
    // last document of the file.
    catchAllBuilder.append(' ');
    for (StringBuilder builder : fieldBuilders.values()) {
      builder.setLength(0);
    }
    bldrStack.clear();
    bldrStack.add(catchAllBuilder);
  }
 
 
Regards, 
 
Augusto Camarotti

>>> Alexandre Rafalovitch  10/05/2013 21:13 >>>
I would try DIH with the flags as in jira issue I linked to. If
possible.
Just in case.

Regards,
Alex
On 10 May 2013 19:53, "Sebastián Ramírez"

wrote:

> OK Jack, I'll switch to MS Office ...hahaha
>
> Many thanks for your interest and help... and the bug report in
JIRA.
>
> Best,
>
> Sebastián Ramírez
>
>
> On Fri, May 10, 2013 at 5:48 PM, Jack Krupansky
 >wrote:
>
> > I filed  SOLR-4809 - "OpenOffice document body is not indexed by
> > SolrCell", including some test files.
> >
> > https://issues.apache.org/**jira/browse/SOLR-4809<
> https://issues.apache.org/jira/browse/SOLR-4809>
> >
> > Yeah, at this stage, switching to Microsoft Office seems like the
best
> bet!
> >
> >
> > -- Jack Krupansky
> >
> > -Original Message- From: Sebastián Ramírez
> > Sent: Friday, May 10, 2013 6:33 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Tika not extracting content from ODT / ODS (open
document /
> > libreoffice) in Solr 4.2.1
> >
> >
> > Many thanks Jack for your attention and effort on solving the
problem.
> >
> > Best,
> >
> > Sebastián Ramírez
> >
> >
> > On Fri, May 10, 2013 at 5:23 PM, Jack Krupansky
 >*
> > *wrote:
> >
> >  I downloaded the latest Apache OpenOffice 3.4.1 and it does in
fact fail
> >> to index the proper content, both for .ODP and .ODT files.
> >>
> >> If I do extractOnly=true&extractFormat=text, I see the
extracted
> text
> >>
> >> clearly in addition to the metadata.
> >>
> >> I tested on 4.3, and then tested on Solr 3.6.1 and it also
exhibited the
> >> problem. I just see spaces in both cases.
> >>
> >> But whether the problem is due to Solr or Tika, is not apparent.
> >>
> >> In any case, a Jira is warranted.
> >>
> >>
> >> -- Jack Krupansky
> >>
> >> -Original Message- From: Sebastián Ramírez
> >> Sent: Friday, May 10, 2013 11:24 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: Tika not extracting content from ODT / ODS (open document
/
> >> libreoffice) in Solr 4.2.1
> >>
> >> Hello everyone,
> >>
> >> I'm having a problem indexing content from "opendocument format"
files.
> >> The
> >> files created with OpenOffice and LibreOffice (odt, ods...).
> >>
> >> Tika is being able to read the files but Solr is not indexing the
> content.
> >>
> >> It's not a problem of commiting or something like that, after I
post a
> >> file
> >> it is indexed and all the metadata is indexed/stored but the
content
> isn't
> >> there.
> >>
> >>
> >>   - I modified the solrconfig.xml file to catch everything:
> >>
> >>
> >>  >>
> >>
> >>
> >>
> >>all_txt
> >>
> >>
> >>
> >>
> >>   - Then I submitted the file to Solr:
> >>
> >>
> >> curl '
> >> http://localhost:8983/solr/update/extract?commit=true&**<
> http://localhost:8983/solr/**update/extract?commit=true&**>
> >> literal.id=newods >> extract?commit=true&literal.**id=newods<
>
http://localhost:8983/solr/update/extract?commit=true&literal.id=newods>
> >> >'
> >> -H
> >> 'Content-type:
application/vnd.oasis.opendocument.spreadsheet'
> >>
> >> --data-binary @test_ods.ods
> >>
> >>
> >>
> >>   - Now when I do a search in Solr I get this result, there is
something
> >>
> >>   in the "content", but that's not the actual content of the
original
> >

Re: Solr Stalls on Bulk indexing, no logs or errors

2013-12-04 Thread Erick Erickson
There's a known issue with SolrCloud with multiple shards, but
you haven't told us whether you're using that. The test for
whether you're running in to that is whether you can continue
to _query_, just not update.

But you need to tell us more about your setup. In particular
your commit settings (hard and soft), your solrconfig settings,
particularly around autowarming, how you're "bulk indexing",
SolrJ? DIH? a huge CSV file?

Best,
Erick


On Wed, Dec 4, 2013 at 2:30 PM, steven crichton wrote:

> I am finding with a bulk index using SOLR 4.3 on Tomcat, that when I reach
> 69578 records the server stops adding anything more.
>
> I've tried reducing the data sent to the bare minimum of fields and using
> ASC and DESC data to see if it could be a field issue.
>
> Is there anything I could look at for this? As I'm not finding anything
> similar noted before. Does tomcat have issues with closing connections that
> look like DDOS attacks? Or could it be related to too many commits in too
> short a time?
>
> Any help will be very greatly appreciated.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Stalls-on-Bulk-indexing-no-logs-or-errors-tp4104981.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: a core for every user, lots of users... are there issues

2013-12-04 Thread Shawn Heisey

On 12/4/2013 12:34 PM, hank williams wrote:

Ok one more simple question. We just upgraded to 4.6 from 4.2. In 4.2 we
were *trying* to use the rest API function "create" to create cores without
having to manually mess with files on the server. Is this what "create" was
supposed to do? If so it was borken or we werent using it right. In any
case in 4.6 is that the right way to programmatically add cores in
discovery mode?


If you are NOT in SolrCloud mode, in order to create new cores, the 
config files need to already exist on the disk.  This is the case with 
all versions of Solr.
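
As a rough, untested SolrJ sketch of that CoreAdmin CREATE call (the core 
name and instance directory are hypothetical, and the instanceDir with its 
conf/ must already be on the server's disk):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.client.solrj.response.CoreAdminResponse;

public class CreateCoreExample {
  public static void main(String[] args) throws Exception {
    // point at the Solr root (not at a particular core) for CoreAdmin calls
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");

    CoreAdminRequest.Create create = new CoreAdminRequest.Create();
    create.setCoreName("user12345");     // hypothetical per-user core name
    create.setInstanceDir("user12345");  // must already exist on disk with a conf/ directory
    CoreAdminResponse response = create.process(server);

    System.out.println("CREATE status: " + response.getStatus());
    server.shutdown();
  }
}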


If you're running in SolrCloud mode, the core is associated with a 
collection.  Collections have a link to a config in ZooKeeper.  The 
config is not stored with the core on the disk.


Thanks,
Shawn



Re: a core for every user, lots of users... are there issues

2013-12-04 Thread hank williams
Ok, one more simple question. We just upgraded to 4.6 from 4.2. In 4.2 we
were *trying* to use the REST API function "create" to create cores without
having to manually mess with files on the server. Is this what "create" was
supposed to do? If so, it was broken or we weren't using it right. In any
case, in 4.6 is that the right way to programmatically add cores in
discovery mode?


On Tue, Dec 3, 2013 at 7:37 PM, Erick Erickson wrote:

> bq: Do you have any sense of what a good upper limit might be, or how we
> might figure that out?
>
> As always, "it depends" (tm). And the biggest thing it depends upon is the
> number of simultaneous users you have and the size of their indexes. And
> we've arrived at the black box of estimating size again. Siiihh... I'm
> afraid that the only way is to test and establish some rules of thumb.
>
> The transient core constraint will limit the number of cores loaded at
> once. If you allow too many cores at once, you'll get OOM errors when all
> the users pile on at the same time.
>
> Let's say you've determined that 100 is the limit for transient cores. What
> I suspect you'll see is degrading response times if this is too low. Say
> 110 users are signed on and say they submit queries perfectly in order, one
> after the other. Every request will require the core to be opened and it'll
> take a bit. So that'll be a flag.
>
> Or that's a fine limit but your users have added more and more documents
> and you're coming under memory pressure.
>
> As you can tell I don't have any good answers. I've seen between 10M and
> 300M documents on a single machine
>
> BTW, on a _very_ casual test I found about 1000 cores/second were found in
> discovery mode. While they aren't loaded if they're transient, it's still a
> consideration if you have 10s of thousands.
>
> Best,
> Erick
>
>
>
> On Tue, Dec 3, 2013 at 3:33 PM, hank williams  wrote:
>
> > On Tue, Dec 3, 2013 at 3:20 PM, Erick Erickson  > >wrote:
> >
> > > You probably want to look at "transient cores", see:
> > > http://wiki.apache.org/solr/LotsOfCores
> > >
> > > But millions will be "interesting" for a single node, you must have
> some
> > > kind of partitioning in mind?
> > >
> > >
> > Wow. Thanks for that great link. Yes we are sharding so its not like
> there
> > would be millions of cores on one machine or even cluster. And since the
> > cores are one per user, this is a totally clean approach. But still we
> want
> > to make sure that we are not overloading the machine. Do you have any
> sense
> > of what a good upper limit might be, or how we might figure that out?
> >
> >
> >
> > > Best,
> > > Erick
> > >
> > >
> > > On Tue, Dec 3, 2013 at 2:38 PM, hank williams 
> wrote:
> > >
> > > >  We are building a system where there is a core for every user. There
> > > will
> > > > be many tens or perhaps ultimately hundreds of thousands or millions
> of
> > > > users. We do not need each of those users to have “warm” data in
> > memory.
> > > In
> > > > fact doing so would consume lots of memory unnecessarily, for users
> > that
> > > > might not have logged in in a long time.
> > > >
> > > > So my question is, is the default behavior of Solr to try to keep all
> > of
> > > > our cores warm, and if so, can we stop it? Also given the number of
> > cores
> > > > that we will likely have is there anything else we should be keeping
> in
> > > > mind to maximize performance and minimize memory usage?
> > > >
> > >
> >
> >
> >
> > --
> > blog: whydoeseverythingsuck.com
> >
>



-- 
blog: whydoeseverythingsuck.com


Solr Stalls on Bulk indexing, no logs or errors

2013-12-04 Thread steven crichton
I am finding with a bulk index using SOLR 4.3 on Tomcat, that when I reach
69578 records the server stops adding anything more. 

I've tried reducing the data sent to the bare minimum of fields and using
ASC and DESC data to see if it could be a field issue.

Is there anything I could look at for this? As I'm not finding anything
similar noted before. Does tomcat have issues with closing connections that
look like DDOS attacks? Or could it be related to too many commits in too
short a time?

Any help will be very greatly appreciated.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Stalls-on-Bulk-indexing-no-logs-or-errors-tp4104981.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Shouldn't fuzzy version of a solr query always return a super set of its not-fuzzy equivalent

2013-12-04 Thread Mhd Wrk
Debug shows that all terms are lowercased properly.

Thanks
On Dec 4, 2013 3:18 AM, "Erik Hatcher"  wrote:

> Chances are you're not getting those fuzzy terms analyzed as you'd like.
>  See debug (&debug=true) output to be sure.  Most likely the fuzzy terms
> are not being lowercased.  See
> http://wiki.apache.org/solr/MultitermQueryAnalysis for more details (this
> applies to fuzzy, not just wildcard) terms too.
>
> Erik
>
>
> On Dec 4, 2013, at 4:46 AM, Mhd Wrk  wrote:
>
> > I'm using the following query to do a fuzzy search on Solr 4.5.1 and am
> > getting empty result.
> >
> > qt=standard&q=+(field1|en_CA|:Swimming~2 field1|en|:Swimming~2)
> > +(field1|en_CA|:Goggle~1 field1|en|:Goggle~1) +(+startDate:[* TO
> > 2013-12-04T00:23:00Z] -endDate:[* TO
> > 2013-12-04T00:23:00Z])&start=0&rows=10&fl=id
> >
> > If I change it to a not fuzzy query by simply dropping tildes from the
> > terms (see below) then it returns the expected result! Is this a bug?
> > Shouldn't fuzzy version of a query always return a super set of its
> > not-fuzzy equivalent?
> >
> > qt=standard&q=+(field1|en_CA|:Swimming field1|en|:Swimming)
> > +(field1|en_CA|:Goggle field1|en|:Goggle) +(+startDate:[* TO
> > 2013-12-04T00:23:00Z] -endDate:[* TO
> > 2013-12-04T00:23:00Z])&start=0&rows=10&fl=id
>
>


Re: Setting routerField/shardKey on specific collection?

2013-12-04 Thread Daniel Bryant
Many thanks Timothy, I tried this today but ran into issues getting the 
new collection to persist (so that I could search for the parameter). 
It's good to have this confirmed as a viable approach though, and I'll 
persevere with this tomorrow.


If I figure it out I'll reply with the details.

Thanks again,

Daniel


On 04/12/2013 17:41, Tim Potter wrote:

Hi Daniel,

I'm not sure how this would apply to an existing collection (in your case 
collection1). Try using the collections API to create a new collection and pass 
the router.field parameter. Grep'ing over the code, the parameter is named: 
router.field (not routerField or routeField).

Cheers,

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: Daniel Bryant 
Sent: Wednesday, December 04, 2013 9:40 AM
To: solr-user@lucene.apache.org
Subject: Setting routerField/shardKey on specific collection?

Hi,

I'm using Solr 4.6 and trying to specify a router.field (shard key) on a
specific collection so that all documents with the same value in the
specified field end up in the same collection.

However, I can't find an example of how to do this via the solr.xml? I
see in this ticket https://issues.apache.org/jira/browse/SOLR-5017 there
is a mention of a routeField property.

Should the solr.xml contain the following?


  


Any help would be greatly appreciated! I've been yak shaving all
afternoon reading various Jira tickets and wikis trying to get this to
work :-)

Best wishes,

Daniel


--
*Daniel Bryant  |  Software Development Consultant  | www.tai-dev.co.uk
*
daniel.bry...@tai-dev.co.uk   |  +44
(0) 7799406399  |  Twitter: @taidevcouk 


--
*Daniel Bryant  |  Software Development Consultant  | www.tai-dev.co.uk 
*
daniel.bry...@tai-dev.co.uk   |  +44 
(0) 7799406399  |  Twitter: @taidevcouk 


RE: Setting routerField/shardKey on specific collection?

2013-12-04 Thread Tim Potter
Hi Daniel,

I'm not sure how this would apply to an existing collection (in your case 
collection1). Try using the collections API to create a new collection and pass 
the router.field parameter. Grep'ing over the code, the parameter is named: 
router.field (not routerField or routeField).
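
For example, an untested SolrJ sketch of that CREATE call (the collection 
name, config name and shard count are hypothetical; router.field names the 
field used to route documents):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CreateCollectionWithRouterField {
  public static void main(String[] args) throws Exception {
    // any node in the cluster will do for Collections API calls
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");

    SolrQuery q = new SolrQuery();
    q.setRequestHandler("/admin/collections");
    q.set("action", "CREATE");
    q.set("name", "consolidated");                 // hypothetical collection name
    q.set("numShards", "2");
    q.set("collection.configName", "myconf");      // config already uploaded to ZooKeeper
    q.set("router.field", "consolidationGroupId"); // route documents by this field

    QueryResponse response = server.query(q);
    System.out.println(response);
    server.shutdown();
  }
}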

Cheers,

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: Daniel Bryant 
Sent: Wednesday, December 04, 2013 9:40 AM
To: solr-user@lucene.apache.org
Subject: Setting routerField/shardKey on specific collection?

Hi,

I'm using Solr 4.6 and trying to specify a router.field (shard key) on a
specific collection so that all documents with the same value in the
specified field end up in the same collection.

However, I can't find an example of how to do this via the solr.xml? I
see in this ticket https://issues.apache.org/jira/browse/SOLR-5017 there
is a mention of a routeField property.

Should the solr.xml contain the following?


 


Any help would be greatly appreciated! I've been yak shaving all
afternoon reading various Jira tickets and wikis trying to get this to
work :-)

Best wishes,

Daniel


--
*Daniel Bryant  |  Software Development Consultant  | www.tai-dev.co.uk
*
daniel.bry...@tai-dev.co.uk   |  +44
(0) 7799406399  |  Twitter: @taidevcouk 


Re: Questions about commits and OOE

2013-12-04 Thread Daniel Collins
I'd second the use of jstack to check your threads.  Each request (be it a
search or an update) will generate a request handler thread on the Solr side
unless you've set limits in the HttpShardHandlerFactory (in solr.xml for
Solr-wide defaults and/or under the requestHandler in solrconfig.xml); we
set maxConnectionsPerHost, corePoolSize and maximumPoolSize, since we ran
into a similar issue.

Our system ironically didn't crash, we just had a JVM with about 256000
threads, which was rather SSLLOOWW :)

On the softCommit front, we have had some success with small softCommit
times, but then we use SSDs (and have lots of memory and lots of shards).
Once we get concrete figures, we'll publish them, but we are a fair way
below 1s now with no major impact on indexing throughput (yet).  But I
would agree that unless you are really really sure you need it (and most
people don't), keep to the "known limits".


On 4 December 2013 16:09, Tim Potter  wrote:

> Hi Metin,
>
> I think removing the softCommit=true parameter on the client side will
> definitely help as NRT wasn't designed to re-open searchers after every
> document. Try every 1 second (or even every few seconds), I doubt your
> users will notice. To get an idea of what threads are running in your JVM
> process, you can use jstack.
>
> Cheers,
>
> Timothy Potter
> Sr. Software Engineer, LucidWorks
> www.lucidworks.com
>
> 
> From: OSMAN Metin 
> Sent: Wednesday, December 04, 2013 7:36 AM
> To: solr-user@lucene.apache.org
> Subject: Questions about commits and OOE
>
> Hi all,
>
> let me first explain our situation :
>
> We have
>
>
> -   two virtual servers with each :
>
> 4x SolR 4.4.0 on Tomcat 6 (+ with mod_cluster 1.2.0), each JVM has
> -Xms2048m -Xmx2048m -XX:MaxPermSize=384m
> 1x Zookeeper 3.4.5 (Only one of the two Zookeeper is active.)
> CentOS 6.4
> Sun JDK 1.6.0-31
> 16 GB of RAM
> 4 vCPU
>
>
> -   only one core and one shard
>
> -   ~25 docs and 50-100 MB of index size
>
> -   two load balancers (apache + mod_cluster) who are both connected
> to the 8 SolR nodes
>
> -   1 VIP pointing to these two LB
>
> The commit configuration is
>
> -   every update request do a soft commit (i.e. param softCommit=true
> in the http request)
>
> -   autosoftcommit disabled
>
> -   autocommit enabled every 15 seconds
>
> The client application is a java app with SolRj client using the previous
> VIP as an endpoint.
> We need NearRealTime modifications visible by the end users.
> During the day, the client uses SolR with about 80% of select requests and
> 20% of update requests.
> Every morning, the client is sending a massive bunch of updates (about
> 1 in a few minutes).
>
> During this massive update, we have sometimes a peak of active threads
> exceeding the limit of 8192 process authorized for the user running the
> tomcat and zookeeper process.
> When this happens, every hardCommit is failing with an "OutOfMemory :
> unable to create native thread" message.
>
>
> Now, I have some questions :
>
> -   Why are there some many threads created ? Is the softCommit on
> every update that opens a new thread ?
>
> -   Once an OOE occurs, every hardcommit will be broken, even if the
> number of threads opened on the system is low. Is there any way to "free"
> the JVM ? The only solution we have found is to restart all the JVM.
>
> -   When the OOE occurs, the SolR cloud console shows the leader node
> as active and the others as recovering
>
> o   is the replication working at that moment ?
>
> o   as all the hardcommits are failing but the softcommits not, am I very
> sure that I will not lose some updates when restarting all the nodes ?
>
> By the way, we are planning to
>
> -   disable the softCommit parameter on the client side and to enable
> the autosoftcommit instead.
>
> -   create another server and make 3 zookeeper chorum instead of a
> unique zookeeper master.
>
> -   skip the use of load balancers and let zookeeper decide which node
> will respond to the requests
>
> Any help would be appreciated !
>
> Metin OSMAN
>


Re: json update moves doc to end

2013-12-04 Thread Chris Hostetter

: Well, both have a score of -Infinity. So they're "equal" and
: the tiebreaker is the internal Lucene doc ID.
: 
: Now this is not helpful since the question now is where
: -Infinity comes from, this looks suspicious:
:  -Infinity = (MATCH) FunctionQuery(log(int(clicks))), product of:
: -Infinity = log(int(clicks)=0)

If the score of this doc was not "-Infinity" before your doc update, and 
it became "-Infinity" after your update, and your update did not 
intentionally change the value of the "clicks" field to "0" then i suspect 
what you are seeing is the result of not having all of your fields as 
stored="true"...

https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents

>>   /!\ All original source fields must be stored for field modifiers to 
>>   work correctly, which is the Solr default
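
For reference, a minimal SolrJ sketch of an atomic update (hypothetical id 
and field names); only the fields named in the update are changed, and 
everything else is rebuilt from stored values, which is why they all need to 
be stored:

import java.util.Collections;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AtomicUpdateExample {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-42");                                        // hypothetical unique key
    doc.addField("clicks", Collections.singletonMap("inc", 1));          // atomic increment
    doc.addField("title", Collections.singletonMap("set", "New title")); // atomic set

    server.add(doc);
    server.commit();
    server.shutdown();
  }
}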

-Hoss
http://www.lucidworks.com/


Re: SolrCloud FunctionQuery inconsistency

2013-12-04 Thread Chris Hostetter

: There is no default value for ptime. It is generated by users.

thank you, that rules out my previous wild guess.

: I was trying query with a function query({!boost b=dateDeboost(ptime)}
: channelid:0082 && title:abc), which leads differents results from the same
: shard(using the param: shards=shard3).
: 
: The diffenence is maxScore, which is not consistent. And the maxScore is

Ok ... but you still haven't provided enough information for us to make a 
guess as to why you are seeing inconsistent scores coming back from your 
queries -- at a minimum we need to see the debugQuery=true output for each 
of the different replicas that are generating different scores.

It's possible that the discrepancy you are seeing is a minor one resulting 
from slightly different term stats (ie: segments being merged slightly 
differently on different replicas), or it could be a symptom of a larger 
problem.



-Hoss
http://www.lucidworks.com/


Re: Programmatically upload configuration into ZooKeeper

2013-12-04 Thread Shawn Heisey
On 12/4/2013 9:23 AM, Artem Karpenko wrote:
> so it's SolrZkClient indeed. I've tried it out and it seems to do just
> the job I need. Thank you!
> 
> On a related note - is there a similar way to create/reload
> core/collection, using maybe CloudSolrServer or smth. inside it? Didn't
> found any methods that could do the thing.

This should probably work for reloading collection1.  I can't test it
right now, as I'm about to start my morning commute.

CloudSolrServer srv =
new CloudSolrServer("zoo1:2181,zoo2:2181,zoo3:2181/mysolr");
srv.setDefaultCollection("collection2");
SolrQuery q = new SolrQuery();
q.setRequestHandler("/admin/collections");
q.set("action", "RELOAD");
q.set("name", "collection1");
QueryResponse x = srv.query(q);

If you want to reload an individual core, you'd need to use
HttpSolrServer, not CloudSolrServer.  SOLR-4140 made it possible to use
the collections API with CloudSolrServer, but as far as I can tell, it
doesn't enable the CoreAdmin API.
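
Following the same pattern for a single core (untested; the base URL and core
name are hypothetical, and it assumes the CoreAdmin handler is at /admin/cores
on that node):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ReloadCoreExample {
  public static void main(String[] args) throws Exception {
    // point at the Solr root on the node that hosts the core
    HttpSolrServer server = new HttpSolrServer("http://solr1:8983/solr");

    SolrQuery q = new SolrQuery();
    q.setRequestHandler("/admin/cores");
    q.set("action", "RELOAD");
    q.set("core", "collection1");  // hypothetical core name
    QueryResponse response = server.query(q);

    System.out.println("RELOAD status: " + response.getStatus());
    server.shutdown();
  }
}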

Note that reloads don't work right with SolrCloud unless the server
version is at least 4.4, due to a bug.

Thanks,
Shawn



RE: json update moves doc to end

2013-12-04 Thread Andreas Owen
I changed my boost-function log(clickrate)^8 to div(clicks,displays)^8 and
it works now. I get the following output from debug:

0.0022668892 = (MATCH) FunctionQuery(div(const(2),const(5))), product of:
0.4 = div(const(2),const(5))
8.0 = boost
7.0840283E-4 = queryNorm

Am I understanding this right, that 0.4 and 8.0 result in 7.084? I'm
having trouble understanding how much I boosted it.
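
Working the numbers from the debug output above, the three factors simply
multiply: 0.4 (the div(clicks,displays) value) x 8.0 (the boost) x
7.0840283E-4 (the queryNorm) = 0.0022668892, i.e. the 7.084E-4 is the
queryNorm factor and the boost contributes the factor of 8.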

As I use NGramFilterFactory I get a lot of hits because of the tokens. Can I
make the boost higher if the whole search term is found and not just part of
it?


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Mittwoch, 4. Dezember 2013 15:07
To: solr-user@lucene.apache.org
Subject: Re: json update moves doc to end

Well, both have a score of -Infinity. So they're "equal" and the tiebreaker
is the internal Lucene doc ID.

Now this is not helpful since the question now is where -Infinity comes
from, this looks suspicious:
 -Infinity = (MATCH) FunctionQuery(log(int(clicks))), product of:
-Infinity = log(int(clicks)=0)

not much help I know, but

Erick


On Wed, Dec 4, 2013 at 7:24 AM, Andreas Owen  wrote:

> Hi Erick
>
> Here are the last 2 results from a search and i am not understanding 
> why the last one with the boost editorschoice^200 isn't at the top. By 
> the way can i also give a substantial boost to results that contain 
> the hole search-request and not just 3 or 4 letters (tokens)?
>
> 
> -Infinity = (MATCH) sum of:
>   0.013719446 = (MATCH) max of:
> 0.013719446 = (MATCH) sum of:
>   2.090396E-4 = (MATCH) weight(plain_text:ber in 841) 
> [DefaultSimilarity], result of:
> 2.090396E-4 = score(doc=841,freq=8.0 = termFreq=8.0 ), product 
> of:
>   0.009452709 = queryWeight, product of:
> 1.3343692 = idf(docFreq=611, maxDocs=855)
> 0.0070840283 = queryNorm
>   0.022114253 = fieldWeight in 841, product of:
> 2.828427 = tf(freq=8.0), with freq of:
>   8.0 = termFreq=8.0
> 1.3343692 = idf(docFreq=611, maxDocs=855)
> 0.005859375 = fieldNorm(doc=841)
>   0.0012402858 = (MATCH) weight(plain_text:eri in 841) 
> [DefaultSimilarity], result of:
> 0.0012402858 = score(doc=841,freq=9.0 = termFreq=9.0 ), 
> product of:
>   0.022357063 = queryWeight, product of:
> 3.1559815 = idf(docFreq=98, maxDocs=855)
> 0.0070840283 = queryNorm
>   0.05547624 = fieldWeight in 841, product of:
> 3.0 = tf(freq=9.0), with freq of:
>   9.0 = termFreq=9.0
> 3.1559815 = idf(docFreq=98, maxDocs=855)
> 0.005859375 = fieldNorm(doc=841)
>   5.0511415E-4 = (MATCH) weight(plain_text:ric in 841) 
> [DefaultSimilarity], result of:
> 5.0511415E-4 = score(doc=841,freq=1.0 = termFreq=1.0 ), 
> product of:
>   0.024712078 = queryWeight, product of:
> 3.4884217 = idf(docFreq=70, maxDocs=855)
> 0.0070840283 = queryNorm
>   0.020439971 = fieldWeight in 841, product of:
> 1.0 = tf(freq=1.0), with freq of:
>   1.0 = termFreq=1.0
> 3.4884217 = idf(docFreq=70, maxDocs=855)
> 0.005859375 = fieldNorm(doc=841)
>   8.721528E-4 = (MATCH) weight(plain_text:ich in 841) 
> [DefaultSimilarity], result of:
> 8.721528E-4 = score(doc=841,freq=12.0 = termFreq=12.0 ), 
> product of:
>   0.017446788 = queryWeight, product of:
> 2.4628344 = idf(docFreq=197, maxDocs=855)
> 0.0070840283 = queryNorm
>   0.049989305 = fieldWeight in 841, product of:
> 3.4641016 = tf(freq=12.0), with freq of:
>   12.0 = termFreq=12.0
> 2.4628344 = idf(docFreq=197, maxDocs=855)
> 0.005859375 = fieldNorm(doc=841)
>   7.725705E-4 = (MATCH) weight(plain_text:cht in 841) 
> [DefaultSimilarity], result of:
> 7.725705E-4 = score(doc=841,freq=4.0 = termFreq=4.0 ), product 
> of:
>   0.021610687 = queryWeight, product of:
> 3.050621 = idf(docFreq=109, maxDocs=855)
> 0.0070840283 = queryNorm
>   0.035749465 = fieldWeight in 841, product of:
> 2.0 = tf(freq=4.0), with freq of:
>   4.0 = termFreq=4.0
> 3.050621 = idf(docFreq=109, maxDocs=855)
> 0.005859375 = fieldNorm(doc=841)
>   0.0010287998 = (MATCH) weight(plain_text:beri in 841) 
> [DefaultSimilarity], result of:
> 0.0010287998 = score(doc=841,freq=1.0 = termFreq=1.0 ), 
> product of:
>   0.035267927 = queryWeight, product of:
> 4.978513 = idf(docFreq=15, maxDocs=855)
> 0.0070840283 = queryNorm
>   0.029170973 = fieldWeight in 841, product of:
> 1.0 = tf(freq=1.0), with freq of:
>   1.0 = termFreq=1.0
> 4.978513 = idf(docFreq=15, maxDocs=855)
> 0.005859375 = fieldNorm(doc=841)
>   0.0010556461 = (MATCH) wei

Setting routerField/shardKey on specific collection?

2013-12-04 Thread Daniel Bryant

Hi,

I'm using Solr 4.6 and trying to specify a router.field (shard key) on a 
specific collection so that all documents with the same value in the 
specified field end up in the same collection.


However, I can't find an example of how to do this via the solr.xml? I 
see in this ticket https://issues.apache.org/jira/browse/SOLR-5017 there 
is a mention of a routeField property.


Should the solr.xml contain the following?


routerField="consolidationGroupId" />



Any help would be greatly appreciated! I've been yak shaving all 
afternoon reading various Jira tickets and wikis trying to get this to 
work :-)


Best wishes,

Daniel


--
*Daniel Bryant  |  Software Development Consultant  | www.tai-dev.co.uk 
*
daniel.bry...@tai-dev.co.uk   |  +44 
(0) 7799406399  |  Twitter: @taidevcouk 


Re: Programmatically upload configuration into ZooKeeper

2013-12-04 Thread Artem Karpenko

Hello Greg,

so it's SolrZkClient indeed. I've tried it out and it seems to do just 
the job I need. Thank you!


On a related note - is there a similar way to create/reload a 
core/collection, maybe using CloudSolrServer or something inside it? I didn't 
find any methods that could do the thing.


Regards,
Artem.

On 04.12.2013 17:15, Greg Walters wrote:

Hi Artem,

This question (or one very like it) has been asked on this list before so 
there's some prior art you could modify to suit your needs.

Taken from Timothy Potter :

**
public static void updateClusterstateJsonInZk(CloudSolrServer
cloudSolrServer, CommandLine cli) throws Exception {
String updateClusterstateJson =
cli.getOptionValue("updateClusterstateJson");

ZkStateReader zkStateReader = cloudSolrServer.getZkStateReader();
SolrZkClient zkClient = zkStateReader.getZkClient();

File jsonFile = new File(updateClusterstateJson);
if (!jsonFile.isFile()) {
System.err.println(jsonFile.getAbsolutePath()+" not found.");
return;
}

byte[] clusterstateJson = readFile(jsonFile);

// validate the user is passing is valid JSON
InputStreamReader bytesReader = new InputStreamReader(new
ByteArrayInputStream(clusterstateJson), "UTF-8");
JSONParser parser = new JSONParser(bytesReader);
parser.toString();

zkClient.setData("/clusterstate.json", clusterstateJson, true);
System.out.println("Updated /clusterstate.json with data from
"+jsonFile.getAbsolutePath());
}
**

You should be able to modify that or use it as a basis for uploading the 
changed files in your config.

Thanks,
Greg

On Dec 4, 2013, at 8:36 AM, Artem Karpenko  wrote:


What is the best way to upload Solr configuration files into ZooKeeper 
programmatically, i.e. - from within Java code?
I know that there are cloud-scripts for this, but in the end they should use 
some Java client library, don't they?

This question raised because we use special configuration system (Java-based) 
to store all configuration files (not only Solr) and it'd be cool if we could
export modified files into ZooKeeper when applying changes. We would then 
reload collections remotely via REST API.

I've digged a little into ZkCli class and it seems that SolrZkClient can do 
something along the lines above. Is it the right tool for the job?

Any hints would be appreciated.

Regards,
Artem.






RE: Questions about commits and OOE

2013-12-04 Thread Tim Potter
Hi Metin,

I think removing the softCommit=true parameter on the client side will 
definitely help as NRT wasn't designed to re-open searchers after every 
document. Try every 1 second (or even every few seconds), I doubt your users 
will notice. To get an idea of what threads are running in your JVM process, 
you can use jstack.

Cheers,

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: OSMAN Metin 
Sent: Wednesday, December 04, 2013 7:36 AM
To: solr-user@lucene.apache.org
Subject: Questions about commits and OOE

Hi all,

let me first explain our situation :

We have


-   two virtual servers with each :

4x SolR 4.4.0 on Tomcat 6 (+ with mod_cluster 1.2.0), each JVM has -Xms2048m 
-Xmx2048m -XX:MaxPermSize=384m
1x Zookeeper 3.4.5 (Only one of the two Zookeeper is active.)
CentOS 6.4
Sun JDK 1.6.0-31
16 GB of RAM
4 vCPU


-   only one core and one shard

-   ~25 docs and 50-100 MB of index size

-   two load balancers (apache + mod_cluster) who are both connected to the 
8 SolR nodes

-   1 VIP pointing to these two LB

The commit configuration is

-   every update request do a soft commit (i.e. param softCommit=true in 
the http request)

-   autosoftcommit disabled

-   autocommit enabled every 15 seconds

The client application is a java app with SolRj client using the previous VIP 
as an endpoint.
We need NearRealTime modifications visible by the end users.
During the day, the client uses SolR with about 80% of select requests and 20% 
of update requests.
Every morning, the client is sending a massive bunch of updates (about 1 in 
a few minutes).

During this massive update, we have sometimes a peak of active threads 
exceeding the limit of 8192 process authorized for the user running the tomcat 
and zookeeper process.
When this happens, every hardCommit is failing with an "OutOfMemory : unable to 
create native thread" message.


Now, I have some questions :

-   Why are there some many threads created ? Is the softCommit on every 
update that opens a new thread ?

-   Once an OOE occurs, every hardcommit will be broken, even if the number 
of threads opened on the system is low. Is there any way to "free" the JVM ? 
The only solution we have found is to restart all the JVM.

-   When the OOE occurs, the SolR cloud console shows the leader node as 
active and the others as recovering

o   is the replication working at that moment ?

o   as all the hardcommits are failing but the softcommits not, am I very sure 
that I will not lose some updates when restarting all the nodes ?

By the way, we are planning to

-   disable the softCommit parameter on the client side and to enable the 
autosoftcommit instead.

-   create another server and make 3 zookeeper chorum instead of a unique 
zookeeper master.

-   skip the use of load balancers and let zookeeper decide which node will 
respond to the requests

Any help would be appreciated !

Metin OSMAN


Re: Solr Performance Issue

2013-12-04 Thread Shawn Heisey
On 12/4/2013 6:31 AM, kumar wrote:
> I am having almost 5 to 6 crores of indexed documents in solr. And when i am
> going to change anything in the configuration file solr server is going
> down.

If you mean crore and not core, then you are talking about 50 to 60
million documents.  That's a lot.  Solr is perfectly capable of handling
that many documents, but you do need to have very good hardware.

Even if they are small, your index is likely to be many gigabytes in
size.  If the documents are large, that might be measured in terabytes.
 Large indexes require a lot of memory for good performance.  This will
be discussed in more detail below.

> As a new user to solr i can't able to find the exact reason for going server
> down.
> 
> I am using cache's in the following way :
> 
>   size="16384"
>  initialSize="4096"
>  autowarmCount="4096"/>
>size="16384"
>  initialSize="4096"
>  autowarmCount="1024"/>
> 
> and i am not using any documentCache, fieldValueCahe's

As Erick said, these cache sizes are HUGE.  In particular, your
autowarmCount values are extremely high.

> Whether this can lead any performance issue means going server down.

Another thing that Erick pointed out is that you haven't really told us
what's happening.  When you say that the server goes down, what EXACTLY
do you mean?

> And i am seeing logging in the server it is showing exception in the
> following way
> 
> 
> Servlet.service() for servlet [default] in context with path [/solr] threw
> exception [java.lang.IllegalStateException: Cannot call sendError() after
> the response has been committed] with root cause

This message comes from your servlet container, not Solr.  You're
probably using Tomcat, not the included Jetty.  There is some indirect
evidence that this can be fixed by increasing the servlet container's
setting for the maximum number of request parameters.

http://forums.adobe.com/message/4590864

Here's what I can say without further information:

You're likely having performance issues.  One potential problem is your
insanely high autowarmCount values.  Your cache configuration tells Solr
that every time you have a soft commit or a hard commit with
openSearcher=true, you're going to execute up to 1024 queries and up to
4096 filters from the old caches, in order to warm the new caches.  Even
if you have an optimal setup, this takes a lot of time.  I suspect that
you don't have an optimal setup.

Another potential problem is that you don't have enough memory for the
size of your index.  A number of potential performance problems are
discussed on this wiki page:

http://wiki.apache.org/solr/SolrPerformanceProblems

A lot more details are required.  Here's some things that will be
helpful, and more is always better:

* Exact symptoms.
* Excerpts from the Solr logfile that include entire stacktraces.
* Operating system and version.
* Total server index size on disk.
* Total machine memory.
* Java heap size for your servlet container.
* Which servlet container you are using to run Solr.
* Solr version.
* Server hardware details.

Thanks,
Shawn



Re: Programmatically upload configuration into ZooKeeper

2013-12-04 Thread Greg Walters
Hi Artem,

This question (or one very like it) has been asked on this list before so 
there's some prior art you could modify to suit your needs.

Taken from Timothy Potter :

**
   public static void updateClusterstateJsonInZk(CloudSolrServer
cloudSolrServer, CommandLine cli) throws Exception {
   String updateClusterstateJson =
cli.getOptionValue("updateClusterstateJson");

   ZkStateReader zkStateReader = cloudSolrServer.getZkStateReader();
   SolrZkClient zkClient = zkStateReader.getZkClient();

   File jsonFile = new File(updateClusterstateJson);
   if (!jsonFile.isFile()) {
   System.err.println(jsonFile.getAbsolutePath()+" not found.");
   return;
   }

   byte[] clusterstateJson = readFile(jsonFile);

   // validate the user is passing is valid JSON
   InputStreamReader bytesReader = new InputStreamReader(new
ByteArrayInputStream(clusterstateJson), "UTF-8");
   JSONParser parser = new JSONParser(bytesReader);
   parser.toString();

   zkClient.setData("/clusterstate.json", clusterstateJson, true);
   System.out.println("Updated /clusterstate.json with data from
"+jsonFile.getAbsolutePath());
   }
**

You should be able to modify that or use it as a basis for uploading the 
changed files in your config.
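
As a rough sketch of that modification (untested; it assumes the 
/configs/<name> layout that ZkCLI's upconfig uses, and that the SolrZkClient 
methods behave as in the snippet above), walking a flat local conf directory 
and pushing each file could look like:

**
import java.io.File;
import java.io.FileInputStream;

import org.apache.solr.common.cloud.SolrZkClient;

public class UploadConfigDir {

    public static void uploadConfigDir(SolrZkClient zkClient, File confDir,
                                       String configName) throws Exception {
        for (File f : confDir.listFiles()) {
            if (!f.isFile()) {
                continue; // flat layout only; recurse here if you keep subdirectories
            }
            String zkPath = "/configs/" + configName + "/" + f.getName();
            byte[] data = readFile(f);
            if (zkClient.exists(zkPath, true)) {
                zkClient.setData(zkPath, data, true);  // overwrite the existing node
            } else {
                zkClient.makePath(zkPath, data, true); // create the node (and parents)
            }
        }
    }

    // simple replacement for the readFile() helper referenced in the snippet above
    private static byte[] readFile(File f) throws Exception {
        byte[] buf = new byte[(int) f.length()];
        FileInputStream in = new FileInputStream(f);
        try {
            int off = 0;
            while (off < buf.length) {
                int read = in.read(buf, off, buf.length - off);
                if (read < 0) {
                    break;
                }
                off += read;
            }
        } finally {
            in.close();
        }
        return buf;
    }
}
**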

Thanks,
Greg

On Dec 4, 2013, at 8:36 AM, Artem Karpenko  wrote:

> What is the best way to upload Solr configuration files into ZooKeeper 
> programmatically, i.e. - from within Java code?
> I know that there are cloud-scripts for this, but in the end they should use 
> some Java client library, don't they?
> 
> This question raised because we use special configuration system (Java-based) 
> to store all configuration files (not only Solr) and it'd be cool if we could
> export modified files into ZooKeeper when applying changes. We would then 
> reload collections remotely via REST API.
> 
> I've digged a little into ZkCli class and it seems that SolrZkClient can do 
> something along the lines above. Is it the right tool for the job?
> 
> Any hints would be appreciated.
> 
> Regards,
> Artem.



Re: Using Payloads as a Coefficient For Score At a Custom QParser That extends ExtendedDismaxQParser

2013-12-04 Thread Joel Bernstein
Sounds great Furkan,

Do you have the permission to donate this code? It would be great if you
could create a Jira ticket.

Thanks,
Joel


On Tue, Dec 3, 2013 at 3:26 PM, Furkan KAMACI wrote:

> I've implemented what I want. I can add payload score into the document
> score. I've modified ExtendedDismaxQParser and I can use all the abilities
> of edismax at my case. I will explain what I did at my blog.
>
> Thanks;
> Furkan KAMACI
>
>
> 2013/12/1 Furkan KAMACI 
>
> > Hi;
> >
> > I use Solr 4.5.1 I have a case: When a user searches for some specific
> > keywords some documents should be listed at much more higher than its
> usual
> > score. I mean I have probabilities of which documents user may want to
> see
> > for given keywords.
> >
> > I have come up with that idea. I can put a new field to my schema. This
> > field holds keyword and probability as payload. When a user searches for
> a
> > keyword I will calculate usual document score for given fields and also I
> > will make a search on payloaded field and I will multiply the total score
> > with that payload.
> >
> > I followed that example:
> > http://sujitpal.blogspot.com/2013/07/porting-payloads-to-solr4.html
> > However, that example extends QParser directly but I want to use capabilities of
> > edismax.
> >
> > So I found that example:
> >
> http://digitalpebble.blogspot.com/2010/08/using-payloads-with-dismaxqparser-in.htmlhis
> > one exteds dismax and but I could not used payloads at that example.
> >
> > I want to combine above to solutions. First solution has that case:
> >
> > @Override
> > public Similarity get(String name) {
> > if ("payloads".equals(name) || "cscores".equals(name)) {
> > return new PayloadSimilarity();
> > } else {
> > return new DefaultSimilarity();
> > }
> > }
> >
> > However dismax behaves different. i.e. when you search for cscores:A it
> > changes that into that:
> >
> > *+((text:cscores:y text:cscores text:y text:cscoresy)) ()*
> >
> > When I debug it name is text instead of cscores and does not work. My
> idea
> > is combining two examples and extending edismax. Do you have any idea how
> > to extend it for edismax or do you have any idea what to do for my case.
> >
> > *PS:* I've sent same question at Lucene user list too. I ask it here to
> > get an idea from Solr perspective too.
> >
> > Thanks;
> > Furkan KAMACI
> >
>



-- 
Joel Bernstein
Search Engineer at Heliosearch


Questions about commits and OOE

2013-12-04 Thread OSMAN Metin
Hi all,

let me first explain our situation :

We have


-   two virtual servers with each :

4x SolR 4.4.0 on Tomcat 6 (+ with mod_cluster 1.2.0), each JVM has -Xms2048m 
-Xmx2048m -XX:MaxPermSize=384m
1x Zookeeper 3.4.5 (Only one of the two Zookeeper is active.)
CentOS 6.4
Sun JDK 1.6.0-31
16 GB of RAM
4 vCPU


-   only one core and one shard

-   ~25 docs and 50-100 MB of index size

-   two load balancers (apache + mod_cluster) who are both connected to the 
8 SolR nodes

-   1 VIP pointing to these two LB

The commit configuration is

-   every update request does a soft commit (i.e. param softCommit=true in 
the http request; see the sketch after this list)

-   autosoftcommit disabled

-   autocommit enabled every 15 seconds
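
For reference, a minimal, untested SolrJ sketch of what such a per-request 
soft commit looks like (the URL and field names are hypothetical); switching 
to autoSoftCommit in solrconfig.xml, as planned further down, just means 
dropping the setParam line:

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class SoftCommitPerUpdate {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8080/solr/collection1");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");          // hypothetical fields
    doc.addField("title", "Some title");

    UpdateRequest req = new UpdateRequest();
    req.add(doc);
    req.setParam("softCommit", "true");   // the per-request soft commit described above
    req.process(server);

    server.shutdown();
  }
}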

The client application is a java app with SolRj client using the previous VIP 
as an endpoint.
We need NearRealTime modifications visible by the end users.
During the day, the client uses SolR with about 80% of select requests and 20% 
of update requests.
Every morning, the client is sending a massive bunch of updates (about 1 in 
a few minutes).

During this massive update, we have sometimes a peak of active threads 
exceeding the limit of 8192 process authorized for the user running the tomcat 
and zookeeper process.
When this happens, every hardCommit is failing with an "OutOfMemory : unable to 
create native thread" message.


Now, I have some questions :

-   Why are there so many threads created? Is it the softCommit on every 
update that opens a new thread?

-   Once an OOE occurs, every hardcommit will be broken, even if the number 
of threads opened on the system is low. Is there any way to "free" the JVM ? 
The only solution we have found is to restart all the JVM.

-   When the OOE occurs, the SolR cloud console shows the leader node as 
active and the others as recovering

o   is the replication working at that moment ?

o   as all the hardcommits are failing but the softcommits not, am I very sure 
that I will not lose some updates when restarting all the nodes ?

By the way, we are planning to

-   disable the softCommit parameter on the client side and to enable the 
autosoftcommit instead.

-   create another server and make a 3-node ZooKeeper quorum instead of a unique 
ZooKeeper master.

-   skip the use of load balancers and let zookeeper decide which node will 
respond to the requests

Any help would be appreciated !

Metin OSMAN


Programmatically upload configuration into ZooKeeper

2013-12-04 Thread Artem Karpenko
What is the best way to upload Solr configuration files into ZooKeeper 
programmatically, i.e. from within Java code?
I know that there are cloud-scripts for this, but in the end they must 
use some Java client library, don't they?

This question arose because we use a special configuration system 
(Java-based) to store all configuration files (not only Solr's), and it 
would be convenient if we could export modified files into ZooKeeper when 
applying changes. We would then reload collections remotely via the REST API.

I've dug a little into the ZkCLI class and it seems that SolrZkClient can 
do something along the lines above. Is it the right tool for the job?


Any hints would be appreciated.

Regards,
Artem.


Re: Solr Doubts

2013-12-04 Thread Erick Erickson
bq: <uniqueKey required="false">id</uniqueKey>

This isn't correct; there's no "required" param for
<uniqueKey>. Just remove the entire <uniqueKey> node
AND make the field definition required="false". I.e. in the
<field name="id" .../> definition in schema.xml,
set required="false" there.

To increase memory, you just specify -Xmx when you start,
something like:
java -Xmx2G -Xms2G -jar start.jar

But whether or not you're interested in splitting the CSV file, working with 7 GB
input files is going to be painful no matter what. You may
find yourself having to split it up for expediency's sake.

Best,
Erick


On Wed, Dec 4, 2013 at 7:46 AM, Jiyas Basha H  wrote:

> Hai Team,
>
> I am new to Solr.
> I am trying to index 7GB CSV file.
>
> My questions:
> 1.How to index without using uniquekey ?
>
> I tried with id
>
> I got --> Document is missing mandatory uniqueKey field: id
>
> I am using query to update csv :
> localhost:9050/solr-4.5.1/collection1/update/csv?stream.file=D:\Solr\comma15_Id.csv&commit=true&header=false&fieldnames=ORD,ORC,SBN,BNA,POB,NUM,DST,STM,DDL,DLO,PTN,PCD,CTA,CTP,CTT
>
> 2. how to increase jvm heap space in solr ?
> since my file is too large i am getting java heap space error
>
> I am not interested to split my large file into batches.however i need to
> complete indexing with 7GB CSV file.
>
> please assist me to index my csv file
>
>
>
>
>
> with regards
> Jiyas
>
> Problems are only opportunities with thorns on them.
>


Re: json update moves doc to end

2013-12-04 Thread Erick Erickson
Well, both have a score of -Infinity. So they're "equal" and
the tiebreaker is the internal Lucene doc ID.

Now this is not helpful since the question now is where
-Infinity comes from, this looks suspicious:
 -Infinity = (MATCH) FunctionQuery(log(int(clicks))), product of:
-Infinity = log(int(clicks)=0)

not much help I know, but

Erick


On Wed, Dec 4, 2013 at 7:24 AM, Andreas Owen  wrote:

> Hi Erick
>
> Here are the last 2 results from a search and i am not understanding why
> the
> last one with the boost editorschoice^200 isn't at the top. By the way can
> i
> also give a substantial boost to results that contain the hole
> search-request and not just 3 or 4 letters (tokens)?
>
> 
> -Infinity = (MATCH) sum of:
>   0.013719446 = (MATCH) max of:
> 0.013719446 = (MATCH) sum of:
>   2.090396E-4 = (MATCH) weight(plain_text:ber in 841)
> [DefaultSimilarity], result of:
> 2.090396E-4 = score(doc=841,freq=8.0 = termFreq=8.0
> ), product of:
>   0.009452709 = queryWeight, product of:
> 1.3343692 = idf(docFreq=611, maxDocs=855)
> 0.0070840283 = queryNorm
>   0.022114253 = fieldWeight in 841, product of:
> 2.828427 = tf(freq=8.0), with freq of:
>   8.0 = termFreq=8.0
> 1.3343692 = idf(docFreq=611, maxDocs=855)
> 0.005859375 = fieldNorm(doc=841)
>   0.0012402858 = (MATCH) weight(plain_text:eri in 841)
> [DefaultSimilarity], result of:
> 0.0012402858 = score(doc=841,freq=9.0 = termFreq=9.0
> ), product of:
>   0.022357063 = queryWeight, product of:
> 3.1559815 = idf(docFreq=98, maxDocs=855)
> 0.0070840283 = queryNorm
>   0.05547624 = fieldWeight in 841, product of:
> 3.0 = tf(freq=9.0), with freq of:
>   9.0 = termFreq=9.0
> 3.1559815 = idf(docFreq=98, maxDocs=855)
> 0.005859375 = fieldNorm(doc=841)
>   5.0511415E-4 = (MATCH) weight(plain_text:ric in 841)
> [DefaultSimilarity], result of:
> 5.0511415E-4 = score(doc=841,freq=1.0 = termFreq=1.0
> ), product of:
>   0.024712078 = queryWeight, product of:
> 3.4884217 = idf(docFreq=70, maxDocs=855)
> 0.0070840283 = queryNorm
>   0.020439971 = fieldWeight in 841, product of:
> 1.0 = tf(freq=1.0), with freq of:
>   1.0 = termFreq=1.0
> 3.4884217 = idf(docFreq=70, maxDocs=855)
> 0.005859375 = fieldNorm(doc=841)
>   8.721528E-4 = (MATCH) weight(plain_text:ich in 841)
> [DefaultSimilarity], result of:
> 8.721528E-4 = score(doc=841,freq=12.0 = termFreq=12.0
> ), product of:
>   0.017446788 = queryWeight, product of:
> 2.4628344 = idf(docFreq=197, maxDocs=855)
> 0.0070840283 = queryNorm
>   0.049989305 = fieldWeight in 841, product of:
> 3.4641016 = tf(freq=12.0), with freq of:
>   12.0 = termFreq=12.0
> 2.4628344 = idf(docFreq=197, maxDocs=855)
> 0.005859375 = fieldNorm(doc=841)
>   7.725705E-4 = (MATCH) weight(plain_text:cht in 841)
> [DefaultSimilarity], result of:
> 7.725705E-4 = score(doc=841,freq=4.0 = termFreq=4.0
> ), product of:
>   0.021610687 = queryWeight, product of:
> 3.050621 = idf(docFreq=109, maxDocs=855)
> 0.0070840283 = queryNorm
>   0.035749465 = fieldWeight in 841, product of:
> 2.0 = tf(freq=4.0), with freq of:
>   4.0 = termFreq=4.0
> 3.050621 = idf(docFreq=109, maxDocs=855)
> 0.005859375 = fieldNorm(doc=841)
>   0.0010287998 = (MATCH) weight(plain_text:beri in 841)
> [DefaultSimilarity], result of:
> 0.0010287998 = score(doc=841,freq=1.0 = termFreq=1.0
> ), product of:
>   0.035267927 = queryWeight, product of:
> 4.978513 = idf(docFreq=15, maxDocs=855)
> 0.0070840283 = queryNorm
>   0.029170973 = fieldWeight in 841, product of:
> 1.0 = tf(freq=1.0), with freq of:
>   1.0 = termFreq=1.0
> 4.978513 = idf(docFreq=15, maxDocs=855)
> 0.005859375 = fieldNorm(doc=841)
>   0.0010556461 = (MATCH) weight(plain_text:eric in 841)
> [DefaultSimilarity], result of:
> 0.0010556461 = score(doc=841,freq=1.0 = termFreq=1.0
> ), product of:
>   0.035725117 = queryWeight, product of:
> 5.0430512 = idf(docFreq=14, maxDocs=855)
> 0.0070840283 = queryNorm
>   0.02954913 = fieldWeight in 841, product of:
> 1.0 = tf(freq=1.0), with freq of:
>   1.0 = termFreq=1.0
> 5.0430512 = idf(docFreq=14, maxDocs=855)
> 0.005859375 = fieldNorm(doc=841)
>   5.653785E-4 = (MATCH) weight(plain_text:rich in 841)
> [DefaultSimilarity], result of:
> 5.653785E-4 = score(doc=841,freq=1.0 = termFreq=1.0
> ), product of:
>   0.02614473 = queryWeight, product of:
> 3.69

Re: Solr Performance Issue

2013-12-04 Thread Erick Erickson
You need to give us more of the exception trace,
the real cause is often buried down the stack with
some text like
"Caused by..."

But at a glance your cache sizes and autowarm counts
are far higher than they should be. Try reducing
particularly the autowarm count down to, say, 16 or so.
It's actually rare that you really need very many.

I'd actually go back to the defaults to start with to test
whether this is the problem.

Further, we need to know exactly what you mean by
"change anything in the configuration file". Change
what? Details matter.

Of course the last thing you changed before you started
seeing this problem is the most likely culprit.

Best,
Erick


On Wed, Dec 4, 2013 at 8:31 AM, kumar  wrote:

> I am having almost 5 to 6 crores of indexed documents in solr. And when i
> am
> going to change anything in the configuration file solr server is going
> down.
>
> As a new user to solr i can't able to find the exact reason for going
> server
> down.
>
> I am using cache's in the following way :
>
>   size="16384"
>  initialSize="4096"
>  autowarmCount="4096"/>
>size="16384"
>  initialSize="4096"
>  autowarmCount="1024"/>
>
> and i am not using any documentCache, fieldValueCahe's
>
> Whether this can lead any performance issue means going server down.
>
> And i am seeing logging in the server it is showing exception in the
> following way
>
>
> Servlet.service() for servlet [default] in context with path [/solr] threw
> exception [java.lang.IllegalStateException: Cannot call sendError() after
> the response has been committed] with root cause
>
>
>
> Can anybody help me how can i solve this problem.
>
> Kumar.
>
>
>
>
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Performance-Issue-tp4104907.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: how to increase each index file size

2013-12-04 Thread Erick Erickson
Why do you want to do this? Are you seeing performance problems?
If not, I'd just ignore this problem, premature optimization and all that.

If you _really_ want to do this, your segment files are closed every
time you do a commit; openSearcher=true|false doesn't matter.

BUT, the longer you go between commits, the bigger your transaction log will be,
which may lead to other issues, particularly on restart. See:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

The key is the section on truncating the tlog.

And note the sizes of these segments will change as they're
merged anyway.

Best,
Erick


On Wed, Dec 4, 2013 at 4:42 AM, YouPeng Yang wrote:

> Hi
>   I'm using the SolrCloud integreted with HDFS,I found there are lots of
> small size files.
>   So,I'd like to increase  the index  file size  while doing DIH
> full-import. Any suggestion to achieve this goal.
>
>
> Regards.
>


Re: post filtering for boolean filter queries

2013-12-04 Thread Erick Erickson
OK, so cache=false and cost=100 should do it, see:
http://searchhub.org/2012/02/22/custom-security-filtering-in-solr/
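
In SolrJ it is just local params on the fq. A rough sketch (field names and
the exact cost are placeholders; note that ordinary boolean filters are only
re-ordered by cost, while the per-document post-filtering described in that
article needs a query parser that implements PostFilter, e.g. {!frange}):

import org.apache.solr.client.solrj.SolrQuery;

public class CostlyFilterExample {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("some user query");
        q.addFilterQuery("category:electronics");                        // cheap, cached as usual
        q.addFilterQuery("{!cache=false cost=100}acl:(u1 OR u2 OR u3)"); // not cached, evaluated last
        System.out.println(q);
    }
}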

Best,
Erick


On Wed, Dec 4, 2013 at 5:56 AM, Dmitry Kan  wrote:

> Thanks Yonik.
>
> For our use case, we would like to skip caching only one particular filter
> cache, yet apply a high cost for it to make sure it executes last of all
> filter queries.
>
> So this means, the rest of the fqs will execute and cache as usual.
>
>
>
>
> On Tue, Dec 3, 2013 at 9:58 PM, Yonik Seeley 
> wrote:
>
> > On Tue, Dec 3, 2013 at 4:45 AM, Dmitry Kan  wrote:
> > > ok, we were able to confirm the behavior regarding not caching the
> filter
> > > query. It works as expected. It does not cache with {!cache=false}.
> > >
> > > We are still looking into clarifying the cost assignment: i.e. whether
> it
> > > works as expected for long boolean filter queries.
> >
> > Yes, filters should be ordered by cost (cheapest first) whenever you
> > use {!cache=false}
> >
> > -Yonik
> > http://heliosearch.com -- making solr shine
> >
>
>
>
> --
> Dmitry
> Blog: http://dmitrykan.blogspot.com
> Twitter: twitter.com/dmitrykan
>


Solr Performance Issue

2013-12-04 Thread kumar
I have almost 5 to 6 crore (50-60 million) indexed documents in Solr, and when I
change anything in the configuration file the Solr server goes
down.

As a new user to Solr, I am not able to find the exact reason for the server going
down.

I am using cache's in the following way :


 

and i am not using any documentCache, fieldValueCahe's

Could this lead to a performance issue that makes the server go down?

And i am seeing logging in the server it is showing exception in the
following way


Servlet.service() for servlet [default] in context with path [/solr] threw
exception [java.lang.IllegalStateException: Cannot call sendError() after
the response has been committed] with root cause



Can anybody help me solve this problem?

Kumar.









--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Performance-Issue-tp4104907.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Faceting Query in Solr

2013-12-04 Thread Erick Erickson
The standard way of handling this kind of thing is with
filter queries. For multi-select, you have to put in some
javascript or something to make an OR clause when they
check the boxes.

So your query looks like fq=categoryID:(1 OR 2 OR 3)
rather than
fq=categoryID:1&fq=categoryID:2&fq=categoryID:3
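
A sketch of building that single fq from whatever boxes are ticked (SolrJ;
the field name follows the categoryId field from the question):

import java.util.Arrays;
import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;

public class CategoryFilterExample {
    public static void main(String[] args) {
        List<String> checked = Arrays.asList("1", "3", "4");   // whatever the user ticked

        StringBuilder fq = new StringBuilder("categoryId:(");
        for (int i = 0; i < checked.size(); i++) {
            if (i > 0) fq.append(" OR ");
            fq.append(checked.get(i));
        }
        fq.append(")");

        SolrQuery q = new SolrQuery("*:*");
        q.addFilterQuery(fq.toString());     // -> categoryId:(1 OR 3 OR 4)
        q.setFacet(true);
        q.addFacetField("categoryId");
        System.out.println(q);
    }
}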

Best,
Erick


On Wed, Dec 4, 2013 at 4:36 AM, kumar  wrote:

> Hi,
>
> I indexed data into solr by using 5 categories. Each category is
> differentiated by categoryId. Now i have a situation that i need to show
> the
> results based on facets.
>
> Ex:
>
> []-category1
> []-category2
> []-category3
> []-category4
> []-category5
>
>
> If the user checks the category1 it has to show the results based on
> categoryId-1
>
> If the user checks 2 categories it has to show the results from two
> categories which the user checked
>
> If the user checks 3 categories it has to show the results from three
> categories
>
> and son on.like how many categories user checked i have to show results
> from checked categories
>
> My Schema is in the following way..
>
>  multiValued="false" />
>  required="true" />
>  required="true"
> />
>  multiValued="true" required="true" />
>
>
> Anyone help me how can i achieve this.
>
> Regards,
> Kumar
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Faceting-Query-in-Solr-tp4104881.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: solrinitialisationerrors: Error during shutdown of writer.

2013-12-04 Thread Erick Erickson
The crux is: "java.lang.NoClassDefFoundError:"

Usually this means your classpath is wrong and
the JVM can't find the jars. Or you have multiple
jars from different versions in your classpath.

It's pretty tedious to track down, but that's where I'd
start.

In your log, you'll see a bunch of lines like this:
2794 [coreLoadExecutor-3-thread-1] INFO
 org.apache.solr.core.SolrResourceLoader  – Adding
'file:/Users/Erick/apache/4x/solr/contrib/clustering/lib/jackson-mapper-asl-1.7.4.jar'
to classloader

showing you exactly where Solr is trying to load jars from,
that'll help.

Best,
Erick


On Wed, Dec 4, 2013 at 4:08 AM, Nutan  wrote:

> I dont why all of a sudden I started getting this errror :
> this is the sreenshot:
> 
>
> I thought there might be some problem with tomcat,so I uninstalled it ,but
> i
> still get the same error.
> I have no idea why is this happening,initially it worked really well.
> In tomcat java-options home var is : *-Dsolr.solr.home=C:\solr*
> I am using the initial solr.xml only,I have  created two cores n folder
> structure is as desired.
> My folder structure is:
> 1)C:\solr\contract\conf
> 2)C:\solr\document\conf
> 3)C:\solr\lib
> These are my config files:
> *solr.xml*
> 
> 
>zkClientTimeout="${zkClientTimeout:15000}"
> hostPort="8080" hostContext="solr">
>  name="document"/>
>  name="contract"/>
>   
> 
>
> This i got after i re-installed tomcat:
>
> INFO: closing IndexWriter with IndexWriterCloser
> Dec 04, 2013 2:09:30 PM org.apache.solr.update.DefaultSolrCoreState
> closeIndexWriter
> *SEVERE: Error during shutdown of writer.*
> java.lang.NoClassDefFoundError:
> org/apache/solr/request/LocalSolrQueryRequest
> at
>
> org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:682)
> at
>
> org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:69)
> at
>
> org.apache.solr.update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:278)
> at
>
> org.apache.solr.update.SolrCoreState.decrefSolrCoreState(SolrCoreState.java:73)
> at org.apache.solr.core.SolrCore.close(SolrCore.java:972)
> at
> org.apache.solr.core.CoreContainer.shutdown(CoreContainer.java:771)
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.destroy(SolrDispatchFilter.java:134)
> at
>
> org.apache.catalina.core.ApplicationFilterConfig.release(ApplicationFilterConfig.java:311)
> at
>
> org.apache.catalina.core.StandardContext.filterStop(StandardContext.java:4660)
> at
>
> org.apache.catalina.core.StandardContext.stopInternal(StandardContext.java:5442)
> at
> org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:232)
> at
> org.apache.catalina.core.ContainerBase.removeChild(ContainerBase.java:1001)
> at
> org.apache.catalina.startup.HostConfig.checkResources(HostConfig.java:1272)
> at
> org.apache.catalina.startup.HostConfig.check(HostConfig.java:1450)
> at
> org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:295)
> at
>
> org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
> at
>
> org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:90)
> at
>
> org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1338)
> at
>
> org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1496)
> at
>
> org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1506)
> at
>
> org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1485)
> at java.lang.Thread.run(Unknown Source)
>
> Please help me, after implementing so much this error has screwed me up.
> *Thanks in advance.*
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/solrinitialisationerrors-Error-during-shutdown-of-writer-tp4104874.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Deleting and committing inside a SearchComponent

2013-12-04 Thread Erick Erickson
I agree with Upayavira. This seems architecturally
questionable.

In your example, the crux of the matter is
"Only differ by one field". Figuring that out is going to
be expensive, are you burdening searches with this
kind of logic?

Why not create a custom update processor that does
this and use such a component? Or build it into
your updates when you ingest the docs? Or build
a signature field and issue a delete by query on that
when you update?
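
As a sketch of that last idea (the "signature" field, ids and URL here are
invented, and SignatureUpdateProcessorFactory can compute the signature for
you at index time):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class DedupeOnUpdate {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        String signature = "reuters-2013-12-04-some-story";  // same value for every copy of the story

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "feedB-98765");
        doc.addField("signature", signature);
        doc.addField("title", "Some story");

        // drop any earlier copies of the same story, then add the fresh one
        solr.deleteByQuery("signature:\"" + signature + "\"");
        solr.add(doc);
        solr.commit();
    }
}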

Best,
Erick


On Tue, Dec 3, 2013 at 9:48 PM, Peyman Faratin wrote:

>
> On Dec 3, 2013, at 8:41 PM, Upayavira  wrote:
>
> >
> >
> > On Tue, Dec 3, 2013, at 03:22 PM, Peyman Faratin wrote:
> >> Hi
> >>
> >> Is it possible to delete and commit updates to an index inside a custom
> >> SearchComponent? I know I can do it with solrj but due to several
> >> business logic requirements I need to build the logic inside the search
> >> component.  I am using SOLR 4.5.0.
> >
> > That just doesn't make sense. Search components are read only.
> >
> i can think of many situations that it makes sense. for instance, you
> search for a document and your index contains many duplicates that only
> differ by one field, such as the time they were indexed (think news feeds
> from multiple sources). So after the search we want to delete the duplicate
> documents that satisfy some policy (here date, but it could be some other
> policy).
>
> > What are you trying to do? What stuff do you need to change? Could you
> > do it within an UpdateProcessor?
>
> Solution i am working with
>
> UpdateRequestProcessorChain processorChain =
> rb.req.getCore().getUpdateProcessingChain(rb.req.getParams().get(UpdateParams.UPDATE_CHAIN));
> UpdateRequestProcessor processor = processorChain.createProcessor(rb.req,
> rb.rsp);
> ...
> docId = f();
> ...
> DeleteUpdateCommand cmd = new DeleteUpdateCommand(req);
> cmd.setId(docId.toString());
> processor.processDelete(cmd);
>
>
> >
> > Upayavira
>
>


Re: [Solr Wiki] Your wiki account data

2013-12-04 Thread Erick Erickson
Sure. Unfortunately we had a problem a while
ago with spam bots creating pages, so we had
to lock it down.

Done, you should be able to edit the Solr Wiki.

Erick


On Wed, Dec 4, 2013 at 8:06 AM, Mehdi Burgy  wrote:

> Hello,
>
> We've recently launched a job search engine using Solr, and would like to
> add it here: https://wiki.apache.org/solr/PublicServers
>
> Would it be possible to allow me be part of the publishing group?
>
> Thank you for your help
>
> Kind Regards,
>
> Mehdi Burgy
> New Job Search Engine:
> www.jobreez.com
>
> -- Forwarded message --
> From: Apache Wiki 
> Date: 2013/12/4
> Subject: [Solr Wiki] Your wiki account data
> To: Apache Wiki 
>
>
>
> Somebody has requested to email you a password recovery token.
>
> If you lost your password, please go to the password reset URL below or
> go to the password recovery page again and enter your username and the
> recovery token.
>
> Login Name: madeinch
>


Fwd: [Solr Wiki] Your wiki account data

2013-12-04 Thread Mehdi Burgy
Hello,

We've recently launched a job search engine using Solr, and would like to
add it here: https://wiki.apache.org/solr/PublicServers

Would it be possible to allow me to be part of the publishing group?

Thank you for your help

Kind Regards,

Mehdi Burgy
New Job Search Engine:
www.jobreez.com

-- Forwarded message --
From: Apache Wiki 
Date: 2013/12/4
Subject: [Solr Wiki] Your wiki account data
To: Apache Wiki 



Somebody has requested to email you a password recovery token.

If you lost your password, please go to the password reset URL below or
go to the password recovery page again and enter your username and the
recovery token.

Login Name: madeinch


Solr Doubts

2013-12-04 Thread Jiyas Basha H
Hai Team, 

I am new to Solr. 
I am trying to index 7GB CSV file. 

My questions: 
1. How to index without using a uniqueKey?

I tried with <uniqueKey required="false">id</uniqueKey>

I got --> Document is missing mandatory uniqueKey field: id 

I am using query to update csv : 
localhost:9050/solr-4.5.1/collection1/update/csv?stream.file=D:\Solr\comma15_Id.csv&commit=true&header=false&fieldnames=ORD,ORC,SBN,BNA,POB,NUM,DST,STM,DDL,DLO,PTN,PCD,CTA,CTP,CTT
 

2. How to increase the JVM heap space in Solr?
Since my file is too large, I am getting a Java heap space error.

I would prefer not to split my large file into batches; however, I need to
complete indexing of the 7GB CSV file.

Please assist me with indexing my CSV file.





with regards 
Jiyas 

Problems are only opportunities with thorns on them. 


RE: json update moves doc to end

2013-12-04 Thread Andreas Owen
Hi Erick

Here are the last 2 results from a search, and I don't understand why the
last one with the boost editorschoice^200 isn't at the top. By the way, can I
also give a substantial boost to results that contain the whole
search request and not just 3 or 4 letters (tokens)?


-Infinity = (MATCH) sum of:
  0.013719446 = (MATCH) max of:
0.013719446 = (MATCH) sum of:
  2.090396E-4 = (MATCH) weight(plain_text:ber in 841)
[DefaultSimilarity], result of:
2.090396E-4 = score(doc=841,freq=8.0 = termFreq=8.0
), product of:
  0.009452709 = queryWeight, product of:
1.3343692 = idf(docFreq=611, maxDocs=855)
0.0070840283 = queryNorm
  0.022114253 = fieldWeight in 841, product of:
2.828427 = tf(freq=8.0), with freq of:
  8.0 = termFreq=8.0
1.3343692 = idf(docFreq=611, maxDocs=855)
0.005859375 = fieldNorm(doc=841)
  0.0012402858 = (MATCH) weight(plain_text:eri in 841)
[DefaultSimilarity], result of:
0.0012402858 = score(doc=841,freq=9.0 = termFreq=9.0
), product of:
  0.022357063 = queryWeight, product of:
3.1559815 = idf(docFreq=98, maxDocs=855)
0.0070840283 = queryNorm
  0.05547624 = fieldWeight in 841, product of:
3.0 = tf(freq=9.0), with freq of:
  9.0 = termFreq=9.0
3.1559815 = idf(docFreq=98, maxDocs=855)
0.005859375 = fieldNorm(doc=841)
  5.0511415E-4 = (MATCH) weight(plain_text:ric in 841)
[DefaultSimilarity], result of:
5.0511415E-4 = score(doc=841,freq=1.0 = termFreq=1.0
), product of:
  0.024712078 = queryWeight, product of:
3.4884217 = idf(docFreq=70, maxDocs=855)
0.0070840283 = queryNorm
  0.020439971 = fieldWeight in 841, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
3.4884217 = idf(docFreq=70, maxDocs=855)
0.005859375 = fieldNorm(doc=841)
  8.721528E-4 = (MATCH) weight(plain_text:ich in 841)
[DefaultSimilarity], result of:
8.721528E-4 = score(doc=841,freq=12.0 = termFreq=12.0
), product of:
  0.017446788 = queryWeight, product of:
2.4628344 = idf(docFreq=197, maxDocs=855)
0.0070840283 = queryNorm
  0.049989305 = fieldWeight in 841, product of:
3.4641016 = tf(freq=12.0), with freq of:
  12.0 = termFreq=12.0
2.4628344 = idf(docFreq=197, maxDocs=855)
0.005859375 = fieldNorm(doc=841)
  7.725705E-4 = (MATCH) weight(plain_text:cht in 841)
[DefaultSimilarity], result of:
7.725705E-4 = score(doc=841,freq=4.0 = termFreq=4.0
), product of:
  0.021610687 = queryWeight, product of:
3.050621 = idf(docFreq=109, maxDocs=855)
0.0070840283 = queryNorm
  0.035749465 = fieldWeight in 841, product of:
2.0 = tf(freq=4.0), with freq of:
  4.0 = termFreq=4.0
3.050621 = idf(docFreq=109, maxDocs=855)
0.005859375 = fieldNorm(doc=841)
  0.0010287998 = (MATCH) weight(plain_text:beri in 841)
[DefaultSimilarity], result of:
0.0010287998 = score(doc=841,freq=1.0 = termFreq=1.0
), product of:
  0.035267927 = queryWeight, product of:
4.978513 = idf(docFreq=15, maxDocs=855)
0.0070840283 = queryNorm
  0.029170973 = fieldWeight in 841, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
4.978513 = idf(docFreq=15, maxDocs=855)
0.005859375 = fieldNorm(doc=841)
  0.0010556461 = (MATCH) weight(plain_text:eric in 841)
[DefaultSimilarity], result of:
0.0010556461 = score(doc=841,freq=1.0 = termFreq=1.0
), product of:
  0.035725117 = queryWeight, product of:
5.0430512 = idf(docFreq=14, maxDocs=855)
0.0070840283 = queryNorm
  0.02954913 = fieldWeight in 841, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
5.0430512 = idf(docFreq=14, maxDocs=855)
0.005859375 = fieldNorm(doc=841)
  5.653785E-4 = (MATCH) weight(plain_text:rich in 841)
[DefaultSimilarity], result of:
5.653785E-4 = score(doc=841,freq=1.0 = termFreq=1.0
), product of:
  0.02614473 = queryWeight, product of:
3.6906586 = idf(docFreq=57, maxDocs=855)
0.0070840283 = queryNorm
  0.021624953 = fieldWeight in 841, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
3.6906586 = idf(docFreq=57, maxDocs=855)
0.005859375 = fieldNorm(doc=841)
  0.0010596104 = (MATCH) weight(plain_text:icht in 841)
[DefaultSimilarity], result of:
0.0010596104 = score(doc=841,freq=3.0 = termFreq=3.0
), product of:
  0.027196141 = queryWeight, product of:
3.8390784 = idf(docFreq=49, maxDocs=855)
0.0070840283 = quer

Re: Using the flexible query parser in Solr instead of classic

2013-12-04 Thread Karsten R.
Hi Jack Krupansky, hi folks,

We could port the edismax QueryParser from the classic to the flexible framework.
But does anyone else need this?

In long text:

ExtendedDismaxQParser uses ExtendedSolrQueryParser.
ExtendedSolrQueryParser is derived from SolrQueryParser.
So it is based on org.apache.solr.parser.QueryParser.jj, which is a slight
modification of org.apache.lucene.queryparser.classic.QueryParser.jj.

If SolrQueryParser switches to the Lucene flexible QueryParser,
ExtendedSolrQueryParser will be a good example of how to write subclasses
without the "classic" logic of overriding the methods "getFuzzyQuery",
"getPrefixQuery", "getWildcardQuery" ...
(using subclasses of "FuzzyQueryNodeProcessor",
"WildcardQueryNodeProcessor", etc. instead).

So again:
Does anyone else need this?


Best regards
  Karsten



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-the-flexible-query-parser-in-Solr-instead-of-classic-tp4104584p4104895.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr cuts highlighted sentences

2013-12-04 Thread katoo
Hi guys,

when searching for a phrase I get results and would like to show
highlighting.
The highlighted snippets begin somewhere in the middle of a sentence, starting
with a comma or something else.
I'd like the snippets to begin at the start of a sentence.
How can I manage this?
I've tried so many things found on the internet, but nothing helped.
Example:

query.setHighlight(true).setParam("hl.useFastVectorHighlighte", "true");
query.setHighlight(true).setParam("hl.fragsize", "500");
query.setHighlight(true).setParam("hl.fragmenter", "regex");
query.setHighlight(true).setParam("hl.regex.slop", "0.8");
query.setHighlight(true).setParam("hl.regex.pattern",
"[\\w][^.!?]{400,600}[.!?]"); //\w[^\.!\?]{400,600}[\.!\?]
query.setHighlight(true).setParam("hl.bs.type", "SENTENCE");

etc etc..
What's wrong with this?
Thx





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-cuts-highlighted-sentences-tp4104894.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: What type of Solr plugin do I need for filtering results?

2013-12-04 Thread Thomas Seidl
Thanks a lot for both of your answers! The QParserPlugin is probably
what I meant, but join queries also look interesting and like they could
maybe solve my use case, too, without any custom code.
However, since that would (I think) make it impossible to have a score
for the results, and I do want to do fulltext searches on the returned
document set (with scores), it will probably not be enough.


Anyways, I'll look into both of your suggestions. Thanks a lot again!

On 2013-12-02 05:39, Ahmet Arslan wrote:

It depends on your use case: what your custom criterion is, how it is stored, etc.


For example, I had two tables, let's say items and permissions. The permissions
table was holding itemId,userId pairs, meaning that userId can see that itemId.
My initial effort was to index items and add a multivalued field named
WhoCanSeeMe, and filter-query on that field using the current user.

After some time indexing became troublesome and was slowing down. I switched to
two cores, one per table, and used a query-time join (JoinQParser) as an fq. I
didn't need any plugin for the above.
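
A sketch of what such a join filter looks like (the core, field and user names
here are only illustrative):

import org.apache.solr.client.solrj.SolrQuery;

public class JoinAclExample {
    public static void main(String[] args) {
        String currentUser = "u42";
        SolrQuery q = new SolrQuery("laptop");   // the user's full-text query against the items core
        // keep only items that have a matching row in the permissions core
        q.addFilterQuery("{!join fromIndex=permissions from=itemId to=id}userId:" + currentUser);
        System.out.println(q);
    }
}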

By the way here is an example of post filter Joel advises : 
http://searchhub.org/2012/02/22/custom-security-filtering-in-solr/






On Monday, December 2, 2013 5:14 AM, Joel Bernstein  wrote:

What you're looking for is a QParserPlugin. Here is an example:

http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_4_6_0/solr/core/src/java/org/apache/solr/search/FunctionRangeQParserPlugin.java?revision=1544545&view=markup

You're probably want to implement the QParserPlugin as PostFilter.




On Sun, Dec 1, 2013 at 3:46 PM, Thomas Seidl  wrote:


Hi,

I'm currently looking at writing my first Solr plugin, but I could not
really find any "overview" information about how a Solr request works
internally, what the control flow is and what kind of plugins are available
to customize this at which point. The Solr wiki page on plugins [1], in my
opinion, already assumes too much knowledge and is too terse in its
descriptions.

[1] http://wiki.apache.org/solr/SolrPlugins

If anyone knows of any good ressources to get me started, that would be
awesome!

However, also pretty helpful would be just to know what kind of plugin I
should create for my use case, as I could then at least try to find
information specific to that. What I want to do is filter the search
results (at the time fq filters are applied, so before sorting, facetting,
range selection, etc. takes place) by some custom criterion (passed in the
URL). The plan is to add the data needed for that custom filter as a
separate set of documents to Solr and look them up from the Solr index when
filtering the query. Basically the thing discussed in [2], at 29:07.

[2] http://www.youtube.com/watch?v=kJa-3PEc90g&feature=youtu.be&t=29m7s

So, the question is, what kind of plugin would I use (and how would it
have to be configured)? I first thought it'd have to be a SearchComponent,
but I think with that I'd only get the results after they are sorted and
trimmed to the range, right?

Thanks a lot in advance,
Thomas Seidl







Re: Shouldn't fuzzy version of a solr query always return a super set of its not-fuzzy equivalent

2013-12-04 Thread Erik Hatcher
Chances are you're not getting those fuzzy terms analyzed as you'd like.  See
the debug (&debug=true) output to be sure.  Most likely the fuzzy terms are not
being lowercased.  See http://wiki.apache.org/solr/MultitermQueryAnalysis for
more details (this applies to fuzzy terms too, not just wildcards).
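
(A quick client-side workaround, sketched below and assuming the field is
lower-cased at index time, is to lower-case the term yourself before appending
the tilde.)

import java.util.Locale;

public class FuzzyTermExample {
    public static void main(String[] args) {
        String term = "Swimming";
        // fuzzy terms can bypass the query-time lowercase filter, so
        // normalise them on the client before appending ~2
        String fuzzy = term.toLowerCase(Locale.ROOT) + "~2";
        String q = "+(field1|en_CA|:" + fuzzy + " field1|en|:" + fuzzy + ")";
        System.out.println(q);   // +(field1|en_CA|:swimming~2 field1|en|:swimming~2)
    }
}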

Erik


On Dec 4, 2013, at 4:46 AM, Mhd Wrk  wrote:

> I'm using the following query to do a fuzzy search on Solr 4.5.1 and am
> getting empty result.
> 
> qt=standard&q=+(field1|en_CA|:Swimming~2 field1|en|:Swimming~2)
> +(field1|en_CA|:Goggle~1 field1|en|:Goggle~1) +(+startDate:[* TO
> 2013-12-04T00:23:00Z] -endDate:[* TO
> 2013-12-04T00:23:00Z])&start=0&rows=10&fl=id
> 
> If I change it to a not fuzzy query by simply dropping tildes from the
> terms (see below) then it returns the expected result! Is this a bug?
> Shouldn't fuzzy version of a query always return a super set of its
> not-fuzzy equivalent?
> 
> qt=standard&q=+(field1|en_CA|:Swimming field1|en|:Swimming)
> +(field1|en_CA|:Goggle field1|en|:Goggle) +(+startDate:[* TO
> 2013-12-04T00:23:00Z] -endDate:[* TO
> 2013-12-04T00:23:00Z])&start=0&rows=10&fl=id



Re: post filtering for boolean filter queries

2013-12-04 Thread Dmitry Kan
Thanks Yonik.

For our use case, we would like to skip caching only one particular filter
cache, yet apply a high cost for it to make sure it executes last of all
filter queries.

So this means, the rest of the fqs will execute and cache as usual.




On Tue, Dec 3, 2013 at 9:58 PM, Yonik Seeley  wrote:

> On Tue, Dec 3, 2013 at 4:45 AM, Dmitry Kan  wrote:
> > ok, we were able to confirm the behavior regarding not caching the filter
> > query. It works as expected. It does not cache with {!cache=false}.
> >
> > We are still looking into clarifying the cost assignment: i.e. whether it
> > works as expected for long boolean filter queries.
>
> Yes, filters should be ordered by cost (cheapest first) whenever you
> use {!cache=false}
>
> -Yonik
> http://heliosearch.com -- making solr shine
>



-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan


RE: SolrCloud FunctionQuery inconsistency

2013-12-04 Thread sling
Hi Raju,
A collection is a SolrCloud concept; a core is what you work with in standalone mode.
So in Solr standalone mode you can create multiple cores, not collections.
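
Cores can also be created programmatically; a minimal SolrJ sketch (core
name, instance dir and URL are placeholders, and the instance dir must
already contain a conf/ directory):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class CreateCoreExample {
    public static void main(String[] args) throws Exception {
        // talk to the CoreAdmin handler at the server root, not to a particular core
        HttpSolrServer admin = new HttpSolrServer("http://localhost:8983/solr");
        CoreAdminRequest.createCore("products", "products", admin);  // name, instanceDir, server
    }
}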



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-FunctionQuery-inconsistency-tp4104346p4104888.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Automatically build spellcheck dictionary on replicas

2013-12-04 Thread Mirko
Ok, thanks for pointing that out!
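
For the record, the build can at least be scripted against each slave. A rough
SolrJ sketch (the host list and the /spell handler name are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class BuildSpellcheckOnSlaves {
    public static void main(String[] args) throws Exception {
        String[] slaves = { "http://slave1:8983/solr/core1", "http://slave2:8983/solr/core1" };
        for (String url : slaves) {
            SolrQuery q = new SolrQuery("*:*");
            q.setRequestHandler("/spell");   // whichever handler hosts the spellcheck component
            q.set("spellcheck", true);
            q.set("spellcheck.build", true);
            new HttpSolrServer(url).query(q);  // one request per slave rebuilds its dictionary
        }
    }
}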


2013/12/3 Kydryavtsev Andrey 

> Yep, sorry, it doesn't work for file-based dictionaries:
>
> > In particular, you still need to index the dictionary file once by
> issuing a search with &spellcheck.build=true on the end of the URL; if you
> system doesn't update that dictionary file, then this only needs to be done
> once. This manual step may be required even if your configuration sets
> build=true and reload=true.
>
> http://wiki.apache.org/solr/FileBasedSpellChecker
>
> 03.12.2013, 21:27, "Mirko" :
> > Yes, I have that, but it doesn't help. It seems Solr still needs the
> query
> > with the "spellcheck.build" parameter to build the spellchecker index.
> >
> > 2013/12/3 Kydryavtsev Andrey 
> >
> >>  Did you try to add
> >>true
> >>   parameter to your slave's spellcheck configuration?
> >>
> >>  03.12.2013, 12:04, "Mirko"  >:
> >>>  Hi all,
> >>>  We use a Solr SpellcheckComponent with a file-based dictionary. We
> run a
> >>>  master and some replica slave servers. To update the dictionary, we
> copy
> >>>  the dictionary txt file to the master, from where it is automatically
> >>>  replicated to all slaves. However, it seems we need to run the
> >>>  "spellcheck.build" query on all servers individually.
> >>>
> >>>  Is there a way to automatically build the spellcheck dictionary on all
> >>>  servers without calling "spellcheck.build" on all slaves individually?
> >>>
> >>>  We use Solr 4.0.0
> >>>
> >>>  Thanks,
> >>>  Mirko
>


Solr Suggester ranked by boost

2013-12-04 Thread Mirko
I want to implement a Solr Suggester (http://wiki.apache.org/solr/Suggester)
that ranks suggestions by document boost factor.

As I understand the documentation, the following config should work:

Solrconfig.xml:

...


true
7
true


suggest





default
suggesttext
org.apache.solr.spelling.suggest.Suggester
org.apache.solr.spelling.suggest.fst.WFSTLookupFactory
true


...

Schema.xml:

...

...

...

I added three documents with a document boost:

{

"add": {
  "commitWithin": 5000,
  "overwrite": true,
  "boost": 3.0,
  "doc": {
"id": "1",
"suggesttext": "text bb"
  }
},
"add": {
  "commitWithin": 5000,
  "overwrite": true,
  "boost": 2.0,
  "doc": {
"id": "2",
"suggesttext": "text cc"
  }
},
"add": {
  "commitWithin": 5000,
  "overwrite": true,
  "boost": 1.0,
  "doc": {
"id": "3",
"suggesttext": "text aa"
  }
}

}

A query to the suggest handler (with spellcheck.q=te) gives the following
response:

{
  "responseHeader":{
"status":0,
"QTime":6},
  "command":"build",
  "response":{"numFound":3,"start":0,"docs":[
  {
"id":"1",
"suggesttext":["text bb"]},
  {
"id":"2",
"suggesttext":["text cc"]},
  {
"id":"3",
"suggesttext":["text aa"]}]
  },
  "spellcheck":{
"suggestions":[
  "te",{
"numFound":3,
"startOffset":0,
"endOffset":2,
"suggestion":["text aa",
  "text bb",
  "text cc"]}]}}

The search results are ranked by boost as expected. However, the
suggestions are not ranked by boost (but alphabetically instead). I also
tried the TSTLookup and FSTLookup lookup implementations with the same
result.

Any ideas what I'm missing?

Thanks,
Mirko


Shouldn't fuzzy version of a solr query always return a super set of its not-fuzzy equivalent

2013-12-04 Thread Mhd Wrk
I'm using the following query to do a fuzzy search on Solr 4.5.1 and am
getting an empty result.

qt=standard&q=+(field1|en_CA|:Swimming~2 field1|en|:Swimming~2)
+(field1|en_CA|:Goggle~1 field1|en|:Goggle~1) +(+startDate:[* TO
2013-12-04T00:23:00Z] -endDate:[* TO
2013-12-04T00:23:00Z])&start=0&rows=10&fl=id

If I change it to a non-fuzzy query by simply dropping the tildes from the
terms (see below) then it returns the expected result! Is this a bug?
Shouldn't the fuzzy version of a query always return a superset of its
non-fuzzy equivalent?

qt=standard&q=+(field1|en_CA|:Swimming field1|en|:Swimming)
+(field1|en_CA|:Goggle field1|en|:Goggle) +(+startDate:[* TO
2013-12-04T00:23:00Z] -endDate:[* TO
2013-12-04T00:23:00Z])&start=0&rows=10&fl=id


how to increase each index file size

2013-12-04 Thread YouPeng Yang
Hi
  I'm using the SolrCloud integreted with HDFS,I found there are lots of
small size files.
  So,I'd like to increase  the index  file size  while doing DIH
full-import. Any suggestion to achieve this goal.


Regards.


Faceting Query in Solr

2013-12-04 Thread kumar
Hi,

I indexed data into Solr using 5 categories. Each category is
differentiated by categoryId. Now I have a situation where I need to show the
results based on facets.

Ex:

[]-category1
[]-category2
[]-category3
[]-category4
[]-category5


If the user checks the category1 it has to show the results based on
categoryId-1

If the user checks 2 categories it has to show the results from two
categories which the user checked

If the user checks 3 categories it has to show the results from three
categories

and so on. Whichever categories the user checks, I have to show results
from those checked categories.

My Schema is in the following way..







Can anyone help me achieve this?

Regards,
Kumar





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Faceting-Query-in-Solr-tp4104881.html
Sent from the Solr - User mailing list archive at Nabble.com.


solrinitialisationerrors: Error during shutdown of writer.

2013-12-04 Thread Nutan
I don't know why all of a sudden I started getting this error.
This is the screenshot:
 

I thought there might be some problem with Tomcat, so I uninstalled it, but I
still get the same error.
I have no idea why this is happening; initially it worked really well.
In the Tomcat java options the home var is: *-Dsolr.solr.home=C:\solr*
I am using the initial solr.xml only. I have created two cores and the folder
structure is as desired.
My folder structure is:
1)C:\solr\contract\conf
2)C:\solr\document\conf
3)C:\solr\lib
These are my config files:
*solr.xml*


  


  


This is what I got after I re-installed Tomcat:

INFO: closing IndexWriter with IndexWriterCloser
Dec 04, 2013 2:09:30 PM org.apache.solr.update.DefaultSolrCoreState
closeIndexWriter
*SEVERE: Error during shutdown of writer.*
java.lang.NoClassDefFoundError:
org/apache/solr/request/LocalSolrQueryRequest
at
org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:682)
at
org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:69)
at
org.apache.solr.update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:278)
at
org.apache.solr.update.SolrCoreState.decrefSolrCoreState(SolrCoreState.java:73)
at org.apache.solr.core.SolrCore.close(SolrCore.java:972)
at org.apache.solr.core.CoreContainer.shutdown(CoreContainer.java:771)
at
org.apache.solr.servlet.SolrDispatchFilter.destroy(SolrDispatchFilter.java:134)
at
org.apache.catalina.core.ApplicationFilterConfig.release(ApplicationFilterConfig.java:311)
at
org.apache.catalina.core.StandardContext.filterStop(StandardContext.java:4660)
at
org.apache.catalina.core.StandardContext.stopInternal(StandardContext.java:5442)
at org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:232)
at
org.apache.catalina.core.ContainerBase.removeChild(ContainerBase.java:1001)
at
org.apache.catalina.startup.HostConfig.checkResources(HostConfig.java:1272)
at org.apache.catalina.startup.HostConfig.check(HostConfig.java:1450)
at
org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:295)
at
org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
at
org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:90)
at
org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1338)
at
org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1496)
at
org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1506)
at
org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1485)
at java.lang.Thread.run(Unknown Source)

Please help me; after putting in so much work, this error has really set me back.
*Thanks in advance.*



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solrinitialisationerrors-Error-during-shutdown-of-writer-tp4104874.html
Sent from the Solr - User mailing list archive at Nabble.com.