Re: creating collections dynamically.

2013-11-08 Thread mike st. john
thanks shawn,  i'll give it a try.


msj


On Fri, Nov 8, 2013 at 10:29 PM, Shawn Heisey  wrote:

> On 11/8/2013 7:39 PM, mike st. john wrote:
>
>> Is there any way to create collections dynamically?
>>
>>
>> Having some issues using the Collections API: I need to pass dataDir etc. to
>> the cores, but it doesn't seem to work correctly.
>>
>
> You can't pass dataDir with the collections API. It is concerned with the
> entire collection, not individual cores. With SolrCloud, you really
> shouldn't be trying to override those things.  One reason you might want to
> do this is that you want to share one instanceDir with all your cores.
>  This is basically unsupported with SolrCloud, because the config is in
> zookeeper, not on the disk.  The dataDir defaults to $instanceDir/data.
>
> If you *really* want to go against recommendations and control all the
> directories yourself, you can build the cores using the CoreAdmin API
> instead of the Collections API.  The wiki page on SolrCloud has some
> details on how to do this.
>
> Thanks,
> Shawn
>
>


Re: creating collections dynamically.

2013-11-08 Thread Shawn Heisey

On 11/8/2013 7:39 PM, mike st. john wrote:

Is there any way to create collections dynamically?


Having some issues using the Collections API: I need to pass dataDir etc. to
the cores, but it doesn't seem to work correctly.


You can't pass dataDir with the collections API. It is concerned with 
the entire collection, not individual cores. With SolrCloud, you really 
shouldn't be trying to override those things.  One reason you might want 
to do this is that you want to share one instanceDir with all your 
cores.  This is basically unsupported with SolrCloud, because the config 
is in zookeeper, not on the disk.  The dataDir defaults to 
$instanceDir/data.


If you *really* want to go against recommendations and control all the 
directories yourself, you can build the cores using the CoreAdmin API 
instead of the Collections API.  The wiki page on SolrCloud has some 
details on how to do this.


Thanks,
Shawn



creating collections dynamically.

2013-11-08 Thread mike st. john
Is there any way to create collections dynamically?


Having some issues using the Collections API: I need to pass dataDir etc. to
the cores, but it doesn't seem to work correctly.


thanks.

msj


Re: Question on Lots Of cores - How do I know it's working

2013-11-08 Thread Erick Erickson
Just send a query for that core, I think.

Erick
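Concretely, something like the following should do it (base URL and core name are made up; with the LotsOfCores setup, any request to a transient core — including a SolrJ ping — should trigger loading it):

```java
public class CoreWarmer {
    /** A zero-row query is enough to force an unloaded transient core to load. */
    static String warmUrl(String solrBase, String coreName) {
        return solrBase + "/" + coreName + "/select?q=*:*&rows=0";
    }

    public static void main(String[] args) {
        // With SolrJ one would typically do the equivalent with
        //   new HttpSolrServer(base + "/" + core).ping();
        // since a ping is just another request routed to the core.
        System.out.println(warmUrl("http://localhost:8983/solr", "user42")); // hypothetical core
    }
}
```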


On Fri, Nov 8, 2013 at 11:14 AM, vybe3142  wrote:

> On a related note, ..
> In our application, the cores can get moderately large , and since we
> mostly
> use a subset of them on a roughly LRU basis, the  dynamic core loading
> seems
> a good fit. We interact with our solr server via a solrj client.
>
> That said, we do require the capability to access older cores. Is there any
> command we can use to "warm" a large potentially unloaded transient core
> when we foresee updating or querying it in the near future. For instance,
> would a solrj ping command work for this purpose ?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Question-on-Lots-Of-cores-How-do-I-know-it-s-working-tp4099847p4100014.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr cloud : Changing properties of already loaded collection

2013-11-08 Thread sriram
Thanks a lot Erick. I could get that working in my environment.

Kind Regards,
V.Sriram



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-cloud-Changing-properties-of-alreadt-loaded-collection-tp4099671p4100062.html
Sent from the Solr - User mailing list archive at Nabble.com.


core swap duplicates core entries in solr.xml

2013-11-08 Thread Branham, Jeremy [HR]
When performing a core swap in Solr 4.5.1 with persistence on, the two core
entries that were swapped are duplicated.

Solr.xml

[core entries stripped by the mailing list archive]

Performed swap -

[post-swap solr.xml stripped by the mailing list archive]

Jeremy D. Branham
Performance Technologist II
Sprint University Performance Support
Fort Worth, TX | Tel: **DOTNET
Office: +1 (972) 405-2970 | Mobile: +1 (817) 791-1627
http://JeremyBranham.Wordpress.com
http://www.linkedin.com/in/jeremybranham




This e-mail may contain Sprint proprietary information intended for the sole 
use of the recipient(s). Any use by others is prohibited. If you are not the 
intended recipient, please contact the sender and delete all copies of the 
message.


Re: Unexpected query result

2013-11-08 Thread Patrick Duc
Thank you for your very quick reply - and for your solution, that works
perfectly well.

Still, I wonder why this simple and straightforward syntax "web OR
NOT(russia)" needs some translation to be processed correctly...
From the many related posts I read before asking my question, I know that
I'm not the first one to be puzzled by this behavior. Wouldn't it be a good
idea to modify the (Lucene, I guess?) parser so that the subsequent
processing would produce a correct result?

Thanks again for your help !



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unexpected-query-result-tp416p4100015.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Question on Lots Of cores - How do I know it's working

2013-11-08 Thread vybe3142
On a related note, ..
In our application, the cores can get moderately large , and since we mostly
use a subset of them on a roughly LRU basis, the  dynamic core loading seems
a good fit. We interact with our solr server via a solrj client.

That said, we do require the capability to access older cores. Is there any
command we can use to "warm" a large potentially unloaded transient core
when we foresee updating or querying it in the near future. For instance,
would a solrj ping command work for this purpose ?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Question-on-Lots-Of-cores-How-do-I-know-it-s-working-tp4099847p4100014.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Error instantiating a Custom Filter in Solr

2013-11-08 Thread Jack Krupansky
Thanks for the plug, Erick, but my deep dive doesn't go quite that deep
(yet).


But I'm sure a 2,500-page book on how to develop all manner of custom Solr
plugins would indeed be valuable.


I do have plenty of examples of using the many built-in Solr analysis
filters, though.


-- Jack Krupansky

-Original Message- 
From: Erick Erickson

Sent: Friday, November 08, 2013 10:36 AM
To: solr-user@lucene.apache.org
Subject: Re: Error instantiating a Custom Filter in Solr

Well, I think Jack Krupansky's book has some examples, at $10 it's probably
a steal.

Best,
Erick




On Fri, Nov 8, 2013 at 1:49 AM, Dileepa Jayakody
wrote:


Hi Erick,

Thanks a lot for the pointer.
I looked at the LowerCaseFilterFactory class [1] and it's parent abstract
class AbstractAnalysisFactory API [2] , and modified my custom filter
factory class as below;

public class ContentFilterFactory extends TokenFilterFactory {

public ContentFilterFactory() {
super();
}

@Override
public void init(Map args) {
super.init(args);
}

@Override
public ContentFilter create(TokenStream input) {
assureMatchVersion();
return new ContentFilter(input);
}
}

I have called the parent's init method as above, but I'm still getting the
same error: java.lang.NoSuchMethodException: com.solr.test.analyzer.
ContentFilterFactory.<init>(java.util.Map)

Any input on this?
Can someone please point me to a doc/blog or any sample implementing a
custom filter with Solr > 4.0?
I'm using Solr 4.5.0 server.

Thanks,
Dileepa

[1]

http://search-lucene.com/c/Lucene:analysis/common/src/java/org/apache/lucene/analysis/core/LowerCaseFilterFactory.java
[2]

https://lucene.apache.org/core/4_2_0/analyzers-common/org/apache/lucene/analysis/util/AbstractAnalysisFactory.html
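In Lucene/Solr 4.x the analysis factories are constructed reflectively through a constructor taking Map<String,String> (the old init(Map) path is gone), which is what the NoSuchMethodException about ContentFilterFactory.<init>(java.util.Map) points at. A self-contained sketch of that contract, with a stub standing in for TokenFilterFactory and the ContentFilter itself omitted:

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

// Stand-in for Lucene's TokenFilterFactory, to illustrate the 4.x contract:
// Solr instantiates factories reflectively via a (Map<String,String>) constructor.
abstract class StubTokenFilterFactory {
    protected StubTokenFilterFactory(Map<String, String> args) { }
}

// The shape a 4.5-compatible factory needs: no init(Map) override, just an
// args constructor that hands the map to super(args).
class ContentFilterFactory extends StubTokenFilterFactory {
    public ContentFilterFactory(Map<String, String> args) {
        super(args);
    }
}

public class FactoryContractDemo {
    public static void main(String[] args) throws Exception {
        // This is essentially what Solr's plugin loader does, and what fails
        // with NoSuchMethodException when only a no-arg constructor exists:
        Constructor<ContentFilterFactory> c =
                ContentFilterFactory.class.getConstructor(Map.class);
        System.out.println(c.newInstance(new HashMap<String, String>()) != null); // prints "true"
    }
}
```

With the real TokenFilterFactory on the classpath, the same pattern applies: declare `public ContentFilterFactory(Map<String,String> args) { super(args); }` and drop the init override.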


On Fri, Nov 8, 2013 at 4:25 AM, Erick Erickson wrote:

> Well, the example you linked to is based on 3.6, and things have
> changed assuming you're using 4.0.
>
> It's probably that your ContentFilter isn't implementing what it needs to,
> or it's not subclassing from the correct class for 4.0.
>
> Maybe take a look at something simple like LowerCaseFilterFactory
> and use that as a model, although you probably don't need to implement
> the MultiTermAware bit.
>
> FWIW,
> Erick
>
>
> On Thu, Nov 7, 2013 at 1:31 PM, Dileepa Jayakody
> wrote:
>
> > Hi All,
> >
> > I'm  a novice in Solr and I'm continuously bumping into problems with
my
> > custom filter I'm trying to use for analyzing a fieldType during
indexing
> > as below;
> >
> > 
> >   
> > 
> > 
> >   
> > 
> >
> > Below is my custom FilterFactory class;
> >
> > *public class ContentFilterFactory extends TokenFilterFactory {*
> >
> > * public ContentFilterFactory() {*
> > * super();*
> > * }*
> >
> > * @Override*
> > * public TokenStream create(TokenStream input) {*
> > * return new ContentFilter(input);*
> > * }*
> > *}*
> >
> > I'm getting below error stack trace [1] caused by a
NoSuchMethodException
> > when starting the server.
> > Solr complains that it cannot init the Plugin (my custom filter)  as
the
> > FilterFactory class doesn't have a init method; But in the example [2]
I
> > was following didn't have any notion of a init method in the
> FilterFactory
> > class, nor I was required to override an init method when extending
> > TokenFilterFactory class.
> >
> > Can someone please help me resolve this error and get my custom filter
> > working?
> >
> > Thanks,
> > Dileepa
> >
> > [1]
> > Caused by: org.apache.solr.common.SolrException: Plugin init failure
for
> > [schema.xml] fieldType "stanbolRequestType": Plugin init failure for
> > [schema.xml] analyzer/filter: Error instantiating class:
> > 'com.solr.test.analyzer.ContentFilterFactory'
> > at
> >
> >
>
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
> > at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:468)
> > ... 13 more
> > Caused by: org.apache.solr.common.SolrException: Plugin init failure
for
> > [schema.xml] analyzer/filter: Error instantiating class:
> > 'com.solr.test.analyzer.ContentFilterFactory'
> > at
> >
> >
>
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
> > at
> >
> >
>
org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:400)
> > at
> >
> >
>
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95)
> > at
> >
> >
>
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
> > at
> >
> >
>
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
> > ... 14 more
> > Caused by: org.apache.solr.common.SolrException: Error instantiating
> class:
> > 'com.solr.test.analyzer.ContentFilterFactory'
> > at
> >
> >
>
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:556)
> > at
> >
> >
>
org.apache.solr.schema.FieldTypePluginLoader$3.create(FieldTypePluginLoader.java:382)
> > at
> >
> >
>
org.apache.solr.schema.FieldTypePluginLoader$3.create(F

Re: Unexpected query result

2013-11-08 Thread Erick Erickson
Good blog on the fact that Solr/Lucene query language is
not strict boolean logic, and why:
http://searchhub.org/dev/2011/12/28/why-not-and-or-and-not/

Best,
Erick


On Fri, Nov 8, 2013 at 10:45 AM, Yonik Seeley  wrote:

> On Fri, Nov 8, 2013 at 10:33 AM, Patrick Duc  wrote:
> > "russia" ("web OR NOT(russia)"
>
> russia (web (*:* -russia))
>
> Negative clauses often need something positive to subtract from... so
> replace "NOT russia" with "(*:* -russia)"
>
> -Yonik
> http://heliosearch.com -- making solr shine
>


Re: Unexpected query result

2013-11-08 Thread Yonik Seeley
On Fri, Nov 8, 2013 at 10:33 AM, Patrick Duc  wrote:
> "russia" ("web OR NOT(russia)"

russia (web (*:* -russia))

Negative clauses often need something positive to subtract from... so
replace "NOT russia" with "(*:* -russia)"

-Yonik
http://heliosearch.com -- making solr shine
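The rewrite Yonik describes is mechanical: give the pure-negative clause an explicit universe (*:*) to subtract from. A small illustrative helper — this is just the string transformation, not how Solr's parser actually works internally:

```java
public class NegativeClauseFix {
    /** Wraps a bare negative clause so it has something positive to subtract from. */
    static String fixNot(String clause) {
        return "(*:* -" + clause + ")";
    }

    public static void main(String[] args) {
        // "web OR NOT(russia)" becomes:
        System.out.println("web " + fixNot("russia")); // prints "web (*:* -russia)"
    }
}
```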


Unexpected query result

2013-11-08 Thread Patrick Duc
I'm using Solr 4.4.0 running on Tomcat 7.0.29. The solrconfig.xml is
as-delivered (except for the Solr home directory, of course). I could pass
on the schema.xml, though I doubt this would help much, as the following
will show.

If I select all documents containing "russia" in the text, which is the
default field, ie if I execute the query "russia", I find only 1 document,
which is correct.

If I select all documents containing "web" in the text ("web"), the result
is 29, which is also correct.

If I search for all documents that do not contain "russia" ("NOT(russia)"),
the result is still correct (202).

If I search for all documents that contain "web" and do not contain "russia"
("web AND NOT(russia)"), the result is, once again, correct (28, because the
document containing "russia" also contains "web").

But if I search for all documents that contain "web" or do not contain
"russia" ("web OR NOT(russia)"), the result is still 28, though I should get
203 matches (the whole set).

Has anyone got an explanation?

For information, the AND and OR work correctly if I don't use a NOT
somewhere in the query, i.e. : "web AND russia" --> OK "web OR russia" -->
OK



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unexpected-query-result-tp416.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Error instantiating a Custom Filter in Solr

2013-11-08 Thread Erick Erickson
Well, I think Jack Krupansky's book has some examples, at $10 it's probably
a steal.

Best,
Erick




On Fri, Nov 8, 2013 at 1:49 AM, Dileepa Jayakody
wrote:

> Hi Erick,
>
> Thanks a lot for the pointer.
> I looked at the LowerCaseFilterFactory class [1] and it's parent abstract
> class AbstractAnalysisFactory API [2] , and modified my custom filter
> factory class as below;
>
> public class ContentFilterFactory extends TokenFilterFactory {
>
> public ContentFilterFactory() {
> super();
> }
>
> @Override
> public void init(Map args) {
> super.init(args);
> }
>
> @Override
> public ContentFilter create(TokenStream input) {
> assureMatchVersion();
> return new ContentFilter(input);
> }
> }
>
> I have called the parent's init method as above, but I'm still getting the
> same error: java.lang.NoSuchMethodException: com.solr.test.analyzer.
> ContentFilterFactory.<init>(java.util.Map)
>
> Any input on this?
> Can some one please point me to a doc/blog or any sample to implement a
> custom filter with Solr > 4.0
> I'm using Solr 4.5.0 server.
>
> Thanks,
> Dileepa
>
> [1]
>
> http://search-lucene.com/c/Lucene:analysis/common/src/java/org/apache/lucene/analysis/core/LowerCaseFilterFactory.java
> [2]
>
> https://lucene.apache.org/core/4_2_0/analyzers-common/org/apache/lucene/analysis/util/AbstractAnalysisFactory.html
>
>
> On Fri, Nov 8, 2013 at 4:25 AM, Erick Erickson wrote:
>
> > Well, the example you linked to is based on 3.6, and things have
> > changed assuming you're using 4.0.
> >
> > It's probably that your ContentFilter isn't implementing what it needs to
> > or it's not subclassing from the correct class for 4.0.
> >
> > Maybe take a look at something simple like LowerCaseFilterFactory
> > and use that as a model, although you probably don't need to implement
> > the MultiTermAware bit.
> >
> > FWIW,
> > Erick
> >
> >
> > On Thu, Nov 7, 2013 at 1:31 PM, Dileepa Jayakody
> > wrote:
> >
> > > Hi All,
> > >
> > > I'm  a novice in Solr and I'm continuously bumping into problems with
> my
> > > custom filter I'm trying to use for analyzing a fieldType during
> indexing
> > > as below;
> > >
> > > 
> > >   
> > > 
> > > 
> > >   
> > > 
> > >
> > > Below is my custom FilterFactory class;
> > >
> > > *public class ContentFilterFactory extends TokenFilterFactory {*
> > >
> > > * public ContentFilterFactory() {*
> > > * super();*
> > > * }*
> > >
> > > * @Override*
> > > * public TokenStream create(TokenStream input) {*
> > > * return new ContentFilter(input);*
> > > * }*
> > > *}*
> > >
> > > I'm getting below error stack trace [1] caused by a
> NoSuchMethodException
> > > when starting the server.
> > > Solr complains that it cannot init the Plugin (my custom filter)  as
> the
> > > FilterFactory class doesn't have a init method; But in the example [2]
> I
> > > was following didn't have any notion of a init method in the
> > FilterFactory
> > > class, nor I was required to override an init method when extending
> > > TokenFilterFactory class.
> > >
> > > Can someone please help me resolve this error and get my custom filter
> > > working?
> > >
> > > Thanks,
> > > Dileepa
> > >
> > > [1]
> > > Caused by: org.apache.solr.common.SolrException: Plugin init failure
> for
> > > [schema.xml] fieldType "stanbolRequestType": Plugin init failure for
> > > [schema.xml] analyzer/filter: Error instantiating class:
> > > 'com.solr.test.analyzer.ContentFilterFactory'
> > > at
> > >
> > >
> >
> org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
> > > at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:468)
> > > ... 13 more
> > > Caused by: org.apache.solr.common.SolrException: Plugin init failure
> for
> > > [schema.xml] analyzer/filter: Error instantiating class:
> > > 'com.solr.test.analyzer.ContentFilterFactory'
> > > at
> > >
> > >
> >
> org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
> > > at
> > >
> > >
> >
> org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:400)
> > > at
> > >
> > >
> >
> org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95)
> > > at
> > >
> > >
> >
> org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
> > > at
> > >
> > >
> >
> org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
> > > ... 14 more
> > > Caused by: org.apache.solr.common.SolrException: Error instantiating
> > class:
> > > 'com.solr.test.analyzer.ContentFilterFactory'
> > > at
> > >
> > >
> >
> org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:556)
> > > at
> > >
> > >
> >
> org.apache.solr.schema.FieldTypePluginLoader$3.create(FieldTypePluginLoader.java:382)
> > > at
> > >
> > >
> >
> org.apache.solr.schema.FieldTypePluginLoader$3.create(FieldTypePluginLoader.java:376)
> > > at
> > >
> > >
> >
> org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
> > > ... 18 

Re: Question on Lots Of cores - How do I know it's working

2013-11-08 Thread vybe3142
Thanks so much for the answer, and for "JIRA-fying" it.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Question-on-Lots-Of-cores-How-do-I-know-it-s-working-tp4099847p415.html
Sent from the Solr - User mailing list archive at Nabble.com.


Merging shards and replicating changes in SolrCloud

2013-11-08 Thread michael.boom
Here's the background of this topic: 
I have setup a collection with 4 shards, replicationFactor=2, on two
machines.
I started to index documents, but after hitting some update deadlocks and
restarting servers, my shard ranges in the ZK cluster state got nulled (I'm
using implicit routing). Indexing continued without me noticing, and all new
documents were indexed in shard1 creating huge disproportions with
shards2,3,4.
Of course, I want to fix this and get my index into 4 shards, evenly
distributed.

What I'm thinking to do is:
1. on machine 1, merge shards2,3,4 into shard1 using
http://wiki.apache.org/solr/MergingSolrIndexes
(at this point what happens to the replica of shard1 on machine2 ? will
SolrCloud try to replicate shard1 from machine1?)
2. on machine 2, unload the shard1,2,3,4 cores
3. on machine 1, split shard1 in shard1_0 and shard1_1. Again split shard1_0
and 1_1 getting 4 equal shards 1_0_0, 1_0_1, 1_1_0, 1_1_1
(will now the shard range for the newborns be correct if in the beginning
shard1's range was "null"?)
4. on machine 1 unload shard1
5. rename shards 1_0_0, 1_0_1, 1_1_0, 1_1_1 to 1,2,3,4.
6. replicate shard 1,2,3,4 to machine 2

Do you see any problems with this scenario? Anything that could be done in a
more efficient way?
Thank you
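For step 1, the merge itself is a CoreAdmin MERGEINDEXES request. A sketch of the URL (core names taken from the plan above; the srcCore form assumes the source cores are loaded on the same node — the indexDir form is the alternative for unloaded indexes):

```java
public class MergeIndexes {
    /** CoreAdmin MERGEINDEXES: merge the named source cores into the target core. */
    static String mergeUrl(String solrBase, String target, String... srcCores) {
        StringBuilder sb = new StringBuilder(solrBase)
                .append("/admin/cores?action=mergeindexes&core=").append(target);
        for (String src : srcCores) {
            sb.append("&srcCore=").append(src);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(mergeUrl("http://machine1:8983/solr",
                "shard1", "shard2", "shard3", "shard4"));
        // A commit on the target core is needed afterwards for the merge to be visible.
    }
}
```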



-
Thanks,
Michael
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Merging-shards-and-replicating-changes-in-SolrCloud-tp407.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multi-core support for indexing multiple servers

2013-11-08 Thread Erick Erickson
Yep, you can define multiple data sources for use with DIH.

Combining data from those multiple sources into a single
index can be a bit tricky with DIH, personally I tend to prefer
SolrJ, but that's mostly personal preference, especially if
I want to get some parallelism going on.

But whatever works

Erick
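The data-config fragment in the quoted message below lost its tags in the archive. Reconstructed in outline (the dataSource names are made up, the type names are the standard DIH ones, and the readTimeout value appears truncated in the archive, so "10000" here is a guess), it would look roughly like:

```xml
<dataConfig>
  <dataSource name="mysql" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/employeeDB" batchSize="-1"
              user="root" password="root"/>
  <dataSource name="dotnetApp" type="URLDataSource"
              connectionTimeout="5000" readTimeout="10000"/>
  <dataSource name="externalApi" type="URLDataSource"
              connectionTimeout="5000" readTimeout="10000"/>
  <!-- each entity then selects its source via dataSource="mysql" etc. -->
</dataConfig>
```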


On Thu, Nov 7, 2013 at 11:17 PM, manju16832003 wrote:

> Eric,
> Just a question :-), wouldn't it be easy to use DIH to pull data from
> multiple data sources.
>
> I do use DIH to do that comfortably. I have three data sources
>  - MySQL
>  - URLDataSource that returns XML from an .NET application
>  - URLDataSource that connects to an API and return XML
>
> Here is part of data-config data source settings
>  driver="com.mysql.jdbc.Driver"
> url="jdbc:mysql://localhost/employeeDB" batchSize="-1" user="root"
> password="root"/>
> connectionTimeout="5000" readTimeout="1"/>
> connectionTimeout="5000" readTimeout="1"/>
>
>
> Of course, in application I do the same.
> To construct my results, I do connect to MySQL and those two data sources.
>
> Basically we have two point of indexing
>  - Using DIH at one time indexing
>  - At application whenever there is transaction to the details that we are
> storing in Solr.
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Multi-core-support-for-indexing-multiple-servers-tp4099729p4099933.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: fq efficiency

2013-11-08 Thread Erick Erickson
Have you tried this and measured or is this theoretical? Because
filter queries are _designed_ for this kind of use case.

bq: If the user has 100 documents, then finding the intersection
requires checking each list ~100 times

The cached fq is a bitset. Before checking each document,
all that has to happen to "check the list" is index into the bitset and
see if the bit is turned on. If it isn't, the document is bypassed.


The lots-of-cores solution has this drawback. The first time a query
comes in for a particular core, it may be loaded, which will be
noticeably slow, so your users have to be able to tolerate first-time
searches that take a bit of time. That said, test to see if it's
"fast enough" before settling on the solution.

But really, I'd bypass this and just try the filter query solution and
measure. Because I'd be surprised if you really had performance
issues here, assuming your filter queries are indeed cached and
re-used.

Best,
Erick
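Erick's point about the cached fq being a bitset can be illustrated directly with java.util.BitSet: per-document membership is a single indexed bit test, independent of how many documents the user owns (the doc ids below are, of course, invented):

```java
import java.util.BitSet;

public class FilterBitsetDemo {
    /** O(1)-per-doc membership test against the cached filter bitset. */
    static int countMatches(BitSet filter, int[] candidateDocs) {
        int kept = 0;
        for (int docId : candidateDocs) {
            if (filter.get(docId)) {  // single indexed bit lookup, no list scan
                kept++;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Cached fq for one user: one bit per Lucene doc id.
        BitSet userDocs = new BitSet(1_000_000);
        userDocs.set(10);
        userDocs.set(500_000);

        // Candidate docs matching the main query "hello":
        int[] helloMatches = {9, 10, 777, 500_000};
        System.out.println(countMatches(userDocs, helloMatches)); // prints 2
    }
}
```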


On Thu, Nov 7, 2013 at 7:02 PM, Scott Schneider <
scott_schnei...@symantec.com> wrote:

> Digging a bit more, I think I have answered my own questions.  Can someone
> please say if this sounds right?
>
> http://wiki.apache.org/solr/LotsOfCores looks like a pretty good
> solution.  If I give each user his own shard, each query can be run in only
> one shard.  The effect of the filter query will basically be to find that
> shard.  The requirements listed on the wiki suggest that performance will
> be good.  But in Solr 3.x, this won't scale with the # users/shards.
>
> Prepending a user id to indexed keywords using an analyzer will break
> wildcard search.  If there is a wildcard, the query analyzer doesn't run
> filters, so it won't prepend the user id.  I could prepend the user id
> myself before calling Solr, but that seems... bad.
>
> Scott
>
>
>
> > -Original Message-
> > From: Scott Schneider [mailto:scott_schnei...@symantec.com]
> > Sent: Thursday, November 07, 2013 2:03 PM
> > To: solr-user@lucene.apache.org
> > Subject: RE: fq efficiency
> >
> > Thanks, that link is very helpful, especially the section, "Leapfrog,
> > anyone?"  This actually seems quite slow for my use case.  Suppose we
> > have 10,000 users and 1,000,000 documents.  We search for "hello" for a
> > particular user and let's assume that the fq set for the user is
> > cached.  "hello" is a common word and perhaps 10,000 documents will
> > match.  If the user has 100 documents, then finding the intersection
> > requires checking each list ~100 times.  If the user has 1,000
> > documents, we check each list ~1,000 times.  That doesn't scale well.
> >
> > My searches are usually in one user's data.  How can I take advantage
> > of that?  I could have a separate index for each user, but loading so
> > many indexes at once seems infeasible; and dynamically loading &
> > unloading indexes is a pain.
> >
> > Or I could create a filter that takes tokens and prepends them with the
> > user id.  That seems like a good solution, since my keyword searches
> > always include a user id (and usually just 1 user id).  Though I wonder
> > if there is a downside I haven't thought of.
> >
> > Thanks,
> > Scott
> >
> >
> > > -Original Message-
> > > From: Shawn Heisey [mailto:s...@elyograg.org]
> > > Sent: Tuesday, November 05, 2013 4:35 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: fq efficiency
> > >
> > > On 11/5/2013 3:36 PM, Scott Schneider wrote:
> > > > I'm wondering if filter queries are efficient enough for my use
> > > cases.  I have lots and lots of users in a big, multi-tenant, sharded
> > > index.  To run a search, I can use an fq on the user id and pass in
> > the
> > > search terms.  Does this scale well with the # users?  I suppose
> > that,
> > > since user id is indexed, generating the filter data (which is
> > cached)
> > > will be fast.  And looking up search terms is fast, of course.  But
> > if
> > > the search term is a common one that many users have in their
> > > documents, then Solr may have to perform an intersection between two
> > > large sets:  docs from all users with the search term and all of the
> > > current user's docs.
> > > >
> > > > Also, how about auto-complete and searching with a trailing
> > wildcard?
> > > As I understand it, these work well in a single-tenant index because
> > > keywords are sorted in the index, so it's easy to get all the search
> > > terms that match "foo*".  In a multi-tenant index, all users'
> > keywords
> > > are stored together.  So if Lucene were to look at all the keywords
> > > from "foo" to "fooz" (I'm not sure if it actually does this), it
> > > would skip over a large majority of keywords that don't belong to
> > this
> > > user.
> > >
> > >  From what I understand, there's not really a whole lot of difference
> > > between queries and filter queries when they are NOT cached, except
> > > that
> > > the main query and the filter queries are executed in parallel, which
> > > can save time.
> > >
> >

Re: SOLR keyword search with fq queries

2013-11-08 Thread Alvaro Cabrerizo
Please, check if "defaults", "appends" and "invariants" from
http://wiki.apache.org/solr/SearchHandler can solve your problem.

Regards.
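For the localization case in the quoted question, "appends" is usually the right bucket: the client's q is honored while the handler unconditionally adds the language filter. A sketch of what that could look like in solrconfig.xml (the handler name and field come from the question; treat the exact layout as an assumption):

```xml
<requestHandler name="/mywebsite" class="solr.SearchHandler">
  <lst name="appends">
    <!-- added to every request; the user's q and fq params are kept -->
    <str name="fq">language:english</str>
  </lst>
</requestHandler>
```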


On Fri, Nov 8, 2013 at 6:05 AM, atuldj.jadhav wrote:

> Hi All, I need your help to find a solution to one of the issues I am facing
> with keyword search. We have to provide keyword search functionality on our
> website, i.e. searching for a word will get you all the indexed documents
> where a match is found for that word (not specific to any particular field).
> I have achieved this; please find my sample query:
>
> http://localhost:8983/solr/mywebsite/?q=Java+Servlet&version=2.2&start=0&rows=10&indent=on
>
> This works perfectly fine for me (here mywebsite is a request handler I had
> to set up to serve different websites). HOWEVER, to handle localization I
> had to set up my request handler with some predefined conditions, i.e.
> language:english. However, this is getting complicated when I try to search
> for keywords through my request handler: the keyword search overwrites my
> conditions and returns non-English results. Can anyone suggest how to
> handle this condition? Can we do a keyword search using an "fq" query?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SOLR-keyword-search-with-fq-queries-tp4099937.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Disjuctive Queries (OR queries) and FilterCache

2013-11-08 Thread Erick Erickson
Glad to hear you have a solution

Best,
Erick
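The normalization described in the quoted message below — sorting the OR'd constraint values before building the fq string — can be sketched as follows (field and values are illustrative):

```java
import java.util.Arrays;

public class CanonicalFq {
    /** Sorts the constraint values so equivalent OR filters map to one cache entry. */
    static String canonicalFq(String field, String... values) {
        String[] sorted = values.clone();
        Arrays.sort(sorted);
        StringBuilder sb = new StringBuilder(field).append(":(");
        for (int i = 0; i < sorted.length; i++) {
            if (i > 0) sb.append(" OR ");
            sb.append('"').append(sorted[i]).append('"');
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // Any incoming ordering collapses to the same filterCache key:
        System.out.println(canonicalFq("x", "1", "2", "3"));
        System.out.println(canonicalFq("x", "3", "2", "1"));
        // both print: x:("1" OR "2" OR "3")
    }
}
```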


On Thu, Nov 7, 2013 at 5:12 PM, Patanachai Tangchaisin <
patanachai.tangchai...@wizecommerce.com> wrote:

> Hi Erick,
>
> About the size of filter cache, previously we set it to 4,000.
> After we faced this problem, we changed it to 10,000.
> Still at size of 10,000 (always full), hitratio was 0.78 and "eviction"
> was as high as "insertion".
>
> About the 100% CPU: yes, it was Solr using it.
> I profiled the app; it was "DisjunctionSumScorer" that took most of the CPU time.
> Since this is a required filter query, we set it for every requests.
> My assumption is because Solr cannot use a filter cache, the filter query
> has to be executed at a same time as normal query.
>
> However, we fix this problem by sorting our filter constraints before
> creating a filter query.
> So, {"1","2","3"}, {"2","3","1"}, {"3","2","1"} will be a same the filter
> query i.e. fq=x:("1"  OR "2" OR "3").
>
> We end up with very small filter cache size (<1,000) and hit ratio is now
> 0.99. There is no eviction at all.
> The median response time is now less than 200ms on 25 QPS.
>
> Thanks,
> Patanachai
>
>
> On 11/07/2013 04:37 AM, Erick Erickson wrote:
>
> Yeah, Solr's fq cache is pretty simple-minded,
> order matters. There's no good way to improve
> that except try to write your fq queries in the
> same order. It's actually quite tricky to
> disassemble/reassemble arbitrary queries to fix
> this problem.
>
> But in your case, you could write a custom query
> component that was able to handle this _specific_
> case relatively easily I should think.
>
> bq: Our machine always use 100% CPU
>
> This is strange. Are you sure Solr is using this?
> Are there any other processes on the server that
> might be using this? Top (*nix) might help here. If
> it's really all Solr, then you need another slave
> or two to handle the load. Do you get good responses
> when the QPS rate is, say 10?
>
> How big is your filter cache?
>
> A hit ratio of .76 isn't actually too bad. It looks like
> you're running for a long time, and if so the insert
> and eviction numbers will tend to the same number.
>
> Do beware of using NOW in your fq clauses, that can
> cause grief. See:
> http://searchhub.org/2012/02/23/date-math-now-and-filter-queries/
>
> This seems like really poor performance, I'm puzzled.
>
> Best,
> Erick
>
>
>
>
> On Mon, Nov 4, 2013 at 8:38 PM, Patanachai Tangchaisin <
> patanachai.tangchai...@wizecommerce.com> wrote:
>
>
>
> Hello,
>
> We are running our search system using Apache Solr 4.2.1 and using
> Master/Slave model.
> Our index has ~100M document. The index size is  ~20gb.
> The machine has 24 CPU and 48gb rams.
>
> Our response time is pretty bad, median is ~4 seconds with 25
> queries/second.
>
> We noticed a couple of things
> - Our machine always use 100% CPU.
> - There is a lot of room for Java Heap. We assign Xms12g and Xmx16g, but
> the size of heap is still only 12g
> - Solr's filterCache hit ratio is only 0.76 and the number of insertion
> and eviction is almost equal.
>
> The weird thing is
> - most items in Solr's filterCache (only 100 first) are specify to only
> 1 field which we filter it by using an OR query for this field. Note
> that every request will have this field constraint.
>
> For example, if field name is x
> fq=x:(1 OR 2 OR 3)&fq=y:'a'
> fq=x:(3 OR 2 OR 1)&fq=y:'b'
> fq=x:(2 OR 1 OR 3)&fq=y:'c'
>
> An order of items is different since it is an input from a different
> system.
>
> To me, it seems that Solr do a cache on this field in different entry if
> an order of item is different. e.g. "(1 OR 2)" and "(2 OR 1)" is going
> to be a different cache entry.
>
> Question:
> Is there other way to create a fq parameter using 'OR' and make Solr
> cache them as a same entry?
>
>
> Thanks,
> Patanachai Tangchaisin
>
> CONFIDENTIALITY NOTICE
> ==
> This email message and any attachments are for the exclusive use of the
> intended recipient(s) and may contain confidential and privileged
> information. Any unauthorized review, use, disclosure or distribution is
> prohibited. If you are not the intended recipient, please contact the
> sender by reply email and destroy all copies of the original message along
> with any attachments, from your computer system. If you are the intended
> recipient, please be advised that the content of this message is subject to
> access, review and disclosure by the sender's Email System Administrator.
>
>
>
>
>
>
>

Re: Is this a reasonable way to boost?

2013-11-08 Thread Upayavira


On Thu, Nov 7, 2013, at 10:51 PM, Michael Tracey wrote:
> I'm trying to boost results slightly on a price (not currency) field that
> are closer to a certain value.  I want results that are not too expensive
> or too inexpensive to be favored.  Here is what we currently are trying:
> 
> bf=sub(1,abs(sub(15,price)))^0.2
> 
> where 15 is that "median" I want to boost towards.  Is this a good way? 
> I understand in older solr's it was common to use recip(ord()) for this
> but you shouldn't do so now.
> 
> Thanks for any comments or advice on improving this.

I think this is a case of "if it works". If it works for you, then
great.

What I would say though, is that if you have a lot of documents in your
index, consider pre-computing that field at index time, and boost on the
pre-computed value, as that will give you better performance.

Upayavira
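One thing worth checking with bf=sub(1,abs(sub(15,price)))^0.2 is its shape: it peaks at 1 when price is exactly 15 but goes increasingly negative away from it, so with an additive bf it actively penalizes far-off prices rather than merely not boosting them. A quick sketch of the values (plain arithmetic mirroring the function query; whether the penalty is desirable depends on the use case):

```java
public class BoostShape {
    /** Mirrors sub(1, abs(sub(15, price))) from the bf parameter. */
    static double boost(double price) {
        return 1.0 - Math.abs(15.0 - price);
    }

    public static void main(String[] args) {
        System.out.println(boost(15));  // prints 1.0   (maximum at the target)
        System.out.println(boost(14));  // prints 0.0
        System.out.println(boost(5));   // prints -9.0  (large negative contribution)
        // If a bounded, always-positive curve is preferred, something like
        // recip(abs(sub(price,15)),1,1,1) = 1/(|price-15|+1) decays smoothly instead.
    }
}
```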


Fwd: Mobiles/Tablets for Repair

2013-11-08 Thread Rohan Thakur
Hey there,

Currently, I am part of a company, ZurePro Warranty, which deals with
providing warranties over the mobiles and tablets. If you have any such
product which needs to be repaired (only mobiles & tablets) you can get in
touch with me. ZurePro will arrange a free pick and drop for the gadget,
you will only have to bear the repair cost which will be the market price
of the repairing. We are looking for faulty smartphones and tablets with
any kind of hardware/software problems like motherboard issue, power port
malfunction, screen/touch pad problem etc..


If you have any such product kindly let me know, ZurePro will take care of
it. :)

Cheers!!

-- 

Best Regards,
Rohan Thakur


Range faceting or grouping on a String or count(field)

2013-11-08 Thread Chris Geeringh
I'm trying to achieve something I thought was relatively simple. "Range
faceting" on a String, or count(String). I understand range faceting on a
string isn't possible as per the docs, but is there any way to achieve
something 'like' this functionality.

Consider a document with the field "brand", I want to get the number of
hits for "brand" per user supplied interval gap over a date range.

Something like this in the output for interval gap of +1MONTH:

2013-01
   coke: 20
   pepsi: 5
   fanta: 5

2013-02
   coke: 10
   pepsi: 50
   fanta: 1

...

What are the suggestions to achieve something like this?

Cheers,
Chris
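One workable approach for the per-interval brand counts: issue one facet.field=brand request per interval, restricting each with an fq on the date field. A sketch of the parameter sets (the field names "brand" and "date" come from the question; the rest is illustrative):

```java
public class IntervalBrandFacets {
    /** Builds the query string for one interval: brand counts within [start, end]. */
    static String intervalParams(String start, String end) {
        return "q=*:*&rows=0&facet=true&facet.field=brand"
                + "&fq=date:[" + start + " TO " + end + "]";
    }

    public static void main(String[] args) {
        // One request per +1MONTH gap; note the closed upper bound, so pick
        // end values just before the next interval to avoid double counting.
        System.out.println(intervalParams("2013-01-01T00:00:00Z", "2013-01-31T23:59:59Z"));
        System.out.println(intervalParams("2013-02-01T00:00:00Z", "2013-02-28T23:59:59Z"));
    }
}
```

Depending on the Solr version, indexing a precomputed month field and using facet.pivot=month,brand may also be an option.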