Re: Solr Synonyms, Escape space in case of multi words

2014-10-15 Thread Rajani Maski
Hi David,

  I think you should have the filter class with tokenizer specified. [As
shown below]

  *



So your field type should be as shown below:


  


  



On Wed, Oct 15, 2014 at 7:25 PM, David Philip 
wrote:

> Sorry, analysis page clip is getting trimmed off and hence the indention is
> lost.
>
> Here it is :
>
> ridemakers | ride | ridemakerz | ride | ridemark | ride | makers | makerz|
> care
>
> expected:
>
> ridemakers | ride | ridemakerz | ride | ridemark | ride | makers |
> makerz| *ride
> care*
>
>
>
> On Wed, Oct 15, 2014 at 7:21 PM, David Philip  >
> wrote:
>
> > contd..
> >
> > expectation was that the "ride care"  should not have split into two
> > tokens.
> >
> > It should have been as below. Please correct me/point me where I am
> wrong.
> >
> >
> > Input : ridemakers, ride makers, ridemakerz, ride makerz, ride\mark,
> ride\
> > care
> >
> > o/p
> >
> > ridemakersrideridemakerzrideridemarkridemakersmakerz
> >
> > *ride care*
> >
> >
> >
> >
> > On Wed, Oct 15, 2014 at 7:16 PM, David Philip <
> davidphilipshe...@gmail.com
> > > wrote:
> >
> >> Hi All,
> >>
> >>I remember using multi-words in synonyms in Solr 3.x version. In case
> >> of multi words, I was escaping space with back slash[\] and it work as
> >> intended.  Ex: ride\ makers, riders, rider\ guards.  Each one mapped to
> >> each other and so when I searched for ride makers, I obtained the search
> >> results for all of them. The field type was same as below. I have same
> set
> >> up in solr 4.10 but now the multi word space escape is getting ignored.
> It
> >> is tokenizing on spaces.
> >>
> >>  synonyms.txt
> >> ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\
> >> care
> >>
> >>
> >> Analysis page:
> >>
> >> ridemakersrideridemakerzrideridemarkridemakersmakerzcare
> >>
> >> Field Type
> >>
> >>  >> positionIncrementGap="100">
> >>   
> >> 
> >>  synonyms="synonyms.txt"
> >> ignoreCase="true" expand="true"/>
> >>   
> >> 
> >>
> >>
> >>
> >> Could you please tell me what could be the issue? How do I handle
> >> multi-word cases?
> >>
> >>
> >>
> >>
> >> synonyms.txt
> >> ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\
> >> care
> >>
> >>
> >> Thanks - David
> >>
> >>
> >>
> >
> >
>


Problem with DIH

2014-10-15 Thread Jay Potharaju
Hi
I'm using DIH to update my core, with stored procedures for the full and
delta imports. To keep delta imports from running for a long time, I limit
the rows returned to a maximum of 100,000 at a time. On average, a delta
import runs for less than 1 minute.

For the last couple of days I have noticed that my delta imports have been
running for a couple of hours and trying to update all the records in the
core. I'm not sure why that is happening. I can't reproduce it reliably;
it happens randomly.

Has anyone noticed this kind of behavior? Also, are there any Solr logs
that will tell me what is getting updated, or what exactly is happening in
DIH?
Any suggestions are appreciated.

Document size: 20 million
Solr 4.9
3 Nodes in the solr cloud.


Thanks
J


How to use less than and greater than in data-config file of solr

2014-10-15 Thread madhav bahuguna
I have two tables that I want to link using greater-than and less-than
conditions. They have nothing in common; the only way I can link them is by
range values. I am able to do this in MySQL, but how do I do it in Solr, in
my data-config.xml file? This is how my data-config file looks:


 

 
 
 
 
 

When I click full indexing, no data gets indexed and no error is shown.
What is wrong with this? What I have seen is that if I replace AND with OR,
or use just one condition instead of both, it works fine. Can anyone advise
how to achieve what I want?
I have also posted this question on Stack Overflow:
http://stackoverflow.com/questions/26397084/how-use-less-than-and-greater-than-in-data-config-file-of-solr
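
One thing that may be worth checking, although the config above did not survive in the archive and this may not be the cause: data-config.xml is XML, so a literal `<` inside a `query="..."` attribute has to be written as `&lt;`. The sketch below uses Python's standard library to show what an escaped attribute looks like. The table and column names are hypothetical and are not taken from the original config.

```python
from xml.sax.saxutils import quoteattr


def entity_query_attribute(sql: str) -> str:
    """Return a query="..." attribute with XML-reserved characters escaped."""
    # quoteattr escapes &, < and > and wraps the value in quotes.
    return "query=" + quoteattr(sql)


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    sql = "SELECT * FROM ranges WHERE low_value <= 50 AND high_value >= 50"
    print(entity_query_attribute(sql))
```

If the escaped form is already in place, the problem is more likely in the SQL conditions themselves, which can be confirmed by running the same statement directly against MySQL.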
-- 
Regards
Madhav Bahuguna


How should one search on all fields? *:XX does not work

2014-10-15 Thread Aaron Lewis
Hi,

I'm trying to match all fields, so I tried this:
*:"XX"

Is that a bad practice? It does not seem to be supported either.

-- 
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33


Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

2014-10-15 Thread Shawn Heisey

On 10/15/2014 10:24 PM, S.L wrote:

> Yes, I tried those two queries with distrib=false. I consistently get 0
> results for the first and 1 result for the second query (i.e. server 3
> shard 2 replica 2).
>
> However, if I run the same second query (i.e. server 3 shard 2 replica 2)
> with distrib=true, I sometimes get a result and sometimes not. Shouldn't
> this query always return a result when it's pointing to a core that seems
> to have that document, regardless of distrib=true or false?
>
> Unfortunately I don't see anything in the logs that points to the cause.
>
> BTW, you asked me to replace the request handler. I use the select request
> handler, so I cannot replace it with anything else. Is that a problem?


If you send the query with distrib=true (which is the default value in 
SolrCloud), then it treats it just as if you had sent it to 
/solr/collection instead of /solr/collection_shardN_replicaN, so it's a 
full distributed query. The distrib=false is required to turn that 
behavior off and ONLY query the index on the actual core where you sent it.


I only said to replace those things as appropriate.  Since you are using 
/select, it's no problem that you left it that way. If I were to assume 
that you used /select, but you didn't, the URLs as I wrote them might 
not have worked.


As discussed, this means that your replicas are truly out of sync.  It's 
difficult to know what caused it, especially if you can't see anything 
in the log when you indexed the missing documents.


We know you're on Solr 4.10.1.  This means that your Java is a 1.7 
version, since Java7 is required.


Here's where I ask a whole lot of questions about your setup. What is 
the precise Java version, and which vendor's Java are you using?  What 
operating system is it on?  Is everything 64-bit, or is any piece (CPU, 
OS, Java) 32-bit?  On the Solr admin UI dashboard, it lists all 
parameters used when starting Java, labelled as "Args".  Can you include 
those?  Is zookeeper external, or embedded in Solr?  Is it a 3-server 
(or more) ensemble?  Are you using the example jetty, or did you provide 
your own servlet container?


We recommend 64-bit Oracle Java, the latest 1.7 version.  OpenJDK (since 
version 1.7.x) should be pretty safe as well, but IBM's Java should be 
avoided.  IBM does very aggressive runtime optimizations.  These can 
make programs run faster, but they are known to negatively affect 
Lucene/Solr.


Thanks,
Shawn



Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

2014-10-15 Thread S.L
Shawn,

Yes, I tried those two queries with distrib=false. I consistently get 0
results for the first and 1 result for the second query (i.e. server 3
shard 2 replica 2).

However, if I run the same second query (i.e. server 3 shard 2 replica 2)
with distrib=true, I sometimes get a result and sometimes not. Shouldn't
this query always return a result when it's pointing to a core that seems
to have that document, regardless of distrib=true or false?

Unfortunately I don't see anything in the logs that points to the cause.

BTW, you asked me to replace the request handler. I use the select request
handler, so I cannot replace it with anything else. Is that a problem?

Thanks.

On Thu, Oct 16, 2014 at 12:05 AM, Shawn Heisey  wrote:

> On 10/15/2014 9:26 PM, S.L wrote:
>
>> Look at the logging information I provided below , looks like the results
>> are only being returned back for this solrCloud cluster  if the request
>> goes to one of the two replicas of a shard.
>>
>> I have verified that numDocs in the replicas for a given shard is same but
>> there is difference in the maxDoc and deletedDocs, does this signal the
>> replicas being out of sync ?
>>
>> Even if the numDocs are same , how do we guarantee that those docs are
>> identical and have the same uniquekeys , is there a way to verify this ? I
>> am suspecting that  as the numDocs is same across the replicas , and still
>> only when the request goes to one of  the  replicas of the shard that I
>> get
>> a result back , the documents with in those replicas with in a shard are
>> not an exact replica set of each other.
>>
>> I suspect the issue I am facing in 4.10.1 cloud is related to
>> https://issues.apache.org/jira/browse/SOLR-4924  .
>>
>> Can anyone please let me know , how to solve this issue of intermittent no
>> results for a query ?
>>
>
> query with no results hits these cores:
> server 2 shard 3 replica1
> server 3 shard 1 replica 1
> server 1 shard 2 replica 1
>
> query with 1 result hits these cores:
> server 2 shard 1 replica 2
> server 3 shard 2 replica 2 (found 1)
> server 1 shard 3 replica 2
>
> Here's some URLs for some testing.  They are directed at specific shard
> replicas and are specifically NOT distributed queries:
>
> http://server1.mydomain.com:8081/solr/dyCollection1_
> shard2_replica1/select?q=*:*&fq=id:e8995da8-7d98-4010-93b4-
> 8ff7dffb8bfb&distrib=false
>
> http://server3.mydomain.com:8081/solr/dyCollection1_
> shard2_replica2/select?q=*:*&fq=id:e8995da8-7d98-4010-93b4-
> 8ff7dffb8bfb&distrib=false
>
> If you run these queries (replacing server names and the /select request
> handler as appropriate), do you get 0 results on the first one and 1 result
> on the second one?  If you do, then you've definitely got replicas out of
> sync.  If you get 1 result on both queries, then something else is
> breaking.  If by chance you have taken steps to fix this particular ID,
> pick another one that you know has a problem.
>
> There is no automated way to detect replicas out of sync.  You could
> request all docs on both replicas with distrib=false&fl=id&sort=id+asc,
> then compare the two lists.  Depending on how many docs you have, those
> queries could take a while to run.
>
> If the replicas are out of sync, are there any ERROR entries in the Solr
> log, especially at the time that the problem docs were indexed?
>
> Thanks,
> Shawn
>
>


Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

2014-10-15 Thread Shawn Heisey

On 10/15/2014 9:26 PM, S.L wrote:

> Look at the logging information I provided below. It looks like results
> are only returned from this SolrCloud cluster if the request goes to one
> of the two replicas of a shard.
>
> I have verified that numDocs in the replicas for a given shard is the same,
> but there is a difference in maxDoc and deletedDocs. Does this signal that
> the replicas are out of sync?
>
> Even if numDocs is the same, how do we guarantee that those docs are
> identical and have the same uniqueKeys? Is there a way to verify this?
> Since numDocs is the same across the replicas, and I still get a result
> only when the request goes to one particular replica of the shard, I
> suspect that the documents within the replicas of a shard are not exact
> copies of each other.
>
> I suspect the issue I am facing in the 4.10.1 cloud is related to
> https://issues.apache.org/jira/browse/SOLR-4924 .
>
> Can anyone please let me know how to solve this issue of intermittently
> getting no results for a query?


query with no results hits these cores:
server 2 shard 3 replica1
server 3 shard 1 replica 1
server 1 shard 2 replica 1

query with 1 result hits these cores:
server 2 shard 1 replica 2
server 3 shard 2 replica 2 (found 1)
server 1 shard 3 replica 2

Here's some URLs for some testing.  They are directed at specific shard 
replicas and are specifically NOT distributed queries:


http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/select?q=*:*&fq=id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb&distrib=false

http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/select?q=*:*&fq=id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb&distrib=false

If you run these queries (replacing server names and the /select request 
handler as appropriate), do you get 0 results on the first one and 1 
result on the second one?  If you do, then you've definitely got 
replicas out of sync.  If you get 1 result on both queries, then 
something else is breaking.  If by chance you have taken steps to fix 
this particular ID, pick another one that you know has a problem.


There is no automated way to detect replicas out of sync.  You could 
request all docs on both replicas with distrib=false&fl=id&sort=id+asc, 
then compare the two lists.  Depending on how many docs you have, those 
queries could take a while to run.
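
The comparison described above could be scripted along these lines. This is only a sketch under assumptions: the helper names are made up, the `rows` and `wt=json` parameters are added for convenience, and fetching and parsing the responses is left to whatever HTTP client is at hand.

```python
from urllib.parse import urlencode


def core_ids_url(core_url: str, rows: int) -> str:
    """Build a non-distributed, id-only, sorted query URL for one core."""
    params = [
        ("q", "*:*"),
        ("fl", "id"),
        ("sort", "id asc"),
        ("distrib", "false"),  # query only the core the URL points at
        ("rows", str(rows)),
        ("wt", "json"),
    ]
    return core_url.rstrip("/") + "/select?" + urlencode(params)


def diff_ids(ids_a, ids_b):
    """Return (ids only in replica A, ids only in replica B), both sorted."""
    set_a, set_b = set(ids_a), set(ids_b)
    return sorted(set_a - set_b), sorted(set_b - set_a)


if __name__ == "__main__":
    print(core_ids_url(
        "http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1", 1000))
    # Toy id lists standing in for the two replicas' responses.
    print(diff_ids(["a", "b", "c"], ["b", "c", "d"]))
```

Any id that shows up in only one of the two lists is a document the replicas disagree on.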


If the replicas are out of sync, are there any ERROR entries in the Solr 
log, especially at the time that the problem docs were indexed?


Thanks,
Shawn



Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

2014-10-15 Thread S.L
Look at the logging information I provided below. It looks like results
are only returned from this SolrCloud cluster if the request goes to one
of the two replicas of a shard.

I have verified that numDocs in the replicas for a given shard is the same,
but there is a difference in maxDoc and deletedDocs. Does this signal that
the replicas are out of sync?

Even if numDocs is the same, how do we guarantee that those docs are
identical and have the same uniqueKeys? Is there a way to verify this?
Since numDocs is the same across the replicas, and I still get a result
only when the request goes to one particular replica of the shard, I
suspect that the documents within the replicas of a shard are not exact
copies of each other.

I suspect the issue I am facing in the 4.10.1 cloud is related to
https://issues.apache.org/jira/browse/SOLR-4924 .

Can anyone please let me know how to solve this issue of intermittently
getting no results for a query?



On Wed, Oct 15, 2014 at 3:15 PM, S.L  wrote:

> Tim,
>
> Thanks for the suggestion.
>
> I have rerun the query by adding shards.info=true and debug= track. I
> have included the xml data for both teh scenarios below , thin happens
> intermittently on SolrCloud 4.10.1 , with a replication factor of 2 and 3
> shards (6 cores) , I get result in one execution of query and then no
> results for the subsequent one , I am hoping someone would be able to help
> me find the root cause with this additional information ,I have included
> the query output with the additional parameters for the both the scenarios
> below .
>
> Thanks for your help!
>
> *Scenario #1 : In this try I get no results back. Here is what the query
> returns.*
>
> 
> 
>
>   0
>   29
>   
>  *:*
>  true
>  true
>  track
>  xml
>  (id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb)
>   
>
>
>   http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/
> ">
>  0
>  0.0
>  
> http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1
>  4
>   
>   http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/
> ">
>  0
>  0.0
>  
> http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1
>  13
>   
>   http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
> ">
>  0
>  0.0
>  
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1
>  26
>   
>
>
>
>   
>  false
>   
>
>
>   
>   name="rid">server3.mydomain.com-dyCollection1_shard2_replica2-1413398784226-17
>  
> http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/
> ">
>1
>4
>GET_TOP_IDS
>0
> name="Response">{responseHeader={status=0,QTime=1,params={spellcheck=true,spellcheck.maxCollationTries=10,distrib=false,debug=[false,
> track],version=2,NOW=1413398784225,shard.url=
> http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/,df=suggestAggregate,fl=thingURL,score,debugQuery=false,spellcheck.count=10,fq=(id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb),fsv=true,spellcheck.alternativeTermCount=5,spellcheck.maxResultsForSuggest=5,spellcheck.collateExtendedResults=true,spellcheck.extendedResults=true,spellcheck.maxCollations=5,wt=javabin,spellcheck.collate=true,requestPurpose=GET_TOP_IDS,rows=10,rid=server3.mydomain.com-dyCollection1_shard2_replica2-1413398784226-17,start=0,q=*:*,shards.info=true,spellcheck.dictionary=[direct,
> wordbreak],isShard=true}},response={numFound=0,start=0,maxScore=0.0,docs=[]},sort_values={},debug={}}
> 
> http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/
> ">
>10
>13
>GET_TOP_IDS
>0
> name="Response">{responseHeader={status=0,QTime=10,params={spellcheck=true,spellcheck.maxCollationTries=10,distrib=false,debug=[false,
> track],version=2,NOW=1413398784225,shard.url=
> http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/,df=suggestAggregate,fl=thingURL,score,debugQuery=false,spellcheck.count=10,fq=(id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb),fsv=true,spellcheck.alternativeTermCount=5,spellcheck.maxResultsForSuggest=5,spellcheck.collateExtendedResults=true,spellcheck.extendedResults

Does Solr support this?

2014-10-15 Thread Aaron Lewis
Hi,

I'm trying to do an "if the first query is empty, then do a second query", e.g.

if this returns no rows:
title:"XX" AND subject:"YY"

Then do a
title:"XX"

I can do that with two queries. But I'm wondering if I can merge them
into a single one?
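
For reference, the two-query approach mentioned above can be wrapped in a small client-side helper. This is a sketch only: `search` is a hypothetical callable standing in for whatever client call runs a query and returns the matching documents, and the helper still issues one request per attempt rather than merging them into a single query.

```python
from typing import Callable, List, Sequence


def first_non_empty(search: Callable[[str], List[dict]],
                    queries: Sequence[str]) -> List[dict]:
    """Run queries in order and return the first non-empty result list."""
    for query in queries:
        docs = search(query)
        if docs:
            return docs
    return []


if __name__ == "__main__":
    # Fake search function: only the looser query matches anything.
    fake_index = {'title:"XX"': [{"id": "1"}]}
    docs = first_non_empty(
        lambda q: fake_index.get(q, []),
        ['title:"XX" AND subject:"YY"', 'title:"XX"'],
    )
    print(docs)
```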

-- 
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33


[SOLVED] Re: Does the SolrQuery class support 'inc' command?

2014-10-15 Thread Aaron Lewis
Thanks Matt. I got it working with the '' command.

On Wed, Oct 15, 2014 at 11:42 PM, Matthew Nigl  wrote:
> Hi Aaron,
>
> I do not believe that there is direct support for atomic updates with this
> API. Other libraries such as Solarium do support this.
>
> However, one option is that you could generate the atomic update XML in
> code and send it via a raw request:
> http://php.net/manual/en/solrclient.request.php
>
> Regards,
> Matt
>
> On 15 October 2014 22:23, Aaron Lewis  wrote:
>
>> Hi,
>>
>> I'm trying to do a partial update, according to
>>
>> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
>>
>> I can use 'inc' command to do that, but I couldn't find relevant PHP API
>> here:
>> http://php.net/manual/en/class.solrquery.php
>>
>> Anyone know what function should I use? I tried 'addField' but that
>> would just override existing entry
>>
>> --
>> Best Regards,
>> Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
>> Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33
>>



-- 
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33


Re: ECMASCript Nashorn tutorial

2014-10-15 Thread Chris Hostetter

: Since we are moving to Java 8, how about we support Nashorn?
: 
: http://winterbe.com/posts/2014/04/05/java8-nashorn-tutorial/

define "support" ? what exactly do you have in mind ?

The two places i can think of in solr that support scripting already 
support Nashorn by default (assuming it's part of your JVM) via the 
ScriptEngine API (mentioned in the blog you linked to)...

https://lucene.apache.org/solr/4_10_0/solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html
https://lucene.apache.org/solr/4_10_0/solr-dataimporthandler/org/apache/solr/handler/dataimport/ScriptTransformer.html




-Hoss
http://www.lucidworks.com/


ECMASCript Nashorn tutorial

2014-10-15 Thread William Bell
Since we are moving to Java 8, how about we support Nashorn?

http://winterbe.com/posts/2014/04/05/java8-nashorn-tutorial/


-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

2014-10-15 Thread S.L
Tim,

Thanks for the suggestion.

I have rerun the query with shards.info=true and debug=track added. I have
included the XML output for both scenarios below. This happens
intermittently on SolrCloud 4.10.1, with a replication factor of 2 and 3
shards (6 cores): I get a result in one execution of the query and then no
results in the next one. I am hoping someone will be able to help me find
the root cause with this additional information.

Thanks for your help!

*Scenario #1 : In this try I get no results back. Here is what the query
returns.*



   
  0
  29
  
 *:*
 true
 true
 track
 xml
 (id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb)
  
   
   
  http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/
">
 0
 0.0
 
http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1
 4
  
  http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/
">
 0
 0.0
 
http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1
 13
  
  http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
">
 0
 0.0
 
http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1
 26
  
   
   
   
  
 false
  
   
   
  
 server3.mydomain.com-dyCollection1_shard2_replica2-1413398784226-17
 
http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/
">
   1
   4
   GET_TOP_IDS
   0
   {responseHeader={status=0,QTime=1,params={spellcheck=true,spellcheck.maxCollationTries=10,distrib=false,debug=[false,
track],version=2,NOW=1413398784225,shard.url=
http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/,df=suggestAggregate,fl=thingURL,score,debugQuery=false,spellcheck.count=10,fq=(id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb),fsv=true,spellcheck.alternativeTermCount=5,spellcheck.maxResultsForSuggest=5,spellcheck.collateExtendedResults=true,spellcheck.extendedResults=true,spellcheck.maxCollations=5,wt=javabin,spellcheck.collate=true,requestPurpose=GET_TOP_IDS,rows=10,rid=server3.mydomain.com-dyCollection1_shard2_replica2-1413398784226-17,start=0,q=*:*,shards.info=true,spellcheck.dictionary=[direct,
wordbreak],isShard=true}},response={numFound=0,start=0,maxScore=0.0,docs=[]},sort_values={},debug={}}

http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/
">
   10
   13
   GET_TOP_IDS
   0
   {responseHeader={status=0,QTime=10,params={spellcheck=true,spellcheck.maxCollationTries=10,distrib=false,debug=[false,
track],version=2,NOW=1413398784225,shard.url=
http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/,df=suggestAggregate,fl=thingURL,score,debugQuery=false,spellcheck.count=10,fq=(id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb),fsv=true,spellcheck.alternativeTermCount=5,spellcheck.maxResultsForSuggest=5,spellcheck.collateExtendedResults=true,spellcheck.extendedResults=true,spellcheck.maxCollations=5,wt=javabin,spellcheck.collate=true,requestPurpose=GET_TOP_IDS,rows=10,rid=server3.mydomain.com-dyCollection1_shard2_replica2-1413398784226-17,start=0,q=*:*,shards.info=true,spellcheck.dictionary=[direct,
wordbreak],isShard=true}},response={numFound=0,start=0,maxScore=0.0,docs=[]},sort_values={},debug={}}

http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
">
   24
   26
   GET_TOP_IDS
   0
   {responseHeader={status=0,QTime=24,params={spellcheck=true,spellcheck.maxCollationTries=10,distrib=false,debug=[false,
track],version=2,NOW=1413398784225,shard.url=
http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/,df=suggestAggregate,fl=thingURL,score,debugQuery=false,spellcheck.count=10,fq=(id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb),fsv=true,spellcheck.alternativeTermCount=5,spellcheck.maxResultsForSuggest=5,spellcheck.collateExtendedResults=true,spellcheck.extendedResults=true,spellcheck.maxCollations=5,wt=javabin,spellcheck.collate=true,requestPurpose=GET_TOP

RE: Combine boosts

2014-10-15 Thread Corey Gerhardt
I realized the problem is in my code.  A person can send multiple boost
parameters.  I have a quirk in that I'm using SolrNet, but I can probably
find a workaround.
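
As a rough illustration of sending several boost parameters in one request, the sketch below builds the query string with the standard library. The helper name and the `q` value are made up; the two boost functions are the ones from the original question. Wrapping both functions in a single `product(...)` is an alternative when the client can only send one `boost` value.

```python
from urllib.parse import urlencode, parse_qs


def edismax_params(q: str, boosts) -> str:
    """Build an edismax query string with one boost parameter per function."""
    params = [("defType", "edismax"), ("q", q)]
    params += [("boost", b) for b in boosts]
    return urlencode(params)


if __name__ == "__main__":
    toll_free = 'if(not(BUS_IS_TOLL_FREE),log(10),product(log(10),0.1))'
    city = 'if(exists(query({!v="BUS_CITY:saskatoon"})),20,1)'

    # Option 1: two separate boost parameters.
    print(edismax_params("plumber", [toll_free, city]))

    # Option 2: one boost parameter combining both functions.
    print(edismax_params("plumber", ["product(%s,%s)" % (toll_free, city)]))
```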

Corey

-Original Message-
From: Corey Gerhardt [mailto:corey.gerha...@directwest.com] 
Sent: October-15-14 11:34 AM
To: Solr User List
Subject: Combine boosts

Using Edismax.

Is it possible to have multiple functions in a boost?

For example boost = (if(not(BUS_IS_TOLL_FREE),log(10),product(log(10),0.1))) & 
if(exists(query({!v="BUS_CITY:saskatoon"})),20,1)

Thanks,

Corey


Combine boosts

2014-10-15 Thread Corey Gerhardt
Using Edismax.

Is it possible to have multiple functions in a boost?

For example boost = (if(not(BUS_IS_TOLL_FREE),log(10),product(log(10),0.1))) & 
if(exists(query({!v="BUS_CITY:saskatoon"})),20,1)

Thanks,

Corey


Re: Does the SolrQuery class support 'inc' command?

2014-10-15 Thread Matthew Nigl
Hi Aaron,

I do not believe that there is direct support for atomic updates with this
API. Other libraries such as Solarium do support this.

However, one option is that you could generate the atomic update XML in
code and send it via a raw request:
http://php.net/manual/en/solrclient.request.php
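
A sketch of what generating the atomic update XML in code might look like, shown in Python rather than PHP. The function name, document id, and field name are hypothetical; the `update="inc"` attribute follows the "Updating Parts of Documents" page linked in the question.

```python
import xml.etree.ElementTree as ET


def atomic_inc_xml(doc_id: str, field: str, amount: int,
                   id_field: str = "id") -> str:
    """Build an <add> message that increments one field of one document."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    # The unique key identifies which document to update.
    key = ET.SubElement(doc, "field", {"name": id_field})
    key.text = doc_id
    # update="inc" asks Solr to add `amount` to the current field value.
    inc = ET.SubElement(doc, "field", {"name": field, "update": "inc"})
    inc.text = str(amount)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(atomic_inc_xml("book1", "views", 1))
```

The resulting string would then be posted to the update handler as a raw request.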

Regards,
Matt

On 15 October 2014 22:23, Aaron Lewis  wrote:

> Hi,
>
> I'm trying to do a partial update, according to
>
> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
>
> I can use 'inc' command to do that, but I couldn't find relevant PHP API
> here:
> http://php.net/manual/en/class.solrquery.php
>
> Anyone know what function should I use? I tried 'addField' but that
> would just override existing entry
>
> --
> Best Regards,
> Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
> Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33
>


Re: Solr Synonyms, Escape space in case of multi words

2014-10-15 Thread David Philip
Sorry, the analysis page clip is getting trimmed and hence the indentation
is lost.

Here it is :

ridemakers | ride | ridemakerz | ride | ridemark | ride | makers | makerz|
care

expected:

ridemakers | ride | ridemakerz | ride | ridemark | ride | makers |
makerz| *ride
care*



On Wed, Oct 15, 2014 at 7:21 PM, David Philip 
wrote:

> contd..
>
> expectation was that the "ride care"  should not have split into two
> tokens.
>
> It should have been as below. Please correct me/point me where I am wrong.
>
>
> Input : ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\
> care
>
> o/p
>
> ridemakersrideridemakerzrideridemarkridemakersmakerz
>
> *ride care*
>
>
>
>
> On Wed, Oct 15, 2014 at 7:16 PM, David Philip  > wrote:
>
>> Hi All,
>>
>>I remember using multi-words in synonyms in Solr 3.x version. In case
>> of multi words, I was escaping space with back slash[\] and it work as
>> intended.  Ex: ride\ makers, riders, rider\ guards.  Each one mapped to
>> each other and so when I searched for ride makers, I obtained the search
>> results for all of them. The field type was same as below. I have same set
>> up in solr 4.10 but now the multi word space escape is getting ignored. It
>> is tokenizing on spaces.
>>
>>  synonyms.txt
>> ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\
>> care
>>
>>
>> Analysis page:
>>
>> ridemakersrideridemakerzrideridemarkridemakersmakerzcare
>>
>> Field Type
>>
>> > positionIncrementGap="100">
>>   
>> 
>> > ignoreCase="true" expand="true"/>
>>   
>> 
>>
>>
>>
>> Could you please tell me what could be the issue? How do I handle
>> multi-word cases?
>>
>>
>>
>>
>> synonyms.txt
>> ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\
>> care
>>
>>
>> Thanks - David
>>
>>
>>
>
>


Re: Solr Synonyms, Escape space in case of multi words

2014-10-15 Thread David Philip
contd..

expectation was that the "ride care"  should not have split into two tokens.

It should have been as below. Please correct me/point me where I am wrong.


Input : ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\
care

o/p

ridemakersrideridemakerzrideridemarkridemakersmakerz

*ride care*




On Wed, Oct 15, 2014 at 7:16 PM, David Philip 
wrote:

> Hi All,
>
>I remember using multi-words in synonyms in Solr 3.x version. In case
> of multi words, I was escaping space with back slash[\] and it work as
> intended.  Ex: ride\ makers, riders, rider\ guards.  Each one mapped to
> each other and so when I searched for ride makers, I obtained the search
> results for all of them. The field type was same as below. I have same set
> up in solr 4.10 but now the multi word space escape is getting ignored. It
> is tokenizing on spaces.
>
>  synonyms.txt
> ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\ care
>
>
> Analysis page:
>
> ridemakersrideridemakerzrideridemarkridemakersmakerzcare
>
> Field Type
>
>  positionIncrementGap="100">
>   
> 
>  ignoreCase="true" expand="true"/>
>   
> 
>
>
>
> Could you please tell me what could be the issue? How do I handle
> multi-word cases?
>
>
>
>
> synonyms.txt
> ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\ care
>
>
> Thanks - David
>
>
>


Solr Synonyms, Escape space in case of multi words

2014-10-15 Thread David Philip
Hi All,

   I remember using multi-word synonyms in Solr 3.x. For multi-word
entries, I escaped the space with a backslash [\] and it worked as
intended.  Ex: ride\ makers, riders, rider\ guards.  Each one mapped to
the others, so when I searched for ride makers, I obtained the search
results for all of them. The field type was the same as below. I have the
same setup in Solr 4.10, but now the multi-word space escape is being
ignored. It is tokenizing on spaces.

 synonyms.txt
ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\ care


Analysis page:

ridemakersrideridemakerzrideridemarkridemakersmakerzcare

Field Type


  


  




Could you please tell me what could be the issue? How do I handle
multi-word cases?




synonyms.txt
ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\ care


Thanks - David


Run a query via SolrJ using Request URL

2014-10-15 Thread Dickinson, Cliff
I'm fairly new to Solr and have run into an issue that I cannot figure out how 
to solve.   I'm trying to implement a Save Search requirement similar to 
bookmarking to allow the same search to be run in the future.  Once the 
original search is executed from within a Spring app, I use the 
ClientUtils.toQueryString() to store a copy of the actual request URL that was 
sent to the Solr Server in a database table(if save is requested).  That part 
works great, but now I can't find anything in the SolrJ API that will allow me 
to run that query from the original URL rather than having to piece it together 
again via a SolrQuery object.  Is there anything out there to run from this URL 
or do I need to manually split it out and build the SolrQuery?
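
To illustrate the "manually split it out" option, here is a language-neutral sketch in Python of turning a stored query string back into ordered name/value pairs, which could then be fed into a query object one parameter at a time. It assumes the stored string is a URL-encoded query string that may start with `?`; the function name and the sample parameters are made up.

```python
from urllib.parse import parse_qsl


def saved_query_to_params(saved: str):
    """Split a stored query string into ordered (name, value) pairs."""
    query = saved.lstrip("?")
    # keep_blank_values keeps parameters that were saved with empty values;
    # repeated names such as fq are preserved as separate pairs.
    return parse_qsl(query, keep_blank_values=True)


if __name__ == "__main__":
    saved = "?q=title%3Asolr&fq=type%3Abook&fq=year%3A2014&rows=10"
    for name, value in saved_query_to_params(saved):
        print(name, "=", value)
```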

Thanks in advance for the advice!

Cliff



LBHttpSolrServer is creating new instance of HttpSolrServer in request(Req req) method

2014-10-15 Thread sachinpkale
The following line is from the request(Req req) method of the
LBHttpSolrServer class:

  HttpSolrServer server = makeServer(serverStr);

Won't it create a new instance of HttpSolrServer for every request?







Re: How can I pass in query request parameter at search time and know of it in my query analyzer/tokenizer?

2014-10-15 Thread Ilia Sretenskii
You can implement your own kind of SearchHandler that accepts your custom
request parameters, which keeps the common parameters clean.
http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/component/SearchHandler.java

As you can see, the SearchHandler.handleRequestBody() method just takes
expected parameters from the SolrQueryRequest request.
Your own handler can expect more parameters than those defined in the
CommonParams interface.
https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/solrj/src/java/org/apache/solr/common/params/CommonParams.java

These are the base classes whose extensions might be useful for the
development.
http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/RequestHandlerBase.java
http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/component/SearchComponent.java

You can find more information about the request handlers and the search
components in the guide.
https://cwiki.apache.org/confluence/display/solr/RequestHandlers+and+SearchComponents+in+SolrConfig


Does the SolrQuery class support 'inc' command?

2014-10-15 Thread Aaron Lewis
Hi,

I'm trying to do a partial update, according to
https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents

I can use the 'inc' command to do that, but I couldn't find the relevant PHP
API here:
http://php.net/manual/en/class.solrquery.php

Does anyone know which function I should use? I tried 'addField', but that
just overwrites the existing entry.

-- 
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33


[SOLVED] Re: How does one specify which collections to use?

2014-10-15 Thread Aaron Lewis
Thanks Matt. I misunderstood that option ;-)

On Wed, Oct 15, 2014 at 6:43 PM, Matthew Nigl  wrote:
> Hi Aaron,
>
> You would need to set 'path' in the options array when you create a
> SolrClient.
>
> http://php.net/manual/en/solrclient.construct.php
>
> Regards,
> Matt
>
> On 15 October 2014 19:31, Aaron Lewis  wrote:
>
>> Hi,
>>
>> I'm using PHP client here:
>> http://php.net/manual/en/class.solrquery.php
>>
>> I couldn't figure out how to use "collection2" instead of
>> "collection1", I don't see such options in constructor either.
>>
>> Anyone know that?
>>
>> --
>> Best Regards,
>> Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
>> Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33
>>



-- 
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33


Re: How does one specify which collections to use?

2014-10-15 Thread Matthew Nigl
Hi Aaron,

You would need to set 'path' in the options array when you create a
SolrClient.

http://php.net/manual/en/solrclient.construct.php

Regards,
Matt

On 15 October 2014 19:31, Aaron Lewis  wrote:

> Hi,
>
> I'm using PHP client here:
> http://php.net/manual/en/class.solrquery.php
>
> I couldn't figure out how to use "collection2" instead of
> "collection1", I don't see such options in constructor either.
>
> Anyone know that?
>
> --
> Best Regards,
> Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
> Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33
>


Re: Delete in Solr based on foreign key (like SQL delete from … where id in (select id from…)

2014-10-15 Thread Matthew Nigl
Absolutely, using Mikhail's code would be the first thing I would do. You
can see the details in SOLR-6234 and
https://github.com/m-khl/lucene-join-solr-query-parser

Otherwise, the only alternative I can think of (without reindexing) would
be to run the select query as provided, returning the IDs of the offending
documents (for reference, you could use grouping or the collapsing query
parser if you just want to get distinct values; faceting is also an
option). Then write a script to iterate through a batch of IDs at a time
and send a delete to Solr, such as id:(100 OR 101 OR ...). Since there are
many documents to delete, you would want to hold off committing until the
end.
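
The batching step described above can be sketched as follows. This is a
minimal illustration in plain Python, not code from the thread: the helper
names batched() and delete_query(), the batch size, and the sample IDs are
all assumptions, and the IDs are assumed to be plain values that need no
query escaping. It only builds the query strings; sending them to Solr and
issuing the single commit at the end is left to whatever client is in use.

```python
from typing import Iterable, Iterator, List


def batched(ids: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` IDs (hypothetical helper)."""
    batch: List[str] = []
    for doc_id in ids:
        batch.append(doc_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def delete_query(batch: List[str], field: str = "id") -> str:
    """Build one delete query string of the form id:(100 OR 101)."""
    return f"{field}:({' OR '.join(batch)})"


if __name__ == "__main__":
    offending_ids = ["100", "101", "102", "103", "104"]  # hypothetical IDs
    for batch in batched(offending_ids, size=2):
        # Each string would be sent to Solr as a delete-by-query;
        # commit once after the loop rather than per batch.
        print(delete_query(batch))
```

Keeping each batch reasonably small avoids building one enormous boolean
query out of all the IDs at once.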

On 11 October 2014 02:34, Mikhail Khludnev 
wrote:

> On Fri, Oct 10, 2014 at 6:16 AM, Matthew Nigl 
> wrote:
>
> > But I get the same response as in
> > https://issues.apache.org/jira/browse/SOLR-6357
> >
>
> there is a mention of a cure (SOLR-6234) over there
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>


How does one specify which collections to use?

2014-10-15 Thread Aaron Lewis
Hi,

I'm using PHP client here:
http://php.net/manual/en/class.solrquery.php

I couldn't figure out how to use "collection2" instead of
"collection1", I don't see such options in constructor either.

Anyone know that?

-- 
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33


fuzzy search and edismax: how to do not sum up

2014-10-15 Thread elisabeth benoit
Hello all,

We are using solr 4.2.1 (but planning to switch to solr 4.10 very soon).

We are trying to do approximate search using the ~ operator.

We use the catchall_light field without stemming (so as not to mix fuzzy
matching and stemming).

We send a request to Solr using the fuzzy operator on non-"frequent" words,

for instance

q=catchall_light:(lyon 69002~1)

Our handler uses edismax.

That query gives a higher score to the document Lyon, which has postal codes
69001, 69002, 69003, 69004,...

than to other documents that have only Lyon and postal code 69002 (the debug
output is below),

but we do not want to sum up all the scores for the Lyon document.

Does anyone know if it is possible to change that?

Best regards,
Elisabeth


here is the debug output for Lyon
(we use idf for that field but want to get rid of it)

15.728481 = (MATCH) sum of:
  1.2349477 = (MATCH) weight(catchall_light:lyon in 707758)
[NoTFSimilarity], result of:
1.2349477 = score(doc=707758,freq=1.0 = termFreq=1.0
), product of:
  0.13427915 = queryWeight, product of:
9.196869 = idf(docFreq=2924, maxDocs=10616483)
0.014600528 = queryNorm
  9.196869 = fieldWeight in 707758, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
9.196869 = idf(docFreq=2924, maxDocs=10616483)
1.0 = fieldNorm(doc=707758)
  14.493534 = (MATCH) sum of:
1.576392 = (MATCH) weight(catchall_light:69001^0.8 in 707758)
[NoTFSimilarity], result of:
  1.576392 = score(doc=707758,freq=1.0 = termFreq=1.0
), product of:
0.13569424 = queryWeight, product of:
  0.8 = boost
  11.617237 = idf(docFreq=259, maxDocs=10616483)
  0.014600528 = queryNorm
11.617237 = fieldWeight in 707758, product of:
  1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
  11.617237 = idf(docFreq=259, maxDocs=10616483)
  1.0 = fieldNorm(doc=707758)
1.8904426 = (MATCH) weight(catchall_light:69002 in 707758)
[NoTFSimilarity], result of:
  1.8904426 = score(doc=707758,freq=1.0 = termFreq=1.0
), product of:
0.16613688 = queryWeight, product of:
  11.378826 = idf(docFreq=329, maxDocs=10616483)
  0.014600528 = queryNorm
11.378826 = fieldWeight in 707758, product of:
  1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
  11.378826 = idf(docFreq=329, maxDocs=10616483)
  1.0 = fieldNorm(doc=707758)
1.460347 = (MATCH) weight(catchall_light:69003^0.8 in 707758)
[NoTFSimilarity], result of:
  1.460347 = score(doc=707758,freq=1.0 = termFreq=1.0
), product of:
0.13060425 = queryWeight, product of:
  0.8 = boost
  11.181466 = idf(docFreq=401, maxDocs=10616483)
  0.014600528 = queryNorm
11.181466 = fieldWeight in 707758, product of:
  1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
  11.181466 = idf(docFreq=401, maxDocs=10616483)
  1.0 = fieldNorm(doc=707758)
1.7109065 = (MATCH) weight(catchall_light:69004^0.8 in 707758)
[NoTFSimilarity], result of:
  1.7109065 = score(doc=707758,freq=1.0 = termFreq=1.0
), product of:
0.14136517 = queryWeight, product of:
  0.8 = boost
  12.102744 = idf(docFreq=159, maxDocs=10616483)
  0.014600528 = queryNorm
12.102744 = fieldWeight in 707758, product of:
  1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
  12.102744 = idf(docFreq=159, maxDocs=10616483)
  1.0 = fieldNorm(doc=707758)
1.5255939 = (MATCH) weight(catchall_light:69005^0.8 in 707758)
[NoTFSimilarity], result of:
  1.5255939 = score(doc=707758,freq=1.0 = termFreq=1.0
), product of:
0.13349001 = queryWeight, product of:
  0.8 = boost
  11.428525 = idf(docFreq=313, maxDocs=10616483)
  0.014600528 = queryNorm
11.428525 = fieldWeight in 707758, product of:
  1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
  11.428525 = idf(docFreq=313, maxDocs=10616483)
  1.0 = fieldNorm(doc=707758)
1.6497903 = (MATCH) weight(catchall_light:69006^0.8 in 707758)
[NoTFSimilarity], result of:
  1.6497903 = score(doc=707758,freq=1.0 = termFreq=1.0
), product of:
0.13881733 = queryWeight, product of:
  0.8 = boost
  11.884614 = idf(docFreq=198, maxDocs=10616483)
  0.014600528 = queryNorm
11.884614 = fieldWeight in 707758, product of:
  1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
  11.884614 = idf(docFreq=198, maxDocs=10616483)
  1.0 = fieldNorm(doc=707758)
1.5892421 = (MATCH) weight(catchall_light:69007^0.8 in 707758)
[NoTFSimilarity], result of:
  1.5892421 = score(doc=707758,freq=1.0 = termFreq=1.0
), product of:
0.13624617 = queryWeight, product of:
  0.8 = boost
  11.66449 = idf(docFreq=247, maxDocs=10616483)
  0.014600528 = queryNorm
11.66449 = 
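
To make the arithmetic in the explain output concrete, here is a small
sketch in plain Python. It is not a Solr configuration and does not show how
to change the scoring; it only contrasts summing the fuzzy expansions with
keeping the best one. The per-term scores are copied from the output above,
which is cut off, so only the seven visible expansions are included (hence a
sum lower than the 14.493534 reported). The function names are made up for
the illustration.

```python
# Score of the non-fuzzy term "lyon", copied from the explain output.
LYON_SCORE = 1.2349477

# Per-term scores of the visible expansions of 69002~1 (output is truncated).
EXPANSION_SCORES = {
    "69001": 1.576392,
    "69002": 1.8904426,
    "69003": 1.460347,
    "69004": 1.7109065,
    "69005": 1.5255939,
    "69006": 1.6497903,
    "69007": 1.5892421,
}


def combine_sum(base: float, expansions: dict) -> float:
    """Current behaviour: every matching expansion adds to the score."""
    return base + sum(expansions.values())


def combine_max(base: float, expansions: dict) -> float:
    """Desired behaviour: only the best-matching expansion counts."""
    return base + max(expansions.values())


if __name__ == "__main__":
    print("sum of expansions:", combine_sum(LYON_SCORE, EXPANSION_SCORES))
    print("max of expansions:", combine_max(LYON_SCORE, EXPANSION_SCORES))
```

Under the max combination, the Lyon document would score the same as a
document matching only lyon and 69002, assuming identical idf and norms for
both documents.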

Re: Weird Problem (possible bug?) with german stemming and wildcard search

2014-10-15 Thread Thomas Michael Engelke

Thank you very much,

this information is worth its weight in gold. So far, we've used the
asterisk method because it seemed logical and straightforward. We will
slowly migrate to a version using EdgeNGramFilterFactory.


Thanks a bunch.

On 07.10.2014 14:42, Alexandre Rafalovitch wrote:


On 7 October 2014 08:25, Thomas Michael Engelke
 wrote:

So the culprit is the asterisk at the end. As far as we can read from 
the docs, an asterisk is just 0 or more characters, which means that 
the literal word in front of the asterisk should match the query.


Not quite: http://wiki.apache.org/solr/MultitermQueryAnalysis [1]

It's actually quite complicated and even depends on the exact version of
Solr you are using. In fact, out of all the analyzers you showed
above, I think only LowerCase will be present on the chain. Look for
the (multi) marker at: http://www.solr-start.com/info/analyzers/ [2] for
more details.

On a higher level, I would suggest getting away from *-based expansion
and looking at EdgeNGrams instead. You can see an example of
autocomplete at
http://www.solr-start.com/javadoc/solr-lucene/index.html [3] and the
matching configuration at:
https://github.com/arafalov/Solr-Javadoc/blob/master/JavadocIndex/JavadocCollection/conf/schema.xml#L24 [4]


Or a dedicated Suggester module, though information on that is a bit
harder to find.

Regards,
Alex.

Personal: http://www.outerthoughts.com/ [5] and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ [6] and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 [7]



Links:
--
[1] http://wiki.apache.org/solr/MultitermQueryAnalysis
[2] http://www.solr-start.com/info/analyzers/
[3] http://www.solr-start.com/javadoc/solr-lucene/index.html
[4] https://github.com/arafalov/Solr-Javadoc/blob/master/JavadocIndex/JavadocCollection/conf/schema.xml#L24
[5] http://www.outerthoughts.com/
[6] http://www.solr-start.com/
[7] https://www.linkedin.com/groups?gid=6713853
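
The idea behind the EdgeNGram suggestion in this thread can be illustrated
outside Solr. The sketch below is plain Python, not Solr code:
edge_ngrams() and prefix_matches() are hypothetical helpers, the sample
terms are made up, and the gram sizes are arbitrary. It shows that when the
prefixes of each term are produced up front (as an edge n-gram filter does
at index time), a typed prefix can be found by exact match, with no trailing
asterisk.

```python
from typing import List


def edge_ngrams(term: str, min_gram: int, max_gram: int) -> List[str]:
    """Return the front-anchored prefixes of `term`, shortest first."""
    upper = min(len(term), max_gram)
    return [term[:n] for n in range(min_gram, upper + 1)]


def prefix_matches(terms: List[str], typed: str,
                   min_gram: int = 2, max_gram: int = 15) -> List[str]:
    """Return the terms whose precomputed prefixes contain `typed` exactly."""
    typed = typed.lower()
    return [
        term for term in terms
        if typed in edge_ngrams(term.lower(), min_gram, max_gram)
    ]


if __name__ == "__main__":
    print(edge_ngrams("solr", 2, 15))
    # Hypothetical indexed terms; "wass" is what the user has typed so far.
    print(prefix_matches(["Wasserhahn", "Wasser", "Hahn"], "wass"))
```

In this scheme the typed text is compared against literal prefixes of the
original term, so the match does not depend on how a wildcard query is
analyzed.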