Re: Solr grouping problem - need help

2015-01-14 Thread Naresh Yadav
My problem has changed completely since the first post, so I have created a new
thread for it.

On Wed, Jan 14, 2015 at 7:31 PM, Naresh Yadav  wrote:

> Just wanted to share the schema and results for it:
>
> solr version :  4.6.1
> Schema : http://www.imagesup.net/?di=10142124357616
> Code :http://www.imagesup.net/?di=10142124381116
> Response Group :  http://www.imagesup.net/?di=1114212438351
> Response Terms : http://www.imagesup.net/?di=614212438580
>
> Please help me with this problem: the number of groups does not match the
> number of terms, which is the expected behaviour according to me.
> Please give me some direction on this problem.
>
>
> On Wed, Jan 14, 2015 at 5:24 PM, Naresh Yadav 
> wrote:
>
>> I tried what you said and also appended group.ngroups=true, but got the same
>> unexpected result: the ngroups coming back is 1.
>> I am on solr-4.6.1, single machine, default setup.
>>
>>
>> On Wed, Jan 14, 2015 at 4:43 PM, Norgorn  wrote:
>>
>>> Can you get the raw Solr response?
>>>
>>> For me, grouping works exactly the way you expect it to.
>>>
>>> Try a direct query in the browser to be sure the problem is not in your code.
>>>
>>> http://192.168.0.1:8983/solr/collection1/select?q=*:*&group=true&group.field=tenant_pool
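>>> Adding group.ngroups=true makes the group count explicit in the response
>>> (same hypothetical host and collection as above):

```
http://192.168.0.1:8983/solr/collection1/select?q=*:*&group=true&group.field=tenant_pool&group.ngroups=true
```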
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/Solr-grouping-problem-need-help-tp4179149p4179464.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>
>>
>>
>>
>
>
>
>


Solr groups not matching with terms in a field

2015-01-14 Thread Naresh Yadav
Hi all,

I have done the following configuration to test the Solr grouping concept.

solr version :  4.6.1 (tried in latest version 4.10.3 also)
Schema : http://www.imagesup.net/?di=10142124357616
Solrj code to insert docs : http://www.imagesup.net/?di=10142124381116
Response Groups : http://www.imagesup.net/?di=1114212438351
Response Terms : http://www.imagesup.net/?di=614212438580

Please let me know if I am doing something wrong here.


Re: Engage custom hit collector for special search processing

2015-01-14 Thread William Bell
We all need example data, and a sample query to help you.

You can use "group" to group by a field and remove dupes.
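A grouped query along those lines might look like this (field name hypothetical):

```
/select?q=*:*&group=true&group.field=field1&group.limit=1
```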

If you want to remove dupes you can do something like:

q=field1:DOG AND NOT field2:DOG AND NOT field3:DOG

That excludes documents that also have DOG in field2 or field3.

If you don't care if it is in any field, you can use dismax/edismax and qf,
or you can just use OR.

q=field1:DOG OR field2:DOG OR field3:DOG

If you have a set of values that you want to remove duplicates at INDEX
time you can do that with SQL (if coming from SQL), and write code in the
DIH.

var x  = row.get("field1");
var x1 = row.get("field2");
var x2 = row.get("field3");

// Blank out field2/field3 when they duplicate field1,
// guarding against missing columns.
if (x != null && x.equals(x1)) {
   row.put("field2", "");
}

if (x != null && x.equals(x2)) {
   row.put("field3", "");
}

That way you eliminate the dupes at index time...

Bill







On Tue, Jan 13, 2015 at 2:29 PM, tedsolr  wrote:

> I have a complicated problem to solve, and I don't know enough about
> lucene/solr to phrase the question properly. This is kind of a shot in the
> dark. My requirement is to return search results always in completely
> "collapsed" form, rolling up duplicates with a count. Duplicates are
> defined
> by whatever fields are requested. If the search requests fields A, B, C,
> then all matched documents that have identical values for those 3 fields
> are
> "dupes". The field list may change with every new search request. What I do
> know is the super set of all fields that may be part of the field list at
> index time.
>
> I know this can't be done with configuration alone. It doesn't seem
> performant to retrieve all 1M+ docs and post process in Java. A very smart
> person told me that a custom hit collector should be able to do the
> filtering for me. So, maybe I create a custom search handler that somehow
> exposes this custom hit collector that can use FieldCache or DocValues to
> examine all the matches and filter the results in the way I've described
> above.
>
> So assuming this is a viable solution path, can anyone suggest some helpful
> posts, code fragments, books for me to review? I admit to being out of my
> depth, but this requirement isn't going away. I'm grasping for straws right
> now.
>
> thanks
> (using Solr 4.9)
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Engage-custom-hit-collector-for-special-search-processing-tp4179348.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Conditions in function query

2015-01-14 Thread Erick Erickson
Nest them perhaps?
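For the multi-source case from the original mail, the nesting could look
something like this (whitespace added for readability; an untested sketch, not
a verified query):

```
if(termfreq(Source2,'A'),sum(Likes,4),
 if(termfreq(Source2,'B'),sum(Likes,4),
  if(termfreq(Source2,'C'),sum(Likes,4),
   if(termfreq(Source2,'D'),sum(Likes,3),
    if(termfreq(Source2,'E'),sum(Likes,2),0)))))
```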

Best
Erick

On Wed, Jan 14, 2015 at 7:07 PM, shamik  wrote:
> Thanks Erick, I did take a look at the "if" condition earlier, but I'm not sure
> how that can be used for multiple conditions. It works for a single
> condition :
>
>  if(termfreq(Source2,'A'),sum(Likes,3),0)
>
> But for multiple, I'm struggling to find the right syntax. I tried using OR
> in conjunction, but it hasn't worked out so far.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Conditions-in-Boost-function-query-tp4179687p4179696.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Distributed Search returns Empty document list

2015-01-14 Thread Jaikit Savla
Hello,
I am running Solr (4.10) in cloud mode with multiple collections configured (one
for each day). The structure is shown below. I can fetch documents for a given
query if I query an individual collection. However, when I send a distributed
request to multiple shards, I only see numFound and no documents returned.
Appreciate any pointers on the setup.

Directory structure:
solr/
  collection1            // does not have any index
  collection_20150112
  collection_20150113

Command to run Solr: sh bin/solr restart -d example -cloud -p  -noprompt

Set up a RequestHandler called alias in solrconfig.xml of collection1:
  
 
   explicit
   json
   true
   text
   score,*
   http://localhost:/solr/collection_20150113,http://localhost:/solr/collection_20150112
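The archive stripped the XML tags from the handler definition above; judging
from the surviving values, it presumably looked roughly like this (element and
parameter names are inferred, not from the original post; ports elided as in the
original):

```xml
<requestHandler name="/alias" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="wt">json</str>
    <str name="indent">true</str>
    <str name="df">text</str>
    <str name="fl">score,*</str>
    <str name="shards">http://localhost:/solr/collection_20150113,http://localhost:/solr/collection_20150112</str>
  </lst>
</requestHandler>
```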
 


http://localhost:/solr/collection1/alias?q=domain:com&debug=false&shard.info=true
e.g.
{
  "responseHeader": {
    "status": 0,
    "QTime": 19,
    "params": {
      "q": "domain:com",
      "debug": "false",
      "shard.info": "true"
    }
  },
  "response": {
    "numFound": 11696,
    "start": 0,
    "maxScore": 1.3015664,
    "docs": [ ]
  }
}

Thanks

Re: Conditions in function query

2015-01-14 Thread shamik
Thanks Erick, I did take a look at the "if" condition earlier, but I'm not sure
how that can be used for multiple conditions. It works for a single
condition :

 if(termfreq(Source2,'A'),sum(Likes,3),0)

But for multiple, I'm struggling to find the right syntax. I tried using OR
in conjunction, but it hasn't worked out so far.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Conditions-in-Boost-function-query-tp4179687p4179696.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Distributed search across Solr cores in a collection - NPE

2015-01-14 Thread Jaikit Savla
It was because I did not have unique ids in my index. I added them and it
worked. A unique key is also mentioned as one of the requirements for
Distributed Search.
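For reference, the schema.xml requirement amounts to declaring a uniqueKey,
e.g. (field name illustrative):

```xml
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>
```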
Thanks, Jaikit

 

 On Wednesday, January 14, 2015 1:53 AM, Jaikit Savla 
 wrote:
   

 Folks,
I have set up 3 cores in a single collection; they all have the same schema but
different indexes. I have set the unique id required field to false.
When I run a query against a single core, it works fine. But when I add the shard
param and point to a different core, the request fails with an NPE. I looked at
the source code for QueryComponent, and line 1043 is
resultIds.put(shardDoc.id.toString(), shardDoc);
It looks like shardDoc.id.toString() is throwing the NPE.
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.solr/solr-core/4.10.1/org/apache/solr/handler/component/QueryComponent.java#QueryComponent.mergeIds%28org.apache.solr.handler.component.ResponseBuilder%2Corg.apache.solr.handler.component.ShardRequest%29
Any clue whether my setup is incorrect?

 
http://localhost:/solr/core0/select?shards=localhost:/solr/core1&q=title:amazon&fl=*&rows=10&wt=json

RESPONSE:   
{"responseHeader":{"status":500,"QTime":11,"params":{"fl":"*","shards":"localhost:/solr/core1","q":"domain:amazon","wt":"json","rows":"10"}},"error":{"trace":"java.lang.NullPointerException\n\tat
 
org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:1043)\n\tat
 
org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:716)\n\tat
 
org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:695)\n\tat
 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:324)\n\tat
 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\n\tat
 org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)\n\tat 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)\n\tat
 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)\n\tat
 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)\n\tat
 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)\n\tat
 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)\n\tat
 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)\n\tat
 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)\n\tat
 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)\n\tat
 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)\n\tat
 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)\n\tat 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)\n\tat
 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)\n\tat
 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)\n\tat
 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)\n\tat
 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)\n\tat
 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)\n\tat
 org.eclipse.jetty.server.Server.handle(Server.java:368)\n\tat 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)\n\tat
 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)\n\tat
 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)\n\tat
 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)\n\tat
 org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)\n\tat 
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)\n\tat 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)\n\tat
 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)\n\tat
 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)\n\tat
 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)\n\tat
 java.lang.Thread.run(Thread.java:745)\n","code":500}}
Appreciate any pointers.
Thanks, Jaikit


   

Re: Conditions in function query

2015-01-14 Thread Erick Erickson
Why won't the "if" clause work? See:
https://cwiki.apache.org/confluence/display/solr/Function+Queries

On Wed, Jan 14, 2015 at 5:29 PM, Shamik Bandopadhyay  wrote:
> Hi,
>
>Just wanted to know if it's possible to provide conditions with a
> function query. Right now,I'm using the following functions to boost on
> Likes data.
>
> bf=recip(ms(NOW/DAY,PublishDate),3.16e-11,1,1)^2.0 sum(Likes,2)
>
> What I would like to do is to apply the boost on "Likes" based on source.
> For e.g.
>
> if Source="A" or "B" or "C", then sum(Likes,4)
> if Source="D" then sum(Likes,3)
> if Source="E" then sum(Likes,2).
>
> Is it possible to do this using a function ?
>
> Any pointers will be appreciated.
>
> Regards,
> Shamik


Re: Load existing Lucene sharded indexes onto single Solr collection

2015-01-14 Thread Jaikit Savla
Yes, I wanted to get rid of the merge step. But it looks like merging is not that
cumbersome either. Thanks Mikhail and Erick for the pointers, that helped.
Jaikit

 On Wednesday, January 14, 2015 8:24 AM, Erick Erickson 
 wrote:
   

 You certainly can't do this in a single directory; there would be
zillions of name conflicts.

I believe I saw Uwe make a comment on the Lucene list about using
MultiReaders and keeping the sub-indexes in different directories, but that's
lower-level than Solr has access to.
Plus, you'd have to control index updates _very_ carefully.

So I don't think there's something built into Solr to work with
indexes like this, so merging is probably your only option here.

Do note that the contrib MapReduceIndexerTool will do most of
this for you; it includes a --go-live option. That option still copies things
around, though.

Best,
Erick

On Wed, Jan 14, 2015 at 1:25 AM, Jaikit Savla
 wrote:
> This solution will merge the index as well. I want to find out if merging is
> "required" before loading indexes onto Solr. If that is possible then I can
> just point solrconfig.xml to the directory where I have all the shards.
> Jaikit
>
>      On Wednesday, January 14, 2015 1:11 AM, Mikhail Khludnev 
> wrote:
>
>
>
> On Wed, Jan 14, 2015 at 11:42 AM, Jaikit Savla 
>  wrote:
>
> Now to load this index, I am currently using Lucene IndexMergeTool to merge 
> all the shards into one giant index. My question is, is there a way to load 
> shared index without merging into one giant index on to single collection ?
>
> https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API#CoreAdminAPI-MERGEINDEXES
>  ?
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
>
>
>
>


   

Conditions in function query

2015-01-14 Thread Shamik Bandopadhyay
Hi,

   Just wanted to know if it's possible to provide conditions with a
function query. Right now,I'm using the following functions to boost on
Likes data.

bf=recip(ms(NOW/DAY,PublishDate),3.16e-11,1,1)^2.0 sum(Likes,2)

What I would like to do is to apply the boost on "Likes" based on source.
For e.g.

if Source="A" or "B" or "C", then sum(Likes,4)
if Source="D" then sum(Likes,3)
if Source="E" then sum(Likes,2).

Is it possible to do this using a function ?

Any pointers will be appreciated.

Regards,
Shamik


Re: WordDelimiter Works differently in solr3X vs SolrCloud..?

2015-01-14 Thread gouthsmsimhadri
Thanks Ahmet, that works.



-
 -goutham
--
View this message in context: 
http://lucene.472066.n3.nabble.com/WordDelimiter-Works-differently-in-solr3X-vs-SolrCloud-tp4179647p4179662.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to configure Solr PostingsFormat block size

2015-01-14 Thread Chris Hostetter

: As a foolish dev (not malicious I hope!), I did mess around with something
: like this once; I was writing my own Codec.  I found I had to create a file
: called META-INF/services/org.apache.lucene.codecs.Codec in my solr plugin jar
: that contained the fully-qualified class name of my codec: I guess this
: registers it with the SPI framework so it can be found by name?  I'm not

Yep, that's how SPI works - the important bits are mentioned/linked in the 
PostingsFormat (and other SPI related classes in lucene) javadocs...

https://lucene.apache.org/core/4_10_2/core/org/apache/lucene/codecs/PostingsFormat.html

https://docs.oracle.com/javase/7/docs/api/java/util/ServiceLoader.html?is-external=true





-Hoss
http://www.lucidworks.com/


Re: WordDelimiter Works differently in solr3X vs SolrCloud..?

2015-01-14 Thread Ahmet Arslan
Hi,

You could try passing luceneMatchVersion argument to WordDelimiterFilterFactory 
and see if it works for you.
Factory returns Lucene47WordDelimiterFilter before LUCENE_4_8_0.
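That is, something along these lines in the field type's analyzer chain
(attribute values are illustrative, not from the original schema):

```xml
<filter class="solr.WordDelimiterFilterFactory"
        luceneMatchVersion="4.7"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="1" preserveOriginal="1"/>
```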

Ahmet





On Wednesday, January 14, 2015 11:10 PM, gouthsmsimhadri 
 wrote:
Problem:
While migrating the solr version from 3.X(schema version is 1.4)  to cloud
4.10.0 (schema version 1.5), I see a difference in the way the
worddelimiterfilter works for the below configuration 



In the current version, catenateWords is done at the last position of
the delimited word, but in the cloud the catenateWords is always done at
position 1, as below:

EX: for token – “iPad2” at index
Current Version: 
 

Cloud Version:
   

When “ipad2” is searched, the parsed query on fieldXX using WDF,
+fieldXX:\"(ipad2 ipad) 2\"^10.0, doesn’t find a match on the document
which contains “iPad2” in the cloud, but finds a match on the Solr 3.X version.

Did the implementation of WDF change from 3.x to cloud? Is there any workaround
to make “iPad2” match when queried as “ipad2” with the WDF settings mentioned
above?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/WordDelimiter-Works-differently-in-solr3X-vs-SolrCloud-tp4179647.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Occasionally getting error in solr suggester component.

2015-01-14 Thread Michael Sokolov
did you build the spellcheck index using spellcheck.build as described 
here: https://cwiki.apache.org/confluence/display/solr/Spell+Checking ?
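If the dictionary is a 4.7-style SuggestComponent rather than a classic
spellchecker, the analogous one-off build request would be something like this
(handler path and dictionary name assumed from later in the thread):

```
http://localhost:8983/solr/collection1/suggest?suggest=true&suggest.dictionary=haSuggester&suggest.build=true
```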


-Mike

On 01/14/2015 07:19 AM, Dhanesh Radhakrishnan wrote:

Hi,
Thanks for the reply.
As you mentioned in the previous mail, I changed buildOnCommit=false in
solrconfig.
After that change, suggestions are not working.
Solr 4.7 introduced a new approach based on a dedicated SuggestComponent.
I'm using that component to build suggestions, and the lookup implementation is
"AnalyzingInfixLookupFactory".
Is there any workaround?




On Wed, Jan 14, 2015 at 12:47 AM, Michael Sokolov <
msoko...@safaribooksonline.com> wrote:


I think you are probably getting bitten by one of the issues addressed in
LUCENE-5889

I would recommend against using buildOnCommit=true - with a large index
this can be a performance-killer.  Instead, build the index yourself using
the Solr spellchecker support (spellcheck.build=true)

-Mike


On 01/13/2015 10:41 AM, Dhanesh Radhakrishnan wrote:


Hi all,

I am experiencing a problem in Solr SuggestComponent
Occasionally solr suggester component throws an  error like

Solr failed:
{"responseHeader":{"status":500,"QTime":1},"error":{"msg":"suggester was
not built","trace":"java.lang.IllegalStateException: suggester was not
built\n\tat
org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.
lookup(AnalyzingInfixSuggester.java:368)\n\tat
org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.
lookup(AnalyzingInfixSuggester.java:342)\n\tat
org.apache.lucene.search.suggest.Lookup.lookup(Lookup.java:240)\n\tat
org.apache.solr.spelling.suggest.SolrSuggester.
getSuggestions(SolrSuggester.java:199)\n\tat
org.apache.solr.handler.component.SuggestComponent.
process(SuggestComponent.java:234)\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(
SearchHandler.java:218)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(
RequestHandlerBase.java:135)\n\tat
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.
handleRequest(RequestHandlers.java:246)\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.execute(
SolrDispatchFilter.java:777)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(
SolrDispatchFilter.java:418)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(
SolrDispatchFilter.java:207)\n\tat
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(
ApplicationFilterChain.java:243)\n\tat
org.apache.catalina.core.ApplicationFilterChain.doFilter(
ApplicationFilterChain.java:210)\n\tat
org.apache.catalina.core.StandardWrapperValve.invoke(
StandardWrapperValve.java:225)\n\tat
org.apache.catalina.core.StandardContextValve.invoke(
StandardContextValve.java:123)\n\tat
org.apache.catalina.core.StandardHostValve.invoke(
StandardHostValve.java:168)\n\tat
org.apache.catalina.valves.ErrorReportValve.invoke(
ErrorReportValve.java:98)\n\tat
org.apache.catalina.valves.AccessLogValve.invoke(
AccessLogValve.java:927)\n\tat
org.apache.catalina.valves.RemoteIpValve.invoke(
RemoteIpValve.java:680)\n\tat
org.apache.catalina.core.StandardEngineValve.invoke(
StandardEngineValve.java:118)\n\tat
org.apache.catalina.connector.CoyoteAdapter.service(
CoyoteAdapter.java:407)\n\tat
org.apache.coyote.http11.AbstractHttp11Processor.process(
AbstractHttp11Processor.java:1002)\n\tat
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.
process(AbstractProtocol.java:579)\n\tat
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.
run(JIoEndpoint.java:312)\n\tat
java.util.concurrent.ThreadPoolExecutor.runWorker(
ThreadPoolExecutor.java:1145)\n\tat
java.util.concurrent.ThreadPoolExecutor$Worker.run(
ThreadPoolExecutor.java:615)\n\tat
java.lang.Thread.run(Thread.java:745)\n","code":500}}

This is not happening frequently, but when indexing and the suggester component
work together, this error occurs.




In solr config


  
haSuggester
AnalyzingInfixLookupFactory  
textSpell
DocumentDictionaryFactory
  
name
packageWeight
true
  



  
true
10
  
  
suggest
  

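The XML tags were stripped from the solrconfig snippet above; judging from the
surviving values and the standard SuggestComponent layout, it was presumably
along these lines (element names inferred, not from the original mail):

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">haSuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="suggestAnalyzerFieldType">textSpell</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">name</str>
    <str name="weightField">packageWeight</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```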

Can anyone suggest where to look to figure out this error and why these
errors are occurring?



Thanks,
dhanesh s.r




--








WordDelimiter Works differently in solr3X vs SolrCloud..?

2015-01-14 Thread gouthsmsimhadri
Problem:
While migrating the solr version from 3.X(schema version is 1.4)  to cloud
4.10.0 (schema version 1.5), I see a difference in the way the
worddelimiterfilter works for the below configuration 



In the current version, the catentateWords is done on the last postion of
the word delimited, but in the cloud the catenateWords always done on the
position 1 as below

EX: for token – “iPad2” at index
Current Version: 
 

Cloud Version:
   

When “ipad2” is searched the parsed query on fieldXX using WDF 
+fieldXX:\"(ipad2 ipad) 2\"^10.0"  doesn’t find a match on the document
which contains “iPad2” in the cloud but finds a match on solr 3X version.

Did implementation of WDF change from 3x Vs Cloud. Is there any work around
to make “iPad2” match when queried for “ipad2” with WDF setting mentioned as
above.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/WordDelimiter-Works-differently-in-solr3X-vs-SolrCloud-tp4179647.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: can't make sense of spellchecker results when using techproducts example

2015-01-14 Thread Chris Hostetter

James: everything you said made perfect sense, and in hindsight was 
actually covered on the page -- it was just the example that was bogus in 
light of the current config & defaults.

I went ahead and fixed it based on your feedback, and beefed up the 
explanation of spellcheck.collateParam.* (now it's part of the table 
instead of just a one-off sentence out of context).

https://cwiki.apache.org/confluence/display/solr/Spell+Checking
https://cwiki.apache.org/confluence/pages/diffpages.action?pageId=32604254&originalId=50859120

thanks!



: Date: Fri, 9 Jan 2015 14:22:43 -0600
: From: "Dyer, James" 
: Reply-To: solr-user@lucene.apache.org
: To: "solr-user@lucene.apache.org" 
: Subject: RE: can't make sense of spellchecker results when using techproducts
: example
: 
: Chris,
: 
: - DirectSpellChecker has a setting for "minPrefix" which the techproducts 
example sets to 1 (also the default).  So it will never try to correct the 
first character.  I think this is both a performance optimization and is based 
on the assumption that we rarely misspell the first character.  This is why it 
will not  correct "hell" to "dell".  I think it will allow you to set this to 
0, if you want your sample query to work.
: 
: - The "maxCollationTries" feature re-writes "q" / "spellcheck.q", and then 
using all the other parameters, queries internally to see if there any hits.  
This doesn't play very well when "q.op=OR" / "mm=1".  So when you see a 
collation like "here ultrasharp" / "heat ..." etc, you see it is indeed getting 
some hits.  So it considers it a valid query re-write, despite the absurdity.  
We could improve this example config by adding 
"spellcheck.collateParam.q.op=AND" to the defaults.  (When using dismax, you 
would add "spellcheck.collateParam.mm=100%")  Also, while the "collateParam" 
functionality is in the old Solr wiki, it doesn't seem to be in the reference 
manual, so we probably should add it as this would be pretty important for a 
lot of users.
: 
: - Unless using the legacy IndexBasedSpellChecker / FileBasedSpellchecker, you 
need not use "spellcheck.build".  It's a no-op for both Direct and WordBreak, as 
these do not use sidecar indexes.
: 
: So without changing the config, these queries illustrate the spellchecker 
pretty well, including the word-break functionality.
: 
: 
http://localhost:8983/solr/techproducts/spell?spellcheck.q=dzll+ultra%20sharp&df=text&spellcheck=true&spellcheck.collateParam.q.op=AND
: 
http://localhost:8983/solr/techproducts/spell?spellcheck.q=dellultrasharp&df=text&spellcheck=true&spellcheck.collateParam.q.op=AND
: 
: Spellcheck has a lot of gotchas, and I wish we could dream up a way to 
make it easy for people.  I remember it being a struggle for me when I was a 
new user, and I know we get lots of questions on the user-list about it.
: 
: My apologies to you for not answering this sooner.
: 
: James Dyer
: Ingram Content Group
: 
: 
: -Original Message-
: From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
: Sent: Wednesday, December 17, 2014 6:49 PM
: To: solr-user@lucene.apache.org
: Subject: can't make sense of spellchecker results when using techproducts 
example
: 
: 
: Ok, so i've been working on updating the ref guide to account for the new 
: way to run the "examples" in 5.0.
: 
: The spell checking page...
: 
:   https://cwiki.apache.org/confluence/display/solr/Spell+Checking
: 
: ...has some examples that loosely correlate to the "techproducts" 
: example, but even if you ignore the specifics of those examples, i need 
: help understanding the basic behavior of the spellchecker as configured in 
: the techproducts example.
: 
: Assuming you run this...
: 
:   bin/solr -e techproducts
: 
: with that example running & those docs indexed, this URL gives me 
: results i can't explain...
: 
: 
http://localhost:8983/solr/techproducts/spell?spellcheck.q=hell+ultrashar&df=text&spellcheck=true&spellcheck.build=true
: 
: (see below)
: 
: 1) "dell" is not listed as a possible suggestion for "hell" (even if 
: the dictionary thinks "hold" is a better suggestion, why isn't "dell" even 
: included in the list of possibilities?)
: 
: 2) in the "collation" section, i can't make any sense of what these 
: results mean -- how is "hello ultrasharp" a suggested collationQuery when 
: *none* of the example docs contain both "hello" and "ultrasharp" ?
: 
: 
http://localhost:8983/solr/techproducts/select?df=text&q=%2Bhello+%2Bultrasharp
: 
: 
: So WTF is up with these spell check results?
: 
: 
: 
: 
: 
: 
: responseHeader: status=0, QTime=15, command=build
: 
: spellcheck suggestions for "hell" (numFound=6, startOffset=0, endOffset=4,
: origFreq=0):
:   hello (freq 1)
:   here (freq 2)
:   heat (freq 1)
:   hold (freq 1)
:   html (freq 1)
:   héllo (freq 1)

Re: Distributed mode for stats component?

2015-01-14 Thread Jack Krupansky
Thanks, Chris. I just needed to stare at the code I already knew about more
intently to see what was really going on. It's super convoluted and super
confusing. The keys were the handleResponses method in the main component
class and the AbstractStatsValues class that is hidden in the
StatsValuesFactory source file. Oddly, the StatsValues source file doesn't
contain the classes that implement that interface - they're in the
"factory" source file!

BTW, we should have some doc notes on the limitations and performance
implications of the stats component. Although, admittedly, it's moot if
stats is eventually to be superseded by the analytics component.

-- Jack Krupansky

On Wed, Jan 14, 2015 at 12:26 PM, Chris Hostetter 
wrote:

>
> : Does anybody know for sure whether the stats component fully supports
> : distributed mode? It is listed in the doc as supporting distributed mode
>
> it's been supported for as long as i can remember -- since Day 1 of the
> StatsComponent i believe.
>
> : (at least for old, non-SolrCloud distrib mode), but... I don't see any
> code
> : that actually does that. Nor any tests, unless they are hidden somewhere
> I
> : didn't look.
>
> just like any other SearchComponent: look at StatsComponent.prepare,
> StatsComponent.process, ...distributedProcess, modifyRequest,
> ...handleResponses, ...finishStage, etc...
>
>
> : In particular, I am interested in the "countdistinct" parameter which
> would
> : need to retrieve all distinct values from all other shards to detect
> : whether any of the distinct values overlap between shards.
>
> yep -- that's exactly what it does ... totally naive and not a good idea
> at all for fields with non-trivial cardinality, which is why you have to
> explicitly turn it on with "calcDistinct" and why I want to replace it
> with HyperLogLog approximations...
>
> https://issues.apache.org/jira/browse/SOLR-6968
>
> -Hoss
> http://www.lucidworks.com/
>


Re: Engage custom hit collector for special search processing

2015-01-14 Thread tedsolr
Thank you so much Alex and Joel for your ideas. I am poring through the
documentation and code now to try to understand it all. A post filter sounds
promising. As 99% of my doc fields are character based, I should try to
complement the collapsing Q parser with an option that compares string
fields for equality. As long as a multi-field comparison approach is not
prohibited in some way by this architecture, I feel it's a great place to
start.
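Outside of Solr, the rollup described above — collapsing documents that share
values on a requested field list and keeping a count — can be sketched in a few
lines of plain Java (class, field names, and data are made up for illustration):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RollupSketch {
    // Collapse rows into distinct value-combinations with a count,
    // keyed on whichever fields the search request asked for.
    public static Map<List<String>, Integer> rollup(
            List<Map<String, String>> docs, List<String> fields) {
        Map<List<String>, Integer> counts = new LinkedHashMap<>();
        for (Map<String, String> doc : docs) {
            List<String> key = new ArrayList<>();
            for (String f : fields) {
                key.add(doc.getOrDefault(f, ""));
            }
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
            Map.of("A", "x", "B", "y", "C", "z"),
            Map.of("A", "x", "B", "y", "C", "z"),
            Map.of("A", "x", "B", "q", "C", "z"));
        // The first two docs share identical A/B/C values, so they collapse
        // into one group with count 2; the third forms its own group.
        System.out.println(rollup(docs, List.of("A", "B", "C")));
    }
}
```

A custom hit collector would do the equivalent per matching docId against
FieldCache/DocValues instead of materialized maps, so the 1M+ documents never
have to leave Solr.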



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Engage-custom-hit-collector-for-special-search-processing-tp4179348p4179621.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Distributed mode for stats component?

2015-01-14 Thread Chris Hostetter

: Does anybody know for sure whether the stats component fully supports
: distributed mode? It is listed in the doc as supporting distributed mode

it's been supported for as long as i can remember -- since Day 1 of the 
StatsComponent i believe.

: (at least for old, non-SolrCloud distrib mode), but... I don't see any code
: that actually does that. Nor any tests, unless they are hidden somewhere I
: didn't look.

just like any other SearchComponent: look at StatsComponent.prepare, 
StatsComponent.process, ...distributedProcess, modifyRequest, 
...handleResponses, ...finishStage, etc...


: In particular, I am interested in the "countdistinct" parameter which would
: need to retrieve all distinct values from all other shards to detect
: whether any of the distinct values overlap between shards.

yep -- that's exactly what it does ... totally naive and not a good idea 
at all for fields with non-trivial cardinality, which is why you have to 
explicitly turn it on with "calcDistinct" and why I want to replace it 
with HyperLogLog approximations...

https://issues.apache.org/jira/browse/SOLR-6968
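For reference, the explicit opt-in looks like this (field name hypothetical):

```
/select?q=*:*&stats=true&stats.field=category&stats.calcDistinct=true&rows=0
```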

-Hoss
http://www.lucidworks.com/


Re: OutOfMemoryError for PDF document upload into Solr

2015-01-14 Thread Michael Della Bitta
Yep, you'll have to increase the heap size for your Tomcat container.

http://stackoverflow.com/questions/6897476/tomcat-7-how-to-set-initial-heap-size-correctly
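With Tomcat, the usual place is a bin/setenv.sh (or setenv.bat) that raises the
JVM heap; the sizes here are illustrative only:

```
# bin/setenv.sh — picked up by catalina.sh on startup
CATALINA_OPTS="$CATALINA_OPTS -Xms512m -Xmx4g"
```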

Michael Della Bitta

Senior Software Engineer

o: +1 646 532 3062

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions

w: appinions.com 

On Wed, Jan 14, 2015 at 12:00 PM,  wrote:

> Hello,
>
> Can someone pass on the hints to get around following error? Is there any
> Heap Size parameter I can set in Tomcat or in Solr webApp that gets
> deployed in Solr?
>
> I am running Solr webapp inside Tomcat on my local machine which has RAM
> of 12 GB. I have PDF document which is 4 GB max in size that needs to be
> loaded into Solr
>
>
>
>
> Exception in thread "http-apr-8983-exec-6" java.lang.OutOfMemoryError: Java heap space
> at java.util.AbstractCollection.toArray(Unknown Source)
> at java.util.ArrayList.<init>(Unknown Source)
> at
> org.apache.pdfbox.cos.COSDocument.getObjects(COSDocument.java:518)
> at org.apache.pdfbox.cos.COSDocument.close(COSDocument.java:575)
> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:254)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1238)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1203)
> at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:111)
> at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> at
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
> at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> at
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:246)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
> at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
> at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
> at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
> at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
> at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
> at
> org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
> at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
> at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:421)
> at
> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1070)
> at
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:611)
> at
> org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2462)
> at
> org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2451)
>
>
> Thanks
> Ganesh
>
>


OutOfMemoryError for PDF document upload into Solr

2015-01-14 Thread Ganesh.Yadav
Hello,

Can someone pass on some hints to get around the following error? Is there any Heap 
Size parameter I can set in Tomcat or in the Solr webapp that gets deployed in Solr?

I am running the Solr webapp inside Tomcat on my local machine, which has 12 GB of 
RAM. I have a PDF document up to 4 GB in size that needs to be loaded into 
Solr.




Exception in thread "http-apr-8983-exec-6" java.lang.OutOfMemoryError: Java heap space
at java.util.AbstractCollection.toArray(Unknown Source)
at java.util.ArrayList.<init>(Unknown Source)
at org.apache.pdfbox.cos.COSDocument.getObjects(COSDocument.java:518)
at org.apache.pdfbox.cos.COSDocument.close(COSDocument.java:575)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:254)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1238)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1203)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:111)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at 
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:246)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at 
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:421)
at 
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1070)
at 
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:611)
at 
org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2462)
at 
org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2451)


Thanks
Ganesh



Distributed mode for stats component?

2015-01-14 Thread Jack Krupansky
Does anybody know for sure whether the stats component fully supports
distributed mode? It is listed in the doc as supporting distributed mode
(at least for old, non-SolrCloud distrib mode), but... I don't see any code
that actually does that. Nor any tests, unless they are hidden somewhere I
didn't look.

In particular, I am interested in the "countdistinct" parameter which would
need to retrieve all distinct values from all other shards to detect
whether any of the distinct values overlap between shards.

If this is supported, where exactly is the code to do it?

I know the new analytics component doesn't support distributed mode, but my
question is about the old "stats" component.

-- Jack Krupansky


Re: How to configure Solr PostingsFormat block size

2015-01-14 Thread Michael Sokolov
As a foolish dev (not malicious I hope!), I did mess around with 
something like this once; I was writing my own Codec.  I found I had to 
create a file called META-INF/services/org.apache.lucene.codecs.Codec in 
my solr plugin jar that contained the fully-qualified class name of my 
codec: I guess this registers it with the SPI framework so it can be 
found by name?  I'm not clear, but I think you might need to do 
something similar to plug in a PostingsFormat as well.


-Mike
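For reference, the registration file Mike describes is a plain-text SPI services file inside the plugin jar; the file name must match the SPI base class being extended (Codec here, or org.apache.lucene.codecs.PostingsFormat for a postings format), and the class name below is hypothetical:

```
# file: META-INF/services/org.apache.lucene.codecs.Codec
# one fully-qualified provider class name per line ('#' comments are allowed)
com.example.MyCustomCodec
```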

On 01/13/2015 05:01 PM, Chris Hostetter wrote:

: This is starting to sound pretty complicated. Are you saying this is not
: doable with Solr 4.10?

it should be doable in 4.10, using a wrapper class like the one i
mentioned below (delegating to Lucene51PostingsFormat instead of
Lucene50PostingsFormat) ... it's just that the 4.10 APIs are dangerous and
let malicious/foolish java devs do scary things they shouldn't do.  but
what i outlined before (Below) is intended to work, and should continue to
work in 5.x.

: >>...or at least: that's how it *should* work :)   makes me a bit nervous
: about trying this on my own.

...worst case scenario, i overlooked something - but all it would take to
verify that it's working is to try it at small scale: write the class,
configure it, index a handful of docs, shutdown & restart solr, and see if
your index opens & is correctly searchable -- if it is, then i didn't
overlook anything, if it isn't then there is a bug somewhere and details
of your experiment with your custom posting format (ie wrapper class)
source in JIRA would be helpful.

: Should I open a JIRA issue or am I probably the only person with a use case
: for replacing a TermIndexInterval setting with changing the min and max
: block size on the 41 postings format?

you're the only person i've ever seen ask about it :)


: > public final class MyPfWrapper extends PostingsFormat {
: >   PostingsFormat pf = new Lucene50PostingsFormat(42, 9);
: >   public MyPfWrapper() {
: > super("MyPfWrapper");
: >   }
: >   public FieldsConsumer fieldsConsumer(SegmentWriteState state) throws
: > IOException {
: > return pf.fieldsConsumer(state);
: >   }
: >   public FieldsProducer fieldsProducer(SegmentReadState state) throws
: > IOException {
: > return pf.fieldsProducer(state);
: >   }
: > }
: >
: > ..and then refer to it with postingFormat="MyPfWrapper"


-Hoss
http://www.lucidworks.com/




Core deletion

2015-01-14 Thread phiroc


Hello,

I am running SOLR 4.10.0 on Tomcat 8.

The solr.xml file in .../apache-tomcat-8.0.15_solr_8983/conf/Catalina/localhost 
looks like this:







My SOLR instance contains four cores, including one whose instanceDir and 
dataDir have the following values:


instanceDir:/archives/solr/example/solr/indexapdf0/
dataDir:/archives/indexpdf0/data/

Strangely enough, every time I restart Tomcat, this core's data, [and only this 
core's data,] get deleted, which is pretty annoying.

How can I prevent it?

Many thanks.

Philippe














Re: Load existing Lucene sharded indexes onto single Solr collection

2015-01-14 Thread Erick Erickson
You certainly can't do this into a single directory, there would be
zillions of name conflicts.

I believe I saw Uwe make a comment on the Lucene list about using
MultiReaders and
keeping the sub-indexes in different directories, but that's
lower-level than Solr has access to.
Plus, you'd have to control index updates _very_ carefully.

So I don't think there's anything built into Solr to work with
indexes like this; merge is
probably your only option here.

Do note that the contrib MapReduceIndexerTool will do most all of
this for you; it includes
a --go-live option. That option still copies things around though.

Best,
Erick

On Wed, Jan 14, 2015 at 1:25 AM, Jaikit Savla
 wrote:
> This solution will merge the index as well. I want to find out if merge is 
> "required" before loading indexes onto Solr? If that is possible then I can
> just point solrconfig.xml to the directory where I have all the shards.
> Jaikit
>
>  On Wednesday, January 14, 2015 1:11 AM, Mikhail Khludnev 
>  wrote:
>
>
>
> On Wed, Jan 14, 2015 at 11:42 AM, Jaikit Savla 
>  wrote:
>
> Now to load this index, I am currently using Lucene IndexMergeTool to merge 
> all the shards into one giant index. My question is, is there a way to load 
> shared index without merging into one giant index on to single collection ?
>
> https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API#CoreAdminAPI-MERGEINDEXES
>  ?
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
>
>
>
>


Re: How to do fuzzy search on phrases

2015-01-14 Thread Tomoko Uchida
> Iam using solr 4.7 and solr php client.

Back to the original question, which I had missed: ComplexPhraseQueryParser is not
available in Solr 4.7, so sorry for the misleading information.

Regards,
Tomoko

2015-01-14 23:44 GMT+09:00 Tomoko Uchida :

> Hi Adrien,
>
> No, you cannot use ComplexPhraseQueryParser in Solr 3.3.0 since this was
> introduced in Solr 4.8 (it's a pretty new feature...)
> https://issues.apache.org/jira/browse/SOLR-1604
>
> > お邪魔しました。
> You do not need this phrase here, we rarely use this in mails. :)
>
> Thanks,
> Tomoko
>
>
> 2015-01-14 23:19 GMT+09:00 Adrien RUFFIE :
>
>> Tomokoさん、おはようございます。
>>
>> Can I use ComplexPhraseQueryParser with Core Solr 3.3.0 ?
>>
>> どうもありがとうございます。
>> お邪魔しました。
>>
>> 宜しくお願いします。
>> Bien cordialement,
>>
>> ルフフィエ アドリエン
>> Adrien Ruffié
>> LD : +33 1 73 03 26 40
>> Tél : +33 1 73 03 29 80
>>
>> E-DEAL
>> Innover la Relation Client
>>
>> -Message d'origine-
>> De : Tomoko Uchida [mailto:tomoko.uchida.1...@gmail.com]
>> Envoyé : mercredi 14 janvier 2015 14:31
>> À : solr-user@lucene.apache.org
>> Objet : Re: How to do fuzzy search on phrases
>>
>> Hi,
>>
>> I suspect you are likely to misunderstand fuzzy search.
>> You should append "~N" to end of each query term, not whole query string /
>> phrase.
>> (You can debug your query and get useful information by specifying
>> "debugQuery=true" parameter, try it if you have not.)
>>
>> At first glance, I guess Complex Phrase Query Parser possibly might work
>> for you... This allows more control over phrase query.
>>
>> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
>>
>> Just for your information, I indexed a document that has two fields, id
>> and
>> title. title field is tokenized by StandardTokenizer.
>> - id: 1
>> - title: mist spa
>>
>> And I issued two phrase queries to Solr, one uses default parser (Standard
>> Query Paraser) and other uses Complex Phrase Query Parser.
>> * Query 1
>> title:"mysty~ spa"// not hit
>> * Query 2
>> {!complexphrase}title:"mysty~ spa"// hits
>>
>> I have no idea about performance impact. (you'll need sufficient
>> performance test.)
>>
>> Regards,
>> Tomoko
>>
>>
>>
>> 2015-01-14 18:34 GMT+09:00 madhav bahuguna :
>>
>> > HI
>> >
>> > Iam using solr 4.7 and solr php client.
>> >
>> > So heres the issue ,i have data indexed in solr
>> > eg
>> >
>> > mist spa
>> >
>> >
>> > I have applied fuzzy to my search and If i search myst or mysty i get
>> the
>> > correct result i get mist spa in result.
>> > But if i write mysty spa or must spa i do not get any results. I am not
>> able
>> > to implement fuzzy search on more than one word.
>> > Can any one advise me or help me regarding this.
>> > The query iam passing using solr php client is
>> >
>> > $querynew="(business_name:$data~N)";
>> >
>> > --
>> > Regards
>> > Madhav Bahuguna
>> >
>>
>
>


Re: Tokenizer or Filter ?

2015-01-14 Thread Jack Krupansky
It's what Java has, whatever that is:
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

So, maybe the correct answer is neither, but similar to both.

-- Jack Krupansky

On Wed, Jan 14, 2015 at 9:06 AM, tomas.kalas  wrote:

> Oh yeah, that is it. Thank you very much for your patience. And a last
> question at the end: what type of regex is Solr actually using? POSIX or PCRE?
> Thanks.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Tokenizer-or-Filter-tp4178346p4179505.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: How to do fuzzy search on phrases

2015-01-14 Thread Tomoko Uchida
Hi Adrien,

No, you cannot use ComplexPhraseQueryParser in Solr 3.3.0 since this was
introduced in Solr 4.8 (it's a pretty new feature...)
https://issues.apache.org/jira/browse/SOLR-1604

> お邪魔しました。
You do not need this phrase here, we rarely use this in mails. :)

Thanks,
Tomoko


2015-01-14 23:19 GMT+09:00 Adrien RUFFIE :

> Tomokoさん、おはようございます。
>
> Can I use ComplexPhraseQueryParser with Core Solr 3.3.0 ?
>
> どうもありがとうございます。
> お邪魔しました。
>
> 宜しくお願いします。
> Bien cordialement,
>
> ルフフィエ アドリエン
> Adrien Ruffié
> LD : +33 1 73 03 26 40
> Tél : +33 1 73 03 29 80
>
> E-DEAL
> Innover la Relation Client
>
> -Message d'origine-
> De : Tomoko Uchida [mailto:tomoko.uchida.1...@gmail.com]
> Envoyé : mercredi 14 janvier 2015 14:31
> À : solr-user@lucene.apache.org
> Objet : Re: How to do fuzzy search on phrases
>
> Hi,
>
> I suspect you are likely to misunderstand fuzzy search.
> You should append "~N" to end of each query term, not whole query string /
> phrase.
> (You can debug your query and get useful information by specifying
> "debugQuery=true" parameter, try it if you have not.)
>
> At first glance, I guess Complex Phrase Query Parser possibly might work
> for you... This allows more control over phrase query.
>
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
>
> Just for your information, I indexed a document that has two fields, id and
> title. title field is tokenized by StandardTokenizer.
> - id: 1
> - title: mist spa
>
> And I issued two phrase queries to Solr, one uses default parser (Standard
> Query Paraser) and other uses Complex Phrase Query Parser.
> * Query 1
> title:"mysty~ spa"// not hit
> * Query 2
> {!complexphrase}title:"mysty~ spa"// hits
>
> I have no idea about performance impact. (you'll need sufficient
> performance test.)
>
> Regards,
> Tomoko
>
>
>
> 2015-01-14 18:34 GMT+09:00 madhav bahuguna :
>
> > HI
> >
> > Iam using solr 4.7 and solr php client.
> >
> > So heres the issue ,i have data indexed in solr
> > eg
> >
> > mist spa
> >
> >
> > I have applied fuzzy to my search and If i search myst or mysty i get the
> > correct result i get mist spa in result.
> > But if i write mysty spa or must spa i do not get any results. I am not
> able
> > to implement fuzzy search on more than one word.
> > Can any one advise me or help me regarding this.
> > The query iam passing using solr php client is
> >
> > $querynew="(business_name:$data~N)";
> >
> > --
> > Regards
> > Madhav Bahuguna
> >
>


RE: How to do fuzzy search on phrases

2015-01-14 Thread Adrien RUFFIE
Tomokoさん、おはようございます。

Can I use ComplexPhraseQueryParser with Core Solr 3.3.0 ?

どうもありがとうございます。
お邪魔しました。

宜しくお願いします。
Bien cordialement,

ルフフィエ アドリエン
Adrien Ruffié
LD : +33 1 73 03 26 40
Tél : +33 1 73 03 29 80

E-DEAL
Innover la Relation Client

-Message d'origine-
De : Tomoko Uchida [mailto:tomoko.uchida.1...@gmail.com] 
Envoyé : mercredi 14 janvier 2015 14:31
À : solr-user@lucene.apache.org
Objet : Re: How to do fuzzy search on phrases

Hi,

I suspect you are likely to misunderstand fuzzy search.
You should append "~N" to end of each query term, not whole query string /
phrase.
(You can debug your query and get useful information by specifying
"debugQuery=true" parameter, try it if you have not.)

At first glance, I guess Complex Phrase Query Parser possibly might work
for you... This allows more control over phrase query.
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser

Just for your information, I indexed a document that has two fields, id and
title. title field is tokenized by StandardTokenizer.
- id: 1
- title: mist spa

And I issued two phrase queries to Solr, one uses default parser (Standard
Query Paraser) and other uses Complex Phrase Query Parser.
* Query 1
title:"mysty~ spa"// not hit
* Query 2
{!complexphrase}title:"mysty~ spa"// hits

I have no idea about performance impact. (you'll need sufficient
performance test.)

Regards,
Tomoko



2015-01-14 18:34 GMT+09:00 madhav bahuguna :

> HI
>
> Iam using solr 4.7 and solr php client.
>
> So heres the issue ,i have data indexed in solr
> eg
>
> mist spa
>
>
> I have applied fuzzy to my search and If i search myst or mysty i get the
> correct result i get mist spa in result.
> But if i write mysty spa or must spa i do not get any results. I am not able
> to implement fuzzy search on more than one word.
> Can any one advise me or help me regarding this.
> The query iam passing using solr php client is
>
> $querynew="(business_name:$data~N)";
>
> --
> Regards
> Madhav Bahuguna
>


Re: Tokenizer or Filter ?

2015-01-14 Thread tomas.kalas
Oh yeah, that is it. Thank you very much for your patience. And a last
question at the end: what type of regex is Solr actually using? POSIX or PCRE?
Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Tokenizer-or-Filter-tp4178346p4179505.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr grouping problem - need help

2015-01-14 Thread Naresh Yadav
just wanted to share schema and results for same :

solr version :  4.6.1
Schema : http://www.imagesup.net/?di=10142124357616
Code :http://www.imagesup.net/?di=10142124381116
Response Group :  http://www.imagesup.net/?di=1114212438351
Response Terms : http://www.imagesup.net/?di=614212438580

Please help me with this problem, where the number of groups does not match the
number of terms, which is the expected behaviour according to me.
Please give me some direction on this problem.

On Wed, Jan 14, 2015 at 5:24 PM, Naresh Yadav  wrote:

> I tried what you said and also appended group.ngroups=true, and got the same
> unexpected result: ngroups coming back is 1.
> i am on solr-4.6.1 single machine default setup.
>
>
> On Wed, Jan 14, 2015 at 4:43 PM, Norgorn  wrote:
>
>> Can u get raw SOLR response?
>>
>> For me grouping works exactly the way u expect it to work.
>>
>> Try direct query in browser to be sure the problem is not in your code.
>>
>> http://192.168.0.1:8983/solr/collection1/select?q=*:*&group=true&group.field=tenant_pool
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Solr-grouping-problem-need-help-tp4179149p4179464.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>
>
>


Re: Tokenizer or Filter ?

2015-01-14 Thread Jack Krupansky
I suspected it might do that - the pattern is "greedy" and takes the
longest possible match. Add a question mark after the asterisk to use
reluctant ("stingy") mode, which matches the shortest pattern.

-- Jack Krupansky
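Jack's greedy-vs-reluctant distinction can be seen directly with java.util.regex (the d1 markers below are stand-ins for the delimiters discussed in the thread):

```java
// ".*" is greedy: it runs from the first <d1> to the last </d1>, deleting the
// "keep" segment in between. ".*?" is reluctant and stops at the nearest
// closing marker, so each delimited region is removed separately.
public class GreedyVsReluctant {
    public static void main(String[] args) {
        String in = "text <d1>a</d1> keep <d1>b</d1> end";
        System.out.println(in.replaceAll("<d1>.*</d1>", ""));  // greedy -> "text  end"
        System.out.println(in.replaceAll("<d1>.*?</d1>", "")); // reluctant -> "text  keep  end"
    }
}
```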

On Wed, Jan 14, 2015 at 8:37 AM, tomas.kalas  wrote:

> I just used the Solr UI Analyzer for my test, or must I index it first?
>
> I used this XML code in my schema:
>
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <charFilter class="solr.PatternReplaceCharFilterFactory" pattern=".*" replacement=""/>
>     ...
>   </analyzer>
> </fieldType>
>
> This is my result:
> 
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Tokenizer-or-Filter-tp4178346p4179496.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Tokenizer or Filter ?

2015-01-14 Thread tomas.kalas
I just used the Solr UI Analyzer for my test, or must I index it first?

I used this XML code in my schema:

<fieldType name="..." class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern=".*" replacement=""/>
    ...
  </analyzer>
</fieldType>

This is my result:
 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Tokenizer-or-Filter-tp4178346p4179496.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to do fuzzy search on phrases

2015-01-14 Thread Tomoko Uchida
Hi,

I suspect you are likely to misunderstand fuzzy search.
You should append "~N" to end of each query term, not whole query string /
phrase.
(You can debug your query and get useful information by specifying
"debugQuery=true" parameter, try it if you have not.)

At first glance, I guess Complex Phrase Query Parser possibly might work
for you... This allows more control over phrase query.
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser

Just for your information, I indexed a document that has two fields, id and
title. title field is tokenized by StandardTokenizer.
- id: 1
- title: mist spa

And I issued two phrase queries to Solr, one uses default parser (Standard
Query Paraser) and other uses Complex Phrase Query Parser.
* Query 1
title:"mysty~ spa"// not hit
* Query 2
{!complexphrase}title:"mysty~ spa"// hits

I have no idea about performance impact. (you'll need sufficient
performance test.)

Regards,
Tomoko
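The per-term form Tomoko describes can be sketched with a small helper (the field name, edit distance, and lack of query escaping are simplifying assumptions — real input would also need escaping of special characters):

```java
// Builds "business_name:(mysty~2 spa~2)" style queries: the fuzzy operator
// "~" is attached to each term individually, not to the phrase as a whole.
public class FuzzyTerms {
    static String perTermFuzzy(String field, String input, int maxEdits) {
        StringBuilder q = new StringBuilder(field).append(":(");
        String[] terms = input.trim().split("\\s+");
        for (int i = 0; i < terms.length; i++) {
            if (i > 0) q.append(' ');
            q.append(terms[i]).append('~').append(maxEdits);
        }
        return q.append(')').toString();
    }

    public static void main(String[] args) {
        System.out.println(FuzzyTerms.perTermFuzzy("business_name", "mysty spa", 2));
        // prints business_name:(mysty~2 spa~2)
    }
}
```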



2015-01-14 18:34 GMT+09:00 madhav bahuguna :

> HI
>
> Iam using solr 4.7 and solr php client.
>
> So heres the issue ,i have data indexed in solr
> eg
>
> mist spa
>
>
> I have applied fuzzy to my search and If i search myst or mysty i get the
> correct result i get mist spa in result.
> But if i write mysty spa or must spa i do not get any results. I am not able
> to implement fuzzy search on more than one word.
> Can any one advise me or help me regarding this.
> The query iam passing using solr php client is
>
> $querynew="(business_name:$data~N)";
>
> --
> Regards
> Madhav Bahuguna
>


Re: Tokenizer or Filter ?

2015-01-14 Thread Jack Krupansky
It should replace all occurrences of the pattern. Post your specific filter
XML. Patterns can be very tricky.

Use the Solr Admin UI analysis page to see how the filtering is occurring.

-- Jack Krupansky

On Wed, Jan 14, 2015 at 7:16 AM, tomas.kalas  wrote:

> Jack, thanks for help, but if i used PatternReplaceCharFilterFactory for
> example for this :
> text d1text d2text d1text 2 ok then at
> output i only get segment text 2 ok when is  text d2
> between marks  . ...so the filter
> probably takes only the first d1 and the last d1, and if there is something
> between them the filter doesn't skip it but replaces it with a space too,
> when I set a space as the replacement. So wouldn't it be better to use the
> update processor? If it is described well in your book then I will buy it.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Tokenizer-or-Filter-tp4178346p4179477.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Occasionally getting error in solr suggester component.

2015-01-14 Thread Dhanesh Radhakrishnan
Hi,
Thanks for the reply.
As you mentioned in the previous mail I changed buildOnCommit=false in
solrConfig.
After that change, suggestions are not working.
Solr 4.7 introduced a new approach based on a dedicated SuggestComponent.
I'm using that component to build suggestions, and the lookup implementation is
"AnalyzingInfixLookupFactory".
Is there any workaround?
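With buildOnCommit=false the suggester has to be built explicitly; for the SuggestComponent, the analogue of spellcheck.build is the suggest.build parameter, e.g. (host, handler path, and dictionary name are illustrative — the dictionary name must match the one configured in solrconfig.xml):

```
http://localhost:8983/solr/collection1/suggest?suggest=true&suggest.dictionary=haSuggester&suggest.build=true
```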




On Wed, Jan 14, 2015 at 12:47 AM, Michael Sokolov <
msoko...@safaribooksonline.com> wrote:

> I think you are probably getting bitten by one of the issues addressed in
> LUCENE-5889
>
> I would recommend against using buildOnCommit=true - with a large index
> this can be a performance-killer.  Instead, build the index yourself using
> the Solr spellchecker support (spellcheck.build=true)
>
> -Mike
>
>
> On 01/13/2015 10:41 AM, Dhanesh Radhakrishnan wrote:
>
>> Hi all,
>>
>> I am experiencing a problem in Solr SuggestComponent
>> Occasionally solr suggester component throws an  error like
>>
>> Solr failed:
>> {"responseHeader":{"status":500,"QTime":1},"error":{"msg":"suggester was
>> not built","trace":"java.lang.IllegalStateException: suggester was not
>> built\n\tat
>> org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.
>> lookup(AnalyzingInfixSuggester.java:368)\n\tat
>> org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.
>> lookup(AnalyzingInfixSuggester.java:342)\n\tat
>> org.apache.lucene.search.suggest.Lookup.lookup(Lookup.java:240)\n\tat
>> org.apache.solr.spelling.suggest.SolrSuggester.
>> getSuggestions(SolrSuggester.java:199)\n\tat
>> org.apache.solr.handler.component.SuggestComponent.
>> process(SuggestComponent.java:234)\n\tat
>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(
>> SearchHandler.java:218)\n\tat
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(
>> RequestHandlerBase.java:135)\n\tat
>> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.
>> handleRequest(RequestHandlers.java:246)\n\tat
>> org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)\n\tat
>> org.apache.solr.servlet.SolrDispatchFilter.execute(
>> SolrDispatchFilter.java:777)\n\tat
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
>> SolrDispatchFilter.java:418)\n\tat
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
>> SolrDispatchFilter.java:207)\n\tat
>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(
>> ApplicationFilterChain.java:243)\n\tat
>> org.apache.catalina.core.ApplicationFilterChain.doFilter(
>> ApplicationFilterChain.java:210)\n\tat
>> org.apache.catalina.core.StandardWrapperValve.invoke(
>> StandardWrapperValve.java:225)\n\tat
>> org.apache.catalina.core.StandardContextValve.invoke(
>> StandardContextValve.java:123)\n\tat
>> org.apache.catalina.core.StandardHostValve.invoke(
>> StandardHostValve.java:168)\n\tat
>> org.apache.catalina.valves.ErrorReportValve.invoke(
>> ErrorReportValve.java:98)\n\tat
>> org.apache.catalina.valves.AccessLogValve.invoke(
>> AccessLogValve.java:927)\n\tat
>> org.apache.catalina.valves.RemoteIpValve.invoke(
>> RemoteIpValve.java:680)\n\tat
>> org.apache.catalina.core.StandardEngineValve.invoke(
>> StandardEngineValve.java:118)\n\tat
>> org.apache.catalina.connector.CoyoteAdapter.service(
>> CoyoteAdapter.java:407)\n\tat
>> org.apache.coyote.http11.AbstractHttp11Processor.process(
>> AbstractHttp11Processor.java:1002)\n\tat
>> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.
>> process(AbstractProtocol.java:579)\n\tat
>> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.
>> run(JIoEndpoint.java:312)\n\tat
>> java.util.concurrent.ThreadPoolExecutor.runWorker(
>> ThreadPoolExecutor.java:1145)\n\tat
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(
>> ThreadPoolExecutor.java:615)\n\tat
>> java.lang.Thread.run(Thread.java:745)\n","code":500}}
>>
>> This is not freequently happening, but idexing and suggestor component
>> working togethere  this error will occur.
>>
>>
>>
>>
>> In solr config
>>
>> <searchComponent name="suggest" class="solr.SuggestComponent">
>>   <lst name="suggester">
>>     <str name="name">haSuggester</str>
>>     <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
>>     <str name="suggestAnalyzerFieldType">textSpell</str>
>>     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>>     <str name="field">name</str>
>>     <str name="weightField">packageWeight</str>
>>     <str name="buildOnCommit">true</str>
>>   </lst>
>> </searchComponent>
>>
>> <requestHandler name="/suggest" class="solr.SearchHandler"
>> startup="lazy">
>>   <lst name="defaults">
>>     <str name="suggest">true</str>
>>     <str name="suggest.count">10</str>
>>   </lst>
>>   <arr name="components">
>>     <str>suggest</str>
>>   </arr>
>> </requestHandler>
>>
>> Can any one suggest where to look to figure out this error and why these
>> errors are occurring?
>>
>>
>>
>> Thanks,
>> dhanesh s.r
>>
>>
>>
>>
>> --
>>
>>
>


-- 
*dhanesh s.R *
Team Lead
t: (+91) 484 4011750 (ext. 712) | m: (+91) 99 4  703
e: dhan...@hifx.in | w: www.hifx.in
 



-- 


Re: Tokenizer or Filter ?

2015-01-14 Thread tomas.kalas
Jack, thanks for help, but if i used PatternReplaceCharFilterFactory for
example for this :
text d1text d2text d1text 2 ok then at
output i only get segment text 2 ok when is  text d2
between marks  . ...so the filter
probably takes only first d1 and last d1 and if is something between it so
the filter it don't skip it and replace it by space too, when i set at
replacement space. So not better used the update processor ? If you are
described it well in your book then i will buy it.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Tokenizer-or-Filter-tp4178346p4179477.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr grouping problem - need help

2015-01-14 Thread Naresh Yadav
I tried what you said and also appended group.ngroups=true, and got the same
unexpected result: ngroups coming back is 1.
i am on solr-4.6.1 single machine default setup.

On Wed, Jan 14, 2015 at 4:43 PM, Norgorn  wrote:

> Can u get raw SOLR response?
>
> For me grouping works exactly the way u expect it to work.
>
> Try direct query in browser to be sure the problem is not in your code.
>
> http://192.168.0.1:8983/solr/collection1/select?q=*:*&group=true&group.field=tenant_pool
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-grouping-problem-need-help-tp4179149p4179464.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr grouping problem - need help

2015-01-14 Thread Norgorn
Can u get raw SOLR response?

For me grouping works exactly the way u expect it to work.

Try direct query in browser to be sure the problem is not in your code.
http://192.168.0.1:8983/solr/collection1/select?q=*:*&group=true&group.field=tenant_pool



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-grouping-problem-need-help-tp4179149p4179464.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Distributed search across Solr cores in a collection - NPE

2015-01-14 Thread Mikhail Khludnev
Jaikit,
a uniqueKey is mandatory for distributed search. If most of your docs have ids
assigned, you can drop the remaining ones by adding something like ..&fq=id:[*
TO *]
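A quick sketch of assembling such a filtered distributed query URL (the host, core names, and field names are placeholders; only the fq value comes from the suggestion above):

```java
import java.net.URLEncoder;

public class FqUrlSketch {
    // Builds the fq parameter that keeps only documents with an id assigned,
    // so shards containing id-less docs no longer break mergeIds().
    static String idFilterParam() throws Exception {
        return "fq=" + URLEncoder.encode("id:[* TO *]", "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        String url = "http://localhost:8983/solr/core0/select"
                + "?q=title:amazon&shards=localhost:8983/solr/core1&"
                + idFilterParam();
        System.out.println(url);
    }
}
```

URL-encoding the range query matters here: the `:`, `[`, `]`, and spaces in `id:[* TO *]` are not safe in a raw query string.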

On Wed, Jan 14, 2015 at 12:53 PM, Jaikit Savla <
jaikit.sa...@yahoo.com.invalid> wrote:

> Folks,
> I have set up 3 cores in a single collection; they all have the same schema
> but different indexes. I have set the unique id field to not be required: <field
> name="id" type="string" indexed="true" stored="true" required="false"/>
> When I run a query against a single core, it works fine. But when I add the
> shard param and point to a different core, the request fails with an NPE. I
> looked at the source code for QueryComponent; line 1043 is
> resultIds.put(shardDoc.id.toString(), shardDoc);
> and it looks like shardDoc.id.toString() is throwing the NPE.
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.solr/solr-core/4.10.1/org/apache/solr/handler/component/QueryComponent.java#QueryComponent.mergeIds%28org.apache.solr.handler.component.ResponseBuilder%2Corg.apache.solr.handler.component.ShardRequest%29
> Any clue as to whether my setup is incorrect?
>
>
> http://localhost:/solr/core0/select?shards=localhost:/solr/core1&q=title:amazon&fl=*&rows=10&wt=json
>
> RESPONSE:
>  
> {"responseHeader":{"status":500,"QTime":11,"params":{"fl":"*","shards":"localhost:/solr/core1","q":"domain:amazon","wt":"json","rows":"10"}},"error":{"trace":"java.lang.NullPointerException\n\tat
> org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:1043)\n\tat
> org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:716)\n\tat
> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:695)\n\tat
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:324)\n\tat
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\n\tat
> org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)\n\tat
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)\n\tat
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)\n\tat
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)\n\tat
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)\n\tat
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)\n\tat
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)\n\tat
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)\n\tat
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)\n\tat
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)\n\tat
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)\n\tat
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)\n\tat
> org.eclipse.jetty.server.Server.handle(Server.java:368)\n\tat
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)\n\tat
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)\n\tat
> org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)\n\tat
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)\n\tat
> org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)\n\tat
> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)\n\tat
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)\n\tat
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)\n\tat
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)\n\tat
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)\n\tat
> java.lang.Thread.run(Thread.java:745)\n","code":500}}
> Appreciate any pointers.
> Thanks,Jaikit
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Solr grouping problem - need help

2015-01-14 Thread Naresh Yadav
Thanks much, now I have a better idea of how stored & indexed work internally
in Solr.
From this I tried and modified a few things to understand the grouping logic.

*Schema :*



*Code :*
SolrQuery q = new SolrQuery().setQuery("type:t1");
q.set(GroupParams.GROUP, true);
q.set(GroupParams.GROUP_FIELD, "tenant_pool");

*Data/Docs :*
"tenant_pool" : "P1 L1", "type"  : "t1"
"tenant_pool" : "P1 L1", "type"  : "t2"

*Output coming :*
groupValue=L1, docs=2

*Expected Output :*
groupValue=P1, docs=2
groupValue=L1, docs=2
My understanding is that the field is indexed, so it will be tokenized on whitespace
and P1 and L1 will be tokens. Each token should then be one group
when we set group=true in the query. Please help me understand this better.

Thanks
Naresh
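As a side note, the distinction this thread circles around — group keys from whole field values versus from individual tokens — can be sketched in plain Java. This only mimics the token-level grouping the poster expects; real Solr grouping on a tokenized field goes through the FieldCache and effectively keeps only one of the terms per document (which one is an implementation detail), which is why a single group can show up:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class GroupingSketch {
    // Grouping on a "string" field: the whole field value is one group key.
    static Map<String, Integer> groupByWholeValue(String[] values) {
        Map<String, Integer> groups = new LinkedHashMap<>();
        for (String v : values) groups.merge(v, 1, Integer::sum);
        return groups;
    }

    // On a whitespace-tokenized field the indexed terms are individual tokens,
    // so group keys become tokens like "Farms" rather than "Baroda Farms".
    static Map<String, Integer> groupByTokens(String[] values) {
        Map<String, Integer> groups = new LinkedHashMap<>();
        for (String v : values)
            for (String t : v.split("\\s+")) groups.merge(t, 1, Integer::sum);
        return groups;
    }

    public static void main(String[] args) {
        String[] docs = {"Baroda Farms", "Ketty Farms"};
        System.out.println(groupByWholeValue(docs)); // {Baroda Farms=1, Ketty Farms=1}
        System.out.println(groupByTokens(docs));     // {Baroda=1, Farms=2, Ketty=1}
    }
}
```

The first map is the output the posters expect; the second shows why a tokenized field can never produce it.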

On Wed, Jan 14, 2015 at 1:32 AM, Erick Erickson 
wrote:

> bq: My question is for indexed=false, stored=true field..what is optimized
> way
> to get unique values in such field.
>
> There isn't any. To do this you'll have to read the doc from disk,
> it'll be decompressed
> along the way and then the field is read. Note that this happens
> automatically when
> you call doc.getFieldValue or similar.
>
> At the stored="true" level, you're always talking about complete documents.
> indexed="true" is about putting the field data into efficient-access
> structures.
> They're completely different beasts.
>
> your original question was:
> "Please guide me how i can tell solr not to tokenize stored field to decide
> unique groups.."
>
> Simply declare the field type you care about as a "string" type in
> schema.xml. Then use a copyField directive to copy the data to the
> new field, and group on the new field.
>
> There are examples in the schema.xml of string types and copyFields that
> should help.
>
> Best,
> Erick
>
> On Tue, Jan 13, 2015 at 9:00 AM, Naresh Yadav 
> wrote:
> > Erick, my schema is the same, no change in that..
> > *Schema :*
> > 
> > My guess is I had not mentioned indexed true or false... maybe the default
> > for indexed is true
> >
> > My question is for indexed=false, stored=true field..what is optimized
> way
> > to get unique values in such field..
> >
> > On Tue, Jan 13, 2015 at 10:07 PM, Erick Erickson <
> erickerick...@gmail.com>
> > wrote:
> >
> >> Something is very wrong here. Have you perhaps been changing your
> >> schema without re-indexing? And I recommend you completely remove
> >> your data directory (the one with "index" and "tlog" subdirectories)
> after
> >> you change your schema.xml file.
> >>
> >> Because you're trying to group on a field that is _not_ indexed, you
> >> should be getting an error returned, something like:
> >> "can not use FieldCache on a field which is neither indexed nor has
> >> doc values: "
> >>
> >> As far as the tokenization comment goes, just start by making the field
> >> you want to group on be
> >> stored="false" indexed="true" type="string"
> >>
> >> Best,
> >> Erick
> >>
> >> On Tue, Jan 13, 2015 at 5:09 AM, Naresh Yadav 
> >> wrote:
> >> > Hi jack,
> >> >
> >> > Thanks for replying, i am new to solr please guide me on this. I have
> >> many
> >> > such columns in my schema
> >> > so copy field will create lot of duplicate fields beside i do not need
> >> any
> >> > search on original field.
> >> >
> >> > My usecase is i do not want any search on tenant_pool field thats why
> i
> >> > declared it as stored field not indexed.
> >> > I just need to get unique values in this field. Please show some
> >> direction.
> >> >
> >> >
> >> > On Tue, Jan 13, 2015 at 6:16 PM, Jack Krupansky <
> >> jack.krupan...@gmail.com>
> >> > wrote:
> >> >
> >> >> That's your job. The easiest way is to do a copyField to a "string"
> >> field.
> >> >>
> >> >> -- Jack Krupansky
> >> >>
> >> >> On Tue, Jan 13, 2015 at 7:33 AM, Naresh Yadav 
> >> >> wrote:
> >> >>
> >> >> > *Schema :*
> >> >> > 
> >> >> >
> >> >> > *Code :*
> >> >> > SolrQuery q = new SolrQuery().setQuery("*:*");
> >> >> > q.set(GroupParams.GROUP, true);
> >> >> > q.set(GroupParams.GROUP_FIELD, "tenant_pool");
> >> >> >
> >> >> > *Data :*
> >> >> > "tenant_pool" : "Baroda Farms"
> >> >> > "tenant_pool" : "Ketty Farms"
> >> >> >
> >> >> > *Output coming :*
> >> >> > groupValue=Farms, docs=2
> >> >> >
> >> >> > *Expected Output :*
> >> >> > groupValue=Baroda Farms, docs=1
> >> >> > groupValue=Ketty Farms, docs=1
> >> >> >
> >> >> > Please guide me how i can tell solr not to tokenize stored field to
> >> >> decide
> >> >> > unique groups..
> >> >> >
> >> >> > I want unique groups as exact value of field not the tokens which
> >> solr is
> >> >> > doing
> >> >> > currently.
> >> >> >
> >> >> > Thanks
> >> >> > Naresh
> >> >> >
> >> >>
> >> >
> >> >
> >> >
> >> >
> >>
>
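Erick's string-plus-copyField suggestion could look roughly like this in schema.xml (field names taken from the thread; the destination field name and the exact text type are assumptions, not a verified config):

```xml
<field name="tenant_pool" type="text_general" indexed="true" stored="true"/>
<field name="tenant_pool_str" type="string" indexed="true" stored="false"/>
<copyField source="tenant_pool" dest="tenant_pool_str"/>
```

Grouping would then use &group=true&group.field=tenant_pool_str, so each whole value ("Baroda Farms", "Ketty Farms") forms its own group while tenant_pool stays available for full-text search.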


Distributed search across Solr cores in a collection - NPE

2015-01-14 Thread Jaikit Savla
Folks,
I have set up 3 cores in a single collection; they all have the same schema but
different indexes. I have set the unique id field to not be required.
When I run a query against a single core, it works fine. But when I add the shard
param and point to a different core, the request fails with an NPE. I looked at
the source code for QueryComponent; line 1043 is
resultIds.put(shardDoc.id.toString(), shardDoc);
and it looks like shardDoc.id.toString() is throwing the NPE:
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.solr/solr-core/4.10.1/org/apache/solr/handler/component/QueryComponent.java#QueryComponent.mergeIds%28org.apache.solr.handler.component.ResponseBuilder%2Corg.apache.solr.handler.component.ShardRequest%29
Any clue as to whether my setup is incorrect?

 
http://localhost:/solr/core0/select?shards=localhost:/solr/core1&q=title:amazon&fl=*&rows=10&wt=json

RESPONSE:
{"responseHeader":{"status":500,"QTime":11,"params":{"fl":"*","shards":"localhost:/solr/core1","q":"domain:amazon","wt":"json","rows":"10"}},"error":{"trace":"
java.lang.NullPointerException
	at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:1043)
	at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:716)
	at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:695)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:324)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
	at org.eclipse.jetty.server.Server.handle(Server.java:368)
	at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
	at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
	at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
	at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
	at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
	at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Thread.java:745)
","code":500}}
Appreciate any pointers.
Thanks,
Jaikit


How to do fuzzy search on phrases

2015-01-14 Thread madhav bahuguna
Hi,

I am using Solr 4.7 and the solr php client.

So here's the issue: I have data indexed in Solr,
e.g.

mist spa

I have applied fuzzy matching to my search, and if I search myst or mysty I get
the correct result: mist spa.
But if I search mysty spa or must spa I do not get any results. I am not able
to apply fuzzy search to more than one word.
Can anyone advise or help me with this?
The query I am passing using the solr php client is

$querynew="(business_name:$data~N)";
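One common workaround (a sketch, not a verified fix for this exact setup) is to apply the fuzzy operator to each term separately rather than to the whole input: with the standard parser, an unquoted `business_name:mysty spa~N` only attaches ~N to the last term, and `"mysty spa"~N` treats ~N as phrase slop, not fuzziness. The helper below is hypothetical and just shows the query-string shape:

```java
public class FuzzyPerTerm {
    // Splits a multi-word input on whitespace and appends a fuzzy edit
    // distance to each term, e.g. "mysty spa" -> business_name:(mysty~2 spa~2)
    static String fuzzyPerTerm(String field, String input, int distance) {
        StringBuilder sb = new StringBuilder(field).append(":(");
        String[] terms = input.trim().split("\\s+");
        for (int i = 0; i < terms.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(terms[i]).append('~').append(distance);
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        System.out.println(fuzzyPerTerm("business_name", "mysty spa", 2));
        // business_name:(mysty~2 spa~2)
    }
}
```

The resulting string could be passed to the php client in place of $querynew; each word then gets its own fuzzy match instead of the fuzziness applying to only one term.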

-- 
Regards
Madhav Bahuguna


Re: Load existing Lucene sharded indexes onto single Solr collection

2015-01-14 Thread Jaikit Savla
This solution will merge the index as well. I want to find out if a merge is
"required" before loading indexes onto Solr. If that is possible, then I can
just point solrconfig.xml to the directory where I have all the shards.
Jaikit

On Wednesday, January 14, 2015 1:11 AM, Mikhail Khludnev wrote:

On Wed, Jan 14, 2015 at 11:42 AM, Jaikit Savla wrote:

Now to load this index, I am currently using Lucene IndexMergeTool to merge all
the shards into one giant index. My question is: is there a way to load the
sharded index without merging it into one giant index in a single collection?

https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API#CoreAdminAPI-MERGEINDEXES
 ?


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


Re: Load existing Lucene sharded indexes onto single Solr collection

2015-01-14 Thread Mikhail Khludnev
On Wed, Jan 14, 2015 at 11:42 AM, Jaikit Savla <
jaikit.sa...@yahoo.com.invalid> wrote:

> Now to load this index, I am currently using Lucene IndexMergeTool to
> merge all the shards into one giant index. My question is, is there a way
> to load the sharded index without merging it into one giant index in a
> single collection?


https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API#CoreAdminAPI-MERGEINDEXES
?
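For reference, the MERGEINDEXES call behind that wiki page boils down to a CoreAdmin URL; a minimal sketch of assembling it (host, target core, and index paths are placeholders taken from the thread):

```java
public class MergeIndexesUrl {
    // Builds a CoreAdmin MERGEINDEXES request URL: the target core absorbs
    // the segments from each indexDir parameter.
    static String mergeUrl(String host, String targetCore, String[] indexDirs) {
        StringBuilder sb = new StringBuilder(host)
                .append("/solr/admin/cores?action=MERGEINDEXES&core=")
                .append(targetCore);
        for (String dir : indexDirs) sb.append("&indexDir=").append(dir);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(mergeUrl("http://localhost:8983", "core0",
                new String[]{"/data/index-9", "/data/index-66"}));
    }
}
```

Note this still performs a merge on the Solr side; it does not answer whether the shards can be served unmerged, which is the question being asked.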


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Load existing Lucene sharded indexes onto single Solr collection

2015-01-14 Thread Jaikit Savla
Folks,
I have generated multiple (count of 100) sharded Lucene indexes on Hadoop. The
total indexed data (the sum of all the index-* directories) is about 500GB,
hence the number of shards:

drwxr-x--- 2 index-66
drwxr-x--- 2 index-68
drwxr-x--- 2 index-9

and each index directory looks like:

ls index-9
_4.fdt  _4.fdx  _4.fnm  _4_Lucene40_0.frq  _4_Lucene40_0.prx  _4_Lucene40_0.tim  _4_Lucene40_0.tip  _4_nrm.cfe  _4_nrm.cfs  _4.si  segments_1  segments.gen  write.lock

Now to load this index, I am currently using Lucene IndexMergeTool to merge all
the shards into one giant index. My question is: is there a way to load the
sharded index without merging it into one giant index in a single collection?
Thanks,
Jaikit


Re: Solr fails to start with log file not found error

2015-01-14 Thread Graeme Pietersz
I use the same user every time and the /logs directory and everything in it is 
owned by that user. I get the same problem occasionally developing on my Ubuntu 
14.10 laptop as well, and all the files in the solr directory are owned by me 
on that machine (and I run Solr as me there as well).

Config issue? Is there anything that could cause this?

-- 
Graeme Pietersz
gra...@pietersz.net
+94 774 102149
On Tuesday 13 Jan 2015 07:42:43 Erick Erickson wrote:
> By any chance are you trying to start Solr as a different user when
> this happens? I'm
> wondering if there's a permissions issue here
> 
> Wild guess.
> 
> On Tue, Jan 13, 2015 at 12:37 AM, Graeme Pietersz  wrote:
> > I get this error when starting Solr using the script in bin/solr
> >
> > tail cannot open `[path]/logs/solr.log’ for reading: No such file or 
> > directory
> >
> > It does not happen every time, but it does happen a lot. It sometimes 
> > clears up after a while.
> >
> > I have tried creating an empty file, but solr then just says:
> >
> > Backing up [path]/logs/solr.log
> >
> > And repeats the same error.
> >
> > I am guessing the problem is that it cannot get the error from the log file 
> > because the log file has not been created yet, but then how do I debug this?
> >
> > Running  Solr 4.10.2 on Debian 7 using Jetty with the default IcedTea 2.5.3 
> > java version 1.7.0_65
> >
> > Thanks for any help or pointers.