Re: Paoding analyzer with solr for chinese

2012-08-08 Thread Uwe Reh

Hi Rajani,

I'm not really familiar with this Paoding tokenizer, but it seems a bit 
old. We are using the CJKBigramFilter (as in the Solr 4.0 alpha example), 
which should be equivalent or even better, and it works.

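For reference, the field type Uwe refers to in the Solr 4.0 alpha example schema looks roughly like this (reconstructed from the stock example rather than from Uwe's stripped-out snippet, so treat the details as a sketch):

```xml
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- normalize halfwidth/fullwidth character variants -->
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index runs of CJK characters as overlapping bigrams -->
    <filter class="solr.CJKBigramFilterFactory"/>
  </analyzer>
</fieldType>
```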

Uwe



On 09.08.2012 06:47, Rajani Maski wrote:

Hi All,

   Any reply on this?



On Wed, Aug 8, 2012 at 3:23 PM, Rajani Maski <rajinima...@gmail.com> wrote:

Hi All,

   As said in this blog post, the paoding analyzer is much better for
Chinese text, so I was trying to implement it to get accurate results for
Chinese text.

I followed the instructions specified in the sites below:
Site1
& Site2

After indexing, when I search on the same field with the same text, there are
no search results (numFound=0).

And the Luke tool is not showing any terms for the field that is indexed
with the field type below. Can anyone comment on what is going wrong?



*Schema field types for paoding:*

*1)*
<fieldType name="..." positionIncrementGap="100">
  <analyzer>
    <tokenizer class="test.solr.PaodingTokerFactory.PaoDingTokenizerFactory"/>
  </analyzer>
</fieldType>

And the analysis page result is:
[image: Inline image 2]

*2)*

Analysis on the field "paoding_chinese" throws this error:
[image: Inline image 3]



Thanks & Regards
Rajani







Re: Designing an index with multiple entity types, sharing field names across entity-types.

2012-08-08 Thread santamaria2
*civilized bump*



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Designing-an-index-with-multiple-entity-types-sharing-field-names-across-entity-types-tp3999727p451.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: question(s) re lucene spatial toolkit aka LSP aka spatial4j

2012-08-08 Thread David Smiley (@MITRE.org)
Hi! 

Sorry for the belated response; my google alerts didn't kick in for some
weird reason until you posted to the lucene dev list.


solr-user wrote
> 
> hopefully someone is using the lucene spatial toolkit aka LSP aka
> spatial4j, and can answer this question
> 
> we are using this spatial tool for doing searches.  overall, it seems to
> work very well.  however, finding documentation is difficult.
> 
> 

I'm using it ;-)

The current in-progress documentation is here:
http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4


solr-user wrote
> 
> 
> I have a couple of questions:
> 
> 1. I have a geohash field in my solr schema that contains indexed
> geographic polygon data.  I want to find all docs where that polygon
> intersects a given lat/long.  I was experimenting with returning distance
> in the resultset and with sorting by distance and found that the following
> query works.  However, I don't know what the distance means in the query, i.e.
> is it the distance from the point to the polygon centroid, to the closest outer
> edge of the polygon, or just a useless random value? Does anyone know?
> 
> http://solrserver:solrport/solr/core0/select?q=*:*&fq={!v=$geoq%20cache=false}&geoq=wkt_search:%22Intersects(Circle(-97.057%2047.924%20d=0.01))%22&sort=query($geoq)+asc&fl=catchment_wkt1_trimmed,school_name,latitude,longitude,dist:query($geoq,-1),loc_city,loc_state
> 

It's from the center of the indexed shape to the center of the query shape.

At some point soonish, the score of a geo query is going to be similar to
the inverted distance, so that it's a better relevancy metric, which is what
scores should be.  I expect some alternative means of actually getting the
distance in search results to show up -- perhaps a special Solr function query.
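For context, the center-to-center distance David describes is essentially a great-circle computation between the two shape centers. A minimal haversine sketch (an illustration, not Solr's actual code):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# One degree of longitude at the equator is roughly 111 km.
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 1))  # → 111.2
```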


solr-user wrote
> 
> 2. some of the polygons, being geographic representations, are very big
> (i.e. state/province polygons).  When Solr starts processing a spatial query
> (like the one above), I can see ("INFO: Building Cache [xx]") that it fills
> some sort of memory cache
> (org.apache.lucene.spatial.strategy.util.ShapeFieldCache) with the indexed
> polygon data.  We are encountering Java OOM issues when this occurs (even
> when we boosted the mem to 7GB). I know that some of the polygons can
> have more than 2300 points, but heavy trimming isn't really an option due
> to level-of-detail issues. Can we control this caching, or the indexing of
> the polygons, in any way to reduce the memory requirements?
> 

All center points get cached into memory upon first use in a score.  I'm
unsatisfied with the current details of this, which is not real-time-search
friendly and is a memory pig, since it's an ArrayList of ArrayLists of
PointImpl objects.  If you have a single shape value per field, then I
suggest indexing the center point into a solr.LatLonType field for sorting,
which uses the Lucene FieldCache and will use much less memory.  Consider
making it float based to halve your memory requirements further.
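A sketch of what that could look like in schema.xml (field names here are hypothetical; the stock example schema uses tdouble subfields, and switching them to tfloat is the float-based halving David mentions):

```xml
<!-- hypothetical schema.xml fragment for a sortable center point -->
<fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8"
           omitNorms="true" positionIncrementGap="0"/>
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>

<field name="center_pt" type="location" indexed="true" stored="false"/>
<!-- LatLonType stores its lat/lon components in these subfields;
     float-based to halve FieldCache memory versus tdouble -->
<dynamicField name="*_coordinate" type="tfloat" indexed="true" stored="false"/>
```

Sorting could then use something like `&sfield=center_pt&pt=47.924,-97.057&sort=geodist() asc` (geodist() is available from Solr 3.1 on; exact usage depends on your version).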

p.s. I suggest "watching" this JIRA issue:
https://issues.apache.org/jira/browse/SOLR-3304

~ David Smiley



-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/question-s-re-lucene-spatial-toolkit-aka-LSP-aka-spatial4j-tp3997757p424.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Paoding analyzer with solr for chinese

2012-08-08 Thread Rajani Maski
Hi All,

  Any reply on this?



On Wed, Aug 8, 2012 at 3:23 PM, Rajani Maski  wrote:

> Hi All,
>
>   As said in this blog post, the paoding analyzer is much better for
> Chinese text, so I was trying to implement it to get accurate results for
> Chinese text.
>
> I followed the instructions specified in the sites below:
> Site1
> & Site2
>
>
> After indexing, when I search on the same field with the same text, there
> are no search results (numFound=0).
>
> And the Luke tool is not showing any terms for the field that is indexed
> with the field type below. Can anyone comment on what is going wrong?
>
>
>
> *Schema field types for paoding:*
>
> *1)*
> <fieldType name="..." positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="test.solr.PaodingTokerFactory.PaoDingTokenizerFactory"/>
>   </analyzer>
> </fieldType>
>
> And the analysis page result is:
> [image: Inline image 2]
>
> *2)*
>
> Analysis on the field "paoding_chinese" throws this error:
> [image: Inline image 3]
>
>
>
> Thanks & Regards
> Rajani
>
>
>


Re: error message in solr logs

2012-08-08 Thread Chris Hostetter

: Lately we are noticing below exception in our solr logs. This happens
: sometimes once or twice a day on a few cores.

the error you are seeing here is a really low level HTTP communications 
error, below the level of Solr...

: Caused by: java.io.IOException: Invalid chunk header
: at
: 
org.apache.coyote.http11.filters.ChunkedInputFilter.doRead(ChunkedInputFilter.java:133)
: at
: 
org.apache.coyote.http11.InternalInputBuffer.doRead(InternalInputBuffer.java:710)
: at org.apache.coyote.Request.doRead(Request.java:428)
: at

"Chunking" is a feature of HTTP that lets clients stream an arbitrary 
quantity of data without first computing and sending a Content-Length header; 
instead, the client can send smaller chunks of information, each prefaced by 
the length of the individual chunk...

https://en.wikipedia.org/wiki/Chunked_transfer_encoding

This error suggests that your indexing client (the one sending Solr the 
XML) says it is using chunked encoding but is sending malformed chunk 
headers.
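To make the failure concrete, here is a minimal illustrative decoder for a chunked body (a sketch, not Tomcat's ChunkedInputFilter); a size line that is not valid hex is exactly the kind of malformed input that produces an "Invalid chunk header" style error:

```python
def decode_chunked(data: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked transfer-encoded body."""
    out = b""
    pos = 0
    while True:
        end = data.index(b"\r\n", pos)
        size_line = data[pos:end].split(b";")[0]  # drop any chunk extensions
        size = int(size_line, 16)  # raises ValueError on a malformed chunk header
        pos = end + 2
        if size == 0:  # zero-length chunk marks the end of the body
            break
        out += data[pos:pos + size]
        pos += size + 2  # skip chunk data plus its trailing CRLF
    return out

print(decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"))  # → b'Wikipedia'
```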


-Hoss


Using SolrCloud with non string type id field?

2012-08-08 Thread Mark Miller
Just curious if anyone wants to come forward as someone using SolrCloud with a 
non-string-based unique key field?

String is the default, so if you did not change it, you are using the string type.

We are considering a change to how we handle hashing that would be back compatible 
for the string type, but not for most of the other types.

- Mark Miller
lucidimagination.com













Re: max connections in CloudSolrServer

2012-08-08 Thread Mark Miller
On Wed, Aug 8, 2012 at 1:55 PM, Jamie Johnson  wrote:

> I see that in other constructors you can specify an HttpClient to be
> used, but I don't see this same option for the CloudSolrServer.


You can pass a LBHttpSolrServer, which you can init with an HttpClient. Or
you can use getLbServer() and then getHttpClient and set it?

If you think it should be done a little differently, feel free to open an
issue.



>  Is
> there a way to say the maximum number of connections that should be
> used for CloudSolrServer?  What is the current number that is
> supported?
>

I'm not sure offhand - I'd guess the HttpClient default?


-- 
- Mark

http://www.lucidimagination.com


Limit on SOLR Cores

2012-08-08 Thread Nitin Arora
Hi Guys,

I've come across a use case where I have to keep separate indexes for multiple
tenants. The data directory of each tenant should be different, but the SOLR
server instance has the same schema and configuration for all the tenants.

Tenants in our case can be added dynamically. I know that I can handle each
tenant by creating a separate core. My questions are:

1. What is the limit on number of cores that 1 SOLR server instance can
handle without affecting the performance? 
2. Are there any constraints on scaling out a SOLR instance if we have, let's
say, 10 cores/instance?
3. I want each SOLR instance to register itself with our Zookeeper
service. I want to write a wrapper over the SOLR main class. Can somebody
suggest an extension point that I can use to run my custom code during the
boot-up of a SOLR instance?

Thanks in advance
Nitin




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Limit-on-SOLR-Cores-tp403.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multiple Embedded Servers Pointing to single solrhome/index

2012-08-08 Thread Lance Norskog
No, you can only have one program controlling an index. This will not
work! You should use a primary/failover technique where one program
does all of the indexing, and then another program is the fallback for
the first indexer.

On Tue, Aug 7, 2012 at 7:31 AM, Bing Hua  wrote:
> Thanks Lance. The use case is to have a cluster of nodes which run the same
> application with EmbeddedSolrServer on each of them, and they all point to
> the same index on NFS. Every application is designed to be equal, meaning that
> any of them may index and/or search.
>
> That way, after every commit the writer needs to be closed for the other
> nodes' availability.
>
> Do you see any issues of this use case? Is the EmbeddedSolrServer able to
> release its write lock without shutting down?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Multiple-Embedded-Servers-Pointing-to-single-solrhome-index-tp3999451p3999591.html
> Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Lance Norskog
goks...@gmail.com


Re: /solr/admin/stats.jsp null pointer exception

2012-08-08 Thread Chris Hostetter

: New install of Solr 3.6.1, getting a Null Pointer Exception when trying to
: access admin/stats.jsp:


: org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
: at
: org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
: Caused by: java.lang.NullPointerException
: 
: 
: 
: Any ideas how to fix this?

I can't reproduce with the example configs -- it looks like you've 
tweaked the logging to use the XML file format; any way to get the 
stacktrace of the "Caused by" exception so we can see what is null and 
where? 

As a workaround, I would suggest switching to 
"/solr/admin/mbeans?stats=true" ... moving forward you'll have to anyway, since 
stats.jsp has been removed in Solr 4.




-Hoss


Re: Configuration for distributed search

2012-08-08 Thread Chris Hostetter

: This command to each shard returns one document from each shard.
: curl 'http://localhost:8983/solr/select?debugQuery=true&indent=true&q=conway'
: curl 'http://localhost:7574/solr/select?debugQuery=true&indent=true&q=conway'
: 
: This distributed search command returns 0 documents:

What do those two responses look like?  What do they look like if you add 
debugQuery=true?

: The same distributed search command with debugQuery=true, returns an error.

Interesting ... based on your stack trace, it looks like (if I'm 
understanding the code correctly) for some reason the score explanation 
info didn't come back for a document that was returned by one of the 
shards, so the data was null.  I'm not sure why that would happen, but 
I've opened SOLR-3722 to try and prevent the NPEs moving forward.

: java.lang.NullPointerException
: at
: org.apache.solr.common.util.NamedList.nameValueMapToList(NamedList.java:110)
: at
: org.apache.solr.common.util.NamedList.(NamedList.java:75)
: at
: 
org.apache.solr.common.util.SimpleOrderedMap.(SimpleOrderedMap.java:58)
: at
: 
org.apache.solr.handler.component.DebugComponent.finishStage(DebugComponent.java:130)

-Hoss


Re: Syntax for parameter substitution in function queries?

2012-08-08 Thread Timothy Hill
Thanks very much; that does indeed work as I'd hoped/expected.

On 7 August 2012 17:12, Yonik Seeley  wrote:
> On Tue, Aug 7, 2012 at 3:01 PM, Timothy Hill  wrote:
>> Hello, all ...
>>
>> According to 
>> http://wiki.apache.org/solr/FunctionQuery/#What_is_a_Function.3F,
>> it is possible under Solr 4.0 to perform parameter substitutions
>> within function queries.
>>
>> However, I can't get the syntax provided in the documentation there to
>> work *at all* with Solr 4.0 out of the box: the only location at which
>> function queries can be specified, it seems, is in the 'fl' parameter.
>> And attempts at parameter substitutions here fail. Using (haphazardly
>> guessed) syntax like
>>
>> select?q=*:*&fl=*, test_id:if(exists(employee), employee_id,
>> socialsecurity_id), boost_id:sum($test_id, 10)&wt=xml
>>
>> results in the following error
>>
>> Error parsing fieldname: Missing param test_id while parsing function
>> 'sum($test_id, 10)'
>
> test_id needs to be an actual request parameter.
>
> This worked for me on the example data:
> http://localhost:8983/solr/query?q=*:*&fl=*,%20test_id:if(exists(price),id,name),%20boost_id:sum($param,10)&param=price
>
> -Yonik
> http://lucidimagination.com


Re: search on default field returns less documents

2012-08-08 Thread Jack Krupansky
Default search field handling changed in Solr 3.6. Which release of Solr are 
you using?


In Solr 3.6, the "df" request parameter in your query request handler 
overrides the deprecated defaultSearchField. The out of the box default for 
"df" is "text", which should match your schema, but... better to check.
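For reference, that per-handler default in solrconfig.xml looks roughly like this (the stock /select handler; adjust the field name to your own schema):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <!-- "df" set here overrides the deprecated <defaultSearchField> -->
    <str name="df">text</str>
  </lst>
</requestHandler>
```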


Add the &debugQuery=true option to your query and check the parsedquery 
attribute for the two queries, just to be sure what fields are actually 
searched.


And which query parser are you using? dismax and edismax use "qf" to specify 
the search fields.


What does your "textgen" analyzer look like?

-- Jack Krupansky

-Original Message- 
From: Shalom

Sent: Wednesday, August 08, 2012 1:13 PM
To: solr-user@lucene.apache.org
Subject: search on default field returns less documents

Hi All
we have two fields:





'text' is our default field:

text

we copy the doc field to the 'text' field



when indexing 10 documents that have a value with the same prefix in the doc
field, for example ca067-XXX, and searching for ca067 on the default field, I
get only 5 results.
When searching for ca067 on the 'doc' field I get the expected 10 results.

Does anyone have an idea what is wrong here?

Thank you










--
View this message in context: 
http://lucene.472066.n3.nabble.com/search-on-default-field-returns-less-documents-tp3999896.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Does Solr support 'Value Search'?

2012-08-08 Thread Bing Hua
I don't quite understand, but I'll explain the problem I had. The response would
contain only fields and a list of field values that match the query.
Essentially it's querying for field values rather than documents. The
underlying use case would be: when typing in a quick search box, the drill-down
menu may contain matches on authors, on doctitles, and potentially on
other fields.

Still, thanks for your response, and hopefully I'm making it clearer.
Bing
Bing



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Does-Solr-support-Value-Search-tp3999654p327.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Setting "df" (default field) from solrj?

2012-08-08 Thread homernabble
Perfect.  Thanks!


Jack Krupansky-2 wrote
> 
> You should simply set the default value for the "df" request parameter in 
> your Solr request handlers in solrconfig.xml. It is set to "text" out of
> the 
> box, but you can set it to your desired field.
> 
> If you still want to set/override "df" from SolrJ anyway, use the 
> SolrQuery.setParam method:
> 
> solrQuery.setParam("df", "SearchText");
> 
> -- Jack Krupansky
> 
> 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Setting-df-default-field-from-solrj-tp3999794p316.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: No search result is returned using Solr 4

2012-08-08 Thread in.abdul
Hi Engy,
  Are you able to get a result for q=*:*? If so, can you check whether the
query is parsing properly by adding the param debug=true?

Syed Abdul kather
send from Samsung S3
On Aug 8, 2012 4:30 PM, "engy.morsy [via Lucene]" <
ml-node+s472066n3999741...@n3.nabble.com> wrote:

> Hi,
>
> I downloaded the new alpha Solr 4. I indexed two documents; both documents
> contain English and Arabic text. I am using two different analyzers to
> index the two languages, and I was able to check them using Luke.
> I tried to query the new Solr using the admin interface, but I did not get any
> results. When I used the Solr analysis interface to check, I was able to analyze
> the indexing and got correct analysis, but unfortunately I did not get any
> analysis for the query part.
>
> I checked the encoding of the server (Apache Tomcat 7); I even tried to
> query Solr programmatically and encode the query to UTF-8, but still can't
> get any results.
>
> Any idea??
>
> Engy
>
> --
> http://lucene.472066.n3.nabble.com/No-search-result-is-returned-using-Solr-4-tp3999741.html
>




-
THANKS AND REGARDS,
SYED ABDUL KATHER
--
View this message in context: 
http://lucene.472066.n3.nabble.com/No-search-result-is-returned-using-Solr-4-tp3999741p315.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Does Solr support 'Value Search'?

2012-08-08 Thread Mikhail Khludnev
OK. It seems to me you can configure
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
for index-time use to produce an "admin" term from all your docs above; after
that you'll be able to match with a simple term query.
Is that what you are looking for?
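As a rough illustration (this mimics only the basic split behavior of WordDelimiterFilterFactory, not the real filter or its many options), splitting on intra-word delimiters yields sub-terms that can then be matched directly:

```python
import re

def split_like_word_delimiter(token: str) -> list[str]:
    """Split a token on non-alphanumeric delimiters and letter/digit
    boundaries, roughly like WordDelimiterFilterFactory's defaults."""
    # split on delimiters such as '-', '_', '.'
    parts = re.split(r"[^0-9A-Za-z]+", token)
    out = []
    for p in parts:
        # also split at letter<->digit transitions (splitOnNumerics)
        out.extend(re.findall(r"[A-Za-z]+|[0-9]+", p))
    return out

print(split_like_word_delimiter("admin-console_v2"))  # → ['admin', 'console', 'v', '2']
```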

On Wed, Aug 8, 2012 at 6:43 PM, Bing Hua  wrote:

> Thanks for the response but wait... Is it related to my question searching
> for field values? I was not asking how to use wildcards though.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Does-Solr-support-Value-Search-tp3999654p3999817.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics


 


/solr/admin/stats.jsp null pointer exception

2012-08-08 Thread Jon Drukman
New install of Solr 3.6.1, getting a Null Pointer Exception when trying to
access admin/stats.jsp:



  2012-08-08T17:55:09
  138509624
  694
  org.apache.solr.servlet.SolrDispatchFilter
  SEVERE
  org.apache.solr.common.SolrException
  log
  25
  org.apache.jasper.JasperException: java.lang.NullPointerException
at
org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:418)
at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:486)
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:380)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:327)
at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:283)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.NullPointerException



Any ideas how to fix this?

-jsd-


max connections in CloudSolrServer

2012-08-08 Thread Jamie Johnson
I see that in other constructors you can specify an HttpClient to be
used, but I don't see this same option for the CloudSolrServer.  Is
there a way to say the maximum number of connections that should be
used for CloudSolrServer?  What is the current number that is
supported?


search on default field returns less documents

2012-08-08 Thread Shalom
Hi All
we have two fields:





'text' is our default field:

text

we copy the doc field to the 'text' field



when indexing 10 documents that have a value with the same prefix in the doc
field, for example ca067-XXX, and searching for ca067 on the default field, I
get only 5 results.
When searching for ca067 on the 'doc' field I get the expected 10 results.

Does anyone have an idea what is wrong here?

Thank you










--
View this message in context: 
http://lucene.472066.n3.nabble.com/search-on-default-field-returns-less-documents-tp3999896.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Recovery problem in solrcloud

2012-08-08 Thread Jam Luo
There are 400 million documents in a shard; a document is less than 1 KB.
The data file _**.fdt is 149 GB.
Does the recovery need a large amount of memory while downloading, or after the download?

I found some log entries before the OOM, as below:
Aug 06, 2012 9:43:04 AM org.apache.solr.core.SolrCore execute
INFO: [blog] webapp=/solr path=/select
params={sort=createdAt+desc&distrib=false&collection=today,blog&hl.fl=content&wt=javabin&hl=false&rows=10&version=2&f.content.hl.fragsize=0&fl=id&shard.url=index35:8983/solr/blog/&NOW=1344217556702&start=0&q=((("somewordsA"+%26%26+"somewordsB"+%26%26+"somewordsC")+%26%26+platform:abc)+||+id:"/")+%26%26+(createdAt:[2012-07-30T01:43:28.462Z+TO+2012-08-06T01:43:28.462Z])&_system=business&isShard=true&fsv=true&f.title.hl.fragsize=0}
hits=0 status=0 QTime=95
Aug 06, 2012 9:43:05 AM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1

commit{dir=/home/ant/jetty/solr/data/index.20120801114027,segFN=segments_aui,generation=14058,filenames=[_cdnu_nrm.cfs,
_cdnu_0.frq, segments_aui, _cdnu.fdt, _cdnu_nrm.cfe, _cdnu_0.tim,
_cdnu.fdx, _cdnu.fnm, _cdnu_0.prx, _cdnu_0.tip, _cdnu.per]
Aug 06, 2012 9:43:05 AM org.apache.solr.core.SolrDeletionPolicy
updateCommits
INFO: newest commit = 14058
Aug 06, 2012 9:43:05 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start
commit{flags=0,version=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
Aug 06, 2012 9:43:05 AM org.apache.solr.search.SolrIndexSearcher 
INFO: Opening Searcher@13578a09 main
Aug 06, 2012 9:43:05 AM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to
Searcher@13578a09main{StandardDirectoryReader(segments_aui:1269420
_cdnu(4.0):C457041702)}
Aug 06, 2012 9:43:05 AM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Aug 06, 2012 9:43:05 AM org.apache.solr.core.SolrCore registerSearcher
INFO: [blog] Registered new searcher
Searcher@13578a09main{StandardDirectoryReader(segments_aui:1269420
_cdnu(4.0):C457041702)}
Aug 06, 2012 9:43:05 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Aug 06, 2012 9:43:05 AM org.apache.solr.update.processor.LogUpdateProcessor
finish
INFO: [blog] webapp=/solr path=/update
params={waitSearcher=true&commit_end_point=true&wt=javabin&commit=true&version=2}
{commit=} 0 1439
Aug 06, 2012 9:43:05 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start
commit{flags=0,version=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
Aug 06, 2012 9:43:05 AM org.apache.solr.search.SolrIndexSearcher 
INFO: Opening Searcher@1a630c4d main
Aug 06, 2012 9:43:05 AM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to
Searcher@1a630c4dmain{StandardDirectoryReader(segments_aui:1269420
_cdnu(4.0):C457041702)}
Aug 06, 2012 9:43:05 AM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Aug 06, 2012 9:43:05 AM org.apache.solr.core.SolrCore registerSearcher
INFO: [blog] Registered new searcher
Searcher@1a630c4dmain{StandardDirectoryReader(segments_aui:1269420
_cdnu(4.0):C457041702)}
Aug 06, 2012 9:43:05 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Aug 06, 2012 9:43:07 AM org.apache.solr.core.SolrCore execute
INFO: [blog] webapp=/solr path=/select
params={sort=createdAt+desc&distrib=false&collection=today,blog&hl.fl=content&wt=javabin&hl=false&rows=10&version=2&f.content.hl.fragsize=0&fl=id&shard.url=index35:8983/solr/blog/&NOW=1344217558778&start=0&_system=business&q=(((somewordsD)+%26%26+platform:(abc))+||+id:"/")+%26%26+(createdAt:[2012-07-30T01:43:30.537Z+TO+2012-08-06T01:43:30.537Z])&isShard=true&fsv=true&f.title.hl.fragsize=0}
hits=0 status=0 QTime=490

Except for this log, all the other entries over those few minutes are
"path=/select **" requests; there were no add-document requests in this
cluster during this time. Is that related to the OOM?

This is live traffic, so I can't test it frequently. Tonight I will add the
-XX:+HeapDumpOnOutOfMemoryError
option; if this problem appears once again, I will get the heap dump, but I
am not sure I can analyse it and get a result. I will ask for your help,
please.

thanks

2012/8/8 Yonik Seeley 

> Stack trace looks normal - it's just a multi-term query instantiating
> a bitset.  The memory is being taken up somewhere else.
> How many documents are in your index?
> Can you get a heap dump or use some other memory profiler to see
> what's taking up the space?
>
> > if I stop query more then  ten minutes, the solr instance will start
> normally.
>
> Maybe queries are piling up in threads before the server is ready to
> handle them and then trying to handle them all at once gives an OOM?
> Is this live traffic or a test?  How many concurrent requests get sent?
>
> -Yonik
> http://lucidimagination.com
>
>
> On Wed, Aug 8, 2012 at 2:43 AM, Jam Luo  wrote:
> > Aug 06, 2012 10:05:55 AM org.apache.

Re: Solr makes long requests about once a minute

2012-08-08 Thread Jack Krupansky
Check the Solr log file and see if something is happening at those slow 
queries. Maybe an auto-commit?


-- Jack Krupansky

-Original Message- 
From: Andy Lester

Sent: Wednesday, August 08, 2012 11:30 AM
To: solr-user@lucene.apache.org
Subject: Solr makes long requests about once a minute

I'm having a problem with Solr under Tomcat unexpectedly taking a long time 
to respond to queries.  As part of some stress testing, I wrote a bot that 
just does random word searches on my Solr install, and my responses 
typically come back in 10-50 ms.  The queries are just 1-3 random words from 
/usr/share/dict/words, and I cap off the results at 2500 hits.


The queries run just fine and I typically get responses up to 50ms for large 
result sets.  Here's an example of my log:


TIME HITS   MS SEARCH WORDS
12:33:2015 hoovey Aruru kwachas
12:33:2085 blinis twyver
12:33:20 2500   34 prework burlily sunshine
12:33:20 1928   30 rendu Solly
12:33:20   unnethe
12:33:20   gadwell afterpeak
12:33:20  792   14 steen
12:33:2047 blanchi repaving
12:33:20   326 torbanite Storz ungag
12:33:2075 chemostat
12:33:20   156 Guauaenok Adao lakist
12:33:2066 bechance viny
12:33:20   206 chagigah
12:33:22  532 2404 bonne
12:33:22  1439 nonman Norrie
12:33:22   246 repealers
12:33:22   Pfosi laniard locutory
12:33:22   516 sexipolar wordsmith enshield
12:33:22   loggiest Aryanise koels
12:33:22   fogyish unforcing
12:33:2245 Millvale chokies
12:33:2256 Melfa ripal Olva
12:33:22   156 apio Heraea latimeria
12:33:2245 nonnitric parleying

See that one line where it took 2404ms to return?  I get those about once a 
minute, but not at a regular interval.  I ran this for two hours and got 122 
spikes in 120 minutes.  I ran it overnight and came in to work to find that 
there were 1283 spikes in 1260 minutes.  So that once-a-minute is a pattern.


As I write this, I'm in IRC with Chris Hostetter and he says:

--snip--
Probably need to tweak your garbage collector settings to something that 
doesn't involve "stop the world" ... the specifics of the changes largely 
depend on what JVM you are using, what options you already have set, etc. 
markrmiller wrote a good blog about this a little while back: 
http://searchhub.org/dev/2011/03/27/garbage-collection-bootcamp-1-0/ 
There's also some notes here in the LucidWorks Solr Ref Guide: 
http://lucidworks.lucidimagination.com/display/solr/JVM+Settings

--snip--

GC certainly sounds like a reasonable suspect.  Any other suggestions?  Any 
hints on Solr-specific GC tuning?  I'm currently scouring Google.


Thanks,
xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance 



Re: Solr makes long requests about once a minute

2012-08-08 Thread Michael Della Bitta
StandardDirectoryFactory gets us partway there, but that's actually a
class that chooses an appropriate implementation at runtime based on
the parameters of the system it's being run on.

If you go to the status page off of the admin page and do a find on
"readerDir", I'm guessing you'll see
"org.apache.lucene.store.MMapDirectory"

So you have that 16GB to yourself, that's good. Have you told Tomcat
how much heap it can have? It's usually done with a setting like
-Xmx4g, but where that goes depends on how you installed Tomcat.
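For concreteness, a hypothetical Tomcat conf/setenv.sh combining a fixed heap with CMS-oriented GC flags (the filename, flag set, and sizes are all assumptions to be tuned against your own JVM and GC logs; a package-managed Tomcat may use /etc/default/tomcat instead):

```shell
# Hypothetical conf/setenv.sh: give the JVM a fixed 4 GB heap and enable the
# CMS collector plus GC logging to diagnose stop-the-world pauses.
CATALINA_OPTS="-Xms4g -Xmx4g \
 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
 -XX:+CMSParallelRemarkEnabled \
 -verbose:gc -XX:+PrintGCApplicationStoppedTime \
 -Xloggc:/var/log/tomcat/gc.log"
export CATALINA_OPTS
echo "$CATALINA_OPTS"
```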

Have you watched Tomcat's RSIZE in 'top'? You should see it peak out
when your query pauses and then suddenly drop a significant amount.

Michael Della Bitta


Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game


On Wed, Aug 8, 2012 at 12:05 PM, Andy Lester  wrote:
>
> On Aug 8, 2012, at 10:53 AM, Michael Della Bitta wrote:
>
>> What version of Solr are you running and what Directory implementation
>> are you using? How much RAM does your system have, and how much is
>> available for use by Solr?
>
> Solr 3.6.0
>
> I don't know what "directory implementation" means.  Are you asking about 
> <directoryFactory>?  All I have in my solrconfig.xml is
>
> <directoryFactory class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
>
> The box has 16GB in it and currently has literally nothing else running on 
> it.  As to the "how much is available for use by Solr", is there somewhere 
> that I'm setting that in a config file?
>
> Clearly, I'm entirely new to the whole JVM ecosystem. I'm coming from the 
> world of Perl.
>
> Thanks,
> xoa
>
> --
> Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
>


Re: Solr makes long requests about once a minute

2012-08-08 Thread Andy Lester

On Aug 8, 2012, at 10:53 AM, Michael Della Bitta wrote:

> What version of Solr are you running and what Directory implementation
> are you using? How much RAM does your system have, and how much is
> available for use by Solr?

Solr 3.6.0

I don't know what "directory implementation" means.  Are you asking about 
<directoryFactory>?  All I have in my solrconfig.xml is

<directoryFactory class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>

The box has 16GB in it and currently has literally nothing else running on it.  
As to the "how much is available for use by Solr", is there somewhere that I'm 
setting that in a config file?

Clearly, I'm entirely new to the whole JVM ecosystem. I'm coming from the world 
of Perl.

Thanks,
xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



RE: numFound changes on changing start and rows

2012-08-08 Thread Rohit
I can cross check our shards once again, but I am sure this is not the case.


Regards,
Rohit
Mobile: +91-9901768202


-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: 08 August 2012 21:04
To: solr-user@lucene.apache.org
Subject: Re: numFound changes on changing start and rows


: We are using Solr3.6 and 2 shards, we are noticing that when we fire a
query
: with start as 0 and rows X the total numFound and the total numFound
changes
: when we fire the same exact query with start as y and rows X.

The only situation where i've ever heard of this happening is when multiple
shards have documents with identical uniqueKeys...

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201206.mbox/%3CCAP
oDz8S4Z-jnyptFXdv7VJdWntY0Lx_=nzhvq0qtcfqyx7m...@mail.gmail.com%3E
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201206.mbox/%3Calp
ine.DEB.2.00.1206191429520.19329@bester%3E
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201206.mbox/%3CCAP
oDz8S59kzUdCAZwHRquzUhM=C90ReyCNe3Au00xsc=wh0...@mail.gmail.com%3E

As noted in the docs..

http://wiki.apache.org/solr/DistributedSearch?#Distributed_Searching_Limitat
ions

"The unique key field must be unique across all shards. If docs with
duplicate unique keys are encountered, Solr will make an attempt to return
valid results, but the behavior may be non-deterministic. "
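If you want to verify this on an existing index, faceting on the uniqueKey per shard can expose duplicates without dumping documents. In this sketch the field name `id` and the host are placeholders; adjust to your schema.

```shell
# Hedged example: run against each shard individually (not a distributed
# query). Any uniqueKey value with a count > 1 on one shard, or returned
# by more than one shard, is a duplicate.
curl 'http://shard1:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=id&facet.mincount=2&facet.limit=100&wt=json&indent=true'
```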




-Hoss




Re: Solr makes long requests about once a minute

2012-08-08 Thread Michael Della Bitta
Hi, Andy,

What version of Solr are you running and what Directory implementation
are you using? How much RAM does your system have, and how much is
available for use by Solr?

Michael Della Bitta


Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game


On Wed, Aug 8, 2012 at 11:30 AM, Andy Lester  wrote:
> I'm having a problem with Solr under Tomcat unexpectedly taking a long time 
> to respond to queries.  As part of some stress testing, I wrote a bot that 
> just does random word searches on my Solr install, and my responses typically 
> come back in 10-50 ms.  The queries are just 1-3 random words from 
> /usr/share/dict/words, and I cap off the results at 2500 hits.
>
> The queries run just fine and I typically get responses up to 50ms for large 
> result sets.  Here's an example of my log:
>
> TIME HITS   MS SEARCH WORDS
> 12:33:2015 hoovey Aruru kwachas
> 12:33:2085 blinis twyver
> 12:33:20 2500   34 prework burlily sunshine
> 12:33:20 1928   30 rendu Solly
> 12:33:20   unnethe
> 12:33:20   gadwell afterpeak
> 12:33:20  792   14 steen
> 12:33:2047 blanchi repaving
> 12:33:20   326 torbanite Storz ungag
> 12:33:2075 chemostat
> 12:33:20   156 Guauaenok Adao lakist
> 12:33:2066 bechance viny
> 12:33:20   206 chagigah
> 12:33:22  532 2404 bonne
> 12:33:22  1439 nonman Norrie
> 12:33:22   246 repealers
> 12:33:22   Pfosi laniard locutory
> 12:33:22   516 sexipolar wordsmith enshield
> 12:33:22   loggiest Aryanise koels
> 12:33:22   fogyish unforcing
> 12:33:2245 Millvale chokies
> 12:33:2256 Melfa ripal Olva
> 12:33:22   156 apio Heraea latimeria
> 12:33:2245 nonnitric parleying
>
> See that one line where it took 2404ms to return?  I get those about once a 
> minute, but not at a regular interval.  I ran this for two hours and got 122 
> spikes in 120 minutes.  I ran it overnight and came in to work to find that 
> there were 1283 spikes in 1260 minutes.  So that one-a-minute is a pattern.
>
> As I write this, I'm in IRC with Chris Hostetter and he says:
>
> --snip--
> Probably need to tweak your garbage collector settings to something that 
> doesn't involve "stop the world" ... the specifics of the changes largely 
> depend on what JVM you are using, what options you already have set, etc.  
> markrmiller wrote a good blog about this a little while back: 
> http://searchhub.org/dev/2011/03/27/garbage-collection-bootcamp-1-0/  There's 
> also some notes here in the LucidWorks Solr Ref Guide: 
> http://lucidworks.lucidimagination.com/display/solr/JVM+Settings
> --snip--
>
> GC certainly sounds like a reasonable suspect.  Any other suggestions?  Any 
> hints on Solr-specific GC tuning?  I'm currently scouring Google.
>
> Thanks,
> xoa
>
> --
> Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
>


Re: numFound changes on changing start and rows

2012-08-08 Thread Michael Della Bitta
Our documents are keyed with UUIDs, and we shard chronologically. The
write events are issued as part of a SQS queue that only allows one
reader to see the message. I think it's pretty unlikely that we have
more than one document with the same uniquekey.

I can actually prove this if it will help the discussion, since I just
dumped 4 of our shards to JSON, but it's over 117 million docs, so
I'll wait until someone asks. :)

Michael Della Bitta


Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game


On Wed, Aug 8, 2012 at 11:33 AM, Chris Hostetter
 wrote:
>
> : We are using Solr3.6 and 2 shards, we are noticing that when we fire a query
> : with start as 0 and rows X the total numFound and the total numFound changes
> : when we fire the same exact query with start as y and rows X.
>
> The only situation where i've ever heard of this happening is when
> multiple
> shards have documents with identical uniqueKeys...
>
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201206.mbox/%3CCAPoDz8S4Z-jnyptFXdv7VJdWntY0Lx_=nzhvq0qtcfqyx7m...@mail.gmail.com%3E
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201206.mbox/%3Calpine.DEB.2.00.1206191429520.19329@bester%3E
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201206.mbox/%3CCAPoDz8S59kzUdCAZwHRquzUhM=C90ReyCNe3Au00xsc=wh0...@mail.gmail.com%3E
>
> As noted in the docs..
>
> http://wiki.apache.org/solr/DistributedSearch?#Distributed_Searching_Limitations
>
> "The unique key field must be unique across all shards. If docs with
> duplicate unique keys are encountered, Solr will make an attempt to return
> valid results, but the behavior may be non-deterministic. "
>
>
>
>
> -Hoss


Re: HTTP Basic Authentication with HttpSolrServer [solved]

2012-08-08 Thread vilo
You're partly right. The solution in the link was for CommonsHttpSolrServer,
it does work for HttpSolrServer, but the principle is the same.

Actually, I found solution for the new HttpClient here:
http://stackoverflow.com/questions/2014700/preemptive-basic-authentication-with-apache-httpclient-4/11868040#11868040
(this is my modification of some other answer)
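For archive readers, this is the shape of that fix: a request interceptor that installs Basic credentials before the first request, so SolrJ's POST entity never needs to be replayed after a 401. A sketch against HttpClient 4.1-era APIs, not verified against every 4.x release.

```java
// Hedged sketch: preemptive Basic auth for HttpSolrServer (HttpClient 4.x).
import org.apache.http.HttpHost;
import org.apache.http.HttpRequest;
import org.apache.http.HttpRequestInterceptor;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.AuthState;
import org.apache.http.auth.Credentials;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.client.protocol.ClientContext;
import org.apache.http.impl.auth.BasicScheme;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.protocol.ExecutionContext;
import org.apache.http.protocol.HttpContext;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class PreemptiveAuthExample {
    public static HttpSolrServer makeServer(String url, String user, String pass) {
        HttpSolrServer solr = new HttpSolrServer(url);
        DefaultHttpClient client = (DefaultHttpClient) solr.getHttpClient();
        client.getCredentialsProvider().setCredentials(
                AuthScope.ANY, new UsernamePasswordCredentials(user, pass));
        // Runs before each request: if no auth scheme has been negotiated
        // yet, attach Basic credentials immediately rather than waiting
        // for the server's 401 challenge.
        client.addRequestInterceptor(new HttpRequestInterceptor() {
            public void process(HttpRequest request, HttpContext context) {
                AuthState authState = (AuthState)
                        context.getAttribute(ClientContext.TARGET_AUTH_STATE);
                if (authState.getAuthScheme() == null) {
                    CredentialsProvider provider = (CredentialsProvider)
                            context.getAttribute(ClientContext.CREDS_PROVIDER);
                    HttpHost target = (HttpHost)
                            context.getAttribute(ExecutionContext.HTTP_TARGET_HOST);
                    Credentials creds = provider.getCredentials(
                            new AuthScope(target.getHostName(), target.getPort()));
                    if (creds != null) {
                        authState.setAuthScheme(new BasicScheme());
                        authState.setCredentials(creds);
                    }
                }
            }
        }, 0);
        return solr;
    }
}
```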





--
View this message in context: 
http://lucene.472066.n3.nabble.com/HTTP-Basic-Authentication-with-HttpSolrServer-tp3999829p3999849.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: numFound changes on changing start and rows

2012-08-08 Thread Chris Hostetter

: We are using Solr3.6 and 2 shards, we are noticing that when we fire a query
: with start as 0 and rows X the total numFound and the total numFound changes
: when we fire the same exact query with start as y and rows X.

The only situation where i've ever heard of this happening is when 
multiple 
shards have documents with identical uniqueKeys...

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201206.mbox/%3CCAPoDz8S4Z-jnyptFXdv7VJdWntY0Lx_=nzhvq0qtcfqyx7m...@mail.gmail.com%3E
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201206.mbox/%3Calpine.DEB.2.00.1206191429520.19329@bester%3E
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201206.mbox/%3CCAPoDz8S59kzUdCAZwHRquzUhM=C90ReyCNe3Au00xsc=wh0...@mail.gmail.com%3E

As noted in the docs..

http://wiki.apache.org/solr/DistributedSearch?#Distributed_Searching_Limitations

"The unique key field must be unique across all shards. If docs with 
duplicate unique keys are encountered, Solr will make an attempt to return 
valid results, but the behavior may be non-deterministic. "




-Hoss


Re: getting empty result but numFound > 0

2012-08-08 Thread Jack Krupansky
"status":500 means there was probably an exception on the server. Check the 
Solr log file for details.


-- Jack Krupansky

-Original Message- 
From: Rafael Pappert

Sent: Wednesday, August 08, 2012 9:20 AM
To: solr-user@lucene.apache.org
Subject: getting empty result but numFound > 0

Hello List,

I'm evaluating Solr 4 / SolrCloud and ran into the following problem.
I've indexed ~1.5M documents, but the "docs" section in the response is always
empty. The response for the *:* query looks like this:

{
 "responseHeader":{
   "status":500,
   "QTime":12,
   "params":{
 "fl":"title,img",
 "indent":"true",
 "start":"0",
 "q":"*:*",
 "wt":"json",
 "rows":"10"}},
 "response":{"numFound":1441958,"start":0,"maxScore":1.0,"docs":[]
 }

The schema.xml has a lot of stored/indexed fields. Any hints what's wrong?

Thanks in Advance,
Rafael. 



Solr makes long requests about once a minute

2012-08-08 Thread Andy Lester
I'm having a problem with Solr under Tomcat unexpectedly taking a long time to 
respond to queries.  As part of some stress testing, I wrote a bot that just 
does random word searches on my Solr install, and my responses typically come 
back in 10-50 ms.  The queries are just 1-3 random words from 
/usr/share/dict/words, and I cap off the results at 2500 hits.  

The queries run just fine and I typically get responses up to 50ms for large 
result sets.  Here's an example of my log:

TIME HITS   MS SEARCH WORDS
12:33:2015 hoovey Aruru kwachas
12:33:2085 blinis twyver
12:33:20 2500   34 prework burlily sunshine
12:33:20 1928   30 rendu Solly
12:33:20   unnethe
12:33:20   gadwell afterpeak
12:33:20  792   14 steen
12:33:2047 blanchi repaving
12:33:20   326 torbanite Storz ungag
12:33:2075 chemostat
12:33:20   156 Guauaenok Adao lakist
12:33:2066 bechance viny
12:33:20   206 chagigah
12:33:22  532 2404 bonne
12:33:22  1439 nonman Norrie
12:33:22   246 repealers
12:33:22   Pfosi laniard locutory
12:33:22   516 sexipolar wordsmith enshield
12:33:22   loggiest Aryanise koels
12:33:22   fogyish unforcing
12:33:2245 Millvale chokies
12:33:2256 Melfa ripal Olva
12:33:22   156 apio Heraea latimeria
12:33:2245 nonnitric parleying

See that one line where it took 2404ms to return?  I get those about once a minute, 
but not at a regular interval.  I ran this for two hours and got 122 spikes in 
120 minutes.  I ran it overnight and came in to work to find that there were 
1283 spikes in 1260 minutes.  So that one-a-minute is a pattern.

As I write this, I'm in IRC with Chris Hostetter and he says:

--snip--
Probably need to tweak your garbage collector settings to something that 
doesn't involve "stop the world" ... the specifics of the changes largely 
depend on what JVM you are using, what options you already have set, etc.  
markrmiller wrote a good blog about this a little while back: 
http://searchhub.org/dev/2011/03/27/garbage-collection-bootcamp-1-0/  There's 
also some notes here in the LucidWorks Solr Ref Guide: 
http://lucidworks.lucidimagination.com/display/solr/JVM+Settings
--snip--

GC certainly sounds like a reasonable suspect.  Any other suggestions?  Any 
hints on Solr-specific GC tuning?  I'm currently scouring Google.

Thanks,
xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



Re: HTTP Basic Authentication with HttpSolrServer

2012-08-08 Thread Paul Libbrecht
Villam,

this is a question for httpclient, I think you want to enable preemptive 
authentication so as to avoid the need to repeat the query after the 
"unauthorized" response is sent.

http://hc.apache.org/httpclient-3.x/authentication.html#Preemptive_Authentication

paul


Le 8 août 2012 à 17:08, vilo a écrit :

> I have protected my solr server with basic authentication. Now I want to
> connect to it using SOLRJ. CommonsHttpSolrServer is now deprecated, so I try
> to use HttpSolrServer, but I fail to send credentials. If I put them to the
> url, I get 401 (http://user:passw...@example.com/solr). I tried this:
> 
>  HttpSolrServer solr = new HttpSolrServer(urlString);
>  DefaultHttpClient httpClient = (DefaultHttpClient) solr.getHttpClient();
>  httpClient.getCredentialsProvider().setCredentials(
>  new AuthScope(url.getHost(), url.getPort()),
>  new UsernamePasswordCredentials(username, password));
> 
> but I get "org.apache.http.client.NonRepeatableRequestException: Cannot
> retry request with a non-repeatable request entity". Here is complete call
> stack:
> 
> Caused by: org.apache.solr.client.solrj.SolrServerException: IOException
> occured when talking to server at: http://devel1.kios.sk:8280/solr
>   at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:439)
>   at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:221)
>   at
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>   at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:121)
>   at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:106)
>   at
> kinet.common.fulltext.FullTextQueueProcessor.processEntry(FullTextQueueProcessor.java:181)
>   ... 3 more
> Caused by: org.apache.http.client.ClientProtocolException
>   at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:822)
>   at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
>   at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
>   at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:383)
>   ... 8 more
> Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot
> retry request with a non-repeatable request entity.
>   at
> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:625)
>   at
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:464)
>   at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
>   ... 11 more
> 
> Has someone succeeded with basic authentication in combination with the
> HttpSolrServer?
> 
> Thanks, Viliam
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/HTTP-Basic-Authentication-with-HttpSolrServer-tp3999829.html
> Sent from the Solr - User mailing list archive at Nabble.com.



HTTP Basic Authentication with HttpSolrServer

2012-08-08 Thread vilo
I have protected my solr server with basic authentication. Now I want to
connect to it using SOLRJ. CommonsHttpSolrServer is now deprecated, so I try
to use HttpSolrServer, but I fail to send credentials. If I put them to the
url, I get 401 (http://user:passw...@example.com/solr). I tried this:

  HttpSolrServer solr = new HttpSolrServer(urlString);
  DefaultHttpClient httpClient = (DefaultHttpClient) solr.getHttpClient();
  httpClient.getCredentialsProvider().setCredentials(
  new AuthScope(url.getHost(), url.getPort()),
  new UsernamePasswordCredentials(username, password));

but I get "org.apache.http.client.NonRepeatableRequestException: Cannot
retry request with a non-repeatable request entity". Here is complete call
stack:

Caused by: org.apache.solr.client.solrj.SolrServerException: IOException
occured when talking to server at: http://devel1.kios.sk:8280/solr
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:439)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:221)
at
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:121)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:106)
at
kinet.common.fulltext.FullTextQueueProcessor.processEntry(FullTextQueueProcessor.java:181)
... 3 more
Caused by: org.apache.http.client.ClientProtocolException
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:822)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:383)
... 8 more
Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot
retry request with a non-repeatable request entity.
at
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:625)
at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:464)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
... 11 more

Has someone succeeded with basic authentication in combination with the
HttpSolrServer?

Thanks, Viliam



--
View this message in context: 
http://lucene.472066.n3.nabble.com/HTTP-Basic-Authentication-with-HttpSolrServer-tp3999829.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problems with EDisMax field aliases for multiple fields on Solr 3.6.1

2012-08-08 Thread Jack Krupansky
There is an open Solr issue to allow commas in lists everywhere, but even in 
4.0, space is still the delimiter for field name boost lists ("qf" and 
"f.<field>.qf").


I'll update the wiki.

-- Jack Krupansky

-Original Message- 
From: Nils Kaiser

Sent: Wednesday, August 08, 2012 9:56 AM
To: solr-user@lucene.apache.org
Subject: Re: Problems with EDisMax field aliases for multiple fields on Solr 
3.6.1


Thanks for the quick replies. Jack was right, I switched to space as
separator and it works.

2) &f.name.qf=last_name_text,first_name_text
- returns 0 results, debug shows <str name="f.name.qf">last_name_text,first_name_text</str>
+DisjunctionMaxQuery((last_name_text,first_name_text:maier))

8) &f.name.qf=last_name_text first_name_text
- returns 39 results, debug shows <str name="f.name.qf">last_name_text
first_name_text</str>
+DisjunctionMaxQuery((last_name_text:maier | first_name_text:maier))

So the docs are wrong as the example uses a comma. Should I raise a JIRA
issue for that?

Thanks Jan for the hint regarding parsedquery, I'll make sure to include
it in my reports next time.

Best,

Nils

Am 08.08.2012 15:06, schrieb Jack Krupansky:
Jan, I did notice that you used a space rather than a comma in the alias 
field list. The wiki does indicate comma (which is what Nils used), but... 
who knows. I haven't checked the code yet.


-- Jack Krupansky

-Original Message- From: Jan Høydahl
Sent: Wednesday, August 08, 2012 9:03 AM
To: solr-user@lucene.apache.org
Subject: Re: Problems with EDisMax field aliases for multiple fields on 
Solr 3.6.1


Hi,

It's hard to see what's going on without knowing more about your schema 
and documents. Also, it would be more helpful if you could paste the 
"parsedquery" part of the DebugQuery, where you actually see how the query 
was interpreted. Your query syntax looks correct, and I just verified that 
the feature works on a clean 3.6.1.


I indexed all xml's in example/exampledocs, then ran this query:

http://localhost:8983/solr/select?debugQuery=true&q=foo:drive&fl=*%20score&defType=edismax&f.foo.qf=name^1%20features^2

Here's what my debug looks like:
foo:drive
+DisjunctionMaxQuery((features:drive^2.0 | name:drive))


You see that the query string is being parsed correctly, and we get three 
hits (vs 2 in name and 1 in features alone).


--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

8. aug. 2012 kl. 13:41 skrev Nils Kaiser :


Hey,

I'm trying to use field aliases that reference multiple fields on Solr 
3.6.1 (1362471) as stated in the EDisMax documentation 
(http://wiki.apache.org/solr/ExtendedDisMax#Field_aliasing_.2BAC8_renaming).


If I use an alias for a single field, everything is fine, but once I use 
an alias with more than one field (using syntax 
f.who.qf=name^5.0,namealias^2.0 as in the docs), the alias stops working.


Examples
(base url is 
http://localhost:8982/solr/select?debugQuery=true&fq=type%3AUser&q=name%3Amaier&fl=*+score&defType=edismax&rows=10 
+ params below, debug shows how f.name.qf is displayed in debug xml view)


1) &f.name.qf=last_name_text
- returns 39 results, debug: <str name="f.name.qf">last_name_text</str>

2) &f.name.qf=last_name_text,first_name_text
- returns 0 results, debug: <str name="f.name.qf">last_name_text,first_name_text</str>

3) &f.name.qf=last_name_text%2Cfirst_name_text
- returns 0 results, debug: <str name="f.name.qf">last_name_text,first_name_text</str>

4) &f.name.qf=first_name_text&f.name.qf=last_name_text
- returns 0 results, debug: <str name="f.name.qf">first_name_text</str><str name="f.name.qf">last_name_text</str>

5) &f.name.qf=last_name_text&f.name.qf=first_name_text
- returns 39 results, debug: <str name="f.name.qf">last_name_text</str><str name="f.name.qf">first_name_text</str>


6) &f.name.qf=last_name_text^2.0,first_name_text^2.0
- http error 500, java.lang.NumberFormatException: For input string: 
"2.0,first_name_text"


7) &f.name.qf=last_name_text^2.0%2Cfirst_name_text^2.0
- http error 500, java.lang.NumberFormatException: For input string: 
"2.0,first_name_text"


Comments:

1) works as expected, but uses only one field for the alias
2) does not work, but this format is explained in the docs if I 
understood it right

3) tried this to try escaping issues, but xml shows the same value
4) does not work, because SOLR seems to take first value only
5) does work, but only because SOLR takes first value (see 4)
6), 7) lead to http error, but format is same as in docs??

Any ideas whether I am doing something wrong here, or the docs are 
misleading, or there is a bug in the SOLR version I use?


Best,

Nils





--
Nils Kaiser
MSc in Information Systems 



Re: Setting "df" (default field) from solrj?

2012-08-08 Thread Jack Krupansky
You should simply set the default value for the "df" request parameter in 
your Solr request handlers in solrconfig.xml. It is set to "text" out of the 
box, but you can set it to your desired field.
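A hedged sketch of that solrconfig.xml change, using the field name "SearchText" from the question (the handler name assumes the stock /select handler; adjust to your own):

```xml
<!-- Set a per-handler default for "df", replacing the deprecated
     <defaultSearchField> from schema.xml. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">SearchText</str>
  </lst>
</requestHandler>
```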


If you still want to set/override "df" from SolrJ anyway, use the 
SolrQuery.setParam method:


solrQuery.setParam("df", "SearchText");

-- Jack Krupansky

-Original Message- 
From: homernabble

Sent: Wednesday, August 08, 2012 10:07 AM
To: solr-user@lucene.apache.org
Subject: Setting "df" (default field) from solrj?

I see in Solr 4 the defaultSearchField tag in schema.xml has been 
deprecated.

I was looking in the Solrj API and I don't see a method for setting the
default field on a SolrQuery object.

This is basically what the code looks like now (stripped down):
solrQuery = SolrQuery.new()
solrQuery.setQuery(queryText)
queryResponse = solrServer.query(solrQuery)

Before Solr 4 this would work fine because defaultSearchField was set in
schema.xml.  Now I need to be able to set it from my solrj call.

Am I missing something, how can I set this for my queries via solrj?

As I'm typing this I realize I can do something like I have below (and this
is fine) but still wondering if there is a dedicated method for setting this
somewhere:

solrQuery.add("df", "SearchText")





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Setting-df-default-field-from-solrj-tp3999794.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Does Solr support 'Value Search'?

2012-08-08 Thread Bing Hua
Thanks for the response but wait... Is it related to my question searching
for field values? I was not asking how to use wildcards though. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Does-Solr-support-Value-Search-tp3999654p3999817.html
Sent from the Solr - User mailing list archive at Nabble.com.


Setting "df" (default field) from solrj?

2012-08-08 Thread homernabble
I see in Solr 4 the defaultSearchField tag in schema.xml has been deprecated. 
I was looking in the Solrj API and I don't see a method for setting the
default field on a SolrQuery object.  

This is basically what the code looks like now (stripped down):
solrQuery = SolrQuery.new()
solrQuery.setQuery(queryText)
queryResponse = solrServer.query(solrQuery)

Before Solr 4 this would work fine because defaultSearchField was set in
schema.xml.  Now I need to be able to set it from my solrj call.

Am I missing something, how can I set this for my queries via solrj?

As I'm typing this I realize I can do something like I have below (and this
is fine) but still wondering if there is a dedicated method for setting this
somewhere:

solrQuery.add("df", "SearchText")





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Setting-df-default-field-from-solrj-tp3999794.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problems with EDisMax field aliases for multiple fields on Solr 3.6.1

2012-08-08 Thread Nils Kaiser
Thanks for the quick replies. Jack was right, I switched to space as 
separator and it works.


2) &f.name.qf=last_name_text,first_name_text
- returns 0 results, debug shows <str name="f.name.qf">last_name_text,first_name_text</str>
+DisjunctionMaxQuery((last_name_text,first_name_text:maier))

8) &f.name.qf=last_name_text first_name_text
- returns 39 results, debug shows <str name="f.name.qf">last_name_text
first_name_text</str>
+DisjunctionMaxQuery((last_name_text:maier | first_name_text:maier))

So the docs are wrong as the example uses a comma. Should I raise a JIRA 
issue for that?


Thanks Jan for the hint regarding parsedquery, I'll make sure to include 
it in my reports next time.


Best,

Nils

Am 08.08.2012 15:06, schrieb Jack Krupansky:
Jan, I did notice that you used a space rather than a comma in the 
alias field list. The wiki does indicate comma (which is what Nils 
used), but... who knows. I haven't checked the code yet.


-- Jack Krupansky

-Original Message- From: Jan Høydahl
Sent: Wednesday, August 08, 2012 9:03 AM
To: solr-user@lucene.apache.org
Subject: Re: Problems with EDisMax field aliases for multiple fields 
on Solr 3.6.1


Hi,

It's hard to see what's going on without knowing more about your 
schema and documents. Also, it would be more helpful if you could 
paste the "parsedquery" part of the DebugQuery, where you actually see 
how the query was interpreted. Your query syntax looks correct, and I 
just verified that the feature works on a clean 3.6.1.


I indexed all xml's in example/exampledocs, then ran this query:

http://localhost:8983/solr/select?debugQuery=true&q=foo:drive&fl=*%20score&defType=edismax&f.foo.qf=name^1%20features^2 



Here's what my debug looks like:
foo:drive
+DisjunctionMaxQuery((features:drive^2.0 | name:drive))


You see that the query string is being parsed correctly, and we get 
three hits (vs 2 in name and 1 in features alone).


--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

8. aug. 2012 kl. 13:41 skrev Nils Kaiser :


Hey,

I'm trying to use field aliases that reference multiple fields on 
Solr 3.6.1 (1362471) as stated in the EDisMax documentation 
(http://wiki.apache.org/solr/ExtendedDisMax#Field_aliasing_.2BAC8_renaming).


If I use an alias for a single field, everything is fine, but once I 
use an alias with more than one field (using syntax 
f.who.qf=name^5.0,namealias^2.0 as in the docs), the alias stops 
working.


Examples
(base url is 
http://localhost:8982/solr/select?debugQuery=true&fq=type%3AUser&q=name%3Amaier&fl=*+score&defType=edismax&rows=10 
+ params below, debug shows how f.name.qf is displayed in debug xml 
view)


1) &f.name.qf=last_name_text
- returns 39 results, debug: <str name="f.name.qf">last_name_text</str>

2) &f.name.qf=last_name_text,first_name_text
- returns 0 results, debug: <str name="f.name.qf">last_name_text,first_name_text</str>

3) &f.name.qf=last_name_text%2Cfirst_name_text
- returns 0 results, debug: <str name="f.name.qf">last_name_text,first_name_text</str>

4) &f.name.qf=first_name_text&f.name.qf=last_name_text
- returns 0 results, debug: <str name="f.name.qf">first_name_text</str><str name="f.name.qf">last_name_text</str>

5) &f.name.qf=last_name_text&f.name.qf=first_name_text
- returns 39 results, debug: <str name="f.name.qf">last_name_text</str><str name="f.name.qf">first_name_text</str>


6) &f.name.qf=last_name_text^2.0,first_name_text^2.0
- http error 500, java.lang.NumberFormatException: For input string: 
"2.0,first_name_text"


7) &f.name.qf=last_name_text^2.0%2Cfirst_name_text^2.0
- http error 500, java.lang.NumberFormatException: For input string: 
"2.0,first_name_text"


Comments:

1) works as expected, but uses only one field for the alias
2) does not work, but this format is explained in the docs if I 
understood it right

3) tried this to try escaping issues, but xml shows the same value
4) does not work, because SOLR seems to take first value only
5) does work, but only because SOLR takes first value (see 4)
6), 7) lead to http error, but format is same as in docs??

Any ideas whether I am doing something wrong here, or the docs are 
misleading, or there is a bug in the SOLR version I use?


Best,

Nils 





--
Nils Kaiser
MSc in Information Systems



RE: getting empty result but numFound > 0

2012-08-08 Thread Markus Jelsma
The status is 500, check your logs for some errors.

 
 
-Original message-
> From:Rafael Pappert 
> Sent: Wed 08-Aug-2012 15:49
> To: solr-user@lucene.apache.org
> Subject: getting empty result but numFound > 0
> 
> Hello List,
> 
> I'm evaluating Solr 4 / SolrCloud and ran into the following problem.
> I've indexed ~1.5M Documents but the "docs" section in the response is always
> empty. The response for the *:* query looks like this:
> 
> {
>   "responseHeader":{
> "status":500,
> "QTime":12,
> "params":{
>   "fl":"title,img",
>   "indent":"true",
>   "start":"0",
>   "q":"*:*",
>   "wt":"json",
>   "rows":"10"}},
>   "response":{"numFound":1441958,"start":0,"maxScore":1.0,"docs":[]
>   }
> 
> The schema.xml has a lot of stored/indexed fields. Any hints what's wrong?
> 
> Thanks in Advance,
> Rafael.
> 


getting empty result but numFound > 0

2012-08-08 Thread Rafael Pappert
Hello List,

I'm evaluating Solr 4 / SolrCloud and ran into the following problem.
I've indexed ~1.5M Documents but the "docs" section in the response is always
empty. The response for the *:* query looks like this:

{
  "responseHeader":{
"status":500,
"QTime":12,
"params":{
  "fl":"title,img",
  "indent":"true",
  "start":"0",
  "q":"*:*",
  "wt":"json",
  "rows":"10"}},
  "response":{"numFound":1441958,"start":0,"maxScore":1.0,"docs":[]
  }

The schema.xml has a lot of stored/indexed fields. Any hints what's wrong?

Thanks in Advance,
Rafael.


Re: Is this too much time for full Data Import?

2012-08-08 Thread Alexey Serba
9M docs * 15 queries each over ~80 hours - that's a lot of queries (>400 QPS).

I would try reduce the number of queries:

1. Rewrite your main (root) query to select all possible data
* use SQL joins instead of DIH nested entities
* select data from 1-N related tables (tags, authors, etc.) in the main
query using GROUP_CONCAT (that's a MySQL-specific aggregate function, but
there are similar functions for other RDBMSes) and then
split the concatenated data in a DIH transformer.

2. Identify small tables in nested entities and cache them completely
in CachedSqlEntityProcessor.
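For illustration, a rough sketch of the GROUP_CONCAT approach in a DIH config. The table and column names (posts, post_tags, tags) are hypothetical; the split back into a multi-valued field uses DIH's RegexTransformer with splitBy:

```xml
<!-- Hypothetical DIH entity: one JOIN query instead of N+1 sub-entities.
     Table/column names are made up for illustration. -->
<entity name="post" transformer="RegexTransformer"
        query="SELECT p.id, p.title,
                      GROUP_CONCAT(t.name SEPARATOR ',') AS tags
               FROM posts p
               LEFT JOIN post_tags pt ON pt.post_id = p.id
               LEFT JOIN tags t ON t.id = pt.tag_id
               GROUP BY p.id">
  <field column="id" name="id"/>
  <field column="title" name="title"/>
  <!-- splitBy turns 'a,b,c' into three values of a multi-valued 'tags' field -->
  <field column="tags" splitBy=","/>
</entity>
```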



On Wed, Aug 8, 2012 at 10:35 AM, Mikhail Khludnev
 wrote:
> Hello,
>
> Does your indexer utilize CPU/IO? - check it by iostat/vmstat.
> If it doesn't, take several thread dumps by jvisualvm sampler or jstack,
> try to understand what blocks your threads from progress.
> It might happen you need to speedup your SQL data consumption, to do this,
> you can enable threads in DIH (only in 3.6.1), move from N+1 SQL queries to
> select all/cache approach
> http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor and
> https://issues.apache.org/jira/browse/SOLR-2382
>
> Good luck
>
> On Wed, Aug 8, 2012 at 9:16 AM, Pranav Prakash  wrote:
>
>> Folks,
>>
>> My full data import takes ~80hrs. It has around ~9m documents and ~15 SQL
>> queries for each document. The database servers are different from Solr
>> Servers. Each document has an update processor chain which (a) calculates
>> signature of the document using SignatureUpdateProcessorFactory and (b)
>> Finds out terms which have term frequency > 2; using a custom processor.
>> The index size is ~ 480GiB
>>
>> I want to know if the amount of time taken is too large compared to the
>> document count? How do I benchmark the stats and what are some of the ways
>> I can improve this? I believe there are some optimizations that I could do
>> at Update Processor Factory level as well. What would be a good way to get
>> dirty on this?
>>
>> *Pranav Prakash*
>>
>> "temet nosce"
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Tech Lead
> Grid Dynamics
>
> 
>  


Re: Problems with EDisMax field aliases for multiple fields on Solr 3.6.1

2012-08-08 Thread Jack Krupansky
Jan, I did notice that you used a space rather than a comma in the alias 
field list. The wiki does indicate comma (which is what Nils used), but... 
who knows. I haven't checked the code yet.


-- Jack Krupansky

-Original Message- 
From: Jan Høydahl

Sent: Wednesday, August 08, 2012 9:03 AM
To: solr-user@lucene.apache.org
Subject: Re: Problems with EDisMax field aliases for multiple fields on Solr 
3.6.1


Hi,

It's hard to see what's going on without knowing more about your schema and 
documents. Also, it would be more helpful if you could paste the 
"parsedquery" part of the DebugQuery, where you actually see how the query 
was interpreted. Your query syntax looks correct, and I just verified that 
the feature works on a clean 3.6.1.


I indexed all xml's in example/exampledocs, then ran this query:

http://localhost:8983/solr/select?debugQuery=true&q=foo:drive&fl=*%20score&defType=edismax&f.foo.qf=name^1%20features^2

Here's what my debug looks like:
<str name="rawquerystring">foo:drive</str>
<str name="parsedquery">+DisjunctionMaxQuery((features:drive^2.0 | name:drive))</str>


You see that the query string is being parsed correctly, and we get three 
hits (vs 2 in name and 1 in features alone).


--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 8 Aug 2012, at 13:41, Nils Kaiser wrote:


Hey,

I'm trying to use field aliases that reference multiple fields on Solr 
3.6.1 (1362471) as stated in the EDisMax documentation 
(http://wiki.apache.org/solr/ExtendedDisMax#Field_aliasing_.2BAC8_renaming).


If I use an alias for a single field, everything is fine, but once I use 
an alias with more than one field (using syntax 
f.who.qf=name^5.0,namealias^2.0 as in the docs), the alias stops working.


Examples
(base url is 
http://localhost:8982/solr/select?debugQuery=true&fq=type%3AUser&q=name%3Amaier&fl=*+score&defType=edismax&rows=10 
+ params below, debug shows how f.name.qf is displayed in debug xml view)


1) &f.name.qf=last_name_text
- returns 39 results, debug: <str name="f.name.qf">last_name_text</str>

2) &f.name.qf=last_name_text,first_name_text
- returns 0 results, debug: <str name="f.name.qf">last_name_text,first_name_text</str>

3) &f.name.qf=last_name_text%2Cfirst_name_text
- returns 0 results, debug: <str name="f.name.qf">last_name_text,first_name_text</str>

4) &f.name.qf=first_name_text&f.name.qf=last_name_text
- returns 0 results, debug: <arr name="f.name.qf"><str>first_name_text</str><str>last_name_text</str></arr>

5) &f.name.qf=last_name_text&f.name.qf=first_name_text
- returns 39 results, debug: <arr name="f.name.qf"><str>last_name_text</str><str>first_name_text</str></arr>

6) &f.name.qf=last_name_text^2.0,first_name_text^2.0
- http error 500, java.lang.NumberFormatException: For input string: "2.0,first_name_text"

7) &f.name.qf=last_name_text^2.0%2Cfirst_name_text^2.0
- http error 500, java.lang.NumberFormatException: For input string: "2.0,first_name_text"


Comments:

1) works as expected, but uses only one field for the alias
2) does not work, but this format is explained in the docs if I understood 
it right

3) tried this to try escaping issues, but xml shows the same value
4) does not work, because SOLR seems to take first value only
5) does work, but only because SOLR takes first value (see 4)
6), 7) lead to http error, but format is same as in docs??

Any ideas whether I am doing something wrong here, or the docs are 
misleading, or there is a bug in the SOLR version I use?


Best,

Nils 




Re: Problems with EDisMax field aliases for multiple fields on Solr 3.6.1

2012-08-08 Thread Jan Høydahl
Hi,

It's hard to see what's going on without knowing more about your schema and 
documents. Also, it would be more helpful if you could paste the "parsedquery" 
part of the DebugQuery, where you actually see how the query was interpreted. 
Your query syntax looks correct, and I just verified that the feature works on 
a clean 3.6.1.

I indexed all xml's in example/exampledocs, then ran this query:

http://localhost:8983/solr/select?debugQuery=true&q=foo:drive&fl=*%20score&defType=edismax&f.foo.qf=name^1%20features^2

Here's what my debug looks like:
<str name="rawquerystring">foo:drive</str>
<str name="parsedquery">+DisjunctionMaxQuery((features:drive^2.0 | name:drive))</str>

You see that the query string is being parsed correctly, and we get three hits 
(vs 2 in name and 1 in features alone).
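For anyone hitting the comma problem, here is a small sketch of building a request with the space-separated form verified above. The host, port, field names, and boosts are placeholders, not a recommendation:

```python
from urllib.parse import urlencode

# Build an edismax request whose per-field alias uses the space-separated
# qf syntax that was verified to work ("name^1 features^2").
# Host/port and field names here are placeholders.
params = {
    "q": "foo:drive",
    "defType": "edismax",
    "debugQuery": "true",
    "fl": "* score",
    "f.foo.qf": "name^1 features^2",   # space-separated, not comma-separated
}
query_string = urlencode(params)       # percent-encodes '^' and spaces safely
url = "http://localhost:8983/solr/select?" + query_string
print(query_string)
```

Letting urlencode do the escaping sidesteps the manual %2C / %5E guesswork in the examples above.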

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 8 Aug 2012, at 13:41, Nils Kaiser wrote:

> Hey,
> 
> I'm trying to use field aliases that reference multiple fields on Solr 3.6.1 
> (1362471) as stated in the EDisMax documentation 
> (http://wiki.apache.org/solr/ExtendedDisMax#Field_aliasing_.2BAC8_renaming).
> 
> If I use an alias for a single field, everything is fine, but once I use an 
> alias with more than one field (using syntax f.who.qf=name^5.0,namealias^2.0 
> as in the docs), the alias stops working.
> 
> Examples
> (base url is 
> http://localhost:8982/solr/select?debugQuery=true&fq=type%3AUser&q=name%3Amaier&fl=*+score&defType=edismax&rows=10
>  + params below, debug shows how f.name.qf is displayed in debug xml view)
> 
> 1) &f.name.qf=last_name_text
> - returns 39 results, debug: <str name="f.name.qf">last_name_text</str>
> 
> 2) &f.name.qf=last_name_text,first_name_text
> - returns 0 results, debug: <str name="f.name.qf">last_name_text,first_name_text</str>
> 
> 3) &f.name.qf=last_name_text%2Cfirst_name_text
> - returns 0 results, debug: <str name="f.name.qf">last_name_text,first_name_text</str>
> 
> 4) &f.name.qf=first_name_text&f.name.qf=last_name_text
> - returns 0 results, debug: <arr name="f.name.qf"><str>first_name_text</str><str>last_name_text</str></arr>
> 
> 5) &f.name.qf=last_name_text&f.name.qf=first_name_text
> - returns 39 results, debug: <arr name="f.name.qf"><str>last_name_text</str><str>first_name_text</str></arr>
> 
> 6) &f.name.qf=last_name_text^2.0,first_name_text^2.0
> - http error 500, java.lang.NumberFormatException: For input string: "2.0,first_name_text"
> 
> 7) &f.name.qf=last_name_text^2.0%2Cfirst_name_text^2.0
> - http error 500, java.lang.NumberFormatException: For input string: "2.0,first_name_text"
> 
> Comments:
> 
> 1) works as expected, but uses only one field for the alias
> 2) does not work, but this format is explained in the docs if I understood it 
> right
> 3) tried this to try escaping issues, but xml shows the same value
> 4) does not work, because SOLR seems to take first value only
> 5) does work, but only because SOLR takes first value (see 4)
> 6), 7) lead to http error, but format is same as in docs??
> 
> Any ideas whether I am doing something wrong here, or the docs are 
> misleading, or there is a bug in the SOLR version I use?
> 
> Best,
> 
> Nils



Re: Is this too much time for full Data Import?

2012-08-08 Thread Michael Della Bitta
Pranav,

If possible, you may wish to consider moving a job this large outside
of DataImportHandler to a standalone program, as the SQL processing is
somewhat limited by the N+1 subselects problem.
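As an illustration of that idea — not Pranav's actual schema — a standalone indexer can run a single JOIN and fold the repeated rows into documents client-side, instead of issuing per-document sub-selects. Column names below are made up; the Solr-posting step is omitted:

```python
from itertools import groupby
from operator import itemgetter

def rows_to_docs(rows):
    """Fold rows of one JOIN (one row per document/tag pair) into Solr
    documents. Assumes rows arrive ordered by id (ORDER BY id in SQL)."""
    docs = []
    for doc_id, group in groupby(rows, key=itemgetter("id")):
        group = list(group)
        docs.append({
            "id": doc_id,
            "title": group[0]["title"],
            # collect the joined child values into a multi-valued field
            "tags": [r["tag"] for r in group if r["tag"] is not None],
        })
    return docs

# Example: three JOIN rows collapse into two documents.
rows = [
    {"id": 1, "title": "A", "tag": "x"},
    {"id": 1, "title": "A", "tag": "y"},
    {"id": 2, "title": "B", "tag": None},
]
docs = rows_to_docs(rows)
print(docs)
```

The resulting dicts would then be posted to Solr's update handler in batches.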

Michael Della Bitta


Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game


On Wed, Aug 8, 2012 at 1:16 AM, Pranav Prakash  wrote:
> Folks,
>
> My full data import takes ~80hrs. It has around ~9m documents and ~15 SQL
> queries for each document. The database servers are different from Solr
> Servers. Each document has an update processor chain which (a) calculates
> signature of the document using SignatureUpdateProcessorFactory and (b)
> Finds out terms which have term frequency > 2; using a custom processor.
> The index size is ~ 480GiB
>
> I want to know if the amount of time taken is too large compared to the
> document count? How do I benchmark the stats and what are some of the ways
> I can improve this? I believe there are some optimizations that I could do
> at Update Processor Factory level as well. What would be a good way to get
> dirty on this?
>
> *Pranav Prakash*
>
> "temet nosce"


Re: numFound changes on changing start and rows

2012-08-08 Thread Michael Della Bitta
Sorry, in my time range example, I forgot to mention that you can
repeatedly execute the 8 hour query and receive no results, even after
the 7 hour query retrieves them.

Kind of an important detail to not forget. :)

Michael Della Bitta


Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game


On Wed, Aug 8, 2012 at 8:42 AM, Michael Della Bitta
 wrote:
> We've noticed some pretty non-deterministic behavior with sharded
> setups as well.
>
> One thing we've noticed is that a query server can hang on to the set
> of document ids that correspond to a given query even if caching is
> off, which results in some weird behavior, such as a query like:
>
> timestamp:[NOW TO NOW-8HOUR]
>
> Will return no results, but:
>
> timestamp:[NOW TO NOW-7HOUR]
>
> ...will, IF the former query was executed prior to a replication that
> brought in documents that match both queries.
>
> We've also noticed numFound changing during paging through query
> results, as you mention.
>
> One of our use cases is more of a reporting function and it depends on
> there being more deterministic behavior than this, so in the shifting
> numFound case, we've written code to detect a shift and restart the
> query from the beginning.
>
> In the case of cached documentIds not revealing fresher information,
> I'm worried that we're going to have to move to querying each shard in
> turn, which may mean we get left out of using SolrCloud. We haven't
> tried to evaluate it yet, however.
>
> Michael Della Bitta
>
> 
> Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
> www.appinions.com
> Where Influence Isn’t a Game
>
>
> On Wed, Aug 8, 2012 at 8:10 AM, Rohit  wrote:
>> Hi,
>>
>>
>>
>> We are using Solr 3.6 and 2 shards. We are noticing that when we fire a query
>> with start as 0 and rows X, the total numFound changes
>> when we fire the same exact query with start as y and rows X.
>>
>>
>>
>> For example.
>>
>>
>>
>> First time
>>
>> query=abc&start=0&rows=4000
>>
>> numFound- 56000
>>
>>
>>
>> Second time
>>
>> query=abc&start=4000&rows=4000
>>
>> numFound- 55998
>>
>>
>>
>> What can cause this?
>>
>>
>>
>>
>>
>>
>>
>> Regards,
>>
>> Rohit
>>
>>
>>


Re: numFound changes on changing start and rows

2012-08-08 Thread Michael Della Bitta
We've noticed some pretty non-deterministic behavior with sharded
setups as well.

One thing we've noticed is that a query server can hang on to the set
of document ids that correspond to a given query even if caching is
off, which results in some weird behavior, such as a query like:

timestamp:[NOW TO NOW-8HOUR]

Will return no results, but:

timestamp:[NOW TO NOW-7HOUR]

...will, IF the former query was executed prior to a replication that
brought in documents that match both queries.

We've also noticed numFound changing during paging through query
results, as you mention.

One of our use cases is more of a reporting function and it depends on
there being more deterministic behavior than this, so in the shifting
numFound case, we've written code to detect a shift and restart the
query from the beginning.
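The detect-and-restart logic can be sketched roughly like this — the fetch function is a stand-in for a real Solr request, not our actual code:

```python
def page_all(fetch, rows=1000, max_restarts=5):
    """Page through results, restarting from the beginning whenever
    numFound changes mid-pagination (e.g. after a replication).
    fetch(start, rows) stands in for a real Solr query and must
    return (num_found, list_of_docs)."""
    for _ in range(max_restarts):
        expected, docs = fetch(0, rows)
        collected = list(docs)
        restarted = False
        start = rows
        while start < expected:
            num_found, docs = fetch(start, rows)
            if num_found != expected:   # index changed under us: restart
                restarted = True
                break
            collected.extend(docs)
            start += rows
        if not restarted:
            return collected
    raise RuntimeError("index kept changing; gave up")

# Toy fetch: numFound shifts from 5 to 4 after the first call, then stays.
state = {"calls": 0}
def fetch(start, rows):
    state["calls"] += 1
    total = 5 if state["calls"] == 1 else 4
    return total, list(range(start, min(start + rows, total)))

print(page_all(fetch, rows=2))  # → [0, 1, 2, 3] after one restart
```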

In the case of cached documentIds not revealing fresher information,
I'm worried that we're going to have to move to querying each shard in
turn, which may mean we get left out of using SolrCloud. We haven't
tried to evaluate it yet, however.

Michael Della Bitta


Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game


On Wed, Aug 8, 2012 at 8:10 AM, Rohit  wrote:
> Hi,
>
>
>
> We are using Solr 3.6 and 2 shards. We are noticing that when we fire a query
> with start as 0 and rows X, the total numFound changes
> when we fire the same exact query with start as y and rows X.
>
>
>
> For example.
>
>
>
> First time
>
> query=abc&start=0&rows=4000
>
> numFound- 56000
>
>
>
> Second time
>
> query=abc&start=4000&rows=4000
>
> numFound- 55998
>
>
>
> What can cause this?
>
>
>
>
>
>
>
> Regards,
>
> Rohit
>
>
>


numFound changes on changing start and rows

2012-08-08 Thread Rohit
Hi,

 

We are using Solr 3.6 and 2 shards. We are noticing that when we fire a query
with start as 0 and rows X, the total numFound changes
when we fire the same exact query with start as y and rows X.

 

For example.

 

First time 

query=abc&start=0&rows=4000

numFound- 56000

 

Second time

query=abc&start=4000&rows=4000

numFound- 55998

 

What can cause this?

 

 

 

Regards,

Rohit

 



Problems with EDisMax field aliases for multiple fields on Solr 3.6.1

2012-08-08 Thread Nils Kaiser

Hey,

I'm trying to use field aliases that reference multiple fields on Solr 
3.6.1 (1362471) as stated in the EDisMax documentation 
(http://wiki.apache.org/solr/ExtendedDisMax#Field_aliasing_.2BAC8_renaming).


If I use an alias for a single field, everything is fine, but once I use 
an alias with more than one field (using syntax 
f.who.qf=name^5.0,namealias^2.0 as in the docs), the alias stops working.


Examples
(base url is 
http://localhost:8982/solr/select?debugQuery=true&fq=type%3AUser&q=name%3Amaier&fl=*+score&defType=edismax&rows=10 
+ params below, debug shows how f.name.qf is displayed in debug xml view)


1) &f.name.qf=last_name_text
- returns 39 results, debug: <str name="f.name.qf">last_name_text</str>

2) &f.name.qf=last_name_text,first_name_text
- returns 0 results, debug: <str name="f.name.qf">last_name_text,first_name_text</str>

3) &f.name.qf=last_name_text%2Cfirst_name_text
- returns 0 results, debug: <str name="f.name.qf">last_name_text,first_name_text</str>

4) &f.name.qf=first_name_text&f.name.qf=last_name_text
- returns 0 results, debug: <arr name="f.name.qf"><str>first_name_text</str><str>last_name_text</str></arr>

5) &f.name.qf=last_name_text&f.name.qf=first_name_text
- returns 39 results, debug: <arr name="f.name.qf"><str>last_name_text</str><str>first_name_text</str></arr>

6) &f.name.qf=last_name_text^2.0,first_name_text^2.0
- http error 500, java.lang.NumberFormatException: For input string: "2.0,first_name_text"

7) &f.name.qf=last_name_text^2.0%2Cfirst_name_text^2.0
- http error 500, java.lang.NumberFormatException: For input string: "2.0,first_name_text"


Comments:

1) works as expected, but uses only one field for the alias
2) does not work, but this format is explained in the docs if I 
understood it right

3) tried this to try escaping issues, but xml shows the same value
4) does not work, because SOLR seems to take first value only
5) does work, but only because SOLR takes first value (see 4)
6), 7) lead to http error, but format is same as in docs??

Any ideas whether I am doing something wrong here, or the docs are 
misleading, or there is a bug in the SOLR version I use?


Best,

Nils


Re: Recovery problem in solrcloud

2012-08-08 Thread Yonik Seeley
Stack trace looks normal - it's just a multi-term query instantiating
a bitset.  The memory is being taken up somewhere else.
How many documents are in your index?
Can you get a heap dump or use some other memory profiler to see
what's taking up the space?
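For a sense of scale: MultiTermQueryWrapperFilter (top of the trace below) allocates a FixedBitSet sized to the document count — roughly one bit per doc, stored as 64-bit words. A back-of-envelope estimate, with hypothetical index sizes:

```python
def fixed_bitset_bytes(max_doc):
    """Approximate heap cost of one FixedBitSet: one bit per document,
    rounded up to 64-bit words (ignoring object header overhead)."""
    words = (max_doc + 63) // 64
    return words * 8

# Hypothetical index sizes, to show how the per-query cost grows:
for max_doc in (10_000_000, 100_000_000, 1_000_000_000):
    mb = fixed_bitset_bytes(max_doc) / (1024 * 1024)
    print(f"{max_doc:>13,} docs -> ~{mb:,.0f} MB per bitset")
```

So on a large index, a burst of concurrent multi-term queries can allocate many such bitsets at once — consistent with the pile-up theory below.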

> if I stop query more then  ten minutes, the solr instance will start normally.

Maybe queries are piling up in threads before the server is ready to
handle them and then trying to handle them all at once gives an OOM?
Is this live traffic or a test?  How many concurrent requests get sent?

-Yonik
http://lucidimagination.com


On Wed, Aug 8, 2012 at 2:43 AM, Jam Luo  wrote:
> Aug 06, 2012 10:05:55 AM org.apache.solr.common.SolrException log
> SEVERE: null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java
> heap space
> at
> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:456)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:284)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:499)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
> at org.eclipse.jetty.server.Server.handle(Server.java:351)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
> at
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
> at
> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
> at
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.OutOfMemoryError: Java heap space
> at org.apache.lucene.util.FixedBitSet.<init>(FixedBitSet.java:54)
> at
> org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(MultiTermQueryWrapperFilter.java:104)
> at
> org.apache.lucene.search.ConstantScoreQuery$ConstantWeight.scorer(ConstantScoreQuery.java:129)
> at
> org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:318)
> at
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:507)
> at
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
> at
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1394)
> at
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1269)
> at
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:384)
> at
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:420)
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1544)
> at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
> at
> org.eclipse.jetty.servlet.ServletHandle

Paoding analyzer with solr for chinese

2012-08-08 Thread Rajani Maski
Hi All,

  As said in this blog site, the Paoding analyzer is much better for
Chinese text, so I was trying to implement it to get accurate results for
Chinese text.

I followed the instructions specified in the sites below:
Site1 & Site2

After indexing, when I search on the same field with the same text, I get no
search results (numFound=0).

And the Luke tool is not showing up any terms for the field that is indexed
with the field type below. Can anyone comment on what is going wrong?



*Schema field types for paoding:*

*1)* [fieldType XML not preserved in the archive]

And analysis page results is:
[image: Inline image 2]

*2)* [fieldType XML not preserved in the archive]

Analysis on the field "paoding_chinese" throws this error:
[image: Inline image 3]



Thanks & Regards
Rajani


Re: Designing an index with multiple entity types, sharing field names across entity-types.

2012-08-08 Thread santamaria2
To clarify a wee bit more: I'm wondering about the performance impact on
single-entity queries if I use common field names,
e.g. a 'name' field for all entity types. 'Author' & 'Book' together make up
200,000+ 'name' values. Will this affect anything if I search over
'Category'? Will using fq=type:category save me?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Designing-an-index-with-multiple-entity-types-sharing-field-names-across-entity-types-tp3999727p3999728.html
Sent from the Solr - User mailing list archive at Nabble.com.


Designing an index with multiple entity types, sharing field names across entity-types.

2012-08-08 Thread santamaria2
My question stems from a vague memory of reading somewhere that Solr's search
performance depends on the total number of 'terms' in a
field that is searched upon.

I'm setting up an index core for some autocomplete boxes on my site. There
is a search box for each facet group in my results page (suggestions for a
single entity-type), and a 'generic' search box on my header that will
display suggestions for multiple entity-types.

The entity types are: Books, Authors, Categories, Publishers.

Books, Authors --> over 100,000 of each type right now. Will grow larger.
Categories, Publishers --> around 500 of each type. Will grow slowly.

Books & Categories have 'descriptions' which I also want searchable -- with
lower boosts.

In my per-entity search boxes, for autocomplete suggestions for user input
"man", I'd do:
q=(name:man* OR description:man*^0.5)&fq=type:


For my generic search box on top of my page, I would not have fq, but
instead I'd use &group=true&group.field=type.
(type --> {'book', 'author', 'category', 'publisher'})

This seems okay, but I'm just wondering about what I said in my first
paragraph. The number of total terms of a field.

For a large index, would it be better to use more specific fields?
e.g. Instead of a common field 'name', what if I do 'author_name',
'book_name', 'publisher_name', 'category_name', 'book_description',
'category_description'?

Would this be 'faster' to search on?
For my per-entity search boxes, the query changes in an obvious manner. But
this would complicate stuff for my generic-search-box query... for which I
haven't decided on how I'd go about designing a query, yet.

What say thee?
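To make the trade-off concrete, here is a sketch of the queries the two designs would issue — field names are the ones proposed above; this only shows the query shapes, it doesn't settle the performance question by itself:

```python
from urllib.parse import urlencode

prefix = "man"  # user input in the autocomplete box

# Design A: shared fields, filtered by type. The type filter is a cheap,
# cacheable fq; the 'name' field's term dictionary is shared by all types.
shared = urlencode({
    "q": f"(name:{prefix}* OR description:{prefix}*^0.5)",
    "fq": "type:category",
})

# Design B: per-entity fields. Each field has a smaller term dictionary,
# but the generic search box must OR across all of them.
per_type_fields = ["author_name", "book_name", "publisher_name",
                   "category_name"]
per_type = urlencode({
    "q": " OR ".join(f"{f}:{prefix}*" for f in per_type_fields),
    "group": "true",
    "group.field": "type",
})

print(shared)
print(per_type)
```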



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Designing-an-index-with-multiple-entity-types-sharing-field-names-across-entity-types-tp3999727.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Connect to SOLR over socket file

2012-08-08 Thread Michael Kuhlmann

On 07.08.2012 21:43, Jason Axelson wrote:

Hi,

Is it possible to connect to SOLR over a socket file as is possible
with mysql? I've looked around and I get the feeling that I may be
mi-understanding part of SOLR's architecture.

Any pointers are welcome.

Thanks,
Jason


Hi Jason,

not that I know of. This has nothing to do with Solr, it depends on the 
web server you are using. Tomcat, Jetty and the others are using TCP/IP 
directly through java.io or java.nio classes, and Solr is just one web 
app that is handled by them.


Java web servers typically run on a separate host, and in contrast to 
MySQL, a local deployment is the exception rather than the rule.


If you don't want the network overhead, then use an embedded Solr 
server: http://wiki.apache.org/solr/EmbeddedSolr


Greetings,
Kuli