RE: out of memory during indexing due to large incoming queue
Solrconfig.xml - http://apaste.info/dsbv
Schema.xml - http://apaste.info/67PI

This solrconfig.xml file has optimization enabled. I had another file, which I can't locate at the moment, in which I defined a custom merge scheduler in order to disable optimization. When I say 1000 segments, I mean that's the number I saw in the Solr UI. I assume there were many more files than that.

Thanks,
Yoni

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Sunday, June 02, 2013 22:53
To: solr-user@lucene.apache.org
Subject: Re: out of memory during indexing due to large incoming queue

On 6/2/2013 12:25 PM, Yoni Amir wrote:

Hi Shawn and Shreejay, thanks for the response. Here is some more information:

1) The machine is a virtual machine on an ESX server. It has 4 CPUs and 8GB of RAM; I don't remember which CPU, but something modern enough. It is running Java 7 without any special parameters, with 4GB allocated for Java (-Xmx).
2) After successful indexing, I have 2.5 million documents and a 117GB index. This is the size after it was optimized.
3) I plan to upgrade to 4.3, I just didn't have time. 4.0 beta is what was available at the time we had a release deadline.
4) The setup is master-slave replication, not SolrCloud. The server that I am discussing is the indexing server; in these tests there were actually no slaves involved, and virtually zero searches were performed.
5) Attached is my configuration. I tried to disable the warm-up and opening of searchers; it didn't change anything. The commits are done by Solr, using autocommit. The client sends the updates without a commit command.
6) I want to disable optimization, but when I disabled it, the OOME occurred even faster. The number of segments reached around a thousand within an hour or so. I don't know if that's normal or not, but at that point, if I restarted Solr, it immediately took about 1GB of heap space just on start-up, instead of the usual 50MB or so.

If I commit less frequently, don't I increase the risk of losing data, e.g., if the power goes down? If I disable optimization, is it necessary to avoid such a large number of segments? Is it possible?

Last part first: losing data is much less of a risk with Solr 4.x, if you have enabled the updateLog.

We'll need some more info; see the end of the message for specifics. Right off the bat, I can tell you that with an index that's 117GB, you're going to need a LOT of RAM. Each of my 4.2.1 servers has 42GB of index and about 37 million documents between all the index shards. The web application never uses facets, which tend to use a lot of memory. My index is a lot smaller than yours, and I need a 6GB heap, seeing OOM errors if it's only 4GB. You probably need at least an 8GB heap, and possibly larger.

Beyond the amount of memory that Solr itself uses, for good performance you will also need a large amount of memory for OS disk caching. Unless the server is using SSD, you need to allocate at least 64GB of real memory to the virtual machine. If you've got your index on SSD, 32GB might be enough. I've got 64GB total on my servers.

http://wiki.apache.org/solr/SolrPerformanceProblems

When you say that there are over 1000 segments, are you seeing 1000 files, or are there literally 1000 segments, giving you between 12000 and 15000 files? Even if your mergeFactor were higher than the default 10, that just shouldn't happen.

Can you share your solrconfig.xml and schema.xml? Use a paste website like http://apaste.info and share the URLs.
Thanks,
Shawn
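(For reference, the two safety nets Shawn mentions, the updateLog and a timed hard commit, look roughly like this in a 4.x solrconfig.xml; the 60-second interval below is an illustrative placeholder, not a recommendation:)

    <updateHandler class="solr.DirectUpdateHandler2">
      <!-- transaction log: lets Solr replay uncommitted updates after a crash -->
      <updateLog>
        <str name="dir">${solr.ulog.dir:}</str>
      </updateLog>
      <!-- timed hard commit; openSearcher=false keeps the commit cheap
           because no new searcher (and no cache warming) is triggered -->
      <autoCommit>
        <maxTime>60000</maxTime>
        <openSearcher>false</openSearcher>
      </autoCommit>
    </updateHandler>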
Solr + Groovy
Hi, I have some query building and result processing code, which is currently running as normal Solr client outside of Solr. I think it would make a lot of sense to move parts of this code into a custom SearchHandler or SearchComponent. Because I'm not a big fan of the Java language, I would like to use Groovy. Searching the web I got the impression that Solr + alternative JVM languages is not a very common topic. So before starting my project, I would like to know: Is there a well known good reason not to use Groovy (or Clojure, Scala, ...) for implementing custom Solr code? kind regards, Achim
how are you handling killer queries?
How are you handling killer queries with Solr? While Solr/Lucene (currently 4.2.1) tries to do its best, I sometimes see stupid queries in my logs, recognizable by their extremely long query times. Example:

q=???+and+??+and+???+and++and+???+and+??

I even get hits for this (hits=34091309 status=0 QTime=88667). But the Jetty log says:

WARN:oejs.Response:Committed before 500 {msg=Datenübergabe unterbrochen (broken pipe),trace=org.eclipse.jetty.io.EofException... org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:838)|?... 35 more|,code=500}
WARN:oejs.ServletHandler:/solr/base/select java.lang.IllegalStateException: Committed at org.eclipse.jetty.server.Response.resetBuffer(Response.java:1136)

Because I get hits and a QTime, the search was successful, right? But Jetty/HTTP has already closed the connection and Solr doesn't know about it? How are you handling killer queries, just ignoring them? Or is there something to tune (Jetty timeout config) or filter (query filtering)? Would be pleased to hear your comments.

Bernd
Re: Estimating the required volume to
Thanks for your answer. Can you please elaborate on "mssql text searching is pretty primitive compared to Solr" (a link or anything)? Thanks.

On Sun, Jun 2, 2013 at 4:54 PM, Erick Erickson erickerick...@gmail.com wrote:

1) Maybe, maybe not. mssql text searching is pretty primitive compared to Solr, just as Solr's db-like operations are primitive compared to mssql. They address different use-cases. So, you can store the docs in Solr and not touch your SQL db at all to return the docs. You can store just the IDs in Solr and retrieve your docs from the SQL store. You can store just enough data in Solr to display the results page, and when the user tries to drill down you can go to your SQL database for assembling the full document. You can... It all depends on your use case, data size, all that rot.

Very often, something like the DB is considered the system-of-record and it's indexed to Solr (see DIH or SolrJ) periodically. There is no underlying connection between your SQL store and Solr. You control when data is fetched from SQL and put into Solr. You control what the search experience is. Etc.

2) Not really :(. See: http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Best,
Erick

On Sat, Jun 1, 2013 at 1:07 PM, Mysurf Mail stammail...@gmail.com wrote:

Hi, I am just starting to learn about Solr. I want to test it in my environment, working with MS SQL Server. I have followed the tutorial and imported some rows into Solr. Now I have a few noob questions regarding the benefits of implementing Solr in a SQL environment.

1. As I understand it, when I send a query request over HTTP, I receive a result with IDs from the Solr system and then I query the full object row from the DB. Is that right? Is there a comparison with MS SQL full-text search, which retrieves the full object in the same select? Is there a comparison that relates to db/server clusters and multiple machines?
2. Is there a technique that will assist me in estimating the volume size I will need for the indexed data (obviously, based on the indexed data properties)?
Re: Removing a single value from a multiValue field
On Thu, May 30, 2013 at 5:01 PM, Jack Krupansky j...@basetechnology.com wrote:

You gave an XML example, so I assumed you were working with XML!

Right, I did give the output as XML. I find XML to be a great document markup language, but a terrible command format! Mostly due to the (mis-)use of attributes.

In JSON...

[{"id": "doc-id", "tags": {"add": ["a", "b"]}}]

and

[{"id": "doc-id", "tags": {"set": null}}]

Thank you! That is much more intuitive and less ambiguous than the XML, would you not agree?

BTW, this kind of stuff is covered in the book, separate chapters for XML and JSON, each with dozens of examples like this.

I have not posted on the book postings, but I will definitely order one. My vote is for spiral bound, though I know that the perfect-bound will look more professional on a bookshelf. I don't even care what the book costs, within reason. Any resource that compiles in a single package the wonderful methods that you and other contributors mention here and in other places online will pay for itself in short order. Apache Solr is an amazing product, but it is often obtuse and unintuitive. Other times one does not even know what Solr is capable of, as was the case in this thread, where I was parsing entire documents just to change the value of a multiValued field.

Thank you very much!

--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
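(For reference, the JSON atomic-update syntax above can be posted straight to a 4.x update handler with curl; the URL, id, and field names here are only placeholders:)

    curl 'http://localhost:8983/solr/update?commit=true' \
      -H 'Content-type:application/json' \
      -d '[{"id": "doc-id", "tags": {"add": ["a", "b"]}}]'

Note that atomic updates need the uniqueKey field and the other fields to be stored, since Solr rebuilds the rest of the document from its stored values.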
/non/existent/dir/yields/warning
Hi, I am constantly getting this error in my Solr log:

Can't find (or read) directory to add to classloader: /non/existent/dir/yields/warning (resolved as: E:\Projects\apache_solr\solr-4.3.0\example\solr\genesis_experimental\non\existent\dir\yields\warning).

Anyone got any idea on how to solve this?

--
Regards,
Raheel Hasan
Re: /non/existent/dir/yields/warning
Hello!

You should remove that entry from your solrconfig.xml file. It is something like this:

    <lib dir="/non/existent/dir/yields/warning" />

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

Hi, I am constantly getting this error in my Solr log: Can't find (or read) directory to add to classloader: /non/existent/dir/yields/warning (resolved as: E:\Projects\apache_solr\solr-4.3.0\example\solr\genesis_experimental\non\existent\dir\yields\warning). Anyone got any idea on how to solve this?
Re: /non/existent/dir/yields/warning
ok thanks :) But why was it there anyway? I mean, it says in the comments: "If a 'dir' option (with or without a regex) is used and nothing is found that matches, a warning will be logged." So it looks like a kind of exception handling or logging for libs not found... so shouldn't this folder actually exist?

On Mon, Jun 3, 2013 at 2:06 PM, Rafał Kuć r@solr.pl wrote:

Hello! You should remove that entry from your solrconfig.xml file. It is something like this: <lib dir="/non/existent/dir/yields/warning" />

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

Hi, I am constantly getting this error in my Solr log: Can't find (or read) directory to add to classloader: /non/existent/dir/yields/warning (resolved as: E:\Projects\apache_solr\solr-4.3.0\example\solr\genesis_experimental\non\existent\dir\yields\warning). Anyone got any idea on how to solve this?

--
Regards,
Raheel Hasan
HostPort attribute of core tag in solr.xml
Hi, I am not quite sure what the hostPort attribute in the core tag of solr.xml means. Can someone please let me know? Thanks, Prathik
Constant score for more like this reference document
I call the mlt handler using a query which searches for a certain document (?q=id:some_document_id). The reference document is included in the result and the score is also returned. I found out that the score is fixed, independent of the document: for each document id I get the same score. The score varies between cores, but is fixed per core. I'm aware of all the warnings about scores not being absolute values and that you cannot compare them. But I wonder why the value is fixed per core. Is it just a random value, or is it possible to explain how it's calculated? I'm just digging into the code to get a better understanding of the inner workings, but I'm not yet deep enough. Feel free to point me to the relevant code snippets!

kind regards,
Achim
Re: /non/existent/dir/yields/warning
Hello!

That's a good question. I suppose it's there to show users how to set up a custom path to libraries.

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

ok thanks :) But why was it there anyway? I mean, it says in the comments: "If a 'dir' option (with or without a regex) is used and nothing is found that matches, a warning will be logged." So it looks like a kind of exception handling or logging for libs not found... so shouldn't this folder actually exist?

On Mon, Jun 3, 2013 at 2:06 PM, Rafał Kuć r@solr.pl wrote:

Hello! You should remove that entry from your solrconfig.xml file. It is something like this: <lib dir="/non/existent/dir/yields/warning" />

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

Hi, I am constantly getting this error in my Solr log: Can't find (or read) directory to add to classloader: /non/existent/dir/yields/warning (resolved as: E:\Projects\apache_solr\solr-4.3.0\example\solr\genesis_experimental\non\existent\dir\yields\warning). Anyone got any idea on how to solve this?
Re: How can a Tokenizer be CoreAware?
Benson, I think the idea is that Tokenizers are created as needed (from the TokenizerFactory), while those other objects are singular (one created for each corresponding stanza in solrconfig.xml). So Tokenizers should be short-lived; they'll be cleaned up after each use, and the assumption is you wouldn't need to do any cleanup yourself; rather, just let the garbage collector do its work -- assuming these are per-document resources. But if you have longer-lived resources, maybe you could manage them in the TokenizerFactory, which will be a singleton? Or in an UpdateRequestProcessorFactory, like you suggested.

-Mike

On 5/29/13 7:36 AM, Benson Margulies wrote:

I am currently testing some things with Solr 4.0.0. I tried to make a tokenizer CoreAware, and was rewarded with:

Caused by: org.apache.solr.common.SolrException: Invalid 'Aware' object: com.basistech.rlp.solr.RLPTokenizerFactory@19336006 -- org.apache.solr.util.plugin.SolrCoreAware must be an instance of:
[org.apache.solr.request.SolrRequestHandler]
[org.apache.solr.response.QueryResponseWriter]
[org.apache.solr.handler.component.SearchComponent]
[org.apache.solr.update.processor.UpdateRequestProcessorFactory]
[org.apache.solr.handler.component.ShardHandlerFactory]

I need this to allow cleanup of some cached items in the tokenizer. Questions:
1: will a newer version allow me to do this directly?
2: is there some other approach that anyone would recommend? I could, for example, make a fake object in the list above to act as a singleton with a static accessor, but that seems pretty ugly.
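(A rough Java sketch of the factory-owned-cache idea Mike describes, written against the Lucene/Solr 4.0 TokenizerFactory API, where create takes a Reader; the signature changed in later releases. MyResource and loadExpensiveResource are hypothetical stand-ins, not real Solr classes:)

    import java.io.Reader;
    import java.util.Map;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.core.WhitespaceTokenizer;
    import org.apache.lucene.analysis.util.TokenizerFactory;
    import org.apache.lucene.util.Version;

    public class CachingTokenizerFactory extends TokenizerFactory {
      // long-lived state lives here: one factory instance per field type
      private volatile MyResource cached;               // hypothetical resource type

      @Override
      public void init(Map<String, String> args) {
        super.init(args);
        // loaded once when the schema is initialized, not per document
        cached = loadExpensiveResource(args.get("resourcePath")); // hypothetical loader
      }

      @Override
      public Tokenizer create(Reader input) {
        // Tokenizers stay short-lived and merely borrow 'cached'; a real
        // implementation would return its own Tokenizer subclass built around it
        return new WhitespaceTokenizer(Version.LUCENE_40, input);
      }
    }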
Re: Solr + Groovy
On 6/3/13 3:07 AM, Achim Domma wrote: Hi, I have some query building and result processing code, which is currently running as normal Solr client outside of Solr. I think it would make a lot of sense to move parts of this code into a custom SearchHandler or SearchComponent. Because I'm not a big fan of the Java language, I would like to use Groovy. Searching the web I got the impression that Solr + alternative JVM languages is not a very common topic. So before starting my project, I would like to know: Is there a well known good reason not to use Groovy (or Clojure, Scala, ...) for implementing custom Solr code? kind regards, Achim Check out Paul Nelson's work, presented at Lucene Revolution 2013: http://www.lucenerevolution.org/sites/default/files/Advanced%20Query%20Parsing%20Techniques.pdf He reported success using Groovy embedded in Solr to generate queries -Mike
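(Whatever the JVM language, the surface area involved is small. Here is the bare Solr 4.x SearchComponent contract in Java for reference; a Groovy or Scala class on the classpath would override the same two methods. The class and response-section names are placeholders:)

    import java.io.IOException;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;

    public class MyComponent extends SearchComponent {
      @Override
      public void prepare(ResponseBuilder rb) throws IOException {
        // runs before the query executes: inspect or rewrite rb.req.getParams() here
      }

      @Override
      public void process(ResponseBuilder rb) throws IOException {
        // runs after the query executes: post-process results, add response sections
        rb.rsp.add("myComponent", "ok");
      }

      @Override
      public String getDescription() { return "custom query building / result processing"; }

      @Override
      public String getSource() { return ""; }
    }

It would then be registered in solrconfig.xml with <searchComponent name="mine" class="com.example.MyComponent"/> and listed in a handler's components.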
Re: Reindexing strategy
On Fri, May 31, 2013 at 3:57 AM, Michael Sokolov msoko...@safaribooksonline.com wrote:

On UNIX platforms, take a look at vmstat for basic I/O measurement, and iostat for more detailed stats. One coarse measurement is the number of blocked/waiting processes - usually this is due to I/O contention - and you will want to look at the paging and swapping numbers - you don't want any swapping at all. But the best single number to look at is overall disk activity, which is the I/O percentage utilized number Shaun was mentioning. -Mike

Great, thanks! I've got some terms to google. For those who follow in my footsteps, on Ubuntu the package 'sysstat' needs to be installed to use iostat. Here are my reference stats before starting to experiment, both for my own use later to compare, and also if anybody sees anything amiss here then I would love to know about it. If there is any fine manual that is particularly urgent that I should read, please do mention it. Thanks!

--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
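(For anyone following along, the numbers Mike mentions come straight out of invocations like these; the 5-second interval is arbitrary:)

    vmstat 5       # 'b' column = blocked/waiting processes; si/so = swap in/out, ideally 0
    iostat -x 5    # extended per-device stats; '%util' is the disk-utilization figure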
SpatialRecursivePrefixTreeFieldType Spatial Searching
Hi, I'm seeing really slow query times, 7-25 seconds, when I run a simple filter query that uses my SpatialRecursivePrefixTreeFieldType field.

My index is about 30k documents. Prior to adding the spatial field, the on-disk size was about 100MB, so it's a really tiny index. Once I add the spatial field (which is multi-valued), the index size jumps up to 2GB. (Is this normal?) Only about 10k documents will have any spatial data. Typically, they will have at most 10 shapes each, but the majority are all one of two rectangles.

This is my fieldType definition:

    <fieldType name="date_availability" class="solr.SpatialRecursivePrefixTreeFieldType"
               geo="false" worldBounds="0 0 3650 1"
               distErrPct="0" maxDistErr="1" units="degrees" />

And the field:

    <field name="availability_spatial" type="date_availability" indexed="true" stored="false" multiValued="true" />

I am using the field to represent approximately 10 years after January 1st, 2013, where each day is along the X-axis. Because the availability starts and ends at 2pm and 10am, I was using a decimal place when creating my shape to show that detail. (Is this approach wrong?) So a typical rectangle when indexed would be (minX minY maxX maxY):

    Rectangle 100.6 0 120.4 1

Is it wrong that my Y and X values are not of the same scale? Since I don't care about the Y axis at all, I just set it to be of height 1 always.

I'm running Solr 4.3, with a small JVM of 768M (can be increased). And I have 2GB RAM. (Again, can be increased.)

Thanks
ContributorsGroup
Hi, Could you please add EmrahKara to ContributorsGroup in the Solr wiki?

--
Emrah Kara
Developer at CNT (http://www.cntbilisim.com.tr/)
Email / Gtalk: em...@cntbilisim.com.tr
Skype: rockipsiz
TEL: +90 232 3481851  GSM: +90 533 3634362  FAX: +90 232 3481861
283/14 Sk No 4 Ender Apt. D:4 Mansuroglu Mah. Bayrakli IZMIR TURKEY
www.tamindir.com
Re: ContributorsGroup
Done, looking forward to your contributions!

Erick

On Mon, Jun 3, 2013 at 7:22 AM, Emrah Kara em...@cntbilisim.com.tr wrote:

Hi, Could you please add EmrahKara to ContributorsGroup in the Solr wiki?
Re: SpatialRecursivePrefixTreeFieldType Spatial Searching
Also, here is a sample query and the debugQuery output:

    fq={!cost=200}*:* -availability_spatial:Intersects(182.6 0 199.4 1)

In case the formatting is bad, here is a raw paste of the debugQuery: http://pastie.org/pastes/872/text?key=ksjyboect4imrha0rck8sa

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">8171</int>
        <lst name="params">
          <str name="debugQuery">true</str>
          <str name="indent">true</str>
          <str name="q">*:*</str>
          <str name="_">1370259235923</str>
          <str name="wt">xml</str>
          <str name="fq">{!cost=200}*:* -availability_spatial:Intersects(182.6 0 199.4 1)</str>
          <str name="rows">0</str>
        </lst>
      </lst>
      <result name="response" numFound="16137" start="0"/>
      <lst name="debug">
        <str name="rawquerystring">*:*</str>
        <str name="querystring">*:*</str>
        <str name="parsedquery">MatchAllDocsQuery(*:*)</str>
        <str name="parsedquery_toString">*:*</str>
        <lst name="explain"/>
        <str name="QParser">LuceneQParser</str>
        <arr name="filter_queries">
          <str>{!cost=200}*:* -availability_spatial:Intersects(182.6 0 199.4 1)</str>
        </arr>
        <arr name="parsed_filter_queries">
          <str>+MatchAllDocsQuery(*:*) -ConstantScore(org.apache.lucene.spatial.prefix.IntersectsPrefixTreeFilter@42ce603b)</str>
        </arr>
        <lst name="timing">
          <double name="time">8171.0</double>
          <lst name="prepare">
            <double name="time">1.0</double>
            <lst name="query"><double name="time">0.0</double></lst>
            <lst name="facet"><double name="time">0.0</double></lst>
            <lst name="mlt"><double name="time">1.0</double></lst>
            <lst name="highlight"><double name="time">0.0</double></lst>
            <lst name="stats"><double name="time">0.0</double></lst>
            <lst name="debug"><double name="time">0.0</double></lst>
          </lst>
          <lst name="process">
            <double name="time">8170.0</double>
            <lst name="query"><double name="time">8170.0</double></lst>
            <lst name="facet"><double name="time">0.0</double></lst>
            <lst name="mlt"><double name="time">0.0</double></lst>
            <lst name="highlight"><double name="time">0.0</double></lst>
            <lst name="stats"><double name="time">0.0</double></lst>
            <lst name="debug"><double name="time">0.0</double></lst>
          </lst>
        </lst>
      </lst>
    </response>

On Mon, Jun 3, 2013 at 12:27 PM, Chris Atkinson chrisa...@gmail.com wrote:

Hi, I'm seeing really slow query times, 7-25 seconds, when I run a simple filter query that uses my SpatialRecursivePrefixTreeFieldType field. My index is about 30k documents. Prior to adding the spatial field, the on-disk size was about 100MB, so it's a really tiny index. Once I add the spatial field (which is multi-valued), the index size jumps up to 2GB. (Is this normal?) Only about 10k documents will have any spatial data. Typically, they will have at most 10 shapes each, but the majority are all one of two rectangles. This is my fieldType definition:

    <fieldType name="date_availability" class="solr.SpatialRecursivePrefixTreeFieldType"
               geo="false" worldBounds="0 0 3650 1"
               distErrPct="0" maxDistErr="1" units="degrees" />

And the field:

    <field name="availability_spatial" type="date_availability" indexed="true" stored="false" multiValued="true" />

I am using the field to represent approximately 10 years after January 1st, 2013, where each day is along the X-axis. Because the availability starts and ends at 2pm and 10am, I was using a decimal place when creating my shape to show that detail. (Is this approach wrong?) So a typical rectangle when indexed would be (minX minY maxX maxY): Rectangle 100.6 0 120.4 1. Is it wrong that my Y and X values are not of the same scale? Since I don't care about the Y axis at all, I just set it to be of height 1 always. I'm running Solr 4.3, with a small JVM of 768M (can be increased). And I have 2GB RAM. (Again, can be increased.) Thanks
Re: Estimating the required volume to
Here's a link to various transformations you can do while indexing and searching in Solr: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Consider:
- stemming
- ngrams
- WordDelimiterFilterFactory
- ASCIIFoldingFilterFactory
- phrase queries
- boosting
- synonyms
- blah blah blah

You can't do a lot of these transformations, at least not easily, in SQL. OTOH, you can't do 5-way joins in Solr. Different problems, different tools.

All that said, there's no good reason to use Solr if your use-case is satisfied by simple keyword searches that have no transformations; mysql etc. work just fine in those cases. It's all about selecting the right tool for the use-case.

FWIW,
Erick

On Mon, Jun 3, 2013 at 4:44 AM, Mysurf Mail stammail...@gmail.com wrote:

Thanks for your answer. Can you please elaborate on "mssql text searching is pretty primitive compared to Solr" (a link or anything)? Thanks.

On Sun, Jun 2, 2013 at 4:54 PM, Erick Erickson erickerick...@gmail.com wrote:

1) Maybe, maybe not. mssql text searching is pretty primitive compared to Solr, just as Solr's db-like operations are primitive compared to mssql. They address different use-cases. So, you can store the docs in Solr and not touch your SQL db at all to return the docs. You can store just the IDs in Solr and retrieve your docs from the SQL store. You can store just enough data in Solr to display the results page, and when the user tries to drill down you can go to your SQL database for assembling the full document. You can... It all depends on your use case, data size, all that rot. Very often, something like the DB is considered the system-of-record and it's indexed to Solr (see DIH or SolrJ) periodically. There is no underlying connection between your SQL store and Solr. You control when data is fetched from SQL and put into Solr. You control what the search experience is. Etc.

2) Not really :(. See: http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Best,
Erick

On Sat, Jun 1, 2013 at 1:07 PM, Mysurf Mail stammail...@gmail.com wrote:

Hi, I am just starting to learn about Solr. I want to test it in my environment, working with MS SQL Server. I have followed the tutorial and imported some rows into Solr. Now I have a few noob questions regarding the benefits of implementing Solr in a SQL environment.

1. As I understand it, when I send a query request over HTTP, I receive a result with IDs from the Solr system and then I query the full object row from the DB. Is that right? Is there a comparison with MS SQL full-text search, which retrieves the full object in the same select? Is there a comparison that relates to db/server clusters and multiple machines?
2. Is there a technique that will assist me in estimating the volume size I will need for the indexed data (obviously, based on the indexed data properties)?
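(A rough SolrJ 4.x sketch of the middle option Erick lists, keeping only IDs in Solr and fetching full rows from SQL Server; the core name, query, field names, table, and JDBC URL are all placeholders:)

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrDocument;

    public class SearchThenFetch {
      public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("body:court");
        q.setFields("id");   // Solr returns only the key...
        q.setRows(20);

        Connection db = DriverManager.getConnection(
            "jdbc:sqlserver://dbhost;databaseName=docs", "user", "password");
        PreparedStatement ps = db.prepareStatement("SELECT * FROM documents WHERE id = ?");

        for (SolrDocument doc : solr.query(q).getResults()) {
          ps.setString(1, (String) doc.getFieldValue("id"));
          ResultSet row = ps.executeQuery();   // ...and the system-of-record supplies the document
          // render row here
          row.close();
        }
        ps.close();
        db.close();
        solr.shutdown();
      }
    }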
Re: /non/existent/dir/yields/warning
Hi, but the path looks like it shows how to set up a "non existent lib" warning... :D

On Mon, Jun 3, 2013 at 2:56 PM, Rafał Kuć r@solr.pl wrote:

Hello! That's a good question. I suppose it's there to show users how to set up a custom path to libraries.

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

ok thanks :) But why was it there anyway? I mean, it says in the comments: "If a 'dir' option (with or without a regex) is used and nothing is found that matches, a warning will be logged." So it looks like a kind of exception handling or logging for libs not found... so shouldn't this folder actually exist?

On Mon, Jun 3, 2013 at 2:06 PM, Rafał Kuć r@solr.pl wrote:

Hello! You should remove that entry from your solrconfig.xml file. It is something like this: <lib dir="/non/existent/dir/yields/warning" />

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

Hi, I am constantly getting this error in my Solr log: Can't find (or read) directory to add to classloader: /non/existent/dir/yields/warning (resolved as: E:\Projects\apache_solr\solr-4.3.0\example\solr\genesis_experimental\non\existent\dir\yields\warning). Anyone got any idea on how to solve this?

--
Regards,
Raheel Hasan
Re: FieldCache insanity with field used as facet and group
I'm reproducing the problem with the 4.2.1 example with 2 shards.

1) Started up the Solr shards, indexed the example data, and confirmed empty fieldCaches:

    [sanniere@funlevel-dx example]$ java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar
    [sanniere@funlevel-dx example2]$ java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar

2) Used both grouping and faceting on the popularity field, then checked the fieldCache insanity count:

    [sanniere@funlevel-dx example]$ curl -sS "http://localhost:8983/solr/select?q=*:*&group=true&group.field=popularity" > /dev/null
    [sanniere@funlevel-dx example]$ curl -sS "http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=popularity" > /dev/null
    [sanniere@funlevel-dx example]$ curl -sS "http://localhost:8983/solr/admin/mbeans?stats=true&key=fieldCache&wt=json&indent=true" | grep -E "entries_count|insanity_count"

    entries_count: 10,
    insanity_count: 2,
    insanity#0: VALUEMISMATCH: Multiple distinct value objects for SegmentCoreReader(owner=_g(4.2.1):C1)+popularity\n\t'SegmentCoreReader(owner=_g(4.2.1):C1)'='popularity',class org.apache.lucene.index.SortedDocValues,0.5=org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#12129794\n\t'SegmentCoreReader(owner=_g(4.2.1):C1)'='popularity',int,null=org.apache.lucene.search.FieldCacheImpl$IntsFromArray#12298774\n\t'SegmentCoreReader(owner=_g(4.2.1):C1)'='popularity',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=org.apache.lucene.search.FieldCacheImpl$IntsFromArray#12298774\n,
    insanity#1: VALUEMISMATCH: Multiple distinct value objects for SegmentCoreReader(owner=_f(4.2.1):C9)+popularity\n\t'SegmentCoreReader(owner=_f(4.2.1):C9)'='popularity',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=org.apache.lucene.search.FieldCacheImpl$IntsFromArray#16648315\n\t'SegmentCoreReader(owner=_f(4.2.1):C9)'='popularity',int,null=org.apache.lucene.search.FieldCacheImpl$IntsFromArray#16648315\n\t'SegmentCoreReader(owner=_f(4.2.1):C9)'='popularity',class org.apache.lucene.index.SortedDocValues,0.5=org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#1130715\n}}}, HIGHLIGHTING,{}, OTHER,{}]}

I've updated https://issues.apache.org/jira/browse/SOLR-4866

Elodie

On 28.05.2013 10:22, Elodie Sannier wrote:

I've created https://issues.apache.org/jira/browse/SOLR-4866

Elodie

On 07.05.2013 18:19, Chris Hostetter wrote:

: I am using the Lucene FieldCache with SolrCloud and I have insane instances
: with messages like:

FWIW: I'm the one that named the result of these sanity checks "FieldCacheInsanity" and I have regretted it ever since -- a better label would have been "inconsistency".

: VALUEMISMATCH: Multiple distinct value objects for
: SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)+merchantid
: 'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'='merchantid',class org.apache.lucene.index.SortedDocValues,0.5=org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#557711353
: 'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'='merchantid',int,null=org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713
: 'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'='merchantid',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713
:
: All insane instances are for a field merchantid of type int used as facet
: and group field.
Interesting: it appears that the grouping code and the facet code are not being consistent in how they are building the field cache, so you are getting two objects in the cache for each segment.

I haven't checked if this happens much with the example configs, but if you could: please file a bug with the details of which Solr version you are using, along with the schema fieldType and field declarations for your merchantid field, along with the mbean stats output showing the field cache insanity after executing two queries like...

    /select?q=*:*&facet=true&facet.field=merchantid
    /select?q=*:*&group=true&group.field=merchantid

(that way we can rule out your custom SearchComponent as having a bug in it)

: This insanity can have performance impact ?
: How can I fix it ?

The impact is just that more RAM is being used than is strictly necessary. Unless there is something unusual in your fieldType declaration, I don't think there is an easy fix you can apply -- we need to fix the underlying code.

-Hoss

--
Elodie Sannier, Software engineer, Kelkoo
elodie.sann...@kelkoo.fr
Multitable import - uniqueKey
Hi, I am importing multiple tables (by join) into Solr using DIH. All is set, except for one confusion: what to do with the uniqueKey in the schema? When I had only 1 table, it was fine. Now how do I put 2 uniqueKeys (each from a different table)? For example:

    <uniqueKey>table1_id</uniqueKey>
    <uniqueKey>table2_id</uniqueKey>

Will this work?

--
Regards,
Raheel Hasan
Re: Estimating the required volume to
Hi, Thanks for your answer. I want to refer to your message, because I am trying to choose the right tool.

1. Regarding stemming: I am running in ms-sql

    SELECT * FROM sys.dm_fts_parser ('FORMSOF(INFLECTIONAL,provide)', 1033, 0, 0)

and I receive:

    group_id  phrase_id  occurrence  special_term  display_term  expansion_type  source_term
    1         0          1           Exact Match   provided      2               provide
    1         0          1           Exact Match   provides      2               provide
    1         0          1           Exact Match   providing     2               provide
    1         0          1           Exact Match   provide       0               provide

Isn't that stemming?

2. Regarding synonyms: SQL Server has a full thesaurus feature (http://msdn.microsoft.com/en-us/library/ms142491.aspx). Doesn't that mean synonyms?

On Mon, Jun 3, 2013 at 2:43 PM, Erick Erickson erickerick...@gmail.com wrote:

Here's a link to various transformations you can do while indexing and searching in Solr: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Consider stemming, ngrams, WordDelimiterFilterFactory, ASCIIFoldingFilterFactory, phrase queries, boosting, synonyms, blah blah blah. You can't do a lot of these transformations, at least not easily, in SQL. OTOH, you can't do 5-way joins in Solr. Different problems, different tools. All that said, there's no good reason to use Solr if your use-case is satisfied by simple keyword searches that have no transformations; mysql etc. work just fine in those cases. It's all about selecting the right tool for the use-case. FWIW, Erick

On Mon, Jun 3, 2013 at 4:44 AM, Mysurf Mail stammail...@gmail.com wrote:

Thanks for your answer. Can you please elaborate on "mssql text searching is pretty primitive compared to Solr" (a link or anything)? Thanks.

On Sun, Jun 2, 2013 at 4:54 PM, Erick Erickson erickerick...@gmail.com wrote:

1) Maybe, maybe not. mssql text searching is pretty primitive compared to Solr, just as Solr's db-like operations are primitive compared to mssql. They address different use-cases. So, you can store the docs in Solr and not touch your SQL db at all to return the docs. You can store just the IDs in Solr and retrieve your docs from the SQL store. You can store just enough data in Solr to display the results page, and when the user tries to drill down you can go to your SQL database for assembling the full document. You can... It all depends on your use case, data size, all that rot. Very often, something like the DB is considered the system-of-record and it's indexed to Solr (see DIH or SolrJ) periodically. There is no underlying connection between your SQL store and Solr. You control when data is fetched from SQL and put into Solr. You control what the search experience is. Etc.

2) Not really :(. See: http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Best,
Erick

On Sat, Jun 1, 2013 at 1:07 PM, Mysurf Mail stammail...@gmail.com wrote:

Hi, I am just starting to learn about Solr. I want to test it in my environment, working with MS SQL Server. I have followed the tutorial and imported some rows into Solr. Now I have a few noob questions regarding the benefits of implementing Solr in a SQL environment.

1. As I understand it, when I send a query request over HTTP, I receive a result with IDs from the Solr system and then I query the full object row from the DB. Is that right? Is there a comparison with MS SQL full-text search, which retrieves the full object in the same select? Is there a comparison that relates to db/server clusters and multiple machines?
2. Is there a technique that will assist me in estimating the volume size I will need for the indexed data (obviously, based on the indexed data properties)?
Re: how are you handling killer queries?
On 6/3/2013 2:39 AM, Bernd Fehling wrote: How are you handling killer queries with solr? While solr/lucene (currently 4.2.1) is trying to do its best I see sometimes stupid queries in my logs, located with extremly long query time. Example: q=???+and+??+and+???+and++and+???+and+?? I even get hits for this (hits=34091309 status=0 QTime=88667). But the jetty log says: WARN:oejs.Response:Committed before 500 {msg=Datenübergabe unterbrochen (broken pipe),trace=org.eclipse.jetty.io.EofException... org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:838)|?... 35 more|,code=500} WARN:oejs.ServletHandler:/solr/base/select java.lang.IllegalStateException: Committed at org.eclipse.jetty.server.Response.resetBuffer(Response.java:1136) Because I get hits and qtime the search is successful, right? But jetty/http has already closed the connection and solr doesn't know about this? How are you handling killer queries, just ignoring? Or something to tune (jetty config about timeout) or filter (query filtering)? As you might know, EofException happens when one end (usually the client) closes the TCP connection before the response is delivered. This is usually caused by explicitly setting timeouts, or by using a load balancer in front of Solr, because these will normally limit how long the response can take. The timeout involved is probably 60 seconds in this case, and the query took nearly 90 seconds. It doesn't cause any *direct* problems for Solr, though the nasty exception that gets logged every time is annoying. A query like that does use a lot of resources, so if the server doesn't have a lot of spare capacity, it can cause problems for everyone else. Assuming that this isn't happening due to bugs in your application, the only way to really handle this problem is to first locate the problem user and educate them. If the problem continues and it's a viable option, you might need to ban that user from your system. Thanks, Shawn
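(On Bernd's "something to tune" question: in the Jetty that ships with the Solr 4.x example, the relevant knob is the connector's maxIdleTime in etc/jetty.xml. The exact connector class and default value vary by release, so the snippet below is indicative only:)

    <Call name="addConnector">
      <Arg>
        <New class="org.eclipse.jetty.server.bio.SocketConnector">
          <Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>
          <!-- milliseconds an idle connection may sit before Jetty drops it -->
          <Set name="maxIdleTime">120000</Set>
        </New>
      </Arg>
    </Call>

Raising it only stops Jetty from closing the socket early; the expensive query itself keeps running, so query-side limits are still worth pairing with it.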
Re: HostPort attribute of core tag in solr.xml
On 6/3/2013 3:16 AM, Prathik Puthran wrote: I am not very sure what the hostPort attribute in core tag of solr.xml mean. Can someone please let me know? This only has meaning if you are using SolrCloud. This is how each Solr server in the cloud informs the cloud what port it is using. http://wiki.apache.org/solr/SolrCloud#SolrCloud_Instance_Params Thanks, Shawn
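(For reference, hostPort actually sits on the enclosing cores element rather than on an individual core, as in a stock 4.x solr.xml; by default it picks up the jetty.port system property:)

    <solr>
      <cores adminPath="/cores" host="${host:}" hostPort="${jetty.port:}" hostContext="${hostContext:}">
        <core name="collection1" instanceDir="collection1"/>
      </cores>
    </solr>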
Re: /non/existent/dir/yields/warning
On 6/3/2013 5:58 AM, Raheel Hasan wrote: but the path looks like it shows how to setup non existent lib warning... :D The reason for its existence is encoded in its name. A nonexistent path results in a warning. It's a way to illustrate to a novice what happens when you have a non-fatal misconfiguration. The message is a warning and doesn't prevent Solr startup. Thanks, Shawn
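(For contrast, the same mechanism pointed at directories that do exist is how the stock solrconfig.xml loads contrib jars; dir is resolved relative to the core's instanceDir:)

    <lib dir="../../contrib/extraction/lib" regex=".*\.jar" />
    <lib dir="../../dist/" regex="solr-cell-\d.*\.jar" />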
Can mm (min-match) be specified by field in dismax or edismax?
I would like to have the min-match set differently for different fields in my dismax handler. Is this possible?
Re: how are you handling killer queries?
Hi Shawn, well, the user is the world and the servers have enough capacity, so it's nothing really to worry about. OK, I could raise the timeout from the standard 60 to 90, 120 or even 180 seconds. I just wanted to know how other Solr developers handle this.

The technical question: where is the difference between hitting the stop button in the browser while a search is running, and the timeout of the HTTP connection in my container (in my case Jetty)? I guess the stop button in the browser will inform all parts involved, whereas the timeout just leaves an open end somewhere in the container (broken pipe)? And the container has no way to simulate a browser stop button in case of a timeout, to get a sane termination?

Bernd

On 03.06.2013 16:20, Shawn Heisey wrote:

On 6/3/2013 2:39 AM, Bernd Fehling wrote: How are you handling killer queries with Solr? While Solr/Lucene (currently 4.2.1) tries to do its best, I sometimes see stupid queries in my logs, recognizable by their extremely long query times. Example: q=???+and+??+and+???+and++and+???+and+?? I even get hits for this (hits=34091309 status=0 QTime=88667). But the Jetty log says: WARN:oejs.Response:Committed before 500 {msg=Datenübergabe unterbrochen (broken pipe),trace=org.eclipse.jetty.io.EofException... org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:838)|?... 35 more|,code=500} WARN:oejs.ServletHandler:/solr/base/select java.lang.IllegalStateException: Committed at org.eclipse.jetty.server.Response.resetBuffer(Response.java:1136) Because I get hits and a QTime, the search was successful, right? But Jetty/HTTP has already closed the connection and Solr doesn't know about it? How are you handling killer queries, just ignoring them? Or is there something to tune (Jetty timeout config) or filter (query filtering)?

As you might know, EofException happens when one end (usually the client) closes the TCP connection before the response is delivered. This is usually caused by explicitly setting timeouts, or by using a load balancer in front of Solr, because these will normally limit how long the response can take. The timeout involved is probably 60 seconds in this case, and the query took nearly 90 seconds. It doesn't cause any *direct* problems for Solr, though the nasty exception that gets logged every time is annoying. A query like that does use a lot of resources, so if the server doesn't have a lot of spare capacity, it can cause problems for everyone else. Assuming that this isn't happening due to bugs in your application, the only way to really handle this problem is to first locate the problem user and educate them. If the problem continues and it's a viable option, you might need to ban that user from your system.

Thanks,
Shawn
Re: Multitable import - uniqueKey
Hi, Thanks for the replies. Actually, I had only a small confusion: from table_1 I got key_1; using this I join into table_2. But table_2 also gave another key, key_2, which is needed for joining with table_3. So for table_1 and table_2 it's obviously just fine... but what will happen when table_3 is also added? Will the 3 tables stay intact in terms of relationships?

Thanks.

On Mon, Jun 3, 2013 at 7:33 PM, Jack Krupansky j...@basetechnology.com wrote:

If the respective table IDs are not globally unique, then you (the developer) will have to supplement the raw ID with a prefix or suffix or other form of global ID (e.g., UUID) to assure that they are unique. You could just add the SQL table name as a prefix or suffix. The bottom line: What do you WANT the Solr key field to look like? I mean, YOU are the data architect, right? What requirements do you have? When your Solr application users receive the key values in the responses to queries, what expectations do you expect to set for them?

-- Jack Krupansky

-----Original Message-----
From: Raheel Hasan
Sent: Monday, June 03, 2013 9:12 AM
To: solr-user@lucene.apache.org
Subject: Multitable import - uniqueKey

Hi, I am importing multiple tables (by join) into Solr using DIH. All is set, except for one confusion: what to do with the uniqueKey in the schema? When I had only 1 table, it was fine. Now how do I put 2 uniqueKeys (each from a different table)? For example:

    <uniqueKey>table1_id</uniqueKey>
    <uniqueKey>table2_id</uniqueKey>

Will this work?

--
Regards,
Raheel Hasan
Re: /non/existent/dir/yields/warning
ok fantastic... now I will comment it out to be sure. Thanks a lot.

Regards,
Raheel

On Mon, Jun 3, 2013 at 7:27 PM, Shawn Heisey s...@elyograg.org wrote:

On 6/3/2013 5:58 AM, Raheel Hasan wrote: but the path looks like it shows how to set up a "non existent lib" warning... :D

The reason for its existence is encoded in its name. A nonexistent path results in a warning. It's a way to illustrate to a novice what happens when you have a non-fatal misconfiguration. The message is a warning and doesn't prevent Solr startup.

Thanks,
Shawn

--
Regards,
Raheel Hasan
Re: how are you handling killer queries?
On 6/3/2013 8:43 AM, Bernd Fehling wrote: Hi Shawn, well, the user is the world and the servers have enough capacity. So its nothing really to worry about. OK, could raise timeout from standard 60 to 90, 120 or even 180 seconds. Just wanted to know how other solr developer handle this. The technical question, where is the difference between hitting the stop button from the browser while a search is running and the timeout of http connection in my container (in my case jetty)? I guess the stop button from the browser will inform all parts involved whereas the timeout just leaves an open end somewhere in the container (broken pipe)? And the container has no way to simulate a browser stop button in case of a timeout to get a sane termination? The result is probably the same, no matter how the connection gets closed. I've seen it mostly from my load balancer, and most often with the layer 7 check that uses my ping handler. It has a timeout of 5 seconds, and occasionally (usually due to garbage collection pauses) the query will take longer than 5 seconds. The load balancer closes the connection with a TCP reset, which is a perfectly valid (and very fast) way to close a TCP connection. The exception isn't coming from unclean closes, it's coming from ANY close. I think that Solr shouldn't log a full stacktrace when this happens, but I'm not sure whether Solr has any control over it, because the exception comes from Jetty. Thanks, Shawn
Re: Multitable import - uniqueKey
Same answer. Whether it is 2, 3, 10 or 1000 tables, you, the data architect, must decide how to uniquely identify Solr documents. In general, when joining n tables, combine the n keys into one composite key. Either do it on the SQL query side, or with a Solr update request processor.

-- Jack Krupansky

-----Original Message-----
From: Raheel Hasan
Sent: Monday, June 03, 2013 10:44 AM
To: solr-user@lucene.apache.org
Subject: Re: Multitable import - uniqueKey

Hi, Thanks for the replies. Actually, I had only a small confusion: from table_1 I got key_1; using this I join into table_2. But table_2 also gave another key, key_2, which is needed for joining with table_3. So for table_1 and table_2 it's obviously just fine... but what will happen when table_3 is also added? Will the 3 tables stay intact in terms of relationships?

Thanks.

On Mon, Jun 3, 2013 at 7:33 PM, Jack Krupansky j...@basetechnology.com wrote:

If the respective table IDs are not globally unique, then you (the developer) will have to supplement the raw ID with a prefix or suffix or other form of global ID (e.g., UUID) to assure that they are unique. You could just add the SQL table name as a prefix or suffix. The bottom line: What do you WANT the Solr key field to look like? I mean, YOU are the data architect, right? What requirements do you have? When your Solr application users receive the key values in the responses to queries, what expectations do you expect to set for them?

-- Jack Krupansky

-----Original Message-----
From: Raheel Hasan
Sent: Monday, June 03, 2013 9:12 AM
To: solr-user@lucene.apache.org
Subject: Multitable import - uniqueKey

Hi, I am importing multiple tables (by join) into Solr using DIH. All is set, except for one confusion: what to do with the uniqueKey in the schema? When I had only 1 table, it was fine. Now how do I put 2 uniqueKeys (each from a different table)? For example:

    <uniqueKey>table1_id</uniqueKey>
    <uniqueKey>table2_id</uniqueKey>

Will this work?

--
Regards,
Raheel Hasan
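(If the combining is done on the DIH side, one option is the stock TemplateTransformer; everything below except the transformer itself, the entity name, columns, and SQL, is a placeholder:)

    <entity name="t1" transformer="TemplateTransformer"
            query="SELECT t1.table1_id, t2.table2_id, t2.title
                   FROM table1 t1 JOIN table2 t2 ON t2.t1_id = t1.table1_id">
      <!-- composite, globally unique key built from both table keys -->
      <field column="solr_id" template="t1-${t1.table1_id}-t2-${t1.table2_id}"/>
    </entity>

with <uniqueKey>solr_id</uniqueKey> declared once in schema.xml.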
RE: Spell Checker (DirectSolrSpellChecker) correct settings
My first guess is that no documents match the query "provincial court". Because you have spellcheck.maxCollationTries set to a non-zero value, it will not return these as collations unless the correction would return hits. You can test my theory by removing spellcheck.maxCollationTries from the request and seeing if it returns "provincial court" as expected. If this isn't it, then give us the full query request and also the full spellcheck response for your failing case.

James Dyer
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Raheel Hasan [mailto:raheelhasan@gmail.com]
Sent: Friday, May 31, 2013 9:38 AM
To: solr-user@lucene.apache.org
Subject: Spell Checker (DirectSolrSpellChecker) correct settings

Hi guys, I am new to Solr. Here is the thing I have: when I search "Courtt", I get a correct suggestion saying:

    "spellcheck": {
      "suggestions": [
        "courtt", {
          "numFound": 1,
          "startOffset": 0,
          "endOffset": 6,
          "suggestion": ["court"]
        },
        "collation", [
          "collationQuery", "court",
          "hits", 53,
          "misspellingsAndCorrections", ["courtt", "court"]
        ]
      ]
    },

But when I try "Provincial Courtt", it gives me no suggestions; instead it searches for "Provincial" only. Here are the spell check settings in solrconfig.xml:

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <str name="queryAnalyzerFieldType">text_en_splitting</str>
      <!-- a spellchecker built from a field of the main index -->
      <lst name="spellchecker">
        <str name="name">default</str>
        <str name="classname">solr.DirectSolrSpellChecker</str>
        <str name="field">text</str>
        <!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
        <float name="accuracy">0.5</float>
        <!-- Require terms to occur in 1% of documents in order to be included in the dictionary -->
        <float name="thresholdTokenFrequency">.01</float>
        <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
        <!--<str name="distanceMeasure">internal</str>-->
        <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
        <int name="maxEdits">1</int>
        <!-- the minimum number of characters the terms should share -->
        <int name="minPrefix">3</int>
        <!-- maximum number of possible matches to review before returning results -->
        <int name="maxInspections">3</int>
        <!-- minimum length of a query term to be considered for correction -->
        <int name="minQueryLength">4</int>
        <!-- maximum threshold of documents a query term can appear in to be considered for correction -->
        <float name="maxQueryFrequency">0.01</float>
      </lst>
      <!-- a spellchecker that can break or combine words. See "/spell" handler below for usage -->
      <lst name="spellchecker">
        <str name="name">wordbreak</str>
        <str name="classname">solr.WordBreakSolrSpellChecker</str>
        <str name="field">text</str>
        <str name="combineWords">true</str>
        <str name="breakWords">true</str>
        <int name="maxChanges">5</int>
      </lst>
    </searchComponent>

Here is the requestHandler:

    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="echoParams">explicit</str>
        <int name="rows">20</int>
        <str name="df">text</str>
        <!-- Spell checking defaults -->
        <str name="spellcheck">on</str>
        <str name="spellcheck.count">5</str>
        <str name="spellcheck.onlyMorePopular">true</str>
        <str name="spellcheck.maxResultsForSuggest">5</str>
        <str name="spellcheck.alternativeTermCount">2</str>
        <str name="spellcheck.extendedResults">false</str>
        <str name="spellcheck.collate">true</str>
        <str name="spellcheck.maxCollations">3</str>
        <str name="spellcheck.maxCollationTries">3</str>
        <str name="spellcheck.collateExtendedResults">true</str>
      </lst>
      <!-- append spellchecking to our list of components -->
      <arr name="last-components">
        <str>spellcheck</str>
      </arr>
    </requestHandler>

--
Regards,
Raheel Hasan
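(A quick way to run James's test without touching solrconfig.xml is to override the handler default on the request itself, since per-request spellcheck parameters take precedence over the defaults; setting maxCollationTries to 0 disables the verification step:)

    http://localhost:8983/solr/select?q=provincial+courtt&spellcheck=true&spellcheck.collate=true&spellcheck.maxCollationTries=0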
Re: how are you handling killer queries?
There are two radically distinct use cases: 1. Consumers on the open Internet. They do stupid things. Give them a very constrained search experience, enforced with query preprocessing. Maybe give them only dismax queries. 2. Professional power users. They typically have credentials for using the application, so if they are detected as performing long or stupid queries, log the details and administratively take action, such as denying them access (or billing them for excessive resource usage.) -- Jack Krupansky -Original Message- From: Bernd Fehling Sent: Monday, June 03, 2013 4:39 AM To: solr-user@lucene.apache.org Subject: how are you handling killer queries? How are you handling killer queries with solr? While solr/lucene (currently 4.2.1) is trying to do its best I see sometimes stupid queries in my logs, located with extremly long query time. Example: q=???+and+??+and+???+and++and+???+and+?? I even get hits for this (hits=34091309 status=0 QTime=88667). But the jetty log says: WARN:oejs.Response:Committed before 500 {msg=Datenübergabe unterbrochen (broken pipe),trace=org.eclipse.jetty.io.EofException... org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:838)|?... 35 more|,code=500} WARN:oejs.ServletHandler:/solr/base/select java.lang.IllegalStateException: Committed at org.eclipse.jetty.server.Response.resetBuffer(Response.java:1136) Because I get hits and qtime the search is successful, right? But jetty/http has already closed the connection and solr doesn't know about this? How are you handling killer queries, just ignoring? Or something to tune (jetty config about timeout) or filter (query filtering)? Would be pleased to hear your comments. Bernd
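(One more mitigation worth knowing for the consumer case is Solr's timeAllowed parameter, which asks Lucene to stop collecting hits after the given number of milliseconds and return partial results, flagged with partialResults=true in the response header. It only bounds part of the request, so treat it as damage control rather than a hard guarantee; the 5-second value is illustrative:)

    /solr/base/select?q=...&timeAllowed=5000

or baked into the handler:

    <lst name="invariants">
      <int name="timeAllowed">5000</int>
    </lst>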
Re: Multitable import - uniqueKey
ok. But do we need it? That's what I am confused about. Shouldn't 1 key from table_1 pull all the data in the relationship, as the rows were inserted?

On Mon, Jun 3, 2013 at 7:53 PM, Jack Krupansky j...@basetechnology.com wrote:

Same answer. Whether it is 2, 3, 10 or 1000 tables, you, the data architect, must decide how to uniquely identify Solr documents. In general, when joining n tables, combine the n keys into one composite key. Either do it on the SQL query side, or with a Solr update request processor.

-- Jack Krupansky

-----Original Message-----
From: Raheel Hasan
Sent: Monday, June 03, 2013 10:44 AM
To: solr-user@lucene.apache.org
Subject: Re: Multitable import - uniqueKey

Hi, Thanks for the replies. Actually, I had only a small confusion: from table_1 I got key_1; using this I join into table_2. But table_2 also gave another key, key_2, which is needed for joining with table_3. So for table_1 and table_2 it's obviously just fine... but what will happen when table_3 is also added? Will the 3 tables stay intact in terms of relationships?

Thanks.

On Mon, Jun 3, 2013 at 7:33 PM, Jack Krupansky j...@basetechnology.com wrote:

If the respective table IDs are not globally unique, then you (the developer) will have to supplement the raw ID with a prefix or suffix or other form of global ID (e.g., UUID) to assure that they are unique. You could just add the SQL table name as a prefix or suffix. The bottom line: What do you WANT the Solr key field to look like? I mean, YOU are the data architect, right? What requirements do you have? When your Solr application users receive the key values in the responses to queries, what expectations do you expect to set for them?

-- Jack Krupansky

-----Original Message-----
From: Raheel Hasan
Sent: Monday, June 03, 2013 9:12 AM
To: solr-user@lucene.apache.org
Subject: Multitable import - uniqueKey

Hi, I am importing multiple tables (by join) into Solr using DIH. All is set, except for one confusion: what to do with the uniqueKey in the schema? When I had only 1 table, it was fine. Now how do I put 2 uniqueKeys (each from a different table)? For example:

    <uniqueKey>table1_id</uniqueKey>
    <uniqueKey>table2_id</uniqueKey>

Will this work?

--
Regards,
Raheel Hasan
Re: Can mm (min-match) be specified by field in dismax or edismax?
No, but you can with the LucidWorks Search query parser: f1:(cat dog fox bat fish cow)~50% f2:(cat dog fox bat fish zebra)~2 See: http://docs.lucidworks.com/display/lweug/Minimum+Match+for+Simple+Queries -- Jack Krupansky -Original Message- From: Eric Wilson Sent: Monday, June 03, 2013 10:30 AM To: solr-user@lucene.apache.org Subject: Can mm (min-match) be specified by field in dismax or edismax? I would like to have the min-match set differently for different fields in my dismax handler. Is this possible?
Re: Can mm (min-match) be specified by field in dismax or edismax?
Well, there is a hack(ish) way to do it:

    _query_:"{!type=edismax qf='someField' v='$q' mm=100%}"

This is clearly not a solrconfig.xml setting, but part of your query string using LocalParams behavior. This is going to get really messy if you have plenty of fields you'd like to search, where you'd need a similar construct for each. I cannot attest to performance at scale with such a construct... but it shows a way you can go about this if you feel compelled enough to do so.

Jason

On Jun 3, 2013, at 8:08 AM, Jack Krupansky j...@basetechnology.com wrote:

No, but you can with the LucidWorks Search query parser: f1:(cat dog fox bat fish cow)~50% f2:(cat dog fox bat fish zebra)~2 See: http://docs.lucidworks.com/display/lweug/Minimum+Match+for+Simple+Queries

-- Jack Krupansky

-----Original Message-----
From: Eric Wilson
Sent: Monday, June 03, 2013 10:30 AM
To: solr-user@lucene.apache.org
Subject: Can mm (min-match) be specified by field in dismax or edismax?

I would like to have the min-match set differently for different fields in my dismax handler. Is this possible?
Re: updating docs in solr cloud hangs
Hi,

My cluster hangs again when running an update process; the HTTP POST request was aborted because of a timeout error. After the hang, I couldn't do any more updates without restarting the cluster. I could see this error in the node's log after killing it. It is as if Solr waits for the update response forever, and no more operations can be handled until this one finishes.

[qtp301150411-1248] ERROR org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException: interrupted waiting for shard update response
    at org.apache.solr.update.SolrCmdDistributor.checkResponses(SolrCmdDistributor.java:429)
    at org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:99)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:447)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1140)
    at org.apache.solr.update.processor.LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:179)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:365)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.solr.update.SolrCmdDistributor.checkResponses(SolrCmdDistributor.java:408)
    ... 35 more

--
Yago Riveiro
Sent with Sparrow

On Monday, June 3, 2013 at 2:18 AM, Erick Erickson wrote:

Did you take a stack trace of your _server_ and see if the fragment I posted is the place a bunch of threads are stuck? If so, then it's what I mentioned, and the patch I pointed to should fix it up (when it's ready)... The fact that it hangs more frequently with replication > 1 is consistent with the JIRA.

Shawn: Thanks, you beat me to the punch for clarifying replication!

Best,
Erick

On Sun, Jun 2, 2013 at 12:41 PM, Yago Riveiro yago.rive...@gmail.com wrote:

Shawn: replicationFactor higher than one, yes.

--
Yago Riveiro
Sent with Sparrow

On Sunday, June 2, 2013 at 4:07 PM, Shawn Heisey wrote:

On 6/2/2013 8:28 AM, Yago Riveiro wrote: Erick: In my case, when the server hangs, no exception is thrown; the logs on both servers stop registering the update INFO messages. If I shut down one node, immediately the log of the alive node registers some update INFO messages that appears was stuck
RE: Spell Checker (DirectSolrSpellChecker) correct settings
For each of the 4 cases listed below, can you give your query request string (q=..., fq=..., qt=..., etc.) and also the spellchecker output? James Dyer Ingram Content Group (615) 213-4311

-Original Message- From: Raheel Hasan [mailto:raheelhasan@gmail.com] Sent: Monday, June 03, 2013 10:22 AM To: solr-user@lucene.apache.org Subject: Re: Spell Checker (DirectSolrSpellChecker) correct settings

Hi, thanks a lot for the reply. Actually, "Provincial Courtt" is mentioned in many documents (sorry about the typo earlier). Secondly, I tried your idea, but it was not much help. The issue is very specific:
1) When I search for "Provinciaal Courtt" => it only suggests <str name="courtt">court</str> and not "Provincial".
2) Search for "Provincial Courtt" => returns results for the keyword "Provincial" and no suggestion for "court".
3) Search for "Provinciaal Court" => no suggestion; instead it searches for "court" and returns results.
4) Search for "Provinciall Courtt" => correct suggestions.

On Mon, Jun 3, 2013 at 7:55 PM, Dyer, James james.d...@ingramcontent.com wrote: My first guess is that no documents match the corrected query "provincial court". Because you have spellcheck.maxCollationTries set to a non-zero value, it will not return these as collations unless the correction would return hits. You can test my theory by removing spellcheck.maxCollationTries from the request and seeing if it returns "provincial court" as expected. If this isn't it, then give us the full query request and also the full spellcheck response for your failing case. James Dyer Ingram Content Group (615) 213-4311

-Original Message- From: Raheel Hasan [mailto:raheelhasan@gmail.com] Sent: Friday, May 31, 2013 9:38 AM To: solr-user@lucene.apache.org Subject: Spell Checker (DirectSolrSpellChecker) correct settings

Hi guys, I am new to Solr. Here is the thing: when I search "Courtt", I get a correct suggestion saying:

    spellcheck: {
      suggestions: [
        "courtt", {
          numFound: 1,
          startOffset: 0,
          endOffset: 6,
          suggestion: ["court"]
        },
        "collation", [
          "collationQuery", "court",
          "hits", 53,
          "misspellingsAndCorrections", ["courtt", "court"]
        ]
      ]
    }

But when I try "Provincial Courtt", it gives me no suggestions; instead it searches for "Provincial" only. Here are the spellcheck settings in *solrconfig.xml*:

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <str name="queryAnalyzerFieldType">text_en_splitting</str>
      <!-- a spellchecker built from a field of the main index -->
      <lst name="spellchecker">
        <str name="name">default</str>
        <str name="classname">solr.DirectSolrSpellChecker</str>
        <str name="field">text</str>
        <!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
        <float name="accuracy">0.5</float>
        <!-- require terms to occur in 1% of documents in order to be included in the dictionary -->
        <float name="thresholdTokenFrequency">.01</float>
        <!-- the spellcheck distance measure used; the default is the internal levenshtein -->
        <!-- <str name="distanceMeasure">internal</str> -->
        <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
        <int name="maxEdits">1</int>
        <!-- the minimum number of characters the terms should share -->
        <int name="minPrefix">3</int>
        <!-- maximum number of possible matches to review before returning results -->
        <int name="maxInspections">3</int>
        <!-- minimum length of a query term to be considered for correction -->
        <int name="minQueryLength">4</int>
        <!-- maximum threshold of documents a query term can appear in to be considered for correction -->
        <float name="maxQueryFrequency">0.01</float>
      </lst>
      <!-- a spellchecker that can break or combine words. See the /spell handler below for usage -->
      <lst name="spellchecker">
        <str name="name">wordbreak</str>
        <str name="classname">solr.WordBreakSolrSpellChecker</str>
        <str name="field">text</str>
        <str name="combineWords">true</str>
        <str name="breakWords">true</str>
        <int name="maxChanges">5</int>
      </lst>
    </searchComponent>

Here is the *requestHandler*:

    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="echoParams">explicit</str>
        <int name="rows">20</int>
        <str name="df">text</str>
        <!-- Spell checking defaults -->
        <str name="spellcheck">on</str>
        <str name="spellcheck.count">5</str>
        <str name="spellcheck.onlyMorePopular">true</str>
        <str name="spellcheck.maxResultsForSuggest">5</str>
        <str name="spellcheck.alternativeTermCount">2</str>
        <str name="spellcheck.extendedResults">false</str>
        <str name="spellcheck.collate">true</str>
        <str
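A request of the kind James is asking for might look like the sketch below; the host, port, and handler path are assumptions, and echoParams=all is simply a convenient way to capture the effective parameters alongside the spellchecker output:

    http://localhost:8983/solr/select?q=Provincial%20Courtt&spellcheck=true&spellcheck.collate=true&echoParams=all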
Re: how are you handling killer queries?
I think you should take a look at the TimeLimitingCollector (it is used also inside SolrIndexSearcher). My understanding is that it will stop your server from consuming unnecessary resources. --roman On Mon, Jun 3, 2013 at 4:39 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: How are you handling killer queries with Solr? While Solr/Lucene (currently 4.2.1) is trying to do its best, I sometimes see stupid queries in my logs, identifiable by their extremely long query times. Example: q=???+and+??+and+???+and++and+???+and+?? I even get hits for this (hits=34091309 status=0 QTime=88667). But the Jetty log says: WARN:oejs.Response:Committed before 500 {msg=Datenübergabe unterbrochen (broken pipe),trace=org.eclipse.jetty.io.EofException... org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:838)|?... 35 more|,code=500} WARN:oejs.ServletHandler:/solr/base/select java.lang.IllegalStateException: Committed at org.eclipse.jetty.server.Response.resetBuffer(Response.java:1136) Because I get hits and a QTime, the search was successful, right? But Jetty/HTTP has already closed the connection and Solr doesn't know about it? How are you handling killer queries: just ignoring them? Or is there something to tune (Jetty timeout config) or filter (query filtering)? Would be pleased to hear your comments. Bernd
Solr: separating index and storage
Consider the following use case. Certain words are extracted from a document and indexed. The exact sentence containing each word cannot be stored alongside the extracted word because of the volume at which the documents grow. How can the index and, let's call them, doc servers be separated? An option is to store the sentences in MongoDB or an RDBMS. But there seems to be a schema-level design issue: assuming 'word' is a multivalued field, how do we associate with it a reference to the corresponding entry in the doc server? Maybe create (word_1, ref_1) tuples. Is there any other built-in feature? Any related project which separates index and doc servers? Thanks, Sourajit
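One minimal way to sketch the tuple idea in schema.xml (the field names and the word|ref encoding here are illustrative assumptions, not anything from the original post) is to index the words for matching while storing only an opaque pointer into the external doc server:

    <!-- sketch: words are indexed for search but no sentence text is stored in Solr -->
    <field name="word" type="string" indexed="true" stored="false" multiValued="true"/>
    <!-- one "word|ref" value per extracted word, e.g. "court|mongo:5192a1f3" -->
    <field name="word_ref" type="string" indexed="false" stored="true" multiValued="true"/>

Encoding the word and the reference into a single stored value is one way around the fact that Solr does not preserve pairwise association between two parallel multivalued fields; at query time the application splits the stored value and fetches the sentence from MongoDB or the RDBMS.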
Re: Solr query performance tool
You can use this tool to analyze the logs: https://github.com/dfdeshom/solr-loganalyzer We use SolrMeter for performance/stress testing: https://code.google.com/p/solrmeter/ -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-query-performance-tool-tp4066900p4067869.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how are you handling killer queries?
There is the timeAllowed parameter: http://wiki.apache.org/solr/CommonQueryParameters#timeAllowed -- Jack Krupansky -Original Message- From: Roman Chyla Sent: Monday, June 03, 2013 11:53 AM To: solr-user@lucene.apache.org Subject: Re: how are you handling killer queries? I think you should take a look at the TimeLimitingCollector (it is used also inside SolrIndexSearcher). My understanding is that it will stop your server from consuming unnecessary resources. --roman On Mon, Jun 3, 2013 at 4:39 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: How are you handling killer queries with Solr? While Solr/Lucene (currently 4.2.1) is trying to do its best, I sometimes see stupid queries in my logs, identifiable by their extremely long query times. Example: q=???+and+??+and+???+and++and+???+and+?? I even get hits for this (hits=34091309 status=0 QTime=88667). But the Jetty log says: WARN:oejs.Response:Committed before 500 {msg=Datenübergabe unterbrochen (broken pipe),trace=org.eclipse.jetty.io.EofException... org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:838)|?... 35 more|,code=500} WARN:oejs.ServletHandler:/solr/base/select java.lang.IllegalStateException: Committed at org.eclipse.jetty.server.Response.resetBuffer(Response.java:1136) Because I get hits and a QTime, the search was successful, right? But Jetty/HTTP has already closed the connection and Solr doesn't know about it? How are you handling killer queries: just ignoring them? Or is there something to tune (Jetty timeout config) or filter (query filtering)? Would be pleased to hear your comments. Bernd
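A rough sketch of how the parameter is used from SolrJ; the endpoint URL and query are placeholders, and the Solr 4.x client API is assumed:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class TimeAllowedExample {
      public static void main(String[] args) throws Exception {
        // hypothetical endpoint; point this at your own core
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/base");
        SolrQuery q = new SolrQuery("some expensive query");
        q.setTimeAllowed(2000); // milliseconds; Solr stops collecting hits once exceeded
        QueryResponse rsp = server.query(q);
        // when the limit was hit, the response header carries partialResults=true
        System.out.println(rsp.getResponseHeader());
        server.shutdown();
      }
    }

Note that timeAllowed only bounds the time spent collecting documents, so a query can still spend time in other phases; it mitigates killer queries rather than eliminating them.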
Saravanan Chinnadurai/Actionimages is out of the office.
I will be out of the office starting 03/06/2013 and will not return until 04/06/2013. Please email to itsta...@actionimages.com for any urgent issues. Action Images is a division of Reuters Limited and your data will therefore be protected in accordance with the Reuters Group Privacy / Data Protection notice which is available in the privacy footer at www.reuters.com Registered in England No. 145516 VAT REG: 397000555
Solr 4.2.1 higher memory footprint vs Solr 3.5
Hi, Using the same schema for both Solr 3.5 and Solr 4.2.1 and posting the same data to both servers, the memory requirements seem to have gone up sharply during request handling. Requests come in at around 200 QPS. Document sizes are very large, but that did not seem to be a problem with 3.5 (lots of multivalued fields with large array lengths). Could you help me understand what change in Solr 4.2.1 would account for this higher memory requirement? Also, in a different test, I ran a query to get a list of all unique IDs via a single query under no load, and I see it complete in 500 ms; however, the time it takes to ship the data back to the client seems to be very large. Any idea what could be causing this behavior? Would appreciate any help. Regards, -- Sandeep -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-2-1-higher-memory-footprint-vs-Solr-3-5-tp4067879.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Can mm (min-match) be specified by field in dismax or edismax?
Also, just to be clear: mm/minMatch is not an option for a field but for a full BooleanQuery. I mean, you can't have two different mm values within the same BooleanQuery, except with nested BooleanQueries, where each BQ has its own mm. -- Jack Krupansky -Original Message- From: Jason Hellman Sent: Monday, June 03, 2013 11:40 AM To: solr-user@lucene.apache.org Subject: Re: Can mm (min-match) be specified by field in dismax or edismax? Well, there is a hack(ish) way to do it: _query_:"{!type=edismax qf='someField' v='$q' mm=100%}" This is clearly not a solrconfig.xml setting, but part of your query string using LocalParams behavior. This is going to get really messy if you have many fields you'd like to search, since you'd need a similar construct for each. I cannot attest to performance at scale with such a construct… but this shows a way you can go about it if you feel compelled enough to do so. Jason On Jun 3, 2013, at 8:08 AM, Jack Krupansky j...@basetechnology.com wrote: No, but you can with the LucidWorks Search query parser: f1:(cat dog fox bat fish cow)~50% f2:(cat dog fox bat fish zebra)~2 See: http://docs.lucidworks.com/display/lweug/Minimum+Match+for+Simple+Queries -- Jack Krupansky -Original Message- From: Eric Wilson Sent: Monday, June 03, 2013 10:30 AM To: solr-user@lucene.apache.org Subject: Can mm (min-match) be specified by field in dismax or edismax? I would like to have the min-match set differently for different fields in my dismax handler. Is this possible?
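Spelled out for two fields, the hack might look like the sketch below; the field names title/body and the extra userQuery parameter are invented for illustration, and routing the user's terms through a separate parameter avoids the self-reference that v='$q' would create if this whole construct were itself the q parameter (spaces would be URL-encoded in a real request):

    q=_query_:"{!edismax qf='title' mm='100%' v=$userQuery}" OR _query_:"{!edismax qf='body' mm='2' v=$userQuery}"
    &userQuery=cat dog fox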
Re: Disable all caches in solr
You can also check out this link. http://lucene.472066.n3.nabble.com/Is-there-a-way-to-remove-caches-in-SOLR-td4061216.html#a4061219 -- View this message in context: http://lucene.472066.n3.nabble.com/Disable-all-caches-in-solr-tp4066517p4067870.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr + Groovy
Looks interesting, but it's just for the UpdateHandler, right? Does a similar handler for searching already exist? Achim On 03.06.2013 at 17:22, Jack Krupansky wrote: Check out the support for external scripting of update request processors: http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html Are there any of your requirements that that doesn't address? -- Jack Krupansky -Original Message- From: Achim Domma Sent: Monday, June 03, 2013 3:07 AM To: solr-user@lucene.apache.org Subject: Solr + Groovy Hi, I have some query building and result processing code which is currently running as a normal Solr client outside of Solr. I think it would make a lot of sense to move parts of this code into a custom SearchHandler or SearchComponent. Because I'm not a big fan of the Java language, I would like to use Groovy. Searching the web, I got the impression that Solr + alternative JVM languages is not a very common topic. So before starting my project, I would like to know: is there a well-known good reason not to use Groovy (or Clojure, Scala, ...) for implementing custom Solr code? kind regards, Achim
Re: Solr + Groovy
Sorry about that. Unfortunately, scripting is only on the update side. But I imagine that a lot of the logic could be repurposed for the query side. -- Jack Krupansky -Original Message- From: Achim Domma Sent: Monday, June 03, 2013 2:31 PM To: solr-user@lucene.apache.org Subject: Re: Solr + Groovy Looks interesting, but it's just for the UpdateHandler, right? Does a similar handler for searching already exist? Achim On 03.06.2013 at 17:22, Jack Krupansky wrote: Check out the support for external scripting of update request processors: http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html Are there any of your requirements that that doesn't address? -- Jack Krupansky -Original Message- From: Achim Domma Sent: Monday, June 03, 2013 3:07 AM To: solr-user@lucene.apache.org Subject: Solr + Groovy Hi, I have some query building and result processing code which is currently running as a normal Solr client outside of Solr. I think it would make a lot of sense to move parts of this code into a custom SearchHandler or SearchComponent. Because I'm not a big fan of the Java language, I would like to use Groovy. Searching the web, I got the impression that Solr + alternative JVM languages is not a very common topic. So before starting my project, I would like to know: is there a well-known good reason not to use Groovy (or Clojure, Scala, ...) for implementing custom Solr code? kind regards, Achim
Re: Solr + Groovy
Yeah, it's currently just for the update side of things. But this issue is open https://issues.apache.org/jira/browse/SOLR-3669 and assigned to me, for one of these days; I set it on my 5.0 radar. Certainly, anyone who wants to make this happen sooner than I maybe-possibly-hopefully will one day delve into it: go for it! Erik p.s. [infomercial] We do have update-side scripting (JavaScript) and business rules (via Drools) capabilities in our LucidWorks Search platform, http://www.lucidworks.com/products/lucidworks-search with the update-side scripting running in the connector framework by design, rather than on the Solr side of things, to allow it to scale in a separate tier. On Jun 3, 2013, at 14:31, Achim Domma wrote: Looks interesting, but it's just for the UpdateHandler, right? Does a similar handler for searching already exist? Achim On 03.06.2013 at 17:22, Jack Krupansky wrote: Check out the support for external scripting of update request processors: http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html Are there any of your requirements that that doesn't address? -- Jack Krupansky -Original Message- From: Achim Domma Sent: Monday, June 03, 2013 3:07 AM To: solr-user@lucene.apache.org Subject: Solr + Groovy Hi, I have some query building and result processing code which is currently running as a normal Solr client outside of Solr. I think it would make a lot of sense to move parts of this code into a custom SearchHandler or SearchComponent. Because I'm not a big fan of the Java language, I would like to use Groovy. Searching the web, I got the impression that Solr + alternative JVM languages is not a very common topic. So before starting my project, I would like to know: is there a well-known good reason not to use Groovy (or Clojure, Scala, ...) for implementing custom Solr code? kind regards, Achim
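For concreteness, the update-side scripting Jack points to is wired into solrconfig.xml roughly as below; the chain name and script file name are illustrative assumptions. The stock examples use JavaScript, but the factory goes through JSR-223, so any engine on the classpath (Groovy included) should work, with the script expected to define functions such as processAdd(cmd):

    <updateRequestProcessorChain name="script">
      <processor class="solr.StatelessScriptUpdateProcessorFactory">
        <str name="script">update-script.js</str>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>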
Re: Dynamic Indexing using DB and DIH
On 6/3/2013 12:35 PM, PeriS wrote: I noticed the delta-import is creating a new indexed entry on top of the existing one... is that normal? Not sure what you are asking here, so I'll give an answer to the question I think you're asking: If you have a uniqueKey defined in your schema, then new documents with matching values in the uniqueKey field will replace the existing documents. Solr will delete the old one before inserting the new one. Thanks, Shawn
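For reference, the wiring Shawn describes looks like this in schema.xml (the field name "id" is the common convention, not necessarily the actual key in PeriS's schema):

    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <uniqueKey>id</uniqueKey>

With this in place, re-adding a document whose id already exists replaces the old version. If duplicates show up instead, a likely culprit is that the imported key values are not byte-identical, for example differing case or stray whitespace.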
Re: Dynamic Indexing using DB and DIH
Shawn, You got the point; I do have the unique key defined, but for some reason, when I run the delta-import, a new entry is created for the same record with a new unique key. It's almost as if it doesn't detect the existing record. On Jun 3, 2013, at 3:51 PM, Shawn Heisey s...@elyograg.org wrote: On 6/3/2013 12:35 PM, PeriS wrote: I noticed the delta-import is creating a new indexed entry on top of the existing one... is that normal? Not sure what you are asking here, so I'll give an answer to the question I think you're asking: If you have a uniqueKey defined in your schema, then new documents with matching values in the uniqueKey field will replace the existing documents. Solr will delete the old one before inserting the new one. Thanks, Shawn
Re: Custom Response Handler
Hi Erik, In my case I have to calculate a custom value, depending on the retrieved candidates, for each document, so my choice will be a DocTransformer. Let's say I need to include a Java class which does the computation; how do I tie that to the DocTransformer? The Solr wiki (http://wiki.apache.org/solr/DocTransformers) talks about custom transformers but does not include an example. Please help. Regards, Vibhor Jaiswal -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-Response-Handler-tp4067558p4067923.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Custom Response Handler
You can refer to this post on using DocTransformers: http://java.dzone.com/news/solr-40-doctransformers-first -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-Response-Handler-tp4067558p4067926.html Sent from the Solr - User mailing list archive at Nabble.com.
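Since the wiki lacks an example, here is a minimal sketch against the Solr 4.x transformer API; the class name, the registered transformer name, and the "title" field are invented for illustration, and the computed value is deliberately trivial:

    import java.io.IOException;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.params.SolrParams;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.response.transform.DocTransformer;
    import org.apache.solr.response.transform.TransformerFactory;

    // Register in solrconfig.xml:  <transformer name="myvalue" class="com.example.MyValueTransformerFactory"/>
    // Request it per query with:   fl=*,[myvalue]
    public class MyValueTransformerFactory extends TransformerFactory {
      @Override
      public DocTransformer create(final String field, final SolrParams params, SolrQueryRequest req) {
        return new DocTransformer() {
          @Override
          public String getName() { return field; }

          @Override
          public void transform(SolrDocument doc, int docid) throws IOException {
            // compute a custom per-document value from fields already retrieved for the response
            Object title = doc.getFieldValue("title"); // hypothetical stored field
            doc.setField(field, title == null ? 0 : title.toString().length());
          }
        };
      }
    }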
Inconsistent Full import document index counts.
Hello All, I've been working on a 2-shard SolrCloud instance with several million documents, and the import process has recently begun to miss documents as they are added to the underlying Postgres database. There are no glaring failures in the log files (all SEVERE- and WARNING-level errors in the log are from malformed queries). To ensure that it is not an issue with my delta-import query, I've tried running full imports, to no avail. Strangely, when I modify my data-import query to only search for a specific id that was missed in the full import, all of the relevant documents are indexed. Any ideas for possible causes of missed documents in long-running full imports? Thanks, Chris Donaher
RE: Solr query performance tool
You have to be careful looking at QTimes: they do not include garbage collection. I've run into issues where QTime was short (because it was), but the query happened to come in during a long garbage collection where everything was paused. So you can get into situations where, once the 15-second GC is done, everything performs as expected! I'd make sure to have an external querying tool; you can monitor GC times as well via JMX. From: bbarani [bbar...@gmail.com] Sent: Monday, June 03, 2013 8:58 AM To: solr-user@lucene.apache.org Subject: Re: Solr query performance tool You can use this tool to analyze the logs: https://github.com/dfdeshom/solr-loganalyzer We use SolrMeter for performance/stress testing: https://code.google.com/p/solrmeter/ -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-query-performance-tool-tp4066900p4067869.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr query performance tool
On 6/3/2013 3:33 PM, Greg Harris wrote: You have to be careful looking at QTimes: they do not include garbage collection. I've run into issues where QTime was short (because it was), but the query happened to come in during a long garbage collection where everything was paused. So you can get into situations where, once the 15-second GC is done, everything performs as expected! I'd make sure to have an external querying tool; you can monitor GC times as well via JMX. The QTime value in the response is calculated using System.currentTimeMillis(), so it should include GC time, unless the GC happens to hit just after QTime is calculated but before the final response with all the results is sent. If you are requesting a lot of documents, or you have very large documents where most/all of the fields are stored, having long GCs hit during that particular moment might actually be a common occurrence. Thanks, Shawn
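A small sketch of the kind of external measurement both posts are getting at; the URL and query are placeholders, and the Solr 4.x SolrJ API is assumed. It compares Solr's reported QTime against client-side wall-clock time, where a large, persistent gap points at GC pauses or response shipping rather than the search itself:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class QTimeVsWallClock {
      public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("*:*");
        long start = System.currentTimeMillis();
        QueryResponse rsp = server.query(q);
        long wall = System.currentTimeMillis() - start;
        // QTime is measured inside Solr; wall also includes network, serialization, and client-side delays
        System.out.println("QTime=" + rsp.getQTime() + "ms, wall clock=" + wall + "ms");
        server.shutdown();
      }
    }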
SolrCloud Load Balancer weight
Hey guys, I have recently looked into an issue with my SolrCloud related to very high load when performing a full-import with DIH. While some work could be done to improve my queries, etc. in DIH, this led me to a new feature idea for Solr: weighted internal load balancing. Basically, I can think of two use cases where a weight on load balancing could help: 1) My situation from above - I'm doing a huge import and want SolrCloud to direct fewer queries to the node handling the DIH full-import, say weight 10/100 (10%) instead of 100/100. 2) Mixed hardware - although I wouldn't recommend doing this, some people may have mixed hardware, some of it capable of handling more or less traffic. These weights wouldn't be expected to be exact, just a best effort to generally influence load on nodes inside the cluster. They of course would only matter on reads (/get, /select, etc.). A full-blown approach would have weight awareness in the ZooKeeper-aware client implementation and in inter-node replica requests. Should I JIRA this? Thoughts? Tim
Re: SpatialRecursivePrefixTreeFieldType Spatial Searching
Hi Chris: Have you read http://wiki.apache.org/solr/SpatialForTimeDurations ? You're modeling your data sub-optimally. Full-precision rectangles (distErrPct=0) don't scale well, and you're seeing that. You should represent your durations as points, which will take up a fraction of the space (see above). Furthermore, because your detail gets into one digit to the right of the decimal point, your maxDistErr should definitely be smaller than 1 -- use something like 0.5 (given you have two levels of precision below a full day), but to be safer (more certain it's not a problem) use 0.3 -- a little less. Please report back how that goes. ~ David On 6/3/13 7:27 AM, Chris Atkinson chrisa...@gmail.com wrote: Hi, I'm seeing really slow query times, 7-25 seconds, when I run a simple filter query that uses my SpatialRecursivePrefixTreeFieldType field. My index is about 30k documents. Before adding the spatial field, the on-disk size was about 100 MB, so it's a really tiny index. Once I add the spatial field (which is multi-valued), the index size jumps up to 2 GB. (Is this normal?) Only about 10k documents will have any spatial data. Typically they will have at most 10 shapes each, but the majority are all one of two rectangles. This is my fieldType definition:

    <fieldType name="date_availability" class="solr.SpatialRecursivePrefixTreeFieldType"
               geo="false" worldBounds="0 0 3650 1" distErrPct="0" maxDistErr="1" units="degrees"/>

And the field:

    <field name="availability_spatial" type="date_availability" indexed="true" stored="false" multiValued="true"/>

I am using the field to represent approximately 10 years after January 1st, 2013, where each day is along the X axis. Because the availability starts and ends at 2pm and 10am, I used a decimal place when creating my shapes to capture that detail. (Is this approach wrong?) So a typical rectangle when indexed would be (minX minY maxX maxY): Rectangle 100.6 0 120.4 1 Is it wrong that my Y and X values are not of the same scale? Since I don't care about the Y axis at all, I just set it to be of height 1 always. I'm running Solr 4.3 with a small JVM of 768M (can be increased), and I have 2GB RAM (again, can be increased). Thanks
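Following the wiki's duration-as-point model, the reworked definition might look like the sketch below; the square worldBounds is an assumption extrapolated from Chris's 10-year window, since with points the Y axis carries the end day rather than a dummy height:

    <!-- sketch: each duration becomes the point (x = start day, y = end day) -->
    <fieldType name="date_availability" class="solr.SpatialRecursivePrefixTreeFieldType"
               geo="false" worldBounds="0 0 3650 3650"
               distErrPct="0" maxDistErr="0.3" units="degrees"/>

A duration [s, e] is then indexed as the point "s e", and "overlaps the query range [qs, qe]" becomes a rectangle intersection, e.g. fq=availability_spatial:"Intersects(0 qs qe 3650)", since overlap is exactly start <= qe and end >= qs.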
Leader election deadlock after restarting leader in 4.2.1
SOLR 4.2.1, tomcat 6.0.35, CentOS 6.2 (2.6.32-220.4.1.el6.x86_64 #1 SMP), java 6u27 64 bit 6 nodes, 2 shards, 3 replicas each. Names changed to r1s2 (replica1 - shard 2), r2s2, and r3s2 for each replica in shard 2. What we see: * Under production load, we restart a leader (r1s2), and observe in the cloud admin that the old leader is in state Down and no new leader is ever elected. * The system will stay like this until we stop the old leader (or cause a ZK timeout...see below). *Please note:* the leader is killed, then kill -9'd 5 seconds later, before restarting. We have since changed this. Digging into the logs on the old leader (r1s2 = replica1-shard 2): * The old leader restarted at 5:23:29 PM, but appears to be stuck in SolrDispatchFilter.init() -- (See recovery at bottom). * It doesn't want to become leader, possibly due to the unclean shutdown. May 28, 2013 5:24:42 PM org.apache.solr.update.PeerSync handleVersions INFO: PeerSync: core=browse url=http://r1s2:8080/solr Our versions are too old. ourHighThreshold=1436325665147191297 otherLowThreshold=1436325775374548992 * It then tries to recover, but cannot, because there is no leader. May 28, 2013 5:24:43 PM org.apache.solr.common.SolrException log SEVERE: Error while trying to recover. core=browse:org.apache.solr.common.SolrException: No registered leader was found, collection:browse slice:shard2 * Meanwhile, it appears that blocking in init(), prevents the http-8080 handler from starting (See recovery at bottom). Digging into the other replicas (r2s2): * For some reason, the old leader (r1s2) remains in the list of replicas that r2s2 attempts to sync to. May 28, 2013 5:23:42 PM org.apache.solr.update.PeerSync sync INFO: PeerSync: core=browse url=http://r2s2:8080/solr START replicas=[http://r1s2:8080/solr/browse/, http://r3s2:8080/solr/browse/] nUpdates=100 * This apparently fails (30 second timeout), possibly due to http-8080 handler not being started on r1s2. May 28, 2013 5:24:12 PM org.apache.solr.update.PeerSync handleResponse WARNING: PeerSync: core=browse url=http://r2s2:8080/solr exception talking to http://r1s2:8080/solr/browse/, failed org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://r1s2:8080/solr/browse *At this point, the cluster will remain indefinitely without a leader, if nothing else changes.* But in this particular instance, we took some stack and heap dumps from r1s2, which paused java long enough to cause a *zookeeper timeout on the old leader (r1s2)*: May 28, 2013 5:33:26 PM org.apache.zookeeper.ClientCnxn$SendThread run INFO: Client session timed out, have not heard from server in 38226ms for sessionid 0x23d28e0f584005d, closing socket connection and attempting reconnect Then, one of the replicas (r3s2) finally stopped trying to sync to r1s2 and succeeded in becoming leader: May 28, 2013 5:33:34 PM org.apache.solr.update.PeerSync sync INFO: PeerSync: core=browse url=http://r3s2:8080/solr START replicas=[http://r2s2:8080/solr/browse/] nUpdates=100 May 28, 2013 5:33:34 PM org.apache.solr.update.PeerSync handleVersions INFO: PeerSync: core=browse url=http://r3s2:8080/solr Received 100 versions from r2s2:8080/solr/browse/ May 28, 2013 5:33:34 PM org.apache.solr.update.PeerSync handleVersions INFO: PeerSync: core=browse url=http://r3s2:8080/solr Our versions are newer. ourLowThreshold=1436325775374548992 otherHigh=1436325775805513730 May 28, 2013 5:33:34 PM org.apache.solr.update.PeerSync sync INFO: PeerSync: core=browse url=http://r3s2:8080/solr DONE. 
sync succeeded Now that we have a leader, r1s2 can succeed in recovery and finish SolrDispatchFilter.init(), apparently allowing the http-8080 handler to start (r1s2). May 28, 2013 5:34:49 PM org.apache.solr.cloud.RecoveryStrategy replay INFO: No replay needed. core=browse May 28, 2013 5:34:49 PM org.apache.solr.cloud.RecoveryStrategy doRecovery INFO: Replication Recovery was successful - registering as Active. core=browse May 28, 2013 5:34:49 PM org.apache.solr.cloud.ZkController publish INFO: publishing core=browse state=active May 28, 2013 5:34:49 PM org.apache.solr.cloud.ZkController publish INFO: numShards not found on descriptor - reading it from system property May 28, 2013 5:34:49 PM org.apache.solr.cloud.RecoveryStrategy doRecovery INFO: Finished recovery process. core=browse May 28, 2013 5:34:49 PM org.apache.solr.cloud.RecoveryStrategy run INFO: Starting recovery process. core=browse recoveringAfterStartup=false May 28, 2013 5:34:49 PM org.apache.solr.common.cloud.ZkStateReader updateClusterState INFO: Updating cloud state from ZooKeeper... May 28, 2013 5:34:49 PM org.apache.solr.servlet.SolrDispatchFilter init INFO: user.dir=/ May 28, 2013 5:34:49 PM org.apache.solr.servlet.SolrDispatchFilter init *INFO: SolrDispatchFilter.init() done* May 28, 2013 5:34:49 PM org.apache.solr.cloud.ZkController publish INFO: publishing core=browse state=recovering May 28, 2013 5:34:49 PM
Re: SolrCloud Load Balancer weight
On Jun 3, 2013, at 3:33 PM, Tim Vaillancourt t...@elementspace.com wrote: Should I JIRA this? Thoughts? Yeah - it's always been in the back of my mind - it's come up a few times - eventually we would like nodes to report some stats to zk to influence load balancing. - mark
How to Get Cluster State By Solrj?
I want to get the cluster state of my SolrCloud via SolrJ (I know the admin page shows it, but I want to customize it in my application). Firstly, the wiki says: CloudSolrServer server = new CloudSolrServer("localhost:9983"); Why does CloudSolrServer take only one ZooKeeper host:port as an argument? I have a quorum of ZooKeepers, and some of them may be down even while the quorum still works. Secondly, how can I get the current state of the cluster properly?
Re: How to Get Cluster State By Solrj?
It actually accepts a comma-separated list of ZK host addresses (your quorum), in the same format ZK describes in its docs. To get the cluster state, get the ZkStateReader from the CloudSolrServer and then its getClusterState or something. - Mark On Jun 3, 2013, at 5:30 PM, Furkan KAMACI furkankam...@gmail.com wrote: I want to get the cluster state of my SolrCloud via SolrJ (I know the admin page shows it, but I want to customize it in my application). Firstly, the wiki says: CloudSolrServer server = new CloudSolrServer("localhost:9983"); Why does CloudSolrServer take only one ZooKeeper host:port as an argument? I have a quorum of ZooKeepers, and some of them may be down even while the quorum still works. Secondly, how can I get the current state of the cluster properly?
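Fleshing out Mark's answer as a sketch against the Solr 4.x SolrJ API; the ZooKeeper hosts and the collection name "collection1" are placeholder assumptions:

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.cloud.ClusterState;
    import org.apache.solr.common.cloud.Replica;
    import org.apache.solr.common.cloud.Slice;
    import org.apache.solr.common.cloud.ZkStateReader;

    public class ClusterStateExample {
      public static void main(String[] args) throws Exception {
        // pass the full quorum, comma-separated; losing one ZK node then still works
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.connect(); // forces the ZK connection so the state reader is available
        ZkStateReader reader = server.getZkStateReader();
        ClusterState state = reader.getClusterState();
        System.out.println("Live nodes: " + state.getLiveNodes());
        for (Slice slice : state.getSlices("collection1")) {   // assumed collection name
          for (Replica replica : slice.getReplicas()) {
            System.out.println(slice.getName() + " -> " + replica.getName()
                + " (" + replica.getStr(ZkStateReader.STATE_PROP) + ")");
          }
        }
        server.shutdown();
      }
    }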
Re: Leader election deadlock after restarting leader in 4.2.1
Thanks - I can try and look into this perhaps next week. You might copy the details into a JIRA issue to prevent it from getting lost though... - Mark On Jun 3, 2013, at 4:46 PM, John Guerrero jguerr...@tagged.com wrote: SOLR 4.2.1, tomcat 6.0.35, CentOS 6.2 (2.6.32-220.4.1.el6.x86_64 #1 SMP), java 6u27 64 bit 6 nodes, 2 shards, 3 replicas each. Names changed to r1s2 (replica1 - shard 2), r2s2, and r3s2 for each replica in shard 2. What we see: * Under production load, we restart a leader (r1s2), and observe in the cloud admin that the old leader is in state Down and no new leader is ever elected. * The system will stay like this until we stop the old leader (or cause a ZK timeout...see below). *Please note:* the leader is killed, then kill -9'd 5 seconds later, before restarting. We have since changed this. Digging into the logs on the old leader (r1s2 = replica1-shard 2): * The old leader restarted at 5:23:29 PM, but appears to be stuck in SolrDispatchFilter.init() -- (See recovery at bottom). * It doesn't want to become leader, possibly due to the unclean shutdown. May 28, 2013 5:24:42 PM org.apache.solr.update.PeerSync handleVersions INFO: PeerSync: core=browse url=http://r1s2:8080/solr Our versions are too old. ourHighThreshold=1436325665147191297 otherLowThreshold=1436325775374548992 * It then tries to recover, but cannot, because there is no leader. May 28, 2013 5:24:43 PM org.apache.solr.common.SolrException log SEVERE: Error while trying to recover. core=browse:org.apache.solr.common.SolrException: No registered leader was found, collection:browse slice:shard2 * Meanwhile, it appears that blocking in init(), prevents the http-8080 handler from starting (See recovery at bottom). Digging into the other replicas (r2s2): * For some reason, the old leader (r1s2) remains in the list of replicas that r2s2 attempts to sync to. May 28, 2013 5:23:42 PM org.apache.solr.update.PeerSync sync INFO: PeerSync: core=browse url=http://r2s2:8080/solr START replicas=[http://r1s2:8080/solr/browse/, http://r3s2:8080/solr/browse/] nUpdates=100 * This apparently fails (30 second timeout), possibly due to http-8080 handler not being started on r1s2. 
May 28, 2013 5:24:12 PM org.apache.solr.update.PeerSync handleResponse WARNING: PeerSync: core=browse url=http://r2s2:8080/solr exception talking to http://r1s2:8080/solr/browse/, failed org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://r1s2:8080/solr/browse *At this point, the cluster will remain indefinitely without a leader, if nothing else changes.* But in this particular instance, we took some stack and heap dumps from r1s2, which paused java long enough to cause a *zookeeper timeout on the old leader (r1s2)*: May 28, 2013 5:33:26 PM org.apache.zookeeper.ClientCnxn$SendThread run INFO: Client session timed out, have not heard from server in 38226ms for sessionid 0x23d28e0f584005d, closing socket connection and attempting reconnect Then, one of the replicas (r3s2) finally stopped trying to sync to r1s2 and succeeded in becoming leader: May 28, 2013 5:33:34 PM org.apache.solr.update.PeerSync sync INFO: PeerSync: core=browse url=http://r3s2:8080/solr START replicas=[http://r2s2:8080/solr/browse/] nUpdates=100 May 28, 2013 5:33:34 PM org.apache.solr.update.PeerSync handleVersions INFO: PeerSync: core=browse url=http://r3s2:8080/solr Received 100 versions from r2s2:8080/solr/browse/ May 28, 2013 5:33:34 PM org.apache.solr.update.PeerSync handleVersions INFO: PeerSync: core=browse url=http://r3s2:8080/solr Our versions are newer. ourLowThreshold=1436325775374548992 otherHigh=1436325775805513730 May 28, 2013 5:33:34 PM org.apache.solr.update.PeerSync sync INFO: PeerSync: core=browse url=http://r3s2:8080/solr DONE. sync succeeded Now that we have a leader, r1s2 can succeed in recovery and finish SolrDispatchFilter.init(), apparently allowing the http-8080 handler to start (r1s2). May 28, 2013 5:34:49 PM org.apache.solr.cloud.RecoveryStrategy replay INFO: No replay needed. core=browse May 28, 2013 5:34:49 PM org.apache.solr.cloud.RecoveryStrategy doRecovery INFO: Replication Recovery was successful - registering as Active. core=browse May 28, 2013 5:34:49 PM org.apache.solr.cloud.ZkController publish INFO: publishing core=browse state=active May 28, 2013 5:34:49 PM org.apache.solr.cloud.ZkController publish INFO: numShards not found on descriptor - reading it from system property May 28, 2013 5:34:49 PM org.apache.solr.cloud.RecoveryStrategy doRecovery INFO: Finished recovery process. core=browse May 28, 2013 5:34:49 PM org.apache.solr.cloud.RecoveryStrategy run INFO: Starting recovery process. core=browse recoveringAfterStartup=false May 28, 2013 5:34:49 PM org.apache.solr.common.cloud.ZkStateReader updateClusterState INFO: Updating cloud state from ZooKeeper... May 28, 2013 5:34:49 PM