solr suggester not working with shards

2014-10-08 Thread rsi...@ambrac.nl
I am trying to use the suggest component (Solr 4.6) with multiple cores. 
I added a search component and a request handler in my solrconfig. 
That works fine for one core, but querying my Solr instance with the shards 
parameter does not work. 


<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggestDictionary</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookupFactory</str>
    <str name="field">suggest</str>
    <float name="threshold">0.0005</float>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">none</str>
    <str name="wt">xml</str>
    <str name="indent">false</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggestDictionary</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">false</str>
    <str name="qt">/suggest</str>
    <str name="shards.qt">/suggest</str>
    <str name="shards">localhost:8080/cores/core1,localhost:8080/cores/core2</str>
    <bool name="distrib">false</bool>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
  <shardHandlerFactory class="HttpShardHandlerFactory">
    <int name="socketTimeOut">1000</int>
    <int name="connTimeOut">5000</int>
  </shardHandlerFactory>
</requestHandler>
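
For reference, a request against this handler might look like the sketch below
(hypothetical host, core and query term, shown only to illustrate how the handler
is addressed; the shards and shards.qt values come from the defaults above):

  http://localhost:8080/cores/core1/suggest?q=app&wt=xml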





Re: Search multiple values with wildcards

2014-10-08 Thread J'roo
Hi Jack, Ahmet,

Thanks for your tips! 

In the end I found this the best way to do it:

q=proprietaryMessage_tis:(25++23456*++32A++130202US*)

All the best





Re: solr suggester not working with shards

2014-10-08 Thread rsi...@ambrac.nl
One more thing :

Suggest is not working with multiple cores using shards, but 'did you mean'
(spell check) is working fine with multiple cores.





Re: eDisMax parser and special characters

2014-10-08 Thread Aman Tandon
Hi,

It seems to me like there is a difference between the tokens generated at query
time and at indexing time. Can you tell us your field type and the analyzers you
are using to index that field?

With Regards
Aman Tandon

On Wed, Oct 8, 2014 at 11:09 AM, Lanke,Aniruddha aniruddha.la...@cerner.com
 wrote:

 We are using an eDisMax parser in our configuration. When we search using
 a query term that has a ‘-‘ we don’t get any results back.

 Search term: red - yellow
 This doesn’t return any data back but

 Search term: red yellow
 Will give back result ‘red - yellow’

 How does eDisMax treat special characters?
 What tweaks do we need to do, so when a user enters a ‘-‘ in the query
 e.g. red - yellow, we
 get the appropriate result back?

 Thanks,




Re: dismax query does not match with additional field in qf

2014-10-08 Thread Andreas Hubold
The query is not from a real use-case. We used it to test edge cases. I 
just asked to better understand the parser as its behavior did not match 
my expectations.


Anyway, one use-case I can think of is a free search field for end-users 
where they can search in both ID and text fields including phrases - 
without specifying whether their query is an ID or full-text. Users 
typically just expect the right thing to happen. So application 
developers have to be aware of such effects. Maybe the newer simple 
query parser would be a better fit for us.


There were also some good comments in SOLR-6602, especially a link to 
SOLR-3085 which describes a more realistic case with stopword removal.


Thanks everybody!

Regards,
Andreas

Jack Krupansky wrote on 10/07/2014 06:16 PM:
Your query term seems particularly inappropriate for dismax - think 
simple keyword queries.


Also, don't confuse dismax and edismax - maybe you want the latter. 
The former is for... simple keyword queries.


I'm still not sure what your actual use case really is. In particular, 
are you trying to do a full, exact match on the string field, or a 
substring match? You can do the latter with wildcards or regex, but 
normally the former (exact match) is used.


Maybe simply enclosing the complex term in quotes to make it a phrase 
query is what you need - that would do an exact match on the string 
field, but a tokenized phrase match on the text field, and support 
partial matches on the text field as a phrase of contiguous terms.
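
As an illustration only (field names taken from this thread, URL encoding and the
inner-quote escaping left informal), the idea is roughly:

  defType=dismax&qf=name_tokenized^2 feederstate&q="abc_<iframe src='loadLocale.js' onload='javascript:document.XSSed=\"name\"'"

i.e. the whole term is passed as one quoted phrase rather than as bare text.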


-- Jack Krupansky

-Original Message- From: Andreas Hubold
Sent: Tuesday, October 7, 2014 12:08 PM
To: solr-user@lucene.apache.org
Subject: Re: dismax query does not match with additional field in qf

Okay, sounds reasonable. However I didn't expect this when reading the
documentation of the dismax query parser.

Especially the need to escape special characters (and which ones) was
not clear to me as the dismax query parser is designed to process
simple phrases (without complex syntax) entered by users and special
characters (except AND and OR) are escaped by the parser - as written
on 
https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser


Do you know if the new Simple Query Parser has the same behaviour when
searching across multiple fields? Or could it be used instead to search
across text_general and string fields of arbitrary content without
additional query preprocessing to get results for matches in any of
these fields (as in field1:STUFF OR field2:STUFF)?

Thank you,
Andreas

Jack Krupansky wrote on 10/07/2014 05:24 PM:
I think what is happening is that your last term, the naked 
apostrophe is analyzing to zero terms and simply being ignored, but 
when you add the extra field, a string field, you now have another 
term in the query, and you have mm set to 100%, so that new term 
must match. It probably fails because you have no naked apostrophe 
term in that field in the index.


Probably none of your string field terms were matching before, but 
that wasn't apparent since the tokenized text matched. But with this 
naked apostrophe term, there is no way to tell Lucene to match no 
term, so it required the string term to match, which won't happen 
since only the full string is indexed.


Generally, you need to escape all special characters in a query. Then 
hopefully your string field will match.
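
If the query string is assembled in SolrJ, one way to apply that escaping is
SolrJ's ClientUtils helper - a minimal sketch, assuming the raw term arrives as a
plain Java string:

  import org.apache.solr.client.solrj.util.ClientUtils;

  // Escape Lucene/Solr query-syntax characters in raw user input
  // before it is placed into the q parameter.
  String raw = "abc_<iframe src='loadLocale.js' onload='javascript:document.XSSed=\"name\"'";
  String escaped = ClientUtils.escapeQueryChars(raw);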


-- Jack Krupansky

-Original Message- From: Andreas Hubold
Sent: Tuesday, September 30, 2014 11:14 AM
To: solr-user@lucene.apache.org
Subject: dismax query does not match with additional field in qf

Hi,

I ran into a problem with the Solr dismax query parser. We're using Solr
4.10.0 and the field types mentioned below are taken from the example
schema.xml.

In a test we have a document with rather strange content in a field
named name_tokenized of type text_general:

abc_<iframe src='loadLocale.js' onload='javascript:document.XSSed="name"' width=0 height=0>


(It's a test for XSS bug detection, but that doesn't matter here.)

I can find the document when I use the following dismax query with qf
set to field name_tokenized only:

http://localhost:44080/solr/studio/editor?deftype=dismax&q=abc_%3Ciframe+src%3D%27loadLocale.js%27+onload%3D%27javascript%3Adocument.XSSed%3D%22name%22%27&debug=true&echoParams=all&qf=name_tokenized^2 



If I submit exactly the same query but add another field feederstate
to the qf parameter, I don't get any results anymore. The field is of
type string.

http://localhost:44080/solr/studio/editor?deftype=dismax&q=abc_%3Ciframe+src%3D%27loadLocale.js%27+onload%3D%27javascript%3Adocument.XSSed%3D%22name%22%27&debug=true&echoParams=all&qf=name_tokenized^2%20feederstate 



The decoded value of q is: abc_<iframe src='loadLocale.js'
onload='javascript:document.XSSed="name"' and it seems the trailing
single-quote causes problems here. (In fact, I can find the document
when I remove the last char)
The parsed query for the latter case is

(
  +((

RE: Solr configuration, memory usage and MMapDirectory

2014-10-08 Thread Simon Fairey
Hi

I'm currently setting up jconsole, but as I have to monitor remotely (no GUI 
capability on the server) I have to wait before I can restart Solr with a JMX 
port configured. In the meantime I looked at top; given the calculations you described 
based on your top output, here is the top output for my Java process on the node that 
handles the querying (the indexing node has a similar memory profile):

https://www.dropbox.com/s/pz85dm4e7qpepco/SolrTop.png?dl=0

It would seem I need a monstrously large heap in the 60GB region?

We do use a lot of navigators/filters, so I have set the caches to be quite 
large for these; is that what is using up the memory?

Thanks

Si

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: 06 October 2014 16:56
To: solr-user@lucene.apache.org
Subject: Re: Solr configuration, memory usage and MMapDirectory

On 10/6/2014 9:24 AM, Simon Fairey wrote:
 I've inherited a Solr config and am doing some sanity checks before 
 making some updates; I'm concerned about the memory settings.

 System has 1 index in 2 shards split across 2 Ubuntu 64 bit nodes, 
 each node has 32 CPU cores and 132GB RAM, we index around 500k files a 
 day spread out over the day in batches every 10 minutes, a portion of 
 these are updates to existing content, maybe 5-10%. Currently 
 MergeFactor is set to 2 and commit settings are:

 <autoCommit>
   <maxTime>6</maxTime>
   <openSearcher>false</openSearcher>
 </autoCommit>

 <autoSoftCommit>
   <maxTime>90</maxTime>
 </autoSoftCommit>

 Currently each node has around 25M docs in with an index size of 45GB, 
 we prune the data every few weeks so it never gets much above 35M docs 
 per node.

 On reading I've seen a recommendation that we should be using 
 MMapDirectory, currently it's set to NRTCachingDirectoryFactory.
 However currently the JVM is configured with -Xmx131072m, and for 
 MMapDirectory I've read you should use less memory for the JVM so 
 there is more available for the OS caching.

 Looking at the dashboard in the JVM memory usage I see:

 [JVM memory usage graph from the Solr dashboard - image not included in the archive]

 Not sure I understand the 3 bands, assume 127.81 is Max, dark grey is 
 in use at the moment and the light grey is allocated as it was used 
 previously but not been cleaned up yet?

 I'm trying to understand if this will help me know how much would be a 
 good value to change Xmx to, i.e. say 64GB based on light grey?

 Additionally once I've changed the max heap size is it a simple case 
 of changing the config to use MMapDirectory or are there things i need 
 to watch out for?


NRTCachingDirectoryFactory is a wrapper directory implementation. The wrapped 
Directory implementation is used with some code between that implementation and 
the consumer (Solr in this case) that does caching for NRT indexing.  The 
wrapped implementation is MMapDirectory, so you do not need to switch, you ARE 
using MMap.

Attachments rarely make it to the list, and that has happened in this case, so 
I cannot see any of your pictures.  Instead, look at one of mine, and the 
output of a command from the same machine, running Solr
4.7.2 with Oracle Java 7:

https://www.dropbox.com/s/91uqlrnfghr2heo/solr-memory-sorted-top.png?dl=0

[root@idxa1 ~]# du -sh /index/solr4/data/
64G /index/solr4/data/

I've got 64GB of index data on this machine, used by about 56 million 
documents.  I've also got 64GB of RAM.  The solr process shows a virtual memory 
size of 54GB, a resident size of 16GB, and a shared size of 11GB.  My max heap 
on this process is 6GB.  If you deduct the shared memory size from the resident 
size, you get about 5GB.  The admin dashboard for this machine says the current 
max heap size is 5.75GB, so that 5GB is pretty close to that, and probably 
matches up really well when you consider that the resident size may be 
considerably more than 16GB and the shared size may be just barely over 11GB.

My system has well over 9GB free memory and 44GB is being used for the OS disk 
cache.  This system is NOT facing memory pressure.  The index is well-cached 
and there is even memory that is not used *at all*.

With an index size of 45GB and 132GB of RAM, you're unlikely to be having 
problems with memory unless your heap size is *ENORMOUS*.  You
*should* have your garbage collection highly tuned, especially if your max heap is 
larger than 2 or 3GB.  I would guess that a 4 to 6GB heap is probably enough 
for your needs; if you're doing a lot with facets, sorting, or Solr's 
caches, you may need more.  Here's some info about heap requirements, 
followed by information about garbage collection tuning:

http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

Your automatic commit settings do not raise any red flags with me. 
Those are sensible settings.

Thanks,
Shawn




How to link tables based on range values solr data-config

2014-10-08 Thread madhav bahuguna
Hi ,

Businessmasters
Business_id   Business_point
1             3.4
2             2.8
3             8.0

Business_Colors
business_colors_id   business_rating_from   business_rating_to   rating
1                    2                      5                    OK
2                    5                      10                   GOOD
3                    10                     15                   Excellent

I want to link the two tables based on business_rating_from and
business_rating_to, like:

SELECT business_colors_id,business_rating_from,business_rating_to,rating
where  business_rating_from = 2 AND business_rating_to  5;

Now I want to index them into Solr. This is how my data-config file looks:
<entity name="business_colors"
        query="SELECT business_colors_id, business_rating_from, business_rating_to,
               business_text, hex_colors, rgb_colors, business_colors_modify
               FROM business_colors
               WHERE business_rating_from &gt;= '${businessmasters.business_point}'
                 AND business_rating_to &lt; '${businessmasters.business_point}'"
        deltaQuery="SELECT business_colors_id FROM business_colors
                    WHERE business_colors_modify &gt; '${dih.last_index_time}'"
        parentDeltaQuery="SELECT business_id FROM businessmasters
                          WHERE business_point &lt; ${business_colors.business_rating_from}
                            AND business_point &gt;= ${business_colors.business_rating_from}">
  <field column="business_colors_id" name="id"/>
  <field column="business_rating_from" name="business_rating_from" indexed="true" stored="true"/>
  <field column="business_rating_to" name="business_rating_to" indexed="true" stored="true"/>
  <field column="business_text" name="business_text" indexed="true" stored="true"/>
  <field column="hex_colors" name="hex_colors" indexed="true" stored="true"/>
  <field column="rgb_colors" name="rgb_colors" indexed="true" stored="true"/>
  <field column="business_colors_modify" name="business_colors_modify" indexed="true" stored="true"/>
</entity>
When I run a full import, the data does not get indexed and no error is shown.
What is wrong with this? Can anyone help and advise on how I achieve what
I want to do?
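
Judging purely from the sample tables above (a business_point should fall between
business_rating_from and business_rating_to), the range condition in the child
entity would normally point the other way - the following is an illustrative
sketch only, not a tested fix:

  <entity name="business_colors"
          query="SELECT business_colors_id, rating FROM business_colors
                 WHERE business_rating_from &lt;= '${businessmasters.business_point}'
                   AND business_rating_to &gt; '${businessmasters.business_point}'">
    ...
  </entity>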

I also have this question posted on stack over flow
http://stackoverflow.com/questions/26256344/how-to-link-tables-based-on-range-values-solr-data-config
-- 
Regards
Madhav Bahuguna


NullPointerException for ExternalFileField when key field has no terms

2014-10-08 Thread Matthew Nigl
Hi,

I use various ID fields as the keys for various ExternalFileField fields,
and I have noticed that I will sometimes get the following error:

ERROR org.apache.solr.servlet.SolrDispatchFilter  û
null:java.lang.NullPointerException
at
org.apache.solr.search.function.FileFloatSource.getFloats(FileFloatSource.java:273)
at
org.apache.solr.search.function.FileFloatSource.access$000(FileFloatSource.java:51)
at
org.apache.solr.search.function.FileFloatSource$2.createValue(FileFloatSource.java:147)
at
org.apache.solr.search.function.FileFloatSource$Cache.get(FileFloatSource.java:190)
at
org.apache.solr.search.function.FileFloatSource.getCachedFloats(FileFloatSource.java:141)
at
org.apache.solr.search.function.FileFloatSource.getValues(FileFloatSource.java:84)
at
org.apache.solr.response.transform.ValueSourceAugmenter.transform(ValueSourceAugmenter.java:95)
at
org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:252)
at
org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:170)
at
org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:184)
at
org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:300)
at
org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:96)
at
org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:61)
at
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:765)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:426)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Unknown Source)



The source code referenced in the error is below (FileFloatSource.java:273):

TermsEnum termsEnum = MultiFields.getTerms(reader, idName).iterator(null);

So if there are no terms in the index for the key field, then getTerms will
return null, and of course trying to call iterator on null will cause the
exception.
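
A minimal sketch of the kind of guard that would avoid this, written against the
Lucene 4.x API (not a patch from this thread; 'vals' stands for the defaults-filled
float array that getFloats builds, which is an assumption about the surrounding code):

  // Sketch: skip the terms walk when the key field has no indexed terms yet.
  Terms terms = MultiFields.getTerms(reader, idName);
  if (terms == null) {
    return vals;  // hypothetical early return: keep the default value for every doc
  }
  TermsEnum termsEnum = terms.iterator(null);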

For my use-case, it makes sense that the key field may have no terms
(initially) because there are various types of documents sharing the index,
and they will not all exist at the onset. The default value for the EFF
would suffice in those cases.

Is this worthy of a JIRA? I have gone through whatever documentation I can
find for ExternalFileField and I can't seem to find anything about
requiring key terms first. It seems that this error is not encountered
often because users generally set the unique key field as the external file
key field, so it always exists.

The workaround is to ensure at least 

RE: NullPointerException for ExternalFileField when key field has no terms

2014-10-08 Thread Markus Jelsma
Hi - yes it is worth a ticket as the javadoc says it is ok:
http://lucene.apache.org/solr/4_10_1/solr-core/org/apache/solr/schema/ExternalFileField.html
 
 
-Original message-
 From:Matthew Nigl matthew.n...@gmail.com
 Sent: Wednesday 8th October 2014 14:48
 To: solr-user@lucene.apache.org
 Subject: NullPointerException for ExternalFileField when key field has no 
 terms
 
 Hi,
 
 I use various ID fields as the keys for various ExternalFileField fields,
 and I have noticed that I will sometimes get the following error:
 
 ERROR org.apache.solr.servlet.SolrDispatchFilter  û
 null:java.lang.NullPointerException
 at
 org.apache.solr.search.function.FileFloatSource.getFloats(FileFloatSource.java:273)
 at
 org.apache.solr.search.function.FileFloatSource.access$000(FileFloatSource.java:51)
 at
 org.apache.solr.search.function.FileFloatSource$2.createValue(FileFloatSource.java:147)
 at
 org.apache.solr.search.function.FileFloatSource$Cache.get(FileFloatSource.java:190)
 at
 org.apache.solr.search.function.FileFloatSource.getCachedFloats(FileFloatSource.java:141)
 at
 org.apache.solr.search.function.FileFloatSource.getValues(FileFloatSource.java:84)
 at
 org.apache.solr.response.transform.ValueSourceAugmenter.transform(ValueSourceAugmenter.java:95)
 at
 org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:252)
 at
 org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:170)
 at
 org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:184)
 at
 org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:300)
 at
 org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:96)
 at
 org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:61)
 at
 org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:765)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:426)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
 at
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
 at
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
 at
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
 at
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
 at
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
 at
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
 at
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 at
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
 at org.eclipse.jetty.server.Server.handle(Server.java:368)
 at
 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
 at
 org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
 at
 org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
 at
 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
 at
 org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
 at
 org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
 at
 org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
 at java.lang.Thread.run(Unknown Source)
 
 
 
 The source code referenced in the error is below (FileFloatSource.java:273):
 
 TermsEnum termsEnum = MultiFields.getTerms(reader, idName).iterator(null);
 
 So if there are no terms in the index for the key field, then getTerms will
 return null, and of course trying to call iterator on null will cause the
 exception.
 
 For my use-case, it makes sense that the key field may have no terms
 (initially) because there are various types of documents 

Re: SolrCloud with client ssl

2014-10-08 Thread Jan Høydahl
Hi,

I answered at https://issues.apache.org/jira/browse/SOLR-6595:

* Does it work with createNodeSet when using plain SolrCloud without SSL?
* Please provide the exact CollectionApi request you used when it failed, so we 
can see if the syntax is correct. Also, is 443 your secure port number in 
Jetty/Tomcat?

...but perhaps keep the conversation going here until it is a confirmed bug :)

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

7. okt. 2014 kl. 06:57 skrev Sindre Fiskaa s...@dips.no:

 Followed the description 
 https://cwiki.apache.org/confluence/display/solr/Enabling+SSL and generated a 
 self signed key pair. Configured a few solr-nodes and used the collection api 
 to create a new collection. I get an error message when I specify the nodes with 
 the createNodeSet param. When I don't use createNodeSet param the collection 
 gets created without error on random nodes. Could this be a bug related to 
 the createNodeSet param?
 
 
 <response>
   <lst name="responseHeader"><int name="status">0</int><int name="QTime">185</int></lst>
   <lst name="failure">
     <str>org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at:https://vt-searchln04:443/solr</str>
   </lst>
 </response>



Re: NullPointerException for ExternalFileField when key field has no terms

2014-10-08 Thread Matthew Nigl
Thanks Markus. I initially interpreted the line It's OK to have a keyField
value that can't be found in the index as meaning that the key field value
in the external file does not have to exist as a term in the index.





On 8 October 2014 23:56, Markus Jelsma markus.jel...@openindex.io wrote:

 Hi - yes it is worth a ticket as the javadoc says it is ok:

 http://lucene.apache.org/solr/4_10_1/solr-core/org/apache/solr/schema/ExternalFileField.html


 -Original message-
  From:Matthew Nigl matthew.n...@gmail.com
  Sent: Wednesday 8th October 2014 14:48
  To: solr-user@lucene.apache.org
  Subject: NullPointerException for ExternalFileField when key field has
 no terms
 
  Hi,
 
  I use various ID fields as the keys for various ExternalFileField fields,
  and I have noticed that I will sometimes get the following error:
 
  ERROR org.apache.solr.servlet.SolrDispatchFilter  û
  null:java.lang.NullPointerException
  at
 
 org.apache.solr.search.function.FileFloatSource.getFloats(FileFloatSource.java:273)
  at
 
 org.apache.solr.search.function.FileFloatSource.access$000(FileFloatSource.java:51)
  at
 
 org.apache.solr.search.function.FileFloatSource$2.createValue(FileFloatSource.java:147)
  at
 
 org.apache.solr.search.function.FileFloatSource$Cache.get(FileFloatSource.java:190)
  at
 
 org.apache.solr.search.function.FileFloatSource.getCachedFloats(FileFloatSource.java:141)
  at
 
 org.apache.solr.search.function.FileFloatSource.getValues(FileFloatSource.java:84)
  at
 
 org.apache.solr.response.transform.ValueSourceAugmenter.transform(ValueSourceAugmenter.java:95)
  at
 
 org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:252)
  at
 
 org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:170)
  at
 
 org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:184)
  at
 
 org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:300)
  at
 
 org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:96)
  at
 
 org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:61)
  at
 
 org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:765)
  at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:426)
  at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
  at
 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
  at
 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
  at
 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
  at
 
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
  at
 
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
  at
 
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
  at
  org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
  at
 
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
  at
 
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
  at
 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
  at
 
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
  at
 
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
  at
 
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
  at org.eclipse.jetty.server.Server.handle(Server.java:368)
  at
 
 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
  at
 
 org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
  at
 
 org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
  at
 
 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
  at
 org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
  at
  org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
  at
 
 org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
  at
 
 org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
  at
 
 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
  at
 
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
  at java.lang.Thread.run(Unknown Source)
 
 
 
  The source code referenced 

Re: eDisMax parser and special characters

2014-10-08 Thread Michael Joyner

Try escaping special chars with a \
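
For example, something along these lines (illustrative only; whether the hyphen
needs escaping at all depends on your analysis chain):

  q=red \- yellow
  (URL-encoded: q=red%20%5C-%20yellow)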

On 10/08/2014 01:39 AM, Lanke,Aniruddha wrote:

We are using a eDisMax parser in our configuration. When we search using the 
query term that has a ‘-‘ we don’t get any results back.

Search term: red - yellow
This doesn’t return any data back but






Re: Filter cache pollution during sharded edismax queries

2014-10-08 Thread Charlie Hull

On 01/10/2014 09:55, jim ferenczi wrote:

I think you should test with facet.shard.limit=-1; this will disable the
limit for the facets on the shards and remove the need for facet
refinements. I bet that returning every facet value with a count greater than 0
on internal queries is cheaper than using the filter cache to handle a lot
of refinements.


I'm happy to report that in our case setting facet.limit=-1 has had a 
significant impact on performance and cache hit ratios, and reduced CPU 
load. Thanks to all who replied!
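
For anyone following along, that is just the standard facet limit parameter on
the request, e.g. (field name made up):

  &facet=true&facet.field=category&facet.limit=-1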


Cheers

Charlie
Flax


Jim

2014-10-01 10:24 GMT+02:00 Charlie Hull char...@flax.co.uk:


On 30/09/2014 22:25, Erick Erickson wrote:


Just from a 20,000 ft. view, using the filterCache this way seems...odd.

+1 for using a different cache, but that's being quite unfamiliar with the
code.



Here's a quick update:

1. LFUCache performs worse so we returned to LRUCache
2. Making the cache smaller than the default 512 reduced performance.
3. Raising the cache size to 2048 didn't seem to have a significant effect
on performance but did reduce CPU load significantly. This may help our
client as they can reduce their system spec considerably.

We're continuing to test with our client, but the upshot is that even if
you think you don't need the filter cache, if you're doing distributed
faceting you probably do, and you should size it based on experimentation.
In our case there is a single filter but the cache needs to be considerably
larger than that!

Cheers

Charlie




On Tue, Sep 30, 2014 at 1:53 PM, Alan Woodward a...@flax.co.uk wrote:





  Once all the facets have been gathered, the co-ordinating node then

asks
the subnodes for an exact count for the final top-N facets,




What's the point of refining these counts? I thought it makes sense
only for facet.limit-ed requests. Is that a correct statement? Can those who
suffer from low performance just unlimit facet.limit to avoid that
distributed hop?



Presumably yes, but if you've got a sufficiently high cardinality field
then any gains made by missing out the hop will probably be offset by
having to stream all the return values back again.

Alan


  --

Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com









--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk






--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: eDisMax parser and special characters

2014-10-08 Thread Erick Erickson
There's not much information here.
What's the doc look like?
What is the analyzer chain for it?
What is the output when you add debug=query?

Details matter. A lot ;)

Best,
Erick

On Wed, Oct 8, 2014 at 6:26 AM, Michael Joyner mich...@newsrx.com wrote:
 Try escaping special chars with a \


 On 10/08/2014 01:39 AM, Lanke,Aniruddha wrote:

 We are using a eDisMax parser in our configuration. When we search using
 the query term that has a ‘-‘ we don’t get any results back.

 Search term: red - yellow
 This doesn’t return any data back but





WhitespaceTokenizer to consider incorrectly encoded c2a0?

2014-10-08 Thread Markus Jelsma
Hi,

For some crazy reason, some users somehow manage to substitute a perfectly 
normal space with a badly encoded non-breaking space; properly URL-encoded this 
then becomes %c2a0, and depending on the encoding you use to view it you probably 
see Â followed by a space. For example:

Because c2a0 is not considered whitespace (indeed, it is not real whitespace, 
that is 00a0) by the Java Character class, the WhitespaceTokenizer won't split 
on it, but the WordDelimiterFilter still does, somehow mitigating the problem 
as it becomes:

HTMLSCF een abonnement
WT een abonnement
WDF een eenabonnement abonnement

Should the WhitespaceTokenizer not include this weird edge case? 

Cheers,
Markus


Re: SolrCloud with client ssl

2014-10-08 Thread Sindre Fiskaa
Yes, running SolrCloud without SSL works fine with the createNodeSet
param. I run this with the Tomcat application server and port 443 enabled.
Although I receive this error message, the collection and the shards get
created and clusterstate.json is updated, but the cores are missing. I
manually add them one by one in the admin console to get my cloud up and
running, and the solr-nodes are able to talk to each other - no certificate
issues or SSL handshake errors between the nodes.

curl -E solr-ssl.pem:secret12 -k 'https://vt-searchln03:443/solr/admin/collections?action=CREATE&numShards=3&replicationFactor=2&name=multisharding&createNodeSet=vt-searchln03:443_solr,vt-searchln04:443_solr,vt-searchln01:443_solr,vt-searchln02:443_solr,vt-searchln05:443_solr,vt-searchln06:443_solr'

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">206</int></lst>
  <lst name="failure">
    <str>org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: https://vt-searchln03:443/solr</str>
    <str>org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: https://vt-searchln04:443/solr</str>
    <str>org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: https://vt-searchln06:443/solr</str>
    <str>org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: https://vt-searchln05:443/solr</str>
    <str>org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: https://vt-searchln01:443/solr</str>
    <str>org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: https://vt-searchln02:443/solr</str>
  </lst>
</response>


-Sindre

On 08.10.14 15:14, Jan Høydahl jan@cominvent.com wrote:

Hi,

I answered at https://issues.apache.org/jira/browse/SOLR-6595:

* Does it work with createNodeSet when using plain SolrCloud without SSL?
* Please provide the exact CollectionApi request you used when it failed,
so we can see if the syntax is correct. Also, is 443 your secure port
number in Jetty/Tomcat?

...but perhaps keep the conversation going here until it is a confirmed
bug :)

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

7. okt. 2014 kl. 06:57 skrev Sindre Fiskaa s...@dips.no:

 Followed the description
https://cwiki.apache.org/confluence/display/solr/Enabling+SSL and
generated a self signed key pair. Configured a few solr-nodes and used
 the collection api to create a new collection. I get error message when
specify the nodes with the createNodeSet param. When I don't use
createNodeSet param the collection gets created without error on random
nodes. Could this be a bug related to the createNodeSet param?
 
 
  <response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">185</int></lst>
  <lst name="failure"><str>org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at:https://vt-searchln04:443/solr</str></lst>
  </response>




Re: WhitespaceTokenizer to consider incorrectly encoded c2a0?

2014-10-08 Thread Alexandre Rafalovitch
Is this a suggestion for JIRA ticket? Or a question on how to solve
it? If the later, you could probably stick a RegEx replacement in the
UpdateRequestProcessor chain and be done with it.

As to why? I would look for the rest of the MSWord-generated
artifacts, such as smart quotes, extra-long dashes, etc.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 8 October 2014 09:59, Markus Jelsma markus.jel...@openindex.io wrote:
 Hi,

 For some crazy reason, some users somehow manage to substitute a perfectly 
 normal space with a badly encoded non-breaking space, properly URL encoded 
 this then becomes %c2a0 and depending on the encoding you use to view you 
 probably see  followed by a space. For example:

 Because c2a0 is not considered whitespace (indeed, it is not real whitespace, 
 that is 00a0) by the Java Character class, the WhitespaceTokenizer won't 
 split on it, but the WordDelimiterFilter still does, somehow mitigating the 
 problem as it becomes:

 HTMLSCF een abonnement
 WT een abonnement
 WDF een eenabonnement abonnement

 Should the WhitespaceTokenizer not include this weird edge case?

 Cheers,
 Markus


RE: WhitespaceTokenizer to consider incorrectly encoded c2a0?

2014-10-08 Thread Markus Jelsma
Alexandre - I am sorry if I was not clear; this is about queries, it all 
happens at query time. Yes, we can do the substitution with the regex replace 
filter, but I would propose this weird exception be added to the 
WhitespaceTokenizer so Lucene deals with this by itself.
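
For reference, that kind of analysis-time substitution might look roughly like
this inside the field type (a sketch only; the char filter runs before the
tokenizer for both index- and query-time analysis):

  <charFilter class="solr.PatternReplaceCharFilterFactory"
              pattern="\u00A0" replacement=" "/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>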

Markus
 
-Original message-
 From:Alexandre Rafalovitch arafa...@gmail.com
 Sent: Wednesday 8th October 2014 16:12
 To: solr-user solr-user@lucene.apache.org
 Subject: Re: WhitespaceTokenizer to consider incorrectly encoded c2a0?
 
 Is this a suggestion for JIRA ticket? Or a question on how to solve
 it? If the later, you could probably stick a RegEx replacement in the
 UpdateRequestProcessor chain and be done with it.
 
 As to why? I would look for the rest of the MSWord-generated
 artifacts, such as smart quotes, extra-long dashes, etc.
 
 Regards,
Alex.
 Personal: http://www.outerthoughts.com/ and @arafalov
 Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
 Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
 
 
 On 8 October 2014 09:59, Markus Jelsma markus.jel...@openindex.io wrote:
  Hi,
 
  For some crazy reason, some users somehow manage to substitute a perfectly 
  normal space with a badly encoded non-breaking space, properly URL encoded 
  this then becomes %c2a0 and depending on the encoding you use to view you 
  probably see  followed by a space. For example:
 
  Because c2a0 is not considered whitespace (indeed, it is not real 
  whitespace, that is 00a0) by the Java Character class, the 
  WhitespaceTokenizer won't split on it, but the WordDelimiterFilter still 
  does, somehow mitigating the problem as it becomes:
 
  HTMLSCF een abonnement
  WT een abonnement
  WDF een eenabonnement abonnement
 
  Should the WhitespaceTokenizer not include this weird edge case?
 
  Cheers,
  Markus
 


Using Velocity with Child Documents?

2014-10-08 Thread Edwards, Joshua
Hi -

I am trying to index a collection that has child documents.  I have 
successfully loaded the data into my index using SolrJ, and I have verified 
that I can search correctly using the child of method in my fq variable.  
Now, I would like to use Velocity (Solritas) to display the parent records with 
some details of the child records underneath.  Is there an easy way to do this? 
 Is there an example somewhere that I can look at?

Thanks,
Josh Edwards




Re: WhitespaceTokenizer to consider incorrectly encoded c2a0?

2014-10-08 Thread Jack Krupansky
The source code uses the Java Character.isWhitespace method, which 
specifically excludes the non-breaking white space characters.


The Javadoc contract for WhitespaceTokenizer is too vague, especially since 
Unicode has so many... subtleties.


Personally, I'd go along with treating non-breaking white space as white 
space here.


And update the Lucene Javadoc contract to be more explicit.

-- Jack Krupansky

-Original Message- 
From: Markus Jelsma

Sent: Wednesday, October 8, 2014 10:16 AM
To: solr-user@lucene.apache.org ; solr-user
Subject: RE: WhitespaceTokenizer to consider incorrectly encoded c2a0?

Alexandre - i am sorry if i was not clear, this is about queries, this all 
happens at query time. Yes we can do the substitution in with the regex 
replace filter, but i would propose this weird exception to be added to 
WhitespaceTokenizer so Lucene deals with this by itself.


Markus

-Original message-

From:Alexandre Rafalovitch arafa...@gmail.com
Sent: Wednesday 8th October 2014 16:12
To: solr-user solr-user@lucene.apache.org
Subject: Re: WhitespaceTokenizer to consider incorrectly encoded c2a0?

Is this a suggestion for JIRA ticket? Or a question on how to solve
it? If the later, you could probably stick a RegEx replacement in the
UpdateRequestProcessor chain and be done with it.

As to why? I would look for the rest of the MSWord-generated
artifacts, such as smart quotes, extra-long dashes, etc.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 8 October 2014 09:59, Markus Jelsma markus.jel...@openindex.io wrote:
 Hi,

 For some crazy reason, some users somehow manage to substitute a 
 perfectly normal space with a badly encoded non-breaking space, properly 
 URL encoded this then becomes %c2a0 and depending on the encoding you 
 use to view you probably see  followed by a space. For example:


 Because c2a0 is not considered whitespace (indeed, it is not real 
 whitespace, that is 00a0) by the Java Character class, the 
 WhitespaceTokenizer won't split on it, but the WordDelimiterFilter still 
 does, somehow mitigating the problem as it becomes:


 HTMLSCF een abonnement
 WT een abonnement
 WDF een eenabonnement abonnement

 Should the WhitespaceTokenizer not include this weird edge case?

 Cheers,
 Markus





Re: solr suggester not working with shards

2014-10-08 Thread Varun Thacker
Hi,

You have defined the suggester in the old way of implementing it but you do
mention the SuggestComponent. Can you try it out using the documentation
given here - https://cwiki.apache.org/confluence/display/solr/Suggester

Secondly how are you firing your queries?
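
For comparison, the newer-style configuration from that page looks roughly like
the sketch below (made-up suggester name and field type, not a drop-in
replacement for the handler you posted):

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">mySuggester</str>
      <str name="lookupImpl">FuzzyLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">suggest</str>
      <str name="suggestAnalyzerFieldType">text_general</str>
    </lst>
  </searchComponent>

  <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="suggest">true</str>
      <str name="suggest.count">5</str>
      <str name="suggest.dictionary">mySuggester</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>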

On Wed, Oct 8, 2014 at 12:39 PM, rsi...@ambrac.nl rsi...@ambrac.nl wrote:

 One more thing :

 suggest is not working  with multiple cores using  shard but  'did you
 mean'
 (spell check ) is working fine with multiple cores.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/solr-suggester-not-working-with-shards-tp4163261p4163265.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 


Regards,
Varun Thacker
http://www.vthacker.in/


Custom Solr Query Post Filter

2014-10-08 Thread Christopher Gross
Code:
http://pastebin.com/tNjzDbmy

Solr 4.9.0
Tomcat 7
Java 7

I took Erik Hatcher's example for creating a PostFilter and have modified
it so it would work with Solr 4.x.  Right now it works...the first time.
If I were to run this query it would work right:
http://localhost:8080/solr/plugintest/select?q=*:*&sort=uniqueId%20desc&fq={!classif%20creds=ABC}
However, if I ran this one:
http://localhost:8080/solr/plugintest/select?q=*:*&sort=uniqueId%20desc&fq={!classif%20creds=XYZ}
I would get the results from the first query.  I could do a different
query, like:
http://localhost:8080/solr/plugintest/select?q=uniqueId[* TO
*]&sort=uniqueId%20desc&fq={!classif%20creds=XYZ}
and I'd get the XYZ tagged items.  But if I tried to find ABC with that one:
http://localhost:8080/solr/plugintest/select?q=uniqueId[* TO
*]&sort=uniqueId%20desc&fq={!classif%20creds=ABC}
it would just list the XYZ items.

I'm not sure what is persisting where to cause this to happen.  Anybody
have some tips/pointers for building filters like this for Solr 4.x?

Thanks!

-- Chris


Re: Custom Solr Query Post Filter

2014-10-08 Thread Joel Bernstein
The results are being cached in the QueryResultCache most likely. You need
to implement equals() and hashCode() on the query object, which is part of
the cache key. In your case the creds param must be included in the
hashCode and equals logic.
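
A minimal sketch of what that can look like on the custom query class (class and
field names here are hypothetical, not copied from the pastebin):

  @Override
  public boolean equals(Object other) {
    if (this == other) return true;
    if (!super.equals(other)) return false;            // Lucene's Query checks class and boost
    ClassifPostFilterQuery that = (ClassifPostFilterQuery) other;
    return creds == null ? that.creds == null : creds.equals(that.creds);
  }

  @Override
  public int hashCode() {
    return 31 * super.hashCode() + (creds == null ? 0 : creds.hashCode());
  }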

Joel Bernstein
Search Engineer at Heliosearch

On Wed, Oct 8, 2014 at 1:17 PM, Christopher Gross cogr...@gmail.com wrote:

 Code:
 http://pastebin.com/tNjzDbmy

 Solr 4.9.0
 Tomcat 7
 Java 7

 I took Erik Hatcher's example for creating a PostFilter and have modified
 it so it would work with Solr 4.x.  Right now it works...the first time.
 If I were to run this query it would work right:

 http://localhost:8080/solr/plugintest/select?q=*:*sort=uniqueId%20descfq={!classif%20creds=ABC}
 However, if I ran this one:

 http://localhost:8080/solr/plugintest/select?q=*:*sort=uniqueId%20descfq={!classif%20creds=XYZ}
 I would get the results from the first query.  I could do a different
 query, like:
 http://localhost:8080/solr/plugintest/select?q=uniqueId[* TO
 *]sort=uniqueId%20descfq={!classif%20creds=XYZ}
 and I'd get the XYZ tagged items.  But if I tried to find ABC with that
 one:
 http://localhost:8080/solr/plugintest/select?q=uniqueId[* TO
 *]sort=uniqueId%20descfq={!classif%20creds=ABC}
 it would just list the XYZ items.

 I'm not sure what is persisting where to cause this to happen.  Anybody
 have some tips/pointers for building filters like this for Solr 4.x?

 Thanks!

 -- Chris



Re: Custom Solr Query Post Filter

2014-10-08 Thread Christopher Gross
That did the trick!  Thanks Joel.

-- Chris

On Wed, Oct 8, 2014 at 2:05 PM, Joel Bernstein joels...@gmail.com wrote:

 The results are being cached in the QueryResultCache most likely. You need
 to implement equals() and hashCode() on the query object, which is part of
 the cache key. In your case the creds param must be included in the
 hashCode and equals logic.

 Joel Bernstein
 Search Engineer at Heliosearch

 On Wed, Oct 8, 2014 at 1:17 PM, Christopher Gross cogr...@gmail.com
 wrote:

  Code:
  http://pastebin.com/tNjzDbmy
 
  Solr 4.9.0
  Tomcat 7
  Java 7
 
  I took Erik Hatcher's example for creating a PostFilter and have modified
  it so it would work with Solr 4.x.  Right now it works...the first time.
  If I were to run this query it would work right:
 
 
 http://localhost:8080/solr/plugintest/select?q=*:*sort=uniqueId%20descfq={!classif%20creds=ABC}
  However, if I ran this one:
 
 
 http://localhost:8080/solr/plugintest/select?q=*:*sort=uniqueId%20descfq={!classif%20creds=XYZ}
  I would get the results from the first query.  I could do a different
  query, like:
  http://localhost:8080/solr/plugintest/select?q=uniqueId[* TO
  *]sort=uniqueId%20descfq={!classif%20creds=XYZ}
  and I'd get the XYZ tagged items.  But if I tried to find ABC with that
  one:
  http://localhost:8080/solr/plugintest/select?q=uniqueId[* TO
  *]sort=uniqueId%20descfq={!classif%20creds=ABC}
  it would just list the XYZ items.
 
  I'm not sure what is persisting where to cause this to happen.  Anybody
  have some tips/pointers for building filters like this for Solr 4.x?
 
  Thanks!
 
  -- Chris
 



Re: Using Velocity with Child Documents?

2014-10-08 Thread Erick Erickson
Velocity is just taking the Solr response and displaying selected bits
in HTML. So assuming the information you want is in the reponse packet
(which you can tell just by doing the query from the browser) it's
just a matter of pulling it out of the response and displaying it.

Mostly, when I started down this path I poked around the velocity
directory; it was just a bit of hunting to figure things out, with
some help from the Apache Velocity page.

Not much help, but the short form is that there's not much of an example that I
know of for your specific problem.

Erick

On Wed, Oct 8, 2014 at 8:54 AM, Edwards, Joshua
joshua.edwa...@capitalone.com wrote:
 Hi -

 I am trying to index a collection that has child documents.  I have 
 successfully loaded the data into my index using SolrJ, and I have verified 
 that I can search correctly using the child of method in my fq variable.  
 Now, I would like to use Velocity (Solritas) to display the parent records 
 with some details of the child records underneath.  Is there an easy way to 
 do this?  Is there an example somewhere that I can look at?

 Thanks,
 Josh Edwards
 



Edismax parser and boosts

2014-10-08 Thread Pawel Rog
Hi,
I use edismax query with q parameter set as below:

q=foo^1.0+AND+bar

For such a query for the same document I see different (lower) scoring
value than for

q=foo+AND+bar

By default boost of term is 1 as far as i know so why the scoring differs?

When I check the debugQuery output, the parsedQuery for foo^1.0+AND+bar is a
Boolean query, one of whose clauses is a phrase query "foo 1.0 bar". It
seems that the edismax parser takes the whole q parameter as a phrase without
removing the boost value and adds it as a boolean clause. Is it a bug, or
should it work like that?

--
Paweł Róg


Re: Solr configuration, memory usage and MMapDirectory

2014-10-08 Thread Shawn Heisey
On 10/8/2014 4:02 AM, Simon Fairey wrote:
 I'm currently setting up jconsole but as I have to remotely monitor (no gui 
 capability on the server) I have to wait before I can restart solr with a JMX 
 port setup. In the meantime I looked at top and given the calculations you 
 said based on your top output and this top of my java process from the node 
 that handles the querying, the indexing node has a similar memory profile:
 
 https://www.dropbox.com/s/pz85dm4e7qpepco/SolrTop.png?dl=0
 
 It would seem I need a monstrously large heap in the 60GB region?
 
 We do use a lot of navigators/filters so I have set the caches to be quite 
 large for these, are these what are using up the memory?

With a VIRT size of 189GB and a RES size of 73GB, I believe you probably
have more than 45GB of index data.  This might be a combination of old
indexes and the active index.  Only the indexes (cores) that are being
actively used need to be considered when trying to calculate the total
RAM needed.  Other indexes will not affect performance, even though they
increase your virtual memory size.

With MMap, part of the virtual memory size is the size of the index data
that has been opened on the disk.  This is not memory that's actually
allocated.  There's a very good reason that mmap has been the default in
Lucene and Solr for more than two years.

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

You stated originally that you have 25 million documents and 45GB of
index data on each node.  With those numbers and a conservative
configuration, I would expect that you need about 4GB of heap, maybe as
much as 8GB.  I cannot think of any reason that you would NEED a heap
60GB or larger.

Each field that you sort on, each field that you facet on with the
default facet.method of fc, and each filter that you cache will use a
large block of memory.  The size of that block of memory is almost
exclusively determined by the number of documents in the index.

With 25 million documents, each filterCache entry will be approximately
3MB -- one bit for every document.  I do not know how big each
FieldCache entry is for a sort field and a facet field, but assume that
they are probably larger than the 3MB entries on the filterCache.
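
(As a worked check: 25,000,000 documents x 1 bit = 25,000,000 bits, or about
3,125,000 bytes - roughly 3MB per cached filter.)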

I've got a filterCache sized at 64, with an autowarmCount of 4.  With
larger autowarmCount values, I was seeing commits take 30 seconds or
more, because each of those filters can take a few seconds to execute.
Cache sizes in the thousands are rarely necessary, and just chew up a
lot of memory with no benefit.  Large autowarmCount values are also
rarely necessary.  Every time a new searcher is opened by a commit, add
up all your autowarmCount values and realize that the searcher likely
needs to execute that many queries before it is available.

If you need to set up remote JMX so you can remotely connect jconsole, I
have done this in the redhat init script I've built -- see JMX_OPTS here:

http://wiki.apache.org/solr/ShawnHeisey#Init_script
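
The usual flags look something like the following (the port number is just an
example; the init script linked above is the authoritative version):

-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=18983
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false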

It's never a good idea to expose Solr directly to the internet, but if
you use that JMX config, *definitely* don't expose it to the Internet.
It doesn't use any authentication.

We might need to back up a little bit and start with the problem that
you are trying to figure out, not the numbers that are being reported.

http://people.apache.org/~hossman/#xyproblem

Your original note said that you're sanity checking.  Toward that end,
the only insane thing that jumps out at me is that your max heap is
*VERY* large, and you probably don't have the proper GC tuning.

My recommendations for initial action are to use -Xmx8g on the servlet
container startup and include the GC settings you can find on the wiki
pages I've given you.  It would be a very good idea to set up remote JMX
so you can use jconsole or jvisualvm remotely.
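
As a sketch of what that startup change might look like (treat these as
placeholders rather than tuned values -- the actual recommended GC settings
are on the wiki page):

-Xms8g -Xmx8g
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+CMSParallelRemarkEnabled
-XX:CMSInitiatingOccupancyFraction=70
-XX:+UseCMSInitiatingOccupancyOnly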

Thanks,
Shawn



Re: eDisMax parser and special characters

2014-10-08 Thread Lanke,Aniruddha
Sorry for the delayed reply; here is more information -

Schema that we are using - http://pastebin.com/WQAJCCph
Request Handler in config - http://pastebin.com/Y0kP40WF

Some analysis -

Search term: red -
Parser eDismax
No results show up
<str name="parsedquery">(+((DisjunctionMaxQuery((name_starts_with:red^9.0 |
name_parts_starts_with:red^6.0 | s_detail:red | name:red^12.0 |
s_detail_starts_with:red^3.0 | s_detail_parts_starts_with:red^2.0))
DisjunctionMaxQuery((name_starts_with:-^9.0 |
s_detail_starts_with:-^3.0)))~2))/no_coord</str>

Search term: red -
Parser dismax
Results are returned
<str name="parsedquery">(+DisjunctionMaxQuery((name_starts_with:red^9.0 |
name_parts_starts_with:red^6.0 | s_detail:red | name:red^12.0 |
s_detail_starts_with:red^3.0 | s_detail_parts_starts_with:red^2.0))
())/no_coord</str>

Why do we see the variation in the results between dismax and eDismax?


On Oct 8, 2014, at 8:59 AM, Erick Erickson 
erickerick...@gmail.commailto:erickerick...@gmail.com wrote:

There's not much information here.
What's the doc look like?
What is the analyzer chain for it?
What is the output when you add debug=query?

Details matter. A lot ;)

Best,
Erick

On Wed, Oct 8, 2014 at 6:26 AM, Michael Joyner 
mich...@newsrx.commailto:mich...@newsrx.com wrote:
Try escaping special chars with a \


On 10/08/2014 01:39 AM, Lanke,Aniruddha wrote:

We are using a eDisMax parser in our configuration. When we search using
the query term that has a ‘-‘ we don’t get any results back.

Search term: red - yellow
This doesn’t return any data back but




CONFIDENTIALITY NOTICE This message and any included attachments are from 
Cerner Corporation and are intended only for the addressee. The information 
contained in this message is confidential and may constitute inside or 
non-public information under international, federal, or state securities laws. 
Unauthorized forwarding, printing, copying, distribution, or use of such 
information is strictly prohibited and may be unlawful. If you are not the 
addressee, please promptly delete this message and notify the sender of the 
delivery error by e-mail or you may call Cerner's corporate offices in Kansas 
City, Missouri, U.S.A at (+1) (816)221-1024.


RE: Solr configuration, memory usage and MMapDirectory

2014-10-08 Thread Simon Fairey
Hi

Thanks for this, I will investigate further after reading a number of your 
points in more detail. I do have a feeling they've set up too many entries in 
the filter cache (1000s), so I will revisit that.

Just a note on the numbers: those were valid when I made the post, but obviously 
they change as the week progresses before a regular clean-up of content. Current 
numbers for info (if it's at all relevant) from the index admin view on one of 
the 2 nodes are:

Last Modified:  18 minutes ago
Num Docs:   24590368
Max Doc:29139255
Deleted Docs:   4548887
Version:1297982
Segment Count:  28

   Version  Gen Size
Master: 1412798583558 402364 52.98 GB

Top:
2996 tomcat6   20   0  189g  73g 1.5g S   15 58.7  58034:04 java

And the only GC option I can see that is on is -XX:+UseConcMarkSweepGC

Regarding the XY problem, you are very likely correct. Unfortunately I wasn't 
involved in the config, and I very much suspect that when it was done many of 
the defaults were used; then, if it didn't work or there was, say, an out of 
memory error, they just upped the heap to treat the symptom without 
investigating the cause. The luxury of having more than enough RAM, I guess!

I'm going to get some late night downtime soon, at which point I'm hoping to 
change the heap size and GC settings and add the JMX port; it's not exposed to 
the internet, so the lack of authentication is fine.

Right off to do some reading!

Cheers

Si

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: 08 October 2014 21:09
To: solr-user@lucene.apache.org
Subject: Re: Solr configuration, memory usage and MMapDirectory

On 10/8/2014 4:02 AM, Simon Fairey wrote:
 I'm currently setting up jconsole but as I have to remotely monitor (no gui 
 capability on the server) I have to wait before I can restart solr with a JMX 
 port setup. In the meantime I looked at top and given the calculations you 
 said based on your top output and this top of my java process from the node 
 that handles the querying, the indexing node has a similar memory profile:
 
 https://www.dropbox.com/s/pz85dm4e7qpepco/SolrTop.png?dl=0
 
 It would seem I need a monstrously large heap in the 60GB region?
 
 We do use a lot of navigators/filters so I have set the caches to be quite 
 large for these, are these what are using up the memory?

With a VIRT size of 189GB and a RES size of 73GB, I believe you probably have 
more than 45GB of index data.  This might be a combination of old indexes and 
the active index.  Only the indexes (cores) that are being actively used need 
to be considered when trying to calculate the total RAM needed.  Other indexes 
will not affect performance, even though they increase your virtual memory size.

With MMap, part of the virtual memory size is the size of the index data that 
has been opened on the disk.  This is not memory that's actually allocated.  
There's a very good reason that mmap has been the default in Lucene and Solr 
for more than two years.

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

You stated originally that you have 25 million document and 45GB of index data 
on each node.  With those numbers and a conservative configuration, I would 
expect that you need about 4GB of heap, maybe as much as 8GB.  I cannot think 
of any reason that you would NEED a heap 60GB or larger.

Each field that you sort on, each field that you facet on with the default 
facet.method of fc, and each filter that you cache will use a large block of 
memory.  The size of that block of memory is almost exclusively determined by 
the number of documents in the index.

With 25 million documents, each filterCache entry will be approximately 3MB -- 
one bit for every document.  I do not know how big each FieldCache entry is for 
a sort field and a facet field, but assume that they are probably larger than 
the 3MB entries on the filterCache.

I've got a filterCache sized at 64, with an autowarmCount of 4.  With larger 
autowarmCount values, I was seeing commits take 30 seconds or more, because 
each of those filters can take a few seconds to execute.
Cache sizes in the thousands are rarely necessary, and just chew up a lot of 
memory with no benefit.  Large autowarmCount values are also rarely necessary.  
Every time a new searcher is opened by a commit, add up all your autowarmCount 
values and realize that the searcher likely needs to execute that many queries 
before it is available.

If you need to set up remote JMX so you can remotely connect jconsole, I have 
done this in the redhat init script I've built -- see JMX_OPTS here:

http://wiki.apache.org/solr/ShawnHeisey#Init_script

It's never a good idea to expose Solr directly to the internet, but if you use 
that JMX config, *definitely* don't expose it to the Internet.
It doesn't use any authentication.

We might need to back up a little bit and start with the problem that you are 
trying to figure out, not the 

Re: eDisMax parser and special characters

2014-10-08 Thread Jack Krupansky
Hyphen is a prefix operator and is normally followed by a term to indicate 
that the term must not be present. So, your query has a syntax error. The 
two query parsers differ in how they handle various errors. In the case of 
edismax, it quotes operators and then tries again, so the hyphen gets 
quoted, and then analyzed to nothing for text fields but is still a string 
for string fields.
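
If the stray hyphen comes from user input, the usual workarounds (illustrative
only; pick whichever matches the intent) are:

q=red \- yellow     (backslash-escape the hyphen so it is not treated as an operator)
q="red - yellow"    (quote the whole input as a phrase)
q=red yellow        (strip bare operators on the client before querying)

Note that for an analyzed text field the escaped hyphen will usually be
dropped by the analyzer anyway, so escaping and stripping end up behaving
much the same.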


-- Jack Krupansky

-Original Message- 
From: Lanke,Aniruddha

Sent: Wednesday, October 8, 2014 4:38 PM
To: solr-user@lucene.apache.org
Subject: Re: eDisMax parser and special characters

Sorry for a delayed reply here is more information -

Schema that we are using - http://pastebin.com/WQAJCCph
Request Handler in config - http://pastebin.com/Y0kP40WF

Some analysis -

Search term: red -
Parser eDismax
No results show up
<str name="parsedquery">(+((DisjunctionMaxQuery((name_starts_with:red^9.0 |
name_parts_starts_with:red^6.0 | s_detail:red | name:red^12.0 |
s_detail_starts_with:red^3.0 | s_detail_parts_starts_with:red^2.0))
DisjunctionMaxQuery((name_starts_with:-^9.0 |
s_detail_starts_with:-^3.0)))~2))/no_coord</str>


Search term: red -
Parser dismax
Results are returned
<str name="parsedquery">(+DisjunctionMaxQuery((name_starts_with:red^9.0 |
name_parts_starts_with:red^6.0 | s_detail:red | name:red^12.0 |
s_detail_starts_with:red^3.0 | s_detail_parts_starts_with:red^2.0))
())/no_coord</str>


Why do we see the variation in the results between dismax and eDismax?


On Oct 8, 2014, at 8:59 AM, Erick Erickson 
erickerick...@gmail.commailto:erickerick...@gmail.com wrote:


There's not much information here.
What's the doc look like?
What is the analyzer chain for it?
What is the output when you add debug=query?

Details matter. A lot ;)

Best,
Erick

On Wed, Oct 8, 2014 at 6:26 AM, Michael Joyner 
mich...@newsrx.commailto:mich...@newsrx.com wrote:

Try escaping special chars with a \


On 10/08/2014 01:39 AM, Lanke,Aniruddha wrote:

We are using a eDisMax parser in our configuration. When we search using
the query term that has a ‘-‘ we don’t get any results back.

Search term: red - yellow
This doesn’t return any data back but




CONFIDENTIALITY NOTICE This message and any included attachments are from 
Cerner Corporation and are intended only for the addressee. The information 
contained in this message is confidential and may constitute inside or 
non-public information under international, federal, or state securities 
laws. Unauthorized forwarding, printing, copying, distribution, or use of 
such information is strictly prohibited and may be unlawful. If you are not 
the addressee, please promptly delete this message and notify the sender of 
the delivery error by e-mail or you may call Cerner's corporate offices in 
Kansas City, Missouri, U.S.A at (+1) (816)221-1024. 



Re: Edismax parser and boosts

2014-10-08 Thread Jack Krupansky
Definitely sounds like a bug! File a Jira. Thanks for reporting this. What 
release of Solr?




-- Jack Krupansky
-Original Message- 
From: Pawel Rog

Sent: Wednesday, October 8, 2014 3:57 PM
To: solr-user@lucene.apache.org
Subject: Edismax parser and boosts

Hi,
I use edismax query with q parameter set as below:

q=foo^1.0+AND+bar

For such a query for the same document I see different (lower) scoring
value than for

q=foo+AND+bar

By default boost of term is 1 as far as i know so why the scoring differs?

When I check debugQuery parameter in parsedQuery for foo^1.0+AND+bar I
see Boolean query which one of clauses is a phrase query foo 1.0 bar. It
seems that edismax parser takes whole q parameter as a phrase without
removing boost value and add it as a boolean clause. Is it a bug or it
should work like that?

--
Paweł Róg 



Re: Using Velocity with Child Documents?

2014-10-08 Thread Chris Hostetter

: I am trying to index a collection that has child documents.  I have 
: successfully loaded the data into my index using SolrJ, and I have 
: verified that I can search correctly using the child of method in my 
: fq variable.  Now, I would like to use Velocity (Solritas) to display 
: the parent records with some details of the child records underneath.  
: Is there an easy way to do this?  Is there an example somewhere that I 
: can look at?

Step #1 is to forget about velocity and focus on getting the data you want 
about the children into the response.  

To do that you'll need to use the [child] DocTransformer...

https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents

ala...

fl=id,[child parentFilter=doc_type:book childFilter=doc_type:chapter limit=100]

If you are using this in conjunction with a block join query, you can use 
local params to eliminate some redundancy...

q=some_parent_field:foo
parents=content_type:parentDoc
fq={!parent which=$parents}child_field:bar
fl=id,[child parentFilter=$parents childFilter=content_type:childDoc limit=100]


Step #2: once you have the children in the response data, then you can use 
velocity to access each of the children of the docs that match your query 
via SolrDocument.getChildDocuments()
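
For the Velocity side, a minimal sketch (assuming the stock /browse templates 
where each hit is exposed as $doc; the field name chapter_title is made up, 
so adjust to your schema) might look like:

#if($doc.getChildDocuments())
  #foreach($child in $doc.getChildDocuments())
    <div class="child-doc">$child.getFieldValue('chapter_title')</div>
  #end
#end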



-Hoss
http://www.lucidworks.com/


Re: Best way to index wordpress blogs in solr

2014-10-08 Thread Jack Krupansky
The LucidWorks product has built-in crawler support, so you could crawl one or 
more web sites.


http://lucidworks.com/product/fusion/

-- Jack Krupansky

-Original Message- 
From: Vishal Sharma

Sent: Tuesday, October 7, 2014 2:08 PM
To: solr-user@lucene.apache.org
Subject: Best way to index wordpress blogs in solr

Hi,

I am trying to get some help on finding out if there is any best practice
to index wordpress blogs in solr index? Can someone help with architecture
I shoudl be setting up?

Do, I need to write separate scripts to crawl wordpress and then pump posts
back to Solr using its API?




*Vishal Sharma* | TL, Grazitti Interactive | T: +1 650 641 1754
E: vish...@grazitti.com
www.grazitti.com
LinkedIn: http://www.linkedin.com/company/grazitti-interactive | Twitter: https://twitter.com/grazitti | Facebook: https://www.facebook.com/grazitti.interactive
dreamforce® Oct 13-16, 2014 - Meet us at the Cloud Expo, Booth N2341 Moscone North, San Francisco
Schedule a Meeting: http://www.vcita.com/v/grazittiinteractive/online_scheduling#/schedule
ZakCalendar - Dreamforce® Featured App: https://appexchange.salesforce.com/listingDetail?listingId=a0N300B5UPKEA3



Re: Custom Solr Query Post Filter

2014-10-08 Thread Joel Bernstein
Also just took a quick look at the code. This will likely be a performance
problem if you have a large result set:

String classif = context.reader().document(docId).get("classification");

Instead of using the stored field, you'll want to get the BytesRef for the
field using either the FieldCache or DocValues. In recent releases,
DocValues will likely be the fastest docID-to-BytesRef lookup.
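
For example, a rough per-segment sketch against the Lucene 4.x FieldCache API
(not drop-in code; the field name is taken from your snippet, error handling
omitted):

import org.apache.lucene.index.SortedDocValues;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.util.BytesRef;

// Resolve the field once per segment (wherever the collector is handed a new
// reader context), instead of loading the stored document for every hit:
SortedDocValues classifVals =
    FieldCache.DEFAULT.getTermsIndex(context.reader(), "classification");

// Then, for each collected (segment-relative) docId:
int ord = classifVals.getOrd(docId);
if (ord >= 0) {                      // -1 means the doc has no value
  BytesRef scratch = new BytesRef();
  classifVals.lookupOrd(ord, scratch);
  String classif = scratch.utf8ToString();
  // ... compare classif against the creds param as before ...
}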



Joel Bernstein
Search Engineer at Heliosearch

On Wed, Oct 8, 2014 at 2:20 PM, Christopher Gross cogr...@gmail.com wrote:

 That did the trick!  Thanks Joel.

 -- Chris

 On Wed, Oct 8, 2014 at 2:05 PM, Joel Bernstein joels...@gmail.com wrote:

  The results are being cached in the QueryResultCache most likely. You
 need
  to implement equals() and hashCode() on the query object, which is part
 of
  the cache key. In your case the creds param must be included in the
  hashCode and equals logic.
 
  Joel Bernstein
  Search Engineer at Heliosearch
 
  On Wed, Oct 8, 2014 at 1:17 PM, Christopher Gross cogr...@gmail.com
  wrote:
 
   Code:
   http://pastebin.com/tNjzDbmy
  
   Solr 4.9.0
   Tomcat 7
   Java 7
  
   I took Erik Hatcher's example for creating a PostFilter and have
 modified
   it so it would work with Solr 4.x.  Right now it works...the first
 time.
   If I were to run this query it would work right:
  
  
 
 http://localhost:8080/solr/plugintest/select?q=*:*&sort=uniqueId%20desc&fq={!classif%20creds=ABC}
   However, if I ran this one:
  
  
 
 http://localhost:8080/solr/plugintest/select?q=*:*&sort=uniqueId%20desc&fq={!classif%20creds=XYZ}
   I would get the results from the first query.  I could do a different
   query, like:
   http://localhost:8080/solr/plugintest/select?q=uniqueId[* TO *]&sort=uniqueId%20desc&fq={!classif%20creds=XYZ}
   and I'd get the XYZ tagged items.  But if I tried to find ABC with that
   one:
   http://localhost:8080/solr/plugintest/select?q=uniqueId[* TO *]&sort=uniqueId%20desc&fq={!classif%20creds=ABC}
   it would just list the XYZ items.
  
   I'm not sure what is persisting where to cause this to happen.  Anybody
   have some tips/pointers for building filters like this for Solr 4.x?
  
   Thanks!
  
   -- Chris
  
 



Re: Add multiple JSON documents with boost

2014-10-08 Thread Chris Hostetter

: i try to add documents to the index and boost them (hole document) but i
: get this error message:
: 
: ERROR org.apache.solr.core.SolrCore  –
: org.apache.solr.common.SolrException: Error parsing JSON field value.
: Unexpected OBJECT_START
: 
: Any ideas?

The top level structure you are sending is a JSON array (because you start 
with "["), which is how you tell solr you want to send a simple list of 
documents to add.

In order to send explicit commands (like "add") your top level JSON 
structure needs to be a JSON Object (aka: Map), which contains "add" as a 
key.

there are examples of this in the ref guide...

https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-SendingArbitraryJSONUpdateCommands

so basically, just take your list containing 2 objects that each have 1 
key of "add" and replace it with a single object that has 2 "add" keys...
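
i.e. keep the same boost/doc bodies you already have, but wrap them like this 
(doc contents abbreviated):

{
  "add": {
    "boost": 1,
    "doc": { "id": "16_1", ... }
  },
  "add": {
    "boost": 1,
    "doc": { "id": "17_1", ... }
  }
}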

: {
: add: {
: boost: 1,
: doc: {
: store_id: 1,
: created_at: 2007-08-23T01:03:05Z,
: sku: {boost: 10, value: n2610},
: status: 1,
: tax_class_id_t: 2,
: color_t: Black,
: visibility: 4,
: name: {boost: -60, value: Nokia 2610 Phone},
: url_key: nokia-2610-phone,
: image: \/n\/o\/nokia-2610-phone-2.jpg,
: small_image: \/n\/o\/nokia-2610-phone-2.jpg,
: thumbnail: \/n\/o\/nokia-2610-phone-2.jpg,
: msrp_enabled_t: 2,
: msrp_display_actual_price_type_t: 4,
: model_t: 2610,
: dimension_t: 4.1 x 1.7 x 0.7 inches,
: meta_keyword_t: Nokia 2610, cell, phone,,
: short_description: The words \entry level\ no longer
: mean \low-end,\ especially when it comes to the Nokia 2610. Offering
: advanced media and calling features without breaking the bank,
: price: 149.99,
: in_stock: 1,
: id: 16_1,
: product_id: 16,
: content_type: product,
: attribute_set_id: 38,
: type_id: simple,
: has_options: 0,
: required_options: 0,
: entity_type_id: 10,
: category: [
: 8,
: 13
: ]
: }
: }
  ,
: add: {
: boost: 1,
: doc: {
: store_id: 1,
: created_at: 2007-08-23T03:40:26Z,
: sku: {boost: 10, value: bb8100},
: color_t: Silver,
: status: 1,
: tax_class_id_t: 2,
: visibility: 4,
: name: {boost: -60, value: BlackBerry 8100 Pearl},
: url_key: blackberry-8100-pearl,
: thumbnail: \/b\/l\/blackberry-8100-pearl-2.jpg,
: small_image: \/b\/l\/blackberry-8100-pearl-2.jpg,
: image: \/b\/l\/blackberry-8100-pearl-2.jpg,
: model_t: 8100,
: dimension_t: 4.2 x 2 x 0.6 inches,
: meta_keyword_t: Blackberry, 8100, pearl, cell, phone,
: short_description: The BlackBerry 8100 Pearl is a
: departure from the form factor of previous BlackBerry devices. This
: BlackBerry handset is far more phone-like, and RIM's engineers have managed
: to fit a QWERTY keyboard onto the handset's slim frame.,
: price: 349.99,
: in_stock: 1,
: id: 17_1,
: product_id: 17,
: content_type: product,
: attribute_set_id: 38,
: type_id: simple,
: has_options: 0,
: required_options: 0,
: entity_type_id: 10,
: category: [
: 8,
: 13
: ]
: }
: }
: }


-Hoss
http://www.lucidworks.com/

Re: Having an issue with pivot faceting

2014-10-08 Thread Chris Hostetter

: Subject: Having an issue with pivot faceting

Ok - first off -- your example request doesn't include any facet.pivot 
params, so you aren't using pivot faceting ... which makes me concerned 
that you aren't using the feature you think you are, or don't 
understand the feature you are using.

: I'm having an issue getting pivot faceting working as expected.  I'm trying
: to filter by a specific criteria, and then first facet by one of my document
: attributes called item_generator, then facet those results into 2 sets each:
: the first set is the count of documents satisfying that facet with
: number_of_items_generated set to 0, the other set counting the documents
: satisfying that facet with number_of_items_generated greater than 0.  Seems

second: interval faceting is just a fancy, more efficient way of 
using facet.query if your queries are always over ranges.  There's 
nothing about interval faceting that is directly related to pivot 
faceting.

third: there isn't currently any generic support for faceting by a field 
and then faceting those results by some other field/criteria.  This is 
actively being worked on in issues like SOLR-6348 - but it doesn't exist 
yet.

fourth: because you ultimately have a specific criterion for how 
you want to divide the facets, something similar to the behavior you are 
asking for is available using tagged exclusions on facets

https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-LocalParametersforFaceting

...the basic idea you could follow is that you send additional fq params 
for each of the 2 criteria you want to lump things into 
(number_of_items_generated = 0 and number_of_items_generated > 0), but you 
tag those filters so they can individually be excluded from facets -- then 
you use facet.field on your item_generator field twice (with different 
keys), and in each case you exclude only one of those filters.

Here's a similar example to what you describe using the sample data that 
comes with solr...

http://localhost:8983/solr/select?rows=0&debug=query&q=inStock:true&fq={!tag=pricey}price:[100%20TO%20*]&fq={!tag=cheap}price:[*%20TO%20100}&facet=true&facet.field={!key=cheap_cats%20ex=pricey}cat&facet.field={!key=pricey_cats%20ex=cheap}cat

so cheap_cats gives you facet counts on the cat field but only for the 
cheap products (because it excludes the pricey fq) and pricey_cats 
gives you facet counts on the cat field for the pricey products by 
excluding the cheap fq.
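
Translated to the field names you describe, the request would contain 
something along these lines (assuming number_of_items_generated holds integer 
counts; untested):

q=<your filter criteria>
fq={!tag=none}number_of_items_generated:0
fq={!tag=some}number_of_items_generated:[1 TO *]
facet=true
facet.field={!key=zero_generated ex=some}item_generator
facet.field={!key=nonzero_generated ex=none}item_generator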

note however that the numFound is 0 -- this works fine for getting the 
facet counts you want, but you'd need a second query without the filters to 
get the main result set since (i'm pretty sure) it's not possible to use ex 
on the main query to exclude filters from affecting the main result set.


-Hoss
http://www.lucidworks.com/


Solr Index to Helio Search

2014-10-08 Thread Norgorn
When I try to simply copy an index from native SOLR to Heliosearch, I get this
exception:

Caused by: java.lang.IllegalArgumentException: A SPI class of type
org.apache.lucene.codecs.Codec with name 'Lucene410' does not exist. You need
to add the corresponding JAR file supporting this SPI to your classpath. The
current classpath supports the following names: [Lucene40, Lucene3x, Lucene41,
Lucene42, Lucene45, Lucene46, Lucene49]

Is there any proper way to add an index from native SOLR to Heliosearch?

The problem with native SOLR is that there are a lot of OOM exceptions
(because of the large index).



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Index-to-Helio-Search-tp4163446.html
Sent from the Solr - User mailing list archive at Nabble.com.