How to migrate content of a collection to a new collection

2014-07-23 Thread Per Steffensen

Hi

We have numerous collections each with numerous shards spread across 
numerous machines. We just discovered that all documents have a field 
with a wrong value and besides that we would like to add a new field to 
all documents
* The field with the wrong value is a long, DocValued, Indexed and 
Stored. Some (about half) of the documents need to have a constant added 
to their current value
* The field we want to add will be an int, DocValued, Indexed and 
Stored. Needs to be added to all documents, but will have different 
values among the documents


How to achieve our goal in the easiest possible way?

We thought about spooling/streaming from the existing collection into a 
twin-collection, then delete the existing collection and finally 
rename the twin-collection to have the same name as the original 
collection. Basically indexing all documents again. If that is the 
easiest way, how do we query in a way so that we get all documents 
streamed. We cannot just do a *:* query that returns everything into 
memory and then index from there, because we have billions of documents 
(not enough memory). Please note that we are on 4.4, which does not 
contain the new CURSOR-feature. Please also note that speed is an 
important factor for us.
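
Without the cursor feature, one common workaround is to sort on the uniqueKey 
and page with a range filter on the last id seen, so every request is a cheap 
"top N" rather than a deep offset. A minimal SolrJ sketch of that approach, 
assuming all fields are stored and the uniqueKey is a string field named id 
(ZooKeeper addresses, collection names and batch size are all illustrative):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.util.ClientUtils;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class CollectionMigrator {
  public static void main(String[] args) throws Exception {
    CloudSolrServer source = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
    source.setDefaultCollection("collection1");
    CloudSolrServer target = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
    target.setDefaultCollection("collection1_twin");

    String lastId = null;
    while (true) {
      SolrQuery q = new SolrQuery("*:*");
      q.setSort("id", SolrQuery.ORDER.asc);   // walk the index in id order
      q.setRows(10000);                       // one manageable page at a time
      if (lastId != null) {
        // exclusive lower bound: only ids after the last one already copied
        q.addFilterQuery("id:{" + ClientUtils.escapeQueryChars(lastId) + " TO *]");
      }
      QueryResponse rsp = source.query(q);
      if (rsp.getResults().isEmpty()) break;
      for (SolrDocument doc : rsp.getResults()) {
        lastId = (String) doc.getFieldValue("id");
        SolrInputDocument out = new SolrInputDocument();
        for (String f : doc.getFieldNames()) {
          if (!"_version_".equals(f)) out.addField(f, doc.getFieldValue(f));
        }
        // fix the long field, add the new int field, recompute the id here
        target.add(out);
      }
    }
    target.commit();
    source.shutdown();
    target.shutdown();
  }
}

Running several such workers over disjoint id ranges is the usual way to get 
the speed up.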


Guess this could also be achieved by doing 1-1 migration on shard-level 
instead of collection-level, keeping everything in the new collections 
on the same machine as where they lived in the old collections. That 
could probably complete faster than the 1-1 on collection-level 
approach. But this 1-1 on shard-level approach is not very good for us, 
because the long field we need to change is also part of the id 
(controlling the routing to a particular shard) and therefore actually 
we also need to change the id on all documents. So if we do the 1-1 on 
shard-level approach, we will end up having documents in shards that 
they actually do not belong to (they would not have been routed there by the 
routing system in Solr). We might be able to live with this disadvantage 
if 1-1 on shard-level can be easily achieved much faster than the 1-1 on 
collection-level.


Any input is very much appreciated! Thanks

Regards, Per Steffensen


integrating Accumulo with solr

2014-07-23 Thread Ali Nazemian
Dear All,
Hi,
I was wondering whether anybody out there has tried to integrate Solr
with Accumulo? I was thinking about using Accumulo on top of HDFS and using
Solr to index the data inside Accumulo. Do you have any idea how I can do such
an integration?

Best regards.

-- 
A.Nazemian


Are stored fields compressed by default?

2014-07-23 Thread Gili Nachum
Hi! I'm planning to use atomic-updates
https://wiki.apache.org/solr/Atomic_Updates which means having all fields
stored.
Some docs might have text fields of up to 200K, I will feel better knowing
that Solr automatically compresses stored fields (I know Lucene 4.x default
codec does).
*Are stored fields compressed by default? Or is there a way to configure it?
(Solr 4.7).*
Thanks!


Re: Java heap Space error

2014-07-23 Thread Harald Kirsch
You may want to change your solr startup script such that it creates a 
heap dump on OOM. Add -XX:+HeapDumpOnOutOfMemoryError as an option.
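
For example (heap size and dump path are illustrative):

java -Xmx2g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/tmp \
     -jar start.jar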


The heap dump can be nicely analyzed with http://www.eclipse.org/mat/.

Just increasing -Xmx is a workaround that may help for a while. With MAT 
you will see much more clearly what the likely cause is.


Harald.

On 22.07.2014 19:37, Ameya Aware wrote:

Hi

i am running into java heap space issue. Please see below log.

ERROR - 2014-07-22 11:38:59.370; org.apache.solr.common.SolrException;
null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
at
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:790)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:439)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:636)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.solr.common.util.JavaBinCodec.writeStr(JavaBinCodec.java:567)
at
org.apache.solr.common.util.JavaBinCodec.writePrimitive(JavaBinCodec.java:646)
at
org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:240)
at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:153)
at
org.apache.solr.common.util.JavaBinCodec.writeSolrInputDocument(JavaBinCodec.java:409)
at org.apache.solr.update.TransactionLog.write(TransactionLog.java:353)
at org.apache.solr.update.UpdateLog.add(UpdateLog.java:397)
at org.apache.solr.update.UpdateLog.add(UpdateLog.java:382)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:255)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:160)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:704)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:858)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:557)
at
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:121)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:126)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at

solr 3.6 to 4.7 upgrade has changed the query string

2014-07-23 Thread shashi.rsb
Hi,

 Our backend application queries solr to retrieve certain records.

We were initially on 3.6 version and now upgrade to 4.7 solr version.

Something has changed in terms of the query string, which now needs
parentheses for the query below.

Both of the queries below are against Solr 4.7.

returns 1 record
http://solr-dev.ss.com/solr/hotel/select/?q=ID:AFLGWBDAB OR AFLGWBGIB OR
FLGWGNDG
 
returns 14 records as expected.
 
http://solr-dev.ss.com/solr/hotel/select/?q=ID:(AFLGWBDAB OR AFLGWBGIB OR
FLGWGNDG)


If you notice, I had to add parentheses to get the expected records.

Also, I cannot change the application code now, as the release process for the
application will take more than 1 month. Is there a workaround within solrconfig
to avoid needing parentheses in 4.7?

Thanks
Shashi



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-3-6-to-4-7-upgrade-has-changed-the-query-string-tp4148719.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: SolrCloud replica dies under high throughput

2014-07-23 Thread Darren Lee
Thanks that helped. I no longer see the constant replica recovery. It also 
increased my throughput to 1.6/1.7 million per hour reliably. I actually then 
tried using SSDs instead and it flew up to 6.5 million updates per hour.

Setup:
4-node cluster of m3.2xl AWS servers with general-purpose SSDs.

Thanks again,
Darren


-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: 22 July 2014 00:25
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud replica dies under high throughput

Looks like you probably have to raise the http client connection pool limits to 
handle that kind of load currently.

They are specified as top level config in solr.xml:

maxUpdateConnections
maxUpdateConnectionsPerHost
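
Something like the following at the top level of solr.xml (values are
illustrative, not recommendations):

<solr>
  <int name="maxUpdateConnections">100000</int>
  <int name="maxUpdateConnectionsPerHost">10000</int>
  ...
</solr>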

--
Mark Miller
about.me/markrmiller

On July 21, 2014 at 7:14:59 PM, Darren Lee (d...@amplience.com) wrote:
 Hi,
  
 I'm doing some benchmarking with Solr Cloud 4.9.0. I am trying to work 
 out exactly how much throughput my cluster can handle.
  
 Consistently in my test I see a replica go into recovering state 
 forever caused by what looks like a timeout during replication. I can 
 understand the timeout and failure (I am hitting it fairly hard) but 
 what seems odd to me is that when I stop the heavy load it still does 
 not recover the next time it tries, it seems broken forever until I manually 
 go in, clear the index and let it do a full resync.
  
 Is this normal? Am I misunderstanding something? My cluster has 4 
 nodes (2 shards, 2 replicas) (AWS m3.2xlarge). I am indexing with ~800 
 concurrent connections and a 10 sec soft commit.
 I consistently get this problem with a throughput of around 1.5 
 million documents per hour.
  
 Thanks all,
 Darren
  
  
 Stack Traces & Messages:
  
 [qtp779330563-627] ERROR org.apache.solr.servlet.SolrDispatchFilter -
 null:org.apache.http.conn.ConnectionPoolTimeoutException:  
 Timeout waiting for connection from pool at 
 org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnecti
 on(PoolingClientConnectionManager.java:226)
 at 
 org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnecti
 on(PoolingClientConnectionManager.java:195)
 at 
 org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequ
 estDirector.java:422) at 
 org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpC
 lient.java:863) at 
 org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpC
 lient.java:82) at 
 org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpC
 lient.java:106) at 
 org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpC
 lient.java:57) at 
 org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.ru
 n(ConcurrentUpdateSolrServer.java:233)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j
 ava:1145) at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
 java:615) at java.lang.Thread.run(Thread.java:724)
  
 Error while trying to recover. 
 core=assets_shard2_replica1:java.util.concurrent.ExecutionException:  
 org.apache.solr.client.solrj.SolrServerException: IOException occured 
 when talking to server at: http://xxx.xxx.15.171:8080/solr at 
 java.util.concurrent.FutureTask.report(FutureTask.java:122)
 at java.util.concurrent.FutureTask.get(FutureTask.java:188)
 at 
 org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStr
 ategy.java:615) at 
 org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.jav
 a:371) at 
 org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)
 Caused by: org.apache.solr.client.solrj.SolrServerException: 
 IOException occured when talking to server at: 
 http://xxx.xxx.15.171:8080/solr at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSol
 rServer.java:566) at 
 org.apache.solr.client.solrj.impl.HttpSolrServer$1.call(HttpSolrServer
 .java:245) at 
 org.apache.solr.client.solrj.impl.HttpSolrServer$1.call(HttpSolrServer
 .java:241) at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j
 ava:1145) at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
 java:615) at java.lang.Thread.run(Thread.java:744)
 Caused by: java.net.SocketException: Socket closed at 
 java.net.SocketInputStream.socketRead0(Native Method) at 
 java.net.SocketInputStream.read(SocketInputStream.java:152)
 at java.net.SocketInputStream.read(SocketInputStream.java:122)
 at 
 org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(Abstract
 SessionInputBuffer.java:160) at 
 org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer
 .java:84) at 
 org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSe
 ssionInputBuffer.java:273) at 
 org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultH
 ttpResponseParser.java:140) at 
 

Re: Query using doc Id

2014-07-23 Thread Mukundaraman Valakumaresan
@Alexandre
No, I mean the same what you mean docId:[100 TO 200]

@Santosh
My intention is to query all the docs from Solr. If I give rows=100&start=100,
my query would have to be *:*, which is not a good idea. Hence I am
looking for an option to filter based on docId.

Thanks & Regards
Mukund




On Wed, Jul 23, 2014 at 10:43 AM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 Do you mean something different from docId:[100 TO 200] ?

 Regards,
Alex.
 Personal: http://www.outerthoughts.com/ and @arafalov
 Solr resources: http://www.solr-start.com/ and @solrstart
 Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


 On Wed, Jul 23, 2014 at 11:49 AM, Mukundaraman Valakumaresan
 muk...@8kmiles.com wrote:
  Hi,
 
  Is it possible to execute queries using doc Id as a query parameter
 
  For eg, query docs whose doc Id is between 100 and 200
 
  Thanks & Regards
  Mukund



Re: Query using doc Id

2014-07-23 Thread Alexandre Rafalovitch
Perhaps you are looking for cursorMark:
http://solr.pl/en/2014/03/10/solr-4-7-efficient-deep-paging/ ?
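
For example, on 4.7+ the first request would look something like this
(assuming the uniqueKey field is named id; the sort must include it as the
final tie-breaker):

q=*:*&sort=id asc&rows=100&cursorMark=*

Each response contains a nextCursorMark value; pass it back as cursorMark on
the next request, and stop when the value you get back no longer changes.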

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On Wed, Jul 23, 2014 at 4:59 PM, Mukundaraman Valakumaresan
muk...@8kmiles.com wrote:
 @Alexandre
 No, I mean the same what you mean docId:[100 TO 200]

 @Santosh
 My intention is to query all the docs from Solr. If I give rows=100&start=100,
 for which I need to apply my query as *:* , which is not a good idea. Hence
 looking for an option to filter based on docId.

 Thanks & Regards
 Mukund




 On Wed, Jul 23, 2014 at 10:43 AM, Alexandre Rafalovitch arafa...@gmail.com
 wrote:

 Do you mean something different from docId:[100 TO 200] ?

 Regards,
Alex.
 Personal: http://www.outerthoughts.com/ and @arafalov
 Solr resources: http://www.solr-start.com/ and @solrstart
 Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


 On Wed, Jul 23, 2014 at 11:49 AM, Mukundaraman Valakumaresan
 muk...@8kmiles.com wrote:
  Hi,
 
  Is it possible to execute queries using doc Id as a query parameter
 
  For eg, query docs whose doc Id is between 100 and 200
 
   Thanks & Regards
  Mukund



Re: Query using doc Id

2014-07-23 Thread Mukundaraman Valakumaresan
Exactly Alexandre, Thanks

Regards
Mukund


On Wed, Jul 23, 2014 at 3:37 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 Perhaps you are looking for cursorMark:
 http://solr.pl/en/2014/03/10/solr-4-7-efficient-deep-paging/ ?

 Regards,
Alex.
 Personal: http://www.outerthoughts.com/ and @arafalov
 Solr resources: http://www.solr-start.com/ and @solrstart
 Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


 On Wed, Jul 23, 2014 at 4:59 PM, Mukundaraman Valakumaresan
 muk...@8kmiles.com wrote:
  @Alexandre
  No, I mean the same what you mean docId:[100 TO 200]
 
  @Santosh
  My intention is to query all the docs from Solr. If I give
 rows=100&start=100,
  for which I need to apply my query as *:* , which is not a good idea.
 Hence
  looking for an option to filter based on docId.
 
  Thanks & Regards
  Mukund
 
 
 
 
  On Wed, Jul 23, 2014 at 10:43 AM, Alexandre Rafalovitch 
 arafa...@gmail.com
  wrote:
 
  Do you mean something different from docId:[100 TO 200] ?
 
  Regards,
 Alex.
  Personal: http://www.outerthoughts.com/ and @arafalov
  Solr resources: http://www.solr-start.com/ and @solrstart
  Solr popularizers community:
 https://www.linkedin.com/groups?gid=6713853
 
 
  On Wed, Jul 23, 2014 at 11:49 AM, Mukundaraman Valakumaresan
  muk...@8kmiles.com wrote:
   Hi,
  
   Is it possible to execute queries using doc Id as a query parameter
  
   For eg, query docs whose doc Id is between 100 and 200
  
    Thanks & Regards
   Mukund
 



[ANN] SIREn, a Lucene/Solr plugin for rich JSON data search

2014-07-23 Thread Renaud Delbru
One of the coolest features of Lucene/Solr is its ability to index 
nested documents using a Blockjoin approach.


While this works well for small documents and document collections, it 
becomes unsustainable for larger ones: Blockjoin works by splitting the 
original document into many documents, one per nested record.


For example, a single USPTO patent (XML format converted to JSON) will 
end up being over 1500 documents in the index. This has massive 
implications for performance and scalability.


Introducing SIREn

SIREn is an open source plugin for Solr for indexing and searching rich 
nested JSON data.


SIREn uses a sophisticated tree indexing design which ensures that the 
index is not artificially inflated. As a result, many types of nested 
queries can be up to 3x faster. Further, depending on the data, memory 
requirements for faceting can be up to 10x lower. As such, SIREn allows 
you to use Solr for larger and more complex datasets, especially for 
sophisticated analytics. (You can read our whitepaper to find out more [1])


SIREn is also truly schemaless - it even allows you to change the type 
of a property between documents without being restricted by a defined 
mapping. This can be very useful for data integration scenarios where 
data is described in different ways in different sources.


You only need a few minutes to download and try SIREn [2]. It comes with 
a detailed manual [3] and you have access to the code on GitHub [4].


We look forward to hearing your feedback.

[1] 
http://siren.solutions/siren/resources/whitepapers/comparing-siren-1-2-and-lucenes-blockjoin-performance-a-uspto-patent-search-scenario/

[2] http://siren.solutions/siren/downloads/
[3] http://siren.solutions/manual/preface.html
[4] https://github.com/sindicetech/siren
--
Renaud Delbru
CTO
SIREn Solutions


Passivate core in Solr Cloud

2014-07-23 Thread Aurélien MAZOYER

Hello,

We want to set up a Solr Cloud cluster in order to handle a high volume 
of documents with a multi-tenant architecture. The problem is that 
application-level isolation for a tenant (using a shared index with a 
customer field) is not enough to fit our requirements. As a result, we 
need 1 collection/customer. There are more than a thousand customers and 
it seems unreasonable to create thousands of collections in Solr 
Cloud... But as we know that there is less than 1 query/customer/day, 
we are currently looking for a way to passivate collections when they are 
not in use. Can it be a good idea? If yes, are there best practices to 
implement this? What side effects can we expect? Do we need to put some 
application-level logic on top of the Solr Cloud cluster to choose which 
collection we have to unload (and maybe there is something smarter (and 
quicker?) than simply loading/unloading the core when it is not in use?) ?



Thank you for your answer(s),

Aurelien



Re: Passivate core in Solr Cloud

2014-07-23 Thread Alexandre Rafalovitch
Solr has some support for large number of cores, including transient
cores: http://wiki.apache.org/solr/LotsOfCores
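
Roughly, with the old-style solr.xml described on that page, a core can be
marked transient and lazily loaded, with a cap on how many transient cores
stay loaded at once (names and values illustrative):

<cores adminPath="/admin/cores" transientCacheSize="50">
  <core name="customer1" instanceDir="customer1"
        transient="true" loadOnStartup="false"/>
</cores>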

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On Wed, Jul 23, 2014 at 7:55 PM, Aurélien MAZOYER
aurelien.mazo...@francelabs.com wrote:
 Hello,

 We want to setup a Solr Cloud cluster in order to handle a high volume of
 documents with a multi-tenant architecture. The problem is that an
 application-level isolation for a tenant (using a mutual index with a field
 customer) is not enough to fit our requirements. As a result, we need 1
 collection/customer. There is more than a thousand customers and it seems
 unreasonable to create thousands of collections in Solr Cloud... But as we
 know that there are less than 1 query/customer/day, we are currently looking
 for a way to passivate collection when they are not in use. Can it be a good
 idea? If yes, are there best practices to implement this? What side effects
 can we expect? Do we need to put some application-level logic on top on the
 Solr Cloud cluster to choose which collection we have to unload (and maybe
 there is something smarter (and quicker?) than simply loading/unloading the
 core when it is not in used?) ?


 Thank you for your answer(s),

 Aurelien



how to fully test a response writer

2014-07-23 Thread Matteo Grolla
Hi,
I developed a new Solr response writer but I'm not happy with how I wrote 
the tests.
My problem is that I need to test it both with local requests and with 
distributed requests, since the Solr response objects (the input to the 
response writer) are different.
a) I tested the local request case 
using SolrTestCaseJ4
b) I tested the distributed request case 
using a JUnit test case and making REST calls to
the alias coll12 associated with
a couple of SolrCloud collections configured with my custom 
response writer

The problem with b) is that it requires a manual setup on every machine where I 
want to run the tests.

thanks



Re: text search problem

2014-07-23 Thread Josh Lincoln
Ravi, for the hyphen issue, try setting autoGeneratePhraseQueries=true for
that fieldType (no re-index needed). As of 1.4, this defaults to false. One
word of caution, autoGeneratePhraseQueries may not work as expected for
languages that aren't whitespace delimited. As Erick mentioned, the
Analysis page will help you verify that your content and your queries are
handled the way you expect them to be.

See this thread for more info on autoGeneratePhraseQueries
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201202.mbox/%3c439f69a3-f292-482b-a102-7c011c576...@gmail.com%3E
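
The attribute goes on the fieldType element itself in schema.xml, e.g.:

<fieldType name="text_general" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">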


On Mon, Jul 21, 2014 at 8:42 PM, Erick Erickson erickerick...@gmail.com
wrote:

 Try escaping the hyphen as \-. Or enclosing it all
 in quotes.
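
 For example (assuming the field is named Text):

 Text:ABC\-123
 Text:"ABC-123"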

 But you _really_ have to spend some time with the debug option
 an admin/analysis page or you will find endless surprises.

 Best,
 Erick


 On Mon, Jul 21, 2014 at 11:12 AM, EXTERNAL Taminidi Ravi (ETI,
 Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote:

 
  Thanks for the reply Erick, I will try as you suggested. There I have
   another question related to this lines.
 
  When I have - in my description , name then the search results are
  different. For e.g.
 
  ABC-123: it looks for ABC or 123. I want to treat this search as an exact
  match, i.e. if my document has ABC-123 then I should get the results.
 
  When I check with hl=on, it has <em>ABC</em> and gets the results. How can
  I avoid this situation.
 
  Thanks
 
  Ravi
 
 
  -Original Message-
  From: Erick Erickson [mailto:erickerick...@gmail.com]
  Sent: Saturday, July 19, 2014 4:40 PM
  To: solr-user@lucene.apache.org
  Subject: Re: text search problem
 
  Try adding debug=all to the query and see what the parsed form of the
  query is. Likely you're
  1> using phrase queries, so broadway hotel requires both words in the
  text,
  or
  2> if you're not using phrases, you're searching for the AND of the two
  terms.
 
  But debug=all will show you.
 
  Plus, take a look at the admin/analysis page, your tokenization may not
 be
  what you expect.
 
  Best,
  Erick
 
 
  On Fri, Jul 18, 2014 at 2:00 PM, EXTERNAL Taminidi Ravi (ETI,
  Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com
 wrote:
 
   Hi, below is the text_general field type. When I search Text:Broadway
   it is not returning all the records, it returns only a few records.
   But when I search for Text:*Broadway*, it is getting more records.
   When I get into a multiple-word search like Broadway Hotel, it may
   not get Broadway, Hotel or Broadway Hotel. Do you have any
   thought on how to handle this type of keyword search.
  
   Text:Broadway,Vehicle Detailing,Water Systems,Vehicle Detailing,Car
   Wash Water Recovery
  
   My field type looks like this.
  
   <fieldType name="text_general" class="solr.TextField"
       positionIncrementGap="100">
     <analyzer type="index">
       <charFilter class="solr.HTMLStripCharFilterFactory"/>
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
           words="stopwords.txt"/>
       <filter class="solr.KStemFilterFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.WordDelimiterFilterFactory"
           generateWordParts="0" generateNumberParts="0" splitOnCaseChange="0"
           splitOnNumerics="0" stemEnglishPossessive="0" catenateWords="1"
           catenateNumbers="1" catenateAll="1" preserveOriginal="0"/>

       <!-- in this example, we will only use synonyms at query time
       <filter class="solr.SynonymFilterFactory"
           synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
       -->

     </analyzer>
     <analyzer type="query">
       <charFilter class="solr.HTMLStripCharFilterFactory"/>
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.KStemFilterFactory"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
           words="stopwords.txt"/>
       <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
           ignoreCase="true" expand="true"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.WordDelimiterFilterFactory"
           generateWordParts="0" generateNumberParts="0" splitOnCaseChange="0"
           splitOnNumerics="0" stemEnglishPossessive="0" catenateWords="1"
           catenateNumbers="1" catenateAll="1" preserveOriginal="0"/>
     </analyzer>
   </fieldType>
  
  
  
   Do you have any thoughts on this behavior or how to handle it?
  
   Thanks
  
   Ravi
  
 



Question about ReRankQuery

2014-07-23 Thread Peter Keegan
I'm looking at how 'ReRankQuery' works. If the main query has a Sort
criteria, it is only used to sort the first pass results. The QueryScorer
used in the second pass only reorders the ScoreDocs based on score and
docid, but doesn't use the original Sort fields. If the Sort criteria is
'score desc, myfield asc', I would expect 'myfield' to break score ties
from the second pass after rescoring.

Is this a bug or the intended behavior?

Thanks,
Peter


RE: NoClassDefFoundError while indexing in Solr

2014-07-23 Thread Pablo Queixalos
There is a source code parser in Tika that in fact just renders the source 
using an external source highlighter.

Seen in your stack trace: 
org.apache.tika.parser.code.SourceCodeParser.getRenderer(SourceCodeParser.java:121)

You are indexing code (java, c or groovy). Solr seems to be missing a 
transitive Tika dependency (http://freecode.com/projects/jhighlight).

Copying the lib into Solr's runtime lib directory should solve your issue.


Pablo.

From: Shalin Shekhar Mangar shalinman...@gmail.com
Sent: Wednesday, July 23, 2014 7:43 AM
To: solr-user@lucene.apache.org
Subject: Re: NoClassDefFoundError while indexing in Solr

Solr is trying to load com/uwyn/jhighlight/renderer/XhtmlRendererFactory
but that is not a class which is shipped or used by Solr. I think you have
some custom plugins (a highlighter perhaps?) which uses that class and the
classpath is not setup correctly.


On Wed, Jul 23, 2014 at 2:20 AM, Ameya Aware ameya.aw...@gmail.com wrote:

 Hi

 I am running into below error while indexing a file in solr.

 Can you please help to fix this?

 ERROR - 2014-07-22 16:40:32.126; org.apache.solr.common.SolrException;
 null:java.lang.RuntimeException: java.lang.NoClassDefFoundError:
 com/uwyn/jhighlight/renderer/XhtmlRendererFactory
 at

 org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:790)
 at

 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:439)
 at

 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
 at

 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
 at
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
 at

 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
 at
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
 at

 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at

 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
 at
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
 at

 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at

 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
 at

 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
 at

 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at

 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 at

 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
 at org.eclipse.jetty.server.Server.handle(Server.java:368)
 at

 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
 at

 org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
 at

 org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
 at

 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:636)
 at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
 at

 org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
 at

 org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
 at

 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
 at

 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
 at java.lang.Thread.run(Unknown Source)
 Caused by: java.lang.NoClassDefFoundError:
 com/uwyn/jhighlight/renderer/XhtmlRendererFactory
 at

 org.apache.tika.parser.code.SourceCodeParser.getRenderer(SourceCodeParser.java:121)
 at

 org.apache.tika.parser.code.SourceCodeParser.parse(SourceCodeParser.java:102)
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
 at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
 at

 org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
 at

 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
 at

 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at

 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
 at

 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
 at

 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
 ... 26 more
 Caused by: java.lang.ClassNotFoundException:
 com.uwyn.jhighlight.renderer.XhtmlRendererFactory
 at 

Re: Solr 4.7.2 auto suggestion

2014-07-23 Thread benjelloun
Hello,
The suggester solr.SuggestComponent with

<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>

doesn't work with field types which are not string and are multivalued.

Any idea?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-7-2-auto-suggestion-tp4147677p4148768.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: NoClassDefFoundError while indexing in Solr

2014-07-23 Thread Ameya Aware
Thanks a lot for your suggestions.


On Wed, Jul 23, 2014 at 9:53 AM, Pablo Queixalos 
pqueixa...@customermatrix.com wrote:

 There is a source code parser in tika that in fact just renders the
 source using an external source higlighter.

 Seen in you stack trace :
 com.apache.tika.parser.code.SourceCodeParser.getRenderer(SourceCodeParser.java:121)

 You are indexing code (java, c or groovy). Solr seems to be missing a
 transitive tika dependency (http://freecode.com/projects/jhighlight).

 Copying the lib in solr runtime lib directory should solve your issue.


 Pablo.
 
 From: Shalin Shekhar Mangar shalinman...@gmail.com
 Sent: Wednesday, July 23, 2014 7:43 AM
 To: solr-user@lucene.apache.org
 Subject: Re: NoClassDefFoundError while indexing in Solr

 Solr is trying to load com/uwyn/jhighlight/renderer/XhtmlRendererFactory
 but that is not a class which is shipped or used by Solr. I think you have
 some custom plugins (a highlighter perhaps?) which uses that class and the
 classpath is not setup correctly.


 On Wed, Jul 23, 2014 at 2:20 AM, Ameya Aware ameya.aw...@gmail.com
 wrote:

  Hi
 
  I am running into below error while indexing a file in solr.
 
  Can you please help to fix this?
 
  ERROR - 2014-07-22 16:40:32.126; org.apache.solr.common.SolrException;
  null:java.lang.RuntimeException: java.lang.NoClassDefFoundError:
  com/uwyn/jhighlight/renderer/XhtmlRendererFactory
  at
 
 
 org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:790)
  at
 
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:439)
  at
 
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
  at
 
 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
  at
 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
  at
 
 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
  at
 
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
  at
 
 
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
  at
 
 
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
  at
  org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
  at
 
 
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
  at
 
 
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
  at
 
 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
  at
 
 
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
  at
 
 
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
  at
 
 
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
  at org.eclipse.jetty.server.Server.handle(Server.java:368)
  at
 
 
 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
  at
 
 
 org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
  at
 
 
 org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
  at
 
 
 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
  at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:636)
  at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
  at
 
 
 org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
  at
 
 
 org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
  at
 
 
 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
  at
 
 
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
  at java.lang.Thread.run(Unknown Source)
  Caused by: java.lang.NoClassDefFoundError:
  com/uwyn/jhighlight/renderer/XhtmlRendererFactory
  at
 
 
 org.apache.tika.parser.code.SourceCodeParser.getRenderer(SourceCodeParser.java:121)
  at
 
 
 org.apache.tika.parser.code.SourceCodeParser.parse(SourceCodeParser.java:102)
  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
  at
 org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
  at
 
 
 org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
  at
 
 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
  at
 
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at
 
 
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
  at
 
 
 

Re: [ANN] SIREn, a Lucene/Solr plugin for rich JSON data search

2014-07-23 Thread Jay Vyas
Querying nested data is very difficult in any modern db that I have seen.

If it works as you suggest, then it would be cool if the feature were 
eventually maintained inside Solr.

 On Jul 23, 2014, at 7:13 AM, Renaud Delbru renaud@siren.solutions wrote:
 
 One of the coolest features of Lucene/Solr is its ability to index nested 
 documents using a Blockjoin approach.
 
 While this works well for small documents and document collections, it 
 becomes unsustainable for larger ones: Blockjoin works by splitting the 
 original document in many documents, one per nested record.
 
 For example, a single USPTO patent (XML format converted to JSON) will end up 
 being over 1500 documents in the index. This has massive implications on 
 performance and scalability.
 
 Introducing SIREn
 
 SIREn is an open source plugin for Solr for indexing and searching rich 
 nested JSON data.
 
 SIREn uses a sophisticated tree indexing design which ensures that the 
 index is not artificially inflated. This ensures that querying on many types 
 of nested queries can be up to 3x faster. Further, depending on the data, 
 memory requirements for faceting can be up to 10x higher. As such, SIREn 
 allows you to use Solr for larger and more complex datasets, especially so 
 for sophisticated analytics. (You can read our whitepaper to find out more 
 [1])
 
 SIREn is also truly schemaless - it even allows you to change the type of a 
 property between documents without being restricted by a defined mapping. 
 This can be very useful for data integration scenarios where data is 
 described in different ways in different sources.
 
 You only need a few minutes to download and try SIREn [2]. It comes with a 
 detailed manual [3] and you have access to the code on GitHub [4].
 
 We look forward to hear about your feedbacks.
 
 [1] 
 http://siren.solutions/siren/resources/whitepapers/comparing-siren-1-2-and-lucenes-blockjoin-performance-a-uspto-patent-search-scenario/
 [2] http://siren.solutions/siren/downloads/
 [3] http://siren.solutions/manual/preface.html
 [4] https://github.com/sindicetech/siren
 -- 
 Renaud Delbru
 CTO
 SIREn Solutions


Re: How to get Lacuma to match Lucuma

2014-07-23 Thread Warren Bell
Is there a way to make Solr do fuzzy searches automatically without having to 
add the tilde character? And are there disadvantages to doing fuzzy searches?

Warren

On Jul 22, 2014, at 1:54 PM, Anshum Gupta ans...@anshumgupta.net wrote:

 Hi Warren,
 
 Check out the section about fuzzy search here
 https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser.
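
 For example, a fuzzy query with an edit distance of 1 would match here
 (field name is illustrative):

 q=name:Lacuma~1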
 
 
 On Tue, Jul 22, 2014 at 1:29 PM, Warren Bell warr...@clarksnutrition.com
 wrote:
 
  What field type or filters do I use to get something like the word
  “Lacuma” to return results with “Lucuma” in it? The word “Lucuma” has been
  indexed in a field with field type text_en_splitting that came with the
  original Solr examples.
 
 Thanks,
 
 Warren
 
 
    <fieldType name="text_en_splitting" class="solr.TextField"
        positionIncrementGap="100" autoGeneratePhraseQueries="true">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory"
            synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal. -->
        <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
            />
        <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory"
            protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
            />
        <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory"
            protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>
 
 
 --
 This email was Virus checked by Clark's Nutrition's Astaro Security
 Gateway.
 
 The information contained in this e-mail is intended only for use of
 the individual or entity named above. This e-mail, and any documents,
 files, previous e-mails or other information attached to it, may contain
 confidential information that is legally privileged. If you are not the
 intended recipient of this e-mail, or the employee or agent responsible
 for delivering it to the intended recipient, you are hereby notified
 that any disclosure, dissemination, distribution, copying or other use
 of this e-mail or any of the information contained in or attached to it
 is strictly prohibited. If you have received this e-mail in error,
 please immediately notify us by return e-mail or by telephone at
 (951)321-1960, and destroy the original e-mail and its attachments
 without reading or saving it in any manner. Thank you.
 
 Clark’s Nutrition is a registered trademark of Clark's Nutritional
 Centers, Inc.
 
 
 
 
 -- 
 
 Anshum Gupta
 http://www.anshumgupta.net
 
 -- 
 This email was Virus checked by Clark's Nutrition's Astaro Security Gateway.



how to achieve static boost in solr

2014-07-23 Thread rahulmodi
Hi,

I am struggling with how to achieve static boost in Solr; I have visited many
websites but have not found a solid answer.

The requirement is as below:
Suppose I have 100 keywords to search for, and for each keyword I want a
particular URL to appear on top.

Say..
for keyword *car* the URL *http://car.com* should be on the top
for keyword *building* the URL *http://building.com* should be on the
top
for keyword *java* the URL *http://javajee.com* should be on the top
And so on

How do I achieve this when there are many keywords or queries? I don't
want to hard-code it in the Java API, as I should not be hard-coding hundreds
of keywords.
I am using DB crawling, and many of the keywords and all the URLs are stored
in the DB.
If I can use some configuration settings to achieve this then it will be
better.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-achieve-static-boost-in-solr-tp4148788.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to migrate content of a collection to a new collection

2014-07-23 Thread Erick Erickson
Per:

Given that you said that the field redefinition also includes routing
info I don't see
any other way than re-indexing each collection. That said, could you use the
collection aliasing and do one collection at a time?
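
For example, once a twin collection is fully re-indexed, something like this
points the old name at it (names illustrative):

http://host:8983/solr/admin/collections?action=CREATEALIAS&name=collection1&collections=collection1_twin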

Best,
Erick


On Tue, Jul 22, 2014 at 11:45 PM, Per Steffensen st...@designware.dk
wrote:

 Hi

 We have numerous collections each with numerous shards spread across
 numerous machines. We just discovered that all documents have a field with
 a wrong value and besides that we would like to add a new field to all
 documents
 * The field with the wrong value is a long, DocValued, Indexed and Stored.
 Some (about half) of the documents need to have a constant added to their
 current value
 * The field we want to add will be and int, DocValued, Indexed and Stored.
 Needs to be added to all documents, but will have different values among
 the documents

 How to achieve our goal in the easiest possible way?

 We thought about spooling/streaming from the existing collection into a
 twin-collection, then delete the existing collection and finally rename
 the twin-collection to have the same name as the original collection.
 Basically indexing all documents again. If that is the easiest way, how do
 we query in a way so that we get all documents streamed. We cannot just do
 a *:* query that returns everything into memory and the index from there,
 because we have billions of documents (not enough memory). Please note that
 we are on 4.4, which does not contain the new CURSOR-feature. Please also
 note that speed is an important factor for us.

 Guess this could also be achieved by doing 1-1 migration on shard-level
 instead of collection-level, keeping everything in the new collections on
 the same machine as where they lived in the old collections. That could
 probably complete faster than the 1-1 on collection-level approach. But
 this 1-1 on shard-level approach is not very good for us, because the long
 field we need to change is also part of the id (controlling the routing to
 a particular shard) and therefore actually we also need to change the id on
 all documents. So if we do the 1-1 on shard-level approach, we will end up
 having documents in shards that they actually do not be to (they would not
 have been routed there by the routing system in Solr). We might be able to
 live with this disadvantage if 1-1 on shard-level can be easily achieved
 much faster than the 1-1 on collection-level.

 Any input is very much appreciated! Thanks

 Regards, Per Steffensen



Re: Are stored fields compressed by default?

2014-07-23 Thread Erick Erickson
Yes, they have been since 4.1.

And there's no handy option for turning this off at this point.

Best,
Erick


On Wed, Jul 23, 2014 at 2:31 AM, Gili Nachum gilinac...@gmail.com wrote:

 Hi! I'm planning to use atomic-updates
 https://wiki.apache.org/solr/Atomic_Updates which means having all
 fields
 stored.
 Some docs might have text fields of up to 200K, I will feel better knowing
 that Solr automatically compresses stored fields (I know Lucene 4.x default
 codec does).
 *Are stored fields compressed by default? Or there's a way to configure it?
 (Solr 4.7).*
 Thanks!



Re: solr 3.6 to 4.7 upgrade has changed the query string

2014-07-23 Thread Erick Erickson
Try adding debug=all to both queries.

But these are very different queries. My guess is that
something _else_ changed, probably in solrconfig.xml,
that's the cause; most probably your default field in your
3.6 case is the ID field. If that's the case you should
be able to change it in the 4.7 solrconfig.xml file for the
request handlers involved. One change is that you used
to be able to define this in schema.xml, but that's
been deprecated.

The first query parses as
ID:AFLGWBDAB OR
default_search_field:AFLGWBGIB OR
default_search_field:FLGWGNDG

Best,
Erick



On Wed, Jul 23, 2014 at 2:51 AM, shashi.rsb shashi@hotmail.com wrote:

 Hi,

  Our backend application queries solr to retrieve certain records.

 We were initially on 3.6 version and now upgrade to 4.7 solr version.

 something has changed in terms of query string which needs a parentheses
 for
 the below query

 both the queries are from 4.7 solr.

 returns 1 record
 http://solr-dev.ss.com/solr/hotel/select/?q=ID:AFLGWBDAB OR AFLGWBGIB OR
 FLGWGNDG

 returns 14 records as expected.

 http://solr-dev.ss.com/solr/hotel/select/?q=ID:(AFLGWBDAB OR AFLGWBGIB OR
 FLGWGNDG)


 If you notice I had to add parentheses to get expected records.

 Also I cannot change the application code now as the release process to the
 application will go beyond 1 month. Is there a workaround within solrconfig
 to not use parentheses in 4.7

 Thanks
 Shashi



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/solr-3-6-to-4-7-upgrade-has-changed-the-query-string-tp4148719.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Passivate core in Solr Cloud

2014-07-23 Thread Erick Erickson
Do note that the lots of cores stuff does NOT play nice in
distributed mode (yet).

Best,
Erick


On Wed, Jul 23, 2014 at 6:00 AM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 Solr has some support for large number of cores, including transient
 cores: http://wiki.apache.org/solr/LotsOfCores

 Regards,
Alex.
 Personal: http://www.outerthoughts.com/ and @arafalov
 Solr resources: http://www.solr-start.com/ and @solrstart
 Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


 On Wed, Jul 23, 2014 at 7:55 PM, Aurélien MAZOYER
 aurelien.mazo...@francelabs.com wrote:
  Hello,
 
  We want to setup a Solr Cloud cluster in order to handle a high volume of
  documents with a multi-tenant architecture. The problem is that an
  application-level isolation for a tenant (using a mutual index with a
 field
  customer) is not enough to fit our requirements. As a result, we need 1
  collection/customer. There is more than a thousand customers and it seems
  unreasonable to create thousands of collections in Solr Cloud... But as
 we
  know that there are less than 1 query/customer/day, we are currently
 looking
  for a way to passivate collection when they are not in use. Can it be a
 good
  idea? If yes, are there best practices to implement this? What side
 effects
  can we expect? Do we need to put some application-level logic on top on
 the
  Solr Cloud cluster to choose which collection we have to unload (and
 maybe
  there is something smarter (and quicker?) than simply loading/unloading
 the
  core when it is not in used?) ?
 
 
  Thank you for your answer(s),
 
  Aurelien
 



Re: solr 3.6 to 4.7 upgrade has changed the query string

2014-07-23 Thread Jack Krupansky
Did you blindly switch to the new solrconfig.xml? If so, the default query 
request handler sets the df parameter to text, which would give you 
different results compared to having the defaultSearchField set to some 
other field, like your ID field.


Read the comments in the new schema.xml about defaultSearchField being 
deprecated in favor of the df parameter.
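
As a sketch, the default field can be set per request handler in 
solrconfig.xml without touching the application (field name illustrative):

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">ID</str>
  </lst>
</requestHandler>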


-- Jack Krupansky

-Original Message- 
From: shashi.rsb

Sent: Wednesday, July 23, 2014 5:51 AM
To: solr-user@lucene.apache.org
Subject: solr 3.6 to 4.7 upgrade has changed the query string

Hi,

Our backend application queries solr to retrieve certain records.

We were initially on 3.6 version and now upgrade to 4.7 solr version.

something has changed in terms of query string which needs a parentheses for
the below query

both the queries are from 4.7 solr.

returns 1 record
http://solr-dev.ss.com/solr/hotel/select/?q=ID:AFLGWBDAB OR AFLGWBGIB OR
FLGWGNDG

returns 14 records as expected.

http://solr-dev.ss.com/solr/hotel/select/?q=ID:(AFLGWBDAB OR AFLGWBGIB OR
FLGWGNDG)


If you notice I had to add parentheses to get expected records.

Also I cannot change the application code now as the release process to the
application will go beyond 1 month. Is there a workaround within solrconfig
to not use parentheses in 4.7

Thanks
Shashi



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-3-6-to-4-7-upgrade-has-changed-the-query-string-tp4148719.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Question about ReRankQuery

2014-07-23 Thread Erick Erickson
I'm having a little trouble understanding the use-case here. Why use
re-ranking?
Isn't this just combining the original query with the second query with an
AND and using the original sort?

At the end, you have your original list in its original order, with
(potentially) some documents removed that don't satisfy the secondary query.

Or I'm missing the boat entirely.

Best,
Erick


On Wed, Jul 23, 2014 at 6:31 AM, Peter Keegan peterlkee...@gmail.com
wrote:

 I'm looking at how 'ReRankQuery' works. If the main query has a Sort
 criteria, it is only used to sort the first pass results. The QueryScorer
 used in the second pass only reorders the ScoreDocs based on score and
 docid, but doesn't use the original Sort fields. If the Sort criteria is
 'score desc, myfield asc', I would expect 'myfield' to break score ties
 from the second pass after rescoring.

 Is this a bug or the intended behavior?

 Thanks,
 Peter



Re: how to achieve static boost in solr

2014-07-23 Thread Erick Erickson
Take a look at Query Elevation Component perhaps?
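
With that component the keyword-to-URL mapping lives in elevate.xml rather
than in code. A sketch, assuming the URL is the uniqueKey of your documents:

<elevate>
  <query text="car">
    <doc id="http://car.com"/>
  </query>
  <query text="building">
    <doc id="http://building.com"/>
  </query>
</elevate>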

Best,
Erick


On Wed, Jul 23, 2014 at 8:05 AM, rahulmodi rahul.m...@ge.com wrote:

 Hi,

 I am struggling how to achieve static boost in solr, i have visited many
 web
 sites but not getting solid answer.

 The requirement is as below:
 Suppose i have 100 keywords to search for and for each keyword i want
 particular URL to be appear on top.

 Say..
 for keyword *car* the URL *http://car.com* should be on the top
 for keyword *building* the URL *http://building.com* should be on the
 top
 for keyword *java* the URL *http://javajee.com* should be on the top
 And So On

 How to achieve this if there are many no.of keywords or queries and i don't
 want to Hard code in java API as i should not hard coding for hundreds of
 keywords.
 I am using DB crawling and many of the keywords and all the urls are stored
 in DB.
 If i can use some configuration settings to achieve this then it will be
 better.




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/how-to-achieve-static-boost-in-solr-tp4148788.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Question about ReRankQuery

2014-07-23 Thread Peter Keegan
See http://heliosearch.org/solrs-new-re-ranking-feature/
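
For reference, a re-rank request looks roughly like this (queries
illustrative):

q=greetings&rq={!rerank reRankQuery=$myquery reRankDocs=1000 reRankWeight=3}&myquery=(hi OR hello)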




Re: [ANN] SIREn, a Lucene/Solr plugin for rich JSON data search

2014-07-23 Thread Walter Underwood
Querying nested data is very easy in MarkLogic; it was built for that. I used 
to work there.

The founder is a former search engine guy from Infoseek and Ultraseek, so it 
has a lot of familiar behavior, like merging segments automatically.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/


On Jul 23, 2014, at 7:25 AM, Jay Vyas jayunit100.apa...@gmail.com wrote:

 Querying nested data is very difficult in any modern db that I have seen.
 
 If it works as you suggest, then it would be cool if this feature were 
 eventually maintained inside Solr.
 
 On Jul 23, 2014, at 7:13 AM, Renaud Delbru renaud@siren.solutions wrote:
 
 One of the coolest features of Lucene/Solr is its ability to index nested 
 documents using a Blockjoin approach.
 
 While this works well for small documents and document collections, it 
 becomes unsustainable for larger ones: Blockjoin works by splitting the 
 original document into many documents, one per nested record.
 
 For example, a single USPTO patent (XML format converted to JSON) will end 
 up being over 1500 documents in the index. This has massive implications for 
 performance and scalability.
 
 Introducing SIREn
 
 SIREn is an open source plugin for Solr for indexing and searching rich 
 nested JSON data.
 
 SIREn uses a sophisticated tree indexing design which ensures that the 
 index is not artificially inflated, so many types of nested queries can be 
 up to 3x faster. Further, depending on the data, Blockjoin's memory 
 requirements for faceting can be up to 10x higher than SIREn's. As such, 
 SIREn allows you to use Solr for larger and more complex datasets, 
 especially for sophisticated analytics. (You can read our whitepaper to 
 find out more [1].)
 
 SIREn is also truly schemaless - it even allows you to change the type of a 
 property between documents without being restricted by a defined mapping. 
 This can be very useful for data integration scenarios where data is 
 described in different ways in different sources.
 
 You only need a few minutes to download and try SIREn [2]. It comes with a 
 detailed manual [3] and you have access to the code on GitHub [4].
 
 We look forward to hearing your feedback.
 
 [1] 
 http://siren.solutions/siren/resources/whitepapers/comparing-siren-1-2-and-lucenes-blockjoin-performance-a-uspto-patent-search-scenario/
 [2] http://siren.solutions/siren/downloads/
 [3] http://siren.solutions/manual/preface.html
 [4] https://github.com/sindicetech/siren
 -- 
 Renaud Delbru
 CTO
 SIREn Solutions



Re: NoClassDefFoundError while indexing in Solr

2014-07-23 Thread Steve McKay
BTW, Ameya, jhighlight-1.0.jar is in the Solr binary distribution, in
contrib/extraction/lib. There are a bunch of different libraries that
Tika uses for content extraction, so this seems like a good time to make
sure that Tika has all the jars available that it might need to process
the files you're indexing. Everything relevant should be included in
contrib/extraction/lib.
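
If they're on disk but not being loaded, a lib directive in solrconfig.xml
is one way to pull them in; a sketch, assuming a stock layout (the dir is
resolved relative to the core's instance directory, so adjust the path for
your install):

  <lib dir="../../contrib/extraction/lib" regex=".*\.jar" />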

Steve

On Wed, Jul 23, 2014 at 01:53:45PM +, Pablo Queixalos wrote:
 There is a source code parser in Tika that in fact just renders the source 
 using an external source highlighter.
 
 Seen in your stack trace: 
 org.apache.tika.parser.code.SourceCodeParser.getRenderer(SourceCodeParser.java:121)
 
 You are indexing code (Java, C or Groovy). Solr seems to be missing a 
 transitive Tika dependency (http://freecode.com/projects/jhighlight).
 
 Copying the jar into Solr's runtime lib directory should solve your issue.
 
 
 Pablo.
 
 From: Shalin Shekhar Mangar shalinman...@gmail.com
 Sent: Wednesday, July 23, 2014 7:43 AM
 To: solr-user@lucene.apache.org
 Subject: Re: NoClassDefFoundError while indexing in Solr
 
 Solr is trying to load com/uwyn/jhighlight/renderer/XhtmlRendererFactory
 but that is not a class which is shipped or used by Solr. I think you have
 some custom plugins (a highlighter perhaps?) that use that class, and the
 classpath is not set up correctly.
 
 
 On Wed, Jul 23, 2014 at 2:20 AM, Ameya Aware ameya.aw...@gmail.com wrote:
 
  Hi
 
  I am running into below error while indexing a file in solr.
 
  Can you please help to fix this?
 
  ERROR - 2014-07-22 16:40:32.126; org.apache.solr.common.SolrException;
  null:java.lang.RuntimeException: java.lang.NoClassDefFoundError:
  com/uwyn/jhighlight/renderer/XhtmlRendererFactory
  at
 
  org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:790)
  at
 
  org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:439)
  at
 
  org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
  at
 
  org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
  at
  org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
  at
 
  org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
  at
  org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
  at
 
  org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
  at
 
  org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
  at
  org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
  at
 
  org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
  at
 
  org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
  at
 
  org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
  at
 
  org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
  at
 
  org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
  at
 
  org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
  at org.eclipse.jetty.server.Server.handle(Server.java:368)
  at
 
  org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
  at
 
  org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
  at
 
  org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
  at
 
  org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
  at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:636)
  at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
  at
 
  org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
  at
 
  org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
  at
 
  org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
  at
 
  org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
  at java.lang.Thread.run(Unknown Source)
  Caused by: java.lang.NoClassDefFoundError:
  com/uwyn/jhighlight/renderer/XhtmlRendererFactory
  at
 
  org.apache.tika.parser.code.SourceCodeParser.getRenderer(SourceCodeParser.java:121)
  at
 
  org.apache.tika.parser.code.SourceCodeParser.parse(SourceCodeParser.java:102)
  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
  at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
  at
 
  org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
  at
 
  

Re: How do I get faceting to work with Solr JOINs

2014-07-23 Thread Vinay B,
Thank You, Umesh !

That's a neat approach. Reading through your post, we decided to tweak our
indexing strategy a bit, basically an inversion: we moved all our facetable
(and frequently updated) fields to the main doc, and the text and other
static content fields to the sub doc (correlated via a parent-id field as
described in my original post). This allows us to satisfy the query
criteria I described.
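
With the inverted layout, a query along these lines does what we need
(field names here are illustrative, not our actual schema):

  q={!join from=parent_id to=id}text:hello&facet=true&facet.field=status

The join returns the main docs, which now carry the facetable fields, so
the facet counts are computed over the documents actually returned.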


On Thu, Jul 17, 2014 at 11:58 PM, Umesh Prasad umesh.i...@gmail.com wrote:

 Hi Vinay,

 You can customize the FacetComponent. Basically, FacetComponent uses
 SimpleFacets to compute the facet count. It passes the matched docset
 present in the ResponseBuilder to SimpleFacets' constructor.

 1.  Build a mapping between parent space and auxiliary document space
 (say, an int array) and cache it in your own custom cache in
 SolrIndexSearcher. You will need to rebuild this mapping on every commit
 and define a CacheRegenerator for that.

 2.  You can map the matched docset (which is in parent space) to auxiliary
 document space.
  The catch is that facets from non-matching auxiliary docs would also
 be counted.

 3. You can then pass this mapped auxiliary docset to SimpleFacets for
 faceting.

 I have done something similar for our needs. Basically, we have a parent
 document with text attributes that changes very little, and we have child
 documents with inventory attributes that change extremely fast. The search
 results require the child documents, but faceting has to be done on the
 text attributes, which belong to the parents. So we do this mapping by
 customizing the FacetComponent.






 On 18 July 2014 04:11, Vinay B, vybe3...@gmail.com wrote:

  Some background info:
  In our application, we have a requirement to update a large number of
  records often. I investigated Solr child documents, but they require
  updating both the child and the parent document. Therefore, I'm
  investigating adding frequently updated information in an auxiliary
  document with a custom-defined parent-id field that can be used to join
  with the static parent document, basically rolling my own child-document
  functionality.
 
  This approach has satisfied all my requirements except one: how can I
  facet on a field present in the auxiliary document?
 
  First, here's a gist dump of my test core index (4 docs + 4 aux docs)
  https://gist.github.com/anonymous/2774b54e667778c71492
 
  Next, here's a simple facet query only on the aux docs. While this works,
  it only returns auxiliary documents:
  https://gist.github.com/anonymous/a58b87576b895e467c68
 
  Finally, I tweak the query using a Solr join (
  https://wiki.apache.org/solr/Join ) to return the main documents (which it
  does), but the faceting returns no results. This is what I'm hoping
  someone on this list can answer.
  Here is the gist of that query:
  https://gist.github.com/anonymous/f3a287ab726f35b142cf
 
  Any answers, suggestions ?
 
  Thanks
 



 --
 ---
 Thanks  Regards
 Umesh Prasad



Re: Question about ReRankQuery

2014-07-23 Thread Joel Bernstein
The ReRankingQParserPlugin uses the Lucene QueryRescorer, which only uses
the score from the re-rank query when re-ranking the top N documents.

The ReRankingQParserPlugin is built as a RankQuery plugin, so you can swap
in your own implementation. Patches are also welcome for the existing
implementation.
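
For reference, a minimal request using the plugin looks something like this
(query and field values are made up for illustration):

  q=foo&sort=score desc,myfield asc
  &rq={!rerank reRankQuery=$rqq reRankDocs=1000 reRankWeight=3}
  &rqq=(hot OR trending)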

Joel Bernstein
Search Engineer at Heliosearch





Re: Question about ReRankQuery

2014-07-23 Thread Joel Bernstein
Blog on the RankQuery API
http://heliosearch.org/solrs-new-rankquery-feature/

Joel Bernstein
Search Engineer at Heliosearch







Any Solr consultants available??

2014-07-23 Thread Jack Krupansky
I occasionally get pinged by recruiters looking for Solr application 
developers... here’s the latest. If you are interested, either contact Jessica 
directly or reply to me and I’ll forward your reply.

Even if you don’t strictly meet all the requirements... they are having trouble 
finding... anyone. All the great Solr guys I know are quite busy.

Thanks.

-- Jack Krupansky

From: Jessica Feigin 
Sent: Wednesday, July 23, 2014 3:36 PM
To: 'Jack Krupansky' 
Subject: Thank you!

Hi Jack,

 

Thanks for your assistance, below is the Solr Consultant job description:

 

Our client, a hospitality Fortune 500 company, is looking to update its 
platform to make accessing information easier for the franchisees. This is the 
first phase of a project which will take a few years. They want a hands-on 
Solr consultant who has ideally worked in the search space. As you can imagine, 
the company culture is great, everyone is really friendly, and there is also an 
option to become permanent. They are looking for:

 

- 10+ years’ experience with Solr (Apache Lucene), HTML, XML, Java, Tomcat, 
JBoss, MySQL

- 5+ years’ experience implementing Solr builds of indexes, shards, and refined 
searches across semi-structured data sets to include architectural scaling

- Experience in developing a re-usable framework to support web site search; 
implement rich web site search, including the incorporation of metadata.

- Experienced in development using Java, Oracle, RedHat, Perl, shell, and 
clustering

- A strong understanding of Data analytics, algorithms, and large data 
structures

- Experienced in architectural design and resource planning for scaling 
Solr/Lucene capabilities.

- Bachelor's degree in Computer Science or related discipline.





 

 

Jessica Feigin 
Technical Recruiter

Technology Resource Management 
30 Vreeland Rd., Florham Park, NJ 07932 
Phone 973-377-0040 x 415, Fax 973-377-7064 
Email: jess...@trmconsulting.com

Web site: www.trmconsulting.com

LinkedIn Profile: www.linkedin.com/in/jessicafeigin

 


Re: Any Solr consultants available??

2014-07-23 Thread Tri Cao

Well, it's kind of hard to find a person if the requirement is 10 years' experience 
with Solr given that Solr was created in 2004.


Re: Any Solr consultants available??

2014-07-23 Thread Jack Krupansky
Yeah, I saw that, which is why I suggested not being too picky about specific 
requirements. If you have at least two or three years of solid Solr experience, 
that would make you at least worth looking at.

-- Jack Krupansky





Re: Any Solr consultants available??

2014-07-23 Thread Steve McKay
Perhaps the requirement means a total of 10 years of experience spread across 
Solr, HTML, XML, Java, Tomcat, JBoss, and MySQL. This doesn't seem likely, but 
it is satisfiable, so if we proceed on the assumption that a job posting 
doesn't contain unsatisfiable requirements then it's more reasonable than a 
naive interpretation. 

There exists the possibility of a satisfiable interpretation which is more 
intuitively appealing, and IMO this warrants further investigation. 



Performance of indexing using Solr

2014-07-23 Thread Ameya Aware
Hi,

I am running into trouble indexing documents with Solr.

After every 15-20 documents, Solr logs the following:

INFO  - 2014-07-23 15:38:50.715; org.apache.solr.core.SolrDeletionPolicy;
newest commit generation = 994
INFO  - 2014-07-23 15:38:50.718; org.apache.solr.search.SolrIndexSearcher;
Opening Searcher@41ffd00c[collection1] realtime
INFO  - 2014-07-23 15:38:50.719;
org.apache.solr.update.DirectUpdateHandler2; end_commit_flush

And then it stops for a minute or so.

This is affecting throughput very badly.

Anything wrong here?


Thanks,
Ameya


Re: integrating Accumulo with solr

2014-07-23 Thread Joe Gresock
We store data in both Solr and Accumulo -- do you have more details about
what kind of data and indexing you want?  Is there a reason you're thinking
of using both databases in particular?





-- 
I know what it is to be in need, and I know what it is to have plenty.  I
have learned the secret of being content in any and every situation,
whether well fed or hungry, whether living in plenty or in want.  I can do
all this through him who gives me strength.*-Philippians 4:12-13*


Re: Question about ReRankQuery

2014-07-23 Thread Peter Keegan
 The ReRankingQParserPlugin uses the Lucene QueryRescorer, which only uses
the score from the re-rank query when re-ranking the top N documents.

Understood, but if the re-rank scores produce new ties, wouldn't you want
to re-sort them with a FieldSortedHitQueue?

Anyway, I was looking to reimplement the ScaleScoreQParser PostFilter
plugin with RankQuery, and would need to implement the behavior of the
DelegatingCollector there for handling multiple sort fields.

Peter




Issue with solr admin collection API 4.8.1

2014-07-23 Thread Hutchins, Jonathan
Solr 4.8.1
Zookeeper 3.4.5
Centos 6.5

We are running into an issue where one of our environments is unable to 
successfully execute commands via the collection API.  We found that we were 
unable to add new collections and after doing some digging found that even 
/solr/admin/collections?action=LIST sits for 180 seconds and then times out.  
Looking at Zookeeper in overseer/collection-queue-work you can see that tasks 
get queued up but never executed.  Restarting Zookeeper and the solr instances 
seems to have no effect.  Has anyone experienced anything similar or know how 
to recover from this state?
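
For reference, we inspected the queue with the ZooKeeper CLI (adjust
host/port for your ensemble):

  zkCli.sh -server localhost:2181
  ls /overseer/collection-queue-work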

Thank you,

Jonathan Hutchins


Re: How to migrate content of a collection to a new collection

2014-07-23 Thread Chris Hostetter

: billions of documents (not enough memory). Please note that we are on 4.4,
: which does not contain the new CURSOR-feature. Please also note that speed is
: an important factor for us.

for situations where you know you will be processing every doc and order 
doesn't matter, you can use a poor man's cursor by filtering on successive 
ranges of your uniqueKey field, as described in the Is There A 
Workaround? section of this blog post...

http://searchhub.org/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/

* sort on uniqueKey
* leave start=0 on every request
* add an fq to each request based on the last uniqueKey value from 
  the previous request (see the sketch below)
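
a rough SolrJ sketch of that loop, assuming a string uniqueKey named id
(escape the id value if it can contain query syntax characters):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.SolrDocument;

  SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
  String lastId = null;
  while (true) {
    SolrQuery q = new SolrQuery("*:*");
    q.setSort("id", SolrQuery.ORDER.asc);
    q.setStart(0);      // always start at 0; the fq does the paging
    q.setRows(1000);
    if (lastId != null) {
      // only ids strictly greater than the last one seen
      q.addFilterQuery("id:{" + lastId + " TO *]");
    }
    QueryResponse rsp = solr.query(q);
    if (rsp.getResults().isEmpty()) break;
    for (SolrDocument doc : rsp.getResults()) {
      lastId = (String) doc.getFieldValue("id");
      // reindex / process doc here
    }
  }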


-Hoss
http://www.lucidworks.com/


Re: SOLR 4.4 - Slave always replicates full index

2014-07-23 Thread Robin Woods
Thanks Shawn, that makes sense.







commons-configuration NoClassDefFoundError: Predicate

2014-07-23 Thread Peyman Faratin
Hi

I've tried all permutations with no results, so I thought I'd write to the 
group for help. 

I am running Commons Configuration 
(http://commons.apache.org/proper/commons-configuration/) just fine via Maven 
and Ant, but when I try to run the class calling PropertiesConfiguration via a 
Solr search component I get the following error:

 org.eclipse.jetty.servlet.ServletHandler  – Error for /solr/ArticlesRaw/ingest
java.lang.NoClassDefFoundError: org/apache/commons/collections/Predicate
at com.xyz.logic(Ingest.java:106)
at com.xyz.logic.process(Runngest.java:76)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:217)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:533)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ClassNotFoundException: 
org.apache.commons.collections.Predicate
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at 
org.eclipse.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:430)
at 
org.eclipse.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:383)


Following suggestions here

http://stackoverflow.com/questions/7651799/proper-usage-of-apache-commons-configuration/7651867#7651867

I am including the appropriate jars in solrconfig.xml 

  <lib dir="${mvnRepository}/commons-lang/commons-lang/2.6/" regex=".*\.jar"/>
  <lib dir="${mvnRepository}/commons-collections/commons-collections/3.2.1/" regex=".*\.jar"/>
  <lib dir="${mvnRepository}/commons-logging/commons-logging/1.1.1/" regex=".*\.jar"/>
  <lib dir="${mvnRepository}/commons-configuration/commons-configuration/1.10/" regex=".*\.jar"/>

(the org.apache.commons.collections.Predicate class is in the 
commons-collections 3.2.1 jar)

I am running solr 4.7.1

Any help would be much appreciated

Peyman




Re: Any Solr consultants available??

2014-07-23 Thread Walter Underwood
When I see job postings like this, I have to assume they were written by people 
who really don’t understand the problem and have never met people with the 
various skills they are asking for. They are not going to find one person who 
does all this.

This is an opening for a zebra unicorn that walks on water. At best, they’ll get 
a one-horned goat with painted stripes on a life raft. They need to talk to 
some people, make multiple realistic openings, and expect to grow some of their 
own expertise.

I got an email like this from Goldman Sachs this morning.

“... a Senior Application Architect/Developer and DevOps Engineer for a major 
company initiative. In addition to an effort to build a new cloud 
infrastructure from the ground up, they are beginning a number of company 
projects in the areas of cloud-based open source search, Machine Learning/AI, 
Big Data, Predictive Analytics  Low-Latency Trading Algorithm Development.”

Good luck, fellas.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/





Re: Any Solr consultants available??

2014-07-23 Thread Alexandre Rafalovitch
On Thu, Jul 24, 2014 at 2:44 AM, Jack Krupansky j...@basetechnology.com wrote:
 All the great Solr guys I know are quite busy.

Sounds like an opportunity for somebody to put together a training
hacker camp, similar to https://hackerbeach.org/ . Cross-train
consultants in Solr, immediately increase their value.  Do it
somewhere on the beach or in the mountains, etc. If somebody organizes
it, I would probably even be interested in teaching the first (newbie)
part.

And the graduation project would be a solr-consultants.com website to
make it easier to find those same consultants later. :-)

Regards,
   Alex.
P.S. The last issue of my newsletter had Solr big ideas. The one above
was not in it, but it is, I believe, also viable. Contact me if it
catches your fancy for more detailed brainstorming and notes sharing.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


Re: Question about ReRankQuery

2014-07-23 Thread Joel Bernstein
I like the FieldSortedHitQueue idea. If you want to work up a patch for
that, it would be great.

Joel Bernstein
Search Engineer at Heliosearch





Re: Performance of indexing using Solr

2014-07-23 Thread Joel Bernstein
It looks like you're committing too frequently. If you're explicitly
committing from the application, you may want to switch to using
autoCommit. If you're not committing from the application, your autoCommit
interval is probably too short.
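
For example, something like this in solrconfig.xml is a reasonable starting
point (tune the intervals to your needs):

  <autoCommit>
    <maxTime>60000</maxTime>            <!-- hard commit every 60 seconds -->
    <openSearcher>false</openSearcher>  <!-- don't open a new searcher on hard commits -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>5000</maxTime>             <!-- soft commit every 5 seconds for visibility -->
  </autoSoftCommit>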

Joel Bernstein
Search Engineer at Heliosearch

