How to migrate content of a collection to a new collection
Hi,

We have numerous collections, each with numerous shards spread across numerous machines. We just discovered that all documents have a field with a wrong value, and besides that we would like to add a new field to all documents:
* The field with the wrong value is a long, docValues, indexed and stored. Some (about half) of the documents need a constant added to their current value.
* The field we want to add will be an int, docValues, indexed and stored. It needs to be added to all documents, but with different values among the documents.

How do we achieve our goal in the easiest possible way? We thought about spooling/streaming from the existing collection into a twin collection, then deleting the existing collection and finally renaming the twin collection to the original collection's name; basically indexing all documents again. If that is the easiest way, how do we query so that we get all documents streamed? We cannot just do a *:* query that returns everything into memory and then index from there, because we have billions of documents (not enough memory). Please note that we are on 4.4, which does not contain the new cursor feature. Please also note that speed is an important factor for us.

I guess this could also be achieved by doing a 1-1 migration at shard level instead of collection level, keeping everything in the new collections on the same machines where they lived in the old collections. That could probably complete faster than the 1-1 collection-level approach. But this 1-1 shard-level approach is not very good for us, because the long field we need to change is also part of the id (controlling the routing to a particular shard), so we actually need to change the id on all documents as well. If we do the 1-1 shard-level approach, we will end up having documents in shards they do not belong to (they would not have been routed there by Solr's routing).
We might be able to live with this disadvantage if the 1-1 shard-level approach can be achieved much faster than the 1-1 collection-level approach. Any input is very much appreciated!

Thanks. Regards, Per Steffensen
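For the streaming question: without cursorMark on 4.4, one workable pattern is to partition the keyspace of an indexed numeric field (for instance the long field being fixed) into disjoint range filter queries and export each slice separately, so no single request has to materialize billions of documents. A minimal sketch of the slicing only; the field name and bounds are hypothetical:

```python
def range_slices(field, lo, hi, n_slices):
    """Split [lo, hi) into disjoint Solr range filter queries.

    Each slice can be exported independently (and in parallel, e.g.
    one worker per machine) with q=*:* plus this fq, so no single
    request has to hold the whole result set in memory.
    """
    step = (hi - lo + n_slices - 1) // n_slices  # ceiling division
    slices = []
    for start in range(lo, hi, step):
        end = min(start + step, hi)
        # Solr range syntax: "[" inclusive lower, "}" exclusive upper
        slices.append(f"{field}:[{start} TO {end}}}")
    return slices

# Example: slice a hypothetical long field's keyspace into 4 export jobs.
fqs = range_slices("timestamp_l", 0, 1000, 4)
```

Each fq is then combined with a modest rows window while feeding the twin collection; slices can run in parallel, which also addresses the speed concern.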
integrating Accumulo with solr
Dear all, I was wondering: has anybody out there tried to integrate Solr with Accumulo? I was thinking about using Accumulo on top of HDFS and using Solr to index the data inside Accumulo. Do you have any idea how I can do such an integration? Best regards. -- A.Nazemian
Are stored fields compressed by default?
Hi! I'm planning to use atomic updates (https://wiki.apache.org/solr/Atomic_Updates), which means having all fields stored. Some docs might have text fields of up to 200K, so I would feel better knowing that Solr automatically compresses stored fields (I know the Lucene 4.x default codec does). *Are stored fields compressed by default? Or is there a way to configure it? (Solr 4.7.)* Thanks!
Re: Java heap Space error
You may want to change your solr startup script such that it creates a heap dump on OOM. Add -XX:+HeapDumpOnOutOfMemoryError as an option. The heap dump can be nicely analyzed with http://www.eclipse.org/mat/. Just increasing -Xmx is a workaround that may help to get around for a while. With mat you will see much clearer what is the likely cause. Harald. On 22.07.2014 19:37, Ameya Aware wrote: Hi i am running into java heap space issue. Please see below log. ERROR - 2014-07-22 11:38:59.370; org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:790) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:439) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at 
org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:636) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Unknown Source) Caused by: java.lang.OutOfMemoryError: Java heap space at org.apache.solr.common.util.JavaBinCodec.writeStr(JavaBinCodec.java:567) at org.apache.solr.common.util.JavaBinCodec.writePrimitive(JavaBinCodec.java:646) at org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:240) at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:153) at org.apache.solr.common.util.JavaBinCodec.writeSolrInputDocument(JavaBinCodec.java:409) at org.apache.solr.update.TransactionLog.write(TransactionLog.java:353) at org.apache.solr.update.UpdateLog.add(UpdateLog.java:397) at org.apache.solr.update.UpdateLog.add(UpdateLog.java:382) at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:255) at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:160) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69) at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:704) at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:858) at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:557) at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:121) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:126) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at
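Harald's flag slots into the JVM options of the Solr start script; a sketch (the dump path is an example, adjust to your setup):

```shell
# In the Solr start script: dump the heap on OutOfMemoryError so it
# can be opened later in Eclipse MAT; pick a path with enough free disk.
JAVA_OPTS="$JAVA_OPTS -XX:+HeapDumpOnOutOfMemoryError \
  -XX:HeapDumpPath=/var/tmp/solr-heap.hprof"
```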
solr 3.6 to 4.7 upgrade has changed the query string
Hi,

Our backend application queries Solr to retrieve certain records. We were initially on version 3.6 and have now upgraded to Solr 4.7. Something has changed in query parsing that now requires parentheses for the query below. Both queries are against 4.7.

Returns 1 record:
http://solr-dev.ss.com/solr/hotel/select/?q=ID:AFLGWBDAB OR AFLGWBGIB OR FLGWGNDG

Returns 14 records, as expected:
http://solr-dev.ss.com/solr/hotel/select/?q=ID:(AFLGWBDAB OR AFLGWBGIB OR FLGWGNDG)

If you notice, I had to add parentheses to get the expected records. I also cannot change the application code now, as the release process for the application will take over a month. Is there a workaround within solrconfig so that parentheses are not needed in 4.7?

Thanks, Shashi
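If changing the app is off the table, one possible server-side workaround (an assumption to verify against your schema, not a guaranteed fix) is to make ID the default field for the handler, so the bare terms after the first OR are parsed against ID instead of the schema's default search field:

```xml
<!-- solrconfig.xml: unfielded terms such as AFLGWBGIB will then
     resolve against ID instead of the default field -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">ID</str>
  </lst>
</requestHandler>
```

This only helps if other queries through the same handler do not depend on a different default field.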
RE: SolrCloud replica dies under high throughput
Thanks that helped. I no longer see the constant replica recovery. It also increased my throughput to 1.6/1.7 million per hour reliably. I actually then tried using SSDs instead and it flew up to 6.5 million updates per hour. Setup: 4 node cluster using m3.2xl AWS servers using general purpose SSDs. Thanks again, Darren -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: 22 July 2014 00:25 To: solr-user@lucene.apache.org Subject: Re: SolrCloud replica dies under high throughput Looks like you probably have to raise the http client connection pool limits to handle that kind of load currently. They are specified as top level config in solr.xml: maxUpdateConnections maxUpdateConnectionsPerHost -- Mark Miller about.me/markrmiller On July 21, 2014 at 7:14:59 PM, Darren Lee (d...@amplience.com) wrote: Hi, I'm doing some benchmarking with Solr Cloud 4.9.0. I am trying to work out exactly how much throughput my cluster can handle. Consistently in my test I see a replica go into recovering state forever caused by what looks like a timeout during replication. I can understand the timeout and failure (I am hitting it fairly hard) but what seems odd to me is that when I stop the heavy load it still does not recover the next time it tries, it seems broken forever until I manually go in, clear the index and let it do a full resync. Is this normal? Am I misunderstanding something? My cluster has 4 nodes (2 shards, 2 replicas) (AWS m3.2xlarge). I am indexing with ~800 concurrent connections and a 10 sec soft commit. I consistently get this problem with a throughput of around 1.5 million documents per hour. 
Thanks all, Darren Stack Traces Messages: [qtp779330563-627] ERROR org.apache.solr.servlet.SolrDispatchFilter â null:org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool at org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnecti on(PoolingClientConnectionManager.java:226) at org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnecti on(PoolingClientConnectionManager.java:195) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequ estDirector.java:422) at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpC lient.java:863) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpC lient.java:82) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpC lient.java:106) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpC lient.java:57) at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.ru n(ConcurrentUpdateSolrServer.java:233) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j ava:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor. java:615) at java.lang.Thread.run(Thread.java:724) Error while trying to recover. 
core=assets_shard2_replica1:java.util.concurrent.ExecutionException: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://xxx.xxx.15.171:8080/solr at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:188) at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStr ategy.java:615) at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.jav a:371) at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235) Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://xxx.xxx.15.171:8080/solr at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSol rServer.java:566) at org.apache.solr.client.solrj.impl.HttpSolrServer$1.call(HttpSolrServer .java:245) at org.apache.solr.client.solrj.impl.HttpSolrServer$1.call(HttpSolrServer .java:241) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j ava:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor. java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.net.SocketException: Socket closed at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:152) at java.net.SocketInputStream.read(SocketInputStream.java:122) at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(Abstract SessionInputBuffer.java:160) at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer .java:84) at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSe ssionInputBuffer.java:273) at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultH ttpResponseParser.java:140) at
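For reference, the two settings Mark names are top-level entries in solr.xml; a sketch with illustrative values (tune to your load):

```xml
<!-- solr.xml (4.x): enlarge the HTTP client connection pool used
     for inter-node update and recovery traffic; values illustrative -->
<solr>
  <int name="maxUpdateConnections">100000</int>
  <int name="maxUpdateConnectionsPerHost">10000</int>
  <!-- rest of solr.xml unchanged -->
</solr>
```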
Re: Query using doc Id
@Alexandre No, I mean the same as what you mean: docId:[100 TO 200]. @Santosh My intention is to query all the docs from Solr. If I use rows=100&start=100, I need to use *:* as my query, which is not a good idea. Hence I am looking for an option to filter based on docId. Thanks & Regards, Mukund On Wed, Jul 23, 2014 at 10:43 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Do you mean something different from docId:[100 TO 200] ? Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On Wed, Jul 23, 2014 at 11:49 AM, Mukundaraman Valakumaresan muk...@8kmiles.com wrote: Hi, Is it possible to execute queries using doc Id as a query parameter? For e.g., query docs whose doc Id is between 100 and 200. Thanks & Regards, Mukund
Re: Query using doc Id
Perhaps you are looking for cursorMark: http://solr.pl/en/2014/03/10/solr-4-7-efficient-deep-paging/ ? Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On Wed, Jul 23, 2014 at 4:59 PM, Mukundaraman Valakumaresan muk...@8kmiles.com wrote: @Alexandre No, I mean the same what you mean docId:[100 TO 200] @Santosh My intention is to query all the docs from Solr. If I give rows=100start=100, for which I need to apply my query as *:* , which is not a good idea. Hence looking for an option to filter based on docId. Thanks Regards Mukund On Wed, Jul 23, 2014 at 10:43 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Do you mean something different from docId:[100 TO 200] ? Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On Wed, Jul 23, 2014 at 11:49 AM, Mukundaraman Valakumaresan muk...@8kmiles.com wrote: Hi, Is it possible to execute queries using doc Id as a query parameter For eg, query docs whose doc Id is between 100 and 200 Thanks Regards Mukund
Re: Query using doc Id
Exactly Alexandre, Thanks Regards Mukund On Wed, Jul 23, 2014 at 3:37 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Perhaps you are looking for cursorMark: http://solr.pl/en/2014/03/10/solr-4-7-efficient-deep-paging/ ? Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On Wed, Jul 23, 2014 at 4:59 PM, Mukundaraman Valakumaresan muk...@8kmiles.com wrote: @Alexandre No, I mean the same what you mean docId:[100 TO 200] @Santosh My intention is to query all the docs from Solr. If I give rows=100start=100, for which I need to apply my query as *:* , which is not a good idea. Hence looking for an option to filter based on docId. Thanks Regards Mukund On Wed, Jul 23, 2014 at 10:43 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Do you mean something different from docId:[100 TO 200] ? Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On Wed, Jul 23, 2014 at 11:49 AM, Mukundaraman Valakumaresan muk...@8kmiles.com wrote: Hi, Is it possible to execute queries using doc Id as a query parameter For eg, query docs whose doc Id is between 100 and 200 Thanks Regards Mukund
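The cursorMark flow Alexandre links to sorts on the uniqueKey and re-sends the returned cursor token until Solr echoes the same token back. A sketch of that control flow with the HTTP call stubbed out (a real client would hit /select?q=*:*&sort=id+asc&cursorMark=...; the fake transport below is purely illustrative):

```python
def fetch_all(fetch_page):
    """Drain a result set using Solr-4.7-style cursorMark paging.

    fetch_page(cursor) must return (docs, next_cursor); iteration
    ends when Solr returns the same cursor token back unchanged.
    """
    cursor = "*"  # "*" asks Solr to start a new cursor
    docs = []
    while True:
        page, next_cursor = fetch_page(cursor)
        docs.extend(page)
        if next_cursor == cursor:  # same token back => no more results
            break
        cursor = next_cursor
    return docs

# Fake transport over 5 docs, 2 per page, standing in for HTTP calls.
data = [{"id": i} for i in range(5)]

def fake_fetch(cursor):
    start = 0 if cursor == "*" else int(cursor)
    page = data[start:start + 2]
    nxt = cursor if not page else str(start + len(page))
    return page, nxt

all_docs = fetch_all(fake_fetch)
```

The key property, unlike start/rows paging, is that each request is cheap no matter how deep into the result set it is.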
[ANN] SIREn, a Lucene/Solr plugin for rich JSON data search
One of the coolest features of Lucene/Solr is its ability to index nested documents using a BlockJoin approach. While this works well for small documents and document collections, it becomes unsustainable for larger ones: BlockJoin works by splitting the original document into many documents, one per nested record. For example, a single USPTO patent (XML format converted to JSON) will end up being over 1500 documents in the index. This has massive implications for performance and scalability.

Introducing SIREn

SIREn is an open source plugin for Solr for indexing and searching rich nested JSON data. SIREn uses a sophisticated tree indexing design which ensures that the index is not artificially inflated. As a result, many types of nested queries can be up to 3x faster. Further, depending on the data, memory requirements for faceting with BlockJoin can be up to 10x higher than with SIREn. As such, SIREn allows you to use Solr for larger and more complex datasets, especially for sophisticated analytics. (You can read our whitepaper to find out more [1].)

SIREn is also truly schemaless: it even allows you to change the type of a property between documents without being restricted by a defined mapping. This can be very useful for data integration scenarios where data is described in different ways in different sources.

You only need a few minutes to download and try SIREn [2]. It comes with a detailed manual [3] and you have access to the code on GitHub [4]. We look forward to hearing your feedback.

[1] http://siren.solutions/siren/resources/whitepapers/comparing-siren-1-2-and-lucenes-blockjoin-performance-a-uspto-patent-search-scenario/ [2] http://siren.solutions/siren/downloads/ [3] http://siren.solutions/manual/preface.html [4] https://github.com/sindicetech/siren -- Renaud Delbru CTO SIREn Solutions
Passivate core in Solr Cloud
Hello, We want to set up a SolrCloud cluster to handle a high volume of documents with a multi-tenant architecture. The problem is that application-level isolation for a tenant (using a shared index with a customer field) is not enough to fit our requirements. As a result, we need 1 collection per customer. There are more than a thousand customers, and it seems unreasonable to create thousands of collections in SolrCloud... But as we know that there is less than 1 query/customer/day, we are currently looking for a way to passivate collections when they are not in use. Can this be a good idea? If yes, are there best practices for implementing it? What side effects can we expect? Do we need to put some application-level logic on top of the SolrCloud cluster to choose which collection to unload (and maybe there is something smarter (and quicker?) than simply loading/unloading the core when it is not in use)? Thank you for your answer(s), Aurelien
Re: Passivate core in Solr Cloud
Solr has some support for large number of cores, including transient cores: http://wiki.apache.org/solr/LotsOfCores Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On Wed, Jul 23, 2014 at 7:55 PM, Aurélien MAZOYER aurelien.mazo...@francelabs.com wrote: Hello, We want to setup a Solr Cloud cluster in order to handle a high volume of documents with a multi-tenant architecture. The problem is that an application-level isolation for a tenant (using a mutual index with a field customer) is not enough to fit our requirements. As a result, we need 1 collection/customer. There is more than a thousand customers and it seems unreasonable to create thousands of collections in Solr Cloud... But as we know that there are less than 1 query/customer/day, we are currently looking for a way to passivate collection when they are not in use. Can it be a good idea? If yes, are there best practices to implement this? What side effects can we expect? Do we need to put some application-level logic on top on the Solr Cloud cluster to choose which collection we have to unload (and maybe there is something smarter (and quicker?) than simply loading/unloading the core when it is not in used?) ? Thank you for your answer(s), Aurelien
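With the LotsOfCores features on that wiki page, a tenant core can be declared lazily loaded and evictable. A sketch for standalone (non-SolrCloud) core management; the names and cache size are hypothetical:

```xml
<!-- old-style solr.xml: transientCacheSize caps how many transient
     cores stay loaded; least-recently-used ones are closed when full -->
<cores adminPath="/admin/cores" transientCacheSize="64">
  <!-- loaded on first request, eligible for eviction afterwards -->
  <core name="customer_0042" instanceDir="customer_0042"
        transient="true" loadOnStartup="false"/>
</cores>
```

Note that this applies to core management rather than to SolrCloud collections, so it may need application-level routing on top, as the question anticipates.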
how to fully test a response writer
Hi, I developed a new Solr response writer, but I'm not happy with how I wrote its tests. My problem is that I need to test it both with local requests and with distributed requests, since the Solr response objects (the input to the response writer) are different. a) I tested the local request case using SolrTestCaseJ4. b) I tested the distributed request case using a JUnit test case making REST calls to the alias coll12, associated with a couple of SolrCloud collections configured with my custom response writer. The problem with b) is that it requires a manual setup on every machine where I want to run the tests. Thanks
Re: text search problem
Ravi, for the hyphen issue, try setting autoGeneratePhraseQueries="true" for that fieldType (no re-index needed). As of 1.4, this defaults to false. One word of caution: autoGeneratePhraseQueries may not work as expected for languages that aren't whitespace delimited. As Erick mentioned, the Analysis page will help you verify that your content and your queries are handled the way you expect them to be. See this thread for more info on autoGeneratePhraseQueries: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201202.mbox/%3c439f69a3-f292-482b-a102-7c011c576...@gmail.com%3E

On Mon, Jul 21, 2014 at 8:42 PM, Erick Erickson erickerick...@gmail.com wrote: Try escaping the hyphen as \-. Or enclosing it all in quotes. But you _really_ have to spend some time with the debug option and the admin/analysis page or you will find endless surprises. Best, Erick

On Mon, Jul 21, 2014 at 11:12 AM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote: Thanks for the reply Erick, I will try as you suggested. I have another question related to this. When I have a - in my description or name, the search results are different. For e.g. ABC-123: it looks for ABC or 123, but I want to treat this search as an exact match, i.e. if my document has ABC-123 then I should get the results. When I check with hl=on, it has <em>ABC</em> and gets the results. How can I avoid this situation? Thanks, Ravi

-----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Saturday, July 19, 2014 4:40 PM To: solr-user@lucene.apache.org Subject: Re: text search problem Try adding debug=all to the query and see what the parsed form of the query is. Likely you're 1) using phrase queries, so "broadway hotel" requires both words in the text, or 2) if you're not using phrases, you're searching for the AND of the two terms. But debug=all will show you. Plus, take a look at the admin/analysis page; your tokenization may not be what you expect.
Best, Erick

On Fri, Jul 18, 2014 at 2:00 PM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote: Hi, below is the text_general field type. When I search Text:Broadway it is not returning all the records, only a few. But when I search for Text:*Broadway*, it gets more records. When I get into a multi-word search like "Broadway Hotel", it may not get "Broadway", "Hotel", "Broadway Hotel". Do you have any thought on how to handle this type of keyword search? Text:Broadway,Vehicle Detailing,Water Systems,Vehicle Detailing,Car Wash Water Recovery

My field type looks like this:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="0" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="0"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="0" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="0"/>
  </analyzer>
</fieldType>

Do you have any thought on this behavior or how to get this? Thanks, Ravi
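Applied to the fieldType in Ravi's schema, the suggested fix is a single attribute on the opening tag (it takes effect at query time, no re-index needed):

```xml
<!-- with this flag, a query token like ABC-123 that analysis splits
     into ABC 123 is run as the phrase "ABC 123", not ABC OR 123 -->
<fieldType name="text_general" class="solr.TextField"
           positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
  <!-- index and query analyzers unchanged -->
</fieldType>
```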
Question about ReRankQuery
I'm looking at how ReRankQuery works. If the main query has a sort criterion, it is only used to sort the first-pass results. The QueryScorer used in the second pass only reorders the ScoreDocs based on score and docid, but doesn't use the original sort fields. If the sort is 'score desc, myfield asc', I would expect 'myfield' to break score ties in the second pass after rescoring. Is this a bug or the intended behavior? Thanks, Peter
RE: NoClassDefFoundError while indexing in Solr
There is a source code parser in Tika that in fact just renders the source using an external source highlighter. Seen in your stack trace: com.apache.tika.parser.code.SourceCodeParser.getRenderer(SourceCodeParser.java:121). You are indexing code (Java, C or Groovy). Solr seems to be missing a transitive Tika dependency (http://freecode.com/projects/jhighlight). Copying the lib into Solr's runtime lib directory should solve your issue. Pablo.

From: Shalin Shekhar Mangar shalinman...@gmail.com Sent: Wednesday, July 23, 2014 7:43 AM To: solr-user@lucene.apache.org Subject: Re: NoClassDefFoundError while indexing in Solr Solr is trying to load com/uwyn/jhighlight/renderer/XhtmlRendererFactory, but that is not a class which is shipped or used by Solr. I think you have some custom plugins (a highlighter perhaps?) which use that class, and the classpath is not set up correctly. On Wed, Jul 23, 2014 at 2:20 AM, Ameya Aware ameya.aw...@gmail.com wrote: Hi, I am running into the below error while indexing a file in Solr. Can you please help to fix this?
ERROR - 2014-07-22 16:40:32.126; org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.NoClassDefFoundError: com/uwyn/jhighlight/renderer/XhtmlRendererFactory at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:790) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:439) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:636) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Unknown Source) Caused by: java.lang.NoClassDefFoundError: com/uwyn/jhighlight/renderer/XhtmlRendererFactory at org.apache.tika.parser.code.SourceCodeParser.getRenderer(SourceCodeParser.java:121) at org.apache.tika.parser.code.SourceCodeParser.parse(SourceCodeParser.java:102) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418) ... 26 more Caused by: java.lang.ClassNotFoundException: com.uwyn.jhighlight.renderer.XhtmlRendererFactory at
Re: Solr 4.7.2 auto suggestion
Hello, the suggester (solr.SuggestComponent) with

<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>

does not work with field types that are not string, nor with multiValued fields. Any idea?
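A common workaround for that limitation (a sketch; the field and component names here are hypothetical) is to copy the source field into a stored string field the dictionary can read, and point the suggester at the copy:

```xml
<!-- schema.xml: flatten the source field into a stored string
     (mark dest multiValued="true" if the source is multiValued) -->
<field name="suggest_src" type="string" indexed="true" stored="true"/>
<copyField source="title" dest="suggest_src"/>

<!-- solrconfig.xml: build the suggester from the copied field -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">default</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">suggest_src</str>
  </lst>
</searchComponent>
```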
Re: NoClassDefFoundError while indexing in Solr
Thanks a lot for your suggestions. On Wed, Jul 23, 2014 at 9:53 AM, Pablo Queixalos pqueixa...@customermatrix.com wrote: There is a source code parser in Tika that in fact just renders the source using an external source highlighter. Seen in your stack trace: org.apache.tika.parser.code.SourceCodeParser.getRenderer(SourceCodeParser.java:121) You are indexing code (Java, C or Groovy). Solr seems to be missing a transitive Tika dependency (http://freecode.com/projects/jhighlight). Copying the lib into Solr's runtime lib directory should solve your issue. Pablo. From: Shalin Shekhar Mangar shalinman...@gmail.com Sent: Wednesday, July 23, 2014 7:43 AM To: solr-user@lucene.apache.org Subject: Re: NoClassDefFoundError while indexing in Solr Solr is trying to load com/uwyn/jhighlight/renderer/XhtmlRendererFactory, but that is not a class which is shipped or used by Solr. I think you have some custom plugins (a highlighter perhaps?) which use that class, and the classpath is not set up correctly. On Wed, Jul 23, 2014 at 2:20 AM, Ameya Aware ameya.aw...@gmail.com wrote: Hi I am running into the below error while indexing a file in Solr. Can you please help to fix this?
ERROR - 2014-07-22 16:40:32.126; org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.NoClassDefFoundError: com/uwyn/jhighlight/renderer/XhtmlRendererFactory at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:790) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:439) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:636) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Unknown Source) Caused by: java.lang.NoClassDefFoundError: com/uwyn/jhighlight/renderer/XhtmlRendererFactory at org.apache.tika.parser.code.SourceCodeParser.getRenderer(SourceCodeParser.java:121) at org.apache.tika.parser.code.SourceCodeParser.parse(SourceCodeParser.java:102) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952) at
Re: [ANN] SIREn, a Lucene/Solr plugin for rich JSON data search
Querying nested data is very difficult in any modern db that I have seen. If it works as you suggest, it would be cool if the feature were eventually maintained inside Solr. On Jul 23, 2014, at 7:13 AM, Renaud Delbru renaud@siren.solutions wrote: One of the coolest features of Lucene/Solr is its ability to index nested documents using a Blockjoin approach. While this works well for small documents and document collections, it becomes unsustainable for larger ones: Blockjoin works by splitting the original document into many documents, one per nested record. For example, a single USPTO patent (XML format converted to JSON) will end up being over 1500 documents in the index. This has massive implications for performance and scalability. Introducing SIREn SIREn is an open source plugin for Solr for indexing and searching rich nested JSON data. SIREn uses a sophisticated tree indexing design which ensures that the index is not artificially inflated. Many types of nested queries can be up to 3x faster; further, depending on the data, memory requirements for faceting can be up to 10x higher with the Blockjoin approach. As such, SIREn allows you to use Solr for larger and more complex datasets, especially so for sophisticated analytics. (You can read our whitepaper to find out more [1]) SIREn is also truly schemaless - it even allows you to change the type of a property between documents without being restricted by a defined mapping. This can be very useful for data integration scenarios where data is described in different ways in different sources. You only need a few minutes to download and try SIREn [2]. It comes with a detailed manual [3] and you have access to the code on GitHub [4]. We look forward to hearing your feedback.
[1] http://siren.solutions/siren/resources/whitepapers/comparing-siren-1-2-and-lucenes-blockjoin-performance-a-uspto-patent-search-scenario/ [2] http://siren.solutions/siren/downloads/ [3] http://siren.solutions/manual/preface.html [4] https://github.com/sindicetech/siren -- Renaud Delbru CTO SIREn Solutions
Re: How to get Lacuma to match Lucuma
Is there a way to make Solr do fuzzy searches automatically, without having to add the tilde character? And are there disadvantages to doing fuzzy searches? Warren On Jul 22, 2014, at 1:54 PM, Anshum Gupta ans...@anshumgupta.net wrote: Hi Warren, Check out the section about fuzzy search here https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser. On Tue, Jul 22, 2014 at 1:29 PM, Warren Bell warr...@clarksnutrition.com wrote: What field type or filters do I use to get something like the word "Lacuma" to return results with "Lucuma" in it? The word "Lucuma" has been indexed in a field with field type text_en_splitting that came with the original Solr examples. Thanks, Warren

<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
-- Anshum Gupta http://www.anshumgupta.net
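As far as the stock query parsers go, there is no switch that makes every term fuzzy automatically, so a common workaround is to rewrite the user's terms client-side before sending the query to Solr. A minimal sketch (edit distance 2; it naively assumes whitespace-separated bare terms with no phrase, range, or boolean syntax):

```python
# Naive client-side rewrite: append a fuzzy operator to each bare term
# before the query is sent to Solr. Real input would need proper parsing.

def fuzzify(query, max_edits=2):
    out = []
    for term in query.split():
        if "~" in term:          # already fuzzy; leave it alone
            out.append(term)
        else:
            out.append("%s~%d" % (term, max_edits))
    return " ".join(out)
```

On the disadvantages: in Lucene/Solr 4.x the edit distance after ~ is capped at 2, and fuzzy queries expand to many more terms, so they are generally slower and less precise than exact matches.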
how to achieve static boost in solr
Hi, I am struggling with how to achieve a static boost in Solr; I have visited many web sites but haven't found a solid answer. The requirement is as follows: suppose I have 100 keywords to search for, and for each keyword I want a particular URL to appear on top. Say, for the keyword *car* the URL *http://car.com* should be on top; for the keyword *building*, the URL *http://building.com*; for the keyword *java*, the URL *http://javajee.com*; and so on. How do I achieve this when there are many keywords or queries? I don't want to hard-code this in the Java API, as I shouldn't be hard-coding hundreds of keywords. We crawl from a DB, and many of the keywords and all of the URLs are stored in the DB. If I can achieve this with some configuration settings, that would be better. -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-achieve-static-boost-in-solr-tp4148788.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to migrate content of a collection to a new collection
Per: Given that you said that the field redefinition also includes routing info, I don't see any other way than re-indexing each collection. That said, could you use collection aliasing and do one collection at a time? Best, Erick On Tue, Jul 22, 2014 at 11:45 PM, Per Steffensen st...@designware.dk wrote: Hi We have numerous collections, each with numerous shards spread across numerous machines. We just discovered that all documents have a field with a wrong value, and besides that we would like to add a new field to all documents. * The field with the wrong value is a long, DocValued, Indexed and Stored. Some (about half) of the documents need to have a constant added to their current value. * The field we want to add will be an int, DocValued, Indexed and Stored. It needs to be added to all documents, but will have different values among the documents. How do we achieve our goal in the easiest possible way? We thought about spooling/streaming from the existing collection into a twin-collection, then deleting the existing collection and finally renaming the twin-collection to have the same name as the original collection - basically indexing all documents again. If that is the easiest way, how do we query in a way so that we get all documents streamed? We cannot just do a *:* query that returns everything into memory and then index from there, because we have billions of documents (not enough memory). Please note that we are on 4.4, which does not contain the new CURSOR feature. Please also note that speed is an important factor for us. I guess this could also be achieved by doing a 1-1 migration on shard-level instead of collection-level, keeping everything in the new collections on the same machine where they lived in the old collections. That could probably complete faster than the 1-1 on collection-level approach.
But this 1-1 on shard-level approach is not very good for us, because the long field we need to change is also part of the id (controlling the routing to a particular shard), and therefore we actually also need to change the id on all documents. So if we do the 1-1 on shard-level approach, we will end up having documents in shards that they actually do not belong to (they would not have been routed there by the routing system in Solr). We might be able to live with this disadvantage if 1-1 on shard-level can be achieved much faster than 1-1 on collection-level. Any input is very much appreciated! Thanks Regards, Per Steffensen
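Since 4.4 lacks cursorMark, one workable pattern for streaming billions of docs is to partition the collection into disjoint range filter queries on an indexed field and export each slice independently (sorted by that field, with shallow paging within a slice). A rough sketch of the slicing and the per-document fix; all field names, the id scheme, and the callbacks here are made-up placeholders, not Per's actual schema:

```python
# Hypothetical sketch of a 4.4-era re-index without cursorMark.
# "old_long_f", "new_int_f", the "<tenant>!<long>" id scheme, and the
# needs_offset/new_int_value callbacks are illustrative placeholders.

def make_range_filters(lo, hi, slices):
    """Split [lo, hi) into disjoint fq range clauses on the long field,
    so each export query touches a bounded slice instead of *:*."""
    step = max(1, (hi - lo) // slices)
    filters = []
    start = lo
    while start < hi:
        end = min(start + step, hi)
        filters.append("old_long_f:[%d TO %d}" % (start, end))
        start = end
    return filters

def fix_document(doc, offset, needs_offset, new_int_value):
    """Return the corrected twin of one exported document."""
    fixed = dict(doc)
    if needs_offset(doc):
        fixed["old_long_f"] = doc["old_long_f"] + offset
    fixed["new_int_f"] = new_int_value(doc)  # differs per document
    # The long value is embedded in the id (it drives routing), so the
    # id must be recomputed too -- placeholder "<tenant>!<long>" scheme.
    fixed["id"] = "%s!%d" % (doc["tenant"], fixed["old_long_f"])
    return fixed
```

Because the id is recomputed before indexing into the twin collection, the documents end up routed to the correct shards, which is what the shard-level 1-1 copy cannot guarantee.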
Re: Are stored fields compressed by default?
Yes, they have been since 4.1. And there's no handy option for turning this off at this point. Best, Erick On Wed, Jul 23, 2014 at 2:31 AM, Gili Nachum gilinac...@gmail.com wrote: Hi! I'm planning to use atomic updates https://wiki.apache.org/solr/Atomic_Updates, which means having all fields stored. Some docs might have text fields of up to 200K; I will feel better knowing that Solr automatically compresses stored fields (I know the Lucene 4.x default codec does). *Are stored fields compressed by default? Or is there a way to configure it? (Solr 4.7).* Thanks!
Re: solr 3.6 to 4.7 upgrade has changed the query string
Try adding debug=all to both queries. But these are very different queries. My guess is that something _else_ changed, probably in solrconfig.xml, that's the cause; most probably your default field in your 3.6 case was the ID field. If that's the case you should be able to change it in the 4.7 solrconfig.xml file for the request handlers involved. One change is that you used to be able to define this in schema.xml, but that's been deprecated. The first query parses as ID:AFLGWBDAB OR default_search_field:AFLGWBGIB OR default_search_field:FLGWGNDG Best, Erick On Wed, Jul 23, 2014 at 2:51 AM, shashi.rsb shashi@hotmail.com wrote: Hi, Our backend application queries Solr to retrieve certain records. We were initially on version 3.6 and have now upgraded to Solr 4.7. Something has changed in terms of the query string, which now needs parentheses. Both of the queries below are against 4.7. This returns 1 record: http://solr-dev.ss.com/solr/hotel/select/?q=ID:AFLGWBDAB OR AFLGWBGIB OR FLGWGNDG This returns 14 records, as expected: http://solr-dev.ss.com/solr/hotel/select/?q=ID:(AFLGWBDAB OR AFLGWBGIB OR FLGWGNDG) If you notice, I had to add parentheses to get the expected records. Also, I cannot change the application code now, as the release process for the application will take over a month. Is there a workaround within solrconfig so we don't need parentheses in 4.7? Thanks Shashi -- View this message in context: http://lucene.472066.n3.nabble.com/solr-3-6-to-4-7-upgrade-has-changed-the-query-string-tp4148719.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Passivate core in Solr Cloud
Do note that the lots-of-cores stuff does NOT play nicely in distributed mode (yet). Best, Erick On Wed, Jul 23, 2014 at 6:00 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Solr has some support for large numbers of cores, including transient cores: http://wiki.apache.org/solr/LotsOfCores Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On Wed, Jul 23, 2014 at 7:55 PM, Aurélien MAZOYER aurelien.mazo...@francelabs.com wrote: Hello, We want to set up a Solr Cloud cluster in order to handle a high volume of documents with a multi-tenant architecture. The problem is that application-level isolation for a tenant (using a shared index with a customer field) is not enough to fit our requirements. As a result, we need 1 collection per customer. There are more than a thousand customers, and it seems unreasonable to create thousands of collections in Solr Cloud... But as we know that there is less than 1 query/customer/day, we are currently looking for a way to passivate collections when they are not in use. Can this be a good idea? If yes, are there best practices to implement this? What side effects can we expect? Do we need to put some application-level logic on top of the Solr Cloud cluster to choose which collection to unload (and maybe there is something smarter (and quicker?) than simply loading/unloading the core when it is not in use)? Thank you for your answer(s), Aurelien
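For the non-cloud case Erick alludes to, the LotsOfCores machinery is driven from the old-style solr.xml; a sketch along these lines (core names illustrative) keeps at most 50 tenant cores loaded and lazily loads the rest on first request, unloading them LRU-style:

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores" transientCacheSize="50">
    <!-- one entry per tenant -->
    <core name="customer_a" instanceDir="customer_a"
          transient="true" loadOnStartup="false"/>
  </cores>
</solr>
```

As the thread notes, this does not carry over to SolrCloud collections, so it only helps if per-tenant standalone cores are acceptable.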
Re: solr 3.6 to 4.7 upgrade has changed the query string
Did you blindly switch to the new solrconfig.xml? If so, the default query request handler sets the df parameter to text, which would give you different results compared to having the defaultSearchField set to some other field, like your ID field. Read the comments in the new schema.xml about defaultSearchField being deprecated in favor of the df parameter. -- Jack Krupansky -Original Message- From: shashi.rsb Sent: Wednesday, July 23, 2014 5:51 AM To: solr-user@lucene.apache.org Subject: solr 3.6 to 4.7 upgrade has changed the query string [original message quoted in full; snipped]
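If changing the application is off the table, one low-risk workaround along the lines Jack describes is to restore the old default field via the df parameter in the request handler's defaults in solrconfig.xml. A sketch, assuming the application hits /select and the 3.6 defaultSearchField was ID:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">ID</str>
  </lst>
</requestHandler>
```

With df set to ID, the unparenthesized query's bare terms fall back to the ID field again, matching the 3.6 behavior.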
Re: Question about ReRankQuery
I'm having a little trouble understanding the use-case here. Why use re-ranking? Isn't this just combining the original query with the second query with an AND and using the original sort? At the end, you have your original list in its original order, with (potentially) some documents removed that don't satisfy the secondary query. Or I'm missing the boat entirely. Best, Erick On Wed, Jul 23, 2014 at 6:31 AM, Peter Keegan peterlkee...@gmail.com wrote: I'm looking at how 'ReRankQuery' works. If the main query has a Sort criteria, it is only used to sort the first-pass results. The QueryRescorer used in the second pass only reorders the ScoreDocs based on score and docid, but doesn't use the original Sort fields. If the Sort criteria is 'score desc, myfield asc', I would expect 'myfield' to break score ties from the second pass after rescoring. Is this a bug or the intended behavior? Thanks, Peter
Re: how to achieve static boost in solr
Take a look at the Query Elevation Component, perhaps? Best, Erick On Wed, Jul 23, 2014 at 8:05 AM, rahulmodi rahul.m...@ge.com wrote: [original message quoted in full; snipped] -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-achieve-static-boost-in-solr-tp4148788.html Sent from the Solr - User mailing list archive at Nabble.com.
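For the record, the Query Elevation Component reads its rules from a config file (elevate.xml), so the keyword-to-URL pairs could be generated from the DB at deploy time rather than hard-coded in Java. A sketch, assuming the URL is the collection's uniqueKey field:

```xml
<elevate>
  <query text="car">
    <doc id="http://car.com"/>
  </query>
  <query text="building">
    <doc id="http://building.com"/>
  </query>
  <query text="java">
    <doc id="http://javajee.com"/>
  </query>
</elevate>
```

The component itself is registered in solrconfig.xml and pointed at this file; elevated documents are pinned to the top of the results for the matching query text.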
Re: Question about ReRankQuery
See http://heliosearch.org/solrs-new-re-ranking-feature/ On Wed, Jul 23, 2014 at 11:27 AM, Erick Erickson erickerick...@gmail.com wrote: I'm having a little trouble understanding the use-case here. Why use re-ranking? Isn't this just combining the original query with the second query with an AND and using the original sort? At the end, you have your original list in it's original order, with (potentially) some documents removed that don't satisfy the secondary query. Or I'm missing the boat entirely. Best, Erick On Wed, Jul 23, 2014 at 6:31 AM, Peter Keegan peterlkee...@gmail.com wrote: I'm looking at how 'ReRankQuery' works. If the main query has a Sort criteria, it is only used to sort the first pass results. The QueryScorer used in the second pass only reorders the ScoreDocs based on score and docid, but doesn't use the original Sort fields. If the Sort criteria is 'score desc, myfield asc', I would expect 'myfield' to break score ties from the second pass after rescoring. Is this a bug or the intended behavior? Thanks, Peter
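For quick reference, the request shape described in that post looks like this (query terms are illustrative): the rq parameter re-ranks the top reRankDocs of the main query's results by the re-rank query's score, weighted by reRankWeight:

```
q=main query terms
&rq={!rerank reRankQuery=$rqq reRankDocs=1000 reRankWeight=3}
&rqq=(secondary query terms)
```

This is why the behavior Peter observes occurs: the main sort orders only the first pass, and the second pass reorders the top N purely by combined score.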
Re: [ANN] SIREn, a Lucene/Solr plugin for rich JSON data search
Querying nested data is very easy in MarkLogic; it was built for that. I used to work there. The founder is a former search engine guy from Infoseek and Ultraseek, so it has a lot of familiar behavior, like merging segments automatically. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Jul 23, 2014, at 7:25 AM, Jay Vyas jayunit100.apa...@gmail.com wrote: Querying nested data is very difficult in any modern db that I have seen. If it works as you suggest, it would be cool if the feature were eventually maintained inside Solr. On Jul 23, 2014, at 7:13 AM, Renaud Delbru renaud@siren.solutions wrote: [SIREn announcement quoted in full; snipped]
Re: NoClassDefFoundError while indexing in Solr
BTW, Ameya, jhighlight-1.0.jar is in the Solr binary distribution, in contrib/extraction/lib. There are a bunch of different libraries that Tika uses for content extraction, so this seems like a good time to make sure that Tika has all the jars available that it might need to process the files you're indexing. Everything relevant should be included in contrib/extraction/lib. Steve On Wed, Jul 23, 2014 at 01:53:45PM +0000, Pablo Queixalos wrote: [earlier replies and stack trace quoted in full; snipped]
Re: How do I get faceting to work with Solr JOINs
Thank you, Umesh! That's a neat approach. Reading through your post, we decided to tweak our indexing strategy a bit - basically an inversion. We moved all our facetable (and frequently updated) fields to the main doc, and the text and other static content fields to the sub doc (co-related via a parent id field, as described in my original post). This allows us to satisfy the query criteria I described. On Thu, Jul 17, 2014 at 11:58 PM, Umesh Prasad umesh.i...@gmail.com wrote: Hi Vinay, You can customize the FacetComponent. Basically FacetComponent uses SimpleFacets to compute the facet counts; it passes the matched docset present in the ResponseBuilder to SimpleFacets's constructor. 1. Build a mapping between the parent space and the auxiliary document space (say, in an int array) and cache it in your own custom cache in SolrIndexSearcher. You will need to rebuild this mapping on every commit, and will have to define a CacheRegenerator for that. 2. Map the matched docset (which is in parent space) to the auxiliary document space. The catch is that facets from non-matching auxiliary docs would also be counted. 3. You can then pass this mapped auxiliary docset to SimpleFacets for faceting. I have done something similar for our needs. Basically, we have a parent document with text attributes which changes very little, and we have child documents with inventory attributes which change extremely fast. The search results require the child documents, but faceting has to be done on text attributes which belong to the parents. So we do this mapping by customizing the FacetComponent. On 18 July 2014 04:11, Vinay B, vybe3...@gmail.com wrote: Some background info: In our application, we have a requirement to update a large number of records often. I investigated Solr child documents, but that requires updating both the child and the parent document.
Therefore, I'm investigating adding the frequently updated information in an auxiliary document with a custom-defined parent-id field that can be used to join with the static parent document - basically rolling my own child-document functionality. This approach has satisfied all my requirements except one: how can I facet on a field present in the auxiliary document? First, here's a gist dump of my test core index (4 docs + 4 aux docs): https://gist.github.com/anonymous/2774b54e667778c71492 Next, here's a simple facet query only on the aux docs. While this works, it only returns auxiliary documents: https://gist.github.com/anonymous/a58b87576b895e467c68 Finally, I tweak the query using a Solr join (https://wiki.apache.org/solr/Join) to return the main documents (which it does), but the faceting returns no results. This is what I'm hoping someone on this list can answer. Here is the gist of that query: https://gist.github.com/anonymous/f3a287ab726f35b142cf Any answers, suggestions? Thanks -- --- Thanks & Regards Umesh Prasad
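With the inverted layout described above, a join-plus-facet request can work because the join now returns the main docs, which carry the facet fields themselves. A hypothetical request shape (all field names made up), joining from the static sub docs back to the main docs:

```
q={!join from=parent_id_s to=id}content_t:foo&facet=true&facet.field=status_s
```

Faceting in Solr counts values on the documents the main query returns, which is why faceting on aux-doc fields yields nothing after a join that returns main docs.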
Re: Question about ReRankQuery
The ReRankingQParserPlugin uses the Lucene QueryRescorer, which only uses the score from the re-rank query when re-ranking the top N documents. The ReRankingQParserPlugin is built as a RankQuery plugin so you can swap in your own implementation. Patches are also welcome for the existing implementation. Joel Bernstein Search Engineer at Heliosearch On Wed, Jul 23, 2014 at 11:37 AM, Peter Keegan peterlkee...@gmail.com wrote: See http://heliosearch.org/solrs-new-re-ranking-feature/ On Wed, Jul 23, 2014 at 11:27 AM, Erick Erickson erickerick...@gmail.com wrote: I'm having a little trouble understanding the use-case here. Why use re-ranking? Isn't this just combining the original query with the second query with an AND and using the original sort? At the end, you have your original list in its original order, with (potentially) some documents removed that don't satisfy the secondary query. Or I'm missing the boat entirely. Best, Erick On Wed, Jul 23, 2014 at 6:31 AM, Peter Keegan peterlkee...@gmail.com wrote: I'm looking at how 'ReRankQuery' works. If the main query has a Sort criteria, it is only used to sort the first-pass results. The QueryScorer used in the second pass only reorders the ScoreDocs based on score and docid, but doesn't use the original Sort fields. If the Sort criteria is 'score desc, myfield asc', I would expect 'myfield' to break score ties from the second pass after rescoring. Is this a bug or the intended behavior? Thanks, Peter
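The tie-breaking question in this thread can be seen with a toy example. This is an illustrative Python sketch of the behavior described, not Solr code; the hit tuples are made-up data:

```python
# (docid, second-pass score, myfield) -- made-up top-N hits
hits = [(1, 2.0, "b"), (3, 2.0, "a"), (2, 5.0, "c")]

# What the second pass effectively does: score desc, docid asc,
# so the 2.0 tie is broken by docid (doc 1 before doc 3).
rescorer_order = sorted(hits, key=lambda h: (-h[1], h[0]))

# What 'score desc, myfield asc' would imply: the tie is broken
# by myfield instead ("a" before "b", i.e. doc 3 before doc 1).
field_sort_order = sorted(hits, key=lambda h: (-h[1], h[2]))
```

With these data the two orderings differ only in how the score tie is resolved, which is exactly the discrepancy Peter is asking about.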
Re: Question about ReRankQuery
Blog on the RankQuery API http://heliosearch.org/solrs-new-rankquery-feature/ Joel Bernstein Search Engineer at Heliosearch On Wed, Jul 23, 2014 at 3:27 PM, Joel Bernstein joels...@gmail.com wrote: The ReRankingQParserPlugin uses the Lucene QueryRescorer, which only uses the score from the re-rank query when re-ranking the top N documents. The ReRankingQParserPlugin is built as a RankQuery plugin so you can swap in your own implementation. Patches are also welcome for the existing implementation.
Any Solr consultants available??
I occasionally get pinged by recruiters looking for Solr application developers... here’s the latest. If you are interested, either contact Jessica directly or reply to me and I’ll forward your reply. Even if you don’t strictly meet all the requirements... they are having trouble finding... anyone. All the great Solr guys I know are quite busy. Thanks. -- Jack Krupansky From: Jessica Feigin Sent: Wednesday, July 23, 2014 3:36 PM To: 'Jack Krupansky' Subject: Thank you! Hi Jack, Thanks for your assistance, below is the Solr Consultant job description: Our client, a hospitality Fortune 500 company, is looking to update their platform to make accessing information easier for the franchisees. This is the first phase of the project, which will take a few years. They want a hands-on Solr consultant who has ideally worked in the search space. As you can imagine, the company culture is great, everyone is really friendly, and there is also an option to become permanent. They are looking for: - 10+ years’ experience with Solr (Apache Lucene), HTML, XML, Java, Tomcat, JBoss, MySQL - 5+ years’ experience implementing Solr builds of indexes, shards, and refined searches across semi-structured data sets, to include architectural scaling - Experience in developing a re-usable framework to support web site search; implement rich web site search, including the incorporation of metadata - Experienced in development using Java, Oracle, RedHat, Perl, shell, and clustering - A strong understanding of data analytics, algorithms, and large data structures - Experienced in architectural design and resource planning for scaling Solr/Lucene capabilities - Bachelor's degree in Computer Science or related discipline. Jessica Feigin Technical Recruiter Technology Resource Management 30 Vreeland Rd., Florham Park, NJ 07932 Phone 973-377-0040 x 415, Fax 973-377-7064 Email: jess...@trmconsulting.com Web site: www.trmconsulting.com LinkedIn Profile: www.linkedin.com/in/jessicafeigin
Re: Any Solr consultants available??
Well, it's kind of hard to find a person if the requirement is 10 years' experience with Solr given that Solr was created in 2004. On Jul 23, 2014, at 12:45 PM, Jack Krupansky j...@basetechnology.com wrote: I occasionally get pinged by recruiters looking for Solr application developers... here’s the latest. If you are interested, either contact Jessica directly or reply to me and I’ll forward your reply. Even if you don’t strictly meet all the requirements... they are having trouble finding... anyone. All the great Solr guys I know are quite busy. Thanks. -- Jack Krupansky
Re: Any Solr consultants available??
Yeah, I saw that, which is why I suggested not being too picky about specific requirements. If you have at least two or three years of solid Solr experience, that would make you at least worth looking at. -- Jack Krupansky From: Tri Cao Sent: Wednesday, July 23, 2014 3:57 PM To: solr-user@lucene.apache.org Cc: solr-user@lucene.apache.org Subject: Re: Any Solr consultants available?? Well, it's kind of hard to find a person if the requirement is 10 years' experience with Solr given that Solr was created in 2004.
Re: Any Solr consultants available??
Perhaps the requirement means a total of 10 years of experience spread across Solr, HTML, XML, Java, Tomcat, JBoss, and MySQL. This doesn't seem likely, but it is satisfiable, so if we proceed on the assumption that a job posting doesn't contain unsatisfiable requirements then it's more reasonable than a naive interpretation. There exists the possibility of a satisfiable interpretation which is more intuitively appealing, and IMO this warrants further investigation. On Jul 23, 2014, at 3:57 PM, Tri Cao tm...@me.com wrote: Well, it's kind of hard to find a person if the requirement is 10 years' experience with Solr given that Solr was created in 2004. On Jul 23, 2014, at 12:45 PM, Jack Krupansky j...@basetechnology.com wrote: I occasionally get pinged by recruiters looking for Solr application developers... here’s the latest. If you are interested, either contact Jessica directly or reply to me and I’ll forward your reply. Even if you don’t strictly meet all the requirements... they are having trouble finding... anyone. All the great Solr guys I know are quite busy.
Performance of indexing using Solr
Hi, I am in a bit of trouble with indexing documents using Solr. After every 15-20 documents, Solr writes the log below: INFO - 2014-07-23 15:38:50.715; org.apache.solr.core.SolrDeletionPolicy; newest commit generation = 994 INFO - 2014-07-23 15:38:50.718; org.apache.solr.search.SolrIndexSearcher; Opening Searcher@41ffd00c[collection1] realtime INFO - 2014-07-23 15:38:50.719; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush And then it stops for a minute or so. This is affecting throughput very badly. Is anything wrong here? Thanks, Ameya
Re: integrating Accumulo with solr
We store data in both Solr and Accumulo -- do you have more details about what kind of data and indexing you want? Is there a reason you're thinking of using both databases in particular? On Wed, Jul 23, 2014 at 5:17 AM, Ali Nazemian alinazem...@gmail.com wrote: Dear All, Hi, I was wondering whether anybody out there has tried to integrate Solr with Accumulo? I was thinking about using Accumulo on top of HDFS and using Solr to index the data inside Accumulo. Do you have any idea how I can do such an integration? Best regards. -- A.Nazemian -- I know what it is to be in need, and I know what it is to have plenty. I have learned the secret of being content in any and every situation, whether well fed or hungry, whether living in plenty or in want. I can do all this through him who gives me strength.*-Philippians 4:12-13*
Re: Question about ReRankQuery
The ReRankingQParserPlugin uses the Lucene QueryRescorer, which only uses the score from the re-rank query when re-ranking the top N documents. Understood, but if the re-rank scores produce new ties, wouldn't you want to re-sort them with the FieldSortedHitQueue? Anyway, I was looking to reimplement the ScaleScoreQParser PostFilter plugin with RankQuery, and would need to implement the behavior of the DelegateCollector there for handling multiple sort fields. Peter On Wednesday, July 23, 2014, Joel Bernstein joels...@gmail.com wrote: The ReRankingQParserPlugin uses the Lucene QueryRescorer, which only uses the score from the re-rank query when re-ranking the top N documents. The ReRankingQParserPlugin is built as a RankQuery plugin so you can swap in your own implementation. Patches are also welcome for the existing implementation. Joel Bernstein Search Engineer at Heliosearch
Issue with solr admin collection API 4.8.1
Solr 4.8.1 Zookeeper 3.4.5 Centos 6.5 We are running into an issue where one of our environments is unable to successfully execute commands via the collection API. We found that we were unable to add new collections and after doing some digging found that even /solr/admin/collections?action=LIST sits for 180 seconds and then times out. Looking at Zookeeper in overseer/collection-queue-work you can see that tasks get queued up but never executed. Restarting Zookeeper and the solr instances seems to have no effect. Has anyone experienced anything similar or know how to recover from this state? Thank you, Jonathan Hutchins
Re: How to migrate content of a collection to a new collection
: billions of documents (not enough memory). Please note that we are on 4.4, : which does not contain the new CURSOR-feature. Please also note that speed is : an important factor for us. For situations where you know you will be processing every doc and order doesn't matter, you can use a poor man's cursor by filtering on successive ranges of your uniqueKey field, as described in the Is There A Workaround? section of this blog post... http://searchhub.org/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/ * sort on uniqueKey * leave start=0 on every request * add an fq to each request based on the last uniqueKey value from the previous request. -Hoss http://www.lucidworks.com/
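Hoss's three bullets translate into a simple loop. The sketch below mocks the Solr request with an in-memory list sorted by the uniqueKey field `id`; in practice each page would be an HTTP query along the lines of `q=*:*&sort=id asc&start=0&rows=1000&fq=id:{LAST TO *]`, and the field name and page size here are illustrative:

```python
def fetch_page(index, after_id, rows):
    # Mock of one Solr request: sort=id asc, start=0,
    # fq=id:{after_id TO *] (exclusive lower bound).
    docs = [d for d in index if after_id is None or d["id"] > after_id]
    return docs[:rows]

def export_all(index, rows=1000):
    # Poor man's cursor: walk the collection in uniqueKey order,
    # filtering each request past the last key already seen.
    seen, after_id = [], None
    while True:
        page = fetch_page(index, after_id, rows)
        if not page:
            break
        seen.extend(page)
        after_id = page[-1]["id"]
    return seen
```

Because every request uses start=0 and a narrowing fq, memory stays bounded at one page regardless of collection size, which is the point for a billions-of-documents export on 4.4.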
Re: SOLR 4.4 - Slave always replicates full index
Thanks Shawn, that makes sense.
commons-configuration NoClassDefFoundError: Predicate
Hi I've tried all permutations with no results, so I thought I'd write to the group for help. I am running commons-configuration (http://commons.apache.org/proper/commons-configuration/) just fine via Maven and Ant, but when I try to run the class calling PropertiesConfiguration via a Solr search component I get the following error org.eclipse.jetty.servlet.ServletHandler – Error for /solr/ArticlesRaw/ingest java.lang.NoClassDefFoundError: org/apache/commons/collections/Predicate at com.xyz.logic(Ingest.java:106) at com.xyz.logic.process(Runngest.java:76) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:217) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:533) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.ClassNotFoundException: org.apache.commons.collections.Predicate at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at org.eclipse.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:430) at org.eclipse.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:383) Following suggestions here http://stackoverflow.com/questions/7651799/proper-usage-of-apache-commons-configuration/7651867#7651867 I am including the appropriate jars in solrconfig.xml: <lib dir="${mvnRepository}/commons-lang/commons-lang/2.6/" regex=".*\.jar" /> <lib dir="${mvnRepository}/commons-collections/commons-collections/3.2.1/" regex=".*\.jar" /> <lib dir="${mvnRepository}/commons-logging/commons-logging/1.1.1/" regex=".*\.jar" /> <lib dir="${mvnRepository}/commons-configuration/commons-configuration/1.10/" regex=".*\.jar" /> (the class org.apache.commons.collections.Predicate is in the commons-collections 3.2.1 jar). I am running Solr 4.7.1. Any help would be much appreciated. Peyman
Re: Any Solr consultants available??
When I see job postings like this, I have to assume they were written by people who really don’t understand the problem and have never met people with the various skills they are asking for. They are not going to find one person who does all this. This is an opening for a zebra unicorn that walks on water. At best, they’ll get a one-horned goat with painted stripes on a life raft. They need to talk to some people, make multiple realistic openings, and expect to grow some of their own expertise. I got an email like this from Goldman Sachs this morning. “... a Senior Application Architect/Developer and DevOps Engineer for a major company initiative. In addition to an effort to build a new cloud infrastructure from the ground up, they are beginning a number of company projects in the areas of cloud-based open source search, Machine Learning/AI, Big Data, Predictive Analytics Low-Latency Trading Algorithm Development.” Good luck, fellas. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Jul 23, 2014, at 1:01 PM, Jack Krupansky j...@basetechnology.com wrote: Yeah, I saw that, which is why I suggested not being too picky about specific requirements. If you have at least two or three years of solid Solr experience, that would make you at least worth looking at. -- Jack Krupansky
Re: Any Solr consultants available??
On Thu, Jul 24, 2014 at 2:44 AM, Jack Krupansky j...@basetechnology.com wrote: All the great Solr guys I know are quite busy. Sounds like an opportunity for somebody to put together a training hacker camp, similar to https://hackerbeach.org/ . Cross-train consultants in Solr, immediately increase their value. Do it somewhere on the beach or in the mountains, etc. If somebody organizes it, I would probably even be interested in teaching the first (newbie) part. And the graduation project would be a solr-consultants.com website to make it easier to find those same consultants later. :-) Regards, Alex. P.s. The last issue of my newsletter had Solr big ideas. The one above was not in it, but it is - I believe - also viable. Contact me if it catches your fancy for more detailed brainstorming and notes sharing. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
Re: Question about ReRankQuery
I like the FieldSortedHitQueue idea. If you want to work up a patch for that, it would be great. Joel Bernstein Search Engineer at Heliosearch On Wed, Jul 23, 2014 at 5:17 PM, Peter Keegan peterlkee...@gmail.com wrote: The ReRankingQParserPlugin uses the Lucene QueryRescorer, which only uses the score from the re-rank query when re-ranking the top N documents. Understood, but if the re-rank scores produce new ties, wouldn't you want to re-sort them with the FieldSortedHitQueue? Anyway, I was looking to reimplement the ScaleScoreQParser PostFilter plugin with RankQuery, and would need to implement the behavior of the DelegateCollector there for handling multiple sort fields. Peter
Re: Performance of indexing using Solr
It looks like you're committing too frequently. If you're explicitly committing from the application, you may want to switch to using autoCommits. If you're not committing from the application, your autoCommit settings are probably too low. Joel Bernstein Search Engineer at Heliosearch On Wed, Jul 23, 2014 at 4:41 PM, Ameya Aware ameya.aw...@gmail.com wrote: Hi, I am kind of in trouble regarding indexing documents using Solr. After every 15-20 documents, Solr gives below log: INFO - 2014-07-23 15:38:50.715; org.apache.solr.core.SolrDeletionPolicy; newest commit generation = 994 INFO - 2014-07-23 15:38:50.718; org.apache.solr.search.SolrIndexSearcher; Opening Searcher@41ffd00c[collection1] realtime INFO - 2014-07-23 15:38:50.719; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush And stops for a minute or so. This is affecting throughput very badly. Anything wrong here? Thanks, Ameya
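For reference, the autoCommit knobs Joel mentions live in solrconfig.xml. A hedged example follows; the interval values are illustrative, not recommendations, and should be tuned for your ingest rate:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flush to disk at most once a minute,
       without opening a new searcher each time. -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit: make newly indexed docs visible to searches
       every 15 seconds without the cost of a hard commit. -->
  <autoSoftCommit>
    <maxTime>15000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With settings like these, the application should stop issuing explicit commits per batch and let Solr batch the commit work instead.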