Negative Query Behaviour in Solr 3.2
Hi All, I am using Solr 3.2 and am confused about how a particular query is executed: q=name:memory OR -name:encoded. Fired separately, q=name:memory gives 3 results and q=-name:encoded gives 25 results, and the result sets are disjoint. Since I am doing an OR query it should return 28 results, but it is only returning 3 results, the same as the query name:memory. Can anyone explain? -Karan
Re: How might one search for dupe IDs other than faceting on the ID field?
On Wed, Jul 31, 2013 at 4:56 AM, Bill Bell billnb...@gmail.com wrote: On Jul 30, 2013, at 12:34 PM, Dotan Cohen dotanco...@gmail.com wrote: On Tue, Jul 30, 2013 at 9:21 PM, Aloke Ghoshal alghos...@gmail.com wrote: Does adding facet.mincount=2 help? In fact, when adding facet.mincount=20 (I know that some dupes are in the hundreds) I got the OutOfMemoryError in seconds instead of minutes. Dotan Cohen This seems like a fairly large issue. Can you create a Jira issue? Bill Bell I'll file an issue, but on what? What information should I include? How is this different than what you would expect? Thanks. -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Solr Cloud Setup
What was the problem? On Tue, Jul 30, 2013 at 10:33 PM, AdityaR aditya.ravinuth...@gmail.com wrote: I was able to get the setup to work.
EmbeddedSolrServer Solr 4.4.0 bug?
Hello guys, Since I upgraded from 4.1.0 to 4.4.0 I've noticed that the way an EmbeddedSolrServer is constructed has changed a little:

Solr 4.1.0 style:

    CoreContainer coreContainer = new CoreContainer(solrHome, new File(solrHome + "/solr.xml"));
    EmbeddedSolrServer localSolrServer = new EmbeddedSolrServer(coreContainer, "core");

Solr 4.4.0 new style:

    CoreContainer coreContainer = new CoreContainer(solrHome);
    EmbeddedSolrServer localSolrServer = new EmbeddedSolrServer(coreContainer, "core");

However, it's not working. I've got the following solr.xml configuration file:

    <solr>
      <cores adminPath="/admin/cores" defaultCoreName="core" host="${host:}" hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}" zkClientTimeout="${zkClientTimeout:15000}">
        <core name="core" instanceDir="core" />
      </cores>
    </solr>

And the resources appear to be loaded correctly:

    2013-07-31 09:46:37,583 47889 [main] INFO org.apache.solr.core.ConfigSolr - Loading container configuration from /opt/solr/solr.xml

But when indexing into the core with coreName 'core', it throws an exception:

    2013-07-31 09:50:49,409 5189 [main] ERROR com.buguroo.solr.index.WriteIndex - No such core: core

Either I am sleepy, which is possible, or there is some kind of bug here. Best regards, -- Luis Cappa
SimplePostTool: FATAL: Solr returned an error #400 Bad Request
Hi All, Currently I am in the middle of a project which indexes some data to multiple Solr instances. The configuration is: on the same machine I have made multiple instances of Solr:

http://localhost:8080/solr1
http://localhost:8080/solr2
http://localhost:8080/solr3
http://localhost:8080/solr4
http://localhost:8080/solr5
http://localhost:8080/solr6

Now I am posting the data to Solr through SimplePostTool by passing an XML file to the spt.postFile(file) method and committing thereafter. This whole process is multithreaded and works fine up to 1 million data records, but thereafter it suddenly stops, saying:

SimplePostTool: FATAL: Solr returned an error #400 Bad Request

In the Tomcat Catalina log I found:

WARNING: Failed to register info bean: searcher
javax.management.InstanceAlreadyExistsException: solr/:type=searcher,id=org.apache.solr.search.SolrIndexSearcher
 at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:437)
 at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerWithRepository(DefaultMBeanServerInterceptor.java:1898)
 at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:966)
 at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:900)
 at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:324)
 at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:513)
 at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:141)
 at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:47)
 at org.apache.solr.search.SolrIndexSearcher.register(SolrIndexSearcher.java:220)
 at org.apache.solr.core.SolrCore.registerSearcher(SolrCore.java:1349)
 at org.apache.solr.core.SolrCore.access$000(SolrCore.java:84)
 at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1247)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)

Jul 31, 2013 12:46:00 PM org.apache.solr.core.SolrCore registerSearcher
INFO: [] Registered new searcher Searcher@5fa1891b main
Jul 31, 2013 12:46:00 PM org.apache.solr.search.SolrIndexSearcher close

Has anybody traced such an issue? This is really urgent and important. Waiting for your response. Thanks and Regards, Vineet
Re: EmbeddedSolrServer Solr 4.4.0 bug?
Hi Luis, You need to call coreContainer.load() after construction for it to load the cores. Previously the CoreContainer(solrHome, configFile) constructor also called load(), but this was the only constructor to do that. I probably need to put something in CHANGES.txt to point this out... Alan Woodward www.flax.co.uk On 31 Jul 2013, at 08:53, Luis Cappa Banda wrote: [...]
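For reference, a minimal sketch of the bootstrap Alan describes, assuming the core named "core" from the solr.xml above (imports added for completeness):

    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.core.CoreContainer;

    public class EmbeddedBootstrap {
        public static EmbeddedSolrServer start(String solrHome) {
            CoreContainer coreContainer = new CoreContainer(solrHome);
            coreContainer.load(); // required in 4.4.0: the one-arg constructor no longer loads cores
            return new EmbeddedSolrServer(coreContainer, "core");
        }
    }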
result grouping and paging, solr 4.2.1
Hello, I'm trying to page results with grouping / field collapsing. My query is: ?q=myKeywords&start=0&rows=100&group=true&group.field=myGroupField&group.format=simple&group.limit=1 The result contains 70 groups. Is there a way to get 100 records returned, meaning the first doc from each of the 70 groups plus the second docs from the first 30 groups? Thanks, Gunnar
Re: EmbeddedSolrServer Solr 4.4.0 bug?
Thank you very much, Alan. Now it's working! I agree with you: this kind of thing should be documented at least in CHANGES.txt, because when upgrading from one version to another everything should be compatible between versions; when it isn't, people should be notified. Regards, 2013/7/31 Alan Woodward a...@flax.co.uk [...] -- Luis Cappa
Solr PolyField
Hi, I'm trying to create a field with multiple fields inside, that is:

    origin: {
      "htmlUrl": "http://www.gazzetta.it/",
      "streamId": "feed/http://www.gazzetta.it/rss/Home.xml",
      "title": "Gazzetta.it"
    }

I'd like to get something like this. Is that possible? I'm using Solr 4.4.0. Thanks
Sharding with a SolrCloud
Hi list, I have a Solr server which uses sharding to do distributed search with another Solr server. The other Solr server is now migrating to a SolrCloud system. I've been trying recently to continue searching the SolrCloud as a shard for my Solr server, but this is failing with mysterious effects. I am getting a result with a number of hits when I perform a search, but the results are not displayed at all. This is the response header I am getting from Solr:

{
  "responseHeader": {
    "status": 0,
    "QTime": 305,
    "params": {
      "facet": "true",
      "indent": "yes",
      "facet.mincount": "1",
      "facet.limit": "30",
      "qf": "title_short^750 title_full_unstemmed^600",
      "json.nl": "arrarr",
      "wt": "json",
      "rows": "20",
      "shards": "ourindex.nowhere.de/solr/index",
      "bq": "format:Book^500",
      "fl": "*,score",
      "facet.sort": "count",
      "start": "0",
      "q": "xml",
      "shards.info": "true",
      "facet.prefix": "",
      "facet.field": ["publishDate"],
      "qt": "dismax"
    }
  },
  "shards.info": {
    "ourindex.nowhere.de/solr/index": {
      "numFound": 10076,
      "maxScore": 8.507474,
      "time": 263
    }
  },
  "response": { "numFound": 10056, "start": 0, "maxScore": 8.507474, "docs": [] }
}

As you can see, there are no docs in the result. This result is not 100% reproducible: sometimes I get no results displayed, other times it works (with the same query URL!). As you also can see in the result, the number of hits in the response is a little bit less than the number of hits sent from the shard. This makes me wonder whether it is possible at all to use a SolrCloud as a shard for another standalone Solr server? Any hint is appreciated! Best - Oliver -- Oliver Goldschmidt TU Hamburg-Harburg / Universitätsbibliothek / Digitale Dienste Denickestr. 22 21071 Hamburg - Harburg Tel. +49 (0)40 / 428 78 - 32 91 eMail o.goldschm...@tuhh.de -- GPG/PGP-Schlüssel: http://www.tub.tu-harburg.de/keys/Oliver_Marahrens_pub.asc
Solr show total row count in response of full import
Hey there, Is there a way to show the total row count (documents that will be inserted) when executing a full import through the DataImport request handler? Currently, after executing a full import and pointing to solrcore/dataimport, you can get the total rows processed:

    <str name="Total Documents Processed">6354</str>

It would be nice if you could also receive a total row count like:

    <str name="Total Documents">10100</str>

With this information we could add another piece of information like:

    <str name="Imported in Percent">62.91</str>

This would make it easier to generate a progress bar for the end user. Best regards, Sandro Zbinden
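DIH does not report the expected total on its own, but a rough progress figure can be derived client-side. A hedged SolrJ sketch, assuming the expected total is obtained separately (e.g. via SELECT COUNT(*) against the source table); the class and method names are illustrative only:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.common.params.ModifiableSolrParams;
    import org.apache.solr.common.util.NamedList;

    public class DihProgress {
        // expectedTotal must come from elsewhere (e.g. SELECT COUNT(*) on the source
        // table) -- DIH itself does not report it, which is exactly the feature request above.
        public static double percentDone(SolrServer solr, long expectedTotal) throws Exception {
            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set("command", "status");
            QueryRequest req = new QueryRequest(params);
            req.setPath("/dataimport");                        // query the DIH handler directly
            NamedList<Object> rsp = solr.request(req);
            NamedList<?> msgs = (NamedList<?>) rsp.get("statusMessages");
            Object processed = (msgs == null) ? null : msgs.get("Total Documents Processed");
            if (processed == null || expectedTotal <= 0) return 0.0;
            return 100.0 * Long.parseLong(processed.toString()) / expectedTotal;
        }
    }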
Re: Negative Query Behaviour in Solr 3.2
Can you try: q=+name:memory -name:encoded or q=name:memory AND -name:encoded On Wed, Jul 31, 2013 at 10:14 AM, karanjindal karan_jin...@students.iiit.ac.in wrote: [...] -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
TrieField and FieldCache confusion
Hello everyone, I have a question about the Solr TrieField and the Lucene FieldCache. From my understanding, Solr added the TrieField implementation to perform faster range queries. For each value it indexes multiple terms, the n-th term being a masked version of the value showing only its first (precisionStep * n) bits. When uninverting the field to populate a FieldCache, the last value with regard to lexicographical order will be retained, which from my understanding should be the term with the highest precision. Can I expect the Lucene FieldCache to return the correct values when working with a TrieField with a precisionStep higher than 0? If not, what did I get wrong? Regards, Paul Masurel e-mail: paul.masu...@gmail.com
Re: How might one search for dupe IDs other than faceting on the ID field?
fwiw, this code won't capture uncommitted duplicates. On Wed, Jul 31, 2013 at 9:41 AM, Dotan Cohen dotanco...@gmail.com wrote: On Tue, Jul 30, 2013 at 11:14 PM, Jack Krupansky j...@basetechnology.com wrote: The Solr SignatureUpdateProcessorFactory is designed to facilitate dedupe... any particular reason you did not use it? See: http://wiki.apache.org/solr/Deduplication and https://cwiki.apache.org/confluence/display/solr/De-Duplication Actually, the guy who made the changes (a coworker) did in fact write an alternative UpdateHandler. I've just noticed that there are a bunch of dupes right now, though.

    public class DiscoAPIUpdateHandler extends DirectUpdateHandler2 {

        public DiscoAPIUpdateHandler(SolrCore core) {
            super(core);
        }

        @Override
        public int addDoc(AddUpdateCommand cmd) throws IOException {
            // if overwrite is set to false we'll fall back to DirectUpdateHandler2;
            // this is done for debugging, to insert duplicates into solr
            if (!cmd.overwrite)
                return super.addDoc(cmd);

            // when using ref counted objects you have!! to decrement the ref count when you're done
            RefCounted<SolrIndexSearcher> indexSearcher = this.core.getNewestSearcher(false);

            // the idea is like this: we'll make an internal lucene query and check if that id already exists
            Term updateTerm = null;
            if (cmd.updateTerm != null) {
                updateTerm = cmd.updateTerm;
            } else {
                updateTerm = new Term("id", cmd.getIndexedId());
            }

            Query query = new TermQuery(updateTerm);
            TopDocs docs = indexSearcher.get().search(query, 2);
            if (docs.totalHits > 0) {
                // index searcher is no longer needed
                indexSearcher.decref();
                // don't add the new document
                return 0;
            }

            // index searcher is no longer needed
            indexSearcher.decref();
            // if I'm here then it's a new document
            return super.addDoc(cmd);
        }
    }

And I give a bunch of examples in my book. I anticipate the book with esteem! -- Dotan Cohen http://gibberish.co.il http://what-is-what.com -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Performance question on Spatial Search
On Wed, Jul 31, 2013 at 1:10 AM, Steven Bower sbo...@alcyon.net wrote: not sure what you mean by good hit ratio? I mean such queries are really expensive (even on a cache hit), so if the list of ids changes every time, it never hits the cache and hence executes these heavy queries every time. It's a well-known performance problem. Here are the stacks... they seem like hotspots and show index reading, which is reasonable. But I can't see what caused these reads; to get that I need the whole stack of the hot thread.

Name / Time (ms) / Own Time (ms)
org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(AtomicReaderContext, Bits) 300879 203478
org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.nextDoc() 45539 19
org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.refillDocs() 45519 40
org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.readVIntBlock(IndexInput, int[], int[], int, boolean) 24352 0
org.apache.lucene.store.DataInput.readVInt() 24352 24352
org.apache.lucene.codecs.lucene41.ForUtil.readBlock(IndexInput, byte[], int[]) 21126 14976
org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int) 6150 0
java.nio.DirectByteBuffer.get(byte[], int, int) 6150 0
java.nio.Bits.copyToArray(long, Object, long, long, long) 6150 6150
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.docs(Bits, DocsEnum, int) 35342 421
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.decodeMetaData() 34920 27939
org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.nextTerm(FieldInfo, BlockTermState) 6980 6980
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.next() 14129 1053
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadNextFloorBlock() 5948 261
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock() 5686 199
org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int) 3606 0
java.nio.DirectByteBuffer.get(byte[], int, int) 3606 0
java.nio.Bits.copyToArray(long, Object, long, long, long) 3606 3606
org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.readTermsBlock(IndexInput, FieldInfo, BlockTermState) 1879 80
org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int) 1798 0
java.nio.DirectByteBuffer.get(byte[], int, int) 1798 0
java.nio.Bits.copyToArray(long, Object, long, long, long) 1798 1798
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.next() 4010 3324
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.nextNonLeaf() 685 685
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock() 3117 144
org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int) 1861 0
java.nio.DirectByteBuffer.get(byte[], int, int) 1861 0
java.nio.Bits.copyToArray(long, Object, long, long, long) 1861 1861
org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.readTermsBlock(IndexInput, FieldInfo, BlockTermState) 1090 19
org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int) 1070 0
java.nio.DirectByteBuffer.get(byte[], int, int) 1070 0
java.nio.Bits.copyToArray(long, Object, long, long, long) 1070 1070
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.initIndexInput() 20 0
org.apache.lucene.store.ByteBufferIndexInput.clone() 20 0
org.apache.lucene.store.ByteBufferIndexInput.clone() 20 0
org.apache.lucene.store.ByteBufferIndexInput.buildSlice(long, long) 20 0
org.apache.lucene.util.WeakIdentityMap.put(Object, Object) 20 0
org.apache.lucene.util.WeakIdentityMap$IdentityWeakReference.<init>(Object, ReferenceQueue) 20 0
java.lang.System.identityHashCode(Object) 20 20
org.apache.lucene.index.FilteredTermsEnum.docs(Bits, DocsEnum, int) 1485 527
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.docs(Bits, DocsEnum, int) 957 0
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.decodeMetaData() 957 513
org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.nextTerm(FieldInfo, BlockTermState) 443 443
org.apache.lucene.index.FilteredTermsEnum.next() 874 324
org.apache.lucene.search.NumericRangeQuery$NumericRangeTermsEnum.accept(BytesRef) 368 0
org.apache.lucene.util.BytesRef$UTF8SortedAsUnicodeComparator.compare(Object, Object) 368 368
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.next() 160 0
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadNextFloorBlock() 160 0
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock() 160 0
org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int) 120 0
Re: Improper shutdown of Solr in Jetty 9
Hello Dmitry, it's Windows 7. I'm starting Jetty with java -jar start.jar On 31.07.2013 12:36, Dmitry Kan wrote: Artem, What OS are you using? So far jetty 9 with solr 4.3.1 works ok under ubuntu 12.04. On 30 Jul 2013 17:23, Alexandre Rafalovitch arafa...@gmail.com wrote: Of course, I meant Jetty (not Tomcat). So apologies for spam and confusion of my own. The rest of the statement stands. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Tue, Jul 30, 2013 at 10:20 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Thanks for letting us know. See if you can add it to the documentation somewhere. Solr is not using Tomcat 9, but I believe that was primarily because Tomcat 9 requires Java 7 and Solr 4.x is staying with Java 6 as the minimum requirement. Regards, Alex. On Tue, Jul 30, 2013 at 10:09 AM, Artem Karpenko a.karpe...@oxseed.com wrote: Uh, sorry for spamming, but if anyone is interested there is a way to properly shutdown Jetty when it's launched with the --exec flag. You can use JMX to invoke the method stop() on Jetty's Server MBean. This triggers a proper shutdown with all Solr's close() callbacks executed. I wonder why it's not noted at least in the Jetty documentation. Regards, Artem Karpenko. On 30.07.2013 16:58, Artem Karpenko wrote: After some investigation I found that the problem is not with Jetty's version but with usage of the --exec flag. Namely, when --exec is used (to specify JVM args) then shutdown is not graceful; it seems that the Java process is just killed. Not sure how to handle this... Regards, Artem Karpenko. On 29.07.2013 16:51, Artem Karpenko wrote: Hi, I can't make Solr shut down properly when using Jetty 9. Tested this with a simple plugin that only extends DirectUpdateHandler2, creates a file in the constructor and deletes it in close(). While it's working fine in the example installation (the one that can be downloaded from the Solr site) and in a simple custom installation with Jetty 8, it won't in Jetty 9. There is not much logging at shutdown at all, just Jetty's closing selector or smth., unlike with Jetty 8 where it prints various Graceful shutdown messages from Solr. The installation procedure I used for both Jettys is rather simple: just put solr.war into the webapps/ directory, the plugin JAR into {core}/lib/, and configure the update handler in solrconfig.xml. OS is Windows 7, Solr 4.4. I tried to stop Jetty with both Ctrl+C and java -jar start.jar [port/key params] --stop. For Jetty 8 it works fine even with Ctrl+C. Did anybody stumble on this issue? Best regards, Artem.
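A minimal sketch of the JMX-based stop Artem describes, assuming remote JMX is enabled on port 1099; the MBean object name below is a guess based on Jetty's default JMX naming and should be verified with jconsole:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class JettyJmxStop {
        public static void main(String[] args) throws Exception {
            // assumes Jetty was started with JMX remoting enabled on port 1099
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi");
            JMXConnector jmxc = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection conn = jmxc.getMBeanServerConnection();
                // object name for Jetty's Server MBean; verify with jconsole
                ObjectName server = new ObjectName("org.eclipse.jetty.server:type=server,id=0");
                conn.invoke(server, "stop", new Object[0], new String[0]); // graceful shutdown
            } finally {
                jmxc.close();
            }
        }
    }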
Re: Trying to determine the benefit of spellcheck-based suggester vs. using terms component?
The biggest thing is that the spellchecker has lots of knobs to tune, all the stuff in http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate TermsComponent, on the other hand, just gives you what's in the index with essentially no knobs to tune. So it depends on your goal: typeahead or spelling correction? In the first case I'd go for TermsComponent, and in the second, spellcheck. Best Erick On Tue, Jul 30, 2013 at 2:07 PM, Timothy Potter thelabd...@gmail.com wrote: Going over the comments in SOLR-1316, I seem to have lost the forest for the trees. What is the benefit of using the spellcheck-based suggester over something like the terms component to get suggestions as the user types? Maybe it is faster because it builds the in-memory data structure on commit? Seems like the terms component is pretty fast too. I'd appreciate any additional insights about this. There are so many solutions to auto-suggest for Solr, it's hard to decide what approach to take. Cheers, Tim
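For the type-ahead case, a minimal SolrJ sketch of a TermsComponent lookup; the field name "name" and the /terms handler path are assumptions that depend on your schema and solrconfig.xml:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.common.params.ModifiableSolrParams;
    import org.apache.solr.common.util.NamedList;

    public class TermsTypeahead {
        public static void suggest(SolrServer solr, String prefix) throws Exception {
            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set("terms", "true");
            params.set("terms.fl", "name");                  // assumed field, adjust to schema
            params.set("terms.prefix", prefix.toLowerCase());
            params.set("terms.limit", "10");
            QueryRequest req = new QueryRequest(params);
            req.setPath("/terms");                           // assumed handler path
            NamedList<Object> rsp = solr.request(req);
            // response shape: terms -> field -> (term, frequency) pairs
            NamedList<?> terms = (NamedList<?>) ((NamedList<?>) rsp.get("terms")).get("name");
            for (int i = 0; i < terms.size(); i++) {
                System.out.println(terms.getName(i) + " (" + terms.getVal(i) + ")");
            }
        }
    }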
Re: SimplePostTool: FATAL: Solr returned an error #400 Bad Request
Probably not the root of your problem, but bq: "and committing it there after". Does that mean you're calling commit after every document? This is usually poor practice; I'd set the autocommit intervals in solrconfig.xml and NOT call commit explicitly. Does the same document fail every time? What does it look like? You really haven't provided much information to go on. Best Erick On Wed, Jul 31, 2013 at 3:55 AM, Vineet Mishra clearmido...@gmail.com wrote: [...]
Re: result grouping and paging, solr 4.21
Not that I know of. Grouping pretty much treats all groups the same... Best Erick On Wed, Jul 31, 2013 at 4:14 AM, Gunnar glus...@akitogo.com wrote: [...]
Working with solr over two different db schemas
I've been working on it for quite some time. This is my config:

    <dataConfig>
      <dataSource type="JdbcDataSource" name="ds1" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://...:1433;databaseName=A" user="XX" password="XX" />
      <document>
        <entity name="PackageVersion" pk="PackageVersionId" query="
            /*PackageVersion.Query*/
            select PackageVersion.Id PackageVersionId, PackageVersion.VersionNumber,
                   CONVERT(char(19), PackageVersion.LastModificationTime, 126) + 'Z' LastModificationTime,
                   Package.Id PackageId, Package.Name PackageName,
                   PackageVersion.Comments PackageVersionComments, Package.CreatedBy CreatedBy
            from [dbo].[Package] Package
            inner join [dbo].[PackageVersion] PackageVersion on Package.Id = PackageVersion.PackageId
            where Package.RecordStatusId=0 and PackageVersion.RecordStatusId=0">
          <entity name="PackageTag" pk="ResourceId" processor="CachedSqlEntityProcessor" cacheKey="ResourceId" cacheLookup="PackageVersion.PackageId" query="
              /*PackageTag.Query*/
              select ResourceId, [Text] PackageTag from [dbo].[Tag] Tag Where ResourceType = 0" />
        </entity>
      </document>
    </dataConfig>

Now, this runs in my test env, and the only thing I do is change the configuration to another db (and, as a result, also the schema name from [dbo] to another). This results in totally different behavior. In the first configuration the selects were done in this order: inner entity and then outer entity, which means that the cache works. In the second configuration, over the other db, the order was first the outer and then the inner; the cache did not work at all, and the inner query's results are not stored at all. What could be the problem?
queryResultCache showing all zeros
Hi, We just configured a new Solr cloud (5 nodes) running Solr 4.3, ran about 200,000 queries taken from our production environment and measured the performance of the cloud over a collection of 14M documents with the default Solr settings. We are now trying to tune the different caches and when I look at each node of the cloud, all of them are showing no activity (see below) regarding the queryResultCache... all other caches are showing some activity. Any idea what could cause this?

org.apache.solr.search.LRUCache
version: 1.0
description: LRU Cache(maxSize=512, initialSize=512)
src: $URL: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/solr/core/src/java/org/apache/solr/search/LRUCache.java $
stats:
  lookups: 0
  hits: 0
  hitratio: 0.00
  inserts: 0
  evictions: 0
  size: 0
  warmupTime: 0
  cumulative_lookups: 0
  cumulative_hits: 0
  cumulative_hitratio: 0.00
  cumulative_inserts: 0
  cumulative_evictions: 0
Re: Negative Query Behaviour in Solr 3.2
Since there are no parentheses, the terms and operators are all at the same level and the OR is essentially a redundant operator and ignored, so: q=name:memory OR -name:encoded is treated as: q=name:memory -name:encoded When what you probably want is: q=name:memory OR (-name:encoded) BUT... a bug/deficiency prevents Solr from handling pure-negative sub-queries properly, so you have to add a *:*: q=name:memory OR (*:* -name:encoded) So that reads "... or any documents that do not contain encoded in the name field", which is equivalent to "... or all documents except those that have encoded in the name field". -- Jack Krupansky -Original Message- From: karanjindal Sent: Wednesday, July 31, 2013 2:14 AM To: solr-user@lucene.apache.org Subject: Negative Query Behaviour in Solr 3.2 [...]
Re: SimplePostTool: FATAL: Solr returned an error #400 Bad Request
I got it resolved; the actual error was further up the trace. The posting XML was not being formed properly for the Solr Date field, which takes the format 2006-07-15T22:18:48Z. This is the standard format for the Solr date datatype, which specifically follows one of these patterns:

- 1995-12-31T23:59:59Z
- 1995-12-31T23:59:59.9Z
- 1995-12-31T23:59:59.99Z
- 1995-12-31T23:59:59.999Z

As documented by Solr: http://www.meticent.com/DAt By the way, thanks! Vineet On Wed, Jul 31, 2013 at 4:47 PM, Erick Erickson erickerick...@gmail.com wrote: [...]
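For anyone hitting the same problem, a minimal Java sketch of producing Solr-compatible date strings; note the formatter must be pinned to UTC, since Solr dates are always UTC:

    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.TimeZone;

    public class SolrDates {
        public static String toSolrDate(Date d) {
            SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
            fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // Solr dates are always UTC
            return fmt.format(d);                          // e.g. 2006-07-15T22:18:48Z
        }
    }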
Unexpected character '<' (code 60) expected '='
Hi All, I am currently stuck on a Solr issue while posting some data to the Solr server. I have some records from HBase which I am posting to Solr, but after posting some 1 million data records, it suddenly stopped. Checking the Catalina log trace showed:

org.apache.solr.common.SolrException: Unexpected character '<' (code 60) expected '='

I am not sure whether it's an issue with malformed data in the posting, because whatever XML file I generate before posting, I have tried posting that specific file to Solr and it goes well. Below is the whole log trace:

SEVERE: org.apache.solr.common.SolrException: Unexpected character '<' (code 60) expected '='
 at [row,col {unknown-source}]: [20281,18]
 at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:81)
 at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1398)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
 at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
 at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
 at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
 at java.lang.Thread.run(Thread.java:722)
Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character '<' (code 60) expected '='
 at [row,col {unknown-source}]: [20281,18]
 at com.ctc.wstx.sr.StreamScanner.throwUnexpectedChar(StreamScanner.java:648)
 at com.ctc.wstx.sr.BasicStreamReader.handleNsAttrs(BasicStreamReader.java:3001)
 at com.ctc.wstx.sr.BasicStreamReader.handleStartElem(BasicStreamReader.java:2936)
 at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2848)
 at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
 at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:295)
 at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:157)
 at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
 ... 17 more

Has anybody faced this issue? Thanks and Regards, Vineet
RE: Unexpected character '<' (code 60) expected '='
This file is malformed:

SEVERE: org.apache.solr.common.SolrException: Unexpected character '<' (code 60) expected '='
 at [row,col {unknown-source}]: [20281,18]

Check row 20281, column 18. -Original message- From: Vineet Mishra clearmido...@gmail.com Sent: Wednesday 31st July 2013 15:05 To: solr-user@lucene.apache.org Subject: Unexpected character '<' (code 60) expected '=' [...]
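A common cause of this error is an unescaped '<' inside a field value when the XML is built by string concatenation. A minimal hand-rolled escaping sketch (using SolrJ instead of hand-built XML avoids the problem entirely):

    public class XmlEscape {
        // escape the five XML special characters before concatenating a value into a <field> element
        public static String escapeXml(String s) {
            StringBuilder sb = new StringBuilder(s.length());
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                switch (c) {
                    case '<':  sb.append("&lt;");   break; // the '<' (code 60) from the error above
                    case '>':  sb.append("&gt;");   break;
                    case '&':  sb.append("&amp;");  break;
                    case '"':  sb.append("&quot;"); break;
                    case '\'': sb.append("&apos;"); break;
                    default:   sb.append(c);
                }
            }
            return sb.toString();
        }
    }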
RE: new field type - enum field
Hi, I have managed to attach the patch in Jira. Thanks. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Monday, July 29, 2013 2:15 PM To: solr-user@lucene.apache.org Subject: Re: new field type - enum field OK, if you can attach it to an e-mail, I'll attach it. Just to check, though, make sure you're logged in. I've been fooled once or twice by being automatically signed out... Erick On Mon, Jul 29, 2013 at 3:17 AM, Elran Dvir elr...@checkpoint.com wrote: Thanks, Erick. I have tried it four times. It keeps failing. The problem reoccurred today. Thanks. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Monday, July 29, 2013 2:44 AM To: solr-user@lucene.apache.org Subject: Re: new field type - enum field You should be able to attach a patch; I wonder if there was some temporary glitch in JIRA. Is this persisting? Let us know if this continues... Erick On Sun, Jul 28, 2013 at 12:11 PM, Elran Dvir elr...@checkpoint.com wrote: Hi, I have created an issue: https://issues.apache.org/jira/browse/SOLR-5084 I tried to attach my patch, but it failed: Cannot attach file Solr-5084.patch: Unable to communicate with JIRA. What am I doing wrong? Thanks. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Thursday, July 25, 2013 3:25 PM To: solr-user@lucene.apache.org Subject: Re: new field type - enum field Start here: http://wiki.apache.org/solr/HowToContribute Then, when your patch is ready, submit a JIRA and attach your patch. Then nudge (gently) if none of the committers picks it up and applies it. NOTE: It is _not_ necessary that the first version of your patch is completely polished. I often put up partial/incomplete patches (comments with //nocommit are explicitly caught by the ant precommit target, for instance) to see if anyone has any comments before polishing. Best Erick On Thu, Jul 25, 2013 at 5:04 AM, Elran Dvir elr...@checkpoint.com wrote: Hi, I have implemented it as Chris described: the field is indexed as numeric but displayed as string, according to configuration. It applies to facet, pivot, group and query. How do we proceed? How do I contribute it? Thanks. -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Thursday, July 25, 2013 4:40 AM To: solr-user@lucene.apache.org Subject: Re: new field type - enum field : Doable at Lucene level by any chance? Given how well the Trie fields compress (ByteField and ShortField have been deprecated in favor of TrieIntField for this reason) it probably just makes sense to treat it as a numeric at the Lucene level. : If there's positive feedback, I'll open an issue with a patch for the functionality. I've typically dealt with this sort of thing at the client layer using a simple numeric field in Solr, or used an UpdateProcessor to convert the String->numeric mapping when indexing, and client logic or a DocTransformer to handle the stored value at query time -- but having a built-in FieldType that handles that for you automatically (and helps ensure the indexed values conform to the enum) would certainly be cool if you'd like to contribute it. -Hoss
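For completeness, a hedged sketch of the client-layer workaround Hoss describes (mapping enum labels to stable ints at index time); the Severity enum and field names are illustrative only, not part of the proposed patch:

    import org.apache.solr.common.SolrInputDocument;

    public class EnumMapping {
        // stable label -> int mapping; ordinals must never be reordered once indexed
        enum Severity { LOW, MEDIUM, HIGH }

        public static SolrInputDocument doc(String id, Severity sev) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", id);
            doc.addField("severity", sev.ordinal());   // indexed as a plain int field
            return doc;
        }

        public static String label(int stored) {
            return Severity.values()[stored].name();   // map back for display at query time
        }
    }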
Re: How might one search for dupe IDs other than faceting on the ID field?
Good to note! But... any search will not detect dupe IDs for uncommitted documents. -- Jack Krupansky -Original Message- From: Mikhail Khludnev Sent: Wednesday, July 31, 2013 6:11 AM To: solr-user Subject: Re: How might one search for dupe IDs other than faceting on the ID field? [...]
Re: Unexpected character '<' (code 60) expected '='
I checked the file... nothing is there. I mean the formatting is correct; it's a valid XML file. On Wed, Jul 31, 2013 at 6:38 PM, Markus Jelsma markus.jel...@openindex.io wrote: [...]
Re: Trying to determine the benefit of spellcheck-based suggester vs. using terms component?
Thanks for the reply Erick. I'm looking for type-ahead support, using spell checking too via the DirectSolrSpellChecker. It seems like the spellcheck-based suggester is designed for type-ahead, or am I not understanding something? Here's my config:

    <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
      <lst name="defaults">
        <str name="echoParams">explicit</str>
        <str name="wt">json</str>
        <str name="indent">true</str>
        <str name="df">suggest</str>
        <str name="spellcheck">true</str>
        <str name="spellcheck.dictionary">suggestDictionary</str>
        <str name="spellcheck.onlyMorePopular">true</str>
        <str name="spellcheck.count">5</str>
        <str name="spellcheck.collate">false</str>
      </lst>
      <arr name="components">
        <str>suggest</str>
      </arr>
    </requestHandler>

    <searchComponent class="solr.SpellCheckComponent" name="suggest">
      <lst name="spellchecker">
        <str name="name">suggestDictionary</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
        <str name="field">suggest</str>
        <float name="threshold">0.</float>
        <str name="buildOnCommit">true</str>
      </lst>
    </searchComponent>

I was confused why this approach was needed, because using the terms component is so easy and doesn't require any build step. From your answer, it seems like either approach is valid in Solr 4.4, but the spellcheck-based suggester has more knobs, such as loading an external dictionary in addition to data in my index, etc. Cheers, Tim On Wed, Jul 31, 2013 at 5:08 AM, Erick Erickson erickerick...@gmail.com wrote: [...]
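A hedged SolrJ sketch of querying the /suggest handler configured above; it assumes the handler path and response shape shown, which may differ in your setup:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.client.solrj.response.SpellCheckResponse;
    import org.apache.solr.common.params.ModifiableSolrParams;

    public class SuggestClient {
        public static void suggest(SolrServer solr, String userInput) throws Exception {
            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set("q", userInput);              // analyzed against the "suggest" field (df above)
            QueryRequest req = new QueryRequest(params);
            req.setPath("/suggest");                 // handler defined in the config above
            QueryResponse rsp = req.process(solr);
            for (SpellCheckResponse.Suggestion s : rsp.getSpellCheckResponse().getSuggestions()) {
                System.out.println(s.getToken() + " -> " + s.getAlternatives());
            }
        }
    }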
Autowarming last 15 days data
Hi, We have a Solr master-slave setup with close to 30 million records. Our index changes/updates very frequently and replication is set up with a 60-second delay. Now every time replication completes, the first new searches take time. How can this be improved? I have come across the idea that warming would help this scenario. In our case we cannot warm specific queries, but most of the users use only the last 15 days of data. So would it be possible to auto-warm only the last 15 days of data? Regards, Ayush
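Replication-aware warming is usually configured server-side with a newSearcher listener in solrconfig.xml, but a client-side warm-up fired against the slave after each replication can approximate it. A minimal SolrJ sketch, assuming a date field named "timestamp" (the field name is an assumption, adjust to your schema):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;

    public class WarmLast15Days {
        public static void warm(SolrServer slave) throws Exception {
            SolrQuery q = new SolrQuery("*:*");
            // the filter query is what actually populates the filter cache;
            // "timestamp" is an assumed field name
            q.addFilterQuery("timestamp:[NOW/DAY-15DAYS TO NOW/DAY+1DAY]");
            q.setRows(0);  // we only want the caches primed, not documents back
            slave.query(q);
        }
    }

The same fq in a newSearcher listener would run automatically after every replication instead of requiring an external trigger.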
Re: Measuring SOLR performance
Hi Roman, What version and config of Solr does the tool expect? Tried to run, but got:

**ERROR**
  File "solrjmeter.py", line 1390, in <module>
    main(sys.argv)
  File "solrjmeter.py", line 1296, in main
    check_prerequisities(options)
  File "solrjmeter.py", line 351, in check_prerequisities
    error('Cannot contact: %s' % options.query_endpoint)
  File "solrjmeter.py", line 66, in error
    traceback.print_stack()
Cannot contact: http://localhost:8983/solr

It complains about the URL, clicking which leads properly to the admin page... Solr 4.3.1, 2 cores, sharded. Dmitry On Wed, Jul 31, 2013 at 3:59 AM, Roman Chyla roman.ch...@gmail.com wrote: Hello, I have been wanting some tools for measuring performance of SOLR, similar to Mike McCandless' lucene benchmark, so yet another monitor was born; it is described here: http://29min.wordpress.com/2013/07/31/measuring-solr-query-performance/ I tested it on the problem of garbage collectors (see the blog for details) and so far I can't conclude whether highly customized G1 is better than highly customized CMS, but I think interesting details can be seen there. Hope this helps someone, and of course, feel free to improve the tool and share! roman
Re: Improper shutdown of Solr in Jetty 9
OK. On Ubuntu there are shell scripts that come with Jetty 9. They seem to do the proper job (disclaimer: no extensive testing with Solr done yet, but it looks good so far). Not sure how well Jetty supports the Windows environment on the life-cycle automation side. On Wed, Jul 31, 2013 at 1:43 PM, Artem Karpenko a.karpe...@oxseed.com wrote: Hello Dmitry, it's Windows 7. I'm starting Jetty with java -jar start.jar On 31.07.2013 12:36, Dmitry Kan wrote: Artem, What OS are you using? So far Jetty 9 with Solr 4.3.1 works OK under Ubuntu 12.04. On 30 Jul 2013 17:23, Alexandre Rafalovitch arafa...@gmail.com wrote: Of course, I meant Jetty (not Tomcat). So apologies for spam and confusion of my own. The rest of the statement stands. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Tue, Jul 30, 2013 at 10:20 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Thanks for letting us know. See if you can add it to the documentation somewhere. Solr is not using Tomcat 9, but I believe that was primarily because Tomcat 9 requires Java 7 and Solr 4.x is staying with Java 6 as the minimum requirement. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Tue, Jul 30, 2013 at 10:09 AM, Artem Karpenko a.karpe...@oxseed.com wrote: Uh, sorry for spamming, but if anyone is interested, there is a way to properly shut down Jetty when it's launched with the --exec flag. You can use JMX to invoke the stop() method on Jetty's Server MBean. This triggers a proper shutdown with all of Solr's close() callbacks executed. I wonder why this isn't noted at least in the Jetty documentation. Regards, Artem Karpenko. On 30.07.2013 16:58, Artem Karpenko wrote: After some investigation I found that the problem is not with Jetty's version but with the usage of the --exec flag. Namely, when --exec is used (to specify JVM args), shutdown is not graceful; it seems that the Java process is simply killed. Not sure how to handle this... Regards, Artem Karpenko. On 29.07.2013 16:51, Artem Karpenko wrote: Hi, I can't make Solr shut down properly when using Jetty 9. Tested this with a simple plugin that only extends DirectUpdateHandler2, creates a file in the constructor and deletes it in close(). While it works fine in the example installation (the one that can be downloaded from the Solr site) and in a simple custom installation with Jetty 8, it won't work in Jetty 9. There is not much logging at shutdown at all, just Jetty's "closing selector" or some such, unlike with Jetty 8, where it prints various graceful shutdown messages from Solr. The installation procedure I used for both Jettys is rather simple: just put solr.war into the webapps/ directory, the plugin JAR into {core}/lib/, and configure the update handler in solrconfig.xml. OS is Windows 7, Solr 4.4. I tried to stop Jetty with both Ctrl+C and java -jar start.jar [port/key params] --stop. With Jetty 8 it works fine even with Ctrl+C. Did anybody stumble on this issue? Best regards, Artem.
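For anyone who wants to script Artem's JMX approach rather than click through jconsole, here is a minimal sketch in Java. It assumes Jetty was started with remote JMX enabled on port 1099, and the MBean object name shown is an assumption that should be verified (e.g. in jconsole), since it can differ between Jetty versions:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class JettyJmxStop {
        public static void main(String[] args) throws Exception {
            // Assumes the JVM running Jetty exposes JMX on localhost:1099
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                // Hypothetical object name for Jetty 9's Server MBean -- verify in jconsole
                ObjectName server = new ObjectName("org.eclipse.jetty.server:type=server,id=0");
                // Invoking stop() triggers a graceful shutdown, so Solr's close() callbacks run
                mbs.invoke(server, "stop", new Object[0], new String[0]);
            } finally {
                connector.close();
            }
        }
    }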
Re: Measuring SOLR performance
Ok, got the error fixed by modifying the base Solr URL in solrjmeter.py (added the core name after the /solr part). The next error is: WARNING: no test name(s) supplied nor found in: ['/home/dmitry/projects/lab/solrjmeter/demo/queries/demo.queries'] It is a 'slow start with a new tool' symptom, I guess.. :) On Wed, Jul 31, 2013 at 4:39 PM, Dmitry Kan solrexp...@gmail.com wrote: Hi Roman, What version and config of SOLR does the tool expect? Tried to run it, but got:

**ERROR**
  File "solrjmeter.py", line 1390, in <module>
    main(sys.argv)
  File "solrjmeter.py", line 1296, in main
    check_prerequisities(options)
  File "solrjmeter.py", line 351, in check_prerequisities
    error('Cannot contact: %s' % options.query_endpoint)
  File "solrjmeter.py", line 66, in error
    traceback.print_stack()
Cannot contact: http://localhost:8983/solr

It complains about the URL, yet clicking it leads properly to the admin page... solr 4.3.1, 2-core shard. Dmitry On Wed, Jul 31, 2013 at 3:59 AM, Roman Chyla roman.ch...@gmail.com wrote: Hello, I have been wanting some tools for measuring performance of SOLR, similar to Mike McCandless' lucene benchmark. So yet another monitor was born; it is described here: http://29min.wordpress.com/2013/07/31/measuring-solr-query-performance/ I tested it on the problem of garbage collectors (see the blog for details) and so far I can't conclude whether highly customized G1 is better than highly customized CMS, but I think interesting details can be seen there. Hope this helps someone, and of course, feel free to improve the tool and share! roman
Re: SolrCloud Exception
On 7/31/2013 4:27 AM, Sinduja Rajendran wrote: I am running solr 4.0 in a cloud. We have close to 100M documents. The data is from a single DB table. I use DIH. Our SolrCloud has 3 zookeepers, one tomcat, 2 solr instances in the same tomcat. We have 8 GB RAM. After indexing 14M, my indexing fails with the below exception. solr org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: GC overhead limit exceeded I tried increasing the GC value to the app server -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 But after giving the command, my indexing went drastically down. It was indexing only 15k documents in 20 minutes. Earlier it was 300k in 20 min. First thing to mention is that Solr 4.0 was extremely buggy; upgrading would be advisable. In the meantime: an OutOfMemoryError means that Solr needs more heap memory than the JVM is allowed to use. The Solr Admin UI dashboard will tell you how much memory is allocated to your JVM, which you can increase with the -Xmx parameter. Real RAM must be available from the system in order to increase the heap size. The options you have given just change the GC collector and tune one aspect of the new collector; they don't increase anything. Here are some things that may help you: http://wiki.apache.org/solr/SolrPerformanceProblems http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning After looking over that information and making adjustments, if you are still having trouble, we can go over your config and all your details to see what can be done. You said that both of your Solr instances are running in the same tomcat. Just FYI - because you aren't running all functions on separate hardware, your setup is not fault tolerant. Machine failures DO happen, no matter how much redundancy you build into that server. If you are running all this on a redundant VM solution that has live migration of running VMs, then my statement isn't accurate. Thanks, Shawn
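As a concrete illustration of the -Xmx advice for a Tomcat deployment: heap settings conventionally go in a bin/setenv.sh file. The 4g figure below is purely an example and must fit within the machine's physical RAM, leaving room for the OS disk cache:

    # bin/setenv.sh
    export CATALINA_OPTS="$CATALINA_OPTS -Xms4g -Xmx4g"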
SolrCloud - Replica 'down'. How to get it back as 'active'? - Solr 4.3.0
Hi, After the following error, one of the replicas of the leader went down. Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later. I increased the autoCommit time to 5000ms and restarted Solr. However, the status is still set to down. How do I get it back to active? Regards, Jeroen
Re: Solr PolyField
Nope. Solr fields are flat. Why do you want to do this? I'm asking because this might be an XY problem and there may be other possibilities. Best Erick On Wed, Jul 31, 2013 at 5:09 AM, Luís Portela Afonso meligalet...@gmail.com wrote: Hi, I'm trying to create a field with multiple fields inside, that is:

origin: {
  htmlUrl: "http://www.gazzetta.it/",
  streamId: "feed/http://www.gazzetta.it/rss/Home.xml",
  title: "Gazzetta.it"
},

to get something like this. Is that possible? I'm using Solr 4.4.0. Thanks
Re: Sharding with a SolrCloud
You're in uncharted territory. I can imagine you using a SolrCloud cluster as a separate Solr for a federated search, but using it as a single shard just seems wrong. If nothing else, indexing to the shards will require that the documents be routed correctly. But having one shard in SolrCloud and another shard managed externally seems ripe for getting the docs indexed to various shards you're not expecting, unless you're using explicit routing. All in all, this _really_ sounds like something you should not be attempting. Why are you trying to do this? Is it possible to just set up a SolrCloud cluster and index all the docs to it and be done with it? 'cause I think you'll end up with endless problems given what you've described. Best Erick On Wed, Jul 31, 2013 at 5:16 AM, Oliver Goldschmidt o.goldschm...@tuhh.de wrote: Hi list, I have a Solr server which uses sharding to do distributed search with another Solr server. The other Solr server is now migrating to a SolrCloud system. I've been trying recently to continue searching the SolrCloud as a shard for my Solr server, but this is failing with mysterious effects. I am getting a result with a number of hits when I perform a search, but the results are not displayed at all. This is the response header I am getting from Solr:

{
  "responseHeader": {
    "status": 0,
    "QTime": 305,
    "params": {
      "facet": "true",
      "indent": "yes",
      "facet.mincount": "1",
      "facet.limit": "30",
      "qf": "title_short^750 title_full_unstemmed^600",
      "json.nl": "arrarr",
      "wt": "json",
      "rows": "20",
      "shards": "ourindex.nowhere.de/solr/index",
      "bq": "format:Book^500",
      "fl": "*,score",
      "facet.sort": "count",
      "start": "0",
      "q": "xml",
      "shards.info": "true",
      "facet.prefix": "",
      "facet.field": ["publishDate"],
      "qt": "dismax"}},
  "shards.info": {
    "ourindex.nowhere.de/solr/index": {
      "numFound": 10076,
      "maxScore": 8.507474,
      "time": 263}},
  "response": {"numFound": 10056, "start": 0, "maxScore": 8.507474, "docs": []}
}

As you can see, there are no docs in the result. This result is not 100% reproducible: sometimes I get no results displayed, other times it works (with the same query URL!). As you also can see in the result, the number of hits in the response is a little bit less than the number of hits sent from the shard. This makes me wonder whether it is simply not possible to use a SolrCloud as a shard for another standalone Solr server? Any hint is appreciated! Best - Oliver -- Oliver Goldschmidt TU Hamburg-Harburg / Universitätsbibliothek / Digitale Dienste Denickestr. 22 21071 Hamburg - Harburg Tel.+49 (0)40 / 428 78 - 32 91 eMail o.goldschm...@tuhh.de -- GPG/PGP key: http://www.tub.tu-harburg.de/keys/Oliver_Marahrens_pub.asc
Re: SolrCloud - Replica 'down'. How to get it back as 'active'? - Solr 4.3.0
It perhaps is just replaying the transaction logs and coming up. Wait for it is what I'd say. The admin UI as of now doesn't show replaying of transaction log as 'recovering', it does so only during peer sync. Also, you may want to add autoSoftCommit and increase the autoCommit to a few minutes. On Wed, Jul 31, 2013 at 7:55 PM, Jeroen Steggink jer...@stegg-inc.comwrote: Hi, After the following error, one of the replicas of the leader went down. Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later. I increased the autoCommit time to 5000ms and restarted Solr. However, the status is still set to down. How do I get it back to active? Regards, Jeroen -- Anshum Gupta http://www.anshumgupta.net
Re: Solr PolyField
Hi, I'm trying to index information from RSS feeds. So, in a more detailed explanation: the RSS feed has something like:

<enclosure url="http://www.engadget.com/podcasts/Engadget_Podcast_353.mp3" length="32642192" type="audio/mpeg"/>

With my current configuration this works, and I get a result like:

enclosure: [
  "audio/mpeg",
  "http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3",
  37521428
],

BUT, this is not the result I'm trying to reach. With that, I'm not able to tell reliably whether audio/mpeg is the type, the url, or the length. I want to reach something like:

enclosure: {
  type: "audio/mpeg",
  url: "http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3",
  length: 37521428
},

So, as I understand it, this should be 3 fields inside another field, no? Many thanks for the answer and the help. On Jul 31, 2013, at 3:34 PM, Erick Erickson erickerick...@gmail.com wrote: Nope. Solr fields are flat. Why do you want to do this? I'm asking because this might be an XY problem and there may be other possibilities. Best Erick On Wed, Jul 31, 2013 at 5:09 AM, Luís Portela Afonso meligalet...@gmail.com wrote: Hi, I'm trying to create a field with multiple fields inside, that is: origin: { htmlUrl: "http://www.gazzetta.it/", streamId: "feed/http://www.gazzetta.it/rss/Home.xml", title: "Gazzetta.it" }, to get something like this. Is that possible? I'm using Solr 4.4.0. Thanks smime.p7s Description: S/MIME cryptographic signature
Re: Autowarming last 15 days data
On 7/31/2013 7:30 AM, Cool Techi wrote: We have a solr master slave set up with close to 30 million records. Our index changes/updates very frequently and replication is set up at 60 seconds delay. Now every time replication completes, the new searches take a time. How can this be improved? I have come across that warming would help this scenario, I our case we cannot warm some queries, but most of the users use the last 15 days data only. So would it be possible to auto warm only last 15 days data? Autowarming is generally done automatically when a new searcher is opened, according to the cache config. It will take the most recent N queries in the cache (according to the autowarmCount) and re-execute those queries against the index to populate the cache. The document cache cannot be warmed directly, but when the query result cache is warmed, that will also populate the document cache. Because you have a potentially very frequent interval for opening new searchers (possibly replicating every 60 seconds), you will want to avoid large autowarmCount values. If your autowarming ends up taking too long, the system will try to open a new searcher while the previous one is being warmed, which can lead to problems. I have found that the filterCache is particularly slow to warm. Thanks, Shawn
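A sketch of what "warm the last 15 days" could look like as an explicit newSearcher listener in solrconfig.xml; the field name timestamp and the query itself are illustrative assumptions, and the same caution about keeping warming short applies:

    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">*:*</str>
          <str name="fq">timestamp:[NOW/DAY-15DAYS TO NOW]</str>
        </lst>
      </arr>
    </listener>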
Re: Solr PolyField
Luís, Is there a reason why splitting this up into enclosure_type, enclosure_url, and enclosure_length would not work? Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions w: appinions.com http://www.appinions.com/ On Wed, Jul 31, 2013 at 10:43 AM, Luís Portela Afonso meligalet...@gmail.com wrote: Hi, I'm trying to index information from RSS feeds. So, in a more detailed explanation: the RSS feed has something like: <enclosure url="http://www.engadget.com/podcasts/Engadget_Podcast_353.mp3" length="32642192" type="audio/mpeg"/> With my current configuration this works, and I get a result like: enclosure: [ "audio/mpeg", "http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3", 37521428 ], BUT, this is not the result I'm trying to reach. With that, I'm not able to tell reliably whether audio/mpeg is the type, the url, or the length. I want to reach something like: enclosure: { type: "audio/mpeg", url: "http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3", length: 37521428 }, So, as I understand it, this should be 3 fields inside another field, no? Many thanks for the answer and the help. On Jul 31, 2013, at 3:34 PM, Erick Erickson erickerick...@gmail.com wrote: Nope. Solr fields are flat. Why do you want to do this? I'm asking because this might be an XY problem and there may be other possibilities. Best Erick On Wed, Jul 31, 2013 at 5:09 AM, Luís Portela Afonso meligalet...@gmail.com wrote: Hi, I'm trying to create a field with multiple fields inside, that is: origin: { htmlUrl: "http://www.gazzetta.it/", streamId: "feed/http://www.gazzetta.it/rss/Home.xml", title: "Gazzetta.it" }, to get something like this. Is that possible? I'm using Solr 4.4.0. Thanks
Re: Solr PolyField
These fields can be multiValued. In the RSS standard it is not correct to do that, but some sources do, and I'd like to grab it all. Is there any way to make that possible? Once again, many thanks :) On Jul 31, 2013, at 3:54 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Luís, Is there a reason why splitting this up into enclosure_type, enclosure_url, and enclosure_length would not work? Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions w: appinions.com http://www.appinions.com/ On Wed, Jul 31, 2013 at 10:43 AM, Luís Portela Afonso meligalet...@gmail.com wrote: Hi, I'm trying to index information from RSS feeds. So, in a more detailed explanation: the RSS feed has something like: <enclosure url="http://www.engadget.com/podcasts/Engadget_Podcast_353.mp3" length="32642192" type="audio/mpeg"/> With my current configuration this works, and I get a result like: enclosure: [ "audio/mpeg", "http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3", 37521428 ], BUT, this is not the result I'm trying to reach. With that, I'm not able to tell reliably whether audio/mpeg is the type, the url, or the length. I want to reach something like: enclosure: { type: "audio/mpeg", url: "http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3", length: 37521428 }, So, as I understand it, this should be 3 fields inside another field, no? Many thanks for the answer and the help. On Jul 31, 2013, at 3:34 PM, Erick Erickson erickerick...@gmail.com wrote: Nope. Solr fields are flat. Why do you want to do this? I'm asking because this might be an XY problem and there may be other possibilities. Best Erick On Wed, Jul 31, 2013 at 5:09 AM, Luís Portela Afonso meligalet...@gmail.com wrote: Hi, I'm trying to create a field with multiple fields inside, that is: origin: { htmlUrl: "http://www.gazzetta.it/", streamId: "feed/http://www.gazzetta.it/rss/Home.xml", title: "Gazzetta.it" }, to get something like this. Is that possible? I'm using Solr 4.4.0. Thanks smime.p7s Description: S/MIME cryptographic signature
Re: Sharding with a SolrCloud
Thank you very much for that information, Erick. That was what I was fearing... Well, the problem why I am trying to do this is that the SolrCloud is managed by someone else. We are indexing some content to a pretty small local index. To this index we have complete access and can do whatever we want. But we also need the separate index, which is now moving into the cloud. It's not possible to put our local content into the cloud, because we are not maintaining it and have no write permission to it. But why shouldn't that work? Isn't SolrCloud acting like one Solr server? The indices have to be maintained separately - can't I just continue using them as shards and get one result list from both of them (that's how I did it before they wanted to switch to SolrCloud)? Though, if there is no way to use the cloud as a shard, we will have to think about how to solve that. Of course we can split up the queries and make two queries (one for the cloud and one for our local index). But this might be a bit confusing for the user. Thank you again, best - Oliver On 31.07.2013 16:39, Erick Erickson wrote: You're in uncharted territory. I can imagine you using a SolrCloud cluster as a separate Solr for a federated search, but using it as a single shard just seems wrong. If nothing else, indexing to the shards will require that the documents be routed correctly. But having one shard in SolrCloud and another shard managed externally seems ripe for getting the docs indexed to various shards you're not expecting, unless you're using explicit routing. All in all, this _really_ sounds like something you should not be attempting. Why are you trying to do this? Is it possible to just set up a SolrCloud cluster and index all the docs to it and be done with it? 'cause I think you'll end up with endless problems given what you've described. Best Erick On Wed, Jul 31, 2013 at 5:16 AM, Oliver Goldschmidt o.goldschm...@tuhh.de wrote: Hi list, I have a Solr server which uses sharding to do distributed search with another Solr server. The other Solr server is now migrating to a SolrCloud system. I've been trying recently to continue searching the SolrCloud as a shard for my Solr server, but this is failing with mysterious effects. I am getting a result with a number of hits when I perform a search, but the results are not displayed at all. This is the response header I am getting from Solr: { responseHeader:{ status:0, QTime:305, params:{ facet:true, indent:yes, facet.mincount:1, facet.limit:30, qf:title_short^750 title_full_unstemmed^600, json.nl:arrarr, wt:json, rows:20, shards:ourindex.nowhere.de/solr/index, bq:format:Book^500, fl:*,score, facet.sort:count, start:0, q:xml, shards.info:true, facet.prefix:, facet.field:[publishDate], qt:dismax}}, shards.info:{ ourindex.nowhere.de/solr/index:{ numFound:10076, maxScore:8.507474, time:263}}, response:{numFound:10056,start:0,maxScore:8.507474,docs:[] } As you can see, there are no docs in the result. This result is not 100% reproducible: sometimes I get no results displayed, other times it works (with the same query URL!). As you also can see in the result, the number of hits in the response is a little bit less than the number of hits sent from the shard. This makes me wonder whether it is simply not possible to use a SolrCloud as a shard for another standalone Solr server? Any hint is appreciated! Best - Oliver -- Oliver Goldschmidt TU Hamburg-Harburg / Universitätsbibliothek / Digitale Dienste Denickestr.
22 21071 Hamburg - Harburg Tel.+49 (0)40 / 428 78 - 32 91 eMail o.goldschm...@tuhh.de -- GPG/PGP-Schlüssel: http://www.tub.tu-harburg.de/keys/Oliver_Marahrens_pub.asc -- Oliver Goldschmidt TU Hamburg-Harburg / Universitätsbibliothek / Digitale Dienste Denickestr. 22 21071 Hamburg - Harburg Tel.+49 (0)40 / 428 78 - 32 91 eMail o.goldschm...@tuhh.de -- GPG/PGP-Schlüssel: http://www.tub.tu-harburg.de/keys/Oliver_Marahrens_pub.asc
Re: Measuring SOLR performance
On 7/31/2013 12:24 AM, William Bell wrote: But that link does not tell me which one you are using. You are listing like 4 versions on your site. Also, what did it fix? Pause times? Any other words of wisdom? I'm not sure whether that was directed at me or Roman, but here are my answers: I run one copy of my index on Solr 3.5.0 and another copy on Solr 4.2.1. I have a completely separate (and much smaller) index using SolrCloud on 4.2.1. I was seeing GC pause times of 8-10 seconds on both 3.5.0 and 4.2.1 with an untuned CMS collector. When I switched that to G1 (also untuned), I was seeing pause times of 12 seconds. The average GC time did go down, but the long stop-the-world pauses were worse. I used the jHiccup tool to see the problem. I went to a CMS config much like what Roman used in his benchmarks, and that improved things greatly, but I was still seeing occasional pauses long enough to make my load balancer ping check (5 second timeout) think that the index had gone down. I later tried the CMS config that's on my wiki page. That seems to have fixed my load balancer problem. I do still see pauses of up to a second, but they are not frequent. We have more page-load delay from our webapp than we do from Solr, so users aren't noticing when searches occasionally take a little longer. Thanks, Shawn
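For readers who want the general shape of such a tuned CMS setup, here is an illustrative set of HotSpot flags. These are not Shawn's exact wiki settings; the heap size and occupancy threshold are examples that need to be tested against your own index:

    JVM_OPTS="-Xms4g -Xmx4g \
      -XX:+UseConcMarkSweepGC \
      -XX:+UseParNewGC \
      -XX:+UseCMSInitiatingOccupancyOnly \
      -XX:CMSInitiatingOccupancyFraction=70 \
      -XX:+CMSParallelRemarkEnabled"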
Re: Solr PolyField
As a single record? Hum, no. So an RSS feed has /rss/channel/ and then a lot of /rss/channel/item elements, right? Each /rss/channel/item is a new document in Solr. I started with the Solr RSS example, but I changed it to have more fields, different fields, and to get the feed URL from a database. So each /rss/channel/item is a document for indexing, but each /rss/channel/item can have more than one enclosure tag. Many thanks On Jul 31, 2013, at 4:05 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: So you're trying to index an RSS feed as a single record, but you want to be able to search for and retrieve individual entries from within the feed? Is that the issue? Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions w: appinions.com http://www.appinions.com/ On Wed, Jul 31, 2013 at 10:59 AM, Luís Portela Afonso meligalet...@gmail.com wrote: These fields can be multiValued. In the RSS standard it is not correct to do that, but some sources do, and I'd like to grab it all. Is there any way to make that possible? Once again, many thanks :) On Jul 31, 2013, at 3:54 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Luís, Is there a reason why splitting this up into enclosure_type, enclosure_url, and enclosure_length would not work? Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions w: appinions.com http://www.appinions.com/ On Wed, Jul 31, 2013 at 10:43 AM, Luís Portela Afonso meligalet...@gmail.com wrote: Hi, I'm trying to index information from RSS feeds. So, in a more detailed explanation: the RSS feed has something like: <enclosure url="http://www.engadget.com/podcasts/Engadget_Podcast_353.mp3" length="32642192" type="audio/mpeg"/> With my current configuration this works, and I get a result like: enclosure: [ "audio/mpeg", "http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3", 37521428 ], BUT, this is not the result I'm trying to reach. With that, I'm not able to tell reliably whether audio/mpeg is the type, the url, or the length. I want to reach something like: enclosure: { type: "audio/mpeg", url: "http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3", length: 37521428 }, So, as I understand it, this should be 3 fields inside another field, no? Many thanks for the answer and the help. On Jul 31, 2013, at 3:34 PM, Erick Erickson erickerick...@gmail.com wrote: Nope. Solr fields are flat. Why do you want to do this? I'm asking because this might be an XY problem and there may be other possibilities. Best Erick On Wed, Jul 31, 2013 at 5:09 AM, Luís Portela Afonso meligalet...@gmail.com wrote: Hi, I'm trying to create a field with multiple fields inside, that is: origin: { htmlUrl: "http://www.gazzetta.it/", streamId: "feed/http://www.gazzetta.it/rss/Home.xml", title: "Gazzetta.it" }, to get something like this. Is that possible? I'm using Solr 4.4.0. Thanks smime.p7s Description: S/MIME cryptographic signature
Re: SolrCloud - Replica 'down'. How to get it back as 'active'? - Solr 4.3.0
Thanks Anshum, autoSoftCommit was already set to 1000ms, but I changed the autoCommit to 3 minutes. I'll wait for it to come back. The index contains about 200,000 documents and the last commit was 14 hours ago, so I wonder how long it will take. I would have thought it would be back up already. On 31-7-2013 16:40, Anshum Gupta wrote: It perhaps is just replaying the transaction logs and coming up. Wait for it is what I'd say. The admin UI as of now doesn't show replaying of transaction log as 'recovering', it does so only during peer sync. Also, you may want to add autoSoftCommit and increase the autoCommit to a few minutes. On Wed, Jul 31, 2013 at 7:55 PM, Jeroen Steggink jer...@stegg-inc.com wrote: Hi, After the following error, one of the replicas of the leader went down. Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later. I increased the autoCommit time to 5000ms and restarted Solr. However, the status is still set to down. How do I get it back to active? Regards, Jeroen
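For reference, the settings Jeroen describes map to something like this in solrconfig.xml; the openSearcher=false line is a common companion setting (hard commits just flush to disk while soft commits handle visibility) rather than something stated in the thread:

    <autoCommit>
      <maxTime>180000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>1000</maxTime>
    </autoSoftCommit>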
Re: Solr PolyField
Hum, OK. Is it possible to add static text to a field - text that I write in the configuration, with another field appended to it? I saw something like CloneFieldProcessor, but when I start Solr it says it cannot find the class. I was trying to use processors to move one field to another. I saw this:

<processor class="solr.FieldCopyProcessorFactory">
  <str name="source">lastname firstname</str>
  <str name="dest">fullname</str>
  <bool name="append">true</bool>
  <str name="append.delim">, </str>
</processor>

But when I try to use it, Solr says it cannot find solr.FieldCopyProcessorFactory. I'm using Solr 4.4.0. Thanks ;) On Jul 31, 2013, at 4:16 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: OK, then I would suggest creating multiValued enclosure_type, etc. fields for searching, and then one string-typed field to store the JSON snippet you've been showing. Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions w: appinions.com http://www.appinions.com/ On Wed, Jul 31, 2013 at 11:11 AM, Luís Portela Afonso meligalet...@gmail.com wrote: As a single record? Hum, no. So an RSS feed has /rss/channel/ and then a lot of /rss/channel/item elements, right? Each /rss/channel/item is a new document in Solr. I started with the Solr RSS example, but I changed it to have more fields, different fields, and to get the feed URL from a database. So each /rss/channel/item is a document for indexing, but each /rss/channel/item can have more than one enclosure tag. Many thanks On Jul 31, 2013, at 4:05 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: So you're trying to index an RSS feed as a single record, but you want to be able to search for and retrieve individual entries from within the feed? Is that the issue? Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions w: appinions.com http://www.appinions.com/ On Wed, Jul 31, 2013 at 10:59 AM, Luís Portela Afonso meligalet...@gmail.com wrote: These fields can be multiValued. In the RSS standard it is not correct to do that, but some sources do, and I'd like to grab it all. Is there any way to make that possible? Once again, many thanks :) On Jul 31, 2013, at 3:54 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Luís, Is there a reason why splitting this up into enclosure_type, enclosure_url, and enclosure_length would not work? Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions w: appinions.com http://www.appinions.com/ On Wed, Jul 31, 2013 at 10:43 AM, Luís Portela Afonso meligalet...@gmail.com wrote: Hi, I'm trying to index information from RSS feeds. So, in a more detailed explanation: the RSS feed has something like: <enclosure url="http://www.engadget.com/podcasts/Engadget_Podcast_353.mp3" length="32642192" type="audio/mpeg"/> With my current configuration this works, and I get a result like: enclosure: [ "audio/mpeg", "http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3", 37521428 ], BUT, this is not the result I'm trying to reach. With that, I'm not able to tell reliably whether audio/mpeg is the type, the url, or the length. I want to reach
Re: Sharding with a SolrCloud
Well, assuming you have solved the differences in statistics between the index you maintain and the one in the cloud with respect to scoring... My comment about indexing is probably irrelevant; you're not indexing anything to the SolrCloud cluster. But I still doubt this will work. Here's the problem. Internally, the round trip looks like this:

node 1 receives the request
node 1 sends requests to all the shards
node 1 receives the top N docs from each shard
node 1 collates those into the real top N
node 1 then queries each shard for the docs hosted on those shards

This last step is where I'd expect just adding a shard to the list that happens to be a separate SolrCloud instance to fall down; the originating node would expect to just get the documents from the shard it knew about. And if you list _all_ the shards in the SolrCloud instance, then each of them will distribute the request to all shards in the SolrCloud instance, confusing things even more. Much of this is speculation, but I can imagine a number of ways this scenario would go bad; it wasn't one of the design goals as far as I know. Best Erick On Wed, Jul 31, 2013 at 11:01 AM, Oliver Goldschmidt o.goldschm...@tuhh.de wrote: Thank you very much for that information, Erick. That was what I was fearing... Well, the problem why I am trying to do this is that the SolrCloud is managed by someone else. We are indexing some content to a pretty small local index. To this index we have complete access and can do whatever we want. But we also need the separate index, which is now moving into the cloud. It's not possible to put our local content into the cloud, because we are not maintaining it and have no write permission to it. But why shouldn't that work? Isn't SolrCloud acting like one Solr server? The indices have to be maintained separately - can't I just continue using them as shards and get one result list from both of them (that's how I did it before they wanted to switch to SolrCloud)? Though, if there is no way to use the cloud as a shard, we will have to think about how to solve that. Of course we can split up the queries and make two queries (one for the cloud and one for our local index). But this might be a bit confusing for the user. Thank you again, best - Oliver On 31.07.2013 16:39, Erick Erickson wrote: You're in uncharted territory. I can imagine you using a SolrCloud cluster as a separate Solr for a federated search, but using it as a single shard just seems wrong. If nothing else, indexing to the shards will require that the documents be routed correctly. But having one shard in SolrCloud and another shard managed externally seems ripe for getting the docs indexed to various shards you're not expecting, unless you're using explicit routing. All in all, this _really_ sounds like something you should not be attempting. Why are you trying to do this? Is it possible to just set up a SolrCloud cluster and index all the docs to it and be done with it? 'cause I think you'll end up with endless problems given what you've described. Best Erick On Wed, Jul 31, 2013 at 5:16 AM, Oliver Goldschmidt o.goldschm...@tuhh.de wrote: Hi list, I have a Solr server which uses sharding to do distributed search with another Solr server. The other Solr server is now migrating to a SolrCloud system. I've been trying recently to continue searching the SolrCloud as a shard for my Solr server, but this is failing with mysterious effects.
I am getting a result with a number of hits, when I perform a search, but the results are not displayed at all. This is the resonse header I am getting from Solr: { responseHeader:{ status:0, QTime:305, params:{ facet:true, indent:yes, facet.mincount:1, facet.limit:30, qf:title_short^750 title_full_unstemmed^600, json.nl:arrarr, wt:json, rows:20, shards:ourindex.nowhere.de/solr/index, bq:format:Book^500, fl:*,score, facet.sort:count, start:0, q:xml, shards.info:true, facet.prefix:, facet.field:[publishDate], qt:dismax}}, shards.info:{ ourindex.nowhere.de/solr/index:{ numFound:10076, maxScore:8.507474, time:263}}, response:{numFound:10056,start:0,maxScore:8.507474,docs:[] } As you can see, there are no docs in the result. This result is not 100% reproducable: sometimes I get no results displayed, other times it works (with the same query URL!). As you also can see in the result, the number of hits in the response is a little bit less than the number of hits sent from the shard. This makes me wonder if it is not possible to use a Solr Cloud as a shard for another standalone Solr server? Any hint is appreciated! Best - Oliver -- Oliver Goldschmidt TU Hamburg-Harburg / Universitätsbibliothek / Digitale Dienste Denickestr. 22 21071 Hamburg - Harburg Tel.+49 (0)40
Re: Autowarming last 15 days data
On 7/31/2013 9:21 AM, Cool Techi wrote: Would it make sense if we open a newSearcher with just the last 15 days of documents, since these are the documents mostly used by our users? Also, how could I do this, if it is possible? When you open a searcher, it's for the entire index. You may want to go distributed and keep the newest 15 days of data in a separate index from the rest. For my own index, I use this hot/cold shard setup. I have a nightly process that indexes data that needs to be moved into the cold shards and deletes it from the hot shard. http://wiki.apache.org/solr/DistributedSearch SolrCloud is the future of distributed search, but it does not have built-in support for a hot/cold shard setup. You'd need to manage that yourself with manual sharding. A custom sharding plugin to automate indexing would likely be very involved; it would probably be easier to manage it outside of SolrCloud. Thanks, Shawn
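A sketch of what querying such a hot/cold layout looks like with old-style distributed search; the host and core names below are invented for illustration. The request goes to one core while the shards parameter fans it out to all of them:

    http://idxhost:8983/solr/hot/select?q=test&shards=idxhost:8983/solr/hot,idxhost:8983/solr/cold1,idxhost:8983/solr/cold2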
solr 4.4 multiple datasource connection
In my db-data-config.xml I have configured two datasources, each with its own parameters, for example:

<dataConfig>
  <dataSource name="test1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/firstdb" user="username1" password="psw1"/>
  <dataSource name="test2" type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/seconddb" user="username2" password="psw2"/>
  <document name="content">
    <entity name="news" datasource="test1" query="select...">
      <field column="OTYPE_ID" name="otypeID"/>
      <field column="NWS_ID" name="cntID"/>
    </entity>
    <entity name="news_update" datasource="test2" query="select...">
      <field column="OTYPE_ID" name="otypeID"/>
      <field column="NWS_ID" name="cntID"/>
    </entity>
  </document>
</dataConfig>

But when I execute the second entity's query from the dataimport page in Solr, it throws an exception: Table 'firstdb.secondTable' doesn't exist Could someone help me? Thank you in advance http://stackoverflow.com/questions/17974029/solr-4-4-multiple-datasource-connection
Re: solr 4.4 multiple datasource connection
On Wed, Jul 31, 2013 at 11:49 AM, Carmine Paternoster carmine...@gmail.com wrote: entity name=news datasource=test1 query=select... Try changing datasource to dataSource (note the capital S) in both entity declarations, i.e. in entity name=news and entity name=news_update. Regards, Alex. P.s. This check will (eventually) be part of SolrLint: https://github.com/arafalov/SolrLint/issues/7 Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
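In other words, the corrected entity declarations from the config above would read as follows; only the capitalization of the attribute changes, since DIH silently ignores the unknown datasource attribute and falls back to the first data source:

    <entity name="news" dataSource="test1" query="select...">
    <entity name="news_update" dataSource="test2" query="select...">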
Re: Solr PolyField
See: https://builds.apache.org/job/Solr-Artifacts-4.x/javadoc/solr-core/org/apache/solr/update/processor/CloneFieldUpdateProcessorFactory.html I have more examples in my book. -- Jack Krupansky From: Luís Portela Afonso Sent: Wednesday, July 31, 2013 11:41 AM To: solr-user@lucene.apache.org Subject: Re: Solr PolyField Hum, ok. It's possible to add to a field, static text? Text that i write on the configuration and then append another field? I saw something like CloneFieldProcessor but when i'm starting solr, it says that could not find the class. I was trying to use processors to move one field to another. I saw this: processor class=solr.FieldCopyProcessorFactory str name=sourcelastname firstname/str str name=destfullname/str bool name=appendtrue/bool str name=append.delim, /str /processor But when i try to use it solr says that he cannot find the solr.FieldCopyProcessorFactory. I'm using solr 4.4.0 Thanks ;) On Jul 31, 2013, at 4:16 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: OK, Then I would suggest creating multiValued enclosure_type, etc. tags for searching, and then one string-typed field to store the JSON snippet you've been showing. Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinionshttps://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/ On Wed, Jul 31, 2013 at 11:11 AM, Luís Portela Afonso meligalet...@gmail.com wrote: As a single record? Hum, no. So an Rss has /rss/channel/ and then lot of /rss/channel/item, right? Each /rss/channel/item is a new document on Solr. I start with the solr example rss, but i change that to has more fields, other fields and get the feed url from a database. So each /rss/channel/item is a document to the indexing, bue each /rss/channel/item can have more than on enclosure tag. Many thanks On Jul 31, 2013, at 4:05 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: So you're trying to index a RSS feed as a single record, but you want to be able to search for and retrieve individual entries from within the feed? Is that the issue? Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/ On Wed, Jul 31, 2013 at 10:59 AM, Luís Portela Afonso meligalet...@gmail.com wrote: This fields can be multiValued. I the rss standart there is not correct to do that, but some sources do and i like to grab it all. Is there any way that make it possible? Once again, Many thanks :) On Jul 31, 2013, at 3:54 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Luís, Is there a reason why splitting this up into enclosure_type, enclosure_url, and enclosure_length would not work? Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. 
“The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/ On Wed, Jul 31, 2013 at 10:43 AM, Luís Portela Afonso meligalet...@gmail.com wrote: Hi, I'm trying to index information of RSS Feeds. So in a more detailed explanation: The RSS feed has something like: enclosure url= http://www.engadget.com/podcasts/Engadget_Podcast_353.mp3; length=32642192 type=audio/mpeg/ *With my current configuration, this is working and i get a result like that:* - enclosure: [ - audio/mpeg, - http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3;, - 37521428 ], *BUT,* this is not the result that i'm trying to reach. With that i'm not able to know in a correct way, if audio/mpeg is the *type*, or the * url,* or the *length*. * * *I want to reach
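For reference, a minimal sketch of CloneFieldUpdateProcessorFactory in a Solr 4.x update chain; the chain name and the field names (enclosure_url, enclosure_all) are illustrative, and the chain only takes effect when referenced via the update.chain request parameter or the update handler's defaults:

    <updateRequestProcessorChain name="clone-enclosure">
      <processor class="solr.CloneFieldUpdateProcessorFactory">
        <str name="source">enclosure_url</str>
        <str name="dest">enclosure_all</str>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>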
Re: Solr Cloud Setup
Flavio, There was a problem with the solrconfig and schema files. One of the team members had deleted some entries in the solrconfig.xml, and I was picking up the same Solr configuration every time. I got the latest version of Solr, carefully edited the solrconfig and schema files, and it worked. We have the cloud up and running, testing is in progress, and it looks good. Thanks for all your help. -Aditya -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-Setup-tp4080182p4081654.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Measuring SOLR performance
Hi Dmitry, probably a mistake in the readme - try calling it with -q /home/dmitry/projects/lab/solrjmeter/queries/demo/demo.queries As for the base_url, I was testing it on Solr 4.0, where it tries contacting /solr/admin/system - is it different for 4.3? I guess I should make it configurable (it already is; the endpoint is set in check_options()) thanks roman On Wed, Jul 31, 2013 at 10:01 AM, Dmitry Kan solrexp...@gmail.com wrote: Ok, got the error fixed by modifying the base Solr URL in solrjmeter.py (added the core name after the /solr part). The next error is: WARNING: no test name(s) supplied nor found in: ['/home/dmitry/projects/lab/solrjmeter/demo/queries/demo.queries'] It is a 'slow start with a new tool' symptom, I guess.. :) On Wed, Jul 31, 2013 at 4:39 PM, Dmitry Kan solrexp...@gmail.com wrote: Hi Roman, What version and config of SOLR does the tool expect? Tried to run it, but got:

**ERROR**
  File "solrjmeter.py", line 1390, in <module>
    main(sys.argv)
  File "solrjmeter.py", line 1296, in main
    check_prerequisities(options)
  File "solrjmeter.py", line 351, in check_prerequisities
    error('Cannot contact: %s' % options.query_endpoint)
  File "solrjmeter.py", line 66, in error
    traceback.print_stack()
Cannot contact: http://localhost:8983/solr

It complains about the URL, yet clicking it leads properly to the admin page... solr 4.3.1, 2-core shard. Dmitry On Wed, Jul 31, 2013 at 3:59 AM, Roman Chyla roman.ch...@gmail.com wrote: Hello, I have been wanting some tools for measuring performance of SOLR, similar to Mike McCandless' lucene benchmark. So yet another monitor was born; it is described here: http://29min.wordpress.com/2013/07/31/measuring-solr-query-performance/ I tested it on the problem of garbage collectors (see the blog for details) and so far I can't conclude whether highly customized G1 is better than highly customized CMS, but I think interesting details can be seen there. Hope this helps someone, and of course, feel free to improve the tool and share! roman
RE: monitor jvm heap size for solrcloud
Thanks for all answers. We decided to use VisualVM with multiple remote connections. -Original Message- From: Utkarsh Sengar [mailto:utkarsh2...@gmail.com] Sent: Friday, July 26, 2013 6:19 PM To: solr-user@lucene.apache.org Subject: Re: monitor jvm heap size for solrcloud We have been using newrelic (they have a free plan too) and gives all needed info like: jvm heap usage in eden space, survivor space and old gen. Garbage collection info, detailed info about the solr requests and its response times, error rates etc. I highly recommend using newrelic to monitor your solr cluster: http://blog.newrelic.com/2010/05/11/got-apache-solr-search-server-use-rpm-to-monitor-troubleshoot-and-tune-solr-operations/ Thanks, -Utkarsh On Fri, Jul 26, 2013 at 2:38 PM, SolrLover bbar...@gmail.com wrote: I have used JMX with SOLR before.. http://docs.lucidworks.com/display/solr/Using+JMX+with+Solr -- View this message in context: http://lucene.472066.n3.nabble.com/monitor-jvm-heap-size-for-solrcloud-tp4080713p4080725.html Sent from the Solr - User mailing list archive at Nabble.com. -- Thanks, -Utkarsh
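For anyone replicating this setup: VisualVM attaches to each remote Solr instance via JMX once the JVM is started with remote JMX enabled, for example with flags like these (the port is arbitrary; disabling authentication and SSL is only sensible on a trusted network):

    -Dcom.sun.management.jmxremote \
    -Dcom.sun.management.jmxremote.port=18983 \
    -Dcom.sun.management.jmxremote.authenticate=false \
    -Dcom.sun.management.jmxremote.ssl=false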
Re: Does solr cloud support rename or swap function for collection?
This is awesome news. I had been looking for the ability to do this with SolrCloud since 4.0.0-ALPHA. We're on 4.1.0 right now, so this is a great reason to plan for an upgrade. Just to be clear, CREATEALIAS both creates and updates an alias, right? -- View this message in context: http://lucene.472066.n3.nabble.com/Does-solr-cloud-support-rename-or-swap-function-for-collection-tp4054193p4081660.html Sent from the Solr - User mailing list archive at Nabble.com.
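For reference, a CREATEALIAS call is a plain Collections API request; the alias and collection names below are invented, and pointing an existing alias at a different collection simply replaces the mapping:

    http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=items&collections=items_20130731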
upgrade from 4.3 to 4.4
We have a SolrCloud (4.3.0) cluster (5 shards and 2 replicas) on 10 boxes. We have about 450 million documents. We're planning to upgrade to Solr 4.4.0. Do we need to re-index already-indexed documents? Thanks!
Re: Measuring SOLR performance
I'll try to run it with the new parameters and let you know how it goes. I've rechecked the details for the default G1 garbage collector run and I can confirm that 2 out of 3 runs were showing high max response times, in some cases even 10 secs - but the customized G1 never did, so the parameters definitely had an effect: the max time for the customized G1 never went higher than 1.5 secs (and that happened for 2 query classes only). Both the customized CMS and the customized G1 are similar; G1 seems to have higher values in the max fields, but that may be random. So yes, I'm now fairly confident the default G1 settings deserve to be called 'bad', and these G1 parameters, even if they don't seem G1-specific, have a real effect. Thanks, roman On Tue, Jul 30, 2013 at 11:01 PM, Shawn Heisey s...@elyograg.org wrote: On 7/30/2013 6:59 PM, Roman Chyla wrote: I have been wanting some tools for measuring performance of SOLR, similar to Mike McCandless' lucene benchmark. So yet another monitor was born; it is described here: http://29min.wordpress.com/2013/07/31/measuring-solr-query-performance/ I tested it on the problem of garbage collectors (see the blog for details) and so far I can't conclude whether highly customized G1 is better than highly customized CMS, but I think interesting details can be seen there. Hope this helps someone, and of course, feel free to improve the tool and share! I have a CMS config that's even more tuned than before, and it has made things MUCH better. This new config is inspired by more info that I got on IRC: http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning The G1 customizations in your blog post don't look like they are really G1-specific - they may be useful with CMS as well. This statement also applies to some of the CMS parameters, so I would use those with G1 as well for any testing. UseNUMA looks interesting for machines that actually are NUMA. All the information that I can find says it is only for the throughput (parallel) collector, so it's probably not doing anything for G1. The pause parameters you've got for G1 are targets only. It will *try* to stick within those parameters, but if a collection requires more than 50 milliseconds or has to happen more often than once a second, the collector will ignore what you have told it. Thanks, Shawn
Re: SolrCloud Exception
Thanks shawn for the reply. I would upgrade to solr 4.3 and check that. On Wed, Jul 31, 2013 at 4:13 PM, Shawn Heisey s...@elyograg.org wrote: On 7/31/2013 4:27 AM, Sinduja Rajendran wrote: I am running solr 4.0 in a cloud. We have close to 100Mdocuments. The data is from a single DB table. I use dih. Our solrCloud has 3 zookeepers, one tomcat, 2 solr instances in same tomcat. We have 8 GB Ram. After indexing 14M, my indexing fails witht the below exception. solr org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: GC overhead limit exceeded I tried increasing the GC value to the App server -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 But after giving the command, my indexing went drastically down. Its was indexing only 15k documents for 20 minutes. Earlier it was 300k for 20 min. First thing to mention is that Solr 4.0 was extremely buggy, upgrading would be advisable. In the meantime: An OutOfMemoryError means that Solr needs more heap memory than the JVM is allowed to use. The Solr Admin UI dashboard will tell you how much memory is allocated to your JVM, which you can increase with the -Xmx parameter. Real RAM must be available from the system in order to increase the heap size. The options you have given just change the GC collector and tune one aspect of the new collector, they don't increase anything. Here are some things that may help you: http://wiki.apache.org/solr/SolrPerformanceProblems http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning After looking over that information and making adjustments, if you are still having trouble, we can go over your config and all your details to see what can be done. You said that both of your Solr instances are running in the same tomcat. Just FYI - because you aren't running all functions on separate hardware, your setup is not fault tolerant. Machine failures DO happen, no matter how much redundancy you build into that server. If you are running all this on a redundant VM solution that has live migration of running VMs, then my statement isn't accurate. Thanks, Shawn
Re: upgrade from 4.3 to 4.4
A dot release should never require reindexing, unless... there is some change in a field type analyzer or update processor that your data depends on. For example, some changes occurred in the ngram filter, so whether that would impact your data is up to you to decide. See: https://issues.apache.org/jira/browse/LUCENE-4955 There were a few other changes as well - you need to review each change yourself. -- Jack Krupansky -Original Message- From: Joshi, Shital Sent: Wednesday, July 31, 2013 12:31 PM To: 'solr-user@lucene.apache.org' Subject: upgrade from 4.3 to 4.4 We have SolrCloud (4.3.0) cluster (5 shards and 2 replicas) on 10 boxes. We have about 450 million documents. We're planning to upgrade to Solr 4.4.0. Do We need to re-index already indexed documents? Thanks!
RE: Measuring SOLR performance
Did you also test indexing speed? With default G1GC settings we're seeing a slightly higher latency for queries than CMS. However, G1GC allows for much higher throughput than CMS when indexing. I haven't got the raw numbers here but it is roughly 45 minutes against 60 in favour of G1GC! Load is obviously higher with G1GC. -Original message- From:Roman Chyla roman.ch...@gmail.com Sent: Wednesday 31st July 2013 18:32 To: solr-user@lucene.apache.org Subject: Re: Measuring SOLR performance I'll try to run it with the new parameters and let you know how it goes. I've rechecked details for the G1 (default) garbage collector run and I can confirm that 2 out of 3 runs were showing high max response times, in some cases even 10secs, but the customized G1 never - so definitely the parameters had effect because the max time for the customized G1 never went higher than 1.5secs (and that happend for 2 query classes only). Both the cms-custom and G1-custom are similar, the G1 seems to have higher values in the max fields, but that may be random. So, yes, now I am sure what to think of default G1 as 'bad', and that these G1 parameters, even if they don't seem G1 specific, have real effect. Thanks, roman On Tue, Jul 30, 2013 at 11:01 PM, Shawn Heisey s...@elyograg.org wrote: On 7/30/2013 6:59 PM, Roman Chyla wrote: I have been wanting some tools for measuring performance of SOLR, similar to Mike McCandles' lucene benchmark. so yet another monitor was born, is described here: http://29min.wordpress.com/2013/07/31/measuring-solr-query-performance/ I tested it on the problem of garbage collectors (see the blogs for details) and so far I can't conclude whether highly customized G1 is better than highly customized CMS, but I think interesting details can be seen there. Hope this helps someone, and of course, feel free to improve the tool and share! I have a CMS config that's even more tuned than before, and it has made things MUCH better. This new config is inspired by more info that I got on IRC: http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning The G1 customizations in your blog post don't look like they are really G1-specific - they may be useful with CMS as well. This statement also applies to some of the CMS parameters, so I would use those with G1 as well for any testing. UseNUMA looks interesting for machines that actually are NUMA. All the information that I can find says it is only for the throughput (parallel) collector, so it's probably not doing anything for G1. The pause parameters you've got for G1 are targets only. It will *try* to stick within those parameters, but if a collection requires more than 50 milliseconds or has to happen more often than once a second, the collector will ignore what you have told it. Thanks, Shawn
Re: Solr PolyField
Ok, thanks. I will check it. On Jul 31, 2013, at 5:08 PM, Jack Krupansky j...@basetechnology.com wrote: See: https://builds.apache.org/job/Solr-Artifacts-4.x/javadoc/solr-core/org/apache/solr/update/processor/CloneFieldUpdateProcessorFactory.html I have more examples in my book. -- Jack Krupansky From: Luís Portela Afonso Sent: Wednesday, July 31, 2013 11:41 AM To: solr-user@lucene.apache.org Subject: Re: Solr PolyField Hmm, ok. Is it possible to add static text to a field -- text that I write in the configuration -- and then append another field? I saw something like CloneFieldProcessor, but when I start Solr it says that it could not find the class. I was trying to use processors to move one field to another. I saw this:

<processor class="solr.FieldCopyProcessorFactory">
  <str name="source">lastname firstname</str>
  <str name="dest">fullname</str>
  <bool name="append">true</bool>
  <str name="append.delim">, </str>
</processor>

But when I try to use it, Solr says that it cannot find solr.FieldCopyProcessorFactory. I'm using Solr 4.4.0. Thanks ;) On Jul 31, 2013, at 4:16 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: OK. Then I would suggest creating multiValued enclosure_type, etc. tags for searching, and then one string-typed field to store the JSON snippet you've been showing. Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions w: appinions.com http://www.appinions.com/ On Wed, Jul 31, 2013 at 11:11 AM, Luís Portela Afonso meligalet...@gmail.com wrote: As a single record? Hmm, no. An RSS feed has /rss/channel/ and then lots of /rss/channel/item elements, right? Each /rss/channel/item is a new document in Solr. I started with the Solr RSS example, but I changed it to have more fields -- other fields -- and to get the feed URL from a database. So each /rss/channel/item is a document for indexing, but each /rss/channel/item can have more than one enclosure tag. Many thanks On Jul 31, 2013, at 4:05 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: So you're trying to index an RSS feed as a single record, but you want to be able to search for and retrieve individual entries from within the feed? Is that the issue? Michael Della Bitta, appinions inc. On Wed, Jul 31, 2013 at 10:59 AM, Luís Portela Afonso meligalet...@gmail.com wrote: These fields can be multiValued. In the RSS standard it is not correct to do that, but some sources do, and I'd like to grab it all. Is there any way to make that possible? Once again, many thanks :) On Jul 31, 2013, at 3:54 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Luís, Is there a reason why splitting this up into enclosure_type, enclosure_url, and enclosure_length would not work? Michael Della Bitta, appinions inc. On Wed, Jul 31, 2013 at 10:43 AM, Luís Portela Afonso meligalet...@gmail.com wrote: Hi, I'm trying to index information from RSS feeds. So, in a more detailed explanation: the RSS feed has something like:

<enclosure url="http://www.engadget.com/podcasts/Engadget_Podcast_353.mp3" length="32642192" type="audio/mpeg"/>

With my current configuration, this is working and I get a result like this:

"enclosure": [
  "audio/mpeg",
  "http://www.engadget.com/podcasts/EngadgetHD_Podcast_359.mp3",
  37521428
],

BUT, this is not the result that I'm trying to
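On the processor question in the thread above: there is no solr.FieldCopyProcessorFactory in Solr 4.4, which is why the class can't be found. The shipped equivalent is CloneFieldUpdateProcessorFactory (to copy the source fields) followed by ConcatFieldUpdateProcessorFactory (to join the copied values with a delimiter). A rough solrconfig.xml sketch, using the field names from the example above, not tested against 4.4:

<updateRequestProcessorChain name="clone-fullname">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">firstname</str>
    <str name="source">lastname</str>
    <str name="dest">fullname</str>
  </processor>
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">fullname</str>
    <str name="delimiter">, </str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

The chain would then be selected per update request with update.chain=clone-fullname, or made the default on the update handler.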
RE: Ingesting geo data into Solr very slow
Hi guys, Here is the reply I got from the solr group. I'll change those settings. It's good to know that it doesn't matter if we use the bean vs solr doc. -Marta -Original Message- From: David Smiley (@MITRE.org) [mailto:dsmi...@mitre.org] Sent: Tuesday, July 30, 2013 9:08 PM To: solr-user@lucene.apache.org Subject: Re: Ingesting geo data into Solr very slow Hi Marta, Presumably you are indexing polygons -- I suspect complex ones. There isn't too much that you can do about this right now other than index them in parallel. I see you are doing this in 2 threads; try 4, or maybe even 6. Also, ensure that maxDistErr is reflective of the smallest distance you need to distinguish between. It may help a little but not much. I can think of some internal code details that might be improved but that doesn't help you now. There are some generic Solr things you can do to improve indexing performance too, like increasing the indexing buffer size (100MB - 200MB) and the mergeFactor (10-20, albeit temporarily, and/or issue optimize), both in solrconfig.xml. Changing the servlet engine won't help. Calling server.addBean(item) isn't a problem either. ~ David Simonian, Marta M (US SSA) wrote Hi, We are using Solr 4.4 to ingest geo data and it's really slow. When we don't index the geo it takes seconds to ingest 100,000 records, but as soon as we add it, it takes 2 hours. Also we found that when changing the distErrPct from 0.025 to 0.1, 1000 rows are ingested in 20 sec vs 2 min. But we can't change that setting as we want our search to be as accurate as possible. About the environment: we are running Solr on 6 CPUs and 8GB of memory. We've been monitoring the VMs and they seem to be ok. We are running on Tomcat but we might switch to Jetty to see if that will increase the performance. We use ConcurrentUpdateSolrServer(httpSolrServer, 5000, 2); We are saving a bean rather than a solr document (server.addBean(item)). I'm not sure if that could make it slow as it's going to do some conversion? Can you please let me know what are the best settings for Solr? Maybe some changes in the solrconfig.xml or the schema.xml? What are the preferred environment settings and resources? Thank you! Marta - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- View this message in context: http://lucene.472066.n3.nabble.com/Ingesting-geo-data-into-Solr-very-slow-tp4081484p4081527.html Sent from the Solr - User mailing list archive at Nabble.com.
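To put numbers on David's suggestions, the knobs he mentions live in solrconfig.xml and in the spatial field type in schema.xml. A sketch with illustrative values only (the field type name and thresholds are assumptions, not a recommendation):

<!-- solrconfig.xml: larger indexing buffer, more segments before merging -->
<ramBufferSizeMB>200</ramBufferSizeMB>
<mergeFactor>20</mergeFactor>

<!-- schema.xml: maxDistErr is the smallest distance worth distinguishing;
     ~0.000009 degrees is roughly one meter -->
<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
           spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
           distErrPct="0.025" maxDistErr="0.000009" units="degrees"/>

Making maxDistErr coarser reduces the number of grid cells generated per polygon, which is where most of the indexing time goes.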
RE: Ingesting geo data into Solr very slow
Does anybody know if Solr performs better on Jetty vs Tomcat? -Original Message- From: David Smiley (@MITRE.org) [mailto:dsmi...@mitre.org] Sent: Tuesday, July 30, 2013 9:08 PM To: solr-user@lucene.apache.org Subject: Re: Ingesting geo data into Solr very slow ...
Re: Ingesting geo data into Solr very slow
On 7/31/2013 11:20 AM, Simonian, Marta M (US SSA) wrote: Does anybody know if Solr performs better on Jetty vs Tomcat? Jetty has less complexity than Tomcat. It is likely to use less memory. If you went with default settings for both, Jetty is likely to perform better, but the difference would probably be very small. If you understand how to tune your servlet container, then there's no way to answer that question. You should use whatever you are comfortable with. A well-tuned Tomcat server would probably perform better than the default example Jetty - but you have to do that tuning. The only concrete information I can give you is this: Solr tests use Jetty, so Jetty is the only container that is fully tested with Solr. Bugs *have* been found with other containers, and they get fixed as fast as possible. The other point worth reiterating: unless you carefully tune your container, something this list can't really help you with, the container choice probably isn't going to affect performance much. Thanks, Shawn
Alternative searches
Can someone explain how one would go about providing alternative searches for a query… similar to Amazon. For example, say I search for Red Dump Truck:
- 0 results for Red Dump Truck
- 500 results for Red Truck
- 350 results for Dump Truck
Does this require multiple searches? Thanks
Re: Solr list all records but fq matching records first
I was going to say 10, but frequently people find that they need a really big boost. Normally, a boost might be 1.5 or 2 or 5, or something like that. A fractional boost, like 0.5, 0.25, 0.1, or even 0.01 can de-emphasize terms. If you add debugQuery=true to your query request and look at the explain section, you can see all the scores and intermediate scores to get an idea how big a boost a document needs to make it move as desired. -- Jack Krupansky -Original Message- From: Thyagaraj Sent: Wednesday, July 31, 2013 1:34 PM To: solr-user@lucene.apache.org Subject: Re: Solr list all records but fq matching records first Awesome Jack Krupansky-2!!!. It seems to work!. What I didn't understand is *^100*. Could you give some explanation on ^100 please? if it could be any number other than 100?. Thanks a lot!, I was working on this for past 3 days!. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-list-all-records-but-fq-matching-records-first-tp4081572p4081677.html Sent from the Solr - User mailing list archive at Nabble.com.
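To make the pattern concrete, a hedged sketch with a hypothetical field and value: to return all records but float the matching ones to the top, the query pairs a boosted clause with a match-all clause, and debugQuery exposes the resulting scores:

q=(category:electronics)^100 OR (*:*)&debugQuery=true

Any document matching category:electronics picks up the ^100 boost and outscores the documents that only match *:*; the right boost value is whatever the explain output shows is large enough to reorder things as desired.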
Re: Solr list all records but fq matching records first
Awesome Jack Krupansky-2!!!. It seems to work!. What I didn't understand is *^100*. Could you give some explanation on ^100 please? if it could be any number other than 100?. Thanks a lot!, I was working on this for past 3 days!. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-list-all-records-but-fq-matching-records-first-tp4081572p4081677.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: TrieField and FieldCache confusion
: Can I expect the FieldCache of Lucene to return the correct values when : working : with TrieField with the precisionStep higher than 0. If not, what did I get : wrong? Yes -- the code for building FieldCaches from Trie fields is smart enough to ensure that only the real original values are used to populate the Cache (See for example: FieldCache.NUMERIC_UTILS_INT_PARSER and the classes linked to from its javadocs)... https://lucene.apache.org/core/4_4_0/core/org/apache/lucene/search/FieldCache.html#NUMERIC_UTILS_INT_PARSER https://lucene.apache.org/core/4_4_0/core/org/apache/lucene/util/NumericUtils.html https://lucene.apache.org/core/4_4_0/core/org/apache/lucene/document/IntField.html (Solr's Trie fields are backed by the various numeric fields in Lucene -- i.e. Solr's TrieIntField is backed by Lucene's IntField. The Trie* prefix is used in Solr because Solr already had classes named IntField, DoubleField, etc. when the Trie-based impls were added to Lucene.) -Hoss
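For anyone who wants to see this from the Lucene side, a minimal Java sketch against the 4.4 API (the field name is hypothetical):

import java.io.IOException;
import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.search.FieldCache;

public class TrieCacheExample {
  // Reads a single int value from the FieldCache for one segment.
  // NUMERIC_UTILS_INT_PARSER skips the extra lower-precision trie terms,
  // so only the original indexed values populate the cache.
  static int readIntField(AtomicReader segmentReader, int docId) throws IOException {
    FieldCache.Ints values = FieldCache.DEFAULT.getInts(
        segmentReader, "myIntField", FieldCache.NUMERIC_UTILS_INT_PARSER, false);
    return values.get(docId); // docId is segment-local
  }
}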
Re: Improper shutdown of Solr in Jetty 9
: it's Windows 7. I'm starting Jetty with java -jar start.jar Not sure if you are using cygwin, or if this is related but... https://issues.apache.org/jira/browse/SOLR-3884 https://issues.apache.org/jira/browse/SOLR-3884?focusedCommentId=13462996page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13462996 https://issues.apache.org/jira/browse/SOLR-3884?focusedCommentId=13463332page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463332 http://cygwin.com/ml/cygwin/2012-07/msg00250.html http://cygwin.com/ml/cygwin/2012-05/msg00482.html -Hoss
Re: queryResultCache showing all zeros
: We just configured a new Solr cloud (5 nodes) running Solr 4.3, ran : about 200 000 queries taken from our production environment and measured : the performance of the cloud over a collection of 14M documents with the : default Solr settings. We are now trying to tune the different caches : and when I look at each node of the cloud, all of them are showing no : activity (see below) regarding the queryResultCache... all other caches : are showing some activity. Any idea what could cause this? Can you show us some examples of the types of queries you are executing? Do you have useFilterForSortedQuery in your solrconfig.xml? -Hoss
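For orientation, the cache being discussed is configured in solrconfig.xml; stock-example-style values look like this (sizes are illustrative, not a recommendation):

<queryResultCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="0"/>

<!-- if enabled, eligible sorted queries can be answered via the
     filterCache instead of the queryResultCache -->
<useFilterForSortedQuery>true</useFilterForSortedQuery>

which is why Hoss asks about useFilterForSortedQuery: with it on, traffic that would otherwise hit the queryResultCache may never touch it.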
Re: FieldCollapsing issues in SolrCloud 4.4
Hello Paul, Can you please explain what you mean by: To get the exact number of groups, you need to shard along your grouping field Thanks! :) On Wed, Jul 31, 2013 at 3:08 AM, Paul Masurel paul.masu...@gmail.com wrote: Do you mean you get different results with group=true? numFound is supposed to return the number of ungrouped hits. To get the number of groups, you are expected to set group.ngroups=true. Even then, the result will only give you an upper bound in a distributed environment. To get the exact number of groups, you need to shard along your grouping field. If you have many groups, you may also experience a huge performance hit, as the current implementation has been heavily optimized for a low number of groups (e.g. e-commerce categories). Paul On Wed, Jul 31, 2013 at 1:59 AM, Ali, Saqib docbook@gmail.com wrote: Hello all, Is anyone experiencing issues with the numFound when using group=true in SolrCloud 4.4? Sometimes the results are off for us. I will post more details shortly. Thanks. -- __ Masurel Paul e-mail: paul.masu...@gmail.com
Re: Sending shard requests to all replicas
Thanks to Ryan Ernst, my issue is a duplicate of SOLR-4449. I think that this proposal might be very useful (some supporting links are attached there; worth reading...) On Tue, Jul 30, 2013 at 11:49 PM, Isaac Hebsh isaac.he...@gmail.com wrote: Hi, I submitted a new JIRA for this: https://issues.apache.org/jira/browse/SOLR-5092 A (very initial) patch is already attached. Reviews are very welcome. On Sun, Jul 28, 2013 at 4:50 PM, Erick Erickson erickerick...@gmail.com wrote: You'd probably start in CloudSolrServer in SolrJ code, as far as I know that's where the request is sent out. I'd think that would be better than changing Solr itself since if you found that this was useful you wouldn't be patching your Solr release, just keeping your client up to date. Best Erick On Sat, Jul 27, 2013 at 7:28 PM, Isaac Hebsh isaac.he...@gmail.com wrote: Shawn, thank you for the tips. I know the significant cons of virtualization, but I don't want to move this thread into a virtualization pros/cons discussion in the Solr(Cloud) case. I've just asked what minimal code change should be made, in order to examine whether this is a possible solution or not.. :) On Sun, Jul 28, 2013 at 1:06 AM, Shawn Heisey s...@elyograg.org wrote: On 7/27/2013 3:33 PM, Isaac Hebsh wrote: I have about 40 shards. repFactor=2. The cause of slower shards is very interesting, and this is the main approach we took. Note that in every query, it is another shard which is the slowest. In 20% of the queries, the slowest shard takes about 4 times more than the average shard qtime. While continuing investigation, remember it might be the virtualization / storage-access / network / gc /..., so I thought that reducing the effect of the slow shards might be a good (temporary or permanent) solution. Virtualization is not the best approach for Solr. Assuming you're dealing with your own hardware and not something based in the cloud like Amazon, you can get better results by running on bare metal and having multiple shards per host. Garbage collection is a very likely source of this problem. http://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems I thought it should be an almost trivial code change (for proving the concept). Isn't it? I have no idea what you're saying/asking here. Can you clarify? It seems to me that sending requests to all replicas would just increase the overall load on the cluster, with no real benefit. Thanks, Shawn
RE: queryResultCache showing all zeros
Looks like the problem might not be related to Solr but to a proprietary system we have on top of it. I made some queries with facets and the cache was updated. We are looking into this... I should not have assumed that the problem was coming from Solr ;) I'll let you know if there is anything From: Chris Hostetter Sent: Wednesday, July 31, 2013 1:58 PM To: solr-user@lucene.apache.org Subject: Re: queryResultCache showing all zeros : We just configured a new Solr cloud (5 nodes) running Solr 4.3, ran : about 200 000 queries taken from our production environment ... : are showing some activity. Any idea what could cause this? Can you show us some examples of the types of queries you are executing? Do you have useFilterForSortedQuery in your solrconfig.xml? -Hoss
RE: Highlighting externally stored text
Hey Bryan, Thanks for the response! To make use of the FastVectorHighlighter you need to enable termVectors, termPositions, and termOffsets, correct? That takes a considerable amount of space, but it is good to know, and I may possibly pursue this solution as well. Just starting to look at the code now -- do you remember how substantial the change was? Are there any other options? -- View this message in context: http://lucene.472066.n3.nabble.com/Highlighting-externally-stored-text-tp4078387p4081719.html Sent from the Solr - User mailing list archive at Nabble.com.
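For concreteness, enabling the FastVectorHighlighter prerequisites is a per-field schema.xml change along these lines (field name hypothetical; it requires a reindex):

<field name="content" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>

At query time the FVH is then selected with hl=true&hl.useFastVectorHighlighter=true.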
RE: Alternative searches
Hi Mark Yes, it is something we implemented also. We just try various subsets of the search terms when there are zero results. To increase performance for all these searches, we return only the first three results and no facets, so we can simply display the result counts for the various subsets of the original search terms. We only do this if the first search had zero results and then a double metaphone search (which is how we handle misspelled terms) also returned nothing. We also apply various heuristics to the alternative searches being performed, like no one-word searches if the original search had many words, etc. Thanks Robi -Original Message- From: Mark [mailto:static.void@gmail.com] Sent: Wednesday, July 31, 2013 10:35 AM To: solr-user@lucene.apache.org Subject: Alternative searches Can someone explain how one would go about providing alternative searches for a query... similar to Amazon. For example say I search for Red Dump Truck - 0 results for Red Dump Truck - 500 results for Red Truck - 350 results for Dump Truck Does this require multiple searches? Thanks
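A rough SolrJ 4.x sketch of that fallback flow (the server URL, the hard-coded term subsets, and the stop condition are all assumptions for illustration):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FallbackSearch {
  public static void main(String[] args) throws SolrServerException {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    // original query first, then cheaper subset queries
    String[] candidates = { "red dump truck", "red truck", "dump truck" };
    for (int i = 0; i < candidates.length; i++) {
      SolrQuery q = new SolrQuery(candidates[i]);
      q.setRows(3);       // only need a short preview per alternative
      q.setFacet(false);  // skip facets to keep the fallback queries cheap
      QueryResponse rsp = server.query(q);
      long hits = rsp.getResults().getNumFound();
      System.out.println(hits + " results for \"" + candidates[i] + "\"");
      if (i == 0 && hits > 0) break; // original query matched; no fallback needed
    }
  }
}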
RE: queryResultCache showing all zeros
Ok, I might have found a Solr issue after I fixed a problem in our system. This is the kind of query we are making:

http://10.0.5.214:8201/solr/Current/select?
  fq=position_refreshed_date_id:[2747%20TO%203501]
  &fq=position_soc_2011_8_code:41101100
  &fq=country_id:1
  &fq=position_job_type_id:4
  &fq=position_education_level_id:8
  &fq=position_salary_range_id:2
  &fq=is_dirty:false
  &fq=is_staffing:false
  &fq=-position_soc_2011_2_code:99
  &fq=-covering_source_id:(839%20OR%201145%20OR%2025%20OR%20802%20OR%20777%20OR%2085%20OR%20881%20OR%20775%20OR%201558%20OR%20743%20OR%20800%20OR%201580%20OR%201147%20OR%201690%20OR%20674%20OR%20894%20OR%20791)
  &q=%20(title:photographer%20OR%20ad_description:photographer%20OR%20super_alias:photographer)%20AND%20(_val_:%22sum(product(75,div(5000,sum(50,sub(3500,position_refreshed_date_id,product(0.75,job_score),product(0.75,source_score))%22)
  &facet=true
  &facet.mincount=1
  &f.state_id.facet.limit=10
  &facet.field=state_id
  &facet.field=position_salary_range_id
  &facet.field=position_job_type_id
  &facet.field=position_naics_6_code
  &facet.field=place_id
  &facet.field=position_education_level_id
  &facet.field=position_soc_2011_8_code
  &f.position_salary_range_id.facet.limit=10
  &f.position_job_type_id.facet.limit=10
  &f.position_naics_6_code.facet.limit=10
  &f.place_id.facet.limit=10
  &f.position_education_level_id.facet.limit=10
  &f.position_soc_2011_8_code.facet.limit=10
  &rows=10
  &start=0
  &fl=job_id,position_id,super_alias_id,advertiser,super_alias,credited_source_id,position_first_seen_date_id,position_last_seen_date_id,position_posted_date_id,position_refreshed_date_id,position_job_type_id,position_function_id,position_green_code,title_id,semi_clean_title_id,clean_title_id,position_empl_count,place_id,state_id,county_id,msa_id,country_id,position_id,position_job_type_mva,ad_activity_status_id,position_score,ad_score,position_salary,position_salary_range_id,position_salary_source,position_naics_6_code,position_education_level_id,is_staffing,is_bulk,is_anonymous,is_third_party,is_dirty,ref_num,tags,lat,long,position_duns_number,url,advertiser_id,title,semi_clean_title,ad_description,position_description,ad_bls_salary,position_bls_salary,covering_source_id,content_model_id,position_soc_2011_8_code,position_noc_2006_4_id
  &group.field=position_id
  &group=true
  &group.ngroups=true
  &group.main=true
  &sort=score%20desc

It's quite long, but this request uses both faceting and grouping. If I remove the grouping then the cache is used. Is this normal behavior or a bug? Thanks From: Jean-Sebastien Vachon Sent: Wednesday, July 31, 2013 2:38 PM To: solr-user@lucene.apache.org Subject: RE: queryResultCache showing all zeros ...
RE: queryResultCache showing all zeros
Also we do not have any useFilterForSortedQuery in our config. So we are relying on the default, which I guess is false. From: Jean-Sebastien Vachon Sent: Wednesday, July 31, 2013 3:44 PM To: solr-user@lucene.apache.org Subject: RE: queryResultCache showing all zeros ...
RE: queryResultCache showing all zeros
: it's quite long but this request uses both faceting and grouping. If I : remove the grouping then the cache is used. Is this a normal behavior or : a bug? I believe that is expected -- I don't think grouping can take advantage of the queryResultCache because of how it collects documents. There is, however, a group.cache.percent option that you might look into -- but I honestly have no idea if that toggles the use of queryResultCache or something else; I haven't played with it before... https://wiki.apache.org/solr/FieldCollapsing#Request_Parameters -Hoss
Re: Performance question on Spatial Search
the list of IDs does change relatively frequently, but this doesn't seem to have very much impact on the performance of the query as far as I can tell. attached are the stacks thanks, steve On Wed, Jul 31, 2013 at 6:33 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: On Wed, Jul 31, 2013 at 1:10 AM, Steven Bower sbo...@alcyon.net wrote: not sure what you mean by good hit ratio? I mean such queries are really expensive (even on a cache hit), so if the list of ids changes every time, it never hits the cache and hence executes these heavy queries every time. It's a well-known performance problem. Here are the stacks... They seem like hotspots, and show index reading, which is reasonable. But I can't see what caused these reads; to get that I need the whole stack of the hot thread.

Name / Time (ms) / Own Time (ms):
org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(AtomicReaderContext, Bits)  300879  203478
org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.nextDoc()  45539  19
org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.refillDocs()  45519  40
org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.readVIntBlock(IndexInput, int[], int[], int, boolean)  24352  0
org.apache.lucene.store.DataInput.readVInt()  24352  24352
org.apache.lucene.codecs.lucene41.ForUtil.readBlock(IndexInput, byte[], int[])  21126  14976
org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)  6150  0
java.nio.DirectByteBuffer.get(byte[], int, int)  6150  0
java.nio.Bits.copyToArray(long, Object, long, long, long)  6150  6150
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.docs(Bits, DocsEnum, int)  35342  421
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.decodeMetaData()  34920  27939
org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.nextTerm(FieldInfo, BlockTermState)  6980  6980
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.next()  14129  1053
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadNextFloorBlock()  5948  261
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock()  5686  199
org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)  3606  0
java.nio.DirectByteBuffer.get(byte[], int, int)  3606  0
java.nio.Bits.copyToArray(long, Object, long, long, long)  3606  3606
org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.readTermsBlock(IndexInput, FieldInfo, BlockTermState)  1879  80
org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)  1798  0
java.nio.DirectByteBuffer.get(byte[], int, int)  1798  0
java.nio.Bits.copyToArray(long, Object, long, long, long)  1798  1798
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.next()  4010  3324
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.nextNonLeaf()  685  685
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock()  3117  144
org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)  1861  0
java.nio.DirectByteBuffer.get(byte[], int, int)  1861  0
java.nio.Bits.copyToArray(long, Object, long, long, long)  1861  1861
org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.readTermsBlock(IndexInput, FieldInfo, BlockTermState)  1090  19
org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)  1070  0
java.nio.DirectByteBuffer.get(byte[], int, int)  1070  0
java.nio.Bits.copyToArray(long, Object, long, long, long)  1070  1070
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.initIndexInput()  20  0
org.apache.lucene.store.ByteBufferIndexInput.clone()  20  0
org.apache.lucene.store.ByteBufferIndexInput.clone()  20  0
org.apache.lucene.store.ByteBufferIndexInput.buildSlice(long, long)  20  0
org.apache.lucene.util.WeakIdentityMap.put(Object, Object)  20  0
org.apache.lucene.util.WeakIdentityMap$IdentityWeakReference.<init>(Object, ReferenceQueue)  20  0
java.lang.System.identityHashCode(Object)  20  20
org.apache.lucene.index.FilteredTermsEnum.docs(Bits, DocsEnum, int)  1485  527
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.docs(Bits, DocsEnum, int)  957  0
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.decodeMetaData()  957  513
org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.nextTerm(FieldInfo, BlockTermState)  443  443
org.apache.lucene.index.FilteredTermsEnum.next()  874  324
org.apache.lucene.search.NumericRangeQuery$NumericRangeTermsEnum.accept(BytesRef)  368  0
org.apache.lucene.util.BytesRef$UTF8SortedAsUnicodeComparator.compare(Object, Object)  368
Re: Auto Correction of Solr Query
Hi Siva, I think I mentioned this several days ago... DYM ReSearcher will do that: http://sematext.com/products/dym-researcher/index.html Otis On Tuesday, July 30, 2013, sivaprasad wrote: Hi, Is there any way to auto-correct the Solr query and get the results? For example, a user tries to search for beats by dre, but by mistake he typed beats bt dre. In this case, Solr should correct the query and return the results for beats by dre. Are there any suggestions on how we can achieve this? -Siva -- View this message in context: http://lucene.472066.n3.nabble.com/Auto-Correction-of-Solr-Query-tp4081220.html Sent from the Solr - User mailing list archive at Nabble.com. -- Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm
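For anyone who would rather stay with stock Solr than add ReSearcher, the usual building block is the spellcheck component plus collation, roughly as follows (the field name and numbers are assumptions):

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">name</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
  </lst>
</searchComponent>

with spellcheck=true&spellcheck.collate=true&spellcheck.maxCollationTries=5 on the request. The application then re-issues the returned collation automatically when the original query had zero hits; that re-issue step is the part Solr does not do for you out of the box.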
RE: Highlighting externally stored text
Just an update. The change was pretty straightforward (at least for my simple test case); just a few lines in the getBestFragments method seemed to do the trick. -- View this message in context: http://lucene.472066.n3.nabble.com/Highlighting-externally-stored-text-tp4078387p4081748.html Sent from the Solr - User mailing list archive at Nabble.com.
Inconsistent facet ranges when using distributed search in Solr 4.3
Hi all, I am seeing some inconsistent behavior with facets, specifically range facets, on Solr 4.3. Running the same query several times (pressing F5 on the browser) produces different facet ranges when doing distributed searches, as sometimes it doesn't include some of the buckets. The results of the search are always correct as far as I can tell; it is just the range facets that sometimes miss ranges. Has anyone seen this behavior in Solr before? Any recommendations on how to troubleshoot this issue? Here are some details and an example. As an example of what I am seeing, take this query, in which I'll be faceting on the docnumber field:

http://SERVER:8081/solr/shard1/myhandler?
  shards=SERVER:8081/solr/shard1,SERVER:8081/solr/shard2,SERVER:8081/solr/shard3
  &shards.qt=myhandler
  &facet=true
  &facet.field=docnumber
  &f.docnumber.facet.sort=index
  &facet.range=docnumber
  &f.docnumber.facet.range.start=0
  &f.docnumber.facet.range.gap=100
  &f.docnumber.facet.range.end=10
  &f.docnumber.facet.limit=1000
  &facet.mincount=1
  &q=type:document
  &wt=xml

When I run it, I get one of the following three responses, seemingly at random (I haven't been able to notice a pattern so far):

1. I get 859 results (correct), but nothing in the facet ranges:
...
<result name="response" numFound="859" start="0" maxScore="8.006225">
...
<lst name="facet_ranges">
  <lst name="docnumber">
    <lst name="counts"/>
    <int name="gap">100</int>
    <int name="start">0</int>
    <int name="end">10</int>
  </lst>
</lst>

2. I get 859 results (correct), and the correct number of facets comes up in the facet ranges (118+109+119+122+134+100+100+57=859):
...
<result name="response" numFound="859" start="0" maxScore="8.006225">
...
<lst name="facet_ranges">
  <lst name="docnumber">
    <lst name="counts">
      <int name="0">118</int>
      <int name="100">109</int>
      <int name="200">119</int>
      <int name="300">122</int>
      <int name="400">134</int>
      <int name="500">100</int>
      <int name="600">100</int>
      <int name="700">57</int>
    </lst>
    <int name="gap">100</int>
    <int name="start">0</int>
    <int name="end">10</int>
  </lst>
</lst>

3. I get 859 results (correct), and only a partial set of facet ranges (118+109+119+122+134=602 vs. 859 results):
...
<result name="response" numFound="859" start="0" maxScore="8.006225">
...
<lst name="facet_ranges">
  <lst name="docnumber">
    <lst name="counts">
      <int name="0">118</int>
      <int name="100">109</int>
      <int name="200">119</int>
      <int name="300">122</int>
      <int name="400">134</int>
    </lst>
    <int name="gap">100</int>
    <int name="start">0</int>
    <int name="end">10</int>
  </lst>
</lst>

I am using Solr 4.3 (4.3.0 1477023), with these parameters. Facet-related: facet=true&facet.field=docnumber&f.docnumber.facet.sort=index&facet.range=docnumber&f.docnumber.facet.range.start=0&f.docnumber.facet.range.gap=100&f.docnumber.facet.range.end=10&f.docnumber.facet.limit=1000&facet.mincount=1 For distributed search (the environment has 3 cores in the same box): shards=SERVER:8081/solr/shard1,SERVER:8081/solr/shard2,SERVER:8081/solr/shard3&shards.qt=myhandler And the query: q=type:document&wt=xml

It is also worth noting that the facet field section does come up with the correct facets; the issue seems to be related only to the facet ranges (unless I am missing something). In the responses for all three examples above, the facet_fields list has all the values for docnumber, from 1 to 756, even if the facet ranges are missing buckets:

<lst name="facet_fields">
  <lst name="docnumber">
    <int name="1">1</int>
    <int name="2">2</int>
    ... (continues on from 3 to 754) ...
    <int name="755">1</int>
    <int name="756">1</int>
  </lst>
</lst>

Thanks, Jose.
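One way to narrow this down is to hit each core directly with distributed search disabled and compare the per-shard range counts against what the merged response shows (a troubleshooting step, not a fix):

http://SERVER:8081/solr/shard1/myhandler?distrib=false&q=type:document&facet=true&facet.range=docnumber&f.docnumber.facet.range.start=0&f.docnumber.facet.range.gap=100&f.docnumber.facet.range.end=10&facet.mincount=1&wt=xml

(and the same for shard2 and shard3). If every shard returns stable counts on its own, the problem is in the merge step rather than in the shards.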
RE: Highlighting externally stored text
Hey Bryan, Thanks for the response! To make use of the FastVectorHighlighter you need to enable termVectors, termPositions, and termOffsets correct? Which takes a considerable amount of space, but is good to know and I may possibly pursue this solution as well. Just starting to look at the code now, do you remember how substantial the change was? Are there any other options? John, Yes, you do need to enable those, and yes, it takes a considerable amount of space. It has been a while, but the change itself was not too bad, mostly at the top level, isolating an interface that returns the structure you need, and transposing that into something for Solr to return. The only other issues are around queries. If FVH supports all the queries you use, great. If it's just missing something simple to deal with, like DisjunctionMaxQuery, then it's just adding another rewrite call. But if you are using the SpanQuery hierarchy, it's much trickier. I did in fact do an implementation for that, but it was not very satisfactory -- transposing unordered SpanNearQuery into the representation used by FVH was an O(n!) operation, and the complexity of the implementation was quite high, for a number of reasons including lack of FVH representation for mixed-slop phrases. I don't know of other options -- except for the one I finally wound up doing, which was writing my own highlighter, which unfortunately I am not in a position to share for reasons not my own. But the main reason for that was the SpanNearQuery support, which may not be a problem you have. It's possible that something similar could be done with the Postings highlighter, but I did not look too deeply into that, because the lack of phrase support was a blocker. -- Bryan
Re: Measuring SOLR performance
No, I haven't had time for that (and likely won't have for the next few weeks), but it is on the list - if it is a 25% improvement, it would really be worth the change to G1. Thanks, roman On Wed, Jul 31, 2013 at 1:00 PM, Markus Jelsma markus.jel...@openindex.io wrote: Did you also test indexing speed? With default G1GC settings we're seeing a slightly higher latency for queries than CMS. However, G1GC allows for much higher throughput than CMS when indexing. I haven't got the raw numbers here but it is roughly 45 minutes against 60 in favour of G1GC! Load is obviously higher with G1GC. ...
Re: queryResultCache showing all zeros
On Wed, Jul 31, 2013 at 3:49 PM, Chris Hostetter hossman_luc...@fucit.org wrote: there is however a group.cache.percent option tha you might look into -- but i honestly have no idea if that toggles the use of queryResultCache or something else, i havn't played with it before... That's only a single-request cache (caches some ids/scores within a single request and is not reused across different requests). -Yonik http://lucidworks.com
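For completeness, enabling that option looks like this (the value is illustrative):

&group=true&group.field=position_id&group.cache.percent=50

and, per Yonik's note, since it only caches within one request it will not make the queryResultCache statistics move either.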
Re: SolrCloud and Joins
Thanks Walter, Existing media sets will rarely change but new media sets will be added relatively frequently. (There is a many to many relationship between media sets and media sources.) Given the size of data, a new Media Set that only includes 1% of the collection would include 6 million rows. Our data is stored in a Postgresql database and imported using the dataImportHandler. It takes around 3 days to fully import the data. In the single shard case, the nice thing about using joins is that the media set to source mapping data could be updated using an hourly cron job while the sentence data could be updated using a delta query. The obvious alternative to joins is to add the media_sets_id to the sentence data as a multi-value field. We'll benchmark this. But my concern is that importing the full data will take even longer and that there will be no easy way to automatically update each affected row when a new media set is created. (I could write a separate one-off query for DataImportHandler each time a new media set is added but this requires a lot of manual interaction.) Does SolrCloud really not have a simple way to specify which shard to put a document on? I'm considering randomly generating document ID prefixes and then taking their murmurhash to determine what shards they correspond to. I could then explicitly send documents to a particular shard by specifying a document ID prefix. However, this seems like a hackish approach. Is there a better way? On Mon, Jul 29, 2013 at 12:45 PM, Walter Underwood wun...@wunderwood.orgwrote: A join may seem clean, but it will be slow and (currently) doesn't work in a cluster. You find all the sentences in a media set by searching for that set id and requesting only the sentence_id (yes, you need that). Then you reindex them. With small documents like this, it is probably fairly fast. If you can't estimate how often the media sets will change or the size of the changes, then you aren't ready to choose a design. wunder On Jul 29, 2013, at 8:41 AM, David Larochelle wrote: We'd like to be able to easily update the media set to source mapping. I'm concerned that if we store the media_sets_id in the sentence documents, it will be very difficult to add additional media set to source mapping. I imagine that adding a new media set would either require reimporting all 600 million documents or writing complicated application logic to find out which sentences to update. Hence joins seem like a cleaner solution. -- David On Mon, Jul 29, 2013 at 11:22 AM, Walter Underwood wun...@wunderwood.orgwrote: Denormalize. Add media_set_id to each sentence document. Done. wunder On Jul 29, 2013, at 7:58 AM, David Larochelle wrote: I'm setting up SolrCloud with around 600 million documents. The basic structure of each document is: stories_id: integer, media_id: integer, sentence: text_en We have a number of stories from different media and we treat each sentence as a separate document because we need to run sentence level analytics. We also have a concept of groups or sets of sources. 
We've imported this media source to media sets mapping into Solr using the following structure: media_id_inner: integer, media_sets_id: integer For the single node case, we're able to filter our sources by media_set_id using a join query like the following: http://localhost:8983/solr/select?q={!join+from=media_id_inner+to=media_id}media_sets_id:1 However, this does not work correctly with SolrCloud. The problem is that the join query is performed separately on each of the shards and no shard has the complete media set to source mapping data. So SolrCloud returns incomplete results. Since the complete media set to source mapping data is comparatively small (~50,000 rows), I would like to replicate it on every shard, so that the results of the individual join queries on separate shards would be equivalent to performing the same query on a single shard system. However, I can't figure out how to replicate documents on separate shards. The compositeID router has the ability to colocate documents based on a prefix in the document ID but this isn't what I need. What I would like is some way to either have the media set to source data replicated on every shard or to be able to explicitly upload this data to the individual shards. (For the rest of the data I like the compositeID autorouting.) Any suggestions? -- Thanks, David -- Walter Underwood wun...@wunderwood.org
Re: FieldCollapsing issues in SolrCloud 4.4
If your issue is that you want to retrieve the number of groups: group.ngroups will return the sum of the number of groups per shard. This is not the overall number of groups if some groups are present on more than one shard. To make sure that this does not happen, one can choose to distribute documents so that all the documents with the same group key go to the same shard. (Disclaimer: before doing so, you need to make sure that your documents will still be spread about equally.) You can check out how to do that here: https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud On Wed, Jul 31, 2013 at 8:02 PM, Ali, Saqib docbook@gmail.com wrote: Hello Paul, Can you please explain what you mean by: To get the exact number of groups, you need to shard along your grouping field Thanks! :) ... -- __ Masurel Paul e-mail: paul.masu...@gmail.com
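Concretely, with the default compositeId router in Solr 4.x, "same group key, same shard" is achieved by prefixing the group key onto the document id (the ids below are made up):

id = "42!doc-0001"   (routed by the hash of the "42" prefix)
id = "42!doc-0002"   (same prefix, so it lands on the same shard as doc-0001)
id = "7!doc-0003"    (different prefix, may land on a different shard)

With that layout, per-shard group counts no longer overlap, so summing them -- which is effectively what group.ngroups does -- gives the exact total.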
no servers hosting shard
I have set up SolrCloud, and when I try to access documents I get this error:

<lst name="error"><str name="msg">no servers hosting shard:</str><int name="code">503</int></lst>

However, if I add the shards=shard1 param it works. -- View this message in context: http://lucene.472066.n3.nabble.com/no-servers-hosting-shard-tp4081783.html Sent from the Solr - User mailing list archive at Nabble.com.
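"no servers hosting shard" generally means the cluster state has a slice with no active replica. A first troubleshooting step is to inspect the state, e.g. via the Cloud tab of the admin UI or the ZooKeeper viewer (host and port below are examples):

http://localhost:8983/solr/#/~cloud
http://localhost:8983/solr/zookeeper?detail=true&path=/clusterstate.json

and check whether each shard shows a replica in the active state.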
debian package for solr with jetty
Hi, I am trying to create a Debian package for Solr 4.3 (default installation with Jetty). Is there anything already available? Also, I need 3 different cores, so I plan to create corresponding packages for each of them to create the Solr cores using the admin/cores or Collections API. I also want to use a SolrCloud setup with an external ZooKeeper ensemble; what's the best way to create a Debian package for updating the ZooKeeper config files as well? Please suggest. Any pointers will be helpful. Thanks, -Manasi -- View this message in context: http://lucene.472066.n3.nabble.com/debian-package-for-solr-with-jetty-tp4081784.html Sent from the Solr - User mailing list archive at Nabble.com.
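On the ZooKeeper part of the question: the stock way to push config files into an external ensemble is the zkcli script shipped under example/cloud-scripts, which a package's post-install hook could invoke (the paths, hosts, and config name below are examples):

cloud-scripts/zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181 \
  -cmd upconfig -confdir /etc/solr/myconf -confname myconf

Re-running the same command is how you would update the files later; collections linked to the config pick up the changes after a reload.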
Proposal/request for comments: Solr schema annotation
In thinking about making the entire Solr schema REST-API-addressable (SOLR-4898), I'd like to be able to add arbitrary metadata at both the top level of the schema and at each leaf node, and allow read/write access to that metadata via the REST API. Some uses I've thought of for such a facility:

1. The managed schema now drops XML comments from schema.xml upon conversion to managed-schema format, but it would be much better if these were somehow preserved, as well as round-trippable when retrieving the schema and its constituents via the REST API.
2. Some comments in the example schemas don't refer to just one or to all leaf nodes, but rather to a group of them. I'd like to be able to group nodes by adding same-named tags to multiple nodes, and also have a top-level (optional) tag description - this description could then be presented with tagged nodes in various output formats.
3. Some comments in the example schema are documentation about a feature, e.g. copyFields. A top-level documentation annotation could take a leaf node element name (or maybe an XPath? probably overkill) and apply to all matching elements.
4. When modifying the schema via REST API, a last-modified annotation could be automatically added.
5. There were a couple of user complaints recently when schema.xml parsing was tightened to disallow unknown attributes on field declarations (SOLR-4641): people were storing their own information there. User-level metadata would support this in a round-trippable way - I'm thinking we could restrict it to flat string-typed key/value pairs, with no nested structure.

W3C XML Schema has a similar facility: http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/structures.html#element-annotation. Thoughts? Some concrete examples of what I'm thinking of in schema.xml format (syntax/naming as yet unsettled):

<schema name="example" version="1.5">
  <annotation>
    <description element="tag" content="plain-numeric-field-types">
      Plain numeric field types store and index the text value verbatim.
    </description>
    <documentation element="copyField">
      copyField commands copy one field to another at the time a document is added
      to the index. It's used either to index the same field differently, or to
      add multiple fields to the same field for easier/faster searching.
    </documentation>
    <last-modified>2014-03-08T12:14:02Z</last-modified>
    …
  </annotation>
  …
  <fieldType name="pint" class="solr.IntField">
    <annotation>
      <tag>plain-numeric-field-types</tag>
    </annotation>
  </fieldType>
  <fieldType name="plong" class="solr.LongField">
    <annotation>
      <tag>plain-numeric-field-types</tag>
    </annotation>
  </fieldType>
  …
  <copyField source="cat" dest="text">
    <annotation>
      <todo>Should this field really be copied to the catchall text field?</todo>
    </annotation>
  </copyField>
  …
  <field name="text" type="text_general">
    <annotation>
      <description>catchall field</description>
      <visibility>public</visibility>
    </annotation>
  </field>
</schema>
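For context, the read side of the REST API that this proposal would extend already exists in Solr 4.x, e.g.:

curl "http://localhost:8983/solr/collection1/schema/fields?wt=json"
curl "http://localhost:8983/solr/collection1/schema/copyfields?wt=json"

so the annotation elements sketched above would presumably become addressable the same way.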
Re: Proposal/request for comments: Solr schema annotation
An annotation field would be much better than the current anything goes schema-less schema.xml. Has anyone built an XML Schema for schema.xml? I know it is extensible, but it would be worth a try. wunder On Jul 31, 2013, at 6:21 PM, Steve Rowe wrote: ... -- Walter Underwood wun...@wunderwood.org