Early Access Release #3 for Solr 4.x Deep Dive book is now available for download on Lulu.com
Okay, it’s hot off the e-presses: Solr 4.x Deep Dive, Early Access Release #3 is now available for purchase and download as an e-book for $9.99 on Lulu.com at: http://www.lulu.com/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-1/ebook/product-21079719.html (That link says “1”, but it apparently correctly redirects to EAR #3.)

My blog posts over the past two weeks detailed the changes from EAR #2. Besides more cleanup, the focus was on features of Solr 4.4, including update processors and token filters. I still haven’t finished 4.4 coverage, but this is progress. See: http://basetechnology.blogspot.com/

The next EAR will be in approximately two weeks, contents TBD. If you have purchased EAR #1 or #2, there is no need to rush out and pick up EAR #3; the technical content changes were relatively modest (68 new pages), and EAR #4 will be out in another two weeks anyway. That said, EAR #3 is a significant improvement over EAR #1 and EAR #2.

-- Jack Krupansky
Re: Sort by document similarity counts
Not sure if it will work. Say we have a SearchComponent which does this in its process() method:

1. Get the current result list:

       DocList docs = rb.getResults().docList;

2. Go over docs, and for each doc:

3. Construct a query which gets all docs which are not equal to the current one and are from a different host (we deal with web pages there):

       BooleanQuery q = new BooleanQuery();
       q.add(new TermQuery(new Term("host", host)), BooleanClause.Occur.MUST_NOT);
       q.add(new TermQuery(new Term("id", name)), BooleanClause.Occur.MUST_NOT);
       DocListAndSet sim = searcher.getDocListAndSet(
           q, (TermQuery) null, null, 0, 1000); // TODO how to set a proper limit, not hard-coded 1000

4. For all docs in sim, calculate the similarity to the current doc (from #2).

5. Count all similar documents and add a new field:

       FieldType ft = new FieldType();
       ft.setStored(true);
       ft.setIndexed(true);
       Field f = new IntField("similarCount", ds.size(), ft);
       d.add(f);

Now the problem is with #1: the result comes in already sorted. That is, if I call Solr with q=*:*&sort=similarityCount, the sort is applied before the last component, which performs all the steps defined above, is called. If I add this to first-components, then the call in #1 returns null.

A completely different approach would be to calculate aggregate values on update via an UpdateRequestProcessor. But then I need to be able to do searches in the update processor (step #3), and in that case docs are available to the searcher only after commit. I'd expect this to work, but the search always returns 0:

       public void processCommit(CommitUpdateCommand cmd) throws IOException {
           TopDocs docs = searcher.search(new MatchAllDocsQuery(), 100);
           DocListAndSet sim = searcher.getDocListAndSet(
               new MatchAllDocsQuery(), (TermQuery) null, null, 0, 10);

sim.docList is always empty. (I tried placing it after solr.RunUpdateProcessorFactory in the update chain; no change.) Even if the searcher worked, it looks bad.
Because in this case I would need to update not only the incoming document but also all those documents which are similar to the current one (that is, if A is similar to B and C, then B and C are similar to A, and the similarCount field has to be increased in B and C as well).

From: Koji Sekiguchi k...@r.email.ne.jp
To: solr-user@lucene.apache.org
Sent: Thursday, July 18, 2013 4:29 PM
Subject: Re: Sort by document similarity counts

> I have tried doing this via a custom SearchComponent, where I can find all similar documents for each document in the current search result, then add a new field into the document, hoping to use the sort parameter (q=*:*&sort=similarityCount).

I don't understand this part very well, but:

> But this will not work because sort is done before handling my custom search component, if added via last-components. I can't add it via first-components, because then I will have no access to query results. And I do not want to override QueryComponent because I need to have all the functionality it covers: grouping, facets, etc.

You may want to put your custom SearchComponent in last-components and inject a SortSpec in your prepare() so that QueryComponent can sort the result complying with your SortSpec?

koji
--
http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
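For what it's worth, the aggregate this thread is after can be sketched independently of Solr. The snippet below is a hypothetical, self-contained illustration of steps 4-5 above (count, for each document, how many other documents are "similar"), using Jaccard overlap of term sets as a stand-in for whatever similarity measure the real component uses. It also shows the symmetry point: if A is similar to B, both counters must move, which is exactly why the update-processor approach would have to touch B and C too.

```java
import java.util.*;

// Hypothetical sketch: per-document "similarCount" over a small corpus.
// Jaccard similarity is only a stand-in for the real measure.
class SimilarCount {
    // Jaccard similarity between two term sets.
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> inter = new HashSet<>(a); inter.retainAll(b);
        Set<String> union = new HashSet<>(a); union.addAll(b);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }

    // similarCount for every doc: number of other docs with similarity >= threshold.
    static Map<String, Integer> similarCounts(Map<String, Set<String>> docs, double threshold) {
        Map<String, Integer> counts = new HashMap<>();
        for (String id : docs.keySet()) counts.put(id, 0);
        List<String> ids = new ArrayList<>(docs.keySet());
        for (int i = 0; i < ids.size(); i++) {
            for (int j = i + 1; j < ids.size(); j++) {
                if (jaccard(docs.get(ids.get(i)), docs.get(ids.get(j))) >= threshold) {
                    // Symmetric relation: both documents' counters are incremented.
                    counts.merge(ids.get(i), 1, Integer::sum);
                    counts.merge(ids.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> docs = new HashMap<>();
        docs.put("A", new HashSet<>(Arrays.asList("big", "house", "garden")));
        docs.put("B", new HashSet<>(Arrays.asList("big", "house", "pool")));
        docs.put("C", new HashSet<>(Arrays.asList("big", "house", "garden", "pool")));
        docs.put("D", new HashSet<>(Arrays.asList("quantum", "physics")));
        System.out.println(similarCounts(docs, 0.5));
    }
}
```

In a real index this pairwise pass is exactly the expensive part; doing it per query (in process()) or per commit (in an update processor) are the two options the thread weighs.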
Solr 4.3.1 - SolrCloud nodes down and lost documents
While indexing some documents to a SolrCloud cluster (10 machines, 5 shards and 2 replicas, so one replica on each machine), one of the replicas stopped receiving documents while the other replica of the shard continued to grow. That was overnight, so I was unable to track exactly what happened (I'm going off our Graphite graphs here). This morning, when I was able to look at the cluster, both replicas of that shard were marked as down (with one marked as leader). I attempted to restart the non-leader node, but it took a long time to restart, so I killed it and restarted the old leader, which also took a long time. I killed that one (I'm impatient) and left the non-leader node to restart, not realising it was missing approximately 700k documents that the old leader had. Eventually it restarted and became leader. I restarted the old leader and it dropped its document count to match the previous non-leader.

Is this expected behaviour when a replica with fewer documents is started before the other and elected leader? Should I have been paying more attention to the number of documents on each server before restarting nodes?

I am still in the process of tuning the caches and warming for these servers, but we are putting some load through the cluster, so it is possible that the nodes are having to work quite hard when a new version of the core is made available. Is this likely to explain why I occasionally see nodes dropping out? Unfortunately, in restarting the nodes I lost the GC logs, so I can't tell whether that was the culprit. Is this the sort of situation where you raise the ZooKeeper timeout a bit? Currently the timeout for all nodes is 15 seconds. Are there any known issues which might explain what's happening? I'm just getting started with SolrCloud, after using standard master/slave replication for an index which has grown too big for one machine over the last few months.
Also, is there any particular information that would be helpful to collect if this should happen again?
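On the ZooKeeper timeout question raised above: in Solr 4.x the session timeout is the zkClientTimeout setting in solr.xml. A sketch of raising it from the 15-second default to 30 seconds is below; the attribute layout follows the stock 4.x legacy solr.xml, but treat the exact surrounding attributes as an assumption and check your own file. A longer timeout rides out GC pauses at the cost of slower detection of genuinely dead nodes.

```xml
<!-- solr.xml (Solr 4.x legacy format): raise the ZooKeeper session timeout
     from the 15s default to 30s. Surrounding attributes are illustrative. -->
<cores adminPath="/admin/cores" defaultCoreName="collection1"
       host="${host:}" hostPort="${jetty.port:8983}"
       zkClientTimeout="${zkClientTimeout:30000}">
  <core name="collection1" instanceDir="collection1" />
</cores>
```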
IDNA Support For Solr
Hi; Is there any support for IDNA at Solr? (IDNA: http://en.wikipedia.org/wiki/Internationalized_domain_name)
RE: IDNA Support For Solr
Hi - What kind of support would you expect Solr to provide? IDN is only about conversion between Unicode in your address bar and ASCII in the DNS.

-Original message-
From: Furkan KAMACI furkankam...@gmail.com
Sent: Friday 19th July 2013 11:09
To: solr-user@lucene.apache.org
Subject: IDNA Support For Solr

Hi; Is there any support for IDNA at Solr? (IDNA: http://en.wikipedia.org/wiki/Internationalized_domain_name)
Help !
Hi, I need help configuring Solr search in Alfresco.

--
Regards,
Narasimha
custom field type plugin
I have a particular use case that I think might require a custom field type; however, I am having trouble getting the plugin to work. My use case has to do with genetics data, and we are running into several situations where we need to be able to query multiple regions of a chromosome (or gene, or other object types). All that really boils down to is being able to give a number, e.g. 10234, and return documents that have regions containing that number. So you'd have a document with a list like [1:16090,400:8000,40123:43564], and it should come back because 10234 falls within 1:16090. If there is a better or easier way to do this, please speak up. I'd rather not have to use a join on another index, because (1) it's more complex to set up, and (2) we might need to join against something else, and you can only do one join at a time.

Anyway, I tried creating a field type similar to a PointType just to see if I could get one working. I added the following jars to get it to compile: apache-solr-core-4.0.0, lucene-core-4.0.0, lucene-queries-4.0.0, apache-solr-solrj-4.0.0. I am running Solr 4.0.0 on Jetty, put my jar file in a sharedLib folder, and specified it in my solr.xml (I have multiple cores). After starting up Solr, I got the line saying it picked up the jar:

    INFO: Adding 'file:/blah/blah/lib/CustomPlugins.jar' to classloader

But I get this error about it not being able to find the AbstractSubTypeFieldType class.
Here is the first bit of the trace:

    SEVERE: null:java.lang.NoClassDefFoundError: org/apache/solr/schema/AbstractSubTypeFieldType
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        ...etc…

Any hints as to what I did wrong? I can provide source code, a fuller stack trace, config settings, etc. Also, I did try to unpack the solr.war, stick my jar in WEB-INF/lib, then repack. However, when I did that, I got a NoClassDefFoundError for my plugin itself.

Thanks,
Kevin
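Setting the classloading problem aside, the region query being described is just multi-valued interval containment. Below is a minimal, self-contained sketch of that check in plain Java (no Solr dependencies; the class and method names are made up for illustration). A real field type would implement this inside the query machinery rather than on raw strings, but the logic is the same: parse the "start:end" regions and test whether the probe position falls inside any of them.

```java
// Hypothetical sketch of the containment check a custom region field type
// would need: does a position fall inside any "start:end" region?
class RegionContains {
    // Parse "1:16090,400:8000,40123:43564" into [start, end] pairs.
    static long[][] parseRegions(String s) {
        String[] parts = s.split(",");
        long[][] regions = new long[parts.length][2];
        for (int i = 0; i < parts.length; i++) {
            String[] se = parts[i].split(":");
            regions[i][0] = Long.parseLong(se[0]);
            regions[i][1] = Long.parseLong(se[1]);
        }
        return regions;
    }

    // True if pos lies within any region (inclusive bounds).
    static boolean contains(long[][] regions, long pos) {
        for (long[] r : regions)
            if (pos >= r[0] && pos <= r[1]) return true;
        return false;
    }

    public static void main(String[] args) {
        long[][] regions = parseRegions("1:16090,400:8000,40123:43564");
        System.out.println(contains(regions, 10234)); // inside 1:16090
        System.out.println(contains(regions, 20000)); // in no region
    }
}
```

One common Solr-side way to get the same effect without a custom type is to index each region as a pair of numeric sub-fields (start/end) and query with range clauses (start <= pos AND end >= pos), which is essentially what PointType-style subtype fields do under the hood.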
Re: Indexing into SolrCloud
Usually EOF errors indicate that the packets you're sending are too big. Wait, though. 50K is not buffered docs, I think it's buffered _requests_. So you're creating a queue that's ginormous and asking 2 threads to empty it. But that's not really the issue, I suspect. How many documents are you adding at a time when you call server.add? I.e. are you using server.add(doc) or server.add(doclist)? If the latter and you're adding a bunch of docs, try lowering that number. If you're sending one doc at a time, I'm on the wrong track.

Best
Erick

On Thu, Jul 18, 2013 at 2:51 PM, Beale, Jim (US-KOP) jim.be...@hibu.com wrote:

Hey folks, I've been migrating an application which indexes about 15M documents from straight-up Lucene into SolrCloud. We've set up 5 Solr instances with a 3-node ZooKeeper ensemble, using HAProxy for load balancing. The documents are processed on a quad-core machine with 6 threads and indexed into SolrCloud through HAProxy using ConcurrentUpdateSolrServer in order to batch the updates. The indexing box is heavily loaded during indexing, but I don't think it is so bad that it would cause issues. I'm using Solr 4.3.1 on the client and server side, ZooKeeper 3.4.5 and HAProxy 1.4.22.
I've been accepting the default HttpClient with 50K buffered docs and 2 threads, i.e.,

    int solrMaxBufferedDocs = 5;
    int solrThreadCount = 2;
    solrServer = new ConcurrentUpdateSolrServer(solrHttpIPAddress, solrMaxBufferedDocs, solrThreadCount);

autoCommit is configured in the solrconfig as follows:

    <autoCommit>
      <maxTime>60</maxTime>
      <maxDocs>50</maxDocs>
      <openSearcher>false</openSearcher>
    </autoCommit>

I'm getting the following errors on the client and server sides respectively.

Client side:

    2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-4] INFO SystemDefaultHttpClient - I/O exception (java.net.SocketException) caught when processing request: Software caused connection abort: socket write error
    2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-4] INFO SystemDefaultHttpClient - Retrying request
    2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-5] INFO SystemDefaultHttpClient - I/O exception (java.net.SocketException) caught when processing request: Software caused connection abort: socket write error
    2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-5] INFO SystemDefaultHttpClient - Retrying request

Server side:

    7988753 [qtp1956653918-23] ERROR org.apache.solr.core.SolrCore - java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException] early EOF
        at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
        at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
        at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
        at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
        at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:393)

When I disabled autoCommit on the server side, I didn't see any errors there, but I still get the issue client-side after about 2 million documents, which is about 45 minutes. Has anyone seen this issue before? I couldn't find anything useful in the usual places.
I suppose I could set up Wireshark to see what is happening, but I'm hoping that someone has a better suggestion. Thanks in advance for any help!

Best regards,
Jim Beale
hibu.com
2201 Renaissance Boulevard, King of Prussia, PA, 19406
Office: 610-879-3864
Mobile: 610-220-3067
Re: Auto-sharding and numShard parameter
First, the numShards parameter is only relevant the very first time you create your collection. It's a little confusing because in the SolrCloud examples you're getting collection1 by default. Look further down the SolrCloud wiki page, at the section titled Managing Collections via the Collections API, for creating collections with a different name. Either way, whether you run the bootstrap command or create a new collection, that's the only time numShards counts. It's ignored the rest of the time.

As far as data growing, you need to either (1) create enough shards to handle the eventual size things will be, sometimes called oversharding, or (2) use the splitShard capabilities in very recent Solrs to expand capacity.

Best
Erick

On Thu, Jul 18, 2013 at 4:52 PM, Flavio Pompermaier pomperma...@okkam.it wrote:

Hi to all, probably this question has a simple answer, but I just want to be sure of the potential drawbacks. When I run SolrCloud, I run the main Solr instance with the -numShards option (e.g. 2). Then, as data grows, the shards could potentially become a huge number. If I had to restart all nodes and re-run the master with numShards=2, what would happen? Would it just be ignored, or would Solr try to reduce the shards? Another question: in SolrCloud, how do I restart the whole cloud at once? Is it possible?

Best,
Flavio
Re: IDNA Support For Solr
I mean that there is a web address: çorba.com. However, its IDNA-coded version is xn--orba-zoa.com. You can check it here: http://www.whois.com.tr/?q=%C3%A7orba&sldtld=com

Let's assume that I've indexed a web page with the URL xn--orba-zoa.com, and someone searches for the word çorba. Then I have to say that there is a URL match for that search. However, because I've indexed the IDNA-coded URL, I will not be able to see that the URL includes the word çorba.

2013/7/19 Markus Jelsma markus.jel...@openindex.io

Hi - What kind of support would you expect Solr to provide? IDN is only about conversion between Unicode in your address bar and ASCII in the DNS.

-Original message-
From: Furkan KAMACI furkankam...@gmail.com
Sent: Friday 19th July 2013 11:09
To: solr-user@lucene.apache.org
Subject: IDNA Support For Solr

Hi; Is there any support for IDNA at Solr? (IDNA: http://en.wikipedia.org/wiki/Internationalized_domain_name)
RE: IDNA Support For Solr
No, you'll have to index the Unicode version of the domain name. Nutch 1.x already handles this conversion for you. Or you could create a custom update processor for Solr and do the conversion there. It's quite simple: IDN is in the java.net package.

-Original message-
From: Furkan KAMACI furkankam...@gmail.com
Sent: Friday 19th July 2013 14:39
To: solr-user@lucene.apache.org
Subject: Re: IDNA Support For Solr

I mean that there is a web address: çorba.com. However, its IDNA-coded version is xn--orba-zoa.com. You can check it here: http://www.whois.com.tr/?q=%C3%A7orba&sldtld=com Let's assume that I've indexed a web page with the URL xn--orba-zoa.com, and someone searches for the word çorba. Then I have to say that there is a URL match for that search. However, because I've indexed the IDNA-coded URL, I will not be able to see that the URL includes the word çorba.

2013/7/19 Markus Jelsma markus.jel...@openindex.io

Hi - What kind of support would you expect Solr to provide? IDN is only about conversion between Unicode in your address bar and ASCII in the DNS.

-Original message-
From: Furkan KAMACI furkankam...@gmail.com
Sent: Friday 19th July 2013 11:09
To: solr-user@lucene.apache.org
Subject: IDNA Support For Solr

Hi; Is there any support for IDNA at Solr? (IDNA: http://en.wikipedia.org/wiki/Internationalized_domain_name)
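As a concrete illustration of the suggestion above: the JDK's java.net.IDN class does the ACE/Unicode conversion, so an update processor could normalize host names to their Unicode form at index time, letting a query for çorba match. A minimal sketch (the wrapper class and method names here are mine, not a real Solr processor):

```java
import java.net.IDN;

// Sketch: normalize punycode ("ACE") host names to their Unicode form at
// index time. java.net.IDN ships with the JDK; no extra jars needed.
class IdnNormalize {
    // ACE -> Unicode, e.g. "xn--orba-zoa.com" -> "çorba.com".
    static String toUnicodeHost(String host) {
        return IDN.toUnicode(host);
    }

    // Unicode -> ACE, the form that actually lives in the DNS.
    static String toAceHost(String host) {
        return IDN.toASCII(host);
    }

    public static void main(String[] args) {
        String ace = "xn--orba-zoa.com";      // IDNA-coded form from the thread
        String unicode = toUnicodeHost(ace);  // Unicode form
        System.out.println(unicode);
        // Punycode is canonical, so re-encoding round-trips exactly.
        System.out.println(toAceHost(unicode).equals(ace));
    }
}
```

Inside a custom UpdateRequestProcessor you would apply toUnicodeHost to the URL/host field of each SolrInputDocument before it reaches RunUpdateProcessorFactory.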
Re: Auto-sharding and numShard parameter
Thank you for the reply Erick, I was facing exactly that problem. From the documentation it seems that those parameters are required to run SolrCloud; instead they are just used to initialize a sample collection. I think that in the examples in the user doc it would be better to separate those 2 concepts: one is starting the server, the other is creating/managing collections.

Best,
Flavio

On Fri, Jul 19, 2013 at 2:13 PM, Erick Erickson erickerick...@gmail.com wrote:

First, the numShards parameter is only relevant the very first time you create your collection. It's a little confusing because in the SolrCloud examples you're getting collection1 by default. Look further down the SolrCloud wiki page, at the section titled Managing Collections via the Collections API, for creating collections with a different name. Either way, whether you run the bootstrap command or create a new collection, that's the only time numShards counts. It's ignored the rest of the time. As far as data growing, you need to either (1) create enough shards to handle the eventual size things will be, sometimes called oversharding, or (2) use the splitShard capabilities in very recent Solrs to expand capacity.

Best
Erick

On Thu, Jul 18, 2013 at 4:52 PM, Flavio Pompermaier pomperma...@okkam.it wrote:

Hi to all, probably this question has a simple answer, but I just want to be sure of the potential drawbacks. When I run SolrCloud, I run the main Solr instance with the -numShards option (e.g. 2). Then, as data grows, the shards could potentially become a huge number. If I had to restart all nodes and re-run the master with numShards=2, what would happen? Would it just be ignored, or would Solr try to reduce the shards? Another question: in SolrCloud, how do I restart the whole cloud at once? Is it possible?

Best,
Flavio
AW: Avoid Solr Pivot Faceting Out of Memory / Shorter result for pivot faceting requests with facet.pivot.ngroup=true and facet.pivot.showLastList=false
Dear members,

Do you think I am better off in the Solr developer group with this question? To summarize: I would like to add a facet.pivot.ngroup=true param to show the count of the facet list. Further, I would like to avoid out-of-memory exceptions by reducing the result of a facet.pivot query.

Best regards,
Sandro Zbinden

-Original message-
From: Sandro Zbinden [mailto:zbin...@imagic.ch]
Sent: Wednesday, 17 July 2013 13:45
To: solr-user@lucene.apache.org
Subject: Avoid Solr Pivot Faceting Out of Memory / Shorter result for pivot faceting requests with facet.pivot.ngroup=true and facet.pivot.showLastList=false

Dear user group,

I am getting an out-of-memory exception in the following scenario. I have 4 SQL tables: patient, visit, study and image, which are denormalized for the Solr index. The Solr index looks like the following:

    | p_id | p_lastname | v_id | v_name  | ... |
    | 1    | Miller     | 10   | Study 1 | ... |
    | 2    | Miller     | 11   | Study 2 | ... |
    | 2    | Miller     | 12   | Study 3 | ... |  <-- duplication because of denormalization
    | 3    | Smith      | 13   | Study 4 | ... |

Now I am executing a facet query:

    q=*:*&facet=true&facet.pivot=p_lastname,p_id&facet.limit=-1

And I get the following result:

    <lst>
      <str name="field">p_lastname</str>
      <str name="value">Miller</str>
      <int name="count">3</int>
      <arr name="pivot">
        <lst>
          <str name="field">p_id</str>
          <int name="value">1</int>
          <int name="count">1</int>
        </lst>
        <lst>
          <str name="field">p_id</str>
          <int name="value">2</int>
          <int name="count">2</int>
        </lst>
      </arr>
    </lst>
    <lst>
      <str name="field">p_lastname</str>
      <str name="value">Smith</str>
      <int name="count">1</int>
      <arr name="pivot">
        <lst>
          <str name="field">p_id</str>
          <int name="value">3</int>
          <int name="count">1</int>
        </lst>
      </arr>
    </lst>

The goal is to show our clients a list of the group value and, in parentheses, how many patients the group contains:

- Miller (2)
- Smith (1)

This is why we need to use the facet.pivot method with facet.limit=-1. It is, as far as I know, the only way to get a grouping for 2 criteria. And we need the pivot list to count how many patients are in a group.
Currently this works well on smaller indexes, but if we have around 1,000,000 patients and we execute a query like the one above, we run into an out-of-memory error. I figured out that the problem is not the calculation of the pivot but the presentation of the result. Because we load all fields (we cannot use facet.offset, since we need to order the results both ascending and descending), the result can get really big.

To avoid this overload I made a change in the solr-core PivotFacetHandler.java class. In the method doPivots I added the following code:

    NamedList<Integer> nl = this.getTermCounts(subField);
    pivot.add("ngroups", nl.size());

This gives me the group size of the list. Then I removed the recursion call:

    pivot.add("pivot", doPivots(nl, subField, nextField, fnames, subset));

With this, my result looks like the following:

    <lst>
      <str name="field">p_lastname</str>
      <str name="value">Miller</str>
      <int name="count">3</int>
      <int name="ngroup">2</int>
    </lst>
    <lst>
      <str name="field">p_lastname</str>
      <str name="value">Smith</str>
      <int name="count">1</int>
      <int name="ngroup">1</int>
    </lst>

My question is now whether there is already something planned like facet.pivot.ngroup=true and facet.pivot.showLastList=false to improve the performance of pivot faceting. Is there a chance we could get this into the Solr code? I think it's a really small change, but it could improve the product enormously.

Best regards,
Sandro Zbinden
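As an illustration of what the proposed facet.pivot.ngroup=true would report, here is a hypothetical, self-contained sketch (plain Java, not Solr code): given the denormalized (p_lastname, p_id) rows from the example table, it returns the distinct-patient count per last name, which is exactly the "Miller (2) / Smith (1)" display the clients need, without shipping the whole per-id pivot list back.

```java
import java.util.*;

// Sketch of the ngroup aggregate: distinct p_id count per p_lastname
// over denormalized rows, the number the pivot list is being used for.
class PivotNgroup {
    // rows are denormalized (p_lastname, p_id) pairs.
    static Map<String, Integer> ngroups(List<String[]> rows) {
        Map<String, Set<String>> groups = new TreeMap<>();
        for (String[] row : rows)
            groups.computeIfAbsent(row[0], k -> new TreeSet<>()).add(row[1]);
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : groups.entrySet())
            result.put(e.getKey(), e.getValue().size());
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[]{"Miller", "1"},   // | 1 | Miller | 10 | Study 1 |
            new String[]{"Miller", "2"},   // | 2 | Miller | 11 | Study 2 |
            new String[]{"Miller", "2"},   // | 2 | Miller | 12 | Study 3 |
            new String[]{"Smith",  "3"});  // | 3 | Smith  | 13 | Study 4 |
        System.out.println(ngroups(rows)); // {Miller=2, Smith=1}
    }
}
```

The memory argument in the mail is visible here too: only one integer per group crosses the wire, instead of the full list of sub-pivot entries.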
Re: Help !
On 19 July 2013 10:39, narasimh...@ingvysyabank.com wrote: HI, Need help on configuring SOLR search in Alfresco. Please do not ask questions that are so overly broad that they are impossible to respond to. Firstly, do your basic homework: Alfresco is now integrated with Solr. Secondly, your question is more pertinent to an Alfresco list. You might want to take a look at how best to use mailing lists: http://wiki.apache.org/solr/UsingMailingLists Regards, Gora
RE: Indexing into SolrCloud
Hi Erick! Thanks for the reply. When I call server.add() it is just to add a single document. But, still, I think you might be correct about the size of the ultimate request. I decided to grab the bull by the horns by instantiating my own HttpClient, and in so doing my first run changed the following parameters:

    SOLR_HTTP_THREAD_COUNT=4
    SOLR_MAX_BUFFERED_DOCS=1
    SOLR_MAX_CONNECTIONS=256
    SOLR_MAX_CONNECTIONS_PER_HOST=128
    SOLR_CONNECTION_TIMEOUT=0
    SOLR_SO_TIMEOUT=0

I doubled the number of emptying threads, reduced the size of the request buffer 5x, increased the connection limits, and set the timeouts to infinite. (I'm not actually sure what the defaults for the timeouts were, since I didn't see them in the Solr code and didn't track it down.) Anyway, the good news is that this combination of parameters worked. The bad news is that I don't know whether it was resolved by changing one or more of the parameters. But, regardless, I think the whole experiment verifies your thinking that the request was too big! Thanks again!! :)

Jim Beale
Lead Developer
hibu.com
2201 Renaissance Boulevard, King of Prussia, PA, 19406
Office: 610-879-3864
Mobile: 610-220-3067

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, July 19, 2013 8:08 AM
To: solr-user@lucene.apache.org
Subject: Re: Indexing into SolrCloud

Usually EOF errors indicate that the packets you're sending are too big. Wait, though. 50K is not buffered docs, I think it's buffered _requests_. So you're creating a queue that's ginormous and asking 2 threads to empty it. But that's not really the issue, I suspect. How many documents are you adding at a time when you call server.add? I.e. are you using server.add(doc) or server.add(doclist)? If the latter and you're adding a bunch of docs, try lowering that number. If you're sending one doc at a time, I'm on the wrong track.
Best
Erick
Date for 4.4 solr release
Hi, we are currently using Solr 4.2.1. There are a lot of fixes in 4.4 that we need. Can we have an approximate date for the first stable release of Solr 4.4, please?

Regards,
jean charles
Kelkoo SAS
Re: dataimporter, custom fields and parsing error
Dumb question: are they in your schema? Spelled right, in the right section, using types that are also defined? Can you populate them by hand with a CSV file and post.jar?

Regards,
Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Fri, Jul 19, 2013 at 12:09 PM, Andreas Owen a...@conx.ch wrote:

I'm using Solr 4.3, which I just downloaded today, and am using only jars that came with it. I have enabled the dataimporter and it runs without error, but the fields path (included in schema.xml) and text (the file content) aren't indexed. What am I doing wrong?

solr path: C:\ColdFusion10\cfusion\jetty-new
collection path: C:\ColdFusion10\cfusion\jetty-new\solr\collection1
pdf doc path: C:\web\development\tkb\internet\public

data-config.xml:

    <dataConfig>
      <dataSource type="BinFileDataSource" name="data"/>
      <dataSource type="BinURLDataSource" name="dataUrl"/>
      <dataSource type="URLDataSource" baseUrl="http://127.0.0.1/tkb/internet/" name="main"/>
      <document>
        <entity name="rec" processor="XPathEntityProcessor" url="docImportUrl.xml"
                forEach="/albums/album" dataSource="main"> <!-- transformer="script:GenerateId" -->
          <field column="title" xpath="//title" />
          <field column="id" xpath="//file" />
          <field column="path" xpath="//path" />
          <field column="Author" xpath="//author" />
          <!-- <field column="tstamp">2013-07-05T14:59:46.889Z</field> -->
          <entity name="tika" processor="TikaEntityProcessor"
                  url="../../../../../web/development/tkb/internet/public/${rec.path}/${rec.id}"
                  dataSource="data">
            <field column="text" />
          </entity>
        </entity>
      </document>
    </dataConfig>

docImportUrl.xml:

    <?xml version="1.0" encoding="utf-8"?>
    <albums>
      <album>
        <author>Peter Z.</author>
        <title>Beratungsseminar kundenbrief</title>
        <description>wie kommuniziert man</description>
        <file>0226520141_e-banking_Checkliste_CLX.Sentinel.pdf</file>
        <path>download/online</path>
      </album>
      <album>
        <author>Marcel X.</author>
        <title>kuchen backen</title>
        <description>torten, kuchen, gebäck ...</description>
        <file>Kundenbrief.pdf</file>
        <path>download/online</path>
      </album>
    </albums>
Collapsing similar queries
Hi, Are there any known good tools or approaches for collapsing queries? For example, imagine 4 original queries: * big house * big houses * the big house * bigger house ...and all 4 being reduced/collapsed to just big house. What might be some good approaches for doing this? 1) stem them all and collapse if they are identical 2) compute Levenshtein distance and collapse if they are close enough Maybe also remove stop words from them first? (not so good for queries consisting of all or lots of stop words, like to be or not to be) Any better approaches? Thanks, Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm
Re: Date for 4.4 solr release
Hahahaha ... Good 1 On 20/07/2013, at 1:43 AM, Jack Krupansky j...@basetechnology.com wrote: real_soon:[NOW+3DAYS TO NOW+10DAYS] -- Jack Krupansky -Original Message- From: Jabouille Jean Charles Sent: Friday, July 19, 2013 11:10 AM To: solr-user@lucene.apache.org Subject: Date for 4.4 solr release Hi, we are currently using solr 4.2.1. There are a lot of fixes in 4.4 that we need. Can we have an approximate date for the first stable release of solr 4.4 please? Regards, jean charles Kelkoo SAS Société par Actions Simplifiée Au capital de € 4.168.964,30 Siège social : 8, rue du Sentier 75002 Paris 425 093 069 RCS Paris [This message and its attachments are confidential and intended solely for their addressees. If you are not the intended recipient, please delete this message and notify the sender.]
Re: Date for 4.4 solr release
real_soon:[NOW+3DAYS TO NOW+10DAYS] -- Jack Krupansky -Original Message- From: Jabouille Jean Charles Sent: Friday, July 19, 2013 11:10 AM To: solr-user@lucene.apache.org Subject: Date for 4.4 solr release Hi, we are currently using solr 4.2.1. There are a lot of fixes in 4.4 that we need. Can we have an approximate date for the first stable release of solr 4.4 please? Regards, jean charles
Indexing CSV files in a Folder
Hi, I have Flume dumping CSV files into folders, and I would like Solr to build an index using these CSV files. What should I do? Thanks, Rajesh
Re: Indexing CSV files in a Folder
Read: http://wiki.apache.org/solr/UpdateCSV -- Jack Krupansky -Original Message- From: Rajesh Jain Sent: Friday, July 19, 2013 1:55 PM To: solr-user@lucene.apache.org Subject: Indexing CSV files in a Folder Hi, I have Flume dumping CSV files into folders, and I would like Solr to build an index using these CSV files. What should I do? Thanks, Rajesh
Re: Custom RequestHandlerBase XML Response Issue
: So as you mentioned in your last mail, how can I prepare a combined : response for this xml doc and even if I do I don't think it would work : because the same I am doing in the RequestHandler. Part of the disconnect you seem to be having with the advice others have been giving you is that Solr does a very good job of abstracting away the *data* being returned to users from the *format* of that data, similar to an MVC setup (Ryan McKinley once told me he used Solr as his MVC framework for all sorts of applications, even if they didn't use the underlying index). RequestHandlers are responsible for processing the logic of a request (the Controller) and creating/manipulating the SolrQueryResponse (Model), which is then formatted and written back to clients using a ResponseWriter (View) ... clients can request a completely arbitrary ResponseWriter depending on what format they want to get data in, independent of the RequestHandler they use to generate the data. In your case, the data that your custom RequestHandler wants to return is itself an XML structure -- but that doesn't mean the existing Solr XML ResponseWriter is prepared to write it out to you as-is -- the XML ResponseWriter is designed to serialize structures of the supported data types in a specific Solr XML format -- just as the JSON ResponseWriter is designed to serialize structures of the supported data types in a specific Solr JSON format, etc... You could serialize your XML DOM as a string and ask the response writer to handle that -- but it's probably not going to be what you want, because the response writer itself is going to take your arbitrary string data (that just so happens to be XML) and wrap it in its own markup (XML, JSON, etc...). In general, I agree with the questions/comments made by several other people... 1) what *exactly* is your ultimate goal (XY Problem?)
2) why are you doing this XML combining logic in Solr, and not in your own application? But if you insist on the approach you are taking, you may find that the RawResponseWriter is useful to you -- it is an extremely specialized ResponseWriter intended for the Solr Admin request handlers and for remotely streaming files in DIH, but it may also work for your purposes. -Hoss
Re: Collapsing similar queries
For starters, I think you need to elaborate your criteria for queries that can be collapsed. You can say they're similar, but that begs the questions of: 1) how to measure similarity, and 2) what threshold level of similarity makes it okay to collapse. Two measures of similarity to consider: 1. How many top results do they have in common? 2. How many top terms and phrases from their top results do they have in common? Maybe, ultimately, some arbitrary heuristic is good enough, say using edit distance on the raw query text. Or some adjusted edit distance. Or edit distance of the top terms of the top documents. Or simply ANY heuristic that seems to both discriminate on differences and combine on similarities. Here's a test case query set: 1. Office 2. The Office 3. Official 4. Office release 5. Official release 6. Office DVD There are three distinct groups there. If you have a specific, narrow domain in mind, a thesaurus of concepts and synonyms for that domain would help you a lot. -- Jack Krupansky -Original Message- From: Otis Gospodnetic Sent: Friday, July 19, 2013 12:33 PM To: solr-user@lucene.apache.org Subject: Collapsing similar queries Hi, Are there any known good tools or approaches for collapsing queries? For example, imagine 4 original queries: * big house * big houses * the big house * bigger house ...and all 4 being reduced/collapsed to just big house. What might be some good approaches for doing this? 1) stem them all and collapse if they are identical 2) compute Levenshtein distance and collapse if they are close enough Maybe also remove stop words from them first? (not so good for queries consisting of all or lots of stop words, like to be or not to be) Any better approaches? Thanks, Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm
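The heuristics discussed in this thread (stopword removal, crude stemming, edit distance) can be sketched roughly in plain Python. Everything here is a placeholder assumption: the stopword list, the naive plural-stripping "stemmer" (a real system would use Porter/Snowball), and the distance threshold.

```python
STOPWORDS = {"the", "a", "an", "of", "to"}

def normalize(query):
    """Lowercase, drop stopwords, and crudely strip a plural 's'."""
    out = []
    for tok in query.lower().split():
        if tok in STOPWORDS:
            continue
        if tok.endswith("s") and len(tok) > 3:  # stand-in for a real stemmer
            tok = tok[:-1]
        out.append(tok)
    return " ".join(out)

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def collapse(queries, max_dist=3):
    """Map each query to the first previously-seen normalized form
    within max_dist edits; otherwise it starts its own group."""
    canonical = []
    mapping = {}
    for q in queries:
        key = normalize(q)
        match = next((c for c in canonical if levenshtein(key, c) <= max_dist), None)
        if match is None:
            canonical.append(key)
            match = key
        mapping[q] = match
    return mapping
```

Note that the threshold needs tuning against a test set like the one above: "office" and "official" are only 3 edits apart, so max_dist=3 would wrongly merge two of the three groups in that case.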
Re: Indexing CSV files in a Folder
Thanks Jack, I am looking for something more real-time. I am streaming CSV files into a folder using Flume, and I would like Solr to build the index automatically rather than posting with curl. I think there is some discussion about the MorphlineSolrSink on the Flume site, but the documentation is sparse. I could write the curl call as a periodic job, but ... the file name might change every time. Thanks, Rajesh On Fri, Jul 19, 2013 at 2:18 PM, Jack Krupansky j...@basetechnology.com wrote: Read: http://wiki.apache.org/solr/UpdateCSV -- Jack Krupansky -Original Message- From: Rajesh Jain Sent: Friday, July 19, 2013 1:55 PM To: solr-user@lucene.apache.org Subject: Indexing CSV files in a Folder Hi, I have Flume dumping CSV files into folders, and I would like Solr to build an index using these CSV files. What should I do? Thanks, Rajesh
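A periodic job along these lines can pick up new files regardless of name. This is a hedged sketch, not a drop-in solution: the Solr URL, core name, and watch folder are assumptions, and it posts to the CSV update handler described on the UpdateCSV wiki page rather than invoking curl per file.

```python
import glob
import os
import urllib.request

# Assumed endpoint: the CSV update handler on the default example port/core.
SOLR_CSV_URL = "http://localhost:8983/solr/collection1/update/csv?commit=true"

def post_csv(path, solr_url=SOLR_CSV_URL):
    """POST one CSV file's bytes to the CSV update handler."""
    with open(path, "rb") as f:
        req = urllib.request.Request(
            solr_url,
            data=f.read(),
            headers={"Content-Type": "text/csv; charset=utf-8"},
        )
        urllib.request.urlopen(req).read()

def sync_folder(watch_dir, seen, post=post_csv):
    """Post every not-yet-seen *.csv in watch_dir; return the new paths."""
    new = [p for p in sorted(glob.glob(os.path.join(watch_dir, "*.csv")))
           if p not in seen]
    for p in new:
        post(p)
        seen.add(p)
    return new
```

Run sync_folder from cron or a small loop; the seen set would need to be persisted (or rebuilt by querying Solr) across restarts.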
Re: custom field type plugin
: a chromosome (or gene, or other object types). All that really boils : down to is being able to give a number, e.g. 10234, and return documents : that have regions containing the number. So you'd have a document with a : list like [1:16090,400:8000,40123:43564], and it should come You should take a look at some of the built-in features using the spatial types... http://wiki.apache.org/solr/SpatialForTimeDurations I believe David also covered this use case in his talk in San Diego... http://www.lucenerevolution.org/2013/Lucene-Solr4-Spatial-Deep-Dive : But I get this error about it not being able to find the AbstractSubTypeFieldType class. : Here is the first bit of the trace: ... : Any hints as to what I did wrong? I can provide source code, or a fuller stack trace, config settings, etc. : : Also, I did try to unpack the solr.war, stick my jar in WEB-INF/lib, : then repack. However, when I did that, I get a NoClassDefFoundError for : my plugin itself. A fuller stack trace might help -- but the key questions are: in what order did you try these two approaches, and what exactly did your <fieldType/> declaration look like? My guess is that you tried repacking the war first, and maybe your exploded war classpath is still polluted with your old jar from when you repacked it, and now you have multiple copies in the plugin classloader's classpath. (The initial NoClassDefFoundError could have been from a mistake in your <fieldType/> declaration.) Try starting completely clean, using the stock war and sample configs, and make sure you get no errors. Then try declaring your custom fieldType using the fully qualified classname, w/o even telling Solr about your jar, and ensure that you get a NoClassDefFoundError for your custom class -- if you get an error about AbstractSubTypeFieldType again, then you still have a copy of your custom class somewhere in the classpath. *THEN* try adding a <lib/> directive to load your jar.
If that still doesn't work, provide us with the details of your servlet container, Solr version, the full stack trace, the details of how you are configuring your <fieldType/>, how you declared the <lib/>, what your filesystem looks like for your solrhome, war, etc... -Hoss
AUTO: Siobhan Roche is out of the office (returning 22/07/2013)
I am out of the office until 22/07/2013. I will respond to your query on my return, Thanks Siobhan Note: This is an automated response to your message custom field type plugin sent on 19/07/2013 13:06:27. This is the only notification you will receive while this person is away.
Request to be added to the ContributorsGroup
Hello, Would someone please be kind enough to add me to the ContributorsGroup? My Wiki Username is: RickyGill Thanks again. Regards Ricky Gill | Managing Director | Jobuzu.co.uk Mob: 07455071710 (Any Time) | Tel: 0845 805 2162 (11:00am - 5:30pm) Skype: JobuzuLTD | Email: ricky.g...@jobuzu.co.uk Web: http://jobuzu.co.uk/ We are a NO-SPAM company and respect your privacy. If you would like not to receive further emails from us, please reply back with the following subject: Remove Me _ Jobuzu Ltd or any of its subsidiary companies may not be held responsible for the content of this email as it may reflect the personal view of the sender and not that of the company. Should you receive this email in error, please notify the sender immediately and do not disclose, copy, or distribute it. While Jobuzu Ltd runs anti-virus software on all servers and all workstations, it cannot be held responsible for any infected files that you may receive; Jobuzu Ltd advises all recipients to virus scan any files.
dataimporter, custom fields and parsing error
I'm using solr 4.3, which I just downloaded today, and am using only jars that came with it. I have enabled the dataimporter and it runs without error, but the field "path" (included in schema.xml) and "text" (the file content) aren't indexed. What am I doing wrong?

solr-path: C:\ColdFusion10\cfusion\jetty-new
collection-path: C:\ColdFusion10\cfusion\jetty-new\solr\collection1
pdf-doc-path: C:\web\development\tkb\internet\public

data-config.xml:

<dataConfig>
  <dataSource type="BinFileDataSource" name="data"/>
  <dataSource type="BinURLDataSource" name="dataUrl"/>
  <dataSource type="URLDataSource" baseUrl="http://127.0.0.1/tkb/internet/" name="main"/>
  <document>
    <entity name="rec" processor="XPathEntityProcessor" url="docImportUrl.xml"
            forEach="/albums/album" dataSource="main"> <!-- transformer="script:GenerateId" -->
      <field column="title" xpath="//title" />
      <field column="id" xpath="//file" />
      <field column="path" xpath="//path" />
      <field column="Author" xpath="//author" />
      <!-- <field column="tstamp">2013-07-05T14:59:46.889Z</field> -->
      <entity name="tika" processor="TikaEntityProcessor"
              url="../../../../../web/development/tkb/internet/public/${rec.path}/${rec.id}"
              dataSource="data">
        <field column="text" />
      </entity>
    </entity>
  </document>
</dataConfig>

docImportUrl.xml:

<?xml version="1.0" encoding="utf-8"?>
<albums>
  <album>
    <author>Peter Z.</author>
    <title>Beratungsseminar kundenbrief</title>
    <description>wie kommuniziert man</description>
    <file>0226520141_e-banking_Checkliste_CLX.Sentinel.pdf</file>
    <path>download/online</path>
  </album>
  <album>
    <author>Marcel X.</author>
    <title>kuchen backen</title>
    <description>torten, kuchen, gebäck ...</description>
    <file>Kundenbrief.pdf</file>
    <path>download/online</path>
  </album>
</albums>
Re: Solr 4.3 open a lot more files than solr 3.6
Did you try setting useCompoundFile to true in solrconfig.xml? Also, try using a lower mergeFactor which will result in fewer segments and hence fewer open files. Also, I assume you can set the limit using a ulimit command.. ex: ulimit -n20 -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-3-open-a-lot-more-files-than-solr-3-6-tp4079013p4079221.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: The way edismax parses colon seems weird
What field type, analyzer, and tokenizer are you using, and what does a sample of the input data look like? Generally, a single backslash is all that is needed for escaping. And escaping is not needed within a quoted phrase, except for quotes and literal backslashes. -- Jack Krupansky -Original Message- From: jefferyyuan Sent: Friday, July 19, 2013 6:01 PM To: solr-user@lucene.apache.org Subject: The way edismax parses colon seems weird In our application, a user may search for an error code like 12:34. We define the default search fields like: str name=qftitle^10 body_stored^8 content^5/str So when a user searches 12:34, we want to search for the error code in the specified fields. In the code, if we search q=12:34 directly, this can't find anything. That's expected, as it searches for 34 in the field 12. Then we try to escape the colon, searching 12\:34; the parsed query would be +12\:34, which still can't find the expected page. str name=parsedquery(+12\:34)/no_coord/str str name=parsedquery_toString+12\:34/str str name=QParserExtendedDismaxQParser/str If I type 2 backslashes, it seems to find the error page: q=12\\:34 str name=parsedquery (+DisjunctionMaxQuery((content:12 34^0.5 | body_stored:(12\:34 12) 34^0.8 | title:12 34^1.1)))/no_coord /str str name=parsedquery_toString +(content:12 34^0.5 | body_stored:(12\:34 12) 34^0.8 | title:12 34^1.1) /str str name=QParserExtendedDismaxQParser/str Is this a bug in Solr edismax or not? -- View this message in context: http://lucene.472066.n3.nabble.com/The-way-edismax-parses-colon-seems-weird-tp4079226.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Request to be added to the ContributorsGroup
Sure :) Done! - Stefan On Friday, July 19, 2013 at 9:28 PM, ricky gill wrote: Hello, Would someone please be kind enough and add me to the “ContributorsGroup”? My Wiki Username is: RickyGill Thanks again. Regards Ricky Gill | Managing Director | Jobuzu.co.uk (http://Jobuzu.co.uk) Mob: 07455071710 (Any Time) | Tel: 0845 805 2162 (11:00am - 5:30pm) Skype: JobuzuLTD | Email: ricky.g...@jobuzu.co.uk (mailto:ricky.g...@jobuzu.co.uk) Web: http://jobuzu.co.uk (http://jobuzu.co.uk/)
The way edismax parses colon seems weird
In our application, a user may search for an error code like 12:34. We define the default search fields like:

<str name="qf">title^10 body_stored^8 content^5</str>

So when a user searches 12:34, we want to search for the error code in the specified fields. In the code, if we search q=12:34 directly, this can't find anything. That's expected, as it searches for 34 in the field 12. Then we try to escape the colon, searching 12\:34; the parsed query would be +12\:34, which still can't find the expected page.

<str name="parsedquery">(+12\:34)/no_coord</str>
<str name="parsedquery_toString">+12\:34</str>
<str name="QParser">ExtendedDismaxQParser</str>

If I type 2 backslashes, it seems to find the error page: q=12\\:34

<str name="parsedquery">(+DisjunctionMaxQuery((content:12 34^0.5 | body_stored:(12\:34 12) 34^0.8 | title:12 34^1.1)))/no_coord</str>
<str name="parsedquery_toString">+(content:12 34^0.5 | body_stored:(12\:34 12) 34^0.8 | title:12 34^1.1)</str>
<str name="QParser">ExtendedDismaxQParser</str>

Is this a bug in Solr edismax or not? -- View this message in context: http://lucene.472066.n3.nabble.com/The-way-edismax-parses-colon-seems-weird-tp4079226.html Sent from the Solr - User mailing list archive at Nabble.com.
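On the client side, one common precaution is to backslash-escape Lucene/Solr query syntax characters in user input before building the q parameter. This is a hedged Python sketch (the helper name is made up, and the character list follows the classic Lucene query syntax; '&&' and '||' would need separate handling, and it does not by itself settle whether the edismax behavior above is a bug):

```python
import re

# Lucene/Solr query syntax characters that take a backslash escape
# (an assumption based on the classic query parser syntax, including ':').
SPECIAL = r'([+\-!(){}\[\]^"~*?:\\/])'

def escape_solr_term(term):
    """Prefix each special character in a single query term with a backslash."""
    return re.sub(SPECIAL, r'\\\1', term)
```

For example, escape_solr_term("12:34") yields the six-character string 12\:34, which is what should be URL-encoded into the request.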
Re: The way edismax parses colon seems weird
On 7/19/2013 4:01 PM, jefferyyuan wrote: If I type 2 \\, seems it can find the error page: q=12\\:34 str name=parsedquery (+DisjunctionMaxQuery((content:12 34^0.5 | body_stored:(12\:34 12) 34^0.8 | title:12 34^1.1)))/no_coord /str str name=parsedquery_toString +(content:12 34^0.5 | body_stored:(12\:34 12) 34^0.8 | title:12 34^1.1) /str str name=QParserExtendedDismaxQParser/str Is this a bug in Solr edismax or not? It sounds like it's a requirement for whatever you are using to construct your queries. When building Strings in Java, for instance, a double backslash is required for a literal backslash, because a single backslash is used for special characters, like \n for newline. It's similar for Perl, and probably PHP as well as other programming languages. Thanks, Shawn
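Shawn's point about language-level escaping can be made concrete. Python is used here for brevity (Java string literals behave the same way): the query Solr must ultimately receive is 12\:34 with ONE backslash, but getting that backslash through source code and URL encoding adds layers.

```python
from urllib.parse import quote

# Two backslashes in the source literal produce ONE backslash in the string.
q = "12\\:34"
# The raw-string spelling is the same six-character value.
# URL-encoding for an HTTP request adds another layer: '\' -> %5C, ':' -> %3A.
encoded = quote(q, safe="")
```

If a client library or shell adds its own escaping layer on top, each layer doubles the backslashes the programmer has to type, which is consistent with q=12\\:34 "working" in the original report.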
Re: The way edismax parses colon seems weird
Could this be related: https://issues.apache.org/jira/browse/SOLR-4333 (fixed in 4.4, so you could even run your test against RC1)? Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Fri, Jul 19, 2013 at 6:01 PM, jefferyyuan yuanyun...@gmail.com wrote: In our application, a user may search for an error code like 12:34. We define the default search fields like: str name=qftitle^10 body_stored^8 content^5/str So when a user searches 12:34, we want to search for the error code in the specified fields. In the code, if we search q=12:34 directly, this can't find anything. That's expected, as it searches for 34 in the field 12. Then we try to escape the colon, searching 12\:34; the parsed query would be +12\:34, which still can't find the expected page. str name=parsedquery(+12\:34)/no_coord/str str name=parsedquery_toString+12\:34/str str name=QParserExtendedDismaxQParser/str If I type 2 backslashes, it seems to find the error page: q=12\\:34 str name=parsedquery (+DisjunctionMaxQuery((content:12 34^0.5 | body_stored:(12\:34 12) 34^0.8 | title:12 34^1.1)))/no_coord /str str name=parsedquery_toString +(content:12 34^0.5 | body_stored:(12\:34 12) 34^0.8 | title:12 34^1.1) /str str name=QParserExtendedDismaxQParser/str Is this a bug in Solr edismax or not? -- View this message in context: http://lucene.472066.n3.nabble.com/The-way-edismax-parses-colon-seems-weird-tp4079226.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: The way edismax parses colon seems weird
You haven't told us anything about how you have the analysis configured for the fields you are using -- and those details probably contain the specifics of your problem. Once you've escaped the colon so that edismax no longer recognizes it as the "search this specific user field" syntax, any other questions about what the final query for that clause winds up being, and what it does or doesn't match, are entirely dependent on your analysis. When I use the Solr 4.3.1 example configs and index a document like this...

$ java -Ddata=args -jar post.jar '<add><doc><field name="id">HOSS</field><field name="title">12:34</field><field name="cat">12:34</field></doc></add>'

...then the following query will find it...

http://localhost:8983/solr/select?debugQuery=true&defType=edismax&qf=title&q=12\:34

...and the parsed query, because of the fieldType analyzer for the title field, looks like...

(+DisjunctionMaxQuery(((title:12 title:34))))/no_coord

This query will also find it...

http://localhost:8983/solr/select?debugQuery=true&defType=edismax&qf=cat&q=56\:78

(+DisjunctionMaxQuery((cat:56:78)))/no_coord

As will this one...

http://localhost:8983/solr/select?debugQuery=true&defType=edismax&qf=sku&q=99\:00

(+DisjunctionMaxQuery((sku:9900)))/no_coord

As does combining them all together...

http://localhost:8983/solr/select?debugQuery=true&q.op=AND&defType=edismax&qf=sku+title+cat&q=12\:34+56\:78+99\:00

Note that if you use the uf option to tighten down which field names edismax allows with the ":" syntax, you don't even have to escape it...

http://localhost:8983/solr/select?debugQuery=true&q.op=AND&defType=edismax&uf=-*&qf=sku+title+cat&q=12:34+56:78+99:00

-Hoss
Re: Indexing CSV files in a Folder
Did you look into this link? http://www.marshut.com/ruzyy/download-and-configure-morphlinesolrsink.html -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-CSV-files-in-a-Folder-tp4079192p4079222.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: The way edismax parses colon seems weird
Thanks very much for the reply. We are querying solr directly from the browser: http://localhost:8080/solr/select?q=12\:34&defType=edismax&debug=query&qf=content

<str name="rawquerystring">12\:34</str>
<str name="querystring">12\:34</str>
<str name="parsedquery">(+12\:34)/no_coord</str>
<str name="parsedquery_toString">+12\:34</str>
<str name="QParser">ExtendedDismaxQParser</str>

And it seems this is not related to which (default) field I use to query. -- View this message in context: http://lucene.472066.n3.nabble.com/The-way-edismax-parses-colon-seems-weird-tp4079226p4079234.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Date for 4.4 solr release
Shouldn't that be real_soon:[NOW/DAY+3DAYS TO NOW/DAY+10DAYS] You know, just to avoid the performance problems of the people asking every five minutes. :-) Regards, Alex. P.s. Or is this a premature optimization? Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Fri, Jul 19, 2013 at 11:43 AM, Jack Krupansky j...@basetechnology.com wrote: real_soon:[NOW+3DAYS TO NOW+10DAYS] -- Jack Krupansky -Original Message- From: Jabouille Jean Charles Sent: Friday, July 19, 2013 11:10 AM To: solr-user@lucene.apache.org Subject: Date for 4.4 solr release Hi, we are currently using solr 4.2.1. There are a lot of fixes in 4.4 that we need. Can we have an approximate date for the first stable release of solr 4.4 please? Regards, jean charles
Re: The way edismax parses colon seems weird
Very good chance that is it. -- Jack Krupansky -Original Message- From: Alexandre Rafalovitch Sent: Friday, July 19, 2013 7:16 PM To: solr-user@lucene.apache.org Subject: Re: The way edismax parses colon seems weird Could this be related: https://issues.apache.org/jira/browse/SOLR-4333 (fixed in 4.4, so you could even run your test against RC1)? Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Fri, Jul 19, 2013 at 6:01 PM, jefferyyuan yuanyun...@gmail.com wrote: In our application, a user may search for an error code like 12:34. We define the default search fields like: str name=qftitle^10 body_stored^8 content^5/str So when a user searches 12:34, we want to search for the error code in the specified fields. In the code, if we search q=12:34 directly, this can't find anything. That's expected, as it searches for 34 in the field 12. Then we try to escape the colon, searching 12\:34; the parsed query would be +12\:34, which still can't find the expected page. str name=parsedquery(+12\:34)/no_coord/str str name=parsedquery_toString+12\:34/str str name=QParserExtendedDismaxQParser/str If I type 2 backslashes, it seems to find the error page: q=12\\:34 str name=parsedquery (+DisjunctionMaxQuery((content:12 34^0.5 | body_stored:(12\:34 12) 34^0.8 | title:12 34^1.1)))/no_coord /str str name=parsedquery_toString +(content:12 34^0.5 | body_stored:(12\:34 12) 34^0.8 | title:12 34^1.1) /str str name=QParserExtendedDismaxQParser/str Is this a bug in Solr edismax or not? -- View this message in context: http://lucene.472066.n3.nabble.com/The-way-edismax-parses-colon-seems-weird-tp4079226.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: The way edismax parses colon seems weird
I noticed that a single backslash in the URL query came through as a literal backslash in the parsed query, which implies that the backslash was escaped (improperly) by Solr: http://localhost:8080/solr/select?q=12\:34&defType=edismax&debug=query&qf=content <str name="parsedquery_toString">+12\:34</str> As a workaround, enclose the term in quotes, without the escaping: http://localhost:8080/solr/select?q="12:34"&defType=edismax&debug=query&qf=content -- Jack Krupansky -Original Message- From: jefferyyuan Sent: Friday, July 19, 2013 7:09 PM To: solr-user@lucene.apache.org Subject: Re: The way edismax parses colon seems weird Thanks very much for the reply. We are querying solr directly from the browser: http://localhost:8080/solr/select?q=12\:34&defType=edismax&debug=query&qf=content <str name="rawquerystring">12\:34</str> <str name="querystring">12\:34</str> <str name="parsedquery">(+12\:34)/no_coord</str> <str name="parsedquery_toString">+12\:34</str> <str name="QParser">ExtendedDismaxQParser</str> And it seems this is not related to which (default) field I use to query. -- View this message in context: http://lucene.472066.n3.nabble.com/The-way-edismax-parses-colon-seems-weird-tp4079226p4079234.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: custom field type plugin
I can try again this weekend to get a clean environment. However, the order I did things in was the reverse of what you suggest. I got the AbstractSubTypeFieldType error first. Then I removed my jar from the sharedLib folder and tried the war-repacking solution. That is when I got the NoClassDefFoundError on my custom class. The spatial feature looks intriguing, although I have no idea if it could fit my use case. It looks like a fairly complex concept, but maybe it is all the different shapes and geometry that is confusing me. If I thought of my problem in terms of geometry, I would say a chromosome region is like a segment of a line. I would need to define multiple line segments and be able to query by a single point, returning only documents that have a line segment that the single point falls on. Does that make sense? Is that at all doable with a spatial query? -Kevin From: Chris Hostetter [hossman_luc...@fucit.org] Sent: Friday, July 19, 2013 3:15 PM To: solr-user@lucene.apache.org Subject: Re: custom field type plugin : a chromosome (or gene, or other object types). All that really boils : down to is being able to give a number, e.g. 10234, and return documents : that have regions containing the number. So you'd have a document with a : list like [1:16090,400:8000,40123:43564], and it should come You should take a look at some of the built-in features using the spatial types... http://wiki.apache.org/solr/SpatialForTimeDurations I believe David also covered this use case in his talk in San Diego... http://www.lucenerevolution.org/2013/Lucene-Solr4-Spatial-Deep-Dive : But I get this error about it not being able to find the AbstractSubTypeFieldType class. : Here is the first bit of the trace: ... : Any hints as to what I did wrong? I can provide source code, or a fuller stack trace, config settings, etc. : : Also, I did try to unpack the solr.war, stick my jar in WEB-INF/lib, : then repack. However, when I did that, I get a NoClassDefFoundError for : my plugin itself.
A fuller stack trace might help -- but the key questions are: in what order did you try these two approaches, and what exactly did your <fieldType/> declaration look like? My guess is that you tried repacking the war first, and maybe your exploded war classpath is still polluted with your old jar from when you repacked it, and now you have multiple copies in the plugin classloader's classpath. (The initial NoClassDefFoundError could have been from a mistake in your <fieldType/> declaration.) Try starting completely clean, using the stock war and sample configs, and make sure you get no errors. Then try declaring your custom fieldType using the fully qualified classname, w/o even telling Solr about your jar, and ensure that you get a NoClassDefFoundError for your custom class -- if you get an error about AbstractSubTypeFieldType again, then you still have a copy of your custom class somewhere in the classpath. *THEN* try adding a <lib/> directive to load your jar. If that still doesn't work, provide us with the details of your servlet container, Solr version, the full stack trace, the details of how you are configuring your <fieldType/>, how you declared the <lib/>, what your filesystem looks like for your solrhome, war, etc... -Hoss The information in this email, including attachments, may be confidential and is intended solely for the addressee(s). If you believe you received this email by mistake, please notify the sender by return email as soon as possible.
RE: custom field type plugin
: I can try again this weekend to get a clean environment. However, the : order I did things in was the reverse of what you suggest. I got the Hmmm... then I'm kind of at a loss to explain what you're describing. We'd need to see more details of the configs, dir structure, jar structure, etc... : The spatial feature looks intriguing, although I have no idea if it : could fit my use case. It looks fairly complex a concept, but maybe it : is all the different shapes and geometry that is confusing me. If I : thought of my problem in terms of geometry, I would say a chromosome : region is like a segment of a line. I would need to define multiple line : segments and be able to query by a single point and only return : documents that have a line segment that the single point falls on. Does : that make sense? Is that at all doable with a spatial query? The tricky thing about leveraging the spatial stuff for this type of problem is that it's frequently better to *not* let yourself think in terms of a straightforward mapping between your problem space and geometry. Instead of modeling your data as documents containing multiple line segments and trying to search for a document containing a line segment that contains your 1D point, imagine modeling your data as documents containing multiple 2D points, one point per range, where the X coordinate is the lower bound of your range and the Y coordinate is the upper bound of the range...

https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/#slide8

...and to find all documents containing a range that contains a specified input value V, you then query for all documents containing points inside a specially crafted bounding box based on V...

https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/#slide11

...The big caveat to this approach that I failed to mention before is that it presumes there is an absolute min/max definable for the overall range of values you are dealing with, so that you can define the bounding boxes appropriately -- otherwise the geometry won't work. In any case... it's an interesting idea I wanted to throw out there for you to consider, in case it works for you, before you jump through a ton of hoops trying to get a new custom FieldType to work. -Hoss
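The 2D-point trick above can be illustrated with a tiny pure-Python model (not Solr API calls; the MIN/MAX bounds are an assumption), reusing the region list and lookup value from earlier in the thread:

```python
# Assumed absolute bounds on the 1D value space (e.g. a max chromosome length).
# As Hoss notes, the trick requires such bounds to exist.
MIN, MAX = 0, 250_000_000

def index_ranges(ranges):
    """Each 1D range [lo, hi] is 'indexed' as the single 2D point (lo, hi)."""
    return [(lo, hi) for lo, hi in ranges]

def ranges_containing(points, v):
    """'Which ranges contain v?' becomes a bounding-box query:
    x in [MIN, v] (the range starts at or before v) and
    y in [v, MAX] (the range ends at or after v)."""
    return [(x, y) for x, y in points if MIN <= x <= v <= y <= MAX]
```

With the document from the question, index_ranges([(1, 16090), (400, 8000), (40123, 43564)]) and a lookup of 10234 matches only the point (1, 16090), i.e. the one region containing 10234; in Solr the same filter would be a rectangle query against a spatial field rather than a list comprehension.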
Re: Solr index lot of pdf, doc, txt
I'm using Solr 4.2, but I don't fully understand the recursive posting approach from this thread. Maybe I could write a bash script, but a bash script is not a good solution. Is there another way? Please advise. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-index-lot-of-pdf-doc-txt-tp4078651p4079253.html Sent from the Solr - User mailing list archive at Nabble.com.
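If a bash script feels fragile, the recursive walk is small enough to sketch in Python. This is a hedged sketch, not a drop-in solution: the Solr URL and the use of the /update/extract handler (Solr Cell, which runs Tika over pdf/doc/txt) are assumptions to adjust for a real setup.

```python
import os
import urllib.parse
import urllib.request

# Assumed endpoint: the ExtractingRequestHandler on the default example port.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"

def find_docs(root, exts=(".pdf", ".doc", ".txt")):
    """Recursively collect files under root whose extension matches."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith(exts):
                hits.append(os.path.join(dirpath, name))
    return hits

def post_doc(path, url=SOLR_EXTRACT_URL):
    """Send one file to the extract handler, using its path as the unique id."""
    params = urllib.parse.urlencode(
        {"literal.id": os.path.abspath(path), "commit": "true"})
    with open(path, "rb") as f:
        req = urllib.request.Request(
            "%s?%s" % (url, params),
            data=f.read(),
            headers={"Content-Type": "application/octet-stream"},
        )
        urllib.request.urlopen(req).read()
```

Driving it is then just `for p in find_docs("/some/folder"): post_doc(p)`, possibly batched with a single commit at the end instead of commit=true per file.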
Re: Date for 4.4 solr release
+1 :D -- View this message in context: http://lucene.472066.n3.nabble.com/Date-for-4-4-solr-release-tp4079152p4079254.html Sent from the Solr - User mailing list archive at Nabble.com.