Re: StreamingUpdateSolrServer - exceptions not propagated
On 3/26/2012 10:25 PM, Shawn Heisey wrote: The problem is that I currently have no way (that I know of so far) to detect that a problem happened. As far as my code is concerned, everything worked, so it updates my position tracking and those documents will never be inserted. I have not yet delved into the response object to see whether it can tell me anything. My code currently assumes that if no exception was thrown, it was successful. This works with CHSS. I will write some test code that tries out various error situations and see what the response contains. I've written some test code. When doing an add with SUSS against a server that's down, no exception is thrown. It does throw one for query and deleteByQuery. When doing the add test with CHSS, an exception is thrown. I guess I'll just have to use CHSS until this gets fixed, assuming it ever does. Would it be at all helpful to file an issue in jira, or has one already been filed? With a quick search, I could not find one. Thanks, Shawn
Re: Using the ids parameter
Hi, Actually, we ran into the same issue using the ids parameter in the Solr front with a shards architecture (the exception is thrown in the Solr front). Were you able to solve it by using the key:value syntax or some other way? BTW, there was a related issue: https://issues.apache.org/jira/browse/SOLR-1477 but it's marked as Won't Fix; does anyone know why, or if this is planned to be resolved? Dmitry On Tue, Mar 20, 2012 at 11:53 PM, Jamie Johnson wrote: > We're running into an issue where we are trying to use the ids= > parameter to return a set of documents given their id. This seems to > work intermittently when running in SolrCloud. The first question I > have is this something that we should be using or instead should we > doing a query with key:? The stack trace that I am getting right now > is included below, any thoughts would be appreciated. > > Mar 20, 2012 5:36:38 PM org.apache.solr.core.SolrCore execute > INFO: [slice1_shard1] webapp=/solr path=/select > > params={hl.fragsize=1&ids=4f14cc9b-f669-4d6f-85ae-b22fad143492,urn:uuid:020335a7-1476-43d6-8f91-241bce1e7696,urn:uuid:352473eb-af56-4f6f-94d5-c0096dcb08d4} > status=500 QTime=32 > Mar 20, 2012 5:36:38 PM org.apache.solr.common.SolrException log > SEVERE: null:java.lang.NullPointerException > at > org.apache.solr.handler.component.ShardFieldSortedHitQueue$1.compare(ShardDoc.java:232) > at > org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:159) > at > org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:101) > at org.apache.lucene.util.PriorityQueue.upHeap(PriorityQueue.java:231) > at org.apache.lucene.util.PriorityQueue.add(PriorityQueue.java:140) > at > org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:156) > at > org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:839) > at > org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:630) > at > 
org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:609) > at > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:332) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1539) > at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:406) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:255) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) > at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) > at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) > at > org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) > at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) > at org.mortbay.jetty.Server.handle(Server.java:326) > at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) > at > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) > at > org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) > at > org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) >
Re: Re: Indexing Source Code
I can't find better examples at the moment... I hope they are sufficient to describe what I need.

''' Code
25  01  RETURN-CODES.
26      05  RTC00 PIC X(2) VALUE '00'.
27      05  RTC01 PIC X(2) VALUE '01'.
28      05  RTC04 PIC X(2) VALUE '04'.
29      05  RTC08 PIC X(2) VALUE '08'.
''' /Code

This is a variable (field/array) in Cobol. I use it in some (sub)programs. Now I need to reengineer it, because I need more return codes, or I want to get rid of the return codes because I have another solution. I want to find all appearances of this variable.
- List of variables used by a program.

''' Code
005915 CALL 'Unit01221' RETURN-CODE
''' /Code

Unit00221 is the subprogram I call. The main program is Unit00200 - the last 2 numbers > 0 say it is a subprogram.
- I want to get a list of the subprograms the main program uses.
- I give the name of a subprogram, and get a list of where it is used (can be in more than one main program).
- When possible, I give the name of the main program and get a (nested) list of used subprograms, subsubprograms...
Thanks, Bastian

Marcelo Carvalho Fernandes wrote: Hi Bastian, Can you please tell us what kind of search you imagine doing with some (use case) examples? Marcelo On Monday, March 26, 2012, Bastian H arbeit.bast...@googlemail.com> wrote: > Hi, > > I'd like to index my source code - most of it is Cobol, Assembler and Java - > with Solr. > > I don't know where to start... I think I need to parse it to get XML for > Solr. Do I need Tika? Is there any parser I could use? > > I want to index functions, variables and function calls as well as > comments. > > Can somebody point me to a starting point? > > Thanks > Bastian > -- Marcelo Carvalho Fernandes +55 21 8272-7970 +55 21 2205-2786
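One way to support the caller/callee queries Bastian describes is to pre-process the COBOL sources and index each program as a Solr document with a multi-valued field of called subprograms. A minimal sketch of the extraction step (class, method and field names are hypothetical, and the regex only covers static `CALL 'literal'` statements, not dynamic `CALL WS-NAME` calls):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CobolCallExtractor {
    // Matches static calls such as: 005915 CALL 'Unit01221' ...
    private static final Pattern CALL = Pattern.compile("CALL\\s+'([A-Za-z0-9-]+)'");

    // Returns the subprogram names a source file calls; indexing these in a
    // multi-valued "calls" field lets Solr answer "which mains use Unit01221?"
    public static List<String> extractCalls(String source) {
        List<String> targets = new ArrayList<>();
        Matcher m = CALL.matcher(source);
        while (m.find()) {
            targets.add(m.group(1));
        }
        return targets;
    }
}
```

Querying `calls:Unit01221` would then list every program that calls that unit; the nested (transitive) listing could be built client-side by repeating the query one level at a time.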
how to store file path in Solr when using TikaEntityProcessor
Hi, I am using DIH to index the local file system, but the file path, size and lastModified fields were not stored. In schema.xml I defined: And I also defined tika-data-config.xml: The Solr version is 3.5. Any idea? Thanks in advance.
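For reference, FileListEntityProcessor exposes the file metadata as implicit columns (fileAbsolutePath, fileSize, fileLastModified), but they usually have to be mapped explicitly in the outer entity or they never reach the document. A sketch of a tika-data-config.xml along those lines (baseDir and the Solr field names are illustrative and must match your schema.xml):

```xml
<dataConfig>
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/path/to/files" fileName=".*\.pdf"
            recursive="true" rootEntity="false" dataSource="null">
      <!-- map the processor's implicit columns onto stored schema fields -->
      <field column="fileAbsolutePath" name="path"/>
      <field column="fileSize" name="size"/>
      <field column="fileLastModified" name="lastmodified"/>
      <entity name="tika" processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}" format="text" dataSource="bin">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

One thing to check is that the path/size/date columns are mapped in the outer entity as above and that fields with those names (and stored="true") exist in schema.xml.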
Re: Practical Optimization
What type of logging were you using? Did you try Logback? We get a pretty large increase when using that. On Fri, Mar 23, 2012 at 2:57 PM, dw5ight wrote: > Hey All- > > we run a http://carsabi.com car search engine with Solr and did some > benchmarking recently after we switched from a hosted service to > self-hosting. In brief, we went from 800ms complex range queries on a 1.5M > document corpus to 43ms. The major shifts were switching from EC2 Large to > EC2 CC8XL which got us down to 282ms (2.82x speed gain due to 2.75x CPU > speed increase we think), and then down to 43ms when we sharded to 8 cores. > We tried sharding to 12 and 16 but saw negligible gains after this point. > > Anyway, hope this might be useful to someone - we write up exact stats and a > step by step sharding procedure on our > http://carsabi.com/car-news/2012/03/23/optimizing-solr-7x-your-search-speed/ > tech blog if anyone's interested. > > best > Dwight > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Practical-Optimization-tp3852776p3852776.html > Sent from the Solr - User mailing list archive at Nabble.com. -- Bill Bell billnb...@gmail.com cell 720-256-8076
Re: StreamingUpdateSolrServer - exceptions not propagated
On 3/26/2012 6:43 PM, Mark Miller wrote: It doesn't get thrown because that logic needs to continue - you don't necessarily want one bad document to stop all the following documents from being added. So the exception is sent to that method with the idea that you can override and do what you would like. I've written sample code around stopping and throwing an exception, but I guess it's not totally trivial. Other ideas for reporting errors have been thrown around in the past, but no work on it has gotten any traction. - Mark Miller lucidimagination.com On Mar 26, 2012, at 7:33 PM, Shawn Heisey wrote: I've been building a new version of my app that keeps our Solr indexes up to date. I had hoped to use StreamingUpdateSolrServer instead of CommonsHttpSolrServer for performance reasons, but I have run into a showstopper problem that has made me revert to CHSS. I have been relying on exception handling to detect when there is any kind of problem with any request sent to Solr. Looking at the code for SUSS, it seems that any exceptions thrown by lower level code are simply logged, then forgotten as if they had never happened. The problem is that I currently have no way (that I know of so far) to detect that a problem happened. As far as my code is concerned, everything worked, so it updates my position tracking and those documents will never be inserted. I have not yet delved into the response object to see whether it can tell me anything. My code currently assumes that if no exception was thrown, it was successful. This works with CHSS. I will write some test code that tries out various error situations and see what the response contains. Thanks, Shawn
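The "override and handle it yourself" idea Mark describes can be paired with a collect-and-rethrow pattern: record each background failure, and have the indexing code check for recorded failures before advancing its position tracking. A self-contained sketch of the pattern (class and method names are hypothetical; with SolrJ the recording would live in the error-handling hook of a StreamingUpdateSolrServer subclass rather than in a plain executor):

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ErrorCollectingSender {
    // Failures from background threads land here instead of being forgotten.
    private final ConcurrentLinkedQueue<Throwable> errors = new ConcurrentLinkedQueue<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    public void send(Runnable update) {
        pool.submit(() -> {
            try {
                update.run();
            } catch (Throwable t) {
                errors.add(t); // remember the failure for the caller
            }
        });
    }

    // Call after all updates are queued; throws if any background update
    // failed, so position tracking is only advanced on success.
    public void checkErrors() throws Exception {
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        Throwable t = errors.poll();
        if (t != null) {
            throw new Exception("background update failed", t);
        }
    }
}
```

The caller's loop then becomes: send a batch, call the check, and only then persist the new position.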
Re: Solr cores issue
Yes, I must have mis-copied, and yes, I do have the conf folder per core with schema etc. Because of this issue, we have decided to have multiple webapps with about 50 cores per webapp, instead of one single webapp with all 200 cores. Would this make better sense? What would be your suggestion? Regards Sujatha On Tue, Mar 27, 2012 at 12:07 AM, Erick Erickson wrote: > Shouldn't be. What do your log files say? You have to treat each > core as a separate index. In other words, you need to have a core#/conf > with the schema matching your core#/data/index directory etc. > > I suspect you've simply mis-copied something. > > Best > Erick > > On Mon, Mar 26, 2012 at 8:27 AM, Sujatha Arun wrote: > > I was migrating to cores from webapp ,and I was copying a bunch of > indexes > > from webapps to respective cores ,when I restarted ,I had this issue > where > > the whole webapp with the cores would not startup and was getting index > > corrupted message.. > > > > In this scenario or in a scenario where there is an issue with schema > > /config file for one core ,will the whole webapp with the cores not > restart? > > > > Regards > > Sujatha > > > > On Mon, Mar 26, 2012 at 4:43 PM, Erick Erickson >wrote: > > > >> Index corruption is very rare, can you provide more details how you > >> got into that state? > >> > >> Best > >> Erick > >> > >> On Sun, Mar 25, 2012 at 1:22 PM, Sujatha Arun > wrote: > >> > Hello, > >> > > >> > Suppose I have several cores in a single webapp ,I have issue with > Index > >> > beong corrupted in one core ,or schema /solrconfig of one core is not > >> well > >> > formed ,then entire webapp refused to load on server restart? > >> > > >> > Why does this happen? > >> > > >> > Regards > >> > Sujatha > >> >
Why my highlights are wrong(one character offset)?
All of my highlights have a one-character mistake in the offset; some fragments from my response follow. Thanks! 0 259 on sequence true 10 2.2 *,score true 0 sequence:NGNFN TSQSELSNGNFNRRPKIELSNFDGNHPKTWIRKC GENTRERNGNFNSLTRERSFAELENHPPKVRRNGSEG EGRYPCNNGNFNLTTGRCVCEKNYVHLIYEDRI YAEENYINGNFNEEPY KEVADDCNGNFNQPTGVRI -- View this message in context: http://lucene.472066.n3.nabble.com/Why-my-highlights-are-wrong-one-character-offset-tp3860286p3860286.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Cloud multiple shards and replicas in one instance freezes
For whatever reason, I'm having difficulty reproducing the issue. I'll continue to try to reproduce it. On Sunday, March 25, 2012, Mark Miller wrote: > Yeah, sorry - that's what I meant. > > Sent from my iPad > > On Mar 24, 2012, at 2:18 PM, Jamie Johnson wrote: > >> There is no stack trace, I can fire things back up and try to get a >> thread dump if that's useful. >> >> On Sat, Mar 24, 2012 at 4:07 AM, Mark Miller wrote: >>> Can you get a stack trace dump? >>> >>> Sent from my iPhone >>> >>> On Mar 23, 2012, at 10:38 PM, Jamie Johnson wrote: >>> I run a test setup on my mac where I setup 4 cores 2 of which are replicas in a single solr JVM instance. I recently attempted to move this same setup to Ubuntu 10.04.4 LTS but the system seems to just lock up. I am running a local test which essentially adds 100 docs and says commit after 10s, after doing this once the solr instance just becomes non responsive, what can I look at to try to diagnose why? I've increased the number of open file descriptors for the user running solr to 200,000. Any pointers of where to look would be great. >
Re: StreamingUpdateSolrServer - exceptions not propagated
It doesn't get thrown because that logic needs to continue - you don't necessarily want one bad document to stop all the following documents from being added. So the exception is sent to that method with the idea that you can override and do what you would like. I've written sample code around stopping and throwing an exception, but I guess it's not totally trivial. Other ideas for reporting errors have been thrown around in the past, but no work on it has gotten any traction. - Mark Miller lucidimagination.com On Mar 26, 2012, at 7:33 PM, Shawn Heisey wrote: > I've been building a new version of my app that keeps our Solr indexes up to > date. I had hoped to use StreamingUpdateSolrServer instead of > CommonsHttpSolrServer for performance reasons, but I have run into a > showstopper problem that has made me revert to CHSS. > > I have been relying on exception handling to detect when there is any kind of > problem with any request sent to Solr. Looking at the code for SUSS, it > seems that any exceptions thrown by lower level code are simply logged, then > forgotten as if they had never happened. > > So far I have not been able to decipher how things actually work, so I can't > tell if it would be possible to propagate the exception back up into my code. > > Questions for the experts: Would such propagation be possible without > compromising performance? Is this a bug? Can I somehow detect the failure > and throw an exception of my own? 
> > For reference, here is the exception that gets logged, but not actually > thrown: > > java.net.ConnectException: Connection refused >at java.net.PlainSocketImpl.socketConnect(Native Method) >at > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) >at > java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) >at > java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) >at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391) >at java.net.Socket.connect(Socket.java:579) >at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >at java.lang.reflect.Method.invoke(Method.java:601) >at > org.apache.commons.httpclient.protocol.ReflectionSocketFactory.createSocket(ReflectionSocketFactory.java:140) >at > org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:125) >at > org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707) >at > org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361) >at > org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387) >at > org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) >at > org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) >at > org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) >at > org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer$Runner.run(StreamingUpdateSolrServer.java:154) >at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) >at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) >at 
java.lang.Thread.run(Thread.java:722) > > Thanks, > Shawn >
StreamingUpdateSolrServer - exceptions not propagated
I've been building a new version of my app that keeps our Solr indexes up to date. I had hoped to use StreamingUpdateSolrServer instead of CommonsHttpSolrServer for performance reasons, but I have run into a showstopper problem that has made me revert to CHSS. I have been relying on exception handling to detect when there is any kind of problem with any request sent to Solr. Looking at the code for SUSS, it seems that any exceptions thrown by lower level code are simply logged, then forgotten as if they had never happened. So far I have not been able to decipher how things actually work, so I can't tell if it would be possible to propagate the exception back up into my code. Questions for the experts: Would such propagation be possible without compromising performance? Is this a bug? Can I somehow detect the failure and throw an exception of my own? For reference, here is the exception that gets logged, but not actually thrown: java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391) at java.net.Socket.connect(Socket.java:579) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.commons.httpclient.protocol.ReflectionSocketFactory.createSocket(ReflectionSocketFactory.java:140) at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:125) at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707) at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361) at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387) at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) at org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer$Runner.run(StreamingUpdateSolrServer.java:154) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) Thanks, Shawn
Re: Slave index size growing fast
Erick, I haven't changed the maxCommitsToKeep yet. We stopped the slave that had issues and removed the data dir as you pointed out, and after starting it, everything started working as normal. I guess that at some point someone committed on the slave or even copied the master files over and made this mess. Will check on the internal docs to prevent this from happening again. Thanks for explaining the whole concept; it will be useful to understand the whole process. Best, Alexandre On Fri, Mar 23, 2012 at 4:05 PM, Erick Erickson wrote: > Alexandre: > > Have you changed anything like on your slave? > And do you have more than one slave? If you do, have you considered > just blowing away the entire .../data directory on the slave and letting > it re-start from scratch? I'd take the slave out of service for the > duration of this operation, or do it when you are OK with some number of > requests going to an empty index > > Because having an index. directory indicates that sometime > someone forced the slave to get out of sync, possibly as you say by > doing a commit. Or sending docs to it to be indexed or some such. Starting > the slave over should fix that if it's the root of your problem. > > Note a curious thing about the . When you start indexing, the > index version is a timestamp. However, from that point on when the index > changes, the version number is just incremented (not made the current > time). This is to avoid problems with masters and slaves having different > times. But a consequence of that is if your slave somehow gets an index > that's newer, the replication process does the best it can to not delete > indexes that are out of sync with the master and saves them away. This > might be what you're seeing. > > I'm grasping at straws a bit here, but this seems possible. > > Best > Erick > > On Fri, Mar 23, 2012 at 1:16 PM, Alexandre Rocco > wrote: > > Tomás, > > > > The 300+GB size is only inside the index.20110926152410 dir. Inside there > > are a lot of files. 
> > I am almost conviced that something is messed up like someone commited on > > this slave machine. > > > > Thanks > > > > 2012/3/23 Tomás Fernández Löbbe > > > >> Alexandre, additionally to what Erick said, you may want to check in the > >> slave if what's 300+GB is the "data" directory or the > "index." > >> directory. > >> > >> On Fri, Mar 23, 2012 at 12:25 PM, Erick Erickson < > erickerick...@gmail.com > >> >wrote: > >> > >> > not really, unless perhaps you're issuing commits or optimizes > >> > on the _slave_ (which you should NOT do). > >> > > >> > Replication happens based on the version of the index on the master. > >> > True, it starts out as a timestamp, but then successive versions > >> > just have that number incremented. The version number > >> > in the index on the slave is compared against the one on the master, > >> > but the actual time (on the slave or master) is irrelevant. This is > >> > explicitly to avoid problems with time synching across > >> > machines/timezones/whataver > >> > > >> > It would be instructive to look at the admin/info page to see what > >> > the index version is on the master and slave. > >> > > >> > But, if you optimize or commit (I think) on the _slave_, you might > >> > change the timestamp and mess things up (although I'm reaching > >> > here, I don't know this for certain). > >> > > >> > What's the index look like on the slave as compared to the master? > >> > Are there just a bunch of files on the slave? Or a bunch of > directories? > >> > > >> > Instead of re-indexing on the master, you could try to bring down the > >> > slave, blow away the entire index and start it back up. Since this is > a > >> > production system, I'd only try this if I had more than one slave. > >> Although > >> > you could bring up a new slave and attach it to the master and see > >> > what happens there. You wouldn't affect production if you didn't point > >> > incoming requests at it... 
> >> > > >> > Best > >> > Erick > >> > > >> > On Fri, Mar 23, 2012 at 11:03 AM, Alexandre Rocco > >> > wrote: > >> > > Erick, > >> > > > >> > > We're using Solr 3.3 on Linux (CentOS 5.6). > >> > > The /data dir on master is actually 1.2G. > >> > > > >> > > I haven't tried to recreate the index yet. Since it's a production > >> > > environment, > >> > > I guess that I can stop replication and indexing and then recreate > the > >> > > master index to see if it makes any difference. > >> > > > >> > > Also just noticed another thread here named "Simple Slave > Replication > >> > > Question" that tells that it could be a problem if I'm seeing an > >> > > /data/index with an timestamp on the slave node. > >> > > Is this info relevant to this issue? > >> > > > >> > > Thanks, > >> > > Alexandre > >> > > > >> > > On Fri, Mar 23, 2012 at 11:48 AM, Erick Erickson < > >> > erickerick...@gmail.com>wrote: > >> > > > >> > >> What version of Solr and what operating system? > >> > >> > >> > >> But regardless, this shouldn't be happening.
Re: Old Google Guava library needs updating (r05)
I've filed an issue for myself as a reminder. Guava r05 is pretty old indeed, time to upgrade. S. On Mon, Mar 26, 2012 at 23:12, Nicholas Ball wrote: > > Hey Staszek, > > Thanks for the reply. Yep using 4.x and that was exactly what I ended up > doing, a quick replace :) > Just thought I'd document it somewhere for a proper fix to be done in the > 4.0 release. > > No issues arose for me but then again Erick mentions it's only used in > Carrot2 contrib which I'm not using in my deployment. > > Thanks for the help! > Nick > > On Mon, 26 Mar 2012 22:40:14 +0200, Stanislaw Osinski > wrote: > > Hi Nick, > > > > Which version of Solr do you have in mind? The official 3.x line or 4.0? > > > > The quick and dirty fix to try would be to just replace Guava r05 with > the > > latest version, chances are it will work (we did that in the past though > > the version number difference was smaller). > > > > The proper fix would be for us to make a point release of Carrot2 with > > dependencies updated and update Carrot2 in Solr. And this brings us to > the > > question about the version of Solr you use. Upgrading Carrot2 in 4.0 > > shouldn't be an issue, but when it comes to 3.x I'd need to check. > > > > Staszek > > > > On Mon, Mar 26, 2012 at 13:10, Erick Erickson > > wrote: > > > >> Hmmm, near as I can tell, guava is only used in the Carrot2 contrib, so > >> maybe > >> ask over at: http://project.carrot2.org/? > >> > >> Best > >> Erick > >> > >> On Sat, Mar 24, 2012 at 3:31 PM, Nicholas Ball > >> wrote: > >> > > >> > Hey all, > >> > > >> > Working on a plugin, which uses the Curator library (ZooKeeper > client). > >> > Curator depends on the very latest Google Guava library which > >> unfortunately > >> > clashes with Solr's outdated r05 of Guava. > >> > Think it's safe to say that Solr should be using the very latest > Guava > >> > library (11.0.1) too right? > >> > Shall I open up a JIRA issue for someone to update it? > >> > > >> > Cheers, > >> > Nick > >> >
Re: Old Google Guava library needs updating (r05)
Hey Staszek, Thanks for the reply. Yep using 4.x and that was exactly what I ended up doing, a quick replace :) Just thought I'd document it somewhere for a proper fix to be done in the 4.0 release. No issues arose for me but then again Erick mentions it's only used in Carrot2 contrib which I'm not using in my deployment. Thanks for the help! Nick On Mon, 26 Mar 2012 22:40:14 +0200, Stanislaw Osinski wrote: > Hi Nick, > > Which version of Solr do you have in mind? The official 3.x line or 4.0? > > The quick and dirty fix to try would be to just replace Guava r05 with the > latest version, chances are it will work (we did that in the past though > the version number difference was smaller). > > The proper fix would be for us to make a point release of Carrot2 with > dependencies updated and update Carrot2 in Solr. And this brings us to the > question about the version of Solr you use. Upgrading Carrot2 in 4.0 > shouldn't be an issue, but when it comes to 3.x I'd need to check. > > Staszek > > On Mon, Mar 26, 2012 at 13:10, Erick Erickson > wrote: > >> Hmmm, near as I can tell, guava is only used in the Carrot2 contrib, so >> maybe >> ask over at: http://project.carrot2.org/? >> >> Best >> Erick >> >> On Sat, Mar 24, 2012 at 3:31 PM, Nicholas Ball >> wrote: >> > >> > Hey all, >> > >> > Working on a plugin, which uses the Curator library (ZooKeeper client). >> > Curator depends on the very latest Google Guava library which >> unfortunately >> > clashes with Solr's outdated r05 of Guava. >> > Think it's safe to say that Solr should be using the very latest Guava >> > library (11.0.1) too right? >> > Shall I open up a JIRA issue for someone to update it? >> > >> > Cheers, >> > Nick >>
Re: Old Google Guava library needs updating (r05)
Hi Nick, Which version of Solr do you have in mind? The official 3.x line or 4.0? The quick and dirty fix to try would be to just replace Guava r05 with the latest version, chances are it will work (we did that in the past though the version number difference was smaller). The proper fix would be for us to make a point release of Carrot2 with dependencies updated and update Carrot2 in Solr. And this brings us to the question about the version of Solr you use. Upgrading Carrot2 in 4.0 shouldn't be an issue, but when it comes to 3.x I'd need to check. Staszek On Mon, Mar 26, 2012 at 13:10, Erick Erickson wrote: > Hmmm, near as I can tell, guava is only used in the Carrot2 contrib, so > maybe > ask over at: http://project.carrot2.org/? > > Best > Erick > > On Sat, Mar 24, 2012 at 3:31 PM, Nicholas Ball > wrote: > > > > Hey all, > > > > Working on a plugin, which uses the Curator library (ZooKeeper client). > > Curator depends on the very latest Google Guava library which > unfortunately > > clashes with Solr's outdated r05 of Guava. > > Think it's safe to say that Solr should be using the very latest Guava > > library (11.0.1) too right? > > Shall I open up a JIRA issue for someone to update it? > > > > Cheers, > > Nick >
Re: Index a set of file as one document in SOLR
Consider writing a SolrJ program that extracts the data from the PDF file and combines it with the XML data. Here's an example to get you started, it shows how to do the PDF extraction at least. The other part of the code is a database connection, ignore that part. You'll have to read in the XML, parse it, extract the relevant bits and add them to the SolrInputDocument (see the example) http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/ Best Erick On Mon, Mar 26, 2012 at 9:25 AM, Anupam Bhattacharya wrote: > I have a set/group of documents of XML and PDF type. > > Each XML document contains the bibliographic information and has a > reference to the supporting PDF document. > How can i index this Parent-Child doc types in SOLR schema as one doc. The > PDF should be full text indexed for searching & only the corresponding > Parent XML details should be shown if the PDF contains the searched > keyword. > > How to design this kind of functionality in SOLR ? > > Appreciate any help on this. > > Regards > Anupam
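A rough, self-contained sketch of the merge step described above: parse the bibliographic XML with the JDK's DOM parser, pair it with the text extracted from the referenced PDF (the extraction itself would come from Tika, as in the linked example), and build one flat field map that transfers 1:1 onto a SolrInputDocument. The element names (id, title) and the "fulltext" field are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XmlPdfMerger {
    // Combine the parent XML metadata and the child PDF's extracted text
    // into one document, so a hit on the PDF text returns the XML details.
    public static Map<String, String> merge(String xml, String pdfText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> fields = new LinkedHashMap<>();
            fields.put("id", doc.getElementsByTagName("id").item(0).getTextContent());
            fields.put("title", doc.getElementsByTagName("title").item(0).getTextContent());
            fields.put("fulltext", pdfText); // searchable, need not be displayed
            return fields;
        } catch (Exception e) {
            throw new RuntimeException("bad bibliographic record", e);
        }
    }
}
```

Each entry of the resulting map would become one addField call on the SolrInputDocument; marking the fulltext field indexed but not shown in results gives the "search the PDF, display the XML details" behavior.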
document inside document?
Hey, I am making an image search engine where people can tag images with various items that are themselves tagged. For example, http://example.com/abc.jpg is tagged with the following three items:
- item1 that is tagged with: tall blond woman
- item2 that is tagged with: yellow purse
- item3 that is tagged with: gucci red dress
Querying for +yellow +purse will return the example image. But querying for +gucci +purse will not, because the image does not have an item tagged with both gucci and purse. In addition to "items", each image has various metadata such as alt text, location, description, photo credit, etc. that should be available for search. How should I write my schema.xml? If imageUrl is the primary key, do I implement my own fieldType for items, so that I can write: What would myItemType look like so that Solr would know the example image should not match the query +gucci +purse? If itemId is the primary key, I can use result grouping ( http://wiki.apache.org/solr/FieldCollapsing). But then I need to repeat alt text and other image metadata for each item. Or should I create different schemas for item search and metadata search? Thanks. Sam.
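With the result-grouping approach Sam mentions, the usual layout is one Solr document per item, with the image metadata either repeated on each item or held in a second core. A sketch of the item fields and the matching query (field names and types are illustrative):

```xml
<!-- schema.xml fragment: one document per tagged item -->
<field name="itemId"   type="string" indexed="true" stored="true" required="true"/>
<field name="imageUrl" type="string" indexed="true" stored="true"/>
<field name="itemTags" type="text"   indexed="true" stored="true" multiValued="true"/>
```

A query such as q=itemTags:(+gucci +purse)&group=true&group.field=imageUrl then matches only items carrying both tags, and grouping collapses the hits back to one entry per image: +yellow +purse matches item2 and returns the example image, while +gucci +purse matches no single item and returns nothing, which is the behavior asked for. The trade-off, as noted in the question, is either duplicating the alt text/location metadata on every item document or keeping it in a separate core and doing a second lookup by imageUrl.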
Re: Solr cores issue
Shouldn't be. What do your log files say? You have to treat each core as a separate index. In other words, you need to have a core#/conf with the schema matching your core#/data/index directory etc. I suspect you've simply mis-copied something. Best Erick On Mon, Mar 26, 2012 at 8:27 AM, Sujatha Arun wrote: > I was migrating to cores from webapp ,and I was copying a bunch of indexes > from webapps to respective cores ,when I restarted ,I had this issue where > the whole webapp with the cores would not startup and was getting index > corrupted message.. > > In this scenario or in a scenario where there is an issue with schema > /config file for one core ,will the whole webapp with the cores not restart? > > Regards > Sujatha > > On Mon, Mar 26, 2012 at 4:43 PM, Erick Erickson > wrote: > >> Index corruption is very rare, can you provide more details how you >> got into that state? >> >> Best >> Erick >> >> On Sun, Mar 25, 2012 at 1:22 PM, Sujatha Arun wrote: >> > Hello, >> > >> > Suppose I have several cores in a single webapp ,I have issue with Index >> > beong corrupted in one core ,or schema /solrconfig of one core is not >> well >> > formed ,then entire webapp refused to load on server restart? >> > >> > Why does this happen? >> > >> > Regards >> > Sujatha >>
Re: First steps with Solr
Trying to play with JavaScript to clean up my URL! Context is Velocity. Suggestions? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/First-steps-with-Solr-tp3858406p3858959.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Client-side failover with SolrJ
Did you try http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/impl/LBHttpSolrServer.html? This might be what you're looking for. On Mon, Mar 26, 2012 at 11:23 AM, wrote: > Hi, > > has SolrJ any possiblities to do a failover from a master to a slave for > searching? > > Thank you >
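For reference, LBHttpSolrServer round-robins queries across the listed instances and takes a non-responding server out of rotation until it recovers. A minimal sketch (the URLs are placeholders):

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.LBHttpSolrServer;

public class FailoverSearch {
    public static void main(String[] args) throws Exception {
        // Queries are load-balanced across these servers; if one goes
        // down, it is skipped and periodically re-checked.
        LBHttpSolrServer server = new LBHttpSolrServer(
                "http://master:8983/solr",
                "http://slave:8983/solr");
        System.out.println(
                server.query(new SolrQuery("*:*")).getResults().getNumFound());
    }
}
```

Note that this is meant for querying; updates should still go directly to the master so replication stays consistent.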
Re: First steps with Solr
Partially solved problem! I am playing with the doc.vm file in the velocity folder. I have replaced *#field('name')* by * http://127.0.0.1:2317/Chausey?#field('access') #field('name') * where access is the value of the URL I want. The problem is that something seems to insert spaces (%20) between * Chausey? and #field('access'), resulting in an invalid query. Everything else seems OK. Is there a way to control this space insertion, or to remove the spaces client-side? Thanks, Henri -- View this message in context: http://lucene.472066.n3.nabble.com/First-steps-with-Solr-tp3858406p3858582.html Sent from the Solr - User mailing list archive at Nabble.com.
QueryHandler
Hi A newbie question. I am uncertain what is the best way to design for my requirement, which is the following. I want to allow another client in solrj to query solr with a query that is handled with a custom handler localhost:9090/solr/tokenSearch?tokens={!dismax qf=content}pear,apples,oyster,king kong&fl=score&rows=1000 i.e. a list of tokens (single words and phrases) is sent in one HTTP call. What I would like to do is to search over each individual token and compose a single response back to the client. The current approach I have taken is to create a custom search handler; myHandler (which extends SearchComponent) overrides the prepare and process methods, extracting and iterating over each token in the input. The problem I am hitting in this design is that the prepare() method is passed a reference to the SolrIndexSearcher in the ResponseBuilder parameter (so for efficiency reasons I don't want to open up another server connection for the search). I can construct a Lucene query and search just fine, but what I would like to do is instead use the e/dismax queries (rather than construct my own - to reduce errors). The getDocList() method of SolrIndexSearcher, on the other hand, requires a Lucene query object. Is this an appropriate design for my requirement? And if so, what is the best way to send a SolrQuery to the SolrIndexSearcher? Thank you Peyman
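One way to bridge that gap is QParser.getParser, which turns a query string into a Lucene Query using the dismax parser against the request's own searcher, so no second connection is needed. The sketch below assumes Solr 3.x APIs; the "tokens" parameter name and the per-token response shape are assumptions, and token splitting and result merging are deliberately simplistic.

```java
import java.io.IOException;

import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.Query;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.search.DocList;
import org.apache.solr.search.QParser;
import org.apache.solr.search.SolrIndexSearcher;

public class TokenSearchComponent extends SearchComponent {

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        // nothing to prepare in this sketch
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // Reuse the core's searcher -- no extra server connection.
        SolrIndexSearcher searcher = rb.req.getSearcher();
        String tokens = rb.req.getParams().get("tokens", "");
        for (String token : tokens.split(",")) {
            try {
                // Let the dismax parser build the Lucene query,
                // honoring qf and other dismax params on the request.
                Query q = QParser.getParser(token, "dismax", rb.req).getQuery();
                DocList hits = searcher.getDocList(q, null, null, 0, 10);
                rb.rsp.add(token, hits); // merge strategy is up to you
            } catch (ParseException e) {
                throw new IOException(e);
            }
        }
    }

    @Override
    public String getDescription() { return "per-token dismax search"; }
    @Override
    public String getSource() { return null; }
    @Override
    public String getSourceId() { return null; }
    @Override
    public String getVersion() { return null; }
}
```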
Client-side failover with SolrJ
Hi, does SolrJ have any facility to do a failover from a master to a slave for searching? Thank you
First steps with Solr
Hi, I have been exploring Solr through the example provided. I have created my own set of documents, and can start to index and query using the Solritas GUI. Two questions: 1/ I would like to have the "name" field to contain a URL to another server on my machine. When I put " text " inside the name field, Solr complains at indexing time. What is the easy solution 2/ Where are the various aspects of the GUI documented or parametrised? I believe playing with an existing/running program is one nice way to discover. Thanks for any help. Henri -- View this message in context: http://lucene.472066.n3.nabble.com/First-steps-with-Solr-tp3858406p3858406.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: web gui specify shard and collection when adding a new core
https://issues.apache.org/jira/browse/SOLR-3275 is the ticket I created. If it's not clear enough please let me know I can try to elaborate. On Mon, Mar 26, 2012 at 3:46 AM, Stefan Matheis wrote: > Jamie: SOLR-3238 is the current admin-ticket. create a new one for the > cloud-related options and describe what (and where) you'd like to have :) > > > > On Sunday, March 25, 2012 at 4:25 AM, Jamie Johnson wrote: > >> Is there a plan to add the ability to specify the shard and >> collection when adding a core through the enhanced web gui, is there a >> JIRA for this? If not I'd be more than happy to add the request if >> someone can point me to the active JIRA (both 3162 and 2667 are marked >> Resolved). > > >
Using DateMath in Facet Label
Hi, We have a requirement to facet on a field with a date value so that the following buckets are shown: a) Last Week b) Last Month c) Last Year d) 2012 e) 2011 or earlier Of course, as 2013 rolls in, then the labels for the last two buckets should change to “2013” and “2012 or earlier”. Is there any way to have Solr return the correct year based on the current date? For example, I thought of trying to do something like this for d) above: …&facet.query={!key=[NOW-1YEAR/YEAR]}date_entered:[NOW-1YEAR/YEAR TO NOW/YEAR]... Thanks, Carlos
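The {!key=...} local param is treated as a literal label, so date math inside the key is not evaluated by Solr. One workaround is to compute the year labels on the client when building the request, so they roll over automatically each January. A minimal sketch (the field name date_entered comes from the question; the exact range expressions are illustrative):

```java
import java.util.Calendar;

public class YearFacets {
    // Build the two year-based facet.query values with literal labels,
    // e.g. "2012" and "2011 or earlier", derived from the current year.
    static String[] yearFacetQueries(int currentYear) {
        return new String[] {
            "{!key='" + currentYear + "'}date_entered:[NOW/YEAR TO NOW]",
            "{!key='" + (currentYear - 1) + " or earlier'}date_entered:[* TO NOW/YEAR]"
        };
    }

    public static void main(String[] args) {
        int year = Calendar.getInstance().get(Calendar.YEAR);
        for (String fq : yearFacetQueries(year)) {
            System.out.println("&facet.query=" + fq); // URL-encode before sending
        }
    }
}
```

The "Last Week" / "Last Month" / "Last Year" buckets need no such trick, since their labels never change.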
RE: Simple Slave Replication Question
That's great information. Thanks for all the help and guidance, its been invaluable. Thanks Ben -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 26 March 2012 12:21 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question It's the optimize step. Optimize essentially forces all the segments to be copied into a single new segment, which means that your entire index will be replicated to the slaves. In recent Solrs, there's usually no need to optimize, so unless and until you can demonstrate a noticeable change, I'd just leave the optimize step off. In fact, trunk renames it to forceMerge or something just because it's so common for people to think "of course I want to optimize my index!" and get the unintended consequences you're seeing even thought the optimize doesn't actually do that much good in most cases. Some people just do the optimize once a day (or week or whatever) during off-peak hours as a compromise. Best Erick On Mon, Mar 26, 2012 at 5:02 AM, Ben McCarthy wrote: > Hello, > > Had to leave the office so didn't get a chance to reply. Nothing in the > logs. Just ran one through from the ingest tool. > > Same results full copy of the index. > > Is it something to do with: > > server.commit(); > server.optimize(); > > I call this at the end of the ingestion. > > Would optimize then work across the whole index? > > Thanks > Ben > > -Original Message- > From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] > Sent: 23 March 2012 15:10 > To: solr-user@lucene.apache.org > Subject: Re: Simple Slave Replication Question > > Also, what happens if, instead of adding the 40K docs you add just one and > commit? > > 2012/3/23 Tomás Fernández Löbbe > >> Have you changed the mergeFactor or are you using 10 as in the >> example solrconfig? >> >> What do you see in the slave's log during replication? Do you see any >> line like "Skipping download for..."? 
>> >> >> On Fri, Mar 23, 2012 at 11:57 AM, Ben McCarthy < >> ben.mccar...@tradermedia.co.uk> wrote: >> >>> I just have a index directory. >>> >>> I push the documents through with a change to a field. Im using >>> SOLRJ to do this. Im using the guide from the wiki to setup the >>> replication. When the feed of updates to the master finishes I call >>> a commit again using SOLRJ. I then have a poll period of 5 minutes >>> from the slave. When it kicks in I see a new version of the index >>> and then it copys the full 5gb index. >>> >>> Thanks >>> Ben >>> >>> -Original Message- >>> From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] >>> Sent: 23 March 2012 14:29 >>> To: solr-user@lucene.apache.org >>> Subject: Re: Simple Slave Replication Question >>> >>> Hi Ben, only new segments are replicated from master to slave. In a >>> situation where all the segments are new, this will cause the index >>> to be fully replicated, but this rarely happen with incremental >>> updates. It can also happen if the slave Solr assumes it has an "invalid" >>> index. >>> Are you committing or optimizing on the slaves? After replication, >>> the index directory on the slaves is called "index" or "index."? >>> >>> Tomás >>> >>> On Fri, Mar 23, 2012 at 11:18 AM, Ben McCarthy < >>> ben.mccar...@tradermedia.co.uk> wrote: >>> >>> > So do you just simpy address this with big nic and network pipes. >>> > >>> > -Original Message- >>> > From: Martin Koch [mailto:m...@issuu.com] >>> > Sent: 23 March 2012 14:07 >>> > To: solr-user@lucene.apache.org >>> > Subject: Re: Simple Slave Replication Question >>> > >>> > I guess this would depend on network bandwidth, but we move around >>> > 150G/hour when hooking up a new slave to the master. >>> > >>> > /Martin >>> > >>> > On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy < >>> > ben.mccar...@tradermedia.co.uk> wrote: >>> > >>> > > Hello, >>> > > >>> > > Im looking at the replication from a master to a number of slaves. 
>>> > > I have configured it and it appears to be working. When >>> > > updating 40K records on the master is it standard to always copy >>> > > over the full index, currently 5gb in size. If this is standard >>> > > what do people do who have massive 200gb indexs, does it not >>> > > take a while to bring the >>> > slaves inline with the master? >>> > > >>> > > Thanks >>> > > Ben >>> > > >>> > > >>> > > >>> > > >>> > > This e-mail is sent on behalf of Trader Media Group Limited, >>> > > Registered >>> > > Office: Auto Trader House, Cutbush Park Industrial Estate, >>> > > Danehill, Lower Earley, Reading, Berkshire, RG6 4UT(Registered in >>> > > England No. >>> > 4768833). >>> > > This email and any files transmitted with it are confidential >>> > > and may be legally privileged, and intended solely for the use >>> > > of the individual or entity to whom they are addressed. If you >>> > > have received this email in error please notify the sender. This >>> > > email message h
RE: "ant test" and contribs
Check out solr/contrib/analysis-extras/build.xml -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent: Monday, March 26, 2012 2:14 AM To: solr-user@lucene.apache.org Subject: Re: "ant test" and contribs Ah! It is more complex. There is code and a library jar in lucene/contrib/module and a solr module that uses it in solr/contrib/module. I had to copy the library from lucene/contrib/module/jar to lucene/contrib/module/jar or else the solr contrib part would not compile. There does not seem to be any contrib that does this. There are lucene/contrib parts that export jars. But there is no solr/contrib that needs one of those jars, is there? On Sat, Mar 24, 2012 at 5:05 PM, Steven A Rowe wrote: > Hi Lance, > > Are you adding a new solr/contrib/project/? If so, why not use the build.xml > file from a sibling project? E.g. try starting from > solr/contrib/velocity/build.xml - it is very simple and enables all build > steps by importing solr/contrib/contrib-build.xml. > > solr/contrib/contrib-build.xml imports solr/common-build.xml; > solr/common-build.xml imports lucene/contrib/contrib-build.xml; and > lucene/contrib/contrib-build.xml imports lucene/common-build.xml. > > Simple! > > Steve > > -Original Message- > From: Lance Norskog [mailto:goks...@gmail.com] > Sent: Saturday, March 24, 2012 7:56 PM > To: solr-user > Subject: "ant test" and contribs > > What do I need to add so that a contrib/project/src/test/ directory can find > the classes in contrib/project/src/java? I've gotten the ant files to where > 'ant test-contrib' works. But 'ant test' fails: it cannot compile the test > classes after building the jars for contrib/project. Any hints? > > -- > Lance Norskog > goks...@gmail.com -- Lance Norskog goks...@gmail.com
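Steve's suggestion amounts to a very small build file: a new solr/contrib project's build.xml can typically just import the shared contrib build, which wires up the compile, jar, and test targets for src/java and src/test. A sketch modeled on the velocity contrib (treat it as a pattern, not the exact file contents):

```xml
<!-- Hypothetical solr/contrib/myproject/build.xml, modeled on
     solr/contrib/velocity/build.xml: contrib-build.xml supplies the
     standard compile/jar/test targets and classpath wiring. -->
<project name="solr-myproject" default="default">
  <description>My Solr contrib module</description>
  <import file="../contrib-build.xml"/>
</project>
```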
Re: Solr cores issue
I was migrating to cores from webapp ,and I was copying a bunch of indexes from webapps to respective cores ,when I restarted ,I had this issue where the whole webapp with the cores would not startup and was getting index corrupted message.. In this scenario or in a scenario where there is an issue with schema /config file for one core ,will the whole webapp with the cores not restart? Regards Sujatha On Mon, Mar 26, 2012 at 4:43 PM, Erick Erickson wrote: > Index corruption is very rare, can you provide more details how you > got into that state? > > Best > Erick > > On Sun, Mar 25, 2012 at 1:22 PM, Sujatha Arun wrote: > > Hello, > > > > Suppose I have several cores in a single webapp ,I have issue with Index > > beong corrupted in one core ,or schema /solrconfig of one core is not > well > > formed ,then entire webapp refused to load on server restart? > > > > Why does this happen? > > > > Regards > > Sujatha >
Re: Querying field with parenthesis
Your problem is the KeywordTokenizerFactory and the query parser. This often trips people up. When you use author:(stephen king), the query parser breaks this up before it gets to the analysis chain into two separate tokens. But by virtue of the fact that you're using KeywordTokenizer, the actual field only has a single token "stephen king". So neither of the pieces match. When you put "stephen king" (with quotes) in, the query parser does not try to break the tokens up and the analysis chain gets a single token rather than two. Your wildcards are matching because steph* and *king both match the _single_ token "stephen king". Two ways you can get lots of help with this kind of thing are the admin/analysis page and attaching &debugQuery=on to your URL to look at the parsed query results. Using something like WhitespaceTokenizerFactory might give you more expected results. Best Erick 2012/3/26 Tim Terlegård : > I have created my own field type. I have indexed "Stephen King" and > get no hit when searching > author:(stephen king) > > I get a hit when searching like this > author:(stephen* AND *king) > > I also get a hit when searching like this > author:"stephen king" > > So it seems like when querying with (...) it actually splits the > words. This is the type of the author field > > > > > > > > > I expected that author:(stephen king) would do the same thing as > author:"stephen king". Why is this not the case? > > Thanks, > Tim
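The difference comes down to the tokenizer in the fieldType. The poster's actual schema was stripped from the archived message, so the fragment below is only a hypothetical illustration of the whitespace-based alternative Erick mentions:

```xml
<!-- Hypothetical fieldType: WhitespaceTokenizerFactory splits
     "Stephen King" into two lowercased tokens, so author:(stephen king)
     can match each piece. -->
<fieldType name="text_ws_lower" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- By contrast, solr.KeywordTokenizerFactory keeps the whole value as
     the single token "stephen king", which only phrase or wildcard
     queries can match. -->
```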
Re: Simple Slave Replication Question
It's the optimize step. Optimize essentially forces all the segments to be copied into a single new segment, which means that your entire index will be replicated to the slaves. In recent Solrs, there's usually no need to optimize, so unless and until you can demonstrate a noticeable change, I'd just leave the optimize step off. In fact, trunk renames it to forceMerge or something just because it's so common for people to think "of course I want to optimize my index!" and get the unintended consequences you're seeing even though the optimize doesn't actually do that much good in most cases. Some people just do the optimize once a day (or week or whatever) during off-peak hours as a compromise. Best Erick On Mon, Mar 26, 2012 at 5:02 AM, Ben McCarthy wrote: > Hello, > > Had to leave the office so didn't get a chance to reply. Nothing in the > logs. Just ran one through from the ingest tool. > > Same results full copy of the index. > > Is it something to do with: > > server.commit(); > server.optimize(); > > I call this at the end of the ingestion. > > Would optimize then work across the whole index? > > Thanks > Ben > > -Original Message- > From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] > Sent: 23 March 2012 15:10 > To: solr-user@lucene.apache.org > Subject: Re: Simple Slave Replication Question > > Also, what happens if, instead of adding the 40K docs you add just one and > commit? > > 2012/3/23 Tomás Fernández Löbbe > >> Have you changed the mergeFactor or are you using 10 as in the example >> solrconfig? >> >> What do you see in the slave's log during replication? Do you see any >> line like "Skipping download for..."? >> >> >> On Fri, Mar 23, 2012 at 11:57 AM, Ben McCarthy < >> ben.mccar...@tradermedia.co.uk> wrote: >> >>> I just have a index directory. >>> >>> I push the documents through with a change to a field. Im using >>> SOLRJ to do this. Im using the guide from the wiki to setup the >>> replication. 
When the feed of updates to the master finishes I call >>> a commit again using SOLRJ. I then have a poll period of 5 minutes >>> from the slave. When it kicks in I see a new version of the index >>> and then it copys the full 5gb index. >>> >>> Thanks >>> Ben >>> >>> -Original Message- >>> From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] >>> Sent: 23 March 2012 14:29 >>> To: solr-user@lucene.apache.org >>> Subject: Re: Simple Slave Replication Question >>> >>> Hi Ben, only new segments are replicated from master to slave. In a >>> situation where all the segments are new, this will cause the index >>> to be fully replicated, but this rarely happen with incremental >>> updates. It can also happen if the slave Solr assumes it has an "invalid" >>> index. >>> Are you committing or optimizing on the slaves? After replication, >>> the index directory on the slaves is called "index" or "index."? >>> >>> Tomás >>> >>> On Fri, Mar 23, 2012 at 11:18 AM, Ben McCarthy < >>> ben.mccar...@tradermedia.co.uk> wrote: >>> >>> > So do you just simpy address this with big nic and network pipes. >>> > >>> > -Original Message- >>> > From: Martin Koch [mailto:m...@issuu.com] >>> > Sent: 23 March 2012 14:07 >>> > To: solr-user@lucene.apache.org >>> > Subject: Re: Simple Slave Replication Question >>> > >>> > I guess this would depend on network bandwidth, but we move around >>> > 150G/hour when hooking up a new slave to the master. >>> > >>> > /Martin >>> > >>> > On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy < >>> > ben.mccar...@tradermedia.co.uk> wrote: >>> > >>> > > Hello, >>> > > >>> > > Im looking at the replication from a master to a number of slaves. >>> > > I have configured it and it appears to be working. When updating >>> > > 40K records on the master is it standard to always copy over the >>> > > full index, currently 5gb in size. 
If this is standard what do >>> > > people do who have massive 200gb indexs, does it not take a while >>> > > to bring the >>> > slaves inline with the master? >>> > > >>> > > Thanks >>> > > Ben >>> > > >>> > > >>> > > >>> > > >>> > > This e-mail is sent on behalf of Trader Media Group Limited, >>> > > Registered >>> > > Office: Auto Trader House, Cutbush Park Industrial Estate, >>> > > Danehill, Lower Earley, Reading, Berkshire, RG6 4UT(Registered in >>> > > England No. >>> > 4768833). >>> > > This email and any files transmitted with it are confidential and >>> > > may be legally privileged, and intended solely for the use of the >>> > > individual or entity to whom they are addressed. If you have >>> > > received this email in error please notify the sender. This email >>> > > message has been swept for the presence of computer viruses. >>> > > >>> > > >>> > >>> > >>> > >>> > >>> > This e-mail is sent on behalf of Trader Media Group Limited, >>> > Registered >>> > Office: Auto Trader House, Cutbush Park Industrial Estate, >>> > Danehil
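In SolrJ terms, Erick's advice amounts to dropping the optimize() call from the end of the ingest, so that only the new segments need to replicate. A sketch (the server URL and setup are illustrative):

```java
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class Ingest {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://master:8983/solr");
        // ... add the 40K updated documents here ...
        server.commit();        // makes the new docs searchable; slaves only
                                // fetch the newly written segments
        // server.optimize();   // avoid: rewrites the index into one segment,
                                // forcing slaves to re-fetch the full 5GB
    }
}
```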
Re: Solr cores issue
Index corruption is very rare, can you provide more details how you got into that state? Best Erick On Sun, Mar 25, 2012 at 1:22 PM, Sujatha Arun wrote: > Hello, > > Suppose I have several cores in a single webapp ,I have issue with Index > beong corrupted in one core ,or schema /solrconfig of one core is not well > formed ,then entire webapp refused to load on server restart? > > Why does this happen? > > Regards > Sujatha
Re: Trouble Setting Up Development Environment
Depending upon what you actually need to do, you could consider just attaching to the running Solr instance remotely. I know it's easy in IntelliJ, and believe Eclipse makes this easy as well but I haven't used Eclipse in a while Best Erick On Sat, Mar 24, 2012 at 11:11 PM, Li Li wrote: > I forgot to write that I am running it in tomcat 6, not jetty. > you can right click the project -> Debug As -> Debug on Server -> Manually > define a new Server -> Apache -> Tomcat 6 > if you should have configured a tomcat. > > On Sun, Mar 25, 2012 at 4:17 AM, Karthick Duraisamy Soundararaj < > karthick.soundara...@gmail.com> wrote: > >> I followed your instructions. I got 8 Errors and a bunch of warnings few >> of them related to classpath. I also got the following exception when I >> tried to run with the jetty ( i have attached the full console output with >> this email. I figured solr directory with config files might be missing and >> added that in WebContent. >> >> Could be of great help if someone can point me at right direction. 
>> >> ls WebContent >> admin favicon.ico index.jsp solr WEB-INF >> >> >> *SEVERE: Error in solrconfig.xml:org.apache.solr.common.SolrException: No >> system property or default value specified for solr.test.sys.prop1* >> at >> org.apache.solr.common.util.DOMUtil.substituteProperty(DOMUtil.java:331) >> at >> org.apache.solr.common.util.DOMUtil.substituteProperties(DOMUtil.java:290) >> at >> org.apache.solr.common.util.DOMUtil.substituteProperties(DOMUtil.java:292) >> at org.apache.solr.core.Config.(Config.java:165) >> at org.apache.solr.core.SolrConfig.(SolrConfig.java:131) >> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:435) >> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:316) >> at >> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:133) >> at >> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:94) >> at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97) >> at >> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) >> at >> org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713) >> at org.mortbay.jetty.servlet.Context.startContext(Context.java:140) >> at >> org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282) >> at >> org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518) >> at >> org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499) >> at >> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) >> at >> org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130) >> at org.mortbay.jetty.Server.doStart(Server.java:224) >> at >> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) >> at runjettyrun.Bootstrap.main(Bootstrap.java:97) >> >> >> *Here are the 8 errors I got* >> *Description >> Resource >> Path >> >> Location Type* >> core cannot be resolved >> dataimport.jsp >> 
/solr3_5/ssrc/solr/contrib/dataimporthandler/src/webapp/admin >> line 27 JSP Problem >> End tag () not closed properly, expected >. package.html >> /solr3_5/ssrc/lucene/contrib/queryparser/src/java/org/apache/lucene/queryParser/core/config >> line 64 HTML Problem >> Fragment "_info.jsp" was not found at expected >> path /solr3_5/ssrc/solr/contrib/ >> dataimporthandler/src/webapp/admin/_info.jsp dataimport.jsp >> /solr3_5/ssrc/solr/contrib/dataimporthandler/src/webapp/admin >> line 21 JSP Problem >> Fragment "_info.jsp" was not found at expected >> path /solr3_5/ssrc/solr/contrib/dataimporthandler >> /src/webapp/admin/_info.jsp >> debug.jsp >> /solr3_5/ssrc/solr/contrib/dataimporthandler/src/webapp/admin >> line 19 JSP Problem >> Named template dotdots is not available tabutils.xsl >> >> /solr3_5/ssrc/lucene/src/site/src/documentation/skins/common/xslt/html >> line 41 XSL Problem >> Named template dotdots is not available tabutils.xsl >> /solr3_5/ssrc/solr/site-src/src/documentation/skins/common/xslt/html >> line 41 XSL Problem >> Unhandled exception type Throwable ping.jsp >> /solr3_5/WebContent/admin >> line 46 JSP Problem >> Unhandled exception type Throwable ping.jsp >> /solr3_5/ssrc/solr/webapp/web/admin >> >> line 46 JSP Problem >> >> >> *Here are the warnings I got* >> >>> Description Resource Path Location Type >>> Classpath entry >>> /solr3_5/
Querying field with parenthesis
I have created my own field type. I have indexed "Stephen King" and get no hit when searching author:(stephen king) I get a hit when searching like this author:(stephen* AND *king) I also get a hit when searching like this author:"stephen king" So it seems like when querying with (...) it actually splits the words. This is the type of the author field I expected that author:(stephen king) would do the same thing as author:"stephen king". Why is this not the case? Thanks, Tim
Re: Old Google Guava library needs updating (r05)
Hmmm, near as I can tell, guava is only used in the Carrot2 contrib, so maybe ask over at: http://project.carrot2.org/? Best Erick On Sat, Mar 24, 2012 at 3:31 PM, Nicholas Ball wrote: > > Hey all, > > Working on a plugin, which uses the Curator library (ZooKeeper client). > Curator depends on the very latest Google Guava library which unfortunately > clashes with Solr's outdated r05 of Guava. > Think it's safe to say that Solr should be using the very latest Guava > library (11.0.1) too right? > Shall I open up a JIRA issue for someone to update it? > > Cheers, > Nick
Re: Indexing Source Code
Hi Bastian, Can you please tell us what kind of search you imagine doing with some (use case) examples? Marcelo On Monday, March 26, 2012, Bastian H wrote: > Hi, > > I like to index my Source Code - the most is Cobol, Asembler and Java - > with Solr. > > I don't know where to start... I think I need to parse it to get XML for > Solr. Do I need Tinka? Is there any Parser I could use? > > I want to index functions, variables and function calls as well as > commentaries. > > Can somebody show me to a starting point? > > Thanks > Bastian > -- Marcelo Carvalho Fernandes +55 21 8272-7970 +55 21 2205-2786
Re: Practical Optimization
Thanks, this is very helpful! On Sat, Mar 24, 2012 at 4:50 AM, Martin Koch wrote: > Thanks for writing this up. These are good tips. > > /Martin > > On Fri, Mar 23, 2012 at 9:57 PM, dw5ight wrote: > >> Hey All- >> >> we run a http://carsabi.com car search engine with Solr and did some >> benchmarking recently after we switched from a hosted service to >> self-hosting. In brief, we went from 800ms complex range queries on a 1.5M >> document corpus to 43ms. The major shifts were switching from EC2 Large to >> EC2 CC8XL which got us down to 282ms (2.82x speed gain due to 2.75x CPU >> speed increase we think), and then down to 43ms when we sharded to 8 cores. >> We tried sharding to 12 and 16 but saw negligible gains after this point. >> >> Anyway, hope this might be useful to someone - we write up exact stats and >> a >> step by step sharding procedure on our >> >> http://carsabi.com/car-news/2012/03/23/optimizing-solr-7x-your-search-speed/ >> tech blog if anyone's interested. >> >> best >> Dwight >> >> -- >> View this message in context: >> http://lucene.472066.n3.nabble.com/Practical-Optimization-tp3852776p3852776.html >> Sent from the Solr - User mailing list archive at Nabble.com. >>
Indexing Source Code
Hi, I'd like to index my source code - mostly Cobol, Assembler and Java - with Solr. I don't know where to start... I think I need to parse it to get XML for Solr. Do I need Tika? Is there any parser I could use? I want to index functions, variables and function calls as well as comments. Can somebody point me to a starting point? Thanks Bastian
RE: Simple Slave Replication Question
Hello, Had to leave the office so didn't get a chance to reply. Nothing in the logs. Just ran one through from the ingest tool. Same results full copy of the index. Is it something to do with: server.commit(); server.optimize(); I call this at the end of the ingestion. Would optimize then work across the whole index? Thanks Ben -Original Message- From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] Sent: 23 March 2012 15:10 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question Also, what happens if, instead of adding the 40K docs you add just one and commit? 2012/3/23 Tomás Fernández Löbbe > Have you changed the mergeFactor or are you using 10 as in the example > solrconfig? > > What do you see in the slave's log during replication? Do you see any > line like "Skipping download for..."? > > > On Fri, Mar 23, 2012 at 11:57 AM, Ben McCarthy < > ben.mccar...@tradermedia.co.uk> wrote: > >> I just have a index directory. >> >> I push the documents through with a change to a field. Im using >> SOLRJ to do this. Im using the guide from the wiki to setup the >> replication. When the feed of updates to the master finishes I call >> a commit again using SOLRJ. I then have a poll period of 5 minutes >> from the slave. When it kicks in I see a new version of the index >> and then it copys the full 5gb index. >> >> Thanks >> Ben >> >> -Original Message- >> From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] >> Sent: 23 March 2012 14:29 >> To: solr-user@lucene.apache.org >> Subject: Re: Simple Slave Replication Question >> >> Hi Ben, only new segments are replicated from master to slave. In a >> situation where all the segments are new, this will cause the index >> to be fully replicated, but this rarely happen with incremental >> updates. It can also happen if the slave Solr assumes it has an "invalid" >> index. >> Are you committing or optimizing on the slaves? 
After replication, >> the index directory on the slaves is called "index" or "index."? >> >> Tomás >> >> On Fri, Mar 23, 2012 at 11:18 AM, Ben McCarthy < >> ben.mccar...@tradermedia.co.uk> wrote: >> >> > So do you just simpy address this with big nic and network pipes. >> > >> > -Original Message- >> > From: Martin Koch [mailto:m...@issuu.com] >> > Sent: 23 March 2012 14:07 >> > To: solr-user@lucene.apache.org >> > Subject: Re: Simple Slave Replication Question >> > >> > I guess this would depend on network bandwidth, but we move around >> > 150G/hour when hooking up a new slave to the master. >> > >> > /Martin >> > >> > On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy < >> > ben.mccar...@tradermedia.co.uk> wrote: >> > >> > > Hello, >> > > >> > > Im looking at the replication from a master to a number of slaves. >> > > I have configured it and it appears to be working. When updating >> > > 40K records on the master is it standard to always copy over the >> > > full index, currently 5gb in size. If this is standard what do >> > > people do who have massive 200gb indexs, does it not take a while >> > > to bring the >> > slaves inline with the master? >> > > >> > > Thanks >> > > Ben >> > > >> > > >> > > >> > > >> > > This e-mail is sent on behalf of Trader Media Group Limited, >> > > Registered >> > > Office: Auto Trader House, Cutbush Park Industrial Estate, >> > > Danehill, Lower Earley, Reading, Berkshire, RG6 4UT(Registered in >> > > England No. >> > 4768833). >> > > This email and any files transmitted with it are confidential and >> > > may be legally privileged, and intended solely for the use of the >> > > individual or entity to whom they are addressed. If you have >> > > received this email in error please notify the sender. This email >> > > message has been swept for the presence of computer viruses. 
>> > > >> > > >> > >> > >> > >> > >> > This e-mail is sent on behalf of Trader Media Group Limited, >> > Registered >> > Office: Auto Trader House, Cutbush Park Industrial Estate, >> > Danehill, Lower Earley, Reading, Berkshire, RG6 4UT(Registered in England >> > No. >> 4768833). >> > This email and any files transmitted with it are confidential and >> > may be legally privileged, and intended solely for the use of the >> > individual or entity to whom they are addressed. If you have >> > received this email in error please notify the sender. This email >> > message has been swept for the presence of computer viruses. >> > >> > >> >> >> >> >> This e-mail is sent on behalf of Trader Media Group Limited, >> Registered >> Office: Auto Trader House, Cutbush Park Industrial Estate, Danehill, >> Lower Earley, Reading, Berkshire, RG6 4UT(Registered in England No. 4768833). >> This email and any files transmitted with it are confidential and may >> be legally privileged, and intended solely for the use of the >> individual or entity to whom they are addressed
Re: web gui specify shard and collection when adding a new core
Jamie: SOLR-3238 is the current admin-ticket. create a new one for the cloud-related options and describe what (and where) you'd like to have :) On Sunday, March 25, 2012 at 4:25 AM, Jamie Johnson wrote: > Is there a plan to add the ability to specify the shard and > collection when adding a core through the enhanced web gui, is there a > JIRA for this? If not I'd be more than happy to add the request if > someone can point me to the active JIRA (both 3162 and 2667 are marked > Resolved).