Re: Solr expert(s) needed
2009/1/10 Lance Norskog:
> I have used the rss format of the data input handler, and it works well but
> has problems with detecting errors etc. That is, it works well when it works
> but does not fail gracefully in a useful way.

Lance, some error handling logic was added after you described your use-cases in a previous mail: https://issues.apache.org/jira/browse/SOLR-842

We also have a very simple event listener in DIH for import start and end. Probably we can add another for onError as well. https://issues.apache.org/jira/browse/SOLR-938

If there are other things that can help make DIH more robust, please do let us know.

--
Regards,
Shalin Shekhar Mangar.
Re: EmbeddedSolrServer in Single Core
On Jan 9, 2009, at 8:12 PM, qp19 wrote:
> Please bear with me. I am new to Solr. I have searched all the existing
> posts about this and could not find an answer. I wanted to know how do I go
> about creating a SolrServer using EmbeddedSolrServer. I tried to initialize
> this several ways but was unsuccessful. I do not have multi-core. I am
> using solrj 1.3. I attempted to use the deprecated methods as mentioned in
> the SolrJ documentation the following way, but it fails as well with
> "unable to locate Core".
>
>   SolrCore core = SolrCore.getSolrCore();

This function is deprecated and *really* should not be used -- especially for an embedded solr server. (The only chance you would have for it to work is if you start up solr in a web app before calling this.)

>   SolrServer server = new EmbeddedSolrServer( core );

Core initialization is kind of a mess, but this contains everything you would need:

  CoreContainer container = new CoreContainer(
      new SolrResourceLoader(SolrResourceLoader.locateInstanceDir()));
  CoreDescriptor dcore = new CoreDescriptor(container, coreName,
      solrConfig.getResourceLoader().getInstanceDir());
  dcore.setConfigName(solrConfig.getResourceName());
  dcore.setSchemaName(indexSchema.getResourceName());
  SolrCore core = new SolrCore(null, dataDirectory, solrConfig, indexSchema, dcore);
  container.register(coreName, core, false);

> So far my installation is pretty basic with Solr running on Tomcat as per
> instructions in the wiki. My solr home is outside of the webapps folder,
> i.e. "c:/tomcat-solr/solr". I am able to connect using
> CommonsHttpSolrServer("http://localhost:8080/solr") without a problem.
>
> The question in a nutshell is: how do I instantiate EmbeddedSolrServer
> using new EmbeddedSolrServer(CoreContainer coreContainer, String coreName)?
> Initializing CoreContainer appears to be complicated when compared to
> SolrCore.getSolrCore() as per the examples. Is there a simpler way to
> initialize CoreContainer? Is a core (or coreName) necessary even though I
> don't use multi-core?
> Also, is it possible to initialize EmbeddedSolrServer using Spring? Thanks
> in advance for the help.

Yes, I use this:

  ${dir}
  ${dconfigFile}
  class="org.apache.solr.client.solrj.embedded.EmbeddedSolrServer"
  class="org.apache.solr.client.solrj.embedded.EmbeddedSolrServer"

ryan
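[Editor's note: the Spring XML in the reply above lost its tags in the archive, so only property placeholders and class names survive. A minimal sketch of what such wiring *might* look like is below. The bean names, the use of Solr 1.3's CoreContainer.Initializer as a factory, and the empty core name are all assumptions, not Ryan's original configuration.]

```xml
<!-- Hypothetical Spring wiring for EmbeddedSolrServer (Solr 1.3 era).
     Assumes solr.solr.home is set so CoreContainer.Initializer can find
     the config; bean names here are made up for illustration. -->
<bean id="coreContainerInitializer"
      class="org.apache.solr.core.CoreContainer$Initializer"/>

<!-- initialize() builds the CoreContainer from solr home -->
<bean id="coreContainer"
      factory-bean="coreContainerInitializer"
      factory-method="initialize"/>

<bean id="solrServer"
      class="org.apache.solr.client.solrj.embedded.EmbeddedSolrServer">
  <constructor-arg ref="coreContainer"/>
  <!-- core name; a single default core is typically registered under "" -->
  <constructor-arg value=""/>
</bean>
```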
Re: Solr expert(s) needed
I don't know about the Nutch format -> Solr schema idea either. The NUTCH-442 system uses Solr for both indexing and searching, and uses Nutch only for crawling.

At my last job we had a custom scripting system that crawled the front page of over 5000 sites. Each site had a configured script. Yes, it was complex. We also had custom crawlers for Youtube & myspace and some other sites which offered APIs, but in general it was all hand-coded.

I have used the rss format of the data input handler, and it works well but has problems with detecting errors etc. That is, it works well when it works but does not fail gracefully in a useful way.

Lance

2009/1/9 Tony Wang:
> Thanks Lance! I have no idea whether the Nutch-generated index could be
> converted to a Solr schema. I wonder what people are using this NUTCH-442
> for (http://issues.apache.org/jira/browse/NUTCH-442).
>
> So what crawler do you use to generate an index for Solr? Thanks a lot!!
>
> On Fri, Jan 9, 2009 at 8:04 PM, Lance Norskog wrote:
>> http://issues.apache.org/jira/browse/NUTCH-442
>>
>> Haven't used Nutch. Can the Nutch-generated index be reverse-engineered
>> into a Solr schema? In that case, you can just copy the Lucene index
>> files away from Nutch and run them under Solr.
>
> --
> Are you RCholic? www.RCholic.com
> 温 良 恭 俭 让 仁 义 礼 智 信
UUID field type documentation and ExtractingRequestHandler
The UUID field type is not documented on the Wiki.
https://issues.apache.org/jira/browse/SOLR-308

The ExtractingRequestHandler creates its own UUID instead of using the UUID field type.
http://issues.apache.org/jira/browse/SOLR-284
Re: Solr expert(s) needed
Thanks Lance! I have no idea whether the Nutch-generated index could be converted to a Solr schema. I wonder what people are using this NUTCH-442 for (http://issues.apache.org/jira/browse/NUTCH-442).

So what crawler do you use to generate an index for Solr? Thanks a lot!!

On Fri, Jan 9, 2009 at 8:04 PM, Lance Norskog wrote:
> http://issues.apache.org/jira/browse/NUTCH-442
>
> Haven't used Nutch. Can the Nutch-generated index be reverse-engineered
> into a Solr schema? In that case, you can just copy the Lucene index files
> away from Nutch and run them under Solr.

--
Are you RCholic? www.RCholic.com
温 良 恭 俭 让 仁 义 礼 智 信
Re: Solr expert(s) needed
http://issues.apache.org/jira/browse/NUTCH-442 Haven't used Nutch. Can the Nutch-generated index be reverse-engineered into a Solr schema? In that case, you can just copy the Lucene index files away from Nutch and run them under Solr.
RE: Ensuring documents indexed by autocommit
: Thanks again for your inputs.
: But then I am still stuck on the question that how do we ensure that
: document is successfully indexed. One option I see is search for every

Have faith. If the add completes successfully then the data made it to Solr, was indexed, and now lives in the index files. If the commit completes successfully then the index files have been flushed and checkpointed, so all new uses of them will see the data.

If you want to be sure your data is indexed, all you have to do is check that neither of those calls got an error (hence Shalin's point about doing the commit yourself instead of using autocommit, so you can actually test the response from the commit call).

But frankly, I wouldn't worry so much. (How do you ensure that rows are successfully stored when you do database updates?)

-Hoss
EmbeddedSolrServer in Single Core
Please bear with me. I am new to Solr. I have searched all the existing posts about this and could not find an answer. I wanted to know how do I go about creating a SolrServer using EmbeddedSolrServer. I tried to initialize this several ways but was unsuccessful. I do not have multi-core. I am using solrj 1.3. I attempted to use the deprecated methods as mentioned in the SolrJ documentation the following way, but it fails as well with "unable to locate Core".

  SolrCore core = SolrCore.getSolrCore();
  SolrServer server = new EmbeddedSolrServer( core );

So far my installation is pretty basic with Solr running on Tomcat as per instructions in the wiki. My solr home is outside of the webapps folder, i.e. "c:/tomcat-solr/solr". I am able to connect using CommonsHttpSolrServer("http://localhost:8080/solr") without a problem.

The question in a nutshell is: how do I instantiate EmbeddedSolrServer using new EmbeddedSolrServer(CoreContainer coreContainer, String coreName)? Initializing CoreContainer appears to be complicated when compared to SolrCore.getSolrCore() as per the examples. Is there a simpler way to initialize CoreContainer? Is a core (or coreName) necessary even though I don't use multi-core? Also, is it possible to initialize EmbeddedSolrServer using Spring?

Thanks in advance for the help.

--
View this message in context: http://www.nabble.com/EmbeddedSolrServer-in-Single-Core-tp21383525p21383525.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr/Lucene MoreLikeThis with RangeQuery
Hi,

Thanks for the help.

> If i'm understanding you correctly, you modified the MoreLikeThis class to
> include your new clause (using those two lines above) correct?

Yes. The time field is a "long" and so are the range variables, so the problem should not be related to that. If I construct the query by adding a ConstantScoreRangeQuery, nothing more, no results are returned. But I have not tried to add it to the filter part of the mlt-handler; I suspect that this would solve the problem.

However, after trying more alternatives, I think that adding &fq=time:[1230922259744+TO+1231440659744] to the mlt-url-request does actually add a time filter to the constructed MLT-query:

Query: (+kategori:nyheter titel:moderbolaget^2.0 artikel:moderbolaget titel:pininfarin^2.0 artikel:pininfarin titel:bilbygg^1.9725448 artikel:bilbygg^0.9862724 titel:huvudäg^1.9257689 artikel:huvudäg^0.9628844 titel:uddevall^1.9054867 artikel:uddevall^0.95274335 titel:majoritet^1.71646 artikel:majoritet^0.85823 titel:volvo^1.6696839 artikel:volvo^0.83484197 titel:italiensk^1.5226858 artikel:italiensk^0.7613429)~5

So, an mlt.fq does not seem to be necessary to implement, since the fq filter seems to be passed to the mlt-query. To use a long for the time field rather than a Field.Date is probably bad, but it seems to work at least for testing.

So, I think that my problem is solved. Thanks!

/Clas

On Fri, Jan 9, 2009 at 2:40 AM, Chris Hostetter wrote:
>
> : Solr/Lucene. I am in a situation where I think that I can improve the
> : quality of the LikeThis-documents significantly by restricting the
> : MoreLikeThis-query to documents where one field has its term in a
> : specified range. That is, I would like to add a RangeQuery to the
> : default MoreLikeThis query.
> [...]
> : I would like to also add a range restriction as,
> :
> :   rq = new ConstantScoreRangeQuery("time", startTimeString, endTimeString, true, true);
> :   query.add(rq, BooleanClause.Occur.MUST);
> :
> : This is all made in
> : contrib/queries/src/java/org/apache/lucene/search/similar/MoreLikeThis.java
> :
> : However, this does not work at all when running from Solr (no MLT
> : suggestions are returned). I suspect that the problem is that the
>
> If I'm understanding you correctly, you modified the MoreLikeThis class to
> include your new clause (using those two lines above), correct?
>
> If you aren't getting any results, I suspect it may be an issue of term
> value encoding ... is your "time" field a Solr DateField? What is the
> value of startTimeString and endTimeString? ... if you replace all of the
> MLT Query logic so that it's *just* the ConstantScoreRangeQuery, do you
> get any results?
>
> : does not perform a standard query, but a getDocList:
> :
> :   results.docList = searcher.getDocList(mltQuery, filters, null,
> :       start, rows, flags);
> :
> : and that this type of query does not handle a RangeQuery. Is this
> : correct, or what is the problem with adding a RangeQuery? Should it be
>
> A RangeQuery will work just fine. But in general the type of problem you
> are trying to solve could be more generally dealt with if the MLT code had
> a way to let people specify "filter" queries (like the existing "fq"
> param) to be applied to the MLT logic -- that way they wouldn't contribute
> to the relevancy ... it seems like it would be pretty easy to add an
> "mlt.fq" param for this purpose if you wanted to approach the problem
> that way as a more generic path -- but I'm not too familiar with the MLT
> code to say for certain what would be required, and I know the code is
> probably more complicated than it should be with the MoreLikeThisHandler
> and the MoreLikeThisComponent (I think there's a MoreLikeThisHelper that
> they share or something)
>
> -Hoss
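[Editor's note: pulling the thread's working solution into one place -- a MoreLikeThisHandler request with a filter query, as Clas reported. The hostname, handler path, and field names below are placeholders based on values appearing in the thread, not a verified configuration.]

```
http://localhost:8983/solr/mlt?q=id:1234
    &mlt.fl=titel,artikel
    &fq=time:[1230922259744 TO 1231440659744]
```

The fq parameter filters the documents the MLT query may return without contributing to their relevancy score, which is exactly the behavior the hypothetical mlt.fq parameter discussed above would provide.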
Re: Amount range and facet fields returns [facet_fields]
On Jan 8, 2009, at 9:29 AM, Yevgeniy Belman wrote:
> the response i get when executing only the following, produces no facet
> counts. It could be a bug.
>
> facet.query=[price:[* TO 500], price:[500 TO *]

That's an invalid query. If you want two ranges, use two facet.query parameters.

Erik
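[Editor's note: for reference, the two-parameter version Erik describes would look like this; the price bounds are taken from the question above.]

```
facet=true&facet.query=price:[* TO 500]&facet.query=price:[500 TO *]
```

Each facet.query parameter produces its own count in the facet_queries section of the response.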
Re: Boosting based on number of values in multiValued field?
On Jan 9, 2009, at 12:56 PM, Eric Kilby wrote:
> Each document has a multivalued field, with 1-n values in it (as many as
> 20). The actual values don't matter to me, but the number of values is a
> rough proxy for the quality of a record. I'd like to apply a very small
> boost based on the number of values in that field, so that among a set of
> similar documents the ones with more values will score higher and sort
> ahead of those with fewer values.

The simplest technique would be to have your indexer add another field with the count (or some boost factor based on it), and then leverage that. Perhaps even use the document boost capability at indexing time.

Erik
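[Editor's note: a sketch of Erik's suggestion in Solr's XML update format. The field names (tag, tagCount) and the boost value are made up for illustration; the count field would need to be declared in schema.xml.]

```xml
<add>
  <!-- index-time document boost derived from the value count (optional) -->
  <doc boost="1.02">
    <field name="id">doc-1</field>
    <field name="tag">a</field>
    <field name="tag">b</field>
    <field name="tag">c</field>
    <!-- the indexer writes the number of values into a separate field,
         usable later for sorting or function-query boosting -->
    <field name="tagCount">3</field>
  </doc>
</add>
```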
Re: Deduplication patch not working in nightly build
Hey Mark,

Sorry I was not specific enough; I meant that I have, and always had, autoCommit=false. I will do some more traces and tests. Will post if I have any new important thing to mention.

Thanks.

Marc Sturlese wrote:
>
> Hey Shalin,
>
> In the beginning (when the error was appearing) I had 32 and no
> maxBufferedDocs set
>
> Now I have: 32 50
>
> I think that setting maxBufferedDocs to 50 I am forcing more disk writing
> than I would like... but at least it works fine (but a bit slower,
> obviously).
>
> I keep saying that the most weird thing is that I don't have that problem
> using solr1.3, just with the nightly...
>
> Even though it's good that it works well now, it would be great if someone
> could give me an explanation why this is happening
>
> Shalin Shekhar Mangar wrote:
>>
>> On Fri, Jan 9, 2009 at 9:23 PM, Marc Sturlese wrote:
>>
>>> hey there,
>>> I hadn't autoCommit set to true but I have it sorted! The error stopped
>>> appearing after setting the property maxBufferedDocs in solrconfig.xml.
>>> I can't exactly understand why but it just worked.
>>> Anyway, maxBufferedDocs is deprecated, would ramBufferSizeMB do the
>>> same?
>>>
>> What I find strange is this line in the exception:
>> "Last packet sent to the server was 202481 ms ago."
>>
>> Something took very very long to complete and the connection got closed
>> by the time the next row was fetched from the opened resultset.
>>
>> Just curious, what was the previous value of maxBufferedDocs and what did
>> you change it to?
>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Deduplication-patch-not-working-in-nightly-build-tp21287327p21374908.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.

--
View this message in context: http://www.nabble.com/Deduplication-patch-not-working-in-nightly-build-tp21287327p21378069.html
Sent from the Solr - User mailing list archive at Nabble.com.
Boosting based on number of values in multiValued field?
hi,

I'm looking through the list archives and the documentation on boost queries, and I don't see anything that matches this case. I have an index of documents, some of which are very similar but not identical. Therefore the scores are very close and the ordering is affected by somewhat arbitrary factors. When I do a query the similar documents come up close together, so that's a good start.

Each document has a multivalued field, with 1-n values in it (as many as 20). The actual values don't matter to me, but the number of values is a rough proxy for the quality of a record. I'd like to apply a very small boost based on the number of values in that field, so that among a set of similar documents the ones with more values will score higher and sort ahead of those with fewer values.

Is there currently a function or set of functions that can be applied to this use case? Or a place where I could build and contribute something? In that case I'd look for a starting point on where to look.

thanks,
Eric

--
View this message in context: http://www.nabble.com/Boosting-based-on-number-of-values-in-multiValued-field--tp21377250p21377250.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Deduplication patch not working in nightly build
You're basically writing segments more often now, and somehow avoiding a longer merge, I think. Also, likely, deduplication is probably adding enough extra data to your index to hit a sweet spot where a merge is too long. Or something to that effect - MySQL is especially sensitive to timeouts when doing a select * on a huge db in my testing.

I didn't understand your answer on the autocommit - I take it you are using it? Or no?

All a guess, but it definitely points to a merge taking a bit long and causing a timeout. I think you can relax the MySQL timeout settings if that is it.

I'd like to get to the bottom of this as well, so any other info you can provide would be great.

- Mark

Marc Sturlese wrote:
> Hey Shalin,
>
> In the beginning (when the error was appearing) I had 32 and no
> maxBufferedDocs set
>
> Now I have: 32 50
>
> I think that setting maxBufferedDocs to 50 I am forcing more disk writing
> than I would like... but at least it works fine (but a bit slower,
> obviously).
>
> I keep saying that the most weird thing is that I don't have that problem
> using solr1.3, just with the nightly...
>
> Even though it's good that it works well now, it would be great if someone
> could give me an explanation why this is happening
>
> Shalin Shekhar Mangar wrote:
>> On Fri, Jan 9, 2009 at 9:23 PM, Marc Sturlese wrote:
>>> hey there,
>>> I hadn't autoCommit set to true but I have it sorted! The error stopped
>>> appearing after setting the property maxBufferedDocs in solrconfig.xml.
>>> I can't exactly understand why but it just worked.
>>> Anyway, maxBufferedDocs is deprecated, would ramBufferSizeMB do the
>>> same?
>>
>> What I find strange is this line in the exception:
>> "Last packet sent to the server was 202481 ms ago."
>>
>> Something took very very long to complete and the connection got closed
>> by the time the next row was fetched from the opened resultset.
>>
>> Just curious, what was the previous value of maxBufferedDocs and what did
>> you change it to?
>
> --
> View this message in context:
> http://www.nabble.com/Deduplication-patch-not-working-in-nightly-build-tp21287327p21374908.html
> Sent from the Solr - User mailing list archive at Nabble.com.

>> --
>> Regards,
>> Shalin Shekhar Mangar.
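[Editor's note: for readers hitting the same "Last packet sent to the server" failure, one way to relax the MySQL side that Mark mentions is through the JDBC URL on the DIH dataSource. The property below is a MySQL Connector/J connection option; the url, user, and password values are placeholders, and whether it helps in this exact merge-timeout scenario is an assumption.]

```xml
<!-- Sketch of a data-config.xml dataSource with a larger network timeout
     for streaming result sets (value in seconds). batchSize="-1" is the
     DIH hint to stream rows from MySQL instead of buffering them. -->
<dataSource driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/mydb?netTimeoutForStreamingResults=3600"
            user="dbuser" password="dbpass"
            batchSize="-1"/>
```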
Re: Deduplication patch not working in nightly build
Hey Shalin,

In the beginning (when the error was appearing) I had 32 and no maxBufferedDocs set.

Now I have: 32 50

I think that setting maxBufferedDocs to 50 I am forcing more disk writing than I would like... but at least it works fine (but a bit slower, obviously).

I keep saying that the most weird thing is that I don't have that problem using solr1.3, just with the nightly...

Even though it's good that it works well now, it would be great if someone could give me an explanation why this is happening

Shalin Shekhar Mangar wrote:
>
> On Fri, Jan 9, 2009 at 9:23 PM, Marc Sturlese wrote:
>
>> hey there,
>> I hadn't autoCommit set to true but I have it sorted! The error stopped
>> appearing after setting the property maxBufferedDocs in solrconfig.xml.
>> I can't exactly understand why but it just worked.
>> Anyway, maxBufferedDocs is deprecated, would ramBufferSizeMB do the same?
>>
> What I find strange is this line in the exception:
> "Last packet sent to the server was 202481 ms ago."
>
> Something took very very long to complete and the connection got closed by
> the time the next row was fetched from the opened resultset.
>
> Just curious, what was the previous value of maxBufferedDocs and what did
> you change it to?
>
>> --
>> View this message in context:
>> http://www.nabble.com/Deduplication-patch-not-working-in-nightly-build-tp21287327p21374908.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>
> --
> Regards,
> Shalin Shekhar Mangar.

--
View this message in context: http://www.nabble.com/Deduplication-patch-not-working-in-nightly-build-tp21287327p21376235.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Deduplication patch not working in nightly build
On Fri, Jan 9, 2009 at 9:23 PM, Marc Sturlese wrote:
>
> hey there,
> I hadn't autoCommit set to true but I have it sorted! The error stopped
> appearing after setting the property maxBufferedDocs in solrconfig.xml. I
> can't exactly understand why but it just worked.
> Anyway, maxBufferedDocs is deprecated, would ramBufferSizeMB do the same?
>

What I find strange is this line in the exception:
"Last packet sent to the server was 202481 ms ago."

Something took very very long to complete and the connection got closed by the time the next row was fetched from the opened resultset.

Just curious, what was the previous value of maxBufferedDocs and what did you change it to?

> --
> View this message in context:
> http://www.nabble.com/Deduplication-patch-not-working-in-nightly-build-tp21287327p21374908.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Regards,
Shalin Shekhar Mangar.
Re: Beginner: importing own data
You were searching for "1899", which is the value of the "date" field in the document you added. You need to specify q=date:1899 to search on the date field.

You can also use the <defaultSearchField> element in schema.xml to specify the field on which you'd like to search if no field name is specified in the query. Typically, one creates a catch-all field which copies data from all the fields you want to search on. http://wiki.apache.org/solr/SchemaXml#head-b80c539a0a01eef8034c3776e49e8fe1c064f496

Also look at the DisMax queries: http://wiki.apache.org/solr/DisMaxRequestHandler

On Fri, Jan 9, 2009 at 8:35 PM, phil cryer wrote:
> Otis
> Thanks for your reply, I wrote out a long email explaining the steps I
> took, and the results, but it was returned by the Solr-user email
> server stamped as spam. I've put my note on pastebin, you can see it
> here: http://pastebin.cryer.us/pastebin.php?show=m359e2e47
>
> I'd appreciate any feedback, I know I'm close to getting this working,
> just can't see what I'm missing.
>
> Thank you
>
> P
>
> On Thu, Jan 8, 2009 at 4:19 PM, Otis Gospodnetic wrote:
>> Phil,
>>
>> The easiest thing to do at this stage in the Solr learning experience is
>> to restart Solr (servlet container) and redo the search. Results should
>> start showing up then because this will effectively reopen the index.
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>> - Original Message
>>> From: phil cryer
>>> To: solr-user@lucene.apache.org
>>> Sent: Thursday, January 8, 2009 5:00:29 PM
>>> Subject: Beginner: importing own data
>>>
>>> So I have Solr running, I've run through the tutorials online, can
>>> import data from the example xml and see the results, so it works!
>>> Now, I take some xml data I have, convert it over to the add / doc
>>> type that the demo ones are, run it and find out which fields aren't
>>> defined in schema.xml, I add them there until they're all there and I
>>> can finally import my own xml into solr w/o error. But, when I go to
>>> query solr, it's not there. Again, I'm using the same procedure that
>>> I used on the example xml files, and they did the 'commit' at the end,
>>> so I'm doing something wrong.
>>>
>>> Is that all I need to do, define my fields in schema.xml and then
>>> import via post.jar? It seems to work, but no results are ever found
>>> by solr. I'm open to trying any debugging or whatever, I need to
>>> figure this out before I can start learning solr.
>>>
>>> Thanks
>>>
>>> P

--
Regards,
Shalin Shekhar Mangar.
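[Editor's note: a sketch of the catch-all setup Shalin describes, using field and type names from the stock example schema.xml; the copyField from "date" is the part specific to this thread, and treating a date as searchable text this way is an illustration, not a recommendation.]

```xml
<!-- catch-all field: indexed but not stored, accepts copies from many fields -->
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

<!-- copy the date field's content into the catch-all -->
<copyField source="date" dest="text"/>

<!-- queries with no explicit field (e.g. q=1899) now search "text" -->
<defaultSearchField>text</defaultSearchField>
```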
solr admin page throwing errors
Hi, I am using the solr admin page with index.jsp from:

  <%-- $Id: index.jsp 686780 2008-08-18 15:08:28Z yonik $ --%>

I am getting these errors. Any insight will be helpful.

HTTP Status 500 - javax.servlet.ServletException: java.lang.NoSuchFieldError: config

org.apache.jasper.JasperException: javax.servlet.ServletException: java.lang.NoSuchFieldError: config
  at org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:532)
  at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:408)
  at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:320)
  at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:687)
  at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:469)
  at org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:403)
  at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:301)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:228)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:104)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:216)
  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
  at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:634)
  at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:445)
  at java.lang.Thread.run(Thread.java:619)
Caused by: javax.servlet.ServletException: java.lang.NoSuchFieldError: config
  at org.apache.jasper.runtime.PageContextImpl.doHandlePageException(PageContextImpl.java:855)
  at org.apache.jasper.runtime.PageContextImpl.handlePageException(PageContextImpl.java:784)
  at org.apache.jsp.admin.index_jsp._jspService(index_jsp.java:324)
  at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
  at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:384)
  ... 22 more
Caused

--
View this message in context: http://www.nabble.com/solr-admin-page-throwing-errors-tp21375221p21375221.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Deduplication patch not working in nightly build
hey there,
I hadn't autoCommit set to true but I have it sorted! The error stopped appearing after setting the property maxBufferedDocs in solrconfig.xml. I can't exactly understand why but it just worked. Anyway, maxBufferedDocs is deprecated, would ramBufferSizeMB do the same?

Thanks

Marc Sturlese wrote:
>
> Hey there,
> I was using the Deduplication patch with the Solr 1.3 release and
> everything was working perfectly. Now I upgraded to a nightly build (20th
> December) to be able to use the new facet algorithm and other stuff, and
> Deduplication is not working any more. I have followed exactly the same
> steps to apply the patch to the source code. I am getting this error:
>
> WARNING: Error reading data
> com.mysql.jdbc.CommunicationsException: Communications link failure due to
> underlying exception:
>
> ** BEGIN NESTED EXCEPTION **
>
> java.io.EOFException
>
> STACKTRACE:
>
> java.io.EOFException
> at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1905)
> at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2404)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2862)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771)
> at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1289)
> at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:362)
> at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:352)
> at com.mysql.jdbc.ResultSet.next(ResultSet.java:6144)
> at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:294)
> at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$400(JdbcDataSource.java:189)
> at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:225)
> at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:229)
> at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:76)
> at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:351) > at > org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:193) > at > org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:144) > at > org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) > at > org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:407) > at > org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:388) > > > ** END NESTED EXCEPTION ** > Last packet sent to the server was 202481 ms ago. > at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2563) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2862) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771) > at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1289) > at > com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:362) > at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:352) > at com.mysql.jdbc.ResultSet.next(ResultSet.java:6144) > at > org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:294) > at > org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$400(JdbcDataSource.java:189) > at > org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:225) > at > org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:229) > at > org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:76) > at > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:351) > at > org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:193) > at > org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:144) > at > org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) > at > org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:407) > at > 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:388) > Jan 5, 2009 10:06:16 AM org.apache.solr.handler.dataimport.JdbcDataSource > logError > WARNING: Exception while closing result set > com.mysql.jdbc.CommunicationsException: Communications link failure due to > underlying exception: > > ** BEGIN NESTED EXCEPTION ** > > java.io.EOFException > > STACKTRACE: > > java.io.EOFException > at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1905) > at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2351) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2862) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771) > at com.mysql.jdbc.MysqlIO.nextRow(My
Re: Overlapping Replication Scripts
You do a commit in step 1 after the update, right? So if you configure Solr on the indexer to invoke snapshooter after a commit and optimize, then you would not need to invoke snapshooter explicitly using cron. snappuller doesn't do anything unless there is a new snapshot on the indexer. Bill On Fri, Jan 9, 2009 at 4:31 AM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > On Fri, Jan 9, 2009 at 4:28 AM, wojtekpia wrote: > > > > > What happens if I overlap the execution of my cron jobs? Do any of these > > scripts detect that another instance is already executing? > > > No, they don't. > > > > > > -- > > View this message in context: > > > http://www.nabble.com/Overlapping-Replication-Scripts-tp21362434p21362434.html > > Sent from the Solr - User mailing list archive at Nabble.com. > > > > > > > -- > Regards, > Shalin Shekhar Mangar. >
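[Editor's note: a sketch of the solrconfig.xml change Bill describes, based on the (commented-out) listener stanzas shipped in the example solrconfig.xml; the "dir" value is an assumption about your installation layout.]

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- take a snapshot after every commit ... -->
  <listener event="postCommit" class="solr.RunExecutableListener">
    <str name="exe">snapshooter</str>
    <str name="dir">solr/bin</str>
    <bool name="wait">true</bool>
  </listener>
  <!-- ... and after every optimize -->
  <listener event="postOptimize" class="solr.RunExecutableListener">
    <str name="exe">snapshooter</str>
    <str name="dir">solr/bin</str>
    <bool name="wait">true</bool>
  </listener>
</updateHandler>
```

With this in place, the cron entry for snapshooter on the indexer becomes unnecessary; only snappuller/snapinstaller on the searchers still need scheduling.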
Re: Beginner: importing own data
Otis, Thanks for your reply. I wrote out a long email explaining the steps I took, and the results, but it was returned by the Solr-user email server stamped as spam. I've put my note on pastebin, you can see it here: http://pastebin.cryer.us/pastebin.php?show=m359e2e47 I'd appreciate any feedback, I know I'm close to getting this working, just can't see what I'm missing. Thank you P On Thu, Jan 8, 2009 at 4:19 PM, Otis Gospodnetic wrote: > Phil, > > The easiest thing to do at this stage in the Solr learning experience is to > restart Solr (servlet container) and redo the search. Results should start > showing up then because this will effectively reopen the index. > > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: phil cryer >> To: solr-user@lucene.apache.org >> Sent: Thursday, January 8, 2009 5:00:29 PM >> Subject: Beginner: importing own data >> >> So I have Solr running, I've run through the tutorials online, can >> import data from the example xml and see the results, so it works! >> Now, I take some xml data I have, convert it over to the add / doc >> type that the demo ones are, run it and find out which fields aren't >> defined in schema.xml, I add them there until they're all there and I >> can finally import my own xml into solr w/o error. But, when I go to >> query solr, it's not there. Again, I'm using the same procedure that >> I used on the example xml files, and they did the 'commit' at the end, >> so I'm doing something wrong. >> >> Is that all I need to do, define my fields in schema.xml and then >> import via post.jar? It seems to work, but no results are ever found >> by solr. I'm open to trying any debugging or whatever, I need to >> figure this out before I can start learning solr. >> >> Thanks >> >> P > >
Re: Beginner: importing own data
Paul, I have looked at those, but want to learn how to do the easy things first. As I posted below, I can import example data and then search against it. Data that I've tried to import seems to import, but I can't search/find it. I want to know how to do this first, so if you have any idea, I would appreciate it. Thanks P On Thu, Jan 8, 2009 at 8:18 PM, Noble Paul നോബിള്‍ नोब्ळ् wrote: > did you explore using SolrJ to index data? > http://wiki.apache.org/solr/Solrj > > or DataImportHandler. > http://wiki.apache.org/solr/DataImportHandler > > On Fri, Jan 9, 2009 at 3:49 AM, Otis Gospodnetic > wrote: >> Phil, >> >> The easiest thing to do at this stage in the Solr learning experience is to >> restart Solr (servlet container) and redo the search. Results should start >> showing up then because this will effectively reopen the index. >> >> >> Otis >> -- >> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch >> >> >> >> - Original Message >>> From: phil cryer >>> To: solr-user@lucene.apache.org >>> Sent: Thursday, January 8, 2009 5:00:29 PM >>> Subject: Beginner: importing own data >>> >>> So I have Solr running, I've run through the tutorials online, can >>> import data from the example xml and see the results, so it works! >>> Now, I take some xml data I have, convert it over to the add / doc >>> type that the demo ones are, run it and find out which fields aren't >>> defined in schema.xml, I add them there until they're all there and I >>> can finally import my own xml into solr w/o error. But, when I go to >>> query solr, it's not there. Again, I'm using the same procedure that >>> I used on the example xml files, and they did the 'commit' at the end, >>> so I'm doing something wrong. >>> >>> Is that all I need to do, define my fields in schema.xml and then >>> import via post.jar? It seems to work, but no results are ever found >>> by solr. I'm open to trying any debugging or whatever, I need to >>> figure this out before I can start learning solr. 
>>> >>> Thanks >>> >>> P >> >> > > > > -- > --Noble Paul >
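The add/commit cycle Phil and Paul are discussing boils down to two XML messages. A minimal sketch (field names are illustrative and must match your schema.xml):

```xml
<!-- mydata.xml: one or more documents wrapped in <add> -->
<add>
  <doc>
    <field name="id">doc-001</field>
    <field name="title">An example document</field>
  </doc>
</add>
```

Posting it with `java -jar post.jar mydata.xml` sends the `<add>` followed by a `<commit/>`. Until a commit happens and a new searcher is opened, added documents are invisible to queries, which is exactly the symptom described in this thread.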
Re: Solr on a multiprocessor machine
On Fri, Jan 9, 2009 at 12:18 AM, smock wrote: > In some ways I have a 'small index' (~8 million documents at the moment). > However, I have a lot of attributes (currently about 30, but I'm expecting > that number to keep growing) and am interested in faceting across all of > them for every search. OK, this is where you will become CPU bound (faceting on 30 fields). But if you will have any search traffic at all, you are better off going with non-distributed search on a single box than distributed on a single box. Distributed search also needs to do more work than non-distributed for faceting (in the form of over-requesting and facet refinement requests). If you are interested in why this extra work needs to be done, search for "refinement" in https://issues.apache.org/jira/browse/SOLR-303 -Yonik
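The single-box, non-distributed setup Yonik recommends needs nothing special: faceting over many attributes is simply one facet.field parameter per attribute. A sketch, with illustrative field names:

```
http://localhost:8983/solr/select?q=camera&rows=10
  &facet=true
  &facet.field=brand
  &facet.field=color
  &facet.field=price_range
```

Each additional facet.field adds CPU work on that one box, but avoids the over-requesting and refinement round trips that a distributed setup would add on top.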
Re: Deduplication patch not working in nightly build
I can't imagine why dedupe would have anything to do with this, other than what was said; it perhaps is taking a bit longer to get a document to the db, and it times out (maybe a long signature calculation?). Have you tried changing your MySQL settings to allow for a longer timeout? (Sorry, I'm not too up to date on what you have tried.) Also, are you using autocommit during the import? If so, you might try turning it off for the full import. - Mark Marc Sturlese wrote: Hey there, I have been stuck on this problem since three days ago and have no idea how to sort it out. I am using the nightly from a week ago, MySQL, and this driver and url: driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/my_db" I can use the deduplication patch with indexes of 200,000 docs with no problem. When I try a full-import with a db of 1,500,000 it stops indexing at doc number 15,000 approx, showing me the error posted below. Once I get the exception, I restart Tomcat and start a delta-import... this time everything works fine! I need to avoid this error in the full import. I have tried: url="jdbc:mysql://localhost/my_db?autoReconnect=true" to sort it out in case the connection was closed due to a long time until the next doc was indexed, but nothing changed... 
I keep having this:
Jan 9, 2009 1:38:18 PM org.apache.solr.handler.dataimport.JdbcDataSource logError
WARNING: Error reading data
com.mysql.jdbc.CommunicationsException: Communications link failure due to underlying exception:

** BEGIN NESTED EXCEPTION **

java.io.EOFException

STACKTRACE:

java.io.EOFException
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1905)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2404)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2862)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1289)
at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:362)
at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:352)
at com.mysql.jdbc.ResultSet.next(ResultSet.java:6144)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:279)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$500(JdbcDataSource.java:167)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:205)
at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:229)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:77)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:387)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:209)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:160)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:368)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:437)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:428)

** END NESTED EXCEPTION **

Last packet sent to the server was 206097 ms ago.
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2563)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2862)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1289)
at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:362)
at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:352)
at com.mysql.jdbc.ResultSet.next(ResultSet.java:6144)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:279)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$500(JdbcDataSource.java:167)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:205)
at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:229)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:77)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:387)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:209)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:160)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:368)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:437)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:428)
Jan 9, 2009 1:38:18 PM org.apache.solr.handler.dataimport.JdbcDataSource logError
WARNING: Exception while closing result set
com.mysql.jdbc.CommunicationsExcepti
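A workaround often suggested for this particular failure mode is to make the MySQL driver stream rows (DIH's batchSize="-1" maps to the driver's streaming fetch size) and to give the server a longer network timeout while streaming. The URL parameter below is a real Connector/J option, but the values and credentials are illustrative:

```xml
<!-- data-config.xml: stream rows instead of buffering the whole result set,
     and allow the server more time between packets while streaming -->
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/my_db?netTimeoutForStreamingResults=3600"
            batchSize="-1"
            user="db_user" password="db_pass"/>
```

Raising net_write_timeout on the MySQL server itself is the equivalent server-side knob.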
Re: Query regarding Spelling Suggestions
Can you put the full log (as short as possibly demonstrates the problem) somewhere where I can take a look? Likewise, can you share your schema? Also, does the spelling index exist under /data/index? If you open it w/ Luke, does it have entries? Thanks, Grant On Jan 8, 2009, at 11:30 PM, Deshpande, Mukta wrote: Yes. I send the build command as: http://localhost:8080/solr/select/?q=documnet&spellcheck=true&spellcheck.build=true&spellcheck.count=2&spellcheck.q=parfect&spellcheck.dictionary=dict The Tomcat log shows: Jan 9, 2009 9:55:19 AM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/select/ params={spellcheck=true&q=documnet&spellcheck.q=parfect&spellcheck.dictionary=dict&spellcheck.count=2&spellcheck.build=true} hits=0 status=0 QTime=141 Even after sending the build command I do not get any suggestions. Can you please check. Thanks, ~Mukta -Original Message- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Thursday, January 08, 2009 7:42 PM To: solr-user@lucene.apache.org Subject: Re: Query regarding Spelling Suggestions Did you send in the build command? See http://wiki.apache.org/solr/SpellCheckComponent On Jan 8, 2009, at 5:14 AM, Deshpande, Mukta wrote: Hi, I am using the Wordnet dictionary for spelling suggestions. The dictionary is converted to a Solr index with only one field, "word", and stored in location /data/syn_index, using the syns2Index.java program available at http://www.tropo.com/techno/java/lucene/wordnet.html I have added the "word" field in my schema.xml as <field name="word" type="textSpell" indexed="true" stored="true"/> My application data indexes are in /data I am trying to use solr.IndexBasedSpellChecker to get spelling suggestions. My spell check component is configured as: <searchComponent name="spellcheck" class="solr.SpellCheckComponent"> <str name="queryAnalyzerFieldType">textSpell</str> <lst name="spellchecker"> <str name="name">dict</str> <str name="classname">solr.IndexBasedSpellChecker</str> <str name="field">word</str> <str name="characterEncoding">UTF-8</str> <str name="spellcheckIndexDir">./syn_index</str> </lst> </searchComponent> I have added this component to my standard request handler as: <str name="echoParams">explicit</str> ... <arr name="last-components"> <str>spellcheck</str> </arr> With the above configuration, I do not get any spelling suggestions. Can somebody help ASAP. 
Thanks, ~Mukta -- Grant Ingersoll Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ -- Grant Ingersoll Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
RE: Ensuring documents indexed by autocommit
Thanks again for your inputs. But then I am still stuck on the question of how we ensure that a document is successfully indexed. One option I see is to search for every document sent to Solr. Or do we assume that autocommit always indexes all the documents successfully? Thanks, Siddharth -Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Friday, January 09, 2009 5:08 PM To: solr-user@lucene.apache.org Subject: Re: Ensuring documents indexed by autocommit On Fri, Jan 9, 2009 at 5:00 PM, Alexander Ramos Jardim < alexander.ramos.jar...@gmail.com> wrote: > Shalin, > > Just to remember that since he is indexing more documents than he has > memory available, it is a good thing to have autocommit set. Yes, sorry, I had assumed that he has enough memory on the solr server. If not, then autoCommit may improve performance. Thanks for pointing this out, Alexander. -- Regards, Shalin Shekhar Mangar.
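If per-document verification is really required, the option Siddharth mentions amounts to one query per uniqueKey once the commit (or autocommit) has gone through. A sketch, with an illustrative id value:

```
http://localhost:8080/solr/select?q=id:doc-001&fl=id&rows=1
```

A numFound of 1 confirms the document is committed and searchable. As discussed elsewhere in this thread, batching adds and issuing one explicit commit makes this check deterministic; with autocommit you would have to wait out the maxTime/maxDocs window before polling.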
Re: Deduplication patch not working in nightly build
Hey there, I have been stuck on this problem since three days ago and have no idea how to sort it out. I am using the nightly from a week ago, MySQL, and this driver and url: driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/my_db" I can use the deduplication patch with indexes of 200,000 docs with no problem. When I try a full-import with a db of 1,500,000 it stops indexing at doc number 15,000 approx, showing me the error posted below. Once I get the exception, I restart Tomcat and start a delta-import... this time everything works fine! I need to avoid this error in the full import. I have tried: url="jdbc:mysql://localhost/my_db?autoReconnect=true" to sort it out in case the connection was closed due to a long time until the next doc was indexed, but nothing changed... I keep having this:
Jan 9, 2009 1:38:18 PM org.apache.solr.handler.dataimport.JdbcDataSource logError
WARNING: Error reading data
com.mysql.jdbc.CommunicationsException: Communications link failure due to underlying exception:

** BEGIN NESTED EXCEPTION **

java.io.EOFException

STACKTRACE:

java.io.EOFException
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1905)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2404)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2862)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1289)
at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:362)
at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:352)
at com.mysql.jdbc.ResultSet.next(ResultSet.java:6144)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:279)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$500(JdbcDataSource.java:167)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:205)
at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:229)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:77)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:387)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:209)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:160)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:368)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:437)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:428)

** END NESTED EXCEPTION **

Last packet sent to the server was 206097 ms ago.
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2563)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2862)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1289)
at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:362)
at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:352)
at com.mysql.jdbc.ResultSet.next(ResultSet.java:6144)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:279)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$500(JdbcDataSource.java:167)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:205)
at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:229)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:77)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:387)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:209)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:160)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:368)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:437)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:428)
Jan 9, 2009 1:38:18 PM org.apache.solr.handler.dataimport.JdbcDataSource logError
WARNING: Exception while closing result set
com.mysql.jdbc.CommunicationsException: Communications link failure due to underlying exception:

** BEGIN NESTED EXCEPTION **

java.io.EOFException

STACKTRACE:

java.io.EOFException
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1905)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2351)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2862)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1289)
Re: Problem in Out Put of Search
Can you show us the query you are running and how you are indexing documents? 2009/1/9 rohit arora > > Hi, > > I have added one document only a single time, but the output provided by Lucene > gives me > the same document multiple times. > > If I specify rows=2, the output contains the same document 2 times. > If I specify rows=10, the output contains the same document 10 times. > > I have already defined the 'id' field as a uniqueKey in the schema.xml > > with regards > Rohit Arora > > --- On Fri, 1/9/09, Shalin Shekhar Mangar wrote: > From: Shalin Shekhar Mangar > Subject: Re: Problem in Out Put of Search > To: solr-user@lucene.apache.org > Date: Friday, January 9, 2009, 11:55 AM > > There are two documents in that response. Are you adding the same document > multiple times to Solr? > > You can also specify a uniqueKey in the schema.xml which will make sure > that > Solr keeps only one document for a given key and removes the duplicate > documents. > > In the response you have pasted, the 'id' field looks like it should > have > been defined as a uniqueKey. > > On Fri, Jan 9, 2009 at 11:12 AM, rohit arora > wrote: > > > > > Hi, > > > > It gives this output: > > > > > > 5.361002 > > 8232 > > Quality Testing > International > > > > Quality Testing International the ideal exhibition for measuring > technique > > testing of materials and quality assurance. Profile for exhibit include > > Customer profiling; customer marketing; loyalty systems and operators; > > customer intelligence; market research and analysis; customer experience > > management; employee motivation and incentivising; data warehousing/data > > mining; employee training; contact/call centre; customer service > management; > > sales promotions and incentives; field marketing; CRM solutions. > > > > > > Quality Testing International the ideal exhibition for measuring > technique > > testing of materials and quality assurance. > > > > > > > > 5.361002 > > 8232 > > Quality Testing > International > > > > Quality Testing International the ideal exhibition for measuring > technique > > testing of materials and quality assurance. Profile for exhibit include > > Customer profiling; customer marketing; loyalty systems and operators; > > customer intelligence; market research and analysis; customer experience > > management; employee motivation and incentivising; data warehousing/data > > mining; employee training; contact/call centre; customer service > management; > > sales promotions and incentives; field marketing; CRM solutions. > > > > > > Quality Testing International the ideal exhibition for measuring > technique > > testing of materials and quality assurance. > > > > > > > > > > If you look, it provides the same record of (id,name,large_desc,small_desc) > > multiple times. > > > > I have attached the output in a (.txt) file. > > > > with regards > > Rohit Arora > > > > > > > > > > > > --- On *Thu, 1/8/09, Erik Hatcher * > wrote: > > > > From: Erik Hatcher > > Subject: Re: Problem in Out Put of Search > > To: solr-user@lucene.apache.org > > Date: Thursday, January 8, 2009, 7:10 PM > > > > > > Please provide an example of what you mean. > > What and how did you index? What > > was the query? > > > > Erik > > > > On Jan 8, 2009, at 8:34 AM, rohit arora wrote: > > > > > > > > Hi, > > > > > > I have installed solr lucene 1.3. I am facing a problem while searching: it > > does not provide multiple records. > > > > > > Instead of providing multiple records it provides a single record multiple > > times. > > > > > > with regards > > > Rohit Arora > > > > > > > > > > > > > > > > > > -- > Regards, > Shalin Shekhar Mangar. > > > > > -- Alexander Ramos Jardim
Re: Ensuring documents indexed by autocommit
On Fri, Jan 9, 2009 at 5:00 PM, Alexander Ramos Jardim < alexander.ramos.jar...@gmail.com> wrote: > Shalin, > > Just to remember that since he is indexing more documents than he has memory > available, it is a good thing to have autocommit set. Yes, sorry, I had assumed that he has enough memory on the solr server. If not, then autoCommit may improve performance. Thanks for pointing this out, Alexander. -- Regards, Shalin Shekhar Mangar.
Re: Ensuring documents indexed by autocommit
Shalin, Just to remember that since he is indexing more documents than he has memory available, it is a good thing to have autocommit set. 2009/1/9 Shalin Shekhar Mangar > On Fri, Jan 9, 2009 at 4:47 PM, Gargate, Siddharth > wrote: > > > But what you were suggesting is that I > > should call commit only after some time or after few number of > > documents, right? > > > Correct. If you are using Solrj client for indexing data, you can use the > SolrServer#add(Collection docs) method to add multiple > documents in a batch and then call commit. > > But unless you really need to commit in between adding documents, > committing > at the very end of the indexing process usually gives the best performance. > > -- > Regards, > Shalin Shekhar Mangar. > -- Alexander Ramos Jardim
Re: Ensuring documents indexed by autocommit
On Fri, Jan 9, 2009 at 4:47 PM, Gargate, Siddharth wrote: > But what you were suggesting is that I > should call commit only after some time or after few number of > documents, right? Correct. If you are using Solrj client for indexing data, you can use the SolrServer#add(Collection docs) method to add multiple documents in a batch and then call commit. But unless you really need to commit in between adding documents, committing at the very end of the indexing process usually gives the best performance. -- Regards, Shalin Shekhar Mangar.
RE: Ensuring documents indexed by autocommit
Sorry, for the previous question. What I meant was whether we can set the configuration from the code. But what you were suggesting is that I should call commit only after some time or after few number of documents, right? -Original Message- From: Gargate, Siddharth [mailto:sgarg...@ptc.com] Sent: Friday, January 09, 2009 4:43 PM To: solr-user@lucene.apache.org Subject: RE: Ensuring documents indexed by autocommit How do we set the maxDocs or maxTime for commit from the application? Thanks, Siddharth -Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Friday, January 09, 2009 4:34 PM To: solr-user@lucene.apache.org Subject: Re: Ensuring documents indexed by autocommit On Fri, Jan 9, 2009 at 4:20 PM, Gargate, Siddharth wrote: > Thanks Shalin for the reply. > I am working with the remote Solr server. I am using autocommit > instead of commit method call because I observed significant > performance improvement with autocommit. > Just wanted to make sure that callback functionality is currently not > available in Solr. > > You provide your own implementation of SolrEventListener to do a call back to your application in any way you need. I don't think using autoCommit gives a performance advantage over normal commits. Calling commit after each document is not a good idea since commit is an expensive operation. The only reason you are seeing better performance after autoCommit is because it is set to commit after 'X' number of documents or minutes. This is something you can do from your application as well. -- Regards, Shalin Shekhar Mangar.
RE: Ensuring documents indexed by autocommit
How do we set the maxDocs or maxTime for commit from the application? Thanks, Siddharth -Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Friday, January 09, 2009 4:34 PM To: solr-user@lucene.apache.org Subject: Re: Ensuring documents indexed by autocommit On Fri, Jan 9, 2009 at 4:20 PM, Gargate, Siddharth wrote: > Thanks Shalin for the reply. > I am working with the remote Solr server. I am using autocommit > instead of commit method call because I observed significant > performance improvement with autocommit. > Just wanted to make sure that callback functionality is currently not > available in Solr. > > You provide your own implementation of SolrEventListener to do a call back to your application in any way you need. I don't think using autoCommit gives a performance advantage over normal commits. Calling commit after each document is not a good idea since commit is an expensive operation. The only reason you are seeing better performance after autoCommit is because it is set to commit after 'X' number of documents or minutes. This is something you can do from your application as well. -- Regards, Shalin Shekhar Mangar.
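To answer the question directly: in Solr 1.3 maxDocs/maxTime are not settable through SolrJ; they live in solrconfig.xml on the server. The values below are illustrative:

```xml
<!-- solrconfig.xml, inside <updateHandler class="solr.DirectUpdateHandler2"> -->
<autoCommit>
  <maxDocs>10000</maxDocs> <!-- commit after this many pending documents -->
  <maxTime>60000</maxTime> <!-- ...or after this many milliseconds -->
</autoCommit>
```

Changing either value requires editing the config and restarting/reloading Solr, which is why controlling commits from the client, as Shalin suggests, is the more flexible option.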
Re: Querying based on term position possible?
2009/1/8 Otis Gospodnetic > Hello Mark, > > As for assigning different weight to fields, have a look at DisMax request > handler - > http://wiki.apache.org/solr/DisMaxRequestHandler#head-af452050ee272a1c88e2ff89dc0012049e69e180 > Field boosting should solve this issue too, right? > > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message > > From: Mark Tovey > > To: solr-user@lucene.apache.org > > Sent: Thursday, January 8, 2009 12:16:39 PM > > Subject: Querying based on term position possible? > > > > I'm a relative newbie at Solr/Lucene so apologies if this question is > > overly simplistic. I have an index built and functioning as expected, > > but I am trying to build a query that can sort/score results based on > > the search term's position in the document, with a document appearing > > higher in the results list if the term appears earlier in the document. > > For example, "Red fox in the forest" would be scored over "My shoes are > > red today and my shirt is also red" if I search for the term "red". It > > seems to me that the default scoring algorithm is based more on the term > > frequency than term position, though this may be a simplistic > > interpretation. Does anyone on the list know if there is a way to > > achieve my desired results by structuring a query a certain way, or is > > this more of an indexing issue where I should have set a parameter(s) in > > my schema to a certain value? Any help is hugely appreciated as I have > > been puzzling away at this for the past couple of days with no success. > > > > > > > > Alternatively, is there a way to query on two fields for a search term > > with documents being placed higher in the results if the term occurs in > > field1 over field2? I ask this because one of the fields in my schema > > (title in this case) is deemed more important in our scenario than > > the "text" field (which holds the title plus the contents of the > > remainder of the document). 
I tried, for example, title:red text:red but > > again was stumped on the syntax to place an "importance" variable on > > field1 over field2. > > > > > > > > Of course, it may be that what I'm trying to accomplish is simply not > > doable with the Lucene engine, at which point feel free to point out the > > error of my ways ;) > > > > > > > > Regards, > > > > --Mark Tovey > > -- Alexander Ramos Jardim
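The field-weighting half of Mark's question maps directly onto the dismax qf parameter Otis points at. A sketch of a handler configuration, with illustrative boost values:

```xml
<!-- solrconfig.xml: a dismax handler that weights title matches over body matches -->
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">title^3.0 text^1.0</str>
  </lst>
</requestHandler>
```

A query like q=red&qt=dismax then scores a title hit three times as heavily as a body hit. The position half (ranking documents where the term occurs earlier) is not covered by boosting: out of the box, Lucene scoring uses term frequency and field length rather than position, so that part generally needs payloads or a custom Similarity.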
Re: Ensuring documents indexed by autocommit
On Fri, Jan 9, 2009 at 4:20 PM, Gargate, Siddharth wrote: > Thanks Shalin for the reply. > I am working with the remote Solr server. I am using autocommit instead > of commit method call because I observed significant performance > improvement with autocommit. > Just wanted to make sure that callback functionality is currently not > available in Solr. > > You can provide your own implementation of SolrEventListener to do a callback to your application in any way you need. I don't think using autoCommit gives a performance advantage over normal commits. Calling commit after each document is not a good idea since commit is an expensive operation. The only reason you are seeing better performance with autoCommit is because it is set to commit after 'X' number of documents or minutes. This is something you can do from your application as well. -- Regards, Shalin Shekhar Mangar.
RE: Ensuring documents indexed by autocommit
Thanks Shalin for the reply. I am working with the remote Solr server. I am using autocommit instead of commit method call because I observed significant performance improvement with autocommit. Just wanted to make sure that callback functionality is currently not available in Solr. Thanks, Siddharth -Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Friday, January 09, 2009 3:16 PM To: solr-user@lucene.apache.org Subject: Re: Ensuring documents indexed by autocommit On Fri, Jan 9, 2009 at 3:03 PM, Gargate, Siddharth wrote: > Hi all, >I am using CommonsHttpSolrServer to add documents to Solr. Instead > of explicitly calling commit for every document I have configured > autocommit in solrconfig.xml. But how do we ensure that the document > added is successfully indexed/committed on Solr side. Is there any > callback mechanism available where the callback method my application > will get called? I looked at the postCommit listener in solrconfig.xml > file but looks like it just supports execution of external executables. > Are you using embedded Solr? or is it on a remote machine? A callback would only work on the same JVM anyway. You can always call commit through CommonsHttpSolrServer and then do a query to check if the document you expect got indexed. Though, if all the add and commit calls were successful (i.e. returned HTTP 200), it is very unlikely that the document won't be indexed. -- Regards, Shalin Shekhar Mangar.
Re: Solr on a multiprocessor machine
On Jan 9, 2009, at 12:28 AM, smock wrote: I'm using 1.3 - are the nightly builds stable enough to use in production? Testing always recommended, and no official guarantees are made of course, but trunk is vastly superior to 1.3 in faceting performance. I'd use trunk (in fact I am) in production. Erik
Re: Problem in Out Put of Search
Rohit, I'd guess you don't have <uniqueKey> set to id in schema.xml. Erik On Jan 9, 2009, at 1:57 AM, rohit arora wrote: Hi, I have added one document only a single time, but the output provided by Lucene gives me the same document multiple times. If I specify rows=2, the output contains the same document 2 times. If I specify rows=10, the output contains the same document 10 times. I have already defined the 'id' field as a uniqueKey in the schema.xml with regards Rohit Arora --- On Fri, 1/9/09, Shalin Shekhar Mangar wrote: From: Shalin Shekhar Mangar Subject: Re: Problem in Out Put of Search To: solr-user@lucene.apache.org Date: Friday, January 9, 2009, 11:55 AM There are two documents in that response. Are you adding the same document multiple times to Solr? You can also specify a uniqueKey in the schema.xml which will make sure that Solr keeps only one document for a given key and removes the duplicate documents. In the response you have pasted, the 'id' field looks like it should have been defined as a uniqueKey. On Fri, Jan 9, 2009 at 11:12 AM, rohit arora wrote: Hi, It gives this output: 5.361002 8232 Quality Testing International Quality Testing International the ideal exhibition for measuring technique testing of materials and quality assurance. Profile for exhibit include Customer profiling; customer marketing; loyalty systems and operators; customer intelligence; market research and analysis; customer experience management; employee motivation and incentivising; data warehousing/ data mining; employee training; contact/call centre; customer service management; sales promotions and incentives; field marketing; CRM solutions. Quality Testing International the ideal exhibition for measuring technique testing of materials and quality assurance. 5.361002 8232 Quality Testing International Quality Testing International the ideal exhibition for measuring technique testing of materials and quality assurance. 
Profile for exhibit include Customer profiling; customer marketing; loyalty systems and operators; customer intelligence; market research and analysis; customer experience management; employee motivation and incentivising; data warehousing/data mining; employee training; contact/call centre; customer service management; sales promotions and incentives; field marketing; CRM solutions. Quality Testing International the ideal exhibition for measuring technique testing of materials and quality assurance. If you look, it provides the same record (id, name, large_desc, small_desc) multiple times. I have attached the output in a (.txt) file. with regards Rohit Arora --- On Thu, 1/8/09, Erik Hatcher wrote: From: Erik Hatcher Subject: Re: Problem in Out Put of Search To: solr-user@lucene.apache.org Date: Thursday, January 8, 2009, 7:10 PM Please provide an example of what you mean. What and how did you index? What was the query? Erik On Jan 8, 2009, at 8:34 AM, rohit arora wrote: Hi, I have installed Solr/Lucene 1.3. I am facing a problem: while searching, it does not provide multiple records. Instead of providing multiple records, it provides a single record multiple times. with regards Rohit Arora -- Regards, Shalin Shekhar Mangar.
Re: Ensuring documents indexed by autocommit
On Fri, Jan 9, 2009 at 3:03 PM, Gargate, Siddharth wrote: > Hi all, > I am using CommonsHttpSolrServer to add documents to Solr. Instead > of explicitly calling commit for every document I have configured > autocommit in solrconfig.xml. But how do we ensure that the document > added is successfully indexed/committed on the Solr side? Is there any > callback mechanism available where the callback method in my application > will get called? I looked at the postCommit listener in solrconfig.xml > file but it looks like it just supports execution of external executables. > Are you using embedded Solr, or is it on a remote machine? A callback would only work on the same JVM anyway. You can always call commit through CommonsHttpSolrServer and then do a query to check if the document you expect got indexed. Though, if all the add and commit calls were successful (i.e. returned HTTP 200), it is very unlikely that the document won't be indexed. -- Regards, Shalin Shekhar Mangar.
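Shalin's suggestion (commit, then query for the expected document) can be sketched as a small polling helper. This is an illustrative sketch, not a Solr API: the `check` supplier stands in for a real SolrJ query such as `server.query(new SolrQuery("id:123")).getResults().getNumFound() > 0`, and it uses a modern Java functional interface for brevity.

```java
import java.util.function.Supplier;

// Sketch: after calling commit(), poll a query until the expected document
// shows up or a timeout expires. The Supplier stands in for a real SolrJ
// query against the server.
public class WaitForCommit {
    public static boolean waitUntil(Supplier<Boolean> check,
                                    long timeoutMs, long intervalMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (check.get()) {
                return true;   // document found: the commit is visible
            }
            Thread.sleep(intervalMs);
        }
        return false;          // timed out: investigate the add/commit calls
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in check that becomes true after ~50 ms, simulating a
        // document that appears once the (auto)commit fires.
        long start = System.currentTimeMillis();
        boolean found = waitUntil(
                () -> System.currentTimeMillis() - start > 50, 1000, 10);
        System.out.println(found ? "indexed" : "timed out");
    }
}
```

As noted in the thread, if the add and commit calls all returned HTTP 200 this check will almost always succeed on the first poll; the loop mainly guards against autocommit latency.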
Re: 2 questions about solr spellcheck
On Fri, Jan 9, 2009 at 12:59 AM, Qingdi wrote: > > Hi, > > I use Solr 1.3 and I have two questions about spellcheck. > > 1) if my index docs are like: > > university1 > UNIVERSITY > > > street1, city1 > LOCATION > > is it possible to build the spell check dictionary using field "NAME" but > with filter "TYPE"="UNIVERSITY"? > That is, I only want to include the university name in the dictionary. What > is the proper way to implement this? > It is not possible out of the box. However, there are a couple of ways to do this. 1. You can create a copy field for 'NAME' (say 'NAME_SPELL') which has a value only if "TYPE"="UNIVERSITY" for the document. 2. You can create your own implementation of the IndexBasedSpellChecker and HighFrequencyDictionary which applies a filter query on "TYPE" and then uses the terms to create the dictionary. Option #1 would probably be the easiest if you care only about "TYPE"="UNIVERSITY". > 2) my current data index size is about 11G, and the spelling dictionary > index size is about 6 G. After adding the spell check component, will the > spell checking have any impact on the runtime query performance and memory > usage? Should I increase the memory allocation for the solr server? > I think the spelling index will have some impact. But the magnitude of the impact and the memory needed depends on a number of factors such as type of queries, query rate etc. > > Thanks for your help. > > Qingdi > -- > View this message in context: > http://www.nabble.com/2-questions-about-solr-spellcheck-tp21359183p21359183.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Regards, Shalin Shekhar Mangar.
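Option #1 above might look roughly like the sketch below (field and type names are illustrative, not from the original mail). Note that a plain <copyField> copies NAME unconditionally, so the "only if TYPE=UNIVERSITY" part has to happen on the indexing client, which sets NAME_SPELL itself; the spellchecker is then pointed at that field in solrconfig.xml:

```xml
<!-- schema.xml sketch: a separate field used only to build the
     spellcheck dictionary. The indexing client sets NAME_SPELL only
     when TYPE=UNIVERSITY; a plain copyField cannot filter on TYPE. -->
<field name="NAME_SPELL" type="textSpell" indexed="true" stored="false"/>

<!-- solrconfig.xml sketch: build the dictionary from NAME_SPELL -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">NAME_SPELL</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
  </lst>
</searchComponent>
```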
Ensuring documents indexed by autocommit
Hi all, I am using CommonsHttpSolrServer to add documents to Solr. Instead of explicitly calling commit for every document, I have configured autocommit in solrconfig.xml. But how do we ensure that the document added is successfully indexed/committed on the Solr side? Is there any callback mechanism available through which a callback method in my application will get called? I looked at the postCommit listener in the solrconfig.xml file, but it looks like it just supports execution of external executables. Thanks in advance, Siddharth
Re: Overlapping Replication Scripts
On Fri, Jan 9, 2009 at 4:28 AM, wojtekpia wrote: > > What happens if I overlap the execution of my cron jobs? Do any of these > scripts detect that another instance is already executing? No, they don't. > > -- > View this message in context: > http://www.nabble.com/Overlapping-Replication-Scripts-tp21362434p21362434.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Regards, Shalin Shekhar Mangar.
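Since the replication scripts do not detect overlapping runs themselves, one common guard (not part of the Solr distribution) is to wrap each cron entry in flock(1) so an overlapping run exits immediately instead of stacking up. The lock-file path and script name below are illustrative:

```shell
#!/bin/bash
# Sketch: serialize replication cron jobs with flock(1) (util-linux).
# In crontab you would write something like:
#   */5 * * * * flock -n /tmp/snappuller.lock /path/to/solr/bin/snappuller
# Demonstration: hold the lock in the background, then show that a second
# attempt bails out instead of running concurrently.
LOCKFILE="/tmp/solr-replication-demo.$$.lock"

( flock -n 9 && sleep 2 ) 9>"$LOCKFILE" &   # first "job" acquires the lock
holder=$!
sleep 0.5                                    # give it time to acquire

if flock -n "$LOCKFILE" -c true; then        # second "job" tries the lock
  status="acquired"                          # lock was free: safe to run
else
  status="busy"                              # previous run still going: skip
fi
echo "$status"

wait "$holder"
rm -f "$LOCKFILE"
```

With `-n`, flock fails fast instead of queueing, so an overlapped cron cycle is simply skipped rather than delayed.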
Re: Problem in Out Put of Search
Did you add the uniqueKey to schema.xml after indexing? If not, you need to re-index after changing the schema. Solr/Lucene does not duplicate documents by itself. How are you indexing documents to Solr? Did the example setup shipped with Solr work correctly for you? On Fri, Jan 9, 2009 at 12:27 PM, rohit arora wrote: > > Hi, > > I have add one document only single time but the out put provided by lucene > give me > the same document multiple times.. > > If i specify rows=2 in out put same document will be 2 times. > If i specify rows=10 in out put same document will be 10 times. > > I have already defined 'id' field as a uniqueKey in the schema.xml > > with regards > Rohit Arora > > --- On Fri, 1/9/09, Shalin Shekhar Mangar wrote: > From: Shalin Shekhar Mangar > Subject: Re: Problem in Out Put of Search > To: solr-user@lucene.apache.org > Date: Friday, January 9, 2009, 11:55 AM > > There are two documents in that response. Are you adding the same document > multiple times to Solr? > > You can also specify a uniqueKey in the schema.xml which will make sure > that > Solr keeps only one document for a given key and removes the duplicate > documents. > > In the response you have pasted, the 'id' field looks like it should > have > been defined as a uniqueKey. > > On Fri, Jan 9, 2009 at 11:12 AM, rohit arora > wrote: > > > > > Hi, > > > > It gives this out put .. > > > > > > 5.361002 > > 8232 > > Quality Testing > International > > > > Quality Testing International the ideal exhibition for measuring > technique > > testing of materials and quality assurance. Profile for exhibit include > > Customer profiling; customer marketing; loyalty systems and operators; > > customer intelligence; market research and analysis; customer experience > > management; employee motivation and incentivising; data warehousing/data > > mining; employee training; contact/call centre; customer service > management; > > sales promotions and incentives; field marketing; CRM solutions. 
> > > > > > Quality Testing International the ideal exhibition for measuring > technique > > testing of materials and quality assurance. > > > > > > > > 5.361002 > > 8232 > > Quality Testing > International > > > > Quality Testing International the ideal exhibition for measuring > technique > > testing of materials and quality assurance. Profile for exhibit include > > Customer profiling; customer marketing; loyalty systems and operators; > > customer intelligence; market research and analysis; customer experience > > management; employee motivation and incentivising; data warehousing/data > > mining; employee training; contact/call centre; customer service > management; > > sales promotions and incentives; field marketing; CRM solutions. > > > > > > Quality Testing International the ideal exhibition for measuring > technique > > testing of materials and quality assurance. > > > > > > > > > > If you look it provide same record of (id,name,large_desc,small_desc) > > multiple times.. > > > > I have attached the out put in a (.txt) file.. > > > > with regards > > Rohit Arora > > > > > > > > > > > > --- On *Thu, 1/8/09, Erik Hatcher * > wrote: > > > > From: Erik Hatcher > > Subject: Re: Problem in Out Put of Search > > To: solr-user@lucene.apache.org > > Date: Thursday, January 8, 2009, 7:10 PM > > > > > > Please provide an example of what you mean. > > What and how did you index? What > > was the query? > > > > Erik > > > > On Jan 8, 2009, at 8:34 AM, rohit arora wrote: > > > > > > > > Hi, > > > > > > I have installed solr lucene 1.3. I am facing a problem wile > searching it > > did not provides multiple records. > > > > > > Instead of providing multiple records it provides single record > multiple > > times.. > > > > > > with regards > > > Rohit Arora > > > > > > > > > > > > > > > > > > -- > Regards, > Shalin Shekhar Mangar. > > > > > -- Regards, Shalin Shekhar Mangar.
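For reference, the relevant schema.xml pieces look roughly like this minimal sketch (the field type name is illustrative); as noted above, documents added before the <uniqueKey> was declared must be re-indexed for deduplication to take effect:

```xml
<!-- schema.xml sketch: define the key field and declare it as the
     uniqueKey. With this in place, re-adding a document with the same id
     overwrites the previous copy instead of creating a duplicate. -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>

<uniqueKey>id</uniqueKey>
```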
Re: Problem with WT parameter when upgrading from Solr1.2 to solr1.3
yeah, finally I did it by modifying the required SolrDocumentList and using it instead of the DocList object as in Solr 1.2. Thanks Pooja On Fri, Jan 9, 2009 at 9:01 AM, Yonik Seeley wrote: > On Thu, Jan 8, 2009 at 9:40 PM, Chris Hostetter > wrote: > > you have a custom response writer you had working in > > Solr 1.2, and now you are trying to use that same custom response writer > in > > Solr 1.3 with distributed requests? > > Right, that's probably the crux of it - distributed search required > some extensions to response writers... things like handling > SolrDocument and SolrDocumentList. > > -Yonik >