Incorrect Guava version in maven repository
Hi, when I try to upgrade the Guava version that Solr depends on, I notice that the Guava version listed in the Maven repository for solr-core 8.0.0 is 14.0.1 (https://mvnrepository.com/artifact/org.apache.solr/solr-core/8.0.0). I also noticed that there is a resolved Jira issue that upgraded the Guava dependency to 25.1 (https://issues.apache.org/jira/browse/SOLR-11763). Is the Guava version listed in the Maven repository correct? Which Guava version do Solr 8.0.0 and 7.5.0 depend on? Thanks, Amber
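One way to see which Guava actually lands on the classpath is `mvn dependency:tree` in the project that depends on solr-core; and if a different version is wanted, it can be pinned via `dependencyManagement`. A hedged sketch of such a pom fragment (25.1-jre is the version SOLR-11763 mentions; whether it is safe with a given Solr release should be verified):

```xml
<!-- Force a specific Guava version regardless of what solr-core pulls in
     transitively (sketch; verify compatibility with your Solr version) -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>25.1-jre</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Note that the version shown on mvnrepository.com reflects what the published solr-core pom declares, which can lag behind what the Solr binary distribution actually ships in its lib directories.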
RE: Cassandra Solr Integration, what driver to use?
I use this fat jar for Solr 6.6.5: https://github.com/adejanovski/cassandra-jdbc-wrapper

Kind regards, Daphne Liu BI Architect - Big Data - Matrix SCM CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL 32256 USA / www.cevalogistics.com T 904.564.1192 / F 904.928.1525 / daphne@cevalogistics.com Making business flow

-Original Message- From: Ka Mok Sent: Thursday, November 15, 2018 4:26 PM To: solr-user@lucene.apache.org Subject: Cassandra Solr Integration, what driver to use?

I'm trying to do some data integration between a Cassandra 3.11.3 database and Solr 7.5. I've spent the past two days looking for the right driver and haven't found a single one other than a product offered by DataStax. Is there really no way to use the default DataImportHandler? In the Solr Admin console, it reads that 1 request was made, 0 received/processed/skipped. However, when I tail Cassandra, I see nothing was sent. I can confirm the connection using a DB GUI such as TablePlus or SQuirreL SQL. Anyone have any ideas?

NVOCC Services are provided by CEVA as agents for and on behalf of Pyramid Lines Limited trading as Pyramid Lines. This e-mail message is intended for the above named recipient(s) only. It may contain confidential information that is privileged. If you are not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this e-mail and any attachment(s) is strictly prohibited. If you have received this e-mail by error, please immediately notify the sender by replying to this e-mail and deleting the message including any attachment(s) from your system. Thank you in advance for your cooperation and assistance. Although the company has taken reasonable precautions to ensure no viruses are present in this email, the company cannot accept responsibility for any loss or damage arising from the use of this email or attachments.
RE: 20180917-Need Apache SOLR support
You have to increase your RAM. We have upgraded our Solr cluster to 12 Solr nodes, each with 64 GB RAM. Our shard size is around 25 GB, and each server hosts only one shard (leader or replica). Performance is very good. For better performance, memory needs to exceed your shard size.

Kind regards, Daphne Liu

-Original Message- From: zhenyuan wei Sent: Tuesday, September 18, 2018 3:12 AM To: solr-user@lucene.apache.org Subject: Re: 20180917-Need Apache SOLR support

I have 6 machines, and each machine runs a Solr server; each Solr server uses 18 GB RAM. The total document count is 3.2 billion (1.4 TB), and my collection's replication factor is 1. The collection has 60 shards; currently each shard is 20-30 GB, with 15 fields per document. The query rate is low for now, maybe 100-500 requests per second.

Shawn Heisey 于2018年9月18日周二 下午12:07写道: > On 9/17/2018 9:05 PM, zhenyuan wei wrote: > > Does that mean a small number of shards gives better performance? > > I also have a use case with 3 billion documents; the > > collection contains 60 shards now. Would 10 shards be better than 60? > > There is no definite answer to this question. It depends on a bunch > of things. How big is each shard once it's finally built? What's > your query rate? How many machines do you have, and how much memory > do those machines have? > > Thanks, > Shawn
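The sizing heuristic above ("memory needs to exceed your shard size") can be sanity-checked with simple arithmetic. A rough sketch, applying the thread's rule of thumb; the 8 GB heap figure is an assumption, not something stated in the thread:

```python
def os_cache_headroom_gb(ram_gb, heap_gb, shard_gb):
    """Estimate how much RAM is left for the OS page cache after the JVM
    heap, minus the on-disk shard size. Positive means the whole shard
    can be cached (the thread's rule of thumb for good performance)."""
    free_for_cache = ram_gb - heap_gb
    return free_for_cache - shard_gb

# The healthy cluster described above: 64 GB RAM, ~25 GB shard, assumed 8 GB heap.
print(os_cache_headroom_gb(64, 8, 25))   # 31: the shard fits in cache with room to spare

# The struggling setup: 18 GB per Solr server, 20-30 GB shards.
print(os_cache_headroom_gb(18, 8, 25))   # -15: most of the shard cannot be cached
```

The negative headroom in the second case is exactly why "increase your RAM" is the answer: every query then pays for disk reads instead of page-cache hits.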
missing jmx stats for num_docs and max_doc
Hi, we are running a 7.4.0 Solr cluster with 3 TLOG replicas and a few PULL replicas. There is one collection divided into 8 shards; each TLOG node hosts all 8 shards, and each PULL node hosts either shard1-4 or shard5-8. When using JMX to collect num_docs metrics via Datadog, we found that the metrics for some shards are missing. For example, on one TLOG node we saw num_docs stats only for shard3/4/5/8, and on another only for shard1/2/3/4/5/8. max_doc is reported more often, but it is also missing for some shards. So far this only happens on the TLOG instances. Restarting the Solr process does not help. Has anyone encountered this before? What should I do next to troubleshoot? Thanks, Zehua
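As a cross-check when JMX beans go missing, the same counters are exposed over HTTP by Solr's Metrics API (`/solr/admin/metrics`, available since 6.4 and therefore present in 7.4). A minimal sketch; the host, core names, and the exact JSON shape below are illustrative assumptions to verify against your node's actual response:

```python
import json
from urllib.request import urlopen

def num_docs_by_core(metrics_json):
    """Pull SEARCHER.searcher.numDocs for every core registry from a
    /admin/metrics?group=core response (shape assumed, verify locally)."""
    out = {}
    for registry, metrics in metrics_json.get("metrics", {}).items():
        if not registry.startswith("solr.core."):
            continue
        val = metrics.get("SEARCHER.searcher.numDocs")
        if val is not None:
            out[registry] = val
    return out

# Against a live node (hypothetical host/collection):
# url = ("http://localhost:8983/solr/admin/metrics"
#        "?group=core&prefix=SEARCHER.searcher.numDocs&wt=json")
# print(num_docs_by_core(json.load(urlopen(url))))

# Offline example with a response shaped like the API's output:
sample = {"metrics": {
    "solr.core.mycoll.shard3.replica_t1": {"SEARCHER.searcher.numDocs": 120000},
    "solr.core.mycoll.shard4.replica_t1": {"SEARCHER.searcher.numDocs": 118500},
}}
print(num_docs_by_core(sample))
```

If the HTTP endpoint reports all shards but JMX does not, the problem is in the JMX reporter/Datadog layer rather than in Solr's metric registries, which narrows the troubleshooting considerably.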
Exception writing document xxxxxx to the index; possible analysis error.
) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) at org.eclipse.jetty.server.Server.handle(Server.java:518) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95) at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:740) at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:754) at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1558) at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:279) at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:211) at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166) ... 62 more Caused by: java.lang.ArrayIndexOutOfBoundsException at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:125) at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:116) at org.apache.lucene.codecs.lucene54.Lucene54DocValuesProducer$CompressedBinaryDocValues$CompressedBinaryTermsEnum.readTerm(Lucene54DocValuesProducer.java:1349) at org.apache.lucene.codecs.lucene54.Lucene54DocValuesProducer$CompressedBinaryDocValues$CompressedBinaryTermsEnum.next(Lucene54DocValuesProducer.java:1365) at org.apache.lucene.index.MultiTermsEnum.pushTop(MultiTermsEnum.java:275) at org.apache.lucene.index.MultiTermsEnum.next(MultiTermsEnum.java:301) at org.apache.lucene.index.MultiDocValues$OrdinalMap.(MultiDocValues.java:527) at org.apache.lucene.index.MultiDocValues$OrdinalMap.build(MultiDocValues.java:484) at org.apache.lucene.codecs.DocValuesConsumer.mergeSortedField(DocValuesConsumer.java:638) at org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:204) at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:153) at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:167) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:111) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4312) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3889) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626) Kind regards, Daphne 
Liu

-Original Message- From: Erick Erickson Sent: Wednesday, July 11, 2018 4:51 PM To: solr-user Subject: Re: solr filter query on text field

bq. is there any difference if the fq field is a string field vs text

Absolutely. String fields are not analyzed in any way. They're not tokenized. They are case sensitive. Etc. For example, take "My dog" as input: a string field will have a _single_ token "M
RE: Solr or Elasticsearch
I used Solr + Cassandra for document search; Solr works very well for document indexing. For big-data visualization, I use Elasticsearch + Grafana. As of today, Grafana does not support Solr. Elasticsearch is very friendly and easy to use for multi-dimensional group-bys, and its real-time query performance is very good. The Grafana dashboard solution can be viewed at https://grafana.com/dashboards/5204/edit

Kind regards, Daphne Liu

-Original Message- From: Steven White [mailto:swhite4...@gmail.com] Sent: Thursday, March 22, 2018 9:14 AM To: solr-user@lucene.apache.org Subject: Solr or Elasticsearch

Hi everyone, there are some good write-ups on the internet comparing the two, and the one thing that keeps coming up about Elasticsearch being superior to Solr is its analytic capability. However, I cannot find what those analytic capabilities are and why they cannot be done using Solr. Can someone help me with this question? Personally, I'm a Solr user, and the thing that concerns me about Elasticsearch is the fact that it is owned by a company that can decide any day to stop making Elasticsearch available under the Apache license, or even completely close free access to it. So, this is a two-part question: 1) What are the analytic capabilities of Elasticsearch that cannot be done using Solr? I want to see a complete list if possible. 2) Should an Elasticsearch user be worried that Elasticsearch may close its open-source policy at any time, or that outsiders have no say about its road map? Thanks, Steve
Solr deltaImportQuery ID configuration
Hello, I am using Solr 6.3.0. Does anyone know, when referencing the id in deltaImportQuery, whether I should use '${dih.delta.id}' or '${dataimporter.delta.id}'? Both are mentioned in the Delta-Import wiki, and I am confused. Thank you.

Kind regards, Daphne Liu BI Architect - Matrix SCM CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL 32256 USA / www.cevalogistics.com T 904.564.1192 / F 904.928.1448 / daphne@cevalogistics.com
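For what it's worth, both spellings appear in DIH documentation over the years, and in modern Solr versions `dih` is treated as an alias of `dataimporter` in the variable resolver, so either form typically works; this should be verified against the 6.3.0 docs. A hedged sketch of the usual delta pattern (the entity, table, and column names below are made up for illustration):

```xml
<!-- Hypothetical DIH entity: table/column names are illustrative only -->
<entity name="item" pk="ID"
        query="SELECT * FROM item"
        deltaQuery="SELECT ID FROM item
                    WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT * FROM item WHERE ID = '${dih.delta.ID}'"/>
```

One detail that trips people up more often than the `dih` vs `dataimporter` prefix: the key after `delta.` must match the column name returned by `deltaQuery` (including case, with some JDBC drivers).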
Solrcloud updating issue.
Hi all: We are trying to index a large number of documents in SolrCloud and keep seeing the following error: org.apache.solr.common.SolrException: Service Unavailable, with a stack like: request: http://wp-np2-c0:8983/solr/uniprot/update?wt=javabin&version=2 at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:320) at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:185) at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$22(ExecutorUtil.java:229) at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$57/936653983.run(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) The setup: 5 nodes in the cluster, each with 16 GB memory; the collection is defined with 5 shards and replication factor 2. The total number of documents is about 90 million, and each document is quite large as well. We also have 5 ZooKeeper instances, one running on each node.
On the Solr side, we can see errors like: solr.log.3-Error from server at http://wp-np2-c4.ebi.ac.uk:8983/solr/uniprot_shard5_replica1: Server Error solr.log.3-request: http://wp-np2-c4.ebi.ac.uk:8983/solr/uniprot_shard5_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fwp-np2-c0.ebi.ac.uk%3A8983%2Fsolr%2Funiprot_shard2_replica1%2F&wt=javabin&version=2 solr.log.3-Remote error message: Async exception during distributed update: Connect to wp-np2-c2.ebi.ac.uk:8983 timed out solr.log.3- at org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:948) solr.log.3- at org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1679) solr.log.3- at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:182) -- solr.log.3- at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95) solr.log.3- at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) solr.log.3- at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) solr.log.3- at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) solr.log.3- at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) solr.log.3- at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) solr.log.3- at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) solr.log.3- at java.lang.Thread.run(Thread.java:745) The strange bit is that this exception doesn't seem to be caught by the try/catch block in our main thread, and the cluster seems to be in good health (all nodes up). After the job is done, we are just missing lots of documents! Any suggestion where we should look to resolve this problem? Best Regards, Wudong
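One likely reason the try/catch in the main thread never fires: `ConcurrentUpdateSolrClient` sends updates from background threads and, by default, reports failures via a callback rather than by throwing into the caller. A common mitigation is to send fixed-size batches synchronously and retry failed batches with backoff, so nothing is silently dropped. A language-agnostic sketch of that loop (Python for brevity; `send_batch` stands in for the actual SolrJ or HTTP call):

```python
import time

def index_with_retries(docs, send_batch, batch_size=500, max_retries=3):
    """Send docs in fixed-size batches, retrying each failed batch with
    exponential backoff. Returns the batches that still failed, so the
    caller can log or re-queue them instead of losing documents."""
    failed = []
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        for attempt in range(max_retries):
            try:
                send_batch(batch)          # e.g. solrClient.add(batch) in SolrJ
                break
            except Exception:
                time.sleep(2 ** attempt)   # back off: 1s, 2s, 4s, ...
        else:
            failed.append(batch)           # retries exhausted: record the batch
    return failed

# Usage with a flaky sender that fails once, then succeeds:
calls = {"n": 0}
def flaky(batch):
    calls["n"] += 1
    if calls["n"] == 1:
        raise IOError("Service Unavailable")

leftover = index_with_retries(list(range(10)), flaky, batch_size=5)
print(len(leftover))  # 0: every batch eventually went through
```

Smaller synchronous batches also put less pressure on the inter-node forwarding that is timing out in the logs above (`Connect to ... timed out` on the leader-to-replica hop).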
Can solrcloud be running on a read-only filesystem?
Hi all: We have a normal build/stage -> prod setup for our production pipeline: we build the Solr index in the build environment, and then the index is copied to the prod environment. SolrCloud in prod seems to work fine when the file system backing it is writable. However, we see many errors when the file system is read-only: many exceptions are thrown about the tlog file not being openable for writing when the Solr nodes are restarted with the new data, and some of the nodes eventually get stuck in the recovering phase and are never able to come back online in the cloud. Does anyone have experience running SolrCloud on a read-only file system? Is it possible at all? Regards, Wudong
RE: Data Import
No, I use the free version. I got the driver from someone else; I can share it if you want to use Cassandra. They modified it for me because the free JDBC driver I found would time out when a document is greater than 16 MB.

Kind regards, Daphne Liu

-Original Message- From: vishal jain [mailto:jain02...@gmail.com] Sent: Friday, March 17, 2017 12:42 PM To: solr-user@lucene.apache.org Subject: Re: Data Import

Hi Daphne, Are you using DSE? Thanks & Regards, Vishal

On Fri, Mar 17, 2017 at 7:40 PM, Liu, Daphne <daphne@cevalogistics.com> wrote: > I just want to share my recent project. I have successfully sent all > our EDI documents to Cassandra 3.7 clusters using the Solr 6.3 Data Import > JDBC Cassandra connector to index our documents. > Since Cassandra is so fast for writing, the compression rate is around 13% > and all my documents can be kept in my Cassandra clusters' memory, so we > are very happy with the result. > > > Kind regards, > > Daphne Liu > BI Architect - Matrix SCM > > CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL > 32256 USA / www.cevalogistics.com T 904.564.1192 / F 904.928.1448 / > daphne@cevalogistics.com > > > > -Original Message- > From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] > Sent: Friday, March 17, 2017 9:54 AM > To: solr-user <solr-user@lucene.apache.org> > Subject: Re: Data Import > > I feel DIH is much better for prototyping, even though people do use > it in production. If you do want to use DIH, you may benefit from > reviewing the DIH-DB example I am currently rewriting in > https://issues.apache.org/jira/browse/SOLR-10312 (may need to change > luceneMatchVersion in solrconfig.xml first). > > CSV, etc., could be useful if you want to keep a history of past imports, > again useful during development, as you evolve the schema.
> > SolrJ may actually be easiest/best for production since you already > have Java stack. > > The choice is yours in the end. > > Regards, >Alex. > > http://www.solr-start.com/ - Resources for Solr users, new and > experienced > > > On 17 March 2017 at 08:56, Shawn Heisey <apa...@elyograg.org> wrote: > > On 3/17/2017 3:04 AM, vishal jain wrote: > >> I am new to Solr and am trying to move data from my RDBMS to Solr. > >> I > know the available options are: > >> 1) Post Tool > >> 2) DIH > >> 3) SolrJ (as ours is a J2EE application). > >> > >> I want to know what is the recommended way for Data import in > >> production environment. Will sending data via SolrJ in batches be > faster than posting a csv using POST tool? > > > > I've heard that CSV import runs EXTREMELY fast, but I have never > > tested it. The same threading problem that I discuss below would > > apply to indexing this way. > > > > DIH is extremely powerful, but it has one glaring problem: It's > > single-threaded, which means that only one stream of data is going > > into Solr, and each batch of documents to be inserted must wait for > > the previous one to finish inserting before it can start. I do not > > know if DIH batches documents or sends them in one at a time. If > > you have a manually sharded index, you can run DIH on each shard in > > parallel, but each one will be single-threaded. That single thread > > is pretty efficient, but it's still only one thread. > > > > Sending multiple index updates to Solr in parallel (multi-threading) > > is how you radically speed up the Solr part of indexing. This is > > usually done with a custom indexing program, which might be written > > with SolrJ or even in a completely different language. > > > > One thing to keep in mind with ANY indexing method: Once the > > situation is examined closely, most people find that it's not Solr > > that makes their indexing slow. The bottleneck is usually the > > source system -- how quickly the data can be retrieved. 
It usually > > takes a lot longer to obtain the data than it does for Solr to index it. > > > > Thanks, > > Shawn
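Shawn's point about multi-threading can be sketched concretely: several worker threads each push batches to Solr in parallel, so one in-flight request does not stall the rest. A minimal sketch using a thread pool (the `send_batch` callable is a stand-in for your SolrJ or HTTP client, not a real Solr API):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def parallel_index(docs, send_batch, batch_size=1000, workers=4):
    """Split docs into batches and index them on several threads at once.
    Returns the total number of documents the workers reported indexed."""
    batches = [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]
    indexed = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(send_batch, b) for b in batches]
        for fut in as_completed(futures):
            indexed += fut.result()   # re-raises any exception from the worker
    return indexed

# Usage with a dummy sender that just counts documents:
total = parallel_index(list(range(2500)), lambda batch: len(batch), batch_size=1000)
print(total)  # 2500
```

As Shawn notes, parallelism only helps the Solr side; if the bottleneck is how fast rows come out of the source RDBMS, adding indexing threads changes little.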
RE: Data Import
I just want to share my recent project. I have successfully sent all our EDI documents to Cassandra 3.7 clusters using the Solr 6.3 Data Import JDBC Cassandra connector to index our documents. Since Cassandra is so fast for writing, the compression rate is around 13% and all my documents can be kept in my Cassandra clusters' memory, so we are very happy with the result.

Kind regards, Daphne Liu

-Original Message- From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] Sent: Friday, March 17, 2017 9:54 AM To: solr-user <solr-user@lucene.apache.org> Subject: Re: Data Import

I feel DIH is much better for prototyping, even though people do use it in production. If you do want to use DIH, you may benefit from reviewing the DIH-DB example I am currently rewriting in https://issues.apache.org/jira/browse/SOLR-10312 (may need to change luceneMatchVersion in solrconfig.xml first). CSV, etc., could be useful if you want to keep a history of past imports, again useful during development, as you evolve the schema. SolrJ may actually be easiest/best for production since you already have a Java stack. The choice is yours in the end. Regards, Alex. http://www.solr-start.com/ - Resources for Solr users, new and experienced

On 17 March 2017 at 08:56, Shawn Heisey <apa...@elyograg.org> wrote: > On 3/17/2017 3:04 AM, vishal jain wrote: >> I am new to Solr and am trying to move data from my RDBMS to Solr. I know >> the available options are: >> 1) Post Tool >> 2) DIH >> 3) SolrJ (as ours is a J2EE application). >> >> I want to know what is the recommended way for data import in a >> production environment. Will sending data via SolrJ in batches be faster >> than posting a CSV using the POST tool? > > I've heard that CSV import runs EXTREMELY fast, but I have never > tested it.
The same threading problem that I discuss below would > apply to indexing this way. > > DIH is extremely powerful, but it has one glaring problem: It's > single-threaded, which means that only one stream of data is going > into Solr, and each batch of documents to be inserted must wait for > the previous one to finish inserting before it can start. I do not > know if DIH batches documents or sends them in one at a time. If you > have a manually sharded index, you can run DIH on each shard in > parallel, but each one will be single-threaded. That single thread is > pretty efficient, but it's still only one thread. > > Sending multiple index updates to Solr in parallel (multi-threading) > is how you radically speed up the Solr part of indexing. This is > usually done with a custom indexing program, which might be written > with SolrJ or even in a completely different language. > > One thing to keep in mind with ANY indexing method: Once the > situation is examined closely, most people find that it's not Solr > that makes their indexing slow. The bottleneck is usually the source > system -- how quickly the data can be retrieved. It usually takes a > lot longer to obtain the data than it does for Solr to index it. > > Thanks, > Shawn
RE: Data Import Handler on 6.4.1
For Solr 6.3, I had to move mine to ../solr-6.3.0/server/solr-webapp/webapp/WEB-INF/lib, if you are using Jetty.

Kind regards, Daphne Liu

-Original Message- From: Michael Tobias [mailto:mtob...@btinternet.com] Sent: Wednesday, March 15, 2017 2:36 PM To: solr-user@lucene.apache.org Subject: Data Import Handler on 6.4.1

I am sure I am missing something simple. I am running Solr 4.8.1 and trialling 6.4.1 on another computer. I have had to manually modify the automatic 6.4.1 schema config as we use a set of specialised field types. They work fine. I am now trying to populate my core with data and having problems. Exactly what names/paths should I be using in the solrconfig.xml file to get this working? I don't recall doing ANYTHING for 4.8.1. And where do I put the mysql-connector-java-5.1.29-bin.jar file, and how do I reference it to get it loaded? And then later in the solrconfig.xml I have: db-data-config.xml. Any help much appreciated. Regards Michael

-Original Message- From: David Hastings [mailto:hastings.recurs...@gmail.com] Sent: 15 March 2017 17:47 To: solr-user@lucene.apache.org Subject: Re: Get handler not working

From your previous email: "There is no "id" field defined in the schema." You need an id field to use the get handler.

On Wed, Mar 15, 2017 at 1:45 PM, Chris Ulicny <culicny@iq.media> wrote: > I thought that "id" and "ids" were fixed parameters for the get > handler, but I never remember, so I've already tried both. Each time > it comes back with the same response of no document. > > On Wed, Mar 15, 2017 at 1:31 PM Alexandre Rafalovitch > <arafa...@gmail.com> > wrote: > > > Actually. > > > > I think the Real Time Get handler has "id" as a magical parameter, not > > as a field name.
It maps to the real id field via the uniqueKey > > definition: > > https://cwiki.apache.org/confluence/display/solr/RealTime+Get > > > > So, if you have not, could you try the way you originally wrote it. > > > > Regards, > >Alex. > > > > http://www.solr-start.com/ - Resources for Solr users, new and > experienced > > > > > > On 15 March 2017 at 13:22, Chris Ulicny <culicny@iq.media> wrote: > > > Sorry, that is a typo. The get is using the iqdocid field. There > > > is no > > "id" > > > field defined in the schema. > > > > > > solr/TestCollection/get?iqdocid=2957-TV-201604141900 > > > > > > solr/TestCollection/select?q=*:*=iqdocid:2957-TV-201604141900 > > > > > > On Wed, Mar 15, 2017 at 1:15 PM Erick Erickson < > erickerick...@gmail.com> > > > wrote: > > > > > >> Is this a typo or are you trying to use get with an "id" field > > >> and your filter query uses "iqdocid"? > > >> > > >> Best, > > >> Erick > > >> > > >> On Wed, Mar 15, 2017 at 8:31 AM, Chris Ulicny <culicny@iq.media> > wrote: > > >> > Yes, we're using a fixed schema with the iqdocid field set as > > >> > the > > >> uniqueKey. > > >> > > > >> > On Wed, Mar 15, 2017 at 11:28 AM Alexandre Rafalovitch < > > >> arafa...@gmail.com> > > >> > wrote: > > >> > > > >> >> What is your uniqueKey? Is it iqdocid? > > >> >> > > >> >> Regards, > > >> >>Alex. > > >> >> > > >> >> http://www.solr-start.com/ - Resources for Solr users, new and > > >> experienced > > >> >> > > >> >> > > >> >> On 15 March 2017 at 11:24, Chris Ulicny <culicny@iq.media> wrote: > > >> >> > Hi, > > >> >> > > > >> >> > I've been trying to use the get handler for a new solr cloud > > >> collection > > >> >> we > > >> >> > are using, and something seems to be amiss. > > >> >> > > > >> >> > We are running 6.3.0, so we did not explicitly define the > > >> >> > request > > >> handler > > >> >> > in the solrconfig since it's supposed to be implicitly defined. > We > > >> also > > >> >> > have the update log enabled with the defaul
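For the jar-loading part of Michael's question, the usual alternative to copying the driver into `WEB-INF/lib` is a `<lib>` directive in `solrconfig.xml` plus the standard DIH handler definition. A hedged sketch; the directory paths below are illustrative and must be adjusted to wherever the jars actually live on your machine:

```xml
<!-- Load the DIH contrib jars and the MySQL JDBC driver.
     The dir paths are examples; point them at your real locations. -->
<lib dir="${solr.install.dir}/dist/" regex="solr-dataimporthandler-.*\.jar"/>
<lib dir="/opt/solr/lib/" regex="mysql-connector-java-.*\.jar"/>

<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>
```

The reason nothing like this was needed on 4.8.1 is that older example configs shipped with broader `<lib>` directives already in place; the newer configsets are leaner, so the DIH jar and JDBC driver have to be wired in explicitly.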
Delta Import JDBC connection frame size larger than max length
Hello Solr experts, is there a place in Solr (the Delta Import datasource?) where I can adjust the JDBC connection frame size to 256 MB? I have adjusted the settings in Cassandra but I'm still getting this error: NonTransientConnectionException: org.apache.thrift.transport.TTransportException: Frame size (17676563) larger than max length (16384000). Thank you.

Kind regards, Daphne Liu BI Architect - Matrix SCM CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL 32256 USA / www.cevalogistics.com T 904.564.1192 / F 904.928.1448 / daphne@cevalogistics.com
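Note that the Thrift frame limit exists on both ends of the connection. The server side is set in `cassandra.yaml`; the `max length (16384000)` in the error is roughly 15.6 MB, which looks like a client-side default, so the JDBC driver's own limit likely needs raising too (whether the Cassandra JDBC wrapper exposes such a knob depends on the driver build and must be checked against its docs). A sketch of the server-side setting:

```yaml
# cassandra.yaml (server side): raise the framed-transport limit.
# 256 corresponds to the 256 MB target mentioned above.
thrift_framed_transport_size_in_mb: 256
```

If the error message still quotes 16384000 after a server restart, the rejection is happening in the client library, not in Cassandra.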
Query/Field Index Analysis corrected but return no docs in search
hi all: I was using solr 3.6 and tried to solve a recall problem today, but encountered a weird problem. There's a doc with the field value 均匀肤色 (just treat that word as an opaque symbol if you don't read it; I only want to describe the problem as exactly as possible). Below is the analysis (tokenization) result. The original message embedded a screenshot of the Analysis screen; here is the text version: Index Analyzer: 均匀肤色 → 均匀 / 匀肤 / 肤色 / 均匀肤色 (the same tokens at each filter stage). Query Analyzer: 均匀肤色 → 均匀肤色 (a single token at each stage). The tokenization result indicates the query should undoubtedly recall/hit the doc, but the doc does not appear in the results when I search for "均匀肤色". I tried to simplify the qf/bf/fq/q and to test with a single field and a single document, to make sure it was not caused by other problems, but failed. It's knotty to debug because it only reproduces in the production environment; I tried the same config/index/query but could not reproduce it in the dev environment. If you have met a similar problem, any clues or debugging methods would be a real help.
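Restating why the non-hit is surprising: the index analyzer emits the full term plus its bigrams, the query analyzer emits only the full term, so the query token is contained in the indexed tokens and the document should match. A trivial plain-Java restatement of that containment check (illustrative only — this is not how Solr matches, just the set logic the Analysis screen implies):

```java
import java.util.List;
import java.util.HashSet;
import java.util.Set;

public class TokenOverlap {
    // True if every query token is among the indexed tokens, i.e. a
    // conjunctive query over these tokens should hit the document.
    public static boolean shouldMatch(Set<String> indexTokens, Set<String> queryTokens) {
        return indexTokens.containsAll(queryTokens);
    }

    public static void main(String[] args) {
        // Tokens as reported by the Analysis screen in the message above
        Set<String> index = new HashSet<>(List.of("均匀", "匀肤", "肤色", "均匀肤色"));
        Set<String> query = new HashSet<>(List.of("均匀肤色"));
        System.out.println(shouldMatch(index, query)); // prints: true
    }
}
```

Since the analysis chains agree, the usual suspects are a stale index (analysis changed after indexing) or a config difference between the production and dev cores, rather than the tokenizers themselves.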
RE: how to sampling search result
Alexandre, Thanks for the reply. The use case is that customers want to review documents based on the search result, but they do not want to review everything, since that is costly. So they want to pick a portion (from 1% to 100%) of the documents to review. Users also ask for this function for statistics; it is a fairly common requirement. Do you know of any plan to implement this feature in the future? A post filter should work, like the collapsing query parser. Thanks, Yongtao -Original Message- From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] Sent: Tuesday, September 27, 2016 9:25 PM To: solr-user Subject: Re: how to sampling search result I am not sure I understand what the business case is. However, you might be able to do something with a custom post-filter. Regards, Alex. Newsletter and resources for Solr beginners and intermediates: http://www.solr-start.com/ On 27 September 2016 at 22:29, Yongtao Liu <y...@commvault.com> wrote: > Mikhail, > > Thanks for your reply. > > Random field is based on index time. > We want to do sampling based on search result. > > Like if the random field has value 1 - 100. > And the query touched documents may all in range 90 - 100. > So random field will not help. > > Is it possible we can sampling based on search result? > > Thanks, > Yongtao > -Original Message- > From: Mikhail Khludnev [mailto:m...@apache.org] > Sent: Tuesday, September 27, 2016 11:16 AM > To: solr-user > Subject: Re: how to sampling search result > > Perhaps, you can apply a filter on random field. > > On Tue, Sep 27, 2016 at 5:57 PM, googoo <liu...@gmail.com> wrote: > >> Hi, >> >> Is it possible I can sampling based on "search result"? >> Like run query first, and search result return 1 million documents. >> With random sampling, 50% (500K) documents return for facet, and stats. >> >> The sampling need based on "search result". >> >> Thanks, >> Yongtao >> >> >> >> -- >> View this message in context: http://lucene.472066.n3.
>> nabble.com/how-to-sampling-search-result-tp4298269.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> > > > > -- > Sincerely yours > Mikhail Khludnev
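Alexandre's custom post-filter suggestion can be made deterministic by deciding per document, at collect time, from a stable hash of the unique key: the same ~N% sample then comes back on every run of the same query, whatever range of documents the query happens to touch. A minimal sketch of just the selection rule (plain Java; in Solr proper this would sit inside a PostFilter's DelegatingCollector, which is not shown here):

```java
public class SampleFilter {
    // Keep roughly `percent` percent of documents, chosen by a stable hash
    // of the unique key, so the sample is reproducible across identical queries.
    public static boolean accept(String uniqueKey, int percent) {
        int bucket = Math.floorMod(uniqueKey.hashCode(), 100); // bucket in [0, 100)
        return bucket < percent;
    }

    public static void main(String[] args) {
        int kept = 0;
        for (int i = 0; i < 10000; i++) {
            if (accept("doc" + i, 50)) kept++;
        }
        System.out.println(kept); // roughly half of 10000
    }
}
```

Because the hash is of the unique key rather than of a random field stored at index time, the selection is spread over whatever document set the query returns — which addresses the "all touched documents may be in range 90 - 100" concern from the quoted message.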
RE: how to remove duplicate from search result
Shamik, Thanks a lot. The collapsing query parser solves the issue. Thanks, Yongtao -Original Message- From: shamik [mailto:sham...@gmail.com] Sent: Tuesday, September 27, 2016 3:09 PM To: solr-user@lucene.apache.org Subject: RE: how to remove duplicate from search result Did you take a look at the Collapsing Query Parser? https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-remove-duplicate-from-search-result-tp4298272p4298305.html Sent from the Solr - User mailing list archive at Nabble.com.
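For readers landing on this thread: the collapsing query parser is applied as a filter query, which is why it also fixes the facet and stats counts discussed elsewhere in this thread — faceting runs on the collapsed (deduplicated) document set. A typical request fragment, using the guid field from this thread:

```
fq={!collapse field=guid}
```

Optionally add expand=true to the request to get the collapsed duplicates back in an "expanded" section of the response.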
RE: how to remove duplicate from search result
David, Thanks for your reply. Grouping cannot solve the issue: we also need to run facet and stats on the search result, and with grouping the facet and stats results still count the duplicates. Thanks, Yongtao -Original Message- From: David Santamauro [mailto:david.santama...@gmail.com] Sent: Tuesday, September 27, 2016 11:35 AM To: solr-user@lucene.apache.org Cc: david.santama...@gmail.com Subject: Re: how to remove duplicate from search result Have a look at https://cwiki.apache.org/confluence/display/solr/Result+Grouping On 09/27/2016 11:03 AM, googoo wrote: > hi, > > We want to provide remove duplicate from search result function. > > like we have below documents. > id(uniqueKey) guid > doc1 G1 > doc2 G2 > doc3 G3 > doc4 G1 > > user run one query and hit doc1, doc2 and doc4. > user want to remove duplicate from search result based on guid field. > since doc1 and doc4 has same guid, one of them should be drop from > search result. > > how we can address this requirement? > > Thanks, > Yongtao > > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/how-to-remove-duplicate-from-search > -result-tp4298272.html Sent from the Solr - User mailing list archive > at Nabble.com. >
RE: how to sampling search result
Mikhail, Thanks for your reply. Random field is based on index time. We want to do sampling based on search result. Like if the random field has value 1 - 100. And the query touched documents may all in range 90 - 100. So random field will not help. Is it possible we can sampling based on search result? Thanks, Yongtao -Original Message- From: Mikhail Khludnev [mailto:m...@apache.org] Sent: Tuesday, September 27, 2016 11:16 AM To: solr-user Subject: Re: how to sampling search result Perhaps, you can apply a filter on random field. On Tue, Sep 27, 2016 at 5:57 PM, googoo <liu...@gmail.com> wrote: > Hi, > > Is it possible I can sampling based on "search result"? > Like run query first, and search result return 1 million documents. > With random sampling, 50% (500K) documents return for facet, and stats. > > The sampling need based on "search result". > > Thanks, > Yongtao > > > > -- > View this message in context: http://lucene.472066.n3. > nabble.com/how-to-sampling-search-result-tp4298269.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Sincerely yours Mikhail Khludnev
remove user defined duplicate from search result
Hi, I am trying to remove user-defined duplicates from the search result. For example, the documents below match the query; when the query returns, I try to remove doc3 from the result since it has a duplicate guid with doc1.

Id (uniqueKey)   guid
doc1             G1
doc2             G2
doc3             G1

To do this, I generate an exclude list based on the guid field's terms: for each term, every document after the first is added to the exclude list, and these docs are added to the QueryCommand filter. Is there any better approach to handle this requirement? Below is the code change in SolrIndexSearcher.java:

  private TreeMap<String, BitDocSet> dupDocs = null;

  public QueryResult search(QueryResult qr, QueryCommand cmd) throws IOException {
    if (cmd.getUniqueField() != null) {
      DocSet filter = getDuplicateByField(cmd.getUniqueField());
      if (cmd.getFilter() != null)
        cmd.getFilter().addAllTo(filter);
      cmd.setFilter(filter);
    }
    getDocListC(qr, cmd);
    return qr;
  }

  private synchronized BitDocSet getDuplicateByField(String field) throws IOException {
    if (dupDocs != null && dupDocs.containsKey(field)) {
      return dupDocs.get(field);
    }
    if (dupDocs == null) {
      dupDocs = new TreeMap<String, BitDocSet>();
    }
    LeafReader reader = getLeafReader();
    BitDocSet res = new BitDocSet(new FixedBitSet(maxDoc()));
    Terms terms = reader.terms(field);
    if (terms == null) {
      dupDocs.put(field, res);
      return res;
    }
    TermsEnum termEnum = terms.iterator();
    PostingsEnum docs = null;
    BytesRef term = null;
    while ((term = termEnum.next()) != null) {
      docs = termEnum.postings(docs, PostingsEnum.NONE);
      // skip the first document for each term; the rest are duplicates
      docs.nextDoc();
      int docID = 0;
      while ((docID = docs.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
        res.add(docID);
      }
    }
    dupDocs.put(field, res);
    return res;
  }

Thanks, Yongtao
RE: remove user defined duplicate from search result
Sorry, the table was missing. Updating the email below with the table. -Original Message- From: Yongtao Liu [mailto:y...@commvault.com] Sent: Monday, September 26, 2016 10:47 AM To: 'solr-user@lucene.apache.org' Subject: remove user defined duplicate from search result Hi, I am trying to remove user-defined duplicates from the search result. For example, the documents below match the query; when the query returns, I try to remove doc3 from the result since it has a duplicate guid with doc1.

id (uniqueKey)   guid
doc1             G1
doc2             G2
doc3             G1

To do this, I generate an exclude list based on the guid field's terms: for each term, every document after the first is added to the exclude list, and these docs are added to the QueryCommand filter. Is there any better approach to handle this requirement? Below is the code change in SolrIndexSearcher.java:

  private TreeMap<String, BitDocSet> dupDocs = null;

  public QueryResult search(QueryResult qr, QueryCommand cmd) throws IOException {
    if (cmd.getUniqueField() != null) {
      DocSet filter = getDuplicateByField(cmd.getUniqueField());
      if (cmd.getFilter() != null)
        cmd.getFilter().addAllTo(filter);
      cmd.setFilter(filter);
    }
    getDocListC(qr, cmd);
    return qr;
  }

  private synchronized BitDocSet getDuplicateByField(String field) throws IOException {
    if (dupDocs != null && dupDocs.containsKey(field)) {
      return dupDocs.get(field);
    }
    if (dupDocs == null) {
      dupDocs = new TreeMap<String, BitDocSet>();
    }
    LeafReader reader = getLeafReader();
    BitDocSet res = new BitDocSet(new FixedBitSet(maxDoc()));
    Terms terms = reader.terms(field);
    if (terms == null) {
      dupDocs.put(field, res);
      return res;
    }
    TermsEnum termEnum = terms.iterator();
    PostingsEnum docs = null;
    BytesRef term = null;
    while ((term = termEnum.next()) != null) {
      docs = termEnum.postings(docs, PostingsEnum.NONE);
      // skip the first document for each term; the rest are duplicates
      docs.nextDoc();
      int docID = 0;
      while ((docID = docs.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
        res.add(docID);
      }
    }
    dupDocs.put(field, res);
    return res;
  }

Thanks, Yongtao
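The patch above walks the guid term dictionary once and flags every posting after the first for each term. The selection rule itself, extracted from the Lucene APIs so it can be read and run standalone (doc IDs and guids are the example values from the message, not code from Solr):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class KeepFirstDedup {
    // Given (docId, guid) pairs in index order, return the docIds that would
    // be EXCLUDED: every doc after the first occurrence of each guid.
    public static List<String> excluded(List<String[]> docs) {
        Set<String> seen = new HashSet<>();
        List<String> out = new ArrayList<>();
        for (String[] d : docs) {
            String docId = d[0], guid = d[1];
            if (!seen.add(guid)) out.add(docId); // guid already seen -> exclude
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> docs = List.of(
            new String[]{"doc1", "G1"},
            new String[]{"doc2", "G2"},
            new String[]{"doc3", "G1"});
        System.out.println(excluded(docs)); // prints: [doc3]
    }
}
```

One caveat: "first" in the patch means lowest internal Lucene docID, which is not necessarily the copy a user would choose to keep; the collapsing query parser suggested in the related thread above lets you pick the surviving document by min/max of a field instead.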
RE: Errors for Streaming Expressions using JDBC (Oracle) stream source
Opened ticket: Issue SOLR-9246 - Errors for Streaming Expressions using JDBC (Oracle) stream source Regards, Hui -Original Message- From: Joel Bernstein [mailto:joels...@gmail.com] Sent: Thursday, June 23, 2016 11:56 AM To: solr-user@lucene.apache.org Subject: Re: Errors for Streaming Expressions using JDBC (Oracle) stream source Ok you should be able to create the jira. Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Jun 23, 2016 at 11:52 AM, Hui Liu <h...@opentext.com> wrote: > Joel, I just opened an account for this, my user name is > h...@opentext.com; let me know when I can open the ticket. > > And thanks for the info, I will be glad to do any collaboration needed > as a reporter on this issue, so feel free to let me know what I need to do. > > Regards, > Hui > > -Original Message- > From: Joel Bernstein [mailto:joels...@gmail.com] > Sent: Thursday, June 23, 2016 11:23 AM > To: solr-user@lucene.apache.org > Subject: Re: Errors for Streaming Expressions using JDBC (Oracle) > stream source > > Sure. You can create a ticket from here > > https://issues.apache.org/jira/browse/SOLR/?selectedTab=com.atlassian. > jira.jira-projects-plugin:summary-panel > > After you've created an account I'll need to add your username to the > contributors group. If you post your username back to this thread I'll > do that. > > Then you can open a ticket. > > This particular issue will require access to an Oracle database so it > will likely be handled as a collaboration between the reporter and a > committer, because not all committers are going to have access to Oracle. > > DIH will accomplish the data load for you. > > The JDBCStream can be used to do things like joins involving RDMBS and > Solr. > > > > > > > > > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Thu, Jun 23, 2016 at 11:06 AM, Hui Liu <h...@opentext.com> wrote: > > > Thanks Joel, I have never opened a ticket before with Solr, do you > > know the steps (url etc) I should follow? 
I will be glad to do so... > > At the meantime, I guess the workaround is to use 'data import > > handler' to get the data from Oracle into Solr? > > > > Regards, > > Hui > > -Original Message- > > From: Joel Bernstein [mailto:joels...@gmail.com] > > Sent: Thursday, June 23, 2016 10:55 AM > > To: solr-user@lucene.apache.org > > Subject: Re: Errors for Streaming Expressions using JDBC (Oracle) > > stream source > > > > Let's open a ticket for this issue specific to Oracle. > > > > Joel Bernstein > > http://joelsolr.blogspot.com/ > > > > On Thu, Jun 23, 2016 at 10:54 AM, Joel Bernstein > > <joels...@gmail.com> > > wrote: > > > > > I think we're going to have to add some debugging into the code to > > > find what's going on. On line 225 in JDBCStream it's getting the > > > class name for each column. It would be good know what the class > > > names are that the Oracles driver is returning. > > > > > > > > > https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.0. > > > 0/ > > > solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/JDBCStream. > > > java > > > > > > We probably need to throw an exception that includes the class > > > name to help users report what different drivers using for the classes. 
> > > > > > > > > Joel Bernstein > > > http://joelsolr.blogspot.com/ > > > > > > On Thu, Jun 23, 2016 at 10:18 AM, Hui Liu <h...@opentext.com> wrote: > > > > > >> Joel - thanks for the quick response, in my previous test, the > > >> collection 'document5' does have a field called 'date_created' > > >> which is type 'date', even though my SQL SELECT below did not > > >> select any un-supported data type (all columns are either long or > > >> String in jdbc type); but to totally rule out this issue, I > > >> created a new collection 'document6' which only contain long and > > >> string data type, and a new Oracle table 'document6' that only > > >> contain columns whose jdbc type is long and string, see below for > > >> schema.xml > and table definition: > > >> > > >> schema.xml for Solr collection 'document6': (newly created empty > > >> collections with 2 shards) > > >> > > >> = > > >> == == = > > >> > > >> > > >>
RE: Errors for Streaming Expressions using JDBC (Oracle) stream source
Joel, I just opened an account for this, my user name is h...@opentext.com; let me know when I can open the ticket. And thanks for the info, I will be glad to do any collaboration needed as a reporter on this issue, so feel free to let me know what I need to do. Regards, Hui -Original Message- From: Joel Bernstein [mailto:joels...@gmail.com] Sent: Thursday, June 23, 2016 11:23 AM To: solr-user@lucene.apache.org Subject: Re: Errors for Streaming Expressions using JDBC (Oracle) stream source Sure. You can create a ticket from here https://issues.apache.org/jira/browse/SOLR/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel After you've created an account I'll need to add your username to the contributors group. If you post your username back to this thread I'll do that. Then you can open a ticket. This particular issue will require access to an Oracle database so it will likely be handled as a collaboration between the reporter and a committer, because not all committers are going to have access to Oracle. DIH will accomplish the data load for you. The JDBCStream can be used to do things like joins involving RDMBS and Solr. Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Jun 23, 2016 at 11:06 AM, Hui Liu <h...@opentext.com> wrote: > Thanks Joel, I have never opened a ticket before with Solr, do you > know the steps (url etc) I should follow? I will be glad to do so... > At the meantime, I guess the workaround is to use 'data import > handler' to get the data from Oracle into Solr? > > Regards, > Hui > -Original Message- > From: Joel Bernstein [mailto:joels...@gmail.com] > Sent: Thursday, June 23, 2016 10:55 AM > To: solr-user@lucene.apache.org > Subject: Re: Errors for Streaming Expressions using JDBC (Oracle) > stream source > > Let's open a ticket for this issue specific to Oracle. 
> > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Thu, Jun 23, 2016 at 10:54 AM, Joel Bernstein <joels...@gmail.com> > wrote: > > > I think we're going to have to add some debugging into the code to > > find what's going on. On line 225 in JDBCStream it's getting the > > class name for each column. It would be good know what the class > > names are that the Oracles driver is returning. > > > > > > https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.0. > > 0/ > > solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/JDBCStream. > > java > > > > We probably need to throw an exception that includes the class name > > to help users report what different drivers using for the classes. > > > > > > Joel Bernstein > > http://joelsolr.blogspot.com/ > > > > On Thu, Jun 23, 2016 at 10:18 AM, Hui Liu <h...@opentext.com> wrote: > > > >> Joel - thanks for the quick response, in my previous test, the > >> collection 'document5' does have a field called 'date_created' > >> which is type 'date', even though my SQL SELECT below did not > >> select any un-supported data type (all columns are either long or > >> String in jdbc type); but to totally rule out this issue, I created > >> a new collection 'document6' which only contain long and string > >> data type, and a new Oracle table 'document6' that only contain > >> columns whose jdbc type is long and string, see below for schema.xml and > >> table definition: > >> > >> schema.xml for Solr collection 'document6': (newly created empty > >> collections with 2 shards) > >> > >> === > >> == = > >> > >> > >> > >> >> sortMissingLast="true" docValues="true" /> > >> >> precisionStep="0" positionIncrementGap="0"/> > >> > >> > >> > >> > >> > >>>> sortMissingLast="true" omitNorms="true"/> > >> > >> > >> >> multiValued="false"/> > >> >> docValues="true"/> > >> >> stored="true" docValues="true"/> > >> >> stored="true" docValues="true"/> > >> >> stored="true" docValues="true"/> > >> >> stored="true" docValues="true"/> 
> >> > >> document_id > >> document_id > >> > >> > >> Oracle table 'document6': (newly create
RE: Errors for Streaming Expressions using JDBC (Oracle) stream source
Thanks Joel, I have never opened a ticket before with Solr, do you know the steps (url etc) I should follow? I will be glad to do so... At the meantime, I guess the workaround is to use 'data import handler' to get the data from Oracle into Solr? Regards, Hui -Original Message- From: Joel Bernstein [mailto:joels...@gmail.com] Sent: Thursday, June 23, 2016 10:55 AM To: solr-user@lucene.apache.org Subject: Re: Errors for Streaming Expressions using JDBC (Oracle) stream source Let's open a ticket for this issue specific to Oracle. Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Jun 23, 2016 at 10:54 AM, Joel Bernstein <joels...@gmail.com> wrote: > I think we're going to have to add some debugging into the code to > find what's going on. On line 225 in JDBCStream it's getting the class > name for each column. It would be good know what the class names are > that the Oracles driver is returning. > > > https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.0.0/ > solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/JDBCStream. > java > > We probably need to throw an exception that includes the class name to > help users report what different drivers using for the classes. 
> > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Thu, Jun 23, 2016 at 10:18 AM, Hui Liu <h...@opentext.com> wrote: > >> Joel - thanks for the quick response, in my previous test, the >> collection 'document5' does have a field called 'date_created' which >> is type 'date', even though my SQL SELECT below did not select any >> un-supported data type (all columns are either long or String in jdbc >> type); but to totally rule out this issue, I created a new collection >> 'document6' which only contain long and string data type, and a new >> Oracle table 'document6' that only contain columns whose jdbc type is >> long and string, see below for schema.xml and table definition: >> >> schema.xml for Solr collection 'document6': (newly created empty >> collections with 2 shards) >> >> = >> = >> >> >> >> > sortMissingLast="true" docValues="true" /> >> > precisionStep="0" positionIncrementGap="0"/> >> >> >> >> >> >> > sortMissingLast="true" omitNorms="true"/> >> >> >> > multiValued="false"/> >> > docValues="true"/> >> > stored="true" docValues="true"/> >> > stored="true" docValues="true"/> >> > stored="true" docValues="true"/> >> > stored="true" docValues="true"/> >> >> document_id >> document_id >> >> >> Oracle table 'document6': (newly created Oracle table with 9 records) >> == >> QA_DOCREP@qlgdb1 > desc document6 >> Name Null?Type >> - >> >> DOCUMENT_ID NOT NULL NUMBER(12) >> SENDER_MSG_DESTVARCHAR2(256) >> RECIP_MSG_DEST VARCHAR2(256) >> DOCUMENT_TYPE VARCHAR2(20) >> DOCUMENT_KEY VARCHAR2(100) >> >> Then I tried this jdbc streaming expression in my browser, >> still getting the same error stack (see below); By looking at the >> source code you have provided below, it seems Solr is able to connect >> to this Oracle db, but just cannot read the resultset for some >> reason? Do you think it has something to do with the jdbc driver version? 
>> >> http://localhost:8988/solr/document6/stream?expr=jdbc(connection= >> "jdbc:oracle:thin:qa_docrep/ >> abc...@lit-racq01-scan.qa.gxsonline.net:1521/qlgdb",sql="SELECT >> document_id,sender_msg_dest,recip_msg_dest,document_type,document_key >> FROM document6",sort="document_id >> asc",driver="oracle.jdbc.driver.OracleDriver") >> >> errors in solr.log >> == >> 2016-06-23 14:07:02.833 INFO (qtp1389647288-139) [c:document6 >> s:shard2 >> r:core_node1 x:document6_shard2_replica1] o.a.s.c.S.Request >> [document6_shard2_
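For anyone hitting the same NPE: the line 225 Joel mentions is where JDBCStream asks ResultSetMetaData for each column's class name and picks a value reader from a fixed set of supported classes (per the javadoc linked above); a class name it does not recognize leaves a null reader behind, and the read() at line 305 then throws the NullPointerException. Oracle drivers are a classic trigger because NUMBER columns typically report java.math.BigDecimal rather than java.lang.Long. A hedged sketch of the "fail loudly with the class name" idea from the thread — this mirrors the shape of the fix, it is not Solr's actual code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class ColumnReaders {
    // Hypothetical registry: JDBC column class name -> converter to a type
    // the tuple stream can emit. Mirrors the idea in JDBCStream, not its code.
    private static final Map<String, Function<Object, Object>> READERS = new HashMap<>();
    static {
        READERS.put("java.lang.String", v -> v);
        READERS.put("java.lang.Long",   v -> v);
        READERS.put("java.lang.Double", v -> v);
        // Accommodates drivers (e.g. Oracle) whose NUMBER columns arrive as
        // BigDecimal; note longValue() drops any fractional part.
        READERS.put("java.math.BigDecimal", v -> ((java.math.BigDecimal) v).longValue());
    }

    public static Function<Object, Object> readerFor(String columnClassName) {
        Function<Object, Object> r = READERS.get(columnClassName);
        if (r == null) {
            // Fail loudly with the offending class name instead of returning
            // null and NPE-ing later, as suggested in the thread.
            throw new IllegalArgumentException("Unsupported column class: " + columnClassName);
        }
        return r;
    }
}
```

This is essentially what SOLR-9246 is about: surfacing which class name the driver reported so the unsupported type can be identified and mapped.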
RE: Errors for Streaming Expressions using JDBC (Oracle) stream source
onseWriter.java:183) at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:299) at org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:95) at org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:60) at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65) at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:725) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:469) ... 26 more -Original Message- From: Joel Bernstein [mailto:joels...@gmail.com] Sent: Thursday, June 23, 2016 7:56 AM To: solr-user@lucene.apache.org Subject: Re: Errors for Streaming Expressions using JDBC (Oracle) stream source I'm wondering if you're selecting an unsupported data type. The exception being thrown looks like it could happen if that were the case. The supported types are in the Java doc. https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.0.0/solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/JDBCStream.java Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Jun 22, 2016 at 11:46 PM, Hui Liu <h...@opentext.com> wrote: > Hi, > > > > I have Solr 6.0.0 installed on my PC (windows 7), I was > experimenting with ‘Streaming Expression’ by using Oracle jdbc as the > stream source, following is the http command I am using: > > > > http://localhost:8988/solr/document5/stream?expr=jdbc(connection= > "jdbc:oracle:thin:qa_docrep/ > abc...@lit-racq01-scan.qa.gxsonline.net:1521/qlgdb",sql="SELECT > document_id,sender_msg_dest,recip_msg_dest,document_type,document_key, > sender_bu_id,recip_bu_id,date_created > FROM tg_document WHERE rownum < 5",sort="document_id > asc",driver="oracle.jdbc.driver.OracleDriver") > > > > I can access this Oracle db from my PC via regular JDBC > connection. I did put Oracle jdbc driver jar ‘ojdbc14.jar’ (same jar > used in my regular jdbc code) under Solr/server/lib dir and restarted > Solr cloud. 
Below is the error from solr.log (got a null pointer > error); I am merely trying to get the data returned from Oracle table, > I have not tried to index them in the Solr yet, attached is the > shema.xml and solrconfig.xml for this collection ‘document5’; does > anyone know what am I missing? thanks for any help! > > > > Regards, > > Hui Liu > > > > Error from Solr.log: > > = > > 2016-06-23 03:17:34.413 INFO (qtp1389647288-19) [c:document5 s:shard2 > r:core_node2 x:document5_shard2_replica1] o.a.s.c.S.Request > [document5_shard2_replica1] webapp=/solr path=/stream > params={expr=jdbc(connection%3D"jdbc:oracle:thin:qa_docrep/ > abc...@lit-racq01-scan.qa.gxsonline.net:1521/qlgdb",sql%3D"SELECT+docu > ment_id,sender_msg_dest,recip_msg_dest,document_type,document_key,send > er_bu_id,recip_bu_id+FROM+tg_document+WHERE+rownum+<+5",sort%3D"docume > nt_id+asc",driver%3D"oracle.jdbc.OracleDriver")} > status=0 QTime=0 > > 2016-06-23 03:17:37.588 ERROR (qtp1389647288-19) [c:document5 s:shard2 > r:core_node2 x:document5_shard2_replica1] > o.a.s.c.s.i.s.ExceptionStream java.lang.NullPointerException > > at > org.apache.solr.client.solrj.io.stream.JDBCStream.read(JDBCStream.java > :305) > > at > org.apache.solr.client.solrj.io.stream.ExceptionStream.read(ExceptionS > tream.java:64) > > at > org.apache.solr.handler.StreamHandler$TimerStream.read(StreamHandler.j > ava:374) > > at > org.apache.solr.response.TextResponseWriter.writeTupleStream(TextRespo > nseWriter.java:305) > > at > org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWrite > r.java:167) > > at > org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONRe > sponseWriter.java:183) > > at > org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter. 
> java:299) > > at > org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.j > ava:95) > > at > org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.j > ava:60) > > at > org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(Qu > eryResponseWriterUtil.java:65) > > at > org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:7 > 25) > > at > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:469) > > at > org.apache.solr.servl
Errors for Streaming Expressions using JDBC (Oracle) stream source
Hi, I have Solr 6.0.0 installed on my PC (Windows 7), and I was experimenting with 'Streaming Expressions' using Oracle JDBC as the stream source; the following is the http command I am using: http://localhost:8988/solr/document5/stream?expr=jdbc(connection="jdbc:oracle:thin:qa_docrep/abc...@lit-racq01-scan.qa.gxsonline.net:1521/qlgdb",sql="SELECT document_id,sender_msg_dest,recip_msg_dest,document_type,document_key,sender_bu_id,recip_bu_id,date_created FROM tg_document WHERE rownum < 5",sort="document_id asc",driver="oracle.jdbc.driver.OracleDriver") I can access this Oracle db from my PC via a regular JDBC connection. I did put the Oracle JDBC driver jar 'ojdbc14.jar' (the same jar used in my regular JDBC code) under the Solr/server/lib dir and restarted Solr cloud. Below is the error from solr.log (a null pointer error); I am merely trying to get the data returned from the Oracle table, I have not tried to index it in Solr yet. Attached are the schema.xml and solrconfig.xml for this collection 'document5'; does anyone know what I am missing? Thanks for any help!
Regards, Hui Liu Error from Solr.log: = 2016-06-23 03:17:34.413 INFO (qtp1389647288-19) [c:document5 s:shard2 r:core_node2 x:document5_shard2_replica1] o.a.s.c.S.Request [document5_shard2_replica1] webapp=/solr path=/stream params={expr=jdbc(connection%3D"jdbc:oracle:thin:qa_docrep/abc...@lit-racq01-scan.qa.gxsonline.net:1521/qlgdb",sql%3D"SELECT+document_id,sender_msg_dest,recip_msg_dest,document_type,document_key,sender_bu_id,recip_bu_id+FROM+tg_document+WHERE+rownum+<+5",sort%3D"document_id+asc",driver%3D"oracle.jdbc.OracleDriver")} status=0 QTime=0 2016-06-23 03:17:37.588 ERROR (qtp1389647288-19) [c:document5 s:shard2 r:core_node2 x:document5_shard2_replica1] o.a.s.c.s.i.s.ExceptionStream java.lang.NullPointerException at org.apache.solr.client.solrj.io.stream.JDBCStream.read(JDBCStream.java:305) at org.apache.solr.client.solrj.io.stream.ExceptionStream.read(ExceptionStream.java:64) at org.apache.solr.handler.StreamHandler$TimerStream.read(StreamHandler.java:374) at org.apache.solr.response.TextResponseWriter.writeTupleStream(TextResponseWriter.java:305) at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:167) at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:183) at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:299) at org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:95) at org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:60) at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65) at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:725) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:469) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:229) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:184) at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) at org.eclipse.jetty.server.Server.handle(Server.java:518) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(Abs
RE: Questions regarding re-index when using Solr as a data source
Thank you Walter. -Original Message- From: Walter Underwood [mailto:wun...@wunderwood.org] Sent: Friday, June 10, 2016 3:53 PM To: solr-user@lucene.apache.org Subject: Re: Questions regarding re-index when using Solr as a data source Those are brand new features that I have not used, so I can’t comment on them. But I know they do not make Solr into a database. If you need a transactional database that can support search, you probably want MarkLogic. I worked at MarkLogic for a couple of years. In some ways, MarkLogic is like Solr, but the support for transactions goes very deep. It is not something you can put on top of a search engine. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 10, 2016, at 12:39 PM, Hui Liu <h...@opentext.com> wrote: > > What if we plan to use Solr version 6.x? this url says it support 2 different > update modes: atomic update and optimistic concurrency: > > https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents > > I tested 'optimistic concurrency' and it appears to be working, i.e if a > document I am updating got changed by another person I will get error if I > supply a _version_ value, So maybe you are referring to an older version of > Solr? > > Regards, > Hui > > -Original Message- > From: Walter Underwood [mailto:wun...@wunderwood.org] > Sent: Friday, June 10, 2016 11:18 AM > To: solr-user@lucene.apache.org > Subject: Re: Questions regarding re-index when using Solr as a data source > > Solr does not have transactions at all. The “commit” is really “submit batch”. > > Solr does not have update. You can add, delete, or replace an entire document. > > There is no optimistic concurrency control because there is no concurrency > control. Clients can concurrently add documents to a batch, then any client > can submit the entire batch. > > Replication is not transactional. 
Replication is a file copy of the > underlying indexes (classic) or copying the documents in a batch (Solr Cloud). > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > >> On Jun 10, 2016, at 7:41 AM, Hui Liu <h...@opentext.com> wrote: >> >> Walter, >> >> Thank you for your advice. We are new to Solr and have been using >> Oracle for past 10+ years, so we are used to the idea of having a tool that >> can be used as both data store and also searchable by having indexes on top >> of it. I guess the reason we are considering Solr as data store is due to it >> has some features of a database that our application requires, such as 1) be >> able to detect duplicate record by having a unique field; 2) allow us to do >> concurrent update by using Optimistic concurrency control feature; 3) its >> 'replication' feature allowing us to store multiple copies of data; so if we >> were to use a file system, we will not have the above features (at least not >> 1 and 2) and have to implement those ourselves. The other option is to pick >> another database tool such as Mysql or Cassandra, then we will need to learn >> and support an additional tool besides Solr; but you brought up several very >> good points about operational factors we should consider if we pick Solr as >> a data store. Also our application is more of a OLTP than OLAP. I will >> update our colleagues and stakeholders about these concerns. Thanks again! >> >> Regards, >> Hui >> -Original Message- >> From: Walter Underwood [mailto:wun...@wunderwood.org] >> Sent: Thursday, June 09, 2016 1:24 PM >> To: solr-user@lucene.apache.org >> Subject: Re: Questions regarding re-index when using Solr as a data source >> >> In the HowToReindex page, under “Using Solr as a Data Store”, it says this: >> "Don't do this unless you have no other option. Solr is not really designed >> for this role.” So don’t start by planning to do this. 
>> >> Using a second copy of Solr is still using Solr as a repository. That >> doesn’t satisfy any sort of requirements for disaster recovery. How do you >> know that data is good? How do you make a third copy? How do you roll back >> to a previous version? How do you deal with a security breach that affects >> all your systems? Are the systems in the same data center? How do you deal >> with ransomware (U. of Calgary paid $20K yesterday)? >> >> If a consultant suggested this to me, I’d probably just give up and get a >> different consultant. >> >> Here is what we do for batch loading. >> >> 1. For each Solr collection, we define a JSONL feed f
RE: Questions regarding re-index when using Solr as a data source
What if we plan to use Solr version 6.x? This URL says it supports two different update modes: atomic update and optimistic concurrency: https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents I tested 'optimistic concurrency' and it appears to be working, i.e. if a document I am updating was changed by someone else, I get an error when I supply a stale _version_ value. So maybe you are referring to an older version of Solr? Regards, Hui -Original Message- From: Walter Underwood [mailto:wun...@wunderwood.org] Sent: Friday, June 10, 2016 11:18 AM To: solr-user@lucene.apache.org Subject: Re: Questions regarding re-index when using Solr as a data source Solr does not have transactions at all. The “commit” is really “submit batch”. Solr does not have update. You can add, delete, or replace an entire document. There is no optimistic concurrency control because there is no concurrency control. Clients can concurrently add documents to a batch, then any client can submit the entire batch. Replication is not transactional. Replication is a file copy of the underlying indexes (classic) or copying the documents in a batch (Solr Cloud). wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 10, 2016, at 7:41 AM, Hui Liu <h...@opentext.com> wrote: > > Walter, > > Thank you for your advice. We are new to Solr and have been using > Oracle for past 10+ years, so we are used to the idea of having a tool that > can be used as both data store and also searchable by having indexes on top > of it. 
I guess the reason we are considering Solr as data store is due to it > has some features of a database that our application requires, such as 1) be > able to detect duplicate record by having a unique field; 2) allow us to do > concurrent update by using Optimistic concurrency control feature; 3) its > 'replication' feature allowing us to store multiple copies of data; so if we > were to use a file system, we will not have the above features (at least not > 1 and 2) and have to implement those ourselves. The other option is to pick > another database tool such as Mysql or Cassandra, then we will need to learn > and support an additional tool besides Solr; but you brought up several very > good points about operational factors we should consider if we pick Solr as a > data store. Also our application is more of a OLTP than OLAP. I will update > our colleagues and stakeholders about these concerns. Thanks again! > > Regards, > Hui > -Original Message- > From: Walter Underwood [mailto:wun...@wunderwood.org] > Sent: Thursday, June 09, 2016 1:24 PM > To: solr-user@lucene.apache.org > Subject: Re: Questions regarding re-index when using Solr as a data source > > In the HowToReindex page, under “Using Solr as a Data Store”, it says this: > "Don't do this unless you have no other option. Solr is not really designed > for this role.” So don’t start by planning to do this. > > Using a second copy of Solr is still using Solr as a repository. That doesn’t > satisfy any sort of requirements for disaster recovery. How do you know that > data is good? How do you make a third copy? How do you roll back to a > previous version? How do you deal with a security breach that affects all > your systems? Are the systems in the same data center? How do you deal with > ransomware (U. of Calgary paid $20K yesterday)? > > If a consultant suggested this to me, I’d probably just give up and get a > different consultant. > > Here is what we do for batch loading. > > 1. 
For each Solr collection, we define a JSONL feed format, with a JSON > Schema. > 2. The owners of the data write an extractor to pull the data out of wherever > it is, then generate the JSON feed. > 3. We validate the JSON feed against the JSON schema. > 4. If the feed is valid, we save it to Amazon S3 along with a manifest which > lists the version of the JSON Schema. > 5. Then a multi-threaded loader reads the feed and sends it to Solr. > > Reloading is safe and easy, because all the feeds in S3 are valid. > > Storing backups in S3 instead of running a second Solr is massively cheaper, > easier, and safer. > > We also have a clear contract between the content owners and the search team. > That contract is enforced by the JSON Schema on every single batch. > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > >> On Jun 9, 2016, at 9:51 AM, Hui Liu <h...@opentext.com> wrote: >> >> Hi Walter, >> >> Thank you for the reply, sorry I need to clarify what I mean by 'migrate >> tables' from Oracle to Solr, we are not literally move existing records fro
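The optimistic-concurrency behavior discussed above can be sketched as update payloads. This is a minimal sketch with made-up document IDs, field names, and version numbers; per the Solr reference guide, the JSON is POSTed to /solr/<collection>/update, and a stale _version_ comes back as an HTTP 409 version conflict.

```python
import json

# Sketch of optimistic-concurrency update payloads (hypothetical IDs/fields).
# _version_ semantics per the Solr reference guide:
#   _version_ > 1 : update succeeds only if the stored version matches exactly
#   _version_ = 1 : document must already exist (any version)
#   _version_ < 0 : document must NOT exist yet
def replace_with_version(doc_id, fields, last_seen_version):
    """Full-document replace guarded by optimistic concurrency."""
    doc = {"id": doc_id, "_version_": last_seen_version}
    doc.update(fields)
    return json.dumps([doc])

def atomic_set(doc_id, field, value, last_seen_version):
    """Atomic update setting one field, guarded the same way."""
    doc = {"id": doc_id, field: {"set": value}, "_version_": last_seen_version}
    return json.dumps([doc])

# POST either payload to /solr/<collection>/update with
# Content-Type: application/json; a version conflict returns HTTP 409.
print(atomic_set("doc-42", "status", "shipped", 1529922553005731840))
```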
RE: Questions regarding re-index when using Solr as a data source
Walter, Thank you for your advice. We are new to Solr and have been using Oracle for the past 10+ years, so we are used to the idea of having a tool that can be used as both a data store and a searchable system, with indexes on top of it. I guess the reason we are considering Solr as a data store is because it has some features of a database that our application requires: 1) being able to detect duplicate records via a unique field; 2) allowing concurrent updates via the optimistic concurrency control feature; 3) its 'replication' feature, allowing us to store multiple copies of data. If we were to use a file system, we would not have the above features (at least not 1 and 2) and would have to implement them ourselves. The other option is to pick another database such as MySQL or Cassandra, but then we would need to learn and support an additional tool besides Solr. You brought up several very good points about operational factors we should consider if we pick Solr as a data store. Also, our application is more OLTP than OLAP. I will update our colleagues and stakeholders about these concerns. Thanks again! Regards, Hui -Original Message- From: Walter Underwood [mailto:wun...@wunderwood.org] Sent: Thursday, June 09, 2016 1:24 PM To: solr-user@lucene.apache.org Subject: Re: Questions regarding re-index when using Solr as a data source In the HowToReindex page, under “Using Solr as a Data Store”, it says this: "Don't do this unless you have no other option. Solr is not really designed for this role.” So don’t start by planning to do this. Using a second copy of Solr is still using Solr as a repository. That doesn’t satisfy any sort of requirements for disaster recovery. How do you know that data is good? How do you make a third copy? How do you roll back to a previous version? How do you deal with a security breach that affects all your systems? Are the systems in the same data center? How do you deal with ransomware (U. of Calgary paid $20K yesterday)? 
If a consultant suggested this to me, I’d probably just give up and get a different consultant. Here is what we do for batch loading. 1. For each Solr collection, we define a JSONL feed format, with a JSON Schema. 2. The owners of the data write an extractor to pull the data out of wherever it is, then generate the JSON feed. 3. We validate the JSON feed against the JSON schema. 4. If the feed is valid, we save it to Amazon S3 along with a manifest which lists the version of the JSON Schema. 5. Then a multi-threaded loader reads the feed and sends it to Solr. Reloading is safe and easy, because all the feeds in S3 are valid. Storing backups in S3 instead of running a second Solr is massively cheaper, easier, and safer. We also have a clear contract between the content owners and the search team. That contract is enforced by the JSON Schema on every single batch. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 9, 2016, at 9:51 AM, Hui Liu <h...@opentext.com> wrote: > > Hi Walter, > > Thank you for the reply, sorry I need to clarify what I mean by 'migrate > tables' from Oracle to Solr, we are not literally move existing records from > Oracle to Solr, instead, we are building a new application directly feed data > into Solr as document and fields, in parallel of another existing application > which feeds the same data into Oracle tables/columns, of course, the Solr > schema will be somewhat different than Oracle; also we only keep those data > for 90 days for user to search on, we hope once we run both system in > parallel for some time (> 90 days), we will build up enough new data in Solr > and we no longer need any old data in Oracle, by then we will be able to use > Solr as our only data store. 
> > It sounds to me that we may need to consider save the data into either file > system, or another database, in case we need to rebuild the indexes; and the > reason I mentioned to save data into another Solr system is by reading this > info from https://wiki.apache.org/solr/HowToReindex : so just trying to get a > feedback on if there is any update on this approach? And any better way to do > this to minimize the downtime caused by the schema change and re-index? For > example, in Oracle, we are able to add a new column or new index online > without any impact of existing queries as existing indexes are intact. > > Alternatives when a traditional reindex isn't possible > > Sometimes the option of "do your indexing again" is difficult. Perhaps the > original data is very slow to access, or it may be difficult to get in the > first place. > > Here's where we go against our own advice that we just gave you. Above we > said "don't use Solr itself as a datasource" ... but one way to deal with > d
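The batch-loading pipeline quoted above hinges on validating the feed before it ever reaches Solr (step 3). A minimal sketch of that idea — Walter's team uses real JSON Schema; the field names and types below are hypothetical:

```python
import json

# Sketch: validate a JSONL feed against a tiny hand-rolled schema before the
# batch is accepted for loading. A real pipeline would use JSON Schema.
SCHEMA = {"id": str, "title": str, "price": float}

def validate_feed(jsonl_text):
    """Return a list of (line_number, error) tuples; empty means valid."""
    errors = []
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            doc = json.loads(line)
        except json.JSONDecodeError as e:
            errors.append((lineno, f"bad JSON: {e}"))
            continue
        for field, ftype in SCHEMA.items():
            if field not in doc:
                errors.append((lineno, f"missing field {field!r}"))
            elif not isinstance(doc[field], ftype):
                errors.append((lineno, f"{field!r} should be {ftype.__name__}"))
    return errors

feed = '{"id": "1", "title": "widget", "price": 9.99}\n{"id": "2", "title": "gadget"}'
print(validate_feed(feed))  # the second line is missing "price"
```

Only feeds that validate cleanly get saved to S3 and replayed into Solr, which is what makes reloading safe.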
RE: Questions regarding re-index when using Solr as a data source
Hi Walter, Thank you for the reply. Sorry, I need to clarify what I mean by 'migrating tables' from Oracle to Solr: we are not literally moving existing records from Oracle to Solr. Instead, we are building a new application that feeds data directly into Solr as documents and fields, in parallel with another existing application which feeds the same data into Oracle tables/columns; of course, the Solr schema will be somewhat different from Oracle's. Also, we only keep the data for 90 days for users to search on. We hope that once we run both systems in parallel for some time (> 90 days), we will have built up enough new data in Solr that we no longer need any old data in Oracle; by then we will be able to use Solr as our only data store. It sounds like we may need to consider saving the data into either a file system or another database, in case we need to rebuild the indexes; the reason I mentioned saving data into another Solr system is this info from https://wiki.apache.org/solr/HowToReindex : so I am just trying to get feedback on whether there is any update on this approach, and whether there is any better way to minimize the downtime caused by the schema change and re-index. For example, in Oracle we are able to add a new column or new index online without any impact on existing queries, as existing indexes are intact. Alternatives when a traditional reindex isn't possible Sometimes the option of "do your indexing again" is difficult. Perhaps the original data is very slow to access, or it may be difficult to get in the first place. Here's where we go against our own advice that we just gave you. Above we said "don't use Solr itself as a datasource" ... but one way to deal with data availability problems is to set up a completely separate Solr instance (not distributed, which for SolrCloud means numShards=1) whose only job is to store the data, then use the SolrEntityProcessor in the DataImportHandler to index from that instance to your real Solr install. 
If you need to reindex, just run the import again on your real installation. Your schema for the intermediate Solr install would have stored="true" and indexed="false" for all fields, and would only use basic types like int, long, and string. It would not have any copyFields. This is the approach used by the Smithsonian for their Solr installation, because getting access to the source databases for the individual entities within the organization is very difficult. This way they can reindex the online Solr at any time without having to get special permission from all those entities. When they index new content, it goes into a copy of Solr configured for storage only, not in-depth searching. Their main Solr instance uses SolrEntityProcessor to import from the intermediate Solr servers, so they can always reindex. Regards, Hui -Original Message- From: Walter Underwood [mailto:wun...@wunderwood.org] Sent: Thursday, June 09, 2016 12:19 PM To: solr-user@lucene.apache.org Subject: Re: Questions regarding re-index when using Solr as a data source First, using Solr as a repository is pretty risky. I would keep the official copy of the data in a database, not in Solr. Second, you can’t “migrate tables” because Solr doesn’t have tables. You need to turn the tables into documents, then index the documents. It can take a lot of joins to flatten a relational schema into Solr documents. Solr does not support schema migration, so yes, you will need to save off all the documents, then reload them. I would save them to files. It makes no sense to put them in another copy of Solr. Changing the schema will be difficult and time-consuming, but you’ll probably run into much worse problems trying to use Solr as a repository. 
wunder Walter Underwood wun...@wunderwood.org<mailto:wun...@wunderwood.org> http://observer.wunderwood.org/ (my blog) > On Jun 9, 2016, at 8:50 AM, Hui Liu > <h...@opentext.com<mailto:h...@opentext.com>> wrote: > > Hi, > > We are porting an application currently hosted in Oracle 11g to > Solr Cloud 6.x, i.e we plan to migrate all tables in Oracle as collections in > Solr, index them, and build search tools on top of this; the goal is we won't > be using Oracle at all after this has been implemented; every fields in Solr > will have 'stored=true' and selectively a subset of searchable fields will > have 'indexed=true'; the question is what steps we should follow if we need > to re-index a collection after making some schema changes - mostly we only > add new fields to store, or make a non-indexed field as indexed, we normally > do not delete or rename any existing fields; according to this url: > https://wiki.apache.org/solr/HowToReindex it seems we need to setup a > 'intermediate' Solr1 to only store the data themselves without any indexing, > then have another Solr2 setup
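For reference, the SolrEntityProcessor setup described in the HowToReindex excerpt looks roughly like the following data-config.xml on the real (search) Solr install. The URL, entity name, and collection name are placeholders:

```xml
<!-- data-config.xml sketch for the DataImportHandler on the search Solr,
     pulling every stored document from the storage-only Solr instance. -->
<dataConfig>
  <document>
    <entity name="backup"
            processor="SolrEntityProcessor"
            url="http://storage-solr:8983/solr/mycollection"
            query="*:*"
            rows="1000"
            fl="*"/>
  </document>
</dataConfig>
```

Re-running the import against this config is what "just run the import again on your real installation" refers to.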
Questions regarding re-index when using Solr as a data source
Hi, We are porting an application currently hosted in Oracle 11g to Solr Cloud 6.x, i.e. we plan to migrate all tables in Oracle to collections in Solr, index them, and build search tools on top of this; the goal is that we won't be using Oracle at all after this has been implemented. Every field in Solr will have 'stored=true', and a selected subset of searchable fields will have 'indexed=true'. The question is what steps we should follow if we need to re-index a collection after making some schema changes; mostly we only add new fields to store, or make a non-indexed field indexed, and we normally do not delete or rename any existing fields. According to this URL: https://wiki.apache.org/solr/HowToReindex it seems we need to set up an 'intermediate' Solr1 to only store the data without any indexing, then set up another Solr2 to store the indexed data; in case of re-index, just delete all the documents in Solr2 for the collection and re-import data from Solr1 into Solr2 using SolrEntityProcessor (from the DataImportHandler)? Is this still the recommended approach? The downside I can see is that if we have a tremendous amount of data for a collection (some of our collections could have several billion documents), re-importing it from Solr1 to Solr2 may take hours or even days, and during this time users cannot query the data. Is there any better way to do this and avoid this type of downtime? Any feedback is appreciated! Regards, Hui Liu Opentext, Inc.
RE: Help needed on Solr Streaming Expressions
The only difference between document3 and document5 is that document3 has no data in 'shard2'; after loading some data into shard2, the http command also worked: http://localhost:8988/solr/document3/stream?expr=search(document3,zkHost="127.0.0.1:2181",q="*:*",fl="document_id, sender_msg_dest", sort="document_id asc",qt="/export") My guess is that the 'null pointer' error from the stack trace is caused by the empty 'shard2'. Regards, Hui -Original Message- From: Hui Liu Sent: Monday, June 06, 2016 1:04 PM To: solr-user@lucene.apache.org Subject: RE: Help needed on Solr Streaming Expressions Joel, Thank you very much for your help, I tried the http command below with my existing 2 shards collection 'document3' (sorry I have a typo below should be document3 instead of document2), this time I got much better error: {"result-set":{"docs":[ {"EXCEPTION":"Unable to construct instance of org.apache.solr.client.solrj.io.stream.CloudSolrStream","EOF":true}]}} I attach the error stack trace from 'solr-8988-console.log' and 'solr.log' here in file 'solr_error.txt'. However I continued and tried create another identical collection 'document5' with 2 shards and 2 replica using the same schema, this time the http URL worked!!! Maybe my previous collection 'document3' has some corruption? 
-- command to create collection 'document5': solr create -c document5 -d new_doc_configs5 -p 8988 -s 2 -rf 2 -- command for stream expression: http://localhost:8988/solr/document5/stream?expr=search(document5,zkHost="127.0.0.1:2181",q="*:*",fl="document_id, sender_msg_dest", sort="document_id asc",qt="/export") -- result from browser: {"result-set":{"docs":[ {"document_id":20346005172,"sender_msg_dest":"ZZ:035239425"}, {"document_id":20346005173,"sender_msg_dest":"ZZ:035239425"}, {"document_id":20346006403,"sender_msg_dest":"14:004321519IBMP"}, {"document_id":20346006406,"sender_msg_dest":"14:004321519IBMP"}, {"document_id":20346006741,"sender_msg_dest":"14:004321519IBMP"}, {"document_id":20346006743,"sender_msg_dest":"14:004321519IBMP"}, {"EOF":true,"RESPONSE_TIME":10}]}} Do you think I can try the same in http using other 'Stream Decorators' such as 'complement' and 'innerJoin'? Regards, Hui -Original Message- From: Joel Bernstein [mailto:joels...@gmail.com] Sent: Monday, June 06, 2016 9:51 AM To: solr-user@lucene.apache.org Subject: Re: Help needed on Solr Streaming Expressions Hi, To eliminate any issues that might be happening due to curl, try running the command from your browser. http://localhost:8988/solr/document2/stream?expr=search(document3,zkHost=" 127.0.0.1:2181",q="*:*",fl="document_id, sender_msg_dest", sort="document_id asc",qt="/export") I think most browsers will url encode the expression automatically, but you can url encode also using an online tool. Also you can remove the zkHost param and it should default to zkHost your solr is connected to. If you still get an error take a look at the logs and post the full stack trace to this thread, which will help determine where the problem is. 
Joel Bernstein http://joelsolr.blogspot.com/ On Sun, Jun 5, 2016 at 2:11 PM, Hui Liu <h...@opentext.com> wrote: > Hi, > > > > I have Solr 6.0.0 installed on my PC (windows 7), I was > experimenting with ‘Streaming Expression’ feature by following steps > from this link: > https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions > , but cannot get it to work, attached is my solrconfig.xml and > schema.xml, note I do have ‘export’ handler defined in my > ‘solrconfig.xml’ and enabled all fields as ‘docvalues’ in > ‘schema.xml’; I am using solr cloud and external zookeeper (also > installed on m PC), here is the command to start this 2-node Solr > cloud instance and to create the collection ‘document3’: > > > > -- start 2-node solr cloud instances: > > solr start -c -z 127.0.0.1:2181 -p 8988 -s solr3 > > solr start -c -z 127.0.0.1:2181 -p 8989 -s solr4 > > > > -- create the collection: > > solr create -c document3 -d new_doc_configs3 -p 8988 -s 2 -rf 2 > > > > after creating the collection I loaded a few documents > using ‘csv’ format and I was able to query it using ‘curl’ command from my PC: > > > > -- this works on my PC: > > curl > http://localhost:8988/solr/document3/select?q=*:*=document_id+des > c,sender_msg_dest+desc=document_id,sender_msg_dest,recip_msg_dest > > > >
RE: Help needed on Solr Streaming Expressions
Joel, Thank you very much for your help. I tried the http command below with my existing two-shard collection 'document3' (sorry, I have a typo below: it should be document3 instead of document2); this time I got a much better error: {"result-set":{"docs":[ {"EXCEPTION":"Unable to construct instance of org.apache.solr.client.solrj.io.stream.CloudSolrStream","EOF":true}]}} I attach the error stack trace from 'solr-8988-console.log' and 'solr.log' here in file 'solr_error.txt'. However, I continued and tried creating another identical collection, 'document5', with 2 shards and 2 replicas using the same schema, and this time the http URL worked!!! Maybe my previous collection 'document3' has some corruption? -- command to create collection 'document5': solr create -c document5 -d new_doc_configs5 -p 8988 -s 2 -rf 2 -- command for stream expression: http://localhost:8988/solr/document5/stream?expr=search(document5,zkHost="127.0.0.1:2181",q="*:*",fl="document_id, sender_msg_dest", sort="document_id asc",qt="/export") -- result from browser: {"result-set":{"docs":[ {"document_id":20346005172,"sender_msg_dest":"ZZ:035239425"}, {"document_id":20346005173,"sender_msg_dest":"ZZ:035239425"}, {"document_id":20346006403,"sender_msg_dest":"14:004321519IBMP"}, {"document_id":20346006406,"sender_msg_dest":"14:004321519IBMP"}, {"document_id":20346006741,"sender_msg_dest":"14:004321519IBMP"}, {"document_id":20346006743,"sender_msg_dest":"14:004321519IBMP"}, {"EOF":true,"RESPONSE_TIME":10}]}} Do you think I can try the same in http using other 'Stream Decorators' such as 'complement' and 'innerJoin'? Regards, Hui -Original Message- From: Joel Bernstein [mailto:joels...@gmail.com] Sent: Monday, June 06, 2016 9:51 AM To: solr-user@lucene.apache.org Subject: Re: Help needed on Solr Streaming Expressions Hi, To eliminate any issues that might be happening due to curl, try running the command from your browser. 
http://localhost:8988/solr/document2/stream?expr=search(document3,zkHost=" 127.0.0.1:2181",q="*:*",fl="document_id, sender_msg_dest", sort="document_id asc",qt="/export") I think most browsers will url encode the expression automatically, but you can url encode also using an online tool. Also you can remove the zkHost param and it should default to zkHost your solr is connected to. If you still get an error take a look at the logs and post the full stack trace to this thread, which will help determine where the problem is. Joel Bernstein http://joelsolr.blogspot.com/ On Sun, Jun 5, 2016 at 2:11 PM, Hui Liu <h...@opentext.com> wrote: > Hi, > > > > I have Solr 6.0.0 installed on my PC (windows 7), I was > experimenting with ‘Streaming Expression’ feature by following steps > from this link: > https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions > , but cannot get it to work, attached is my solrconfig.xml and > schema.xml, note I do have ‘export’ handler defined in my > ‘solrconfig.xml’ and enabled all fields as ‘docvalues’ in > ‘schema.xml’; I am using solr cloud and external zookeeper (also > installed on m PC), here is the command to start this 2-node Solr > cloud instance and to create the collection ‘document3’: > > > > -- start 2-node solr cloud instances: > > solr start -c -z 127.0.0.1:2181 -p 8988 -s solr3 > > solr start -c -z 127.0.0.1:2181 -p 8989 -s solr4 > > > > -- create the collection: > > solr create -c document3 -d new_doc_configs3 -p 8988 -s 2 -rf 2 > > > > after creating the collection I loaded a few documents > using ‘csv’ format and I was able to query it using ‘curl’ command from my PC: > > > > -- this works on my PC: > > curl > http://localhost:8988/solr/document3/select?q=*:*=document_id+des > c,sender_msg_dest+desc=document_id,sender_msg_dest,recip_msg_dest > > > > but when trying Streaming ‘search’ using curl, it does > not work, I tried with 3 different options: with zkHost, using > ‘export’, or using ‘select’, all getting 
the same error: > > > curl: (6) Could not resolve host: sort=document_id asc,qt= > > {"result-set":{"docs":[ > > {"EXCEPTION":null,"EOF":true}]}} > > -- different curl commands tried, all getting the same error above: > > curl --data-urlencode > 'expr=search(document3,zkHost="127.0.0.1:2181",q="*:*",fl="document_id > , sender_msg_dest", sort="document_id asc&q
Help needed on Solr Streaming Expressions
Hi, I have Solr 6.0.0 installed on my PC (Windows 7). I was experimenting with the 'Streaming Expressions' feature by following the steps from this link: https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions, but cannot get it to work. Attached are my solrconfig.xml and schema.xml; note I do have the 'export' handler defined in my 'solrconfig.xml' and have enabled all fields as 'docValues' in 'schema.xml'. I am using Solr Cloud and an external ZooKeeper (also installed on my PC). Here are the commands to start this 2-node Solr Cloud instance and to create the collection 'document3': -- start 2-node solr cloud instances: solr start -c -z 127.0.0.1:2181 -p 8988 -s solr3 solr start -c -z 127.0.0.1:2181 -p 8989 -s solr4 -- create the collection: solr create -c document3 -d new_doc_configs3 -p 8988 -s 2 -rf 2 after creating the collection I loaded a few documents using 'csv' format and I was able to query it using a 'curl' command from my PC: -- this works on my PC: curl "http://localhost:8988/solr/document3/select?q=*:*&sort=document_id+desc,sender_msg_dest+desc&fl=document_id,sender_msg_dest,recip_msg_dest" but when trying a Streaming 'search' using curl, it does not work. I tried 3 different options: with zkHost, using 'export', or using 'select', all getting the same error: curl: (6) Could not resolve host: sort=document_id asc,qt= {"result-set":{"docs":[ {"EXCEPTION":null,"EOF":true}]}} -- different curl commands tried, all getting the same error above: curl --data-urlencode 'expr=search(document3,zkHost="127.0.0.1:2181",q="*:*",fl="document_id, sender_msg_dest", sort="document_id asc",qt="/export")' "http://localhost:8988/solr/document2/stream" curl --data-urlencode 'expr=search(document3,q="*:*",fl="document_id, sender_msg_dest", sort="document_id asc",qt="/export")' "http://localhost:8988/solr/document2/stream" curl --data-urlencode 'expr=search(document3,q="*:*",fl="document_id, sender_msg_dest", sort="document_id asc",qt="/select",rows=10)' 
"http://localhost:8988/solr/document2/stream; what am I doing wrong? Thanks for any help! Regards, Hui Liu 6.0.0 ${solr.data.dir:} ${solr.lock.type:native} true ${solr.ulog.dir:} ${solr.ulog.numVersionBuckets:65536} ${solr.autoCommit.maxTime:15000} false ${solr.autoSoftCommit.maxTime:-1} 1024 true 20 200 false 2 explicit 10 explicit json true text text explicit true true false terms {!xport} xsort false query document_id document_id
答复: help need example code of solrj to get schema of a given core
Thanks Georg very much! Ming -Original Message- From: Georg Sorst [mailto:georg.so...@gmail.com] Sent: Tuesday, May 31, 2016 18:22 To: solr-user@lucene.apache.org Subject: Re: help need example code of solrj to get schema of a given core Querying the schema can be done with the Schema API ( https://cwiki.apache.org/confluence/display/solr/Schema+API), which is fully supported by SolrJ: http://lucene.apache.org/solr/6_0_0/solr-solrj/org/apache/solr/client/solrj/request/schema/package-summary.html . Liu, Ming (Ming) <ming@esgyn.cn> wrote on Tue, May 31, 2016, 09:41: > Hello, > > I am very new to Solr, I want to write a simple Java program to get a > core's schema information. Like how many field and details of each > field. I spent a few time searching on internet, but cannot get much > information about this. The solrj wiki seems not updated for long > time. I am using Solr > 5.5.0 > > Hope there are some example code, or please give me some advices, or > simple hint like which java class I can take a look at. > > Thanks in advance! > Ming >
help need example code of solrj to get schema of a given core
Hello, I am very new to Solr. I want to write a simple Java program to get a core's schema information, like how many fields there are and the details of each field. I spent some time searching the internet but could not find much information about this; the SolrJ wiki seems not to have been updated for a long time. I am using Solr 5.5.0. I hope there is some example code; or please give me some advice, or a simple hint like which Java class I can look at. Thanks in advance! Ming
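Georg's pointer above boils down to this: the Schema API is a plain GET of /solr/<core>/schema returning JSON, and SolrJ wraps it in the org.apache.solr.client.solrj.request.schema.SchemaRequest classes he linked. A hedged sketch of the response shape and how to summarize it — the sample document below is abridged and the field names are hypothetical:

```python
import json

# Abridged, hypothetical example of what GET /solr/<core>/schema returns.
sample_response = """
{
  "responseHeader": {"status": 0},
  "schema": {
    "name": "example",
    "uniqueKey": "id",
    "fields": [
      {"name": "id",    "type": "string",       "stored": true, "indexed": true},
      {"name": "title", "type": "text_general", "stored": true, "indexed": true}
    ]
  }
}
"""

def summarize_fields(schema_json):
    """Return {field_name: field_type} for every field in a /schema response."""
    schema = json.loads(schema_json)["schema"]
    return {f["name"]: f["type"] for f in schema["fields"]}

print(summarize_fields(sample_response))
```

In SolrJ the equivalent would be issuing a SchemaRequest against the core and reading the field list from the response object.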
Re: Documents cannot be searched immediately when indexed using REST API with Solr Cloud
Hi Edwin Please review your commit/soft-commit configuration: "soft commits are about visibility, hard commits are about durability", as a wise man put it. :) If you are doing NRT indexing and searching, you probably need a short soft-commit interval, or commit explicitly in your request handler. Be advised that these strategies and configurations need to be tested and adjusted according to your data size and your search and index-update frequency. You should be able to find the answer yourself here: http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ All the best Liu Bo On 19 March 2015 at 17:54, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi, I'm using Solr Cloud now, with 2 shards known as shard1 and shard2, and when I try to index rich-text documents using REST API or the default Documents module in Solr Admin UI, the documents that are indexed do not appear immediately when I do a search. It only appears after I restarted the Solr services (both shard1 and shard2). However, the same issue do not happen when I index the same documents using post.jar, and I can search for the indexed documents immediately. Here's my ExtractingRequestHandler in solrconfig.xml. <requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler"> <lst name="defaults"> <str name="lowernames">true</str> <str name="uprefix">ignored_</str> <!-- capture link hrefs but ignore div attributes --> <str name="captureAttr">true</str> <str name="fmap.a">links</str> <str name="fmap.div">ignored_</str> </lst> </requestHandler> What could be the reason why this is happening, and any solutions to solve it? Regards, Edwin
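A sketch of the relevant solrconfig.xml knobs from the linked article; the interval values are placeholders to tune, not recommendations:

```xml
<!-- Inside <updateHandler> in solrconfig.xml.
     Hard commit = durability (flush to disk, no new searcher);
     soft commit = visibility (new documents become searchable). -->
<autoCommit>
  <maxTime>60000</maxTime>          <!-- hard commit every 60s -->
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>5000</maxTime>           <!-- new documents searchable within ~5s -->
</autoSoftCommit>
```

This explains the symptom in the thread: without a soft commit (or an explicit commit, which post.jar issues by default), indexed documents only become visible when a searcher is reopened, e.g. on restart.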
solr always loading and not any response
hi, all, the solr admin page is always loading, and when I send a query request I also cannot get any response. The TCP link is always ESTABLISHED. Only restarting the solr service can fix it. How can I find out the problem? solr: 4.6, jetty: 8. thanks so much.
Re: Where to specify numShards when startup up a cloud setup
Hi zzT Putting numShards in core.properties also works. I struggled a little bit while figuring out this configuration approach. I knew I was not alone! ;-) On 2 April 2014 18:06, zzT zis@gmail.com wrote: It seems that I've figured out a configuration approach to this issue. I'm having the exact same issue and the only viable solutions found on the net till now are 1) Pass -DnumShards=x when starting up the Solr server 2) Use the Collections API as indicated by Shawn. What I've noticed though - after making the call to /collections to create a node - is that a new core entry is added inside solr.xml with the attribute numShards. So, right now I'm configuring solr.xml with a numShards attribute inside my core nodes. This way I don't have to worry about the annoying stuff you've already mentioned, e.g. waiting for Solr to start up etc. Of course the same logic applies here: the numShards param is meaningful only the first time. Even if you change it at a later point the # of shards stays the same. -- View this message in context: http://lucene.472066.n3.nabble.com/Where-to-specify-numShards-when-startup-up-a-cloud-setup-tp4078473p4128566.html Sent from the Solr - User mailing list archive at Nabble.com. -- All the best Liu Bo
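For reference, the two configuration spots discussed above look roughly like the fragments below (all names are hypothetical; and remember numShards only takes effect the first time the collection is created):

```
# core.properties (core discovery)
name=mycollection_shard1_replica1
collection=mycollection
shard=shard1
numShards=2
```

or, in legacy solr.xml style, as a core attribute:

```xml
<core name="mycollection_shard1_replica1" instanceDir="mycollection_shard1_replica1"
      collection="mycollection" numShards="2"/>
```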
Re: Multiple Languages in Same Core
Hi Jeremy There're a lot of multi-language discussions; two main approaches: 1. like yours, one core per language 2. all in one core, where each language has its own field. We have multi-language support in a single core; each multilingual field has its own suffix such as name_en_US. We customized the query handler to hide the query details from the client. The main reason we do this is NRT indexing and search. Take product for example: a product has price and quantity, which are common fields used for filtering and sorting, while name and description are multilingual fields. If we split products into different cores, a common-field update may end up being an update in all of the multilingual cores. As to scalability, we don't change solr cores/collections when a new language is added, but we probably need to update our customized index process and run a full re-index. This approach suits our requirements for now, but you may have your own concerns. We have a similar suggest-filter problem to yours: we want to return suggest results filtered by store. I can't find a way to build the dictionary with a query in my version of solr (4.6). What I do is run a query on an N-Gram analyzed field with filter queries on the store_id field. The suggest is actually a query. It may not perform as well as the suggester but it can do the trick. You can try it: build an additional N-GRAM field for suggestion only and search on it with fq on your Locale field. All the best Liu Bo On 25 March 2014 09:15, Alexandre Rafalovitch arafa...@gmail.com wrote: Solr In Action has a significant discussion on the multi-lingual approach. They also have some code samples out there. Might be worth a look Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working.
(Anonymous - via GTD book) On Tue, Mar 25, 2014 at 4:43 AM, Jeremy Thomerson jer...@thomersonfamily.com wrote: I recently deployed Solr to back the site search feature of a site I work on. The site itself is available in hundreds of languages. With the initial release of site search we have enabled the feature for ten of those languages. This is distributed across eight cores, with two Chinese languages plus Korean combined into one CJK core and each of the other seven languages in their own individual cores. The reason for splitting these into separate cores was so that we could have the same field names across all cores but have different configuration for analyzers, etc, per core. Now I have some questions on this approach. 1) Scalability: Considering I need to scale this to many dozens more languages, perhaps hundreds more, is there a better way so that I don't end up needing dozens or hundreds of cores? My initial plan was that many languages that didn't have special support within Solr would simply get lumped into a single default core that has some default analyzers that are applicable to the majority of languages. 1b) Related to this: is there a practical limit to the number of cores that can be run on one instance of Lucene? 2) Auto Suggest: In phase two I intend to add auto-suggestions as a user types a query. In reviewing how this is implemented and how the suggestion dictionary is built I have concerns. If I have more than one language in a single core (and I keep the same field name for suggestions on all languages within a core) then it seems that I could get suggestions from another language returned with a suggest query. Is there a way to build a separate dictionary for each language, but keep these languages within the same core? If it's helpful to know: I have a field in every core for Locale. Values will be the locale of the language of that document, i.e. en, es, zh_hans, etc. 
I'd like to be able to: 1) when building a suggestion dictionary, divide it into multiple dictionaries, grouping them by locale, and 2) supply a parameter to the suggest query that allows the suggest component to only return suggestions from the appropriate dictionary for that locale. If the answer to #1 is keep splitting groups of languages that have different analyzers into their own cores and the answer to #2 is that's not supported, then I'd be curious: where would I start to write my own extension that supported #2? I looked last night at the suggest lookup classes, dictionary classes, etc. But I didn't see a clear point where it would be clean to implement something like I'm suggesting above. Best Regards, Jeremy Thomerson -- All the best Liu Bo
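Liu Bo's N-gram-plus-fq workaround can be sketched as a schema fragment plus an ordinary query; the field and type names below are made up for illustration:

```xml
<!-- schema.xml: edge n-grams at index time so prefixes match; query side stays plain -->
<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="suggest_text" type="text_suggest" indexed="true" stored="true"/>
```

A "suggestion" is then just a query restricted to one locale, e.g. q=suggest_text:osc&fq=locale:en&rows=5, which sidesteps the per-dictionary problem entirely since fq composes with any other field.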
Re: Grouping results with group.limit return wrong numFound ?
hi @Ahmet I've thought about using group.ngroups=true, but when you use group.main=true, there's no ngroups field in the response. And according to http://wiki.apache.org/solr/FieldCollapsing, the result might not be correct in SolrCloud. I don't like using facets for this but it seems I have to... On 1 January 2014 00:35, Ahmet Arslan iori...@yahoo.com wrote: Hi Tasmaniski, I don't follow. How come Liu's faceting workaround and ngroups=true produce different results? On Tuesday, December 31, 2013 6:08 PM, tasmaniski tasmani...@gmail.com wrote: @kamaci Of course. That is the problem. group.limit is: the number of results (documents) to return for each group. numFound is the total number found, but *not* the sum of the number *returned for each group*. @Liu Bo that seems to be the only workaround for the problem, but it's too expensive to go through all the groups and calculate the total number found/returned (I use PHP for the client :) ). @iorixxx Yes, I considered that (group.ngroups=true) but in some groups I have fewer results than the limit. -- View this message in context: http://lucene.472066.n3.nabble.com/Grouping-results-with-group-limit-return-wrong-numFound-tp4108174p4108906.html Sent from the Solr - User mailing list archive at Nabble.com. -- All the best Liu Bo
Re: Chaining plugins
Hi I've done similar things as Paul. What I do is extend the default QueryComponent and override the prepare method: I just change the SolrParams according to our logic and then call super.prepare(). Then I replace the default QueryComponent with it in my search/query handler. In this way, none of solr's default behavior is touched. I think you can do your logic in the prepare method, and then let solr proceed with the search. I've tested it along with other components in both a single solr node and solrcloud. It works fine. Hope it helps Cheers Bold On 31 December 2013 06:03, Chris Hostetter hossman_luc...@fucit.org wrote: You don't need to write your own handler. See the previous comment about implementing a SearchComponent -- you can check for the params in your prepare() method and do whatever side effects you want, then register your custom component and hook it into the component chain of whatever handler configuration you want (either using the components arr or by specifying it as a first-components... https://cwiki.apache.org/confluence/display/solr/RequestHandlers+and+SearchComponents+in+SolrConfig : I want to save the query into a file when a user is changing a parameter in : the query, lets say he adds logTofile=1 then the searchHandler will : provide the same result as without this parameter, but in the background it : will do some logic (ex. save the query to file). : But I don't want to touch solr source code, all I want is to add code (like : a plugin). if i understand it right I want to write my own search handler, do : some logic, then pass the data to solr default search handler. -Hoss http://www.lucidworks.com/ -- All the best Liu Bo
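The prepare-then-delegate pattern Liu Bo describes looks roughly like the sketch below. The real classes live in org.apache.solr.handler.component and carry much richer APIs; the two stand-in classes here are simplified stubs so the control flow is runnable on its own, and the logTofile parameter follows the original question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for Solr's ResponseBuilder/QueryComponent (illustration only).
class ResponseBuilder {
    final Map<String, String> params = new HashMap<>();
}

class QueryComponent {
    public void prepare(ResponseBuilder rb) {
        // Solr's default query preparation would run here.
    }
}

// The pattern from the thread: inspect a custom param in prepare(),
// do the side effect, then call super.prepare() so defaults are untouched.
class LoggingQueryComponent extends QueryComponent {
    final List<String> queryLog = new ArrayList<>(); // stands in for a log file

    @Override
    public void prepare(ResponseBuilder rb) {
        if ("1".equals(rb.params.get("logTofile"))) {
            queryLog.add(rb.params.get("q"));
        }
        super.prepare(rb); // let solr proceed with the normal search
    }
}

public class PrepareDemo {
    public static void main(String[] args) {
        LoggingQueryComponent qc = new LoggingQueryComponent();
        ResponseBuilder rb = new ResponseBuilder();
        rb.params.put("q", "title:solr");
        rb.params.put("logTofile", "1");
        qc.prepare(rb);
        System.out.println(qc.queryLog); // [title:solr]
    }
}
```

The same shape works whether the side effect is logging, param rewriting, or anything else that must not alter the search result.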
Re: Grouping results with group.limit return wrong numFound ?
Hi I've met the same problem, and I've googled around but not found a direct solution. But there's a workaround: do a facet on your group field, with parameters like <str name="facet">true</str> <str name="facet.field">your_field</str> <str name="facet.limit">-1</str> <str name="facet.mincount">1</str> and then count how many faceted pairs are in the response. This should be the same as the number of documents after grouping. Cheers Bold On 31 December 2013 06:40, Furkan KAMACI furkankam...@gmail.com wrote: Hi; group.limit is: the number of results (documents) to return for each group. Defaults to 1. Did you check the page here: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=32604232 Thanks; Furkan KAMACI 25 Aralık 2013 Çarşamba tarihinde tasmaniski tasmani...@gmail.com adlı kullanıcı şöyle yazdı: Hi All, When I perform a search with grouped results and limit the results in one group, I get a *numFound* that is the same as if I didn't use the limit. Looks like SOLR first performs the search and calculates numFound, and then groups and limits the results. I do not know if this is a bug or a feature :) But I cannot use pagination and other stuff. Is there any workaround or did I miss something? Example: I want to search book titles and limit the search to 3 results per publisher. q=book_title:solr php&group=true&group.field=publisher&group.limit=3&group.main=true I have 20 results for the apress publisher but I show only 3, which works OK. But in numFound I still have 20 for the apress publisher... -- View this message in context: http://lucene.472066.n3.nabble.com/Grouping-results-with-group-limit-return-wrong-numFound-tp4108174.html Sent from the Solr - User mailing list archive at Nabble.com. -- All the best Liu Bo
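Put together, the workaround amounts to adding facet parameters alongside the grouping ones (field names here follow tasmaniski's example):

```
q=book_title:"solr php"
&group=true&group.field=publisher&group.limit=3&group.main=true
&facet=true&facet.field=publisher&facet.limit=-1&facet.mincount=1
```

The number of facet buckets for publisher gives the number of groups, and the client can sum min(bucketCount, group.limit) over the buckets to get the actual number of documents returned after grouping - which is the figure numFound does not report.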
Re: PostingsSolrHighlighter
hi Josip for the 1st question we've done similar things: copying search fields to a text field. But highlighting is normally done on specific fields such as title. Depending on how the search content is displayed to the front end, you can search on text and highlight on the field you want by specifying hl.fl ref: http://wiki.apache.org/solr/HighlightingParameters#hl.fl On 17 December 2013 02:29, Josip Delic j...@lugensa.com wrote: Hi @all, i am playing with the PostingsSolrHighlighter. I'm running solr 4.6.0 and my configuration is from here: https://lucene.apache.org/solr/4_6_0/solr-core/org/apache/solr/highlight/PostingsSolrHighlighter.html Search query and result (not working): http://pastebin.com/13Uan0ZF Schema (not complete): http://pastebin.com/JGa38UDT Search query and result (working): http://pastebin.com/4CP8XKnr Solr config: <searchComponent class="solr.HighlightComponent" name="highlight"> <highlighting class="org.apache.solr.highlight.PostingsSolrHighlighter"/> </searchComponent> So this is working just fine, but now i have some questions: 1.) With the old default highlighter component it was possible to search in searchable_text and to retrieve highlighted text. This is essential, because we use copyField to put almost everything into searchable_text (title, subtitle, description, ...) 2.) I can't get the ellipsis working. i tried hl.tag.ellipsis=..., f.text.hl.tag.ellipsis=..., configuring it in the RequestHandler, nothing seems to work, and maxAnalyzedChars is just cutting the sentence? Kind Regards Josip Delic -- All the best Liu Bo
Re: an array liked string is treated as multivalued when adding doc to solr
Hi Alexandre It's quite a rare case, just one out of tens of thousands. I'm planning to have every multilingual field as multivalued and just take the first value while formatting the response into our business object. The first-value update processor seems very helpful, thank you. All the best Liu Bo On 18 December 2013 15:26, Alexandre Rafalovitch arafa...@gmail.com wrote: If this happens rarely and you want to deal with it on the way into Solr, you could just keep one of the values, using a URP: http://lucene.apache.org/solr/4_6_0/solr-core/org/apache/solr/update/processor/FirstFieldValueUpdateProcessorFactory.html Regards, Alex Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Wed, Dec 18, 2013 at 2:20 PM, Liu Bo diabl...@gmail.com wrote: Hey Furkan and solr users This is a misreported problem. It's not a solr problem but a data issue on our side. Sorry for this. A coupon happened to have two pieces of English description, which is not allowed in our business logic, but it happened and we added the name_en_US field twice to the solr document. I've done a set of tests and deep debugging into the solr source code, and found out that an array-like string such as [Get 20% Off Official Barca Kits, coupon] won't be treated as a multivalued field. Sorry again for not digging more before sending out the question email. I trust our business logic and data integrity more than solr; I will definitely not do this again. ;-) All the best Liu Bo On 11 December 2013 07:21, Furkan KAMACI furkankam...@gmail.com wrote: Hi Liu; Yes, it is expected behavior. If you send data within square brackets Solr will treat it as a multivalued field.
You can test it this way: if you use Solrj and use a List for a field, it will be considered multivalued too, because when you call the toString() method of your List you can see that the elements are printed within square brackets. This is the reason that a List can be used for a multivalued field. If you explain your situation I can offer a way to do it. Thanks; Furkan KAMACI 2013/12/6 Liu Bo diabl...@gmail.com Dear solr users: I've met this kind of error several times: when adding an array-like string such as [Get 20% Off Official Barça Kits, coupon] to a multiValued=false field, solr will complain: org.apache.solr.common.SolrException: ERROR: [doc=7781396456243918692] multiple values encountered for non multiValued field name_en_US: [Get 20% Off Official Barca Kits, coupon] my schema definition: <field name="name_en_US" type="text_en" indexed="true" stored="true" multiValued="false"/> This field is stored because the search result needs this field and its value in the original format, and indexed to give it a boost while searching. What I do is add the name (java.lang.String) to a SolrInputDocument via the addField(name_en_US, product.getName()) method, and then add this to solr using an AddUpdateCommand. It seems solr treats this kind of string data as multivalued, even though I add this field to solr only once. Is this a bug or expected behavior? Is there any way to tell solr this is not a multivalued value and not to break it up? Your help and suggestions will be much appreciated. -- All the best Liu Bo -- All the best Liu Bo -- All the best Liu Bo
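Furkan's point about List.toString() is easy to reproduce in plain Java, no Solr required: a List rendered to a string looks exactly like the bracketed value in the error message above, which is why a document built from a stringified List ends up carrying multiple values.

```java
import java.util.Arrays;
import java.util.List;

public class BracketDemo {
    public static void main(String[] args) {
        List<String> values = Arrays.asList("Get 20% Off Official Barca Kits", "coupon");
        // List.toString() wraps the elements in square brackets, comma-separated:
        String rendered = values.toString();
        System.out.println(rendered); // [Get 20% Off Official Barca Kits, coupon]
        // A single pre-bracketed String is indistinguishable from that rendering:
        String single = "[Get 20% Off Official Barca Kits, coupon]";
        System.out.println(rendered.equals(single)); // true
    }
}
```

So if a field value arrives already wrapped in brackets, it is worth checking whether a List (or a duplicated addField call, as in Liu Bo's case) slipped in upstream.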
Re: PostingsSolrHighlighter
Hi Josip that's quite weird. In my experience highlighting is strict on string fields, which need an exact match; text fields should be fine. I copied your schema definition and did a quick test in a new core; everything is default from the tutorial, and the search component is using solr.HighlightComponent. Searching on searchable_text can highlight text. I copied your search URL and just changed the host part; the input parameters are exactly the same; the result is attached. Can you upload your complete solrconfig.xml and schema.xml? On 18 December 2013 19:02, Josip Delic j...@lugensa.com wrote: On 18.12.2013 09:55, Liu Bo wrote: hi Josip hi liu, for the 1st question we've done similar things: copying search fields to a text field. But highlighting is normally done on specific fields such as title. Depending on how the search content is displayed to the front end, you can search on text and highlight on the field you want by specifying hl.fl ref: http://wiki.apache.org/solr/HighlightingParameters#hl.fl that's exactly what i'm doing in that pastebin: http://pastebin.com/13Uan0ZF I'm searching there for 'q=searchable_text:labore'. this is present in 'text' and in the copyfield 'searchable_text' but it is not highlighted in 'text' (hl.fl=text). The same query is working if I set 'q=text:labore' as you can see in http://pastebin.com/4CP8XKnr For the 2nd question i figured out that the PostingsSolrHighlighter ellipsis is not, as i thought, for adding an ellipsis to the start or/and end of highlighted text. It is instead used to combine multiple snippets together if snippets is 1. cheers josip On 17 December 2013 02:29, Josip Delic j...@lugensa.com wrote: Hi @all, i am playing with the PostingsSolrHighlighter.
I'm running solr 4.6.0 and my configuration is from here: https://lucene.apache.org/solr/4_6_0/solr-core/org/ apache/solr/highlight/ PostingsSolrHighlighter.html Search query and result (not working): http://pastebin.com/13Uan0ZF Schema (not complete): http://pastebin.com/JGa38UDT Search query and result (working): http://pastebin.com/4CP8XKnr Solr config: searchComponent class=solr.HighlightComponent name=highlight highlighting class=org.apache.solr.highlight. PostingsSolrHighlighter/ /searchComponent So this is working just fine, but now i have some questions: 1.) With the old default highlighter component it was possible to search in searchable_text and to retrive highlighted text. This is essential, because we use copyfield to put almost everything to searchable_text (title, subtitle, description, ...) 2.) I can't get ellipsis working i tried hl.tag.ellipsis=..., f.text.hl.tag.ellipsis=..., configuring it in RequestHandler noting seems to work, maxAnalyzedChars is just cutting the sentence? Kind Regards Josip Delic -- All the best Liu Bo http://localhost:8080/solr/try/select?wt=jsonfl=text%2Cscore=hl=truehl.fl=textq=%28searchable_text%3Alabore%29rows=10sort=score+descstart=0 { responseHeader: { status: 0, QTime: 36, params: { sort: score desc, fl: text, start: 0, ,score: , q: (searchable_text:labore), hl.fl: text, wt: json, hl: true, rows: 10 } }, response: { numFound: 3, start: 0, docs: [ { text: Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. 
Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. }, { text: Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. }, { text: Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata
Re: an array liked string is treated as multivalued when adding doc to solr
Hey Furkan and solr users This is a misreported problem. It's not a solr problem but a data issue on our side. Sorry for this. A coupon happened to have two pieces of English description, which is not allowed in our business logic, but it happened and we added the name_en_US field twice to the solr document. I've done a set of tests and deep debugging into the solr source code, and found out that an array-like string such as [Get 20% Off Official Barca Kits, coupon] won't be treated as a multivalued field. Sorry again for not digging more before sending out the question email. I trust our business logic and data integrity more than solr; I will definitely not do this again. ;-) All the best Liu Bo On 11 December 2013 07:21, Furkan KAMACI furkankam...@gmail.com wrote: Hi Liu; Yes, it is expected behavior. If you send data within square brackets Solr will treat it as a multivalued field. You can test it this way: if you use Solrj and use a List for a field, it will be considered multivalued too, because when you call the toString() method of your List you can see that the elements are printed within square brackets. This is the reason that a List can be used for a multivalued field. If you explain your situation I can offer a way to do it. Thanks; Furkan KAMACI 2013/12/6 Liu Bo diabl...@gmail.com Dear solr users: I've met this kind of error several times: when adding an array-like string such as [Get 20% Off Official Barça Kits, coupon] to a multiValued=false field, solr will complain: org.apache.solr.common.SolrException: ERROR: [doc=7781396456243918692] multiple values encountered for non multiValued field name_en_US: [Get 20% Off Official Barca Kits, coupon] my schema definition: <field name="name_en_US" type="text_en" indexed="true" stored="true" multiValued="false"/> This field is stored because the search result needs this field and its value in the original format, and indexed to give it a boost while searching.
What I do is add the name (java.lang.String) to a SolrInputDocument via the addField(name_en_US, product.getName()) method, and then add this to solr using an AddUpdateCommand. It seems solr treats this kind of string data as multivalued, even though I add this field to solr only once. Is this a bug or expected behavior? Is there any way to tell solr this is not a multivalued value and not to break it up? Your help and suggestions will be much appreciated. -- All the best Liu Bo -- All the best Liu Bo
an array liked string is treated as multivalued when adding doc to solr
Dear solr users: I've met this kind of error several times: when adding an array-like string such as [Get 20% Off Official Barça Kits, coupon] to a multiValued=false field, solr will complain: org.apache.solr.common.SolrException: ERROR: [doc=7781396456243918692] multiple values encountered for non multiValued field name_en_US: [Get 20% Off Official Barca Kits, coupon] my schema definition: <field name="name_en_US" type="text_en" indexed="true" stored="true" multiValued="false"/> This field is stored because the search result needs this field and its value in the original format, and indexed to give it a boost while searching. What I do is add the name (java.lang.String) to a SolrInputDocument via the addField(name_en_US, product.getName()) method, and then add this to solr using an AddUpdateCommand. It seems solr treats this kind of string data as multivalued, even though I add this field to solr only once. Is this a bug or expected behavior? Is there any way to tell solr this is not a multivalued value and not to break it up? Your help and suggestions will be much appreciated. -- All the best Liu Bo
Re: deleting a doc inside a custom UpdateRequestProcessor
hi, you can try this in your checkIfIsDuplicate(): build a query based on your title, and set it on a delete command: // build your query accordingly; this depends on how your title is indexed, e.g. analyzed or not. be careful with it and do some testing. DeleteUpdateCommand cmd = new DeleteUpdateCommand(req); cmd.commitWithin = commitWithin; cmd.setQuery(query); processDelete(cmd); Processors are normally chained; you should make sure that your processor comes first so that it can control what comes next based on your logic. You can also try to write your own UpdateRequestHandler instead of a customized processor. You can do a set of operations in your function @Override public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {} Get your processor chain in this function and pass a delete command to it, such as: SolrParams params = req.getParams(); checkParameter(params); UpdateRequestProcessorChain processorChain = req.getCore().getUpdateProcessingChain(params.get(UpdateParams.UPDATE_CHAIN)); UpdateRequestProcessor processor = processorChain.createProcessor(req, rsp); DeleteUpdateCommand cmd = new DeleteUpdateCommand(req); cmd.commitWithin = commitWithin; cmd.setQuery(query); processor.processDelete(cmd); This is what I do when customizing an update request handler: I try not to touch the original processor chain but tell solr what to do via commands. On 19 November 2013 10:01, Peyman Faratin pey...@robustlinks.com wrote: Hi I am building a custom UpdateRequestProcessor to intercept any doc heading to the index. Basically what I want to do is to check if the current index has a doc with the same title (i am using IDs as the uniques so I can't use that, and besides the logic of checking is a little more complicated).
If the incoming doc has a duplicate and some other conditions hold then one of 2 things can happen: 1- we don't index the incoming document 2- we index the incoming and delete the duplicate currently in the index I think (1) can be done by simple not passing the call up the chain (not calling super.processAdd(cmd)). However, I don't know how to implement the second condition, deleting the duplicate document, inside a custom UpdateRequestProcessor. This thread is the closest to my goal http://lucene.472066.n3.nabble.com/SOLR-4-3-0-Migration-How-to-use-DeleteUpdateCommand-td4062454.html however i am not clear how to proceed. Code snippets below. thank you in advance for your help class isDuplicate extends UpdateRequestProcessor { public isDuplicate( UpdateRequestProcessor next) { super( next ); } @Override public void processAdd(AddUpdateCommand cmd) throws IOException { try { boolean indexIncomingDoc = checkIfIsDuplicate(cmd); if(indexIncomingDoc) super.processAdd(cmd); } catch (SolrServerException e) {e.printStackTrace();} catch (ParseException e) {e.printStackTrace();} } public boolean checkIfIsDuplicate(AddUpdateCommand cmd) ...{ SolrInputDocument incomingDoc = cmd.getSolrInputDocument(); if(incomingDoc == null) return false; String title = (String) incomingDoc.getFieldValue( title ); SolrIndexSearcher searcher = cmd.getReq().getSearcher(); boolean addIncomingDoc = true; Integer idOfDuplicate = searcher.getFirstMatch(new Term(title,title)); if(idOfDuplicate != -1) { addIncomingDoc = compareDocs(searcher,incomingDoc,idOfDuplicate,title,addIncomingDoc); } return addIncomingDoc; } private boolean compareDocs(.){ if( condition 1 ) { -- DELETE DUPLICATE DOC in INDEX -- addIncomingDoc = true; } return addIncomingDoc; } -- All the best Liu Bo
Re: Multi-core support for indexing multiple servers
As far as I know about magento, its DB schema is designed for extensible property storage, and the relationships between db tables are kind of complex. A product has its attribute sets and properties, which are stored in different tables. A configurable product may have different attribute values for each of its sub simple products. Handling relationships like this in DIH won't be easy, especially when you want to group the attributes of a configurable product into one document. But if you just need to search on name and description and not other attributes, you can try writing a DIH config on the catalog_product_flat_x tables; magento may have several of them. We used to use lucene core to provide search on magento products: what we do is use the SOAP service provided by magento to get products, and then convert them to lucene documents. Indexes are updated daily. This hides lots of magento implementation details but it's kind of slow. On 12 November 2013 22:41, Robert Veliz rob...@mavenbridge.com wrote: I have two sources/servers--one of them is Magento. Since Magento has a more or less out-of-the-box integration with Solr, my thought was to run the Solr server from the Magento instance and then use DIH to get/merge content from the other source/server. Seem feasible/appropriate? I spec'd it out and it seems to make sense... R On Nov 11, 2013, at 11:25 PM, Liu Bo diabl...@gmail.com wrote: like Erick said, merging data from different datasources could be very difficult. SolrJ is much easier to use but may need another application to handle the index process if you don't want to extend solr much. I eventually ended up with a customized request handler which uses SolrWriter from the DIH package to index data, so that I can fully control the index process. Quite like SolrJ, you can write code to convert your data into SolrInputDocuments and then post them to SolrWriter; SolrWriter handles the rest.
On 8 November 2013 21:46, Erick Erickson erickerick...@gmail.com wrote: Yep, you can define multiple data sources for use with DIH. Combining data from those multiple sources into a single index can be a bit tricky with DIH; personally I tend to prefer SolrJ, but that's mostly personal preference, especially if I want to get some parallelism going. But whatever works Erick On Thu, Nov 7, 2013 at 11:17 PM, manju16832003 manju16832...@gmail.com wrote: Eric, Just a question :-), wouldn't it be easy to use DIH to pull data from multiple data sources? I do use DIH to do that comfortably. I have three data sources - MySQL - a URLDataSource that returns XML from a .NET application - a URLDataSource that connects to an API and returns XML Here is part of the data-config data source settings: <dataSource type="JdbcDataSource" name="solr" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/employeeDB" batchSize="-1" user="root" password="root"/> <dataSource name="CRMServer" type="URLDataSource" encoding="UTF-8" connectionTimeout="5000" readTimeout="1"/> <dataSource name="ImageServer" type="URLDataSource" encoding="UTF-8" connectionTimeout="5000" readTimeout="1"/> Of course, in the application I do the same. To construct my results, I connect to MySQL and those two data sources. Basically we have two points of indexing - Using DIH for one-time indexing - At the application, whenever there is a transaction on the details that we are storing in Solr. -- View this message in context: http://lucene.472066.n3.nabble.com/Multi-core-support-for-indexing-multiple-servers-tp4099729p4099933.html Sent from the Solr - User mailing list archive at Nabble.com. -- All the best Liu Bo -- All the best Liu Bo
Re: eDisMax, multiple language support and stopwords
Happy to see someone has similar solutions to ours. We have a similar multi-language search feature, and we index different language content into _fr, _en fields like you've done, but for search we need a language code as a parameter to specify the language the client wants to search on, which is normally decided by the website visited, such as: qf=name description&language=en and in our search components we find the right fields, name_en and description_en, to be searched on. We used to support searching across all languages and removed that later; as the site tells the customer which languages are supported, we also don't think many users on our web sites know more than two languages and need to search them at the same time. On 7 November 2013 23:01, Tom Mortimer tom.m.f...@gmail.com wrote: Ah, thanks Markus. I think I'll just add the Boolean operators to the stopwords list in that case. Tom On 7 November 2013 12:01, Markus Jelsma markus.jel...@openindex.io wrote: This is an ancient problem. The issue here is your mm parameter: it gets confused because for separate fields different numbers of tokens are filtered/emitted, so it is never going to work just like this. The easiest option is not to use the stop filter. http://lucene.472066.n3.nabble.com/Dismax-Minimum-Match-Stopwords-Bug-td493483.html https://issues.apache.org/jira/browse/SOLR-3085 -Original message- From: Tom Mortimer tom.m.f...@gmail.com Sent: Thursday 7th November 2013 12:50 To: solr-user@lucene.apache.org Subject: eDisMax, multiple language support and stopwords Hi all, Thanks for the help and advice I've got here so far! Another question - I want to support stopwords at search time, so that e.g. the query oscar and wilde is equivalent to oscar wilde (this is with lowercaseOperators=false). Fair enough, I have stopword and in the query analyser chain. However, I also need to support French as well as English, so I've got _en and _fr versions of the text fields, with appropriate stemming and stopwords.
I index French content into the _fr fields and English into the _en fields. I'm searching with eDisMax over both versions, e.g.: <str name="qf">headline_en headline_fr</str> However, this means I get no results for "oscar and wilde". The parsed query is: (+((DisjunctionMaxQuery((headline_fr:osca | headline_en:oscar)) DisjunctionMaxQuery((headline_fr:and)) DisjunctionMaxQuery((headline_fr:wild | headline_en:wild)))~3))/no_coord If I add "and" to the French stopwords list, I *do* get results, and the parsed query is: (+((DisjunctionMaxQuery((headline_fr:osca | headline_en:oscar)) DisjunctionMaxQuery((headline_fr:wild | headline_en:wild)))~2))/no_coord This implies that the only solution is to have a minimal, shared stopwords list for all the languages I want to support. Is this correct, or is there a way of supporting this kind of searching with per-language stopword lists? Thanks for any ideas! Tom -- All the best Liu Bo
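For reference, the shared-stopword-list idea discussed above would look roughly like this in schema.xml. This is only a sketch based on standard Solr analyzer configuration; the field type names and the stopwords filename are assumptions, not something given in the thread:

```xml
<!-- Sketch only: both language fields share one minimal stopword list
     (e.g. just the Boolean-operator words) so that eDisMax's mm parameter
     counts the same number of query tokens per field. Names hypothetical. -->
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_shared.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_shared.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="French"/>
  </analyzer>
</fieldType>
```

Because the same words are removed from both chains, each field emits the same number of tokens for a given query, which keeps mm consistent.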
Re: Multi-core support for indexing multiple servers
like Erick said, merging data from different data sources can be very difficult. SolrJ is much easier to use but may need another application to handle the indexing process if you don't want to extend Solr much. I eventually ended up with a customized request handler which uses SolrWriter from the DIH package to index data, so that I can fully control the index process. Quite like SolrJ, you write code to convert your data into SolrInputDocuments and then post them to SolrWriter; SolrWriter handles the rest. On 8 November 2013 21:46, Erick Erickson erickerick...@gmail.com wrote: Yep, you can define multiple data sources for use with DIH. Combining data from those multiple sources into a single index can be a bit tricky with DIH, personally I tend to prefer SolrJ, but that's mostly personal preference, especially if I want to get some parallelism going on. But whatever works Erick On Thu, Nov 7, 2013 at 11:17 PM, manju16832003 manju16832...@gmail.com wrote: Eric, Just a question :-), wouldn't it be easy to use DIH to pull data from multiple data sources? I do use DIH to do that comfortably. I have three data sources: - MySQL - URLDataSource that returns XML from a .NET application - URLDataSource that connects to an API and returns XML Here is the data source part of my data-config:

<dataSource type="JdbcDataSource" name="solr" driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/employeeDB" batchSize="-1" user="root" password="root"/>
<dataSource name="CRMServer" type="URLDataSource" encoding="UTF-8"
            connectionTimeout="5000" readTimeout="1"/>
<dataSource name="ImageServer" type="URLDataSource" encoding="UTF-8"
            connectionTimeout="5000" readTimeout="1"/>

Of course, in the application I do the same: to construct my results, I connect to MySQL and those two data sources. Basically we have two points of indexing: - using DIH for one-time indexing - at the application, whenever there is a transaction on the details that we store in Solr. 
-- View this message in context: http://lucene.472066.n3.nabble.com/Multi-core-support-for-indexing-multiple-servers-tp4099729p4099933.html Sent from the Solr - User mailing list archive at Nabble.com. -- All the best Liu Bo
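As a sketch of Erick's SolrJ route (nothing here is from the thread; plain collections stand in for SolrInputDocument, and the field names are made up), combining rows from several sources into one record per id before indexing could look like:

```java
import java.util.*;

public class SourceMerger {

    /** Builds a row map from alternating key/value pairs (test/demo helper). */
    public static Map<String, Object> row(String... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    /**
     * Merges rows from several sources into one record per id, so a single
     * document can later be built from e.g. MySQL plus two XML services.
     * Later sources overwrite earlier values for the same field name.
     */
    public static Map<String, Map<String, Object>> mergeById(List<List<Map<String, Object>>> sources) {
        Map<String, Map<String, Object>> byId = new LinkedHashMap<>();
        for (List<Map<String, Object>> source : sources) {
            for (Map<String, Object> r : source) {
                String id = String.valueOf(r.get("id"));
                byId.computeIfAbsent(id, k -> new LinkedHashMap<>()).putAll(r);
            }
        }
        return byId;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> mysqlRows = Arrays.asList(row("id", "1", "name", "Alice"));
        List<Map<String, Object>> crmRows   = Arrays.asList(row("id", "1", "crm_status", "active"));
        Map<String, Map<String, Object>> merged = mergeById(Arrays.asList(mysqlRows, crmRows));
        System.out.println(merged.get("1")); // prints {id=1, name=Alice, crm_status=active}
    }
}
```

Each merged map would then be copied field-by-field into a SolrInputDocument and sent to the server.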
how does solr load plugins?
Hi, I am writing a plugin to index content reusing our DAO layer, which is developed with Spring. What I am doing now is putting the plugin jar and all the depending jars of the DAO layer into the shared lib folder under the Solr home. In the log, I can see all the jars are loaded through SolrResourceLoader, like: INFO - 2013-10-16 16:25:30.611; org.apache.solr.core.SolrResourceLoader; Adding 'file:/D:/apache-tomcat-7.0.42/solr/lib/spring-tx-3.1.0.RELEASE.jar' to classloader Then I initialize the Spring context using: ApplicationContext context = new FileSystemXmlApplicationContext("/solr/spring/solr-plugin-bean-test.xml"); Then Spring complains: INFO - 2013-10-16 16:33:57.432; org.springframework.context.support.AbstractApplicationContext; Refreshing org.springframework.context.support.FileSystemXmlApplicationContext@e582a85: startup date [Wed Oct 16 16:33:57 CST 2013]; root of context hierarchy INFO - 2013-10-16 16:33:57.491; org.springframework.beans.factory.xml.XmlBeanDefinitionReader; Loading XML bean definitions from file [D:\apache-tomcat-7.0.42\solr\spring\solr-plugin-bean-test.xml] ERROR - 2013-10-16 16:33:59.944; com.test.search.solr.spring.AppicationContextWrapper; Configuration problem: Unable to locate Spring NamespaceHandler for XML schema namespace [ http://www.springframework.org/schema/context] Offending resource: file [D:\apache-tomcat-7.0.42\solr\spring\solr-plugin-bean-test.xml] The Spring context requires spring-tx-3.1.xsd, which does exist in spring-tx-3.1.0.RELEASE.jar under the org\springframework\transaction\config package, but the program can't find it even though it loads the Spring classes successfully. The following won't work either: ApplicationContext context = new ClassPathXmlApplicationContext("classpath:spring/solr-plugin-bean-test.xml"); // the solr-plugin-bean-test.xml is packaged in plugin.jar as well. 
But when I put all the jars under TOMCAT_HOME/webapps/solr/WEB-INF/lib and use ApplicationContext context = new ClassPathXmlApplicationContext("classpath:spring/solr-plugin-bean-test.xml"); everything works fine: I can initialize the Spring context and load DAO beans to read data and then write it to the Solr index. But isn't modifying solr.war a bad practice? It seems SolrResourceLoader only loads classes from the plugin jars, but these jars are NOT on the classpath. Please correct me if I am wrong. Is there any way to use resources in plugin jars, such as configuration files? BTW, is there any difference between SolrResourceLoader and the Tomcat webapp classloader? -- All the best Liu Bo
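One pattern worth knowing here (an assumption on my part, not something confirmed in the thread): frameworks like Spring often resolve their META-INF/spring.handlers and XSD files through the thread context classloader, which under Tomcat is the webapp classloader and does not see jars that only Solr's plugin classloader knows about. Temporarily installing the plugin classloader as the context classloader is the usual swap-and-restore idiom; a minimal, self-contained sketch:

```java
import java.util.concurrent.Callable;

public class ContextClassLoaderUtil {

    /**
     * Runs the given work with the supplied classloader installed as the
     * thread context classloader, restoring the previous one afterwards.
     * In a Solr plugin you would pass the loader returned by Solr's resource
     * loader (hypothetical call site, not shown in this thread).
     */
    public static <T> T callWith(ClassLoader loader, Callable<T> work) throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return work.call();
        } finally {
            current.setContextClassLoader(previous); // always restore
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader pluginLoader = ContextClassLoaderUtil.class.getClassLoader();
        ClassLoader before = Thread.currentThread().getContextClassLoader();
        // Inside the callable, anything using the context classloader
        // (e.g. a Spring ApplicationContext) would see the plugin jars.
        ClassLoader seen = callWith(pluginLoader,
                () -> Thread.currentThread().getContextClassLoader());
        System.out.println(seen == pluginLoader);                              // prints true
        System.out.println(Thread.currentThread().getContextClassLoader() == before); // prints true
    }
}
```

The Spring context creation would go where the callable body is; whether this resolves the NamespaceHandler error in your exact setup would need testing.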
Re: SolrDocumentList - bitwise operation
a join query might be helpful: http://wiki.apache.org/solr/Join A join can go across indexes but probably won't work in SolrCloud. Be aware that only the "to" documents are retrievable; if you want content from both documents, a join query won't work. And in Lucene the join query doesn't quite work with multiple join conditions; I haven't tested that in Solr yet. I had a similar join case to yours; eventually I chose to denormalize our data into one set of documents. On 13 October 2013 22:34, Michael Tyler michaeltyler1...@gmail.com wrote: Hello, I have 2 different solr indexes returning 2 different sets of SolrDocumentList. Doc Id is the foreign key relation. After obtaining them, I want to perform AND operation between them and then return results to user. Can you tell me how do I get this? I am using solr 4.3 SolrDocumentList results1 = responseA.getResults(); SolrDocumentList results2 = responseB.getResults(); results1 : d1, d2, d3 results2 : d1,d2, d4 Return : d1, d2 Regards, Michael -- All the best Liu Bo
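For the original question (the AND of two result lists in client code), a plain-Java sketch keyed on the doc id works regardless of the join approach. Documents are modeled here as simple Maps because this is client-side logic and SolrDocument is itself Map-like; nothing below is SolrJ-specific:

```java
import java.util.*;

public class ResultIntersector {

    /** Returns the documents from results1 whose "id" also appears in results2, preserving order. */
    public static List<Map<String, Object>> intersectById(List<Map<String, Object>> results1,
                                                          List<Map<String, Object>> results2) {
        // Collect the ids of the second list once, for O(1) membership tests.
        Set<Object> idsInSecond = new HashSet<>();
        for (Map<String, Object> doc : results2) {
            idsInSecond.add(doc.get("id"));
        }
        List<Map<String, Object>> common = new ArrayList<>();
        for (Map<String, Object> doc : results1) {
            if (idsInSecond.contains(doc.get("id"))) {
                common.add(doc);
            }
        }
        return common;
    }

    /** Tiny helper to build a one-field document (demo/test only). */
    public static Map<String, Object> doc(String id) {
        Map<String, Object> d = new HashMap<>();
        d.put("id", id);
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> a = Arrays.asList(doc("d1"), doc("d2"), doc("d3"));
        List<Map<String, Object>> b = Arrays.asList(doc("d1"), doc("d2"), doc("d4"));
        for (Map<String, Object> d : intersectById(a, b)) {
            System.out.println(d.get("id")); // prints d1 then d2
        }
    }
}
```

With real SolrDocumentList results you would call doc.getFieldValue("id") instead of doc.get("id"); the logic is otherwise the same.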
Re: SolrCore 'collection1' is not available due to init failure
org.apache.solr.core.SolrCore.init(SolrCore.java:821) ... 13 more Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/usr/share/solr-4.5.0/example/solr/ collection1/data/index/write.lock: java.io.FileNotFoundException: /usr/share/solr-4.5.0/example/solr/collection1/data/index/write.lock (Permission denied) at org.apache.lucene.store.Lock.obtain(Lock.java:84) at It seems to be a permission problem: the user that starts Tomcat doesn't have permission to access your index folder. Try granting that user read and write permission on your Solr data folder and restart Tomcat to see what happens. -- All the best Liu Bo
Re: Multiple schemas in the same SolrCloud ?
you can try it this way: start the ZooKeeper servers first. Upload your configurations to ZooKeeper and link them to your collections using zkcli, just like Shawn said. Let's say you have conf1 and conf2; you can link them to collection1 and collection2. Remove the bootstrap stuff and start the Solr servers. After you have Solr running, create collection1 and collection2 via the core admin; you don't need a local conf directory because all your core-specific configuration is in ZooKeeper. Or you could use core discovery and have the collection name specified in core.properties, see: http://wiki.apache.org/solr/Core%20Discovery%20%284.4%20and%20beyond%29 On 10 October 2013 23:57, maephisto my_sky...@yahoo.com wrote: On this topic, once you've uploaded your collection's configuration to ZK, how can you update it? Upload the new one with the same config name? -- View this message in context: http://lucene.472066.n3.nabble.com/Multiple-schemas-in-the-same-SolrCloud-tp4094279p4094729.html Sent from the Solr - User mailing list archive at Nabble.com. -- All the best Liu Bo
Re: documents are not commited distributively in solr cloud tomcat with core discovery, range is null for shards in clusterstate.json
I've solved this problem myself. If you use core discovery, you must specify the numShards parameter in core.properties, or else Solr won't allocate a hash range to each shard and documents won't be distributed properly. Using core discovery to set up SolrCloud in Tomcat is much easier and cleaner than the coreAdmin approach described in the wiki: http://wiki.apache.org/solr/SolrCloudTomcat. It cost me some time to move from Jetty to Tomcat, but I think our IT team will like this way. :) On 6 October 2013 23:53, Liu Bo diabl...@gmail.com wrote: Hi all I've sent out this mail before, but I had only subscribed to lucene-user and not solr-user at that time. Sorry for repeating, if any; your help will be much appreciated. I'm trying out the tutorial about SolrCloud, and I managed to write my own plugin to import data from our set of databases. I use SolrWriter from the DataImporter package, and the docs get distributed commits to the shards. Everything works fine using Jetty from the Solr example, but when I move to Tomcat, SolrCloud seems not to be configured right, as the documents are just committed to the shard the update request goes to. The cause is probably that the range is null for the shards in clusterstate.json. The router is implicit instead of compositeId as well. Is there anything missed or configured wrong in the following steps? How can I fix it? Your help will be much appreciated. PS, the SolrCloud Tomcat wiki page isn't up to date with 4.4 core discovery; I'm trying this out after reading the SolrCloud, SolrCloudJboss, and CoreAdmin wiki pages. Here's what I've done and some useful logs: 1. start three zookeeper servers. 2. upload configuration files to zookeeper; the collection name is content_collection 3. 
start three Tomcat instances on three servers with core discovery a) core file: name=content loadOnStartup=true transient=false shard=shard1 (different on each server) collection=content_collection b) solr.xml:

<solr>
  <solrcloud>
    <str name="host">${host:}</str>
    <str name="hostContext">${hostContext:solr}</str>
    <int name="hostPort">8080</int>
    <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
    <str name="zkHost">10.199.46.176:2181,10.199.46.165:2181,10.199.46.158:2181</str>
    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
  </solrcloud>
  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:0}</int>
    <int name="connTimeout">${connTimeout:0}</int>
  </shardHandlerFactory>
</solr>

4. In solr.log, I can see the three shards are recognized, and SolrCloud can see that content_collection has three shards as well. 5. write documents to content_collection using my update request; the documents only commit to the shard the request goes to. In the log I can see the DistributedUpdateProcessorFactory is in the processor chain and the distributed commit is triggered: INFO - 2013-09-30 16:31:43.205; com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler; updata request processor factories: INFO - 2013-09-30 16:31:43.206; com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler; org.apache.solr.update.processor.LogUpdateProcessorFactory@4ae7b77 INFO - 2013-09-30 16:31:43.207; com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler; org.apache.solr.update.processor.*DistributedUpdateProcessorFactory* @5b2bc407 INFO - 2013-09-30 16:31:43.207; com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler; org.apache.solr.update.processor.RunUpdateProcessorFactory@1652d654 INFO - 2013-09-30 16:31:43.283; org.apache.solr.core.SolrDeletionPolicy; SolrDeletionPolicy.onInit: commits: num=1 commit{dir=/home/bold/work/tomcat/solr/content/data/index,segFN=segments_1,generation=1} INFO - 
2013-09-30 16:31:43.284; org.apache.solr.core.SolrDeletionPolicy; newest commit generation = 1 INFO - 2013-09-30 16:31:43.440; *org.apache.solr.update.SolrCmdDistributor; Distrib commit to*:[StdNode: http://10.199.46.176:8080/solr/content/, StdNode: http://10.199.46.165:8080/solr/content/] params:commit_end_point=true&commit=true&softCommit=false&waitSearcher=true&expungeDeletes=false but the documents won't go to the other shards; the other shards only get a request with no documents: INFO - 2013-09-30 16:31:43.841; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} INFO - 2013-09-30 16:31:43.855; org.apache.solr.core.SolrDeletionPolicy; SolrDeletionPolicy.onInit: commits: num=1 commit{dir=/home/bold/work/tomcat/solr/content/data/index,segFN=segments_1,generation=1} INFO - 2013-09-30 16:31:43.855; org.apache.solr.core.SolrDeletionPolicy; newest commit
@3c74c144main{StandardDirectoryReader(segments_1:1:nrt)} INFO - 2013-09-30 16:31:43.870; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush INFO - 2013-09-30 16:31:43.870; org.apache.solr.update.processor.LogUpdateProcessor; [content] webapp=/solr path=/update params={waitSearcher=true&commit=true&wt=javabin&expungeDeletes=false&commit_end_point=true&version=2&softCommit=false} {commit=} 0 42 6) later I found the range is null in clusterstate.json, which might be why the documents aren't committed distributively (note range:null and the implicit router):

{"content_collection":{
  "shards":{
    "shard1":{
      "range":null,
      "state":"active",
      "replicas":{"core_node1":{
          "state":"active",
          "core":"content",
          "node_name":"10.199.46.176:8080_solr",
          "base_url":"http://10.199.46.176:8080/solr",
          "leader":"true"}}},
    "shard3":{
      "range":null,
      "state":"active",
      "replicas":{"core_node2":{
          "state":"active",
          "core":"content",
          "node_name":"10.199.46.202:8080_solr",
          "base_url":"http://10.199.46.202:8080/solr",
          "leader":"true"}}},
    "shard2":{
      "range":null,
      "state":"active",
      "replicas":{"core_node3":{
          "state":"active",
          "core":"content",
          "node_name":"10.199.46.165:8080_solr",
          "base_url":"http://10.199.46.165:8080/solr",
          "leader":"true"}}}},
  "router":"implicit"}}

-- All the best Liu Bo
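The fix described at the top of this thread (adding numShards to core discovery) can be sketched as a core.properties file like the following. Values shown in the thread are reused; the rest are assumptions:

```properties
# core.properties sketch for the core on the first server.
# One such file per node, with shard= differing per server as described above.
name=content
collection=content_collection
shard=shard1
# The missing parameter: without numShards, no hash ranges are allocated
# and the router stays implicit instead of compositeId.
numShards=3
loadOnStartup=true
transient=false
```

With numShards set, each shard gets a hash range in clusterstate.json and updates are routed across all shards instead of landing only on the receiving one.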
how can I use DataImportHandler on multiple MySQL databases with the same schema?
Hi all Our system has distributed MySQL databases: we create a database for every customer who signs up and place it on one of our MySQL hosts. We currently use Lucene core to search these databases, with Java code that loops through them and converts the data to a Lucene index. Right now we are planning to move to Solr for distribution, and I am investigating it. I tried to use DataImportHandler http://wiki.apache.org/solr/DataImportHandler from the wiki page, but I can't figure out a way to use multiple data sources with the same schema. The other question: we have the database connection data in one table; can I create the datasource connection info from it and loop through the databases using DataImporter? If DataImporter won't work, is there a way to feed data to Solr using a customized SolrRequestHandler without using SolrJ? If neither of these two ways works, I think I am going to reuse the DAO of the old project and feed the data to Solr using SolrJ, probably using an embedded Solr server. Your help will be much appreciated. http://wiki.apache.org/solr/DataImportHandlerFaq -- All the best Liu Bo
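One way DIH can address several identically-shaped databases is to declare one JdbcDataSource per database and repeat the entity per source. This sketch is an assumption based on general DIH configuration, not something confirmed here; host names, database names, and the query are made up, and the obvious drawback is that the list is static, which is exactly why reading connection info from a table comes up in the question above:

```xml
<dataConfig>
  <!-- One dataSource element per customer database; same schema in each. -->
  <dataSource name="cust1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://host1/customer1" user="user" password="pw"/>
  <dataSource name="cust2" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://host2/customer2" user="user" password="pw"/>
  <document>
    <!-- The same query, run once against each source. -->
    <entity name="docs_cust1" dataSource="cust1" query="select id, title from docs"/>
    <entity name="docs_cust2" dataSource="cust2" query="select id, title from docs"/>
  </document>
</dataConfig>
```

Because the entity list cannot grow dynamically per signup, the SolrJ/DAO route mentioned at the end of the message scales better for this use case.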
Re: removing duplicates
This picture is extracted from apache-solr-ref-guide-4.4.pdf; maybe it will help you. You can download the document from https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/ -Original Message- From: Ali, Saqib [mailto:docbook@gmail.com] Sent: 22 August 2013 5:15 To: solr-user@lucene.apache.org Subject: removing duplicates hello, We have documents that are duplicates, i.e. the ID is different but the rest of the fields are the same. Is there a query that can remove the duplicates and just leave one copy of each document in Solr? There is one numeric field that we can key off to find duplicates. Please advise. Thanks
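The ref-guide section being pointed at covers Solr's de-duplication support. As an assumption about what the (missing) picture showed, a minimal SignatureUpdateProcessorFactory chain in solrconfig.xml looks roughly like this; the field names in "fields" are hypothetical and would be the fields that make two documents duplicates:

```xml
<!-- Sketch of index-time de-duplication: documents producing the same
     signature are treated as duplicates, so re-indexing leaves one copy. -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">id</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Note this works at index time; it doesn't delete duplicates already in the index by itself — the data has to be passed through the chain again.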
How to sort by the function: relevance_score*numeric_field/(relevance_score+numeric_field)
Hi: I want to rank the search results by the function relevance_score*numeric_field/(relevance_score+numeric_field), which equals 1/((1/relevance_score)+(1/numeric_field)). As far as I know, I could use a function query: sort=div(1,sum(div(1,field(numeric_field)),div(1,query({!edismax v='somewords'})))) desc There is a subquery in this function, query({!edismax v='somewords'}), which returns the relevance score, but I can't figure out its query efficiency. After tracking the source code, I think the efficiency is OK, but I can't make sure. Do we have other approaches to sort docs by relevance_score*numeric_field/(relevance_score+numeric_field)? Thank you Leo
One case for shingle and synonym filter
Hi, Here is the case: given a doc named "sport center", we hope a query like "sportctr" (user shorthand) can recall it. Can the shingle and synonym filters be combined in some smart way to produce that term? Thanks, Xiang
HELP: CommonsHttpSolrServer.commit() time out after 1min
Hi, we have an index with 2 million documents in it. From time to time we rewrite about 1/10 of the documents (just under 200k). No autocommit. At the end we do a single commit, and got a timeout after 60 sec. My questions are: 1. Is it normal for a commit of this size to take more than 1 min? I know it probably depends on the server... 2. I know there are a few parameters I can set in the CommonsHttpSolrServer class: setConnectionManagerTimeout(), setConnectionTimeout(), setSoTimeout(). Which should I use? TIA
Re: HELP: CommonsHttpSolrServer.commit() time out after 1min
Thanks for the quick response. It's Solr 3.4. I'm pretty sure we get plenty memory. On Tue, Feb 19, 2013 at 7:50 PM, Alexandre Rafalovitch arafa...@gmail.comwrote: Which version of Solr? Are you sure you did not run out of memory half way through import? Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Tue, Feb 19, 2013 at 7:44 PM, Siping Liu liu01...@gmail.com wrote: Hi, we have an index with 2mil documents in it. From time to time we rewrite about 1/10 of the documents (just under 200k). No autocommit. At the end we a single commit and got time out after 60 sec. My questions are: 1. is it normal to have the commit of this size takes more than 1min? I know it's probably depend on the server ... 2. I know there're a few parameters I can set in CommonsHttpSolrServer class: setConnectionManagerTimeout(), setConnectionTimeout(), setSoTimeout(). Which should I use? TIA
Re: HELP: CommonsHttpSolrServer.commit() time out after 1min
Solrj. On Tue, Feb 19, 2013 at 9:08 PM, Erick Erickson erickerick...@gmail.comwrote: Well, your commits may have to wait until any merges are done, which _may_ be merging your entire index into a single segment. Possibly this could take more than 60 seconds. _How_ are you doing this? DIH? SolrJ? post.jar? Best Erick On Tue, Feb 19, 2013 at 8:00 PM, Siping Liu liu01...@gmail.com wrote: Thanks for the quick response. It's Solr 3.4. I'm pretty sure we get plenty memory. On Tue, Feb 19, 2013 at 7:50 PM, Alexandre Rafalovitch arafa...@gmail.comwrote: Which version of Solr? Are you sure you did not run out of memory half way through import? Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Tue, Feb 19, 2013 at 7:44 PM, Siping Liu liu01...@gmail.com wrote: Hi, we have an index with 2mil documents in it. From time to time we rewrite about 1/10 of the documents (just under 200k). No autocommit. At the end we a single commit and got time out after 60 sec. My questions are: 1. is it normal to have the commit of this size takes more than 1min? I know it's probably depend on the server ... 2. I know there're a few parameters I can set in CommonsHttpSolrServer class: setConnectionManagerTimeout(), setConnectionTimeout(), setSoTimeout(). Which should I use? TIA
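As general background on the three setters asked about (this is standard HTTP-client behavior, not a statement about SolrJ internals): the connection timeout bounds establishing the TCP connection, while the socket (SO) timeout bounds waiting for the response, which is the one a long-running commit exhausts. The distinction can be shown with plain java.net; the Solr URL is hypothetical and no network I/O happens below:

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class TimeoutDemo {

    /** Applies the two timeouts discussed above; purely local configuration. */
    public static void configure(HttpURLConnection conn) {
        conn.setConnectTimeout(5_000);   // like setConnectionTimeout(): bounds opening the TCP connection
        conn.setReadTimeout(300_000);    // like setSoTimeout(): bounds waiting for the (slow) commit response
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical Solr update URL; openConnection() does not actually connect yet.
        HttpURLConnection conn =
                (HttpURLConnection) new URL("http://localhost:8983/solr/update").openConnection();
        configure(conn);
        System.out.println(conn.getConnectTimeout() + " " + conn.getReadTimeout()); // prints 5000 300000
    }
}
```

So for the commit-timeout symptom described above, raising the SO/read timeout is the relevant knob; raising the connection timeout would not help.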
Re: custom sorter
Hi -- thanks for the response. It's the right direction. However, on closer look I don't think I can use it directly. The reason is that in my case the query string is always *:*; we use a filter query to get different results. When fq=(field1:xyz) we want to boost one document and let sort= take care of the rest of the results, and when field1 has another value, sort= takes care of all results. Maybe I can define my own SearchComponent class and specify it in <arr name="last-components"><str>my_search_component</str></arr> I have to try and see if that'd work. thanks. On Fri, Jul 20, 2012 at 3:24 AM, Lee Carroll lee.a.carr...@googlemail.com wrote: take a look at http://wiki.apache.org/solr/QueryElevationComponent On 20 July 2012 03:48, Siping Liu liu01...@gmail.com wrote: Hi, I have a requirement to place a document at a pre-determined position for special filter query values; for instance, when the filter query is fq=(field1:xyz), place document abc as the first result (the rest of the result set will be ordered by sort=field2). I guess I have to plug in my Java code as a custom sorter. I'd appreciate it if someone can shed light on this (how to add a custom sorter, etc.) TIA.
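For reference, Lee's QueryElevationComponent suggestion is driven by an elevate.xml file; a minimal sketch (the query text and doc id are just the placeholders from this thread) looks like:

```xml
<!-- elevate.xml sketch: for the given query text, force doc "abc" to the top.
     Note the component keys on the q string, which is why it doesn't directly
     fit the fq-based case described above, where q is always *:*. -->
<elevate>
  <query text="xyz">
    <doc id="abc"/>
  </query>
</elevate>
```

This keying on q rather than fq is exactly the mismatch the reply points out, hence the idea of a custom last-components SearchComponent instead.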
custom sorter
Hi, I have requirements to place a document to a pre-determined position for special filter query values, for instance when filter query is fq=(field1:xyz) place document abc as first result (the rest of the result set will be ordered by sort=field2). I guess I have to plug in my Java code as a custom sorter. I'd appreciate it if someone can shed light on this (how to add custom sorter, etc.) TIA.
Re: help: I always get NULL with row.get(columnName)
anyone knows? On Thu, Jul 19, 2012 at 5:48 PM, Roy Liu liuchua...@gmail.com wrote: Hi, When I use a Transformer to handle files, I always get NULL with row.get(columnName). anyone knows? -- The following file is *data-config.xml*:

<dataConfig>
  <dataSource type="JdbcDataSource" name="ds"
              driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@10.1.1.1:1521:sid"
              user="username" password="pwd"/>
  <document name="BS_REPORT">
    <entity name="report" pk="ID"
            query="select a.objid as ID from DOCGENERAL a where a.objid=14154965">
      <field column="ID" name="id"/>
      <entity name="attachment"
              query="select docid as ID, name as filename, storepath as filepath from attachment where docid=${report.ID}"
              transformer="com.bs.solr.BSFileTransformer">
        <field column="ID" name="bs_attachment_id"/>
        <field column="filename" name="bs_attachment_name"/>
        <field column="filepath" name="bs_attachment" isfile="true"/>
      </entity>
    </entity>
  </document>
</dataConfig>

public class BSFileTransformer extends Transformer {
    private static Log LOGGER = LogFactory.getLog(BSFileTransformer.class);

    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
        // row.get("filename") is always null, but row.get("id") is OK.
        System.out.println("==filename:" + row.get("filename"));
        List<Map<String, String>> fields = context.getAllEntityFields();
        String id = null; // Entity ID
        String fileName = "NONAME";
        for (Map<String, String> field : fields) {
            String name = field.get("name");
            System.out.println("name: " + name);
            if ("bs_attachment_id".equals(name)) {
                String columnName = field.get("column");
                id = String.valueOf(row.get(columnName));
            }
            if ("bs_attachment_name".equals(name)) {
                String columnName = field.get("column");
                fileName = (String) row.get(columnName);
            }
            String isFile = field.get("isfile");
            if ("true".equals(isFile)) {
                String columnName = field.get("column");
                String filePath = (String) row.get(columnName);
                try {
                    System.out.println("fileName:" + fileName + ",filePath: " + filePath);
                    if (filePath != null) {
                        File file = new File(filePath);
                        InputStream inputStream = new FileInputStream(file);
                        Tika tika = new Tika();
                        String text = tika.parseToString(inputStream);
                        row.put(columnName, text);
                    }
                    LOGGER.info("Processed File OK! Entity: " + fileName + ", ID: " + id);
                } catch (IOException ioe) {
                    LOGGER.error(ioe.getMessage());
                    row.put(columnName, "");
                } catch (TikaException e) {
                    LOGGER.error("Parse File Error: " + id + ", Error: " + e.getMessage());
                    row.put(columnName, "");
                }
            }
        }
        return row;
    }
}
Solr mail dataimporter cannot be found
Hi, I want to index emails using Solr. I put the username, password, and hostname in data-config.xml under the mail folder. This is a valid email account, but when I run the URL http://localhost:8983/solr/mail/dataimport?command=full-import it says it cannot access mail/dataimport, reason: not found. But when I run http://localhost:8983/solr/rss/dataimport?command=full-import or http://localhost:8983/solr/db/dataimport?command=full-import they can be found. In addition, when I run the command java -Dsolr.solr.home=./example-DIH/solr/ -jar start.jar, on the left side of the Solr UI there are db, rss, tika and solr cores but no mail. Is it a bug in mail indexing? Thank you so much! Best, Emma
RE: memory usage keep increase
Erick, Thanks for your reply. Yes, virtual memory does not mean physical memory. But when virtual memory exceeds physical memory, the system becomes slow, since lots of paging requests happen. Yongtao -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, November 15, 2011 8:37 AM To: solr-user@lucene.apache.org Subject: Re: memory usage keep increase I'm pretty sure not. The words virtual memory address space is important here, that's not physical memory... Best Erick On Mon, Nov 14, 2011 at 11:55 AM, Yongtao Liu y...@commvault.com wrote: Hi all, I saw one issue: RAM usage keeps increasing when we run queries. After looking in the code, it looks like Lucene uses MMapDirectory to map index files to RAM. According to the http://lucene.apache.org/java/3_1_0/api/core/org/apache/lucene/store/MMapDirectory.html comments, it will use a lot of memory: NOTE: memory mapping uses up a portion of the virtual memory address space in your process equal to the size of the file being mapped. Before using this class, be sure your have plenty of virtual address space, e.g. by using a 64 bit JRE, or a 32 bit JRE with indexes that are guaranteed to fit within the address space. So, my understanding is Solr requires physical RAM >= index file size; is that right? Yongtao **Legal Disclaimer*** This communication may contain confidential and privileged material for the sole use of the intended recipient. Any unauthorized review, use or distribution by others is strictly prohibited. If you have received the message in error, please advise the sender by reply email and delete the message. Thank you. *
memory usage keep increase
Hi all, I saw one issue: RAM usage keeps increasing when we run queries. After looking in the code, it looks like Lucene uses MMapDirectory to map index files to RAM. According to the http://lucene.apache.org/java/3_1_0/api/core/org/apache/lucene/store/MMapDirectory.html comments, it will use a lot of memory: NOTE: memory mapping uses up a portion of the virtual memory address space in your process equal to the size of the file being mapped. Before using this class, be sure your have plenty of virtual address space, e.g. by using a 64 bit JRE, or a 32 bit JRE with indexes that are guaranteed to fit within the address space. So, my understanding is Solr requires physical RAM >= index file size; is that right? Yongtao
Re: FW: MMapDirectory failed to map a 23G compound index segment
I hit a similar issue recently. Not sure if MMapDirectory is the right way to go. When an index file is mapped to RAM, the JVM calls the OS file-mapping function. The memory usage is in shared memory; it may not be counted in the JVM process space. One problem I saw: if the index file is bigger than physical RAM, and there are lots of queries causing wide index file access, then the machine has no available memory and the system becomes very slow. What I did was change the Lucene code to disable MMapDirectory. On Wed, Sep 21, 2011 at 1:26 PM, Yongtao Liu y...@commvault.com wrote: -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Tuesday, September 20, 2011 3:33 PM To: solr-user@lucene.apache.org Subject: Re: MMapDirectory failed to map a 23G compound index segment Since you hit OOME during mmap, I think this is an OS issue not a JVM issue. Ie, the JVM isn't running out of memory. How many segments were in the unoptimized index? It's possible the OS rejected the mmap because of process limits. Run cat /proc/sys/vm/max_map_count to see how many mmaps are allowed. Or: is it possible you reopened the reader several times against the index (ie, after committing from Solr)? If so, I think 2.9.x never unmaps the mapped areas, and so this would accumulate against the system limit. My memory of this is a little rusty but isn't mmap also limited by mem + swap on the box? What does 'free -g' report? I don't think this should be the case; you are using a 64 bit OS/JVM so in theory (except for OS system wide / per-process limits imposed) you should be able to mmap up to the full 64 bit address space. Your virtual memory is unlimited (from ulimit output), so that's good. Mike McCandless http://blog.mikemccandless.com On Wed, Sep 7, 2011 at 12:25 PM, Rich Cariens richcari...@gmail.com wrote: Ahoy ahoy! I've run into the dreaded OOM error with MMapDirectory on a 23G cfs compound index segment file. 
The stack trace looks pretty much like every other trace I've found when searching for OOM "map failed" [1]. My configuration follows: Solr 1.4.1/Lucene 2.9.3 (plus SOLR-1969 https://issues.apache.org/jira/browse/SOLR-1969) CentOS 4.9 (Final) Linux 2.6.9-100.ELsmp x86_64 yada yada yada Java SE (build 1.6.0_21-b06) Hotspot 64-bit Server VM (build 17.0-b16, mixed mode) ulimits:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
pending signals                 (-i) 1024
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 256000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1064959
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Any suggestions? Thanks in advance, Rich

[1] ...
java.io.IOException: Map failed
    at sun.nio.ch.FileChannelImpl.map(Unknown Source)
    at org.apache.lucene.store.MMapDirectory$MMapIndexInput.init(Unknown Source)
    at org.apache.lucene.store.MMapDirectory$MMapIndexInput.init(Unknown Source)
    at org.apache.lucene.store.MMapDirectory.openInput(Unknown Source)
    at org.apache.lucene.index.SegmentReader$CoreReaders.init(Unknown Source)
    at org.apache.lucene.index.SegmentReader.get(Unknown Source)
    at org.apache.lucene.index.SegmentReader.get(Unknown Source)
    at org.apache.lucene.index.DirectoryReader.init(Unknown Source)
    at org.apache.lucene.index.ReadOnlyDirectoryReader.init(Unknown Source)
    at org.apache.lucene.index.DirectoryReader$1.doBody(Unknown Source)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(Unknown Source)
    at org.apache.lucene.index.DirectoryReader.open(Unknown Source)
    at org.apache.lucene.index.IndexReader.open(Unknown Source)
...
Caused by: java.lang.OutOfMemoryError: Map failed
    at sun.nio.ch.FileChannelImpl.map0(Native Method)
... 
Re: How to index PDF file stored in SQL Server 2008
Hi, all

Thank you very much for your kind help.

1. I have upgraded from Solr 1.4 to Solr 3.1.
2. Changed data-config-sql.xml:

<dataConfig>
  <dataSource type="JdbcDataSource" name="bsds"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost:1433;databaseName=bs_docmanager"
              user="username" password="pw"/>
  <dataSource name="docds" type="BinURLDataSource"/>
  <document name="docs">
    <entity name="doc" dataSource="bsds"
            query="select id,attachment,filename from attachment where ext='pdf' and id30001030">
      <field column="id" name="id"/>
      <entity dataSource="docds" processor="TikaEntityProcessor"
              url="${doc.attachment}" format="text">
        <field column="attachment" name="bs_attachment"/>
      </entity>
      <field column="filename" name="title"/>
    </entity>
  </document>
</dataConfig>

3. solrconfig.xml and schema.xml are NOT changed.

However, when I access http://localhost:8080/solr/dataimport?command=full-import it still fails:

Full Import failed: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: [B@ae1393 Processing Document # 1

Could you give me some advice? This problem is really bothering me. Thanks.

-- Best Regards, Roy Liu

On Mon, Apr 11, 2011 at 5:16 AM, Lance Norskog goks...@gmail.com wrote: You have to upgrade completely to the Apache Solr 3.1 release. It is worth the effort. You cannot copy any jars between Solr releases. Also, you cannot copy over jars from newer Tika releases.

On Fri, Apr 8, 2011 at 10:47 AM, Darx Oman darxo...@gmail.com wrote: Hi again. What you are missing is the field mapping: <field column="id" name="id"/>. No need for TikaEntityProcessor since you are not accessing pdf files.

-- Lance Norskog goks...@gmail.com
Re: How to index PDF file stored in SQL Server 2008
Hi, I have copied \apache-solr-3.1.0\dist\apache-solr-dataimporthandler-extras-3.1.0.jar into \apache-tomcat-6.0.32\webapps\solr\WEB-INF\lib\ Other Errors: Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Unclosed quotation mark after the character string 'B@3e574'. -- Best Regards, Roy Liu On Mon, Apr 11, 2011 at 2:12 PM, Darx Oman darxo...@gmail.com wrote: Hi there Error is not clear... but did you copy apache-solr-dataimporthandler-extras-4.0-SNAPSHOT.jar to your solr\lib ?
Re: How to index PDF file stored in SQL Server 2008
I changed data-config-sql.xml to:

<dataConfig>
  <dataSource type="JdbcDataSource" name="bsds"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost:1433;databaseName=bs_docmanager"
              user="username" password="pw" convertType="true"/>
  <document name="docs">
    <entity name="doc" dataSource="bsds"
            query="select id,filename,attachment from attachment where ext='pdf' and id=3632">
      <field column="id" name="id"/>
      <field column="filename" name="title"/>
      <field column="attachment" name="bs_attachment"/>
    </entity>
  </document>
</dataConfig>

There are no errors now, but the indexed PDF comes out as numbers:

200 1 202 1 203 1 212 1 222 1 236 1 242 1 244 1 254 1 255

-- Best Regards, Roy Liu

On Mon, Apr 11, 2011 at 2:02 PM, Roy Liu liuchua...@gmail.com wrote: ...
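The query string in the failure message, "Unable to execute query: [B@ae1393", is the tell: the BLOB column value (a Java byte[]) got concatenated into a string as its default toString(). A plain-Java illustration — the class name and query text here are made up, not DIH internals:

```java
public class ByteArrayToStringDemo {
    // String concatenation invokes toString(), which for arrays returns
    // "[B@" + a hex hashcode, not the array contents.
    public static String render(byte[] blob) {
        return "select * from attachment where data = '" + blob + "'";
    }

    public static void main(String[] args) {
        System.out.println(render(new byte[] {1, 2, 3}));
    }
}
```

This is also why the JDBC driver later complained about an unclosed quotation mark after 'B@3e574': the array's toString() leaked into generated SQL. The fix in the thread is to hand the blob to TikaEntityProcessor through a data source rather than interpolating the column value into a query or URL.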
Re: Tika, Solr running under Tomcat 6 on Debian
\apache-solr-3.1.0\contrib\extraction\lib\tika*.jar

-- Best Regards, Roy Liu

On Mon, Apr 11, 2011 at 3:10 PM, Mike satish01sud...@gmail.com wrote: Hi All, I have the same issue. I have installed a solr instance on tomcat6. When I try to index a pdf I am running into the below exception:

11 Apr, 2011 12:11:55 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NoClassDefFoundError: org/apache/tika/exception/TikaException
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:247)
  at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:359)
  at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
  at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:449)
  at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:240)
  at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
  at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
  at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
  at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.ClassNotFoundException: org.apache.tika.exception.TikaException
  at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
  ... 22 more

I could not find any tika jar file. Could you please help me out in fixing the above issue? Thanks, Mike

-- View this message in context: http://lucene.472066.n3.nabble.com/Tika-Solr-running-under-Tomcat-6-on-Debian-tp993295p2805615.html Sent from the Solr - User mailing list archive at Nabble.com.
How to index MS SQL Server column with image type
Hi all,

When I index a column (image type) of a table via http://localhost:8080/solr/dataimport?command=full-import there is an error like this: "String length must be a multiple of four." Any help? Thank you very much. PS: the attachment includes Chinese characters.

1. data-config.xml

<dataConfig>
  <dataSource type="JdbcDataSource" driver="net.sourceforge.jtds.jdbc.Driver"
              url="jdbc:jtds:sqlserver://host:1433/db" user="username" password="password"/>
  <document>
    <entity name="doc" query="select id,attachment,filename as title from attachment where ext='doc' and id1">
      <field column="attachment" name="bs_attachment"/>
    </entity>
  </document>
</dataConfig>

2. schema.xml

<field name="bs_attachment" type="binary" indexed="true" stored="true"/>

3. Database

attachment is a column of the table attachment; its type is IMAGE.

Best Regards, Roy Liu
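"String length must be a multiple of four" is the kind of error a Base64 decoder raises on input whose length isn't divisible by 4 — the schema's binary field type expects Base64 text, so raw JDBC bytes (or their toString()) won't decode. A sketch of the round trip with java.util.Base64 (Java 8+; the class name and sample bytes are illustrative, and older setups used commons-codec instead):

```java
import java.util.Arrays;
import java.util.Base64;

public class BinaryFieldDemo {
    // Solr's "binary" field type stores Base64 text; encode raw bytes first.
    public static String encodeForBinaryField(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    public static byte[] decodeFromBinaryField(String stored) {
        return Base64.getDecoder().decode(stored);
    }

    public static void main(String[] args) {
        byte[] blob = {1, 2, 3, 4, 5};
        String enc = encodeForBinaryField(blob);
        // Valid (padded) Base64 always has a length divisible by four.
        System.out.println(enc.length() % 4); // prints 0
        System.out.println(Arrays.equals(blob, decodeFromBinaryField(enc))); // prints true
    }
}
```

Note that even with correct encoding, a binary field only stores the bytes; it won't make the PDF text searchable — that still needs Tika extraction.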
How to index PDF file stored in SQL Server 2008
Hi,

I have a table named attachment in MS SQL Server 2008:

COLUMN      TYPE
id          int
title       varchar(200)
attachment  image

I need to index the attachment column (which stores pdf files) from the database via DIH. After accessing this URL, it returns "Indexing completed. Added/Updated: 5 documents. Deleted 0 documents." http://localhost:8080/solr/dataimport?command=full-import However, I cannot search anything. Can anyone help me? Thanks.

data-config-sql.xml

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost:1433;databaseName=master" user="user" password="pw"/>
  <document>
    <entity name="doc" query="select id,title,attachment from attachment"/>
  </document>
</dataConfig>

schema.xml

<field name="attachment" type="text" indexed="true" stored="true"/>

Best Regards, Roy Liu
Re: How to index PDF file stored in SQL Server 2008
Thanks Lance. I'm using Solr 1.4. If I want to use TikaEntityProcessor, do I need to upgrade to Solr 3.1, or can I just import the jar files?

Best Regards, Roy Liu

On Fri, Apr 8, 2011 at 10:22 AM, Lance Norskog goks...@gmail.com wrote: You need the TikaEntityProcessor to unpack the PDF image. You are sticking binary blobs into the index. Tika unpacks the text out of the file. TikaEP is not in Solr 1.4, but it is in the new Solr 3.1 release.

On Thu, Apr 7, 2011 at 7:14 PM, Roy Liu liuchua...@gmail.com wrote: ...

-- Lance Norskog goks...@gmail.com
Need help for solr searching case insensative item
Hi all, I just noticed a weird thing happening in my Solr search results. If I do a search for "ecommons", I cannot get the results for "eCommons"; conversely, if I search for "eCommons", I only get the matches for "eCommons", but not "ecommons". I cannot figure out why. Please help me. Thanks very much in advance.
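The usual cause is an analyzer chain that doesn't lowercase at index and query time. A sketch of a schema.xml field type that does — the type name text_lc is made up, but the tokenizer and filter factories are standard Solr classes:

```xml
<fieldType name="text_lc" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With the same analyzer applied on both the index and query side, "ecommons" and "eCommons" normalize to the same token. After changing the field type, the field must be reindexed.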
Re: How to delete documents from a SOLR cloud / balance the shards in the cloud?
Stephan and all, I am evaluating this like you are. You may want to check http://www.tomkleinpeter.com/2008/03/17/programmers-toolbox-part-3-consistent-hashing/. I would appreciate if others can shed some light on this, too. Bests, James On Fri, Sep 10, 2010 at 6:07 AM, Stephan Raemy stephan.ra...@gmail.comwrote: Hi solr-cloud users, I'm currently setting up a solr-cloud/zookeeper instance and so far, everything works out fine. I downloaded the source from the cloud branch yesterday and build it from source. I've got 10 shards distributed across 4 servers and a zookeeper instance. Searching documents with the flag distrib=true works out and it returns the expected result. But here comes the tricky question. I will add new documents every day and therefore, I'd like to balance my shards to keep the system speedy. The Wiki says that one can calculate the hash of a document id and then determine the corresponding shard. But IMHO, this does not take into account that the cloud may become bigger or shrink over time by adding or removing shards. Obviously adding has a higher priority since one wants to reduce the shard size to improve the response time of distributed searches. When reading through the Wikis and existing documentation, it is still unclear to me how to do the following operations: - Modify/Delete a document stored in the cloud without having to store the document:shard mapping information outside of the cloud. I would expect something like shard attribute on each doc in the SOLR query result (activated/deactivated by a flag), so that i can query the SOLR cloud for a doc and then delete it on the specific shard. - Balance a cloud when adding/removing new shards or just balance them after many deletions. Of course there are solutions to this, but at the end, I'd love to have a true cloud where i do not have to worry about shard performance optimization. Hints are greatly appreciated. Cheers, Stephan
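The consistent-hashing idea from the linked article can be sketched in a few lines of Java. This illustrates the routing technique only — it is not SolrCloud's actual implementation, and the class name, virtual-node scheme, and MD5-based hash are all choices made up for the example:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.TreeMap;

public class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    public ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // Stable 64-bit hash from the first 8 bytes of an MD5 digest.
    private static long hash(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xFFL);
            return h;
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new RuntimeException(e);
        }
    }

    // Each shard occupies many points on the ring for smoother balance.
    public void addShard(String shard) {
        for (int i = 0; i < virtualNodes; i++) ring.put(hash(shard + "#" + i), shard);
    }

    public void removeShard(String shard) {
        for (int i = 0; i < virtualNodes; i++) ring.remove(hash(shard + "#" + i));
    }

    // Route a doc id to the first shard point at or after its hash.
    public String shardFor(String docId) {
        Long k = ring.ceilingKey(hash(docId));
        if (k == null) k = ring.firstKey();
        return ring.get(k);
    }
}
```

The property that matters for rebalancing: when a shard is added, any given document either stays where it was or moves to the new shard, so only a fraction of documents need reindexing — unlike plain hash(id) % numShards, where nearly every document moves.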
match to non tokenizable word (helloworld)
I get no match when searching for "helloworld", even though I have "hello world" in my index. How do people usually deal with this? Write a custom analyzer, with help from a collection of all dictionary words? Thanks for suggestions/comments.
how to stress test solr
Before stress testing, should I disable the Solr caches? Which tool do you use? How do I run a stress test correctly? Any pointers?

-- regards j.L ( I live in Shanghai, China)
weird problem with solr.DateField
Hi, I'm using Solr 1.4 (from a nightly build about 2 months ago) and have this defined in my schema:

<fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
<field name="lastUpdate" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>

and the following code that gets executed once every night:

CommonsHttpSolrServer solrServer = new CommonsHttpSolrServer("http://...");
solrServer.setRequestWriter(new BinaryRequestWriter());
solrServer.add(documents);
solrServer.commit();
UpdateResponse deleteResult = solrServer.deleteByQuery("lastUpdate:[* TO NOW-2HOUR]");
solrServer.commit();

The purpose is to refresh the index with the latest data (in documents). This works fine, except that after a few days I start to see a few documents with no lastUpdate field (query: -lastUpdate:[* TO *]) -- how can that be possible? Thanks in advance.
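Not an answer from the thread, but one defensive approach is to stop relying on the server-side default="NOW" (applied per document at add time) and the NOW parsed later in deleteByQuery, and instead compute both the field value and the delete cutoff on the client from a single instant. A sketch with java.time (Java 8+; the class and field names are made up):

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class RefreshWindow {
    // Build the delete query for everything older than `hours` before the
    // given instant. Instant.toString() is ISO-8601 in UTC, the format
    // Solr date fields expect.
    public static String cutoffQuery(Instant now, int hours) {
        Instant cutoff = now.minus(hours, ChronoUnit.HOURS);
        return "lastUpdate:[* TO " + cutoff + "]";
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2010-01-01T12:00:00Z");
        System.out.println(cutoffQuery(now, 2));
        // prints lastUpdate:[* TO 2010-01-01T10:00:00Z]
    }
}
```

Setting lastUpdate explicitly on every document from the same clock reading, instead of trusting the schema default, also guarantees the field can never be missing — which is the symptom reported here.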
RE: Solr and Garbage Collection
Hi,

I read pretty much all posts on this thread (before and after this one). Looks like the main suggestion from you and others is to keep the max heap size (-Xmx) as small as possible (as long as you don't see OOM exceptions). This brings more questions than answers (for me at least; I'm new to Solr).

First, our environment and the problem encountered: Solr 1.4 (nightly build, downloaded about 2 months ago), Sun JDK 1.6, Tomcat 5.5, running on Solaris (multiple CPUs/cores). The cache settings are from the default solrconfig.xml (they look very small). At first we used minimal JAVA_OPTS and quickly ran into a problem similar to the one the original poster reported -- long pauses (seconds to minutes) under load test. jconsole showed that it pauses on GC. So more JAVA_OPTS got added: -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ParallelGCThreads=8 -XX:SurvivorRatio=2 -XX:NewSize=128m -XX:MaxNewSize=512m -XX:MaxGCPauseMillis=200; the thinking is that with multiple CPUs/cores we can get GC over with as quickly as possible. With the new setup, it works fine until Tomcat reaches the heap limit; then it blocks and takes minutes on a full GC to reclaim space from the tenured generation. We tried different Xmx values (from very small to large) -- no difference in the long GC time. We never ran into OOM.

Questions:

* In general various cachings are good for performance; we have more RAM to use and want to use more caching to boost performance. Isn't your suggestion (of lowering the heap limit) going against that?

* Looks like Solr caching made its way into the tenured generation on the heap; that's good. But why does it get GC'ed eventually? I did a quick check of the Solr code (Solr 1.3, not 1.4) and saw a single instance of using WeakReference. Is that what is causing all this? This seems to suggest a design flaw in Solr's memory management strategy (or just my ignorance about Solr?).
I mean, wouldn't this be the right way of doing it -- you allow user to specify the cache size in solrconfig.xml, then user can set up heap limit in JAVA_OPTS accordingly, and no need to use WeakReference (BTW, why not SoftReference)?? * Right now I have a single Tomcat hosting Solr and other applications. I guess now it's better to have Solr on its own Tomcat, given that it's tricky to adjust the java options. thanks. From: wun...@wunderwood.org To: solr-user@lucene.apache.org Subject: RE: Solr and Garbage Collection Date: Fri, 25 Sep 2009 09:51:29 -0700 30ms is not better or worse than 1s until you look at the service requirements. For many applications, it is worth dedicating 10% of your processing time to GC if that makes the worst-case pause short. On the other hand, my experience with the IBM JVM was that the maximum query rate was 2-3X better with the concurrent generational GC compared to any of their other GC algorithms, so we got the best throughput along with the shortest pauses. Solr garbage generation (for queries) seems to have two major components: per-request garbage and cache evictions. With a generational collector, these two are handled by separate parts of the collector. Per-request garbage should completely fit in the short-term heap (nursery), so that it can be collected rapidly and returned to use for further requests. If the nursery is too small, the per-request allocations will be made in tenured space and sit there until the next major GC. Cache evictions are almost always in long-term storage (tenured space) because an LRU algorithm guarantees that the garbage will be old. Check the growth rate of tenured space (under constant load, of course) while increasing the size of the nursery. That rate should drop when the nursery gets big enough, then not drop much further as it is increased more. After that, reduce the size of tenured space until major GCs start happening too often (a judgment call). 
A bigger tenured space means longer major GCs and thus longer pauses, so you don't want it oversized by too much. Also check the hit rates of your caches. If the hit rate is low, say 20% or less, make that cache much bigger or set it to zero. Either one will reduce the number of cache evictions. If you have an HTTP cache in front of Solr, zero may be the right choice, since the HTTP cache is cherry-picking the easily cacheable requests. Note that a commit nearly doubles the memory required, because you have two live Searcher objects with all their caches. Make sure you have headroom for a commit. If you want to test the tenured space usage, you must test with real world queries. Those are the only way to get accurate cache eviction rates. wunder
anyway to get Document update time stamp
I understand there's no update in Solr/Lucene; it's really delete + insert. Is there any way to get a document's insert timestamp without explicitly creating such a field in the document? If so, how can I query it, for instance to get all documents that are older than 24 hours? Thanks.
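There isn't a built-in per-document timestamp you can query; the usual workaround (this mirrors the timestamp field shipped in Solr's example schema) is a date field that defaults to NOW at add time:

```xml
<field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
```

Documents older than 24 hours are then timestamp:[* TO NOW-24HOURS] (or NOW/DAY-1DAY to round to day boundaries). The field does have to exist in the schema before the documents are indexed, though — it can't be recovered after the fact.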
Re: Is it problem? I use solr to search and index is made by lucene. (not EmbeddedSolrServer(wiki is old))
Solr has many field types, like integer, long, double, sint, sfloat, tint, tfloat, and more. But Lucene has no field types -- just name and value, and the value is only a string. So I'm not sure whether it is a problem when I use Solr to search (over an index made by Lucene).

-- regards j.L ( I live in Shanghai, China)
IndexMerge not found
I tried http://wiki.apache.org/solr/MergingSolrIndexes

System: win2003, jdk 1.6

Error information:

Caused by: java.lang.ClassNotFoundException: org.apache.lucene.misc.IndexMergeTool
  at java.net.URLClassLoader$1.run(Unknown Source)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(Unknown Source)
  at java.lang.ClassLoader.loadClass(Unknown Source)
  at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
  at java.lang.ClassLoader.loadClass(Unknown Source)
  at java.lang.ClassLoader.loadClassInternal(Unknown Source)
Could not find the main class: org/apache/lucene/misc/IndexMergeTool. Program will exit.

-- regards j.L ( I live in Shanghai, China)
Re: IndexMerge not found
I use lucene-core-2.9-dev.jar and lucene-misc-2.9-dev.jar.

On Thu, Jul 2, 2009 at 2:02 PM, James liu liuping.ja...@gmail.com wrote: ...

-- regards j.L ( I live in Shanghai, China)
Is it problem? I use solr to search and index is made by lucene. (not EmbeddedSolrServer(wiki is old))
I use Solr to search, and the index is made by Lucene (not EmbeddedSolrServer -- the wiki is old). Is it a problem when I use Solr to search? What is the difference between an index made by Lucene and one made by Solr? Thanks.

-- regards j.L ( I live in Shanghai, China)
DisMaxRequestHandler usage
Hi, I have this standard query:

q=(field1:hello OR field2:hello) AND (field3:world)

Can I use the dismax handler for this (applying the same search term to field1 and field2, but keeping field3 as something separate)? If it can be done, what's the advantage of doing it this way over using the standard query? Thanks.
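A sketch of the equivalent dismax request — q, qf, and fq are standard Solr/dismax parameters, though depending on version you may select the handler with qt=dismax rather than defType:

```
q=hello&defType=dismax&qf=field1 field2&fq=field3:world
```

dismax spreads the user's words across the qf fields (with optional per-field boosts, e.g. qf=field1^2 field2) and is forgiving of stray operators and unbalanced quotes in user input, while fq keeps the field3 constraint as a separately cached filter. The standard query parser, by contrast, gives you exact boolean control.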
does solr support summary
When a user searches with a keyword, can they get a summary auto-generated around that keyword, like this? Doc fields: id, text.

id: 001
text: Open source is a development method for software that harnesses the power of distributed peer review and transparency of process. The promise of open source is better quality, higher reliability, more flexibility, lower cost, and an end to predatory vendor lock-in.

If the keyword is "source", the summary is: "Open source is a development...The promise of open source is better quality". If the keyword is "power", the summary is: "Open...harnesses the power of distributed peer review and transparency of process..." -- just like Google search results. Any advice will be appreciated.

-- regards j.L ( I live in Shanghai, China)
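What's described here is Solr's highlighting feature. A sketch of the request parameters (hl, hl.fl, hl.fragsize, and hl.snippets are standard highlighting params; the field name is from the example above):

```
q=source&hl=true&hl.fl=text&hl.fragsize=100&hl.snippets=2
```

The response then carries a highlighting section per document with keyword-centered fragments, the matched terms wrapped in <em> tags by default. The field must be stored for highlighting to work.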
Query faceting
Hi, I have a field called service with the following values:

- Shuttle Services
- Senior Discounts
- Laundry Rooms
- ...

When I run a query with facet=true&facet.field=service&facet.limit=-1, I get something like this back:

- shuttle 2
- service 3
- senior 0
- laundry 0
- room 3
- ...

Questions:
- How do I keep field values from being broken up into words, so I can get something like "Shuttle Services 2" back?
- How do I tell Solr not to return facets with a 0 count? The query takes a long time to finish, seemingly because of the long list of items with 0 count.

Thanks for any advice.
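Both questions have standard answers: facet on an untokenized (string) copy of the field so values stay whole, and set facet.mincount=1 to drop zero counts. A sketch — the field name service_exact is made up; string, copyField, and facet.mincount are standard Solr features:

```xml
<field name="service_exact" type="string" indexed="true" stored="false" multiValued="true"/>
<copyField source="service" dest="service_exact"/>
```

Then facet on the copy:

```
q=*:*&facet=true&facet.field=service_exact&facet.limit=-1&facet.mincount=1
```

This returns "Shuttle Services 2" style entries and skips the zero-count tail, which should also shorten the response.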
Re: timeouts
Collins: I'm not sure what you mean.

-- regards j.L ( I live in Shanghai, China)
Re: indexing Chienese langage
First: you don't have to restart Solr -- you can replace the old data with new data and tell Solr to switch to the new index. You can find something for this in the shell scripts that ship with Solr.

Second: you don't have to restart Solr -- just keep the id the same. Example: old doc id:1, title:hi; new doc id:1, title:welcome. Just index the new data; it will delete the old doc and insert the new one, like a replace, but it will take more time and resources. You can find the indexed doc count on the Solr admin page.

On Fri, Jun 5, 2009 at 7:42 AM, Fer-Bj fernando.b...@gmail.com wrote: What we usually do to reindex is:

1. stop solr
2. rmdir -r data (that is to remove everything in /opt/solr/data/)
3. mkdir data
4. start solr
5. start reindex.

With this we're sure about not having old copies of the index. To check the index size we do:

cd data
du -sh

Otis Gospodnetic wrote: I can't tell what that analyzer does, but I'm guessing it uses n-grams? Maybe consider trying https://issues.apache.org/jira/browse/LUCENE-1629 instead? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message - From: Fer-Bj fernando.b...@gmail.com To: solr-user@lucene.apache.org Sent: Thursday, June 4, 2009 2:20:03 AM Subject: Re: indexing Chienese langage

We are trying SOLR 1.3 with the Paoding Chinese Analyzer, and after reindexing the index size went from 1.5 Gb to 2.7 Gb. Is that expected behavior? Is there any switch or trick to avoid having a doubled index file size?

Koji Sekiguchi-2 wrote: CharFilter can normalize (convert) traditional Chinese to simplified Chinese or vice versa, if you define mapping.txt. Here is a sample of Chinese character normalization: https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG See SOLR-822 for the detail: https://issues.apache.org/jira/browse/SOLR-822 Koji

revathy arun wrote: Hi, When I index chinese content using the chinese tokenizer and analyzer in solr 1.3, some of the chinese text files are getting indexed but others are not. Since chinese has got many different language subtypes, as in standard chinese, simplified chinese etc., which of these does the chinese tokenizer support, and is there any method to find the type of chinese language from the file? Rgds

-- regards j.L ( I live in Shanghai, China)
Re: indexing Chienese langage
On Mon, Feb 16, 2009 at 4:30 PM, revathy arun revas...@gmail.com wrote: Hi, When I index chinese content using the chinese tokenizer and analyzer in solr 1.3, some of the chinese text files are getting indexed but others are not.

Are you sure your analyzer handles them well? If not sure, you can use the analysis link on the Solr admin page to check it.

Since chinese has got many different language subtypes, as in standard chinese, simplified chinese etc., which of these does the chinese tokenizer support, and is there any method to find the type of chinese language from the file? Rgds

-- regards j.L ( I live in Shanghai, China)
Re: Using Chinese / How to ?
1. Modify your schema.xml, e.g.:

<fieldType name="text_cn" class="solr.TextField">
  <analyzer class="chineseAnalyzer"/>
</fieldType>

2. Add your field:

<field name="urfield" type="text_cn" indexed="true" stored="true"/>

3. Add your analyzer jar to {solr_dir}\lib\
4. Rebuild Solr and you will find the new build in {solr_dir}\dist
5. Follow the tutorial to set up Solr
6. Open your browser to the Solr admin page and use the analysis page to check the analyzer; it will show you how each word is analyzed and which analyzer is used.

-- regards j.L ( I live in Shanghai, China)
Re: Using Chinese / How to ?
Do you mean how to configure Solr to support Chinese? Or an update problem?

On Tuesday, June 2, 2009, Fer-Bj fernando.b...@gmail.com wrote: I'm sending 3 files: - schema.xml - solrconfig.xml - error.txt (with the error description) I can confirm by now that this error is due to invalid characters for the XML format (ASCII 0 or 11). However, this problem now is taking a different direction: how to start using the CJK instead of the english! http://www.nabble.com/file/p23825881/error.txt error.txt http://www.nabble.com/file/p23825881/solrconfig.xml solrconfig.xml http://www.nabble.com/file/p23825881/schema.xml schema.xml

Grant Ingersoll-6 wrote: Can you provide details on the errors? I don't think we have a specific how-to, but I wouldn't think it would be much different from 1.2 -Grant

On May 31, 2009, at 10:31 PM, Fer-Bj wrote: Hello, is there any how-to already created to get me up and running with SOLR 1.3 for a Chinese-based website? Currently our site is using SOLR 1.2, and we tried to move to 1.3, but we couldn't complete our reindex, as it seems 1.3 is stricter when it comes to special chars. I would appreciate any help anyone may provide on this. Thanks!!

-- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search

-- regards j.L ( I live in Shanghai, China)
Re: Solr multiple keyword search as google
You can find the answer in the tutorial or the examples.

On Tuesday, June 2, 2009, The Spider maheshmura...@rediffmail.com wrote: Hi, I am using a solr nightly build for my search. I have to search in the location field of the table, which is not my default search field. I will briefly explain my requirement below: I want to get the same/similar result when I give the location multiple keywords, say "San jose ca USA" or "USA ca san jose" or "CA San jose USA" (like a google search). That means even if I rearrange the keywords of the location, I want to get proper results. Is there any way to do that? Thanks in advance

-- regards j.L ( I live in Shanghai, China)
RE: Creating a distributed search in a searchComponent
I was looking for an answer to the same question, and have a similar concern. Looks like any serious customization work requires developing a custom SearchComponent, but it's not clear to me how the Solr designers wanted this to be done. I'm more inclined to either do it at the Lucene level, or stay on the client side and use something like multi-core (as discussed here: http://wiki.apache.org/solr/MultipleIndexes).

Date: Wed, 20 May 2009 13:47:20 -0400 Subject: RE: Creating a distributed search in a searchComponent From: nicholas.bai...@rackspace.com To: solr-user@lucene.apache.org

It seems I sent this out a bit too soon. After looking at the source, it seems there are two separate paths for distributed and regular queries; however, the prepare method for all components is run before the shards parameter is checked. So I can build the shards portion using the prepare method of my own search component. However, I'm not sure if this is the greatest idea in case Solr changes at some point. -Nick

-Original Message- From: Nick Bailey nicholas.bai...@rackspace.com Sent: Wednesday, May 20, 2009 1:29pm To: solr-user@lucene.apache.org Subject: Creating a distributed search in a searchComponent

Hi, I am wondering if it is possible to basically add the distributed portion of a search query inside of a searchComponent. I am hoping to build my own component and add it as a first-component to the StandardRequestHandler. Then hopefully I will be able to use this component to build the shards parameter of the query and have the Handler then treat the query as a distributed search. Anyone have any experience or know if this is possible? Thanks, Nick