Re: Difference Between Indexing and Reindexing
Hi Otis, then what is the difference between add and update? And how do we update or add documents in Solr? (I see that there is just one update handler.)

2013/4/4 Otis Gospodnetic otis.gospodne...@gmail.com
[...]
Re: Difference Between Indexing and Reindexing
Technically, update and add are identical from a user perspective: you don't need to worry about whether the document already exists. But there is another, newer form of update, the selective or atomic update, which updates a subset of the fields in an existing document without needing to re-send all of the other fields of that document. See: http://wiki.apache.org/solr/Atomic_Updates

But none of this has to do with indexing vs. reindexing. You need to be clear about the real question you are trying to ask; otherwise we can keep following your questions, answering each in detail and bouncing all over the place, without understanding what it is that you are really looking for. More specifically, what exactly is the problem you are trying to solve?

-- Jack Krupansky

-Original Message-
From: Furkan KAMACI
Sent: Thursday, April 04, 2013 2:45 AM
To: solr-user@lucene.apache.org
Subject: Re: Difference Between Indexing and Reindexing

[...]
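To make the distinction in the message above concrete, here is a minimal sketch of the two JSON payload shapes: a normal add/update that re-sends every field, and an atomic update that sends only the changed field wrapped in a "set" modifier. The helper names, the unique key field "id", and the fields "title" and "content" are assumptions for illustration only; they do not come from the thread, and the exact requirements for atomic updates are described on the wiki page linked above.

```python
import json


def full_document(doc_id, fields):
    """Payload for a normal add/update: every field of the document is sent."""
    doc = {"id": doc_id}  # "id" as the unique key is an assumption
    doc.update(fields)
    return [doc]


def atomic_update(doc_id, changes):
    """Payload for an atomic update: only changed fields, each wrapped in "set"."""
    doc = {"id": doc_id}
    for field, value in changes.items():
        doc[field] = {"set": value}
    return [doc]


# Hypothetical document and field names, for illustration only.
full = full_document("page-1", {"title": "Old title", "content": "Body text"})
partial = atomic_update("page-1", {"title": "New title"})

print(json.dumps(full))
print(json.dumps(partial))
```

Either payload would go to the same update handler, which may be why only one handler is visible: the structure of the document, not the endpoint, distinguishes the two cases.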
Re: Difference Between Indexing and Reindexing
I crawl web pages with Nutch and send them to Solr for indexing. There are two parameters for sending data to Solr: one is -index and the other is -reindex. I just want to learn what they do.

2013/4/4 Jack Krupansky j...@basetechnology.com
[...]
Re: Difference Between Indexing and Reindexing
That's a question for the Nutch email list. In Solr, reindexing simply means that you manually delete your full Solr index (or at least delete all documents using a query) and fully ingest all documents again, from scratch. There is no option for it; it's just something that you, the user/developer, do manually.

But, as I said, it sounds like your question is not for us here at the Solr list, but for the Nutch guys on their list.

-- Jack Krupansky

-Original Message-
From: Furkan KAMACI
Sent: Thursday, April 04, 2013 9:03 AM
To: solr-user@lucene.apache.org
Subject: Re: Difference Between Indexing and Reindexing

[...]
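The manual reindex described in the message above (delete everything with a query, re-ingest, commit) can be sketched as a sequence of JSON update requests. This is only an illustration under stated assumptions: the "/update" path appended to the http://localhost:8983/solr URL used elsewhere in this thread, the helper name reindex_requests, and the sample document are all hypothetical, and the sketch only builds the request bodies rather than sending them.

```python
import json


def reindex_requests(update_url, documents):
    """Return the ordered (url, body) pairs for a manual full reindex."""
    return [
        # 1. Delete every document using a match-all query.
        (update_url, json.dumps({"delete": {"query": "*:*"}})),
        # 2. Re-send all documents from scratch.
        (update_url, json.dumps(documents)),
        # 3. Commit so the changes become visible to searches.
        (update_url, json.dumps({"commit": {}})),
    ]


# The handler path and the sample document are assumptions for illustration.
steps = reindex_requests(
    "http://localhost:8983/solr/update",
    [{"id": "page-1", "title": "Example"}],
)

for url, body in steps:
    print("POST", url, body)
```

Each body would be posted with a JSON content type; the point is that nothing here is a special "reindex" operation, just ordinary delete and add requests issued by the user.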
RE: Difference Between Indexing and Reindexing
I assume you're using Nutch 2.x? Nutch 1.x does not have such an option, and I find it strange to hear that 2.x does. It really makes no sense to have a -reindex option, and it should be removed. I'd recommend sticking to plain indexing.

-Original message-
From: Jack Krupansky j...@basetechnology.com
Sent: Thu 04-Apr-2013 15:31
To: solr-user@lucene.apache.org
Subject: Re: Difference Between Indexing and Reindexing

[...]
Re: Difference Between Indexing and Reindexing
On Thu, Apr 4, 2013 at 9:03 AM, Furkan KAMACI furkankam...@gmail.com wrote:
> I crawl web pages with Nutch and send them to Solr for indexing. There are two parameters to send data into Solr. One of them is -index and the other one is -reindex. I just want to learn what they do.

Are you sure this is not a Nutch-side issue? Perhaps it means revisiting the already-indexed pages and resubmitting them to Solr (which will overwrite the existing entries). A quick look at the Nutch source might tell you.

Another way to check is to look at the Solr side and see what kind of requests are being made in both cases. The logs should show that. I strongly doubt Solr's new 'update' functionality is being used by Nutch at this point.

Regards, Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
Re: Difference Between Indexing and Reindexing
On 4 April 2013 18:33, Furkan KAMACI furkankam...@gmail.com wrote:
> I crawl web pages with Nutch and send them to Solr for indexing. There are two parameters to send data into Solr. One of them is -index and the other one is -reindex. I just want to learn what they do.
> [...]

Which version of Nutch are you using? Unless I have completely missed something, both 1.6 and 2.1 use solrindex: http://wiki.apache.org/nutch/NutchTutorial#A6._Integrate_Solr_with_Nutch

Where do you see -index and -reindex?

Regards, Gora
Re: Difference Between Indexing and Reindexing
I use Nutch 2.1, and I run these commands:

bin/nutch solrindex http://localhost:8983/solr -index
bin/nutch solrindex http://localhost:8983/solr -reindex

2013/4/4 Gora Mohanty g...@mimirtech.com
[...]
Re: Difference Between Indexing and Reindexing
On 4 April 2013 19:29, Furkan KAMACI furkankam...@gmail.com wrote:
> I use Nutch 2.1 and using that:
> bin/nutch solrindex http://localhost:8983/solr -index
> bin/nutch solrindex http://localhost:8983/solr -reindex
> [...]

Sorry, but are you sure that you are using 2.1? Here is what I get with:

./bin/nutch solrindex
Usage: SolrIndexer solr url crawldb [-linkdb linkdb] [-params k1=v1k2=v2...] (segment ... | -dir segments) [-noCommit] [-deleteGone] [-filter] [-normalize]

i.e., there are no index/reindex options, nor do I see any in the code for src/java/org/apache/nutch/indexer/solr/SolrIndexer.java

Regards, Gora
Re: Difference Between Indexing and Reindexing
On 4 April 2013 20:16, Gora Mohanty g...@mimirtech.com wrote:
> Sorry, but are you sure that you are using 2.1? Here is what I get with: ./bin/nutch solrindex
> [...]

I am running in local mode, however, as I do not currently have access to a Hadoop cluster.

Regards, Gora
Re: Difference Between Indexing and Reindexing
It may be a deprecated usage (or maybe not), but I can certainly run -index and -reindex on Nutch 2.1.

2013/4/4 Gora Mohanty g...@mimirtech.com
[...]
Re: Difference Between Indexing and Reindexing
Could you guys please take this discussion offline or over to a Nutch mailing list, where it belongs? This has nothing to do with Solr.

-- Jack Krupansky

-Original Message-
From: Gora Mohanty
Sent: Thursday, April 04, 2013 10:46 AM
To: solr-user@lucene.apache.org
Subject: Re: Difference Between Indexing and Reindexing

[...]
Difference Between Indexing and Reindexing
OK, this may be a very easy question, but I want to learn a bit more of the technical detail. When I use Nutch to send documents to Solr for indexing, there are two parameters: -index and -reindex. What does Solr do differently for each one?
Re: Difference Between Indexing and Reindexing
I don't recall what Nutch does, so it's hard to tell. In Solr (Lucene, really), you can:

* add documents
* update documents
* delete documents

Currently, update is really a delete + re-add under the hood. It's been like that for 13+ years, but this may change: https://issues.apache.org/jira/browse/LUCENE-4258

Otis
--
Solr ElasticSearch Support http://sematext.com/

On Wed, Apr 3, 2013 at 9:15 PM, Furkan KAMACI furkankam...@gmail.com wrote:
[...]
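The "delete + re-add" behaviour mentioned above can be illustrated with a toy in-memory model. This is not Lucene's or Solr's API; the ToyIndex class, the "id" key, and the field names are all hypothetical and exist only to show why an ordinary update has to carry every field of the document.

```python
class ToyIndex:
    """In-memory stand-in for an index keyed by a unique id (not Lucene's API)."""

    def __init__(self):
        self.docs = {}

    def add(self, doc):
        # Store a copy so later changes to the caller's dict do not leak in.
        self.docs[doc["id"]] = dict(doc)

    def delete(self, doc_id):
        self.docs.pop(doc_id, None)

    def update(self, doc):
        # An update is modelled as a delete followed by a re-add of the
        # whole document, so any field that is not re-sent is lost.
        self.delete(doc["id"])
        self.add(doc)


index = ToyIndex()
index.add({"id": "page-1", "title": "Old", "content": "Body"})
index.update({"id": "page-1", "title": "New"})

print(index.docs["page-1"])
```

In this model the "content" field disappears after the update because it was not re-sent, which is the cost that a full-document update carries.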