Re: Difference Between Indexing and Reindexing

2013-04-04 Thread Furkan KAMACI
Hi Otis, then what is the difference between add and update? And how do we
update or add documents in Solr (I see that there is just one update
handler)?


2013/4/4 Otis Gospodnetic otis.gospodne...@gmail.com

 I don't recall what Nutch does, so it's hard to tell.

 In Solr (Lucene, really), you can:
 * add documents
 * update documents
 * delete documents

 Currently, update is really a delete + readd under the hood.  It's
 been like that for 13+ years, but this may change:
 https://issues.apache.org/jira/browse/LUCENE-4258

 Otis
 --
 Solr & ElasticSearch Support
 http://sematext.com/





 On Wed, Apr 3, 2013 at 9:15 PM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  OK, this may be a very easy question, but I want to learn a bit more of
  the technical detail.
  When I use Nutch to send documents to Solr for indexing, there are two
  parameters:
 
  -index and -reindex.
 
  What does Solr do differently for each one?



Re: Difference Between Indexing and Reindexing

2013-04-04 Thread Jack Krupansky
Technically, update and add are identical from a user perspective - you 
don't need to worry about whether the document already exists.


But there is another, newer form of update, called selective or atomic
update, which updates a subset of the fields in an existing document without
needing to re-send all of the other fields of the existing document.

See:
http://wiki.apache.org/solr/Atomic_Updates
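
As a rough sketch of the difference, the two request bodies might be built like this. The field names (id, title, content) are made up for illustration, and the sketch assumes the unique key field is called id and that the JSON is posted to the update handler; only the "set" modifier comes from the Atomic Updates page. As far as I understand it, atomic updates also need the other fields to be stored so that Solr can rebuild the document.

```python
import json


def full_update(doc_id, fields):
    """Build a JSON body that re-sends the whole document (add or overwrite)."""
    doc = {"id": doc_id}  # assumption: the unique key field is named "id"
    doc.update(fields)
    return json.dumps([doc])


def atomic_update(doc_id, changes):
    """Build a JSON body that sets only the named fields of an existing document."""
    doc = {"id": doc_id}
    for field, value in changes.items():
        # "set" replaces the value of this one field and leaves the others alone
        doc[field] = {"set": value}
    return json.dumps([doc])


# Hypothetical document and field names, for illustration only.
print(full_update("page-1", {"title": "Home", "content": "Welcome"}))
print(atomic_update("page-1", {"title": "Home page"}))
```

Nothing is sent to Solr here; the functions only show the shape of the two payloads.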

But none of this has to do with indexing vs. reindexing. You need to be
clear about the real question you are trying to ask; otherwise we can keep
following your questions, answering each in detail, bouncing all over the
place without understanding what it is that you are really looking for.


More specifically, what exactly is the problem you are trying to solve?

-- Jack Krupansky

-Original Message- 
From: Furkan KAMACI

Sent: Thursday, April 04, 2013 2:45 AM
To: solr-user@lucene.apache.org
Subject: Re: Difference Between Indexing and Reindexing

Hi Otis, then what is the difference between add and update? And how do we
update or add documents in Solr (I see that there is just one update
handler)?




Re: Difference Between Indexing and Reindexing

2013-04-04 Thread Furkan KAMACI
I crawl web pages with Nutch and send them to Solr for indexing. There are two
parameters for sending data to Solr. One of them is -index and the other one
is -reindex. I just want to learn what they do.


2013/4/4 Jack Krupansky j...@basetechnology.com

 Technically, update and add are identical from a user perspective - you
 don't need to worry about whether the document already exists.

 But there is another, newer form of update, called selective or atomic
 update, which updates a subset of the fields in an existing document
 without needing to re-send all of the other fields of the existing document.
 See:
 http://wiki.apache.org/solr/Atomic_Updates

 But none of this has to do with indexing vs. reindexing. You need to be
 clear about the real question you are trying to ask; otherwise we can keep
 following your questions, answering each in detail, bouncing all over the
 place without understanding what it is that you are really looking for.

 More specifically, what exactly is the problem you are trying to solve?

 -- Jack Krupansky



Re: Difference Between Indexing and Reindexing

2013-04-04 Thread Jack Krupansky

That's a question for the Nutch email list.

In Solr, reindexing simply means that you manually delete your full Solr 
index (or at least delete all documents using a query) and fully ingest all 
documents, from scratch. There is no option, it's just something that you, 
the user/developer, do manually.
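
A minimal sketch of that manual sequence, expressed as the JSON bodies one might post to the update handler in order. The helper name and the sample documents are made up for illustration; the match-all query *:* is the usual way to delete every document by query.

```python
import json


def reindex_requests(docs):
    """Return ordered (step, JSON body) pairs for a from-scratch reindex."""
    return [
        # 1. wipe the index by deleting everything that matches *:*
        ("delete all documents", json.dumps({"delete": {"query": "*:*"}})),
        # 2. send every document again, in full
        ("re-add all documents", json.dumps(docs)),
        # 3. make the changes visible to searches
        ("commit", json.dumps({"commit": {}})),
    ]


# Hypothetical documents, for illustration only.
steps = reindex_requests([
    {"id": "page-1", "title": "Home"},
    {"id": "page-2", "title": "About"},
])
for step, body in steps:
    print(step, "->", body)
```

The point is only the ordering: delete, re-ingest, commit. How the documents are produced (a crawler, a database export, and so on) is up to whatever feeds Solr.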


But, as I said... it sounds like your question is not for us here at the 
Solr list, but for the Nutch guys on their list.


-- Jack Krupansky

-Original Message- 
From: Furkan KAMACI

Sent: Thursday, April 04, 2013 9:03 AM
To: solr-user@lucene.apache.org
Subject: Re: Difference Between Indexing and Reindexing

I crawl web pages with Nutch and send them to Solr for indexing. There are two
parameters for sending data to Solr. One of them is -index and the other one
is -reindex. I just want to learn what they do.




RE: Difference Between Indexing and Reindexing

2013-04-04 Thread Markus Jelsma
I assume you're using Nutch 2.x? Nutch 1.x does not have such an option and I
find it strange to hear that 2.x does. It really makes no sense to have a
-reindex option and it should be removed. I'd recommend sticking to plain
indexing.
 
-Original message-
 From:Jack Krupansky j...@basetechnology.com
 Sent: Thu 04-Apr-2013 15:31
 To: solr-user@lucene.apache.org
 Subject: Re: Difference Between Indexing and Reindexing
 
 That's a question for the Nutch email list.
 
 In Solr, reindexing simply means that you manually delete your full Solr 
 index (or at least delete all documents using a query) and fully ingest all 
 documents, from scratch. There is no option, it's just something that you, 
 the user/developer, do manually.
 
 But, as I said... it sounds like your question is not for us here at the 
 Solr list, but for the Nutch guys on their list.
 
 -- Jack Krupansky
 


Re: Difference Between Indexing and Reindexing

2013-04-04 Thread Alexandre Rafalovitch
On Thu, Apr 4, 2013 at 9:03 AM, Furkan KAMACI furkankam...@gmail.com wrote:

 I crawl web pages with Nutch and send them to Solr for indexing. There are two
 parameters for sending data to Solr. One of them is -index and the other one
 is -reindex. I just want to learn what they do.


Are you sure this is not a Nutch-side issue? Perhaps it means revisiting the
already-indexed pages and resubmitting them to Solr (which will overwrite the
entries). A quick look at the Nutch source might be able to tell you that.

Another way to check is to look on the Solr side and see what kind of requests
are being made in both cases. The logs should show that.

I strongly doubt Solr's new 'update' functionality is being used by Nutch
at this point.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


Re: Difference Between Indexing and Reindexing

2013-04-04 Thread Gora Mohanty
On 4 April 2013 18:33, Furkan KAMACI furkankam...@gmail.com wrote:
 I crawl web pages with Nutch and send them to Solr for indexing. There are two
 parameters for sending data to Solr. One of them is -index and the other one
 is -reindex. I just want to learn what they do.
[...]

Which version of Nutch are you using?
Unless I have completely missed something, both 1.6 and 2.1
use solrindex: 
http://wiki.apache.org/nutch/NutchTutorial#A6._Integrate_Solr_with_Nutch

Where do you see -index and -reindex?

Regards,
Gora


Re: Difference Between Indexing and Reindexing

2013-04-04 Thread Furkan KAMACI
I use Nutch 2.1 and run it like this:

bin/nutch solrindex http://localhost:8983/solr -index
bin/nutch solrindex http://localhost:8983/solr -reindex


2013/4/4 Gora Mohanty g...@mimirtech.com

 On 4 April 2013 18:33, Furkan KAMACI furkankam...@gmail.com wrote:
  I crawl web pages with Nutch and send them to Solr for indexing. There are
  two parameters for sending data to Solr. One of them is -index and the
  other one is -reindex. I just want to learn what they do.
 [...]

 Which version of Nutch are you using?
 Unless I have completely missed something, both 1.6 and 2.1
 use solrindex:
 http://wiki.apache.org/nutch/NutchTutorial#A6._Integrate_Solr_with_Nutch

 Where do you see -index and -reindex?

 Regards,
 Gora



Re: Difference Between Indexing and Reindexing

2013-04-04 Thread Gora Mohanty
On 4 April 2013 19:29, Furkan KAMACI furkankam...@gmail.com wrote:
 I use Nutch 2.1 and run it like this:

 bin/nutch solrindex http://localhost:8983/solr -index
 bin/nutch solrindex http://localhost:8983/solr -reindex
[...]

Sorry, but are you sure that you are using 2.1? Here is
what I get with:
./bin/nutch solrindex

Usage: SolrIndexer <solr url> <crawldb> [-linkdb <linkdb>] [-params
k1=v1&k2=v2...] (<segment> ... | -dir <segments>) [-noCommit]
[-deleteGone] [-filter] [-normalize]

i.e., there are no -index/-reindex options, nor do I see any in
the code for src/java/org/apache/nutch/indexer/solr/SolrIndexer.java

Regards,
Gora


Re: Difference Between Indexing and Reindexing

2013-04-04 Thread Gora Mohanty
On 4 April 2013 20:16, Gora Mohanty g...@mimirtech.com wrote:
 On 4 April 2013 19:29, Furkan KAMACI furkankam...@gmail.com wrote:
 I use Nutch 2.1 and run it like this:

 bin/nutch solrindex http://localhost:8983/solr -index
 bin/nutch solrindex http://localhost:8983/solr -reindex
 [...]

 Sorry, but are you sure that you are using 2.1? Here is
 what I get with:
 ./bin/nutch solrindex
[...]

I am running in local mode, however, as I do not currently
have access to a Hadoop cluster.

Regards,
Gora


Re: Difference Between Indexing and Reindexing

2013-04-04 Thread Furkan KAMACI
It may be a deprecated usage (maybe not), but I can certainly run -index and
-reindex on Nutch 2.1.


2013/4/4 Gora Mohanty g...@mimirtech.com

 On 4 April 2013 20:16, Gora Mohanty g...@mimirtech.com wrote:
  On 4 April 2013 19:29, Furkan KAMACI furkankam...@gmail.com wrote:
  I use Nutch 2.1 and run it like this:
 
  bin/nutch solrindex http://localhost:8983/solr -index
  bin/nutch solrindex http://localhost:8983/solr -reindex
  [...]
 
  Sorry, but are you sure that you are using 2.1? Here is
  what I get with:
  ./bin/nutch solrindex
 [...]

 I am running in local mode, however, as I do not currently
 have access to a Hadoop cluster.

 Regards,
 Gora



Re: Difference Between Indexing and Reindexing

2013-04-04 Thread Jack Krupansky
Could you guys please take this discussion offline or over to a Nutch 
mailing list - where it belongs? This has nothing to do with Solr.


-- Jack Krupansky
-Original Message- 
From: Gora Mohanty

Sent: Thursday, April 04, 2013 10:46 AM
To: solr-user@lucene.apache.org
Subject: Re: Difference Between Indexing and Reindexing

On 4 April 2013 19:29, Furkan KAMACI furkankam...@gmail.com wrote:

I use Nutch 2.1 and run it like this:

bin/nutch solrindex http://localhost:8983/solr -index
bin/nutch solrindex http://localhost:8983/solr -reindex

[...]

Sorry, but are you sure that you are using 2.1? Here is
what I get with:
./bin/nutch solrindex

Usage: SolrIndexer <solr url> <crawldb> [-linkdb <linkdb>] [-params
k1=v1&k2=v2...] (<segment> ... | -dir <segments>) [-noCommit]
[-deleteGone] [-filter] [-normalize]

i.e., there are no -index/-reindex options, nor do I see any in
the code for src/java/org/apache/nutch/indexer/solr/SolrIndexer.java

Regards,
Gora 



Difference Between Indexing and Reindexing

2013-04-03 Thread Furkan KAMACI
OK, this may be a very easy question, but I want to learn a bit more of
the technical detail.
When I use Nutch to send documents to Solr for indexing, there are two
parameters:

-index and -reindex.

What does Solr do differently for each one?


Re: Difference Between Indexing and Reindexing

2013-04-03 Thread Otis Gospodnetic
I don't recall what Nutch does, so it's hard to tell.

In Solr (Lucene, really), you can:
* add documents
* update documents
* delete documents

Currently, update is really a delete + readd under the hood.  It's
been like that for 13+ years, but this may change:
https://issues.apache.org/jira/browse/LUCENE-4258
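
To make the "delete + re-add" point concrete, here is a toy model in Python. It is not Lucene or Solr code, just a dictionary keyed on a unique id (the class and field names are invented for illustration), but it shows why an add and an update look the same from the outside, and why a field you do not re-send is gone afterwards.

```python
class ToyIndex:
    """Toy stand-in for an index keyed on a unique id (not Lucene code)."""

    def __init__(self):
        self.docs = {}
        self.removed = 0  # how many stored documents have been removed so far

    def delete(self, doc_id):
        # Remove the document with this id, if there is one.
        if self.docs.pop(doc_id, None) is not None:
            self.removed += 1

    def add(self, doc):
        # An "update" is the same call as an add: drop any old copy,
        # then store the new document in full.
        self.delete(doc["id"])
        self.docs[doc["id"]] = dict(doc)


index = ToyIndex()
index.add({"id": "page-1", "title": "Home", "content": "Welcome"})
index.add({"id": "page-1", "title": "Home page"})  # same id, so this replaces it
print(index.docs["page-1"])  # the "content" field was not re-sent, so it is gone
print(index.removed)         # one old copy was removed along the way
```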

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Wed, Apr 3, 2013 at 9:15 PM, Furkan KAMACI furkankam...@gmail.com wrote:
 OK, this may be a very easy question, but I want to learn a bit more of
 the technical detail.
 When I use Nutch to send documents to Solr for indexing, there are two
 parameters:

 -index and -reindex.

 What does Solr do differently for each one?