Re: Re: Re: Solr and Nutch/Droids - to use or not to use?

2010-06-17 Thread Otis Gospodnetic
Mitch,

If you use Nutch+Solr then you wouldn't *index* the fetched content with Nutch.
Solr doesn't know anything about OPIC, but I suppose you can feed the OPIC 
score computed by Nutch into a Solr field and use it during scoring, if you 
want, say with a function query.
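
A minimal sketch of what I mean, assuming the Nutch-computed score has been indexed into a numeric field that I will call `opic_score` here. That name, the `content` field, and the host are made up for illustration. It only builds the request for a dismax query with an additive boost function (`bf`), so treat it as a starting point rather than a tuned setup.

```python
from urllib.parse import urlencode

def build_boosted_query(user_query, score_field="opic_score",
                        base_url="http://localhost:8983/solr/select"):
    """Return a Solr select URL that adds a function of score_field to the text score."""
    params = {
        "q": user_query,
        "defType": "dismax",   # the dismax parser understands the bf parameter
        "qf": "content",       # hypothetical text field to search in
        # log(sum(x,1)) dampens large scores and avoids taking the log of 0
        "bf": "log(sum(%s,1))" % score_field,
        "fl": "*,score",       # return the final score so the effect is visible
    }
    return base_url + "?" + urlencode(params)

print(build_boosted_query("nutch solr"))
```

How strongly the function should count against the text score is something you would have to experiment with on your own data.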

Yes, ES has built-in support for sharding and replication.  It also makes it 
easy to implement custom scoring, which may work for OPIC here.


Yes, ask questions here. :)

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/





Re: Re: Re: Solr and Nutch/Droids - to use or not to use?

2010-06-17 Thread MitchK



 Solr doesn't know anything about OPIC, but I suppose you can feed the OPIC
 score computed by Nutch into a Solr field and use it during scoring, if
 you want, say with a function query. 
 
Oh! Yes, that makes more sense than using the OPIC score as a doc boost value. :-)
Somewhere on the Lucene mailing lists I read that in the future it will be
possible to change a field's contents without reindexing the whole document.
If one stores the OPIC score (which is independent of the page's content)
in a field and uses a function query to influence the score of a document, one
saves the effort of reindexing the whole doc if the content did not change.

Regards
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-and-Nutch-Droids-to-use-or-not-to-use-tp900069p902158.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Re: Re: Solr and Nutch/Droids - to use or not to use?

2010-06-17 Thread Otis Gospodnetic
Mitch,

Yes, one day.  But it sounds like you are not aware of ExternalFileField, which 
you can use today:

http://search-lucene.com/?q=ExternalFileField&fc_project=Solr
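
As a rough sketch of how the scores could be handed over: ExternalFileField reads its values from a plain file named `external_<fieldname>` in the index data directory, one `key=value` line per document. The helper below writes such a file; the field name `opic_score`, the document keys, and the directory are assumptions for illustration, and the keys are assumed not to contain `=`.

```python
import os
import tempfile

def write_external_scores(data_dir, field_name, scores):
    """Write scores as 'key=value' lines to external_<field_name> in data_dir."""
    path = os.path.join(data_dir, "external_" + field_name)
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as out:
        # sorted only to make the output deterministic
        for doc_key in sorted(scores):
            out.write("%s=%s\n" % (doc_key, scores[doc_key]))
    # swap the finished file into place in one step
    os.replace(tmp_path, path)
    return path

demo_dir = tempfile.mkdtemp()  # stand-in for the real Solr data directory
demo_path = write_external_scores(demo_dir, "opic_score",
                                  {"doc1": 0.75, "doc2": 1.5})
with open(demo_path, encoding="utf-8") as handle:
    print(handle.read())
```

The point is that only this file changes when the scores change; the indexed documents themselves stay untouched.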

Otis



Re: Re: Re: Solr and Nutch/Droids - to use or not to use?

2010-06-17 Thread MitchK

Otis,

you are right. I wasn't aware of this, at least not for such a large
data set (think of an index with 4 million docs: that would mean an
external file with 4 million records). But from what I've read at 
search-lucene.com it seems to perform very well. Thanks for the idea!

Btw: Otis, did you open a JIRA issue for the distributed indexing ability of
Solr?
I would like to follow the issue, if it is open. 

Regards
- Mitch


 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-and-Nutch-Droids-to-use-or-not-to-use-tp900069p903148.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Re: Re: Solr and Nutch/Droids - to use or not to use?

2010-06-17 Thread Otis Gospodnetic
I didn't open the issue, Mitch, but feel free to do it.

 Otis



Re: Re: Re: Solr and Nutch/Droids - to use or not to use?

2010-06-17 Thread MitchK

Otis,

And again I wish I were registered.

I will check the JIRA and when I feel comfortable with it, I will open the issue.

Kind regards
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-and-Nutch-Droids-to-use-or-not-to-use-tp900069p904145.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Re: Solr and Nutch/Droids - to use or not to use?

2010-06-16 Thread Markus Jelsma
Nutch does not, at this moment, support any form of consistent hashing to 
select an appropriate shard. It would be nice if someone could file an issue in 
Nutch's Jira to add sharding support, perhaps someone with a better 
understanding of and more experience with Solr's distributed search than I have at 
the moment. I can't point Nutch's developers to the right piece of documentation 
on this one ;)
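
For whoever picks this up, here is a rough sketch of what I mean by consistent hashing. It is not Nutch or Solr code; the shard names and the number of points per shard are made up. Each shard is placed on a ring at many hashed positions, and a document goes to the first shard position at or after the hash of its id.

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring that maps document ids to shard names."""

    def __init__(self, shards, points_per_shard=100):
        ring = []
        for shard in shards:
            # several points per shard spread the load more evenly
            for i in range(points_per_shard):
                ring.append((self._hash("%s#%d" % (shard, i)), shard))
        ring.sort()
        self._ring = ring
        self._hashes = [h for h, _ in ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def shard_for(self, doc_id):
        # first ring point clockwise from the document's hash, wrapping around
        idx = bisect.bisect_right(self._hashes, self._hash(doc_id)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["shard1", "shard2", "shard3"])
print(ring.shard_for("http://example.com/index.html"))
```

The nice property is that adding a shard only moves documents onto the new shard; everything else stays where it was, which a plain `hash % number_of_shards` scheme does not give you.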
 
-Original message-
From: Otis Gospodnetic otis_gospodne...@yahoo.com
Sent: Wed 16-06-2010 21:03
To: solr-user@lucene.apache.org; 
Subject: Re: Solr and Nutch/Droids - to use or not to use?

Hi Mitch,

Solr can do distributed search, so it can definitely handle indices that can't 
fit on a single server unless they are sharded.  What I think *might* be the case is that 
the Nutch indexer that sends docs to Solr is not capable of sending 
documents to multiple Solr cores/shards.  If that is the case, I think you need 
to move this to the Nutch user/dev list and see how to feed multiple Solr 
indices/cores/shards with Nutch data.

Otis




- Original Message 
 From: MitchK mitc...@web.de
 To: solr-user@lucene.apache.org
 Sent: Wed, June 16, 2010 2:27:16 PM
 Subject: Re: Solr and Nutch/Droids - to use or not to use?

Thanks, that really helps to find the right beginning for such a journey. :-)

 * Use Solr, not Nutch's search webapp 

As far as I have read, Solr can't scale, if the index gets too large for one
Server

 The setup explained here has one significant caveat you also need to keep
 in mind: scale. You cannot use this kind of setup with vertical scale
 (collection size) that goes beyond one Solr box. The horizontal scaling
 (query throughput) is still possible with the standard Solr replication
 tools.

...from Lucidimagination.com

Is this still the case?
Furthermore, as far as I have understood this blogpost: 
http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/
Lucidimagination.com : Nutch and Solr , they index the whole stuff with
nutch and reindex it to Solr - sounds like a lot of redundant work.

Lucid, Sematext and the Nutch-wiki are the only information-sources where I
can find talks about Nutch and Solr, but no one seems to talk about these
facts - except this one blogpost.

If you say this is wrong or contingent on the shown setup, can you tell me
how to avoid these problems?

A lot of questions, but it's such an exciting topic...

Hopefully you can answer some of them.

Again, thank you for the feedback, Otis.

- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-and-Nutch-Droids-to-use-or-not-to-use-tp900069p900604.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Re: Solr and Nutch/Droids - to use or not to use?

2010-06-16 Thread Otis Gospodnetic
Well, it's not that Nutch doesn't support it.  Solr itself doesn't support it.  
Indexing applications need to know which shard they want to send documents to.  
This may be a good case for a new wish issue in Solr JIRA?

 Otis



RE: Re: Re: Solr and Nutch/Droids - to use or not to use?

2010-06-16 Thread Markus Jelsma
You're right. Currently clients need to take care of this. In this case Nutch 
would be the client, but it cannot be configured to do so. It would indeed be 
more appropriate for Solr to take care of it. We can already query any server 
with a set of shard hosts specified, so it would make sense if Solr also 
supported some kind of consistent hashing and shard management configuration.
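
To illustrate the query side that already works: a distributed search is an ordinary select request with a `shards` parameter listing the shard hosts. The sketch below only builds such a URL; the host names are placeholders, not a real setup.

```python
from urllib.parse import urlencode

def distributed_query_url(entry_host, shard_hosts, query):
    """Build a select URL that asks entry_host to search across shard_hosts."""
    params = {
        "q": query,
        # comma-separated host:port/path entries, without the http:// prefix
        "shards": ",".join(shard_hosts),
    }
    return "http://%s/select?%s" % (entry_host, urlencode(params))

print(distributed_query_url(
    "localhost:8983/solr",
    ["localhost:8983/solr", "localhost:7574/solr"],
    "nutch",
))
```

Any of the shard servers can act as the entry point, which is why it feels natural to want the same shard list on the indexing side.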

 

With CouchDB-Lounge we can easily create a shard map that supports redundant 
shards on different servers for fail-over. It would be marvelous if Solr 
supported that as well.
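
Roughly what I have in mind, as a sketch only: this is neither Lounge's actual map format nor anything Solr offers, and the host names and the `is_alive` check are hypothetical. Each shard lists a primary and a fallback host, and the client picks the first one that responds.

```python
import zlib

# one entry per shard: hosts in order of preference (all names hypothetical)
SHARD_MAP = [
    ["solr-a:8983/solr", "solr-b:8983/solr"],
    ["solr-b:8983/solr", "solr-c:8983/solr"],
    ["solr-c:8983/solr", "solr-a:8983/solr"],
]

def pick_host(doc_id, is_alive, shard_map=SHARD_MAP):
    """Return the first live host of the shard that doc_id hashes to."""
    shard = zlib.crc32(doc_id.encode("utf-8")) % len(shard_map)
    for host in shard_map[shard]:
        if is_alive(host):
            return host
    raise RuntimeError("no live replica for shard %d" % shard)

# with every host up, the primary of the shard is chosen
print(pick_host("http://example.com/index.html", lambda host: True))
```

A real implementation would also have to bring a recovered host back in sync, which this sketch ignores completely.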
 

RE: Re: Re: Solr and Nutch/Droids - to use or not to use?

2010-06-16 Thread MitchK

Good morning!

Great feedback from you all. This really helped a lot to get an impression
of what is possible and what is not.

I am interested in a few detailed questions.

Let's assume Solr could handle distributed indexing on its own,
so that the client does not need to know anything about shards etc.

What interests me is:
I. 
The scoring - Nutch uses special Scoring-implementations like the
OPIC-algorithm. Can Solr use such improvements or do I need to reimplement
it for Solr?

II. 
The indexing.
At the moment it really sounds like Nutch would index everything and
afterwards Solr would do the job again.
Regarding indexing, it would make sense if Nutch computed things like the
document boost (I am not sure, but I think the results of the OPIC algorithm
are added to each document as a boost) and then sent an indexing request to
Solr.
However, if Nutch indexes the page's content and Solr does it too, I would
waste some time, no?
Is this the case, or did I misunderstand something here?

III.
I am no Java expert.
However, in a few months I will start to study computer science at a
university. Maybe I will find some literature to learn more about
distributed software and how hashing needs to work to make distributed
indexing work.
Maybe then I can help to implement this feature in Solr.
On the other hand, not much is known about Solr's distributed search concept
and which classes are responsible for it - but such things one could ask
on the mailing list, no? 

As far as I know Elastic Search already supports distributed indexing. 
Maybe one can reuse the responsible implementation for Solr.
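
To make point II a bit more concrete, here is a rough sketch of the kind of update message I imagine Nutch sending. As far as I understand the Solr wiki, the XML update format accepts a `boost` attribute on `<doc>` in the Solr versions I have looked at; the field names and the boost value below are made up, and I have not checked what Nutch really sends.

```python
import xml.etree.ElementTree as ET

def build_add_message(doc_id, content, boost):
    """Build a Solr XML <add> message for one document with a document boost."""
    add = ET.Element("add")
    # the boost would be the score computed by the crawler
    doc = ET.SubElement(add, "doc", {"boost": "%.3f" % boost})
    ET.SubElement(doc, "field", {"name": "id"}).text = doc_id
    ET.SubElement(doc, "field", {"name": "content"}).text = content
    return ET.tostring(add, encoding="unicode")

print(build_add_message("http://example.com/", "Hello Solr & Nutch", 2.5))
```

In that picture Nutch only fetches, parses and scores, and Solr would be the only component that builds an index.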


Btw:
I think a great benefit of using Solr + Nutch would be to extend the search.
I could create several Solr cores for different kinds of search - one for
picture-search, one for video-search etc. *and* with the help of Nutch I can
index some of the needed content in special directories. So Solr does not
need to care about indexing a picture - Nutch already does the job. 

Kind regards,
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-and-Nutch-Droids-to-use-or-not-to-use-tp900069p901943.html
Sent from the Solr - User mailing list archive at Nabble.com.