Re: how to crawl when Solr is search engine?

2007-06-07 Thread Manoharam Reddy

Pardon me if I am taking too much of your time.

It would be really great if you could please highlight a few
advantages of Solr's caching and maintenance over Nutch's.

Some musing:-
(I have used Nutch before and one thing I observed there was that if I
delete the crawl folder when Nutch is running, users can still search
and obtain proper results. It seems Nutch caches all the indexes in
memory when it starts. I don't understand how that is feasible
when the size of the crawl is on the order of 10 GB, whereas you have
RAM + swap of only a few GB.)

How is Solr caching better than this?

On 6/7/07, Ian Holsman <[EMAIL PROTECTED]> wrote:

Manoharam Reddy wrote:
> Thanks for your quick response.
>
> This brings me to another question. As far as I know Nutch can take
> care of crawling as well as indexing. Then why go through the hassle
> of crawling through Nutch and integrating it into Solr?

I found Solr's caching and maintenance easier to use than nutch's. But
that's just me.

>
> Another question: Solr provides the search results in XML format.
> Are there any ready-made tools to convert them directly to web pages
> for visitors to see?

Yep, it's called XSLT. Most modern browsers can do the transform on the
client side.
Otherwise there are some server-side tools (Cocoon, I think, does this) to
do the transform on the server before sending it out.
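
For anyone who wants to see what the server-side route amounts to, here is a
minimal sketch. It is not XSLT (the Python standard library has no XSLT
processor); it does the equivalent transform by hand with ElementTree. The
sample response and the field names "id" and "title" are assumptions made for
illustration, so adjust them to whatever your schema actually returns.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical Solr XML response; the field names are assumptions.
SAMPLE_RESPONSE = """<response>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">doc1</str>
      <str name="title">Solr &amp; Nutch</str>
    </doc>
  </result>
</response>"""


def results_to_html(response_xml):
    """Turn the <doc> elements of a Solr XML response into an HTML list."""
    root = ET.fromstring(response_xml)
    items = []
    for doc in root.iter("doc"):
        # Each child of <doc> carries its field name in the "name" attribute.
        fields = {el.get("name"): (el.text or "") for el in doc if el.get("name")}
        title = html.escape(fields.get("title", "(untitled)"))
        doc_id = html.escape(fields.get("id", ""))
        items.append(f"<li>{title} <small>{doc_id}</small></li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    print(results_to_html(SAMPLE_RESPONSE))
```

An XSLT stylesheet would express the same mapping declaratively; the point here
is only that the response is regular enough to turn into a page in a few lines.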

--Ian
>
> On 6/7/07, Ian Holsman <[EMAIL PROTECTED]> wrote:
>> Hi Manoharam.
>>
>> we use nutch to do the crawl, and have used sami's patch of nutch
>> (http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html)
>> to have it integrate with Solr. It works quite well for our needs.
>>
>> If you are concerned with the speed, Solr also has a CSV upload
>> facility, which you might be able to use to upload the data into solr
>> that way, but we haven't found the HTTP Post speed to be an issue for
>> us.
>>
>> Regards
>> Ian
>>
>>
>> Manoharam Reddy wrote:
>> > I have just begun using Solr. I see that we have to insert documents
>> > by posting XMLs to solr/update
>> >
>> > I would like to know how Solr is used as a search engine in
>> > enterprises. How do you crawl your intranet and pass the
>> > information as XML to solr/update? Isn't it going to be slow to
>> > put all content in the index via HTTP POST requests, which require
>> > network sockets to be opened?
>> >
>> > Isn't there any direct way to do the same thing without resorting
>> > to HTTP?
>> >
>>
>>
>




Re: how to crawl when Solr is search engine?

2007-06-07 Thread Manoharam Reddy

Thanks for your quick response.

This brings me to another question. As far as I know Nutch can take
care of crawling as well as indexing. Then why go through the hassle
of crawling through Nutch and integrating it into Solr?

Another question: Solr provides the search results in XML format.
Are there any ready-made tools to convert them directly to web pages
for visitors to see?

On 6/7/07, Ian Holsman <[EMAIL PROTECTED]> wrote:

Hi Manoharam.

we use nutch to do the crawl, and have used sami's patch of nutch
(http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html)
to have it integrate with Solr. It works quite well for our needs.

If you are concerned with the speed, Solr also has a CSV upload
facility, which you might be able to use to upload the data into solr
that way, but we haven't found the HTTP Post speed to be an issue for us.
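
A rough sketch of what a CSV upload could look like from a script, in case it
helps. The URL (a local Solr with the CSV handler at /update/csv), the
text/plain content type and the column names are assumptions, so check them
against your own install. The request is only built here, not sent.

```python
import csv
import io
import urllib.request

# Assumed location of the CSV update handler on a local Solr instance.
SOLR_CSV_URL = "http://localhost:8983/solr/update/csv"


def rows_to_csv(fieldnames, rows):
    """Serialize a list of dicts to CSV bytes with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def build_csv_request(body, url=SOLR_CSV_URL):
    """Build (but do not send) a POST request carrying the CSV body."""
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )


if __name__ == "__main__":
    body = rows_to_csv(["id", "title"], [{"id": "1", "title": "Hello, world"}])
    req = build_csv_request(body)
    print(req.get_method(), req.get_full_url())
    print(body.decode("utf-8"))
    # To actually upload: urllib.request.urlopen(req)
```

One request carries the whole batch of rows, which is where the saving over
one POST per document would come from.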

Regards
Ian


Manoharam Reddy wrote:
> I have just begun using Solr. I see that we have to insert documents
> by posting XMLs to solr/update
>
> I would like to know how Solr is used as a search engine in
> enterprises. How do you crawl your intranet and pass the
> information as XML to solr/update? Isn't it going to be slow to
> put all content in the index via HTTP POST requests, which require
> network sockets to be opened?
>
> Isn't there any direct way to do the same thing without resorting
> to HTTP?
>




how to crawl when Solr is search engine?

2007-06-07 Thread Manoharam Reddy

I have just begun using Solr. I see that we have to insert documents
by posting XMLs to solr/update

I would like to know how Solr is used as a search engine in
enterprises. How do you crawl your intranet and pass the
information as XML to solr/update? Isn't it going to be slow to
put all content in the index via HTTP POST requests, which require
network sockets to be opened?

Isn't there any direct way to do the same thing without resorting to HTTP?
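
To make the question concrete, this is roughly what posting to solr/update
looks like from a script. It is only a sketch: the URL assumes a local Solr on
the default example port, and the field names are made up. Several documents
can go into a single <add> message, so one POST need not mean one document.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumed URL of a local Solr instance; adjust to your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_xml(docs):
    """Build one <add> message holding every document in `docs`.

    `docs` is a list of {field_name: value} dicts (field names are
    hypothetical and must match your schema).
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, < and > for us
    return ET.tostring(add, encoding="utf-8")


def post_xml(body, url=SOLR_UPDATE_URL):
    """POST an XML message to Solr and return the raw response bytes."""
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


if __name__ == "__main__":
    batch = [
        {"id": "doc1", "title": "Intranet home page"},
        {"id": "doc2", "title": "HR & payroll FAQ"},
    ]
    print(build_add_xml(batch).decode("utf-8"))
    # With a running Solr: post_xml(build_add_xml(batch)); post_xml(b"<commit/>")
```

The documents only become searchable after a <commit/> message is posted.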


post.jar is absent in Solr distribution

2007-06-06 Thread Manoharam Reddy

I am an absolute noob to solr and I am trying out the Solr tutorial
present at http://lucene.apache.org/solr/tutorial.html

In the tutorial, post.jar is mentioned but I don't find post.jar
anywhere. I downloaded the solr tarball from
http://www.eu.apache.org/dist/lucene/solr/1.1/apache-solr-1.1.0-incubating.tgz

What do I do now?