Re: Will Solr/Lucene crawl multi websites (aka a mini google with faceted search)?

2011-09-12 Thread dpt9876
Thank you for the clarification and help, guys. I will try them.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Will-Solr-Lucene-crawl-multi-websites-aka-a-mini-google-with-faceted-search-tp3328314p3328937.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Will Solr/Lucene crawl multi websites (aka a mini google with faceted search)?

2011-09-11 Thread Ken Krugler

On Sep 11, 2011, at 7:04pm, dpt9876 wrote:

> Hi, thanks for the reply.
>
> How do Nutch/Solr handle the scenario where one website calls the price "price"
> and another website calls it "cost"? Same thing, different name, yet I would
> want the facet to handle that and not create a different facet.
>
> Is this combo of Nutch and Solr that intelligent and/or intuitive?

What you're describing here is web mining, not web crawling.

You want to extract price data from web pages, and put that into a specific 
field in Solr.

To do that using Nutch, you'd need to write custom plug-ins that know how to 
extract the price from a page, and add that as a custom field to the crawl 
results.

The above is a topic for the Nutch mailing list, since Solr is just a 
downstream consumer of whatever Nutch provides.
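Here is a rough illustration of the mapping Ken describes. It assumes a page has already been scraped into label/value pairs, and it folds site-specific labels such as "price" and "cost" into a single hypothetical `price` field before indexing, so Solr only ever sees one facet field. This is not Nutch plug-in code (Nutch plug-ins are written in Java against Nutch's extension points). `LABEL_TO_FIELD`, `parse_price` and `normalize_fields` are invented names for this sketch.

```python
import re

# Hypothetical table mapping site-specific labels to one canonical field name.
# The labels and the "price" field name are assumptions for illustration.
LABEL_TO_FIELD = {
    "price": "price",
    "cost": "price",
    "our price": "price",
}

# An optional "$", then a number such as "149", "99.5" or "1,299.99".
PRICE_RE = re.compile(r"\$?\s*([0-9][0-9,]*(?:\.[0-9]+)?)")


def parse_price(text):
    """Return the first number in `text` as a float, or None if there is none."""
    match = PRICE_RE.search(text)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


def normalize_fields(raw):
    """Fold scraped {label: value} pairs into canonical fields.

    Labels missing from LABEL_TO_FIELD are dropped instead of becoming new
    facet fields; prices are converted to numbers so range facets work.
    """
    doc = {}
    for label, value in raw.items():
        field = LABEL_TO_FIELD.get(label.strip().lower())
        if field is None:
            continue
        if field == "price":
            price = parse_price(value)
            if price is not None:
                doc[field] = price
        else:
            doc[field] = value.strip()
    return doc


print(normalize_fields({"Cost": "$1,299.99", "Shipping": "free"}))
```

With this approach, `normalize_fields({"Cost": "$1,299.99", "Shipping": "free"})` yields `{"price": 1299.99}`: the unknown "Shipping" label is dropped rather than turned into a new facet.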

-- Ken


--
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr





Re: Will Solr/Lucene crawl multi websites (aka a mini google with faceted search)?

2011-09-11 Thread dpt9876
Hi, thanks for the reply.

How do Nutch/Solr handle the scenario where one website calls the price "price"
and another website calls it "cost"? Same thing, different name, yet I would
want the facet to handle that and not create a different facet.

Is this combo of Nutch and Solr that intelligent and/or intuitive?

Thanks for the fast response.

Re: Will Solr/Lucene crawl multi websites (aka a mini google with faceted search)?

2011-09-11 Thread Erick Erickson
Nope, there's nothing in Solr that crawls anything; you have to feed
documents in yourself from the websites.

Or look at the Nutch project (see http://nutch.apache.org/about.html),
which is designed for this kind of problem.
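Since Solr only indexes what you send it, one way to feed documents in yourself from a scraper database like the one described in the question is to read the rows and POST them to Solr's update handler as JSON. This is a minimal sketch under several assumptions: a Solr instance at `localhost:8983` with a core named `products`, a schema with `id`, `site`, `title` and `price` fields, and a hypothetical `products` table. The JSON update path also differs between Solr versions (older 3.x setups typically used a separate `/update/json` handler).

```python
import json
import sqlite3
import urllib.request

# Assumed Solr location and core name; adjust for your own install.
SOLR_UPDATE_URL = "http://localhost:8983/solr/products/update?commit=true"


def rows_to_docs(conn):
    """Turn rows from a hypothetical 'products' table into Solr documents."""
    cur = conn.execute("SELECT id, site, title, price FROM products")
    return [
        # Prefix the id with the site so ids from different merchants never collide.
        {"id": f"{site}-{pid}", "site": site, "title": title, "price": price}
        for pid, site, title, price in cur
    ]


def build_update_request(docs, url=SOLR_UPDATE_URL):
    """Build (but do not send) a POST that adds `docs` as a JSON array."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER, site TEXT, title TEXT, price REAL)")
    conn.execute("INSERT INTO products VALUES (1, 'shop-a', 'Digital camera', 149.0)")
    req = build_update_request(rows_to_docs(conn))
    print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would actually send it; that call is left out so the sketch runs without a live Solr server.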

Best
Erick

On Sun, Sep 11, 2011 at 8:53 PM, dpt9876  wrote:
> Hi all,
> I am wondering if Solr will do the following for a project I am working on.
> I want to create a search engine with facets for potentially hundreds of
> websites.
> Similar to, say, crawling Amazon + Buy.com + eBay, so that someone can search these
> 3 sites from my 1 website.
> (I realise there are better ways of doing the above example; it's for
> illustrative purposes.)
> Eventually I would build that search crawl to index, say, 200 or 1000
> merchants.
> Someone would come to my site and search for "digital camera".
>
> They would get results from all 3 indexes and, hopefully, dynamic facets, e.g.:
> Price $100-200
> Price $200-300
> Resolution 1MP-2MP
>
> etc.
>
> Can this be done on the fly?
>
> I ask this because I am currently developing web scrapers to crawl these
> websites and dump that data into a DB, and was then thinking of tacking on a Solr
> server to crawl my DB.
>
> The problem with that approach is that crawling the world's e-commerce sites will
> take forever, when it seems Solr might do that for me? (I have read about
> multiple indexes etc.)
>
> Many thanks
>
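On the "on the fly" question: once every document carries normalized numeric fields, Solr computes facet counts at query time, so buckets like the price and resolution ranges in the question need no precomputation. The sketch below only builds the query URL. The core name `products` and the fields `price` and `megapixels` are assumptions, and `facet.range` requires a Solr version with range faceting (3.1 or later).

```python
from urllib.parse import urlencode

# Assumed Solr location and core name (hypothetical local install).
SOLR_SELECT_URL = "http://localhost:8983/solr/products/select"


def build_facet_query(text, price_gap=100, price_max=1000):
    """Build a Solr select URL that searches `text` and asks Solr to compute
    facet counts at query time for price buckets and a resolution bucket."""
    params = [
        ("q", text),
        ("wt", "json"),
        ("facet", "true"),
        # Fixed-width buckets over a numeric "price" field: 0-100, 100-200, ...
        ("facet.range", "price"),
        ("facet.range.start", "0"),
        ("facet.range.end", str(price_max)),
        ("facet.range.gap", str(price_gap)),
        # An explicit bucket over a hypothetical numeric "megapixels" field.
        ("facet.query", "megapixels:[1 TO 2]"),
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


print(build_facet_query("digital camera"))
```

Range faceting keeps the bucket definitions in one place (start, end, gap). `facet.query` suits irregular buckets; it can be repeated once per bucket.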