Re: Nutch+Solr

2018-10-08 Thread Bineesh
This is solved.

Nutch 1.15 has an index-writers.xml file in which we can pass the
username and password for indexing to Solr.
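
For anyone who finds this later, here is a rough sketch of what the Solr
writer entry in conf/index-writers.xml might look like with basic auth
switched on. The writer id, URL and commitSize are assumptions (they depend
on your setup, and the type/url pair shown is for a plain HTTP endpoint
rather than a cloud one); the credentials are the ones from the original
question. The mapping section and the remaining parameters from the stock
file are left out, so edit the existing entry rather than pasting this in.

```xml
<!-- Sketch only: one writer entry from conf/index-writers.xml -->
<writer id="indexer_solr_1" class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
  <parameters>
    <!-- assumed values; adjust to your own Solr endpoint -->
    <param name="type" value="http"/>
    <param name="url" value="http://localhost:8983/solr/nutch"/>
    <param name="commitSize" value="1000"/>
    <!-- enable HTTP basic auth and supply the credentials -->
    <param name="auth" value="true"/>
    <param name="username" value="solr"/>
    <param name="password" value="SolrRocks"/>
  </parameters>
</writer>
```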



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Nutch+Solr

2018-10-03 Thread Terry Steichen
Bineesh,

I don't use Nutch, so I don't know if this is relevant, but I've had
similar-sounding failures when making and restoring backups.  The solution
for me was to deactivate authentication while the backup was being done,
and then activate it again afterwards.  Then everything was restored
correctly.  Otherwise (if I left authentication active when doing the
backup), I got a whole bunch of errors.

Terry


On 10/03/2018 10:21 AM, Bineesh wrote:
> Hello,
>
> We use Solr 7.3.1 and Nutch 1.15
>
> We've set up authentication for our SolrCloud setup using the basic
> auth plugin (login details -> solr/SolrRocks).
>
> For Nutch to index data to Solr, the properties below were added to the
> nutch-site.xml file:
>
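> <property>
>   <name>solr.auth</name>
>   <value>true</value>
>   <description>
>   Whether to enable HTTP basic authentication for communicating with Solr.
>   Use the solr.auth.username and solr.auth.password properties to configure
>   your credentials.
>   </description>
> </property>
>
>
> <property>
>   <name>solr.auth.username</name>
>   <value>solr</value>
>   <description>
>   Username
>   </description>
> </property>
>
>
> <property>
>   <name>solr.auth.password</name>
>   <value>SolrRocks</value>
>   <description>
>   Password
>   </description>
> </property>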
>
> When Nutch indexes data to Solr, it fails due to authentication. Am I
> doing something wrong? Please help.



Re: Nutch/Solr

2010-09-07 Thread Markus Jelsma
Depends on your version of Nutch. At least trunk and 1.1 obey the
solrmapping.xml file in Nutch's configuration directory. I'd suggest you start
with that mapping file and the Solr schema.xml file shipped with Nutch, as it
exactly matches the mapping file.

Just restart Solr with the new schema (or change the mapping), crawl,
fetch, parse and update your DBs, and then push the index from Nutch to your
Solr instance.
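
As a rough illustration of the mapping idea, a mapping file along these
lines tells Nutch which of its fields goes to which Solr field. This is a
sketch, not the shipped file: the fields shown are the ones from the
question plus an assumed url-to-id mapping, and the copy in your Nutch conf
directory is the authoritative version for element names and the full
field list.

```xml
<!-- Sketch of a Nutch-to-Solr field mapping; field list is illustrative -->
<mapping>
  <fields>
    <!-- source = Nutch field, dest = Solr field in schema.xml -->
    <field dest="content" source="content"/>
    <field dest="title" source="title"/>
    <field dest="tstamp" source="tstamp"/>
    <!-- assumed: the page URL doubles as the Solr document id -->
    <field dest="id" source="url"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</mapping>
```

Changing a dest value sends that Nutch field to a differently named Solr
field, provided the Solr schema declares it.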


On Tuesday 07 September 2010 10:00:47 Yavuz Selim YILMAZ wrote:
> I tried to combine Nutch and Solr and want to ask something.
>
> After crawling, Nutch has certain fields such as content, tstamp, and title.
>
> How can I map the content field after crawling? Do I have to change the Lucene
> code (such as adding an extra field)?
>
> Or should this be handled at the Solr stage?
>
> Any suggestions?
>
> Thanks.
> --
>
> Yavuz Selim YILMAZ
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: Nutch/Solr

2010-09-07 Thread Yavuz Selim YILMAZ
In fact, I have been using Nutch 0.9, but I am thinking of moving to the new
version.

If anybody has done something like that, I would like to hear about their
experience.

When indexing an XML file, there are specific fields and all of them are
dependent among themselves, so duplicates don't happen.

I want to extract specific fields from the content field. With such
extraction, the new fields should be indexed as well, so it seems to me that
the content gets indexed twice for every new field.

By the way, any details about how to get new fields from the content would be
helpful.
--

Yavuz Selim YILMAZ






Re: Nutch/Solr

2010-09-07 Thread Markus Jelsma

You should:
- definitely upgrade to 1.1 (1.2 is on the way), and
- subscribe to the Nutch mailing list for Nutch-specific questions.



Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: Nutch - Solr latest?

2008-06-25 Thread Chris Hostetter

: I'm curious: is there a spot / patch for the latest on Nutch / Solr
: integration? I've found a few pages (a few outdated, it seems). It would be nice
: (?) if it worked as a DataSource type for DataImportHandler, but I'm not sure if
: that fits with how it works.  Either way, a nice contrib patch the way the DIH is
: already set up would be nice to have.
...
: Is there currently work ongoing on this?  Seems like it belongs in one project
: or the other, not both.

My understanding is that previous work on bridging Nutch crawling with Solr
indexing involved patching Nutch and using a Nutch-specific schema.xml and
the client code which has since been committed as SolrJ.

Most of the discussion seemed to take place on the Nutch list (which makes
sense since Nutch required the patching), so you may want to start there.

I'm not sure if Nutch integration would make sense as a DIH plugin (it
seems like the Nutch crawler could push the data much more easily than
DIH could pull it from the crawler), but if there is any advantage to
having plugin code running in Solr to support this then that would
absolutely make sense in the new /contrib area of Solr (that I believe
Otis already created/committed). Any Nutch plugins or modifications
would obviously need to be made in Nutch, though.
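
To make "push" concrete: a crawler-side client ultimately sends Solr an
update message for each page. A minimal sketch of such a message in Solr's
XML update format follows; the field names and values are hypothetical and
would have to match whatever schema.xml is in use.

```xml
<!-- Sketch: one crawled page posted to Solr's update handler -->
<add>
  <doc>
    <!-- hypothetical fields; they must exist in the Solr schema -->
    <field name="id">http://www.example.com/</field>
    <field name="title">Example page</field>
    <field name="content">Text extracted by the crawler</field>
  </doc>
</add>
```

A separate commit message is then needed before the documents become
searchable.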

-Hoss