Re: Nutch+Solr
This is solved. Nutch 1.15 has an index-writers.xml file in which we can pass the username and password for indexing to Solr.

-- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
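For anyone finding this thread later, here is a minimal sketch of what the Solr writer entry in index-writers.xml might look like with basic auth switched on. The parameter names (auth, username, password) are as I recall them from the template shipped with Nutch 1.15, and the writer id and URL are placeholders, so compare against the file in your own conf directory rather than copying this verbatim. The credentials are the ones quoted elsewhere in this thread.

```xml
<!-- Sketch only: belongs inside the existing <writers> root element of conf/index-writers.xml. -->
<!-- The id and url values are assumptions for illustration; keep the ones from your own file. -->
<writer id="indexer_solr_1" class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
  <parameters>
    <param name="type" value="http"/>
    <param name="url" value="http://localhost:8983/solr/nutch"/>
    <!-- Enable HTTP basic authentication and supply the Solr login. -->
    <param name="auth" value="true"/>
    <param name="username" value="solr"/>
    <param name="password" value="SolrRocks"/>
  </parameters>
  <!-- Leave the <mapping> section from the shipped file in place here. -->
</writer>
```

With this in place, the solr.auth.* properties in nutch-site.xml no longer appear to be what the indexer reads, which would explain the authentication failures reported in the original question.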
Re: Nutch+Solr
Bineesh,

I don't use Nutch, so I don't know if this is relevant, but I've had similar-sounding failures when making and restoring backups. The solution for me was to deactivate authentication while the backup was being done and then activate it again afterwards. Then everything was restored correctly. Otherwise (if I left authentication active during the backup), I got a whole bunch of errors.

Terry

On 10/03/2018 10:21 AM, Bineesh wrote:
> Hello,
>
> We use Solr 7.3.1 and Nutch 1.15.
>
> We've set up authentication for our SolrCloud setup using the basic
> auth plugin (login details -> solr/SolrRocks).
>
> For Nutch to index data to Solr, the properties below were added to the
> nutch-site.xml file:
>
> <property>
>   <name>solr.auth</name>
>   <value>true</value>
>   <description>
>   Whether to enable HTTP basic authentication for communicating with Solr.
>   Use the solr.auth.username and solr.auth.password properties to configure
>   your credentials.
>   </description>
> </property>
>
> <property>
>   <name>solr.auth.username</name>
>   <value>solr</value>
>   <description>
>   Username
>   </description>
> </property>
>
> <property>
>   <name>solr.auth.password</name>
>   <value>SolrRocks</value>
>   <description>
>   Password
>   </description>
> </property>
>
> While Nutch indexes data to Solr, it's failing due to authentication. Am I
> doing something wrong? Please help.
Re: Nutch/Solr
Depends on your version of Nutch. At least trunk and 1.1 obey the solrmapping.xml file in Nutch's configuration directory. I'd suggest you start with that mapping file and the Solr schema.xml file shipped with Nutch, as it exactly matches the mapping file. Just restart Solr with the new schema (or change the mapping), crawl, fetch, parse and update your DBs, and then push the index from Nutch to your Solr instance.

On Tuesday 07 September 2010 10:00:47 Yavuz Selim YILMAZ wrote:
> I tried to combine Nutch and Solr and want to ask something.
>
> After crawling, Nutch has certain fields such as content, tstamp, title.
>
> How can I map the content field after crawling? Do I have to change the
> Lucene code (such as adding an extra field)? Or handle it at the Solr stage?
>
> Any suggestion?
>
> Thx.
> --
> Yavuz Selim YILMAZ

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
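As an illustration of the mapping step described above, here is a rough sketch of a Nutch-to-Solr field mapping. The element layout (fields, field with dest/source attributes, uniqueKey) is how I remember the mapping file shipped with Nutch 1.x; the exact file name and the set of fields differ between versions, so treat every name here as an assumption and start from the file in your own conf directory.

```xml
<!-- Sketch only: maps fields produced by Nutch (source) to fields in the Solr schema (dest). -->
<!-- Every dest name must exist in the Solr schema.xml that Solr was restarted with. -->
<mapping>
  <fields>
    <field dest="content" source="content"/>
    <field dest="title" source="title"/>
    <field dest="tstamp" source="tstamp"/>
    <!-- The document key in Solr is taken from the crawled URL. -->
    <field dest="id" source="url"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</mapping>
```

Renaming a field on the Solr side would then be a matter of changing the dest attribute and adding the matching field to schema.xml, with no change to the Lucene code.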
Re: Nutch/Solr
In fact, I have been using Nutch 0.9, but I am thinking of moving to the new version. If anybody has done something like that, I would like to hear about their experience.

When indexing an XML file, there are specific fields, all distinct from one another, so duplicates don't happen. I want to extract specific fields from the content field. With such extraction, the new fields have to be indexed as well, so it seems to me that the content ends up indexed twice for every new field. By the way, any details about how to get new fields from the content will be helpful.
--
Yavuz Selim YILMAZ

2010/9/7 Markus Jelsma markus.jel...@buyways.nl
> Depends on your version of Nutch. At least trunk and 1.1 obey the
> solrmapping.xml file in Nutch's configuration directory. I'd suggest you
> start with that mapping file and the Solr schema.xml file shipped with
> Nutch, as it exactly matches the mapping file. Just restart Solr with the
> new schema (or change the mapping), crawl, fetch, parse and update your
> DBs, and then push the index from Nutch to your Solr instance.
>
> On Tuesday 07 September 2010 10:00:47 Yavuz Selim YILMAZ wrote:
>> I tried to combine Nutch and Solr and want to ask something.
>>
>> After crawling, Nutch has certain fields such as content, tstamp, title.
>>
>> How can I map the content field after crawling? Do I have to change the
>> Lucene code (such as adding an extra field)? Or handle it at the Solr stage?
>>
>> Any suggestion?
>>
>> Thx.
>> --
>> Yavuz Selim YILMAZ
>
> Markus Jelsma - Technisch Architect - Buyways BV
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
Re: Nutch/Solr
You should:
- definitely upgrade to 1.1 (1.2 is on the way), and
- subscribe to the Nutch mailing list for Nutch-specific questions.

On Tuesday 07 September 2010 10:36:58 Yavuz Selim YILMAZ wrote:
> In fact, I have been using Nutch 0.9, but I am thinking of moving to the
> new version. If anybody has done something like that, I would like to hear
> about their experience.
>
> When indexing an XML file, there are specific fields, all distinct from one
> another, so duplicates don't happen. I want to extract specific fields from
> the content field. With such extraction, the new fields have to be indexed
> as well, so it seems to me that the content ends up indexed twice for every
> new field. By the way, any details about how to get new fields from the
> content will be helpful.
> --
> Yavuz Selim YILMAZ
>
> 2010/9/7 Markus Jelsma markus.jel...@buyways.nl
>> Depends on your version of Nutch. At least trunk and 1.1 obey the
>> solrmapping.xml file in Nutch's configuration directory. I'd suggest you
>> start with that mapping file and the Solr schema.xml file shipped with
>> Nutch, as it exactly matches the mapping file. Just restart Solr with the
>> new schema (or change the mapping), crawl, fetch, parse and update your
>> DBs, and then push the index from Nutch to your Solr instance.
>>
>> On Tuesday 07 September 2010 10:00:47 Yavuz Selim YILMAZ wrote:
>>> I tried to combine Nutch and Solr and want to ask something.
>>>
>>> After crawling, Nutch has certain fields such as content, tstamp, title.
>>>
>>> How can I map the content field after crawling? Do I have to change the
>>> Lucene code (such as adding an extra field)? Or handle it at the Solr stage?
>>>
>>> Any suggestion?
>>>
>>> Thx.
>>> --
>>> Yavuz Selim YILMAZ
>>
>> Markus Jelsma - Technisch Architect - Buyways BV
>> http://www.linkedin.com/in/markus17
>> 050-8536620 / 06-50258350

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
Re: Nutch - Solr latest?
: I'm curious, is there a spot / patch for the latest on Nutch / Solr
: integration? I've found a few pages (a few outdated, it seems). It would be nice
: (?) if it worked as a DataSource type for DataImportHandler, but I'm not sure if
: that fits with how it works. Either way, a nice contrib patch in the way the DIH is
: already set up would be nice to have.
...
: Is there currently work ongoing on this? Seems like it belongs in either / or
: project and not both.

My understanding is that previous work on bridging Nutch crawling with Solr indexing involved patching Nutch and using a Nutch-specific schema.xml and the client code which has since been committed as SolrJ. Most of the discussion seemed to take place on the Nutch list (which makes sense, since Nutch required the patching), so you may want to start there.

I'm not sure if Nutch integration would make sense as a DIH plugin (it seems like the Nutch crawler could push the data much more easily than DIH could pull it from the crawler), but if there is any advantage to having plugin code running in Solr to support this, then that would absolutely make sense in the new /contrib area of Solr (which I believe Otis already created/committed). Any Nutch plugins or modifications would obviously need to be made in Nutch.

-Hoss