protocol-smb: protocol plugin implementing the CIFS/SMB protocol. This plugin
allows Nutch to crawl Microsoft Windows shares remotely over CIFS/SMB.
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Key: NUTCH-427
URL: https://issues.apache.org/jira/browse/NUTCH-427
Project: Nutch
Issue Type: New Feature
Components: fetcher
Affects Versions: 0.8.1
Environment: Java (OS independent)
Reporter: Armel Nene
Priority: Critical
Title: protocol-smb - Nutch protocol plugin for crawling Microsoft Windows shares
Author: Armel T. Nene
Email: armel.nene NOSPAM-AT-NOSPAM idna-solutions.com
A. Introduction
The protocol-smb plugin allows you to crawl Microsoft Windows shares. It
implements the CIFS/SMB protocol, which is commonly used on Microsoft
operating systems, and replicates the behaviour of the protocol-file plugin
over CIFS/SMB. The plugin uses the jCIFS library and supports all of the
jCIFS configuration properties.
You can find more information on the following site: http://jcifs.samba.org/
The smb URL syntax is as follows: smb://xxxxx (e.g. smb://server/share).
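As a rough illustration of that URL layout, the Java sketch below splits an smb:// URL into server, share and path. The class SmbUrlParts is hypothetical and not part of protocol-smb or jCIFS. It uses java.net.URI, which, unlike java.net.URL, does not need an smb protocol handler to be registered in the JVM.

```java
import java.net.URI;

/**
 * Hypothetical helper (not part of protocol-smb or jCIFS): splits a URL such
 * as smb://server/share/dir/file.txt into server, share and path.
 * java.net.URI parses any scheme without a registered protocol handler.
 */
public class SmbUrlParts {
    final String server;
    final String share;
    final String path;

    SmbUrlParts(String server, String share, String path) {
        this.server = server;
        this.share = share;
        this.path = path;
    }

    static SmbUrlParts parse(String url) {
        URI uri = URI.create(url);
        if (!"smb".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("not an smb URL: " + url);
        }
        // getPath() returns e.g. "/share/dir/file.txt"; drop the leading slash.
        String rawPath = uri.getPath() == null ? "" : uri.getPath();
        String trimmed = rawPath.startsWith("/") ? rawPath.substring(1) : rawPath;
        // The first path segment is the share name, the remainder is the path.
        int slash = trimmed.indexOf('/');
        String share = slash < 0 ? trimmed : trimmed.substring(0, slash);
        String rest = slash < 0 ? "/" : trimmed.substring(slash);
        return new SmbUrlParts(uri.getHost(), share, rest);
    }

    public static void main(String[] args) {
        SmbUrlParts p = parse("smb://server/share/dir/file.txt");
        // prints: server | share | /dir/file.txt
        System.out.println(p.server + " | " + p.share + " | " + p.path);
    }
}
```

With this sketch, smb://server/share yields the share "share" and the path "/".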
B. Installation
1) Binaries only:
   - Copy the "protocol-smb" folder to the NUTCHHOME/build/plugins directory.
   - Put the "smb.properties" file in the NUTCHHOME/conf directory.
   - Configure the properties in the "smb.properties" file.
   - Enable the plugin by adding protocol-smb to the plugin.includes property
     in the "nutch-site.xml" file found in the NUTCHHOME/conf directory.
2) Source code: Always refer to the Nutch wiki for detailed instructions on
building Nutch. In short:
   - Copy the 'protocol-smb' folder to NUTCHHOME/src/plugin.
   - Update the build.xml in NUTCHHOME/src/plugin to include the plugin.
   - Update the NUTCHHOME/default.properties file to include the plugin.
   - Run ant to build.
   - Copy the 'smb.properties' file to NUTCHHOME/conf and configure the
     properties.
   - Enable the plugin by updating the nutch-site.xml file.
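Both installation routes end with enabling the plugin through Nutch's plugin.includes property, a regular expression over plugin ids. The hypothetical Java check below (not part of Nutch) assumes the whole plugin id has to match the expression, i.e. String.matches semantics; the two plugin.includes values are illustrative only, not Nutch's shipped defaults.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sanity check (not part of Nutch): does a plugin.includes value
 * enable a given plugin id? Assumes the entire id must match the expression.
 */
public class PluginIncludesCheck {
    static boolean isEnabled(String pluginIncludes, String pluginId) {
        // Pattern.matches succeeds only if the whole id matches the regex.
        return Pattern.matches(pluginIncludes, pluginId);
    }

    public static void main(String[] args) {
        // Illustrative values; copy the real one from your nutch-site.xml.
        String before = "protocol-http|urlfilter-regex|parse-(text|html)|index-basic";
        String after = "protocol-(http|smb)|urlfilter-regex|parse-(text|html)|index-basic";
        System.out.println(isEnabled(before, "protocol-smb")); // false
        System.out.println(isEnabled(after, "protocol-smb"));  // true
    }
}
```

Grouping the protocols as protocol-(http|smb) keeps the existing HTTP plugin enabled while adding protocol-smb.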
C. Known Issues
1) MalformedURLException: unknown protocol: smb
The SMB URL protocol handler has not been installed successfully. In short,
the jCIFS jar must be loaded by the system class loader.
Workaround:
   a) A short-term solution is to install the jCIFS jar found in the
      protocol-smb folder in JDKHOME/jre/lib/ext and/or JREHOME/lib/ext.
   b) If the exception is still thrown after completing step a), set the
      system property by passing the following argument to the JVM:
      -Djava.protocol.handler.pkgs=jcifs
You can also visit the FAQ page: http://jcifs.samba.org/src/docs/faq.html
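To tell whether the workaround took effect, the hypothetical Java diagnostic below (not part of protocol-smb) asks the running JVM to build a java.net.URL for a given scheme. With -Djava.protocol.handler.pkgs=jcifs the JVM looks for a handler class named jcifs.smb.Handler, so "smb" should only report true once that handler can be found.

```java
import java.net.MalformedURLException;
import java.net.URI;

/**
 * Hypothetical diagnostic (not part of protocol-smb): reports whether the
 * JVM has a URL protocol handler for a scheme. "smb" reporting false is the
 * condition behind "MalformedURLException: unknown protocol: smb".
 */
public class ProtocolHandlerCheck {
    static boolean hasHandler(String protocol) {
        try {
            // toURL() looks up a URLStreamHandler for the scheme and throws
            // MalformedURLException when no handler is found.
            URI.create(protocol + "://server/share/").toURL();
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.protocol.handler.pkgs = "
                + System.getProperty("java.protocol.handler.pkgs"));
        System.out.println("http handler: " + hasHandler("http"));
        System.out.println("smb handler:  " + hasHandler("smb"));
    }
}
```

Running it with the same JVM arguments and classpath as the crawler shows whether the jCIFS handler is actually visible there.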
2) FATAL smb.SMB - Could not read content of protocol: smb://xxxxxx
This problem usually occurs if the following properties are not set
correctly in the "smb.properties" file:
   - username
   - password
   - domain
For the list of available properties and how to set them, refer to:
http://jcifs.samba.org/src/docs/api/overview-summary.html#scp
You can also visit the FAQ page: http://jcifs.samba.org/src/docs/faq.html
N.B. All properties should be set in the "smb.properties" file, which
accepts every property supported by jCIFS.
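A quick pre-flight check that the credentials are present can save a failed fetch. The Java sketch below is hypothetical (not part of protocol-smb) and assumes the file uses jCIFS's own property names (jcifs.smb.client.username, jcifs.smb.client.password, jcifs.smb.client.domain); adjust the keys if the plugin's smb.properties uses the shorter names listed above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/**
 * Hypothetical check (not part of protocol-smb): lists credential properties
 * that are missing or blank. Key names are an assumption (jCIFS's own names).
 */
public class SmbPropertiesCheck {
    static final String[] REQUIRED = {
        "jcifs.smb.client.username",
        "jcifs.smb.client.password",
        "jcifs.smb.client.domain"
    };

    static List<String> missing(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> result = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            // Treat absent keys and empty values alike.
            if (value == null || value.trim().isEmpty()) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String example = "jcifs.smb.client.username=crawler\n"
                + "jcifs.smb.client.password=\n";
        // prints [jcifs.smb.client.password, jcifs.smb.client.domain]
        System.out.println(missing(new StringReader(example)));
    }
}
```

In practice the Reader would be opened on the real NUTCHHOME/conf/smb.properties file instead of an in-memory string.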
3) Only tested on Windows XP and Windows Server 2003. Please report any test
results on other operating systems. The plugin is expected to run on other
operating systems without changes, but this has not been verified.
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers