Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "FirstReport" page has been changed by FjodorVershinin:
https://wiki.apache.org/nutch/FirstReport?action=diff&rev1=4&rev2=5

Comment:
Added links to prototype and app skeleton

  = Google Summer of Code 2014 Report 1 =
- 
  '''Project Name''': NUTCH-841 Create a Wicket-based Web Application for Nutch 
2.X
  
  '''Report date''': 19th June 2014
  
- '''Student Name''': Fjodor Vershinin 
+ '''Student Name''': Fjodor Vershinin
  
- '''Mentor Name''': Lewis John McGibbney (lewismc)
+ '''Mentor Name''': Lewis John McGibbney (lewismc) <<TableOfContents(4)>>
- <<TableOfContents(4)>>
  
- == Project description == 
+ == Project description ==
- 
  Main goal of this project is to create an Apache Wicket-based Web GUI for 
Apache Nutch 2.X.
  
  == Review of Previous Actions ==
- 
  No actions are available as this is the first report.
  
  == Objectives ==
- 
  === Explore Nutch Documentation ===
- 
  Firstly, I decided to make some background research about Apache Nutch in 
Russian segment of Internet. Then I briefly reviewed Nutch wiki, but especially 
paid attention to GUI specification page and Nutch tutorials.
  
  === Workspace Setup ===
- 
  Nutch compilation process is built with Ant+Ivy. I'd previously experience 
with only Maven, so I've spend some time on workspace setup. Also, I prefer to 
use Mercurial because of very continent MQ plugin, which I really missing in 
SVN and GIT. Fortunately, Mercurial has also SVN/GIT bridges, so it was not 
that troublesome to configure it. As a benefit I got very convenient tool to 
work with patch queues.
  
  === Contributions to Nutch community ===
- 
   * NUTCH-1731 I'd fixed NPE bug, added ability to stop remote server, and 
re-factored code so it uses Apache commons-cli library for command line parsing 
and automatic help message generation.
  
  === Some experimental crawling ===
- 
- Then I decided to make some test crawling sessions, in order to get into 
Nutch mechanics and plugin configuration. Despite very verbose wiki 
documentation it took some time to start, because of Hbase, Elasticsearch, and 
Hadoop setup. Using this software stack I'd crawled some web pages and got some 
Nutch user experience. 
+ Then I decided to make some test crawling sessions, in order to get into 
Nutch mechanics and plugin configuration. Despite very verbose wiki 
documentation it took some time to start, because of Hbase, Elasticsearch, and 
Hadoop setup. Using this software stack I'd crawled some web pages and got some 
Nutch user experience.
  
  === Review previous GUI application ===
+ I'd checkout legacy GUI application from Github. Despite that I'd spend some 
time to get it running, it was quite useful, because I get information about 
most important features which should be implemented in the first row. From my 
point of view, most important features were:
  
- I'd checkout legacy GUI application from Github. Despite that I'd spend some 
time to get it running, it was quite useful, because I get information about 
most important features which should be implemented in the first row. From my 
point of view, most important features were:
-  1.   Seed file upload
+  1. Seed file upload
-  2.   Component which gives ability to change nutch default settings.
+  1. Component which gives ability to change nutch default settings.
-  3.   Search in crawling results
+  1. Search in crawling results
-  4.   Crawling statistics
+  1. Crawling statistics
-  5.   Authentication
+  1. Authentication
-  6.   Instances management
+  1. Instances management
-  7.   Plugins support
+  1. Plugins support
  
  === Create HTML prototype ===
+ I created HTML prototype to make some experiments with Twitter Bootstrap and 
as a final result import legacy application HTML pages into my project. Wicket 
gives ability to use such prototype pages without any problem just by adding 
wicket:id properties.HTML prototype is far from complete, but you can get 
latest state by this link 
https://bitbucket.org/feodorv/prototype/get/default.tar.gz or clone mercurial 
repo using "hg clone https://feod...@bitbucket.org/feodorv/prototype";
- 
- I created HTML prototype to make some experiments with Twitter Bootstrap and 
as a final result import legacy application HTML pages into my project. Wicket 
gives ability to use such prototype pages without any problem just by adding 
wicket:id properties.
  
  === Nutch REST API research ===
- 
  My first serious meeting with Nutch code base. I decided that it will be good 
entry point if I start with REST API. Then I can better understand, what 
features have been already implemented on server side, which drawbacks there 
exists and how application works. It was quite useful despite it took quite a 
lot of time because of code complexity and cohesion. After this step I'd got 
better experience with Nutch internal architecture and source code.
  
  === Refactoring of NUTCH Rest API ===
- 
  After reviewing of this API I realized, that code requires refactoring. Most 
of this code has been written two years ago with plain old Restlet library. We 
decided that I can use JAX-RS interface, because it is standard for modern REST 
services. Restlet has been left as back-end, but it can be replaced now without 
any effort. Also, refactoring has decreased code cohesion and complexity. These 
changes were proposed as patch in NUTCH-1769 issue. Also, I would provide 
unit/integration tests for REST API, but it is out of GSoC scope.
  
  === Implement application skeleton ===
+ Most time consuming part. I'd spend some time on exploring technologies, such 
as embedded Jetty, Wicket Spring integration and so on. Then I asked for some 
help from professional Java web-developer, in order to get some advice how to 
setup initial application skeleton. He gave me some Wicket samples from his 
projects and routed me to the right documentation about Wicket and Spring. 
Implemented features in application skeleton:
  
- Most time consuming part. I'd spend some time on exploring technologies, such 
as embedded Jetty, Wicket Spring integration and so on. Then I asked for some 
help from professional Java web-developer, in order to get some advice how to 
setup initial application skeleton. He gave me some Wicket samples from his 
projects and routed me to the right documentation about Wicket and Spring. 
Implemented features in application skeleton:
   * RESTful client to Nutch API
   * GUI application configuration component
   * Base page structures
@@ -71, +59 @@

   * Unit tests
   * Integration tests
   * Plugin support
- (mercurial repository https://bitbucket.org/feodorv/uinutch)
+ 
+ You can clone mercurial repository using "hg clone 
https://feod...@bitbucket.org/feodorv/uinutch"; command or download latest 
version by this link https://bitbucket.org/feodorv/uinutch/get/default.tar.gz
  
  == Future Actions ==
- 
  '''To be conpleted by student'''
  
  == Mentors Comments ==
+ It has been some three weeks now since I am aware that Fjodor actually began 
coding on this project. He stated that he would be delaying the start due to 
exam committments and I was fine with this. My main concern was what would be 
the tangible outcome for mid-term reporting based upon how quickly he could get 
upt-to-speed with the rather complex nature of Nutch 2.X. As far as I am 
concerned, and based on his initial report as above, I am pretty satisfied that 
he understands the high level view of Nutch 2.X including the layers of Nutch 
as the Hadoop-based application, Gora as the storage abstraction later, HBase 
(amongst others) as the WebPage/Host storage mechanism, Solr as the indexing 
server and Hadoop as the framework for running Nutch jobs on. This in itself 
takes most people much longer than 3 weeks so I am reasonablty happy with his 
progress. I would like Fjodor to be contributing his documentation at regular 
intervals to '''THIS WIKI''', this serves a number of purposes
  
- It has been some three weeks now since I am aware that Fjodor actually began 
coding on this project. He stated that he would be delaying the start due to 
exam committments and I was fine with this. My main concern was what would be 
the tangible outcome for mid-term reporting based upon how quickly he could get 
upt-to-speed with the rather complex nature of Nutch 2.X.
- As far as I am concerned, and based on his initial report as above, I am 
pretty satisfied that he understands the high level view of Nutch 2.X including 
the layers of Nutch as the Hadoop-based application, Gora as the storage 
abstraction later, HBase (amongst others) as the WebPage/Host storage 
mechanism, Solr as the indexing server and Hadoop as the framework for running 
Nutch jobs on. This in itself takes most people much longer than 3 weeks so I 
am reasonablty happy with his progress.
- I would like Fjodor to be contributing his documentation at regular intervals 
to '''THIS WIKI''', this serves a number of purposes
   * It means I can track his progress between reporting sessions
   * It means we have one place where all of the documentation resides
   * It means we can direct GSoC admins, etc here when reporting comes around
   * It means that future students and/or anyone else can find how the project 
went
   * ... etc
+ 
  I would like to see Fjodor take action on the following items
+ 
   * If possible create a graphic of the REST API as it exists in his proposed 
patch for [[https://issues.apache.org/jira/browse/NUTCH-1769|NUTCH-1769 API 
refactoring]] this should only include the information included in his above 
commentary on the topic.
   * Provide links to the '''HTML Prototype''', I have not seen any of this 
code and therefore cannot assert that progress has been made as described above.
-  * Provide links/patches for the '''application skeleton''' as stated 
above... I have yet to see any code. 
+  * Provide links/patches for the '''application skeleton''' as stated 
above... I have yet to see any code.
- All in all, I am happy that Fjodor is being resourceful e.g. approaching Java 
developers to get help, etc. 
- Fjodor has been a quiet student and has not asked a lot of me so far, if he 
wishes to change this moving forward then I am very willing to engage more with 
him in his on going work.
  
- Signed: Lewis John McGibbney (lewismc) 
+ All in all, I am happy that Fjodor is being resourceful e.g. approaching Java 
developers to get help, etc.  Fjodor has been a quiet student and has not asked 
a lot of me so far, if he wishes to change this moving forward then I am very 
willing to engage more with him in his on going work.
  
+ Signed: Lewis John McGibbney (lewismc)
+ 

Reply via email to