Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "SecondReport" page has been changed by FjodorVershinin:
https://wiki.apache.org/nutch/SecondReport?action=diff&rev1=3&rev2=4

     * Add the ability to upload seed files (or post seed data) via the REST API
  === Contributions to Nutch community ===
  During the previous week (13.07-19.07) I worked on the most challenging task: 
implementing the crawling cycle in the GUI. The hardest part was tracking task 
status, which I solved with simple polling. Another option is to post the whole 
batch of jobs to the Nutch Server and shift all the responsibility to the 
server side. You can see my pull request on 
[[https://bitbucket.org/feodorv/uinutch/pull-request/2/implemented-crawling-script|bitbucket]]. 
I also proposed minor changes to the API and created an issue with a small 
patch for the generate component: 
[[https://issues.apache.org/jira/browse/NUTCH-1819|NUTCH-1819]]
+ 19.07-27.07
+ I created a page that allows creating and running remote crawls. The main 
issue was asynchronous execution and progress display. I implemented this using 
Spring's @Async annotation and a Spring executor. Progress reporting uses a 
polling mechanism, which can be replaced by wicket-atmosphere in the future; 
HTML5 WebSockets could then be used instead of polling. 
+ Also, some refactoring has been done and a bug in the test execution process 
has been fixed.
+ 
[[https://bitbucket.org/feodorv/uinutch/pull-request/4/implemented-crawls-page| 
pull request]]
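The asynchronous execution and progress polling described above can be sketched roughly as follows. This is a minimal illustration only: it uses plain java.util.concurrent in place of Spring's @Async and executor, and the class name CrawlRunner, the step names, and the sleep that stands in for a remote call are all hypothetical, not taken from the uinutch code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: runs the steps of a crawl on a background thread and
// exposes a progress counter that the UI side can poll.
class CrawlRunner {
    // Illustrative step names, not the actual Nutch Server job list.
    static final String[] STEPS = {"INJECT", "GENERATE", "FETCH", "PARSE", "UPDATEDB"};

    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final AtomicInteger completedSteps = new AtomicInteger(0);

    // Submits the crawl and returns immediately; the Future completes
    // once every step has finished.
    Future<Void> startCrawl() {
        Callable<Void> job = () -> {
            for (String step : STEPS) {
                runStep(step);
                completedSteps.incrementAndGet();
            }
            return null;
        };
        return executor.submit(job);
    }

    // Placeholder for a blocking call to the remote server (assumption).
    private void runStep(String step) throws InterruptedException {
        Thread.sleep(20);
    }

    // Progress in percent; safe to call from the polling thread.
    int progressPercent() {
        return completedSteps.get() * 100 / STEPS.length;
    }

    void shutdown() {
        executor.shutdown();
    }

    public static void main(String[] args) throws Exception {
        CrawlRunner runner = new CrawlRunner();
        Future<Void> crawl = runner.startCrawl();
        // Simple polling loop, as a UI timer would do.
        while (!crawl.isDone()) {
            System.out.println("progress: " + runner.progressPercent() + "%");
            TimeUnit.MILLISECONDS.sleep(10);
        }
        crawl.get(); // rethrows any failure from the background job
        System.out.println("progress: " + runner.progressPercent() + "%");
        runner.shutdown();
    }
}
```

With a push mechanism such as WebSockets, the polling loop would go away and the background job would notify the page after each step instead.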
  
- The next task is seed upload, then we can run our application on Apache's VM.
+ The next task is seed upload; then we can run our application on Apache's VM. 
Concerning seed upload, I would propose not uploading files but adding the 
ability to create seed lists on the UI side, which can be sent through the API 
so that the Nutch server creates the seed file.
+ This option can make seed management much easier. The second question is 
about the data store. Currently the UI app has to store too much information in 
a plaintext properties file. I would propose using an embedded H2 Java 
database, so that data management would no longer be an issue.
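The seed list proposal above could look roughly like this. It is only a sketch under stated assumptions: the SeedList class and its methods are hypothetical and are not part of the Nutch API, and the transport through the REST API is left out. It shows the UI side collecting URLs and the server side writing them out as a plain-text seed file with one URL per line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of a seed list that is edited in the UI.
class SeedList {
    private final String name;
    private final List<String> urls = new ArrayList<>();

    SeedList(String name) {
        this.name = name;
    }

    // Adds a URL, silently skipping blank entries and duplicates.
    SeedList add(String url) {
        String trimmed = url == null ? "" : url.trim();
        if (!trimmed.isEmpty() && !urls.contains(trimmed)) {
            urls.add(trimmed);
        }
        return this;
    }

    String getName() {
        return name;
    }

    List<String> getUrls() {
        return Collections.unmodifiableList(urls);
    }

    // Server-side step (assumption): write the list as a seed file,
    // one URL per line, UTF-8 encoded.
    Path writeSeedFile(Path directory) throws IOException {
        Path file = directory.resolve(name + ".txt");
        return Files.write(file, urls);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("seeds");
        SeedList seeds = new SeedList("demo")
                .add("http://nutch.apache.org/")
                .add("http://wiki.apache.org/nutch/");
        Path file = seeds.writeSeedFile(dir);
        System.out.println("wrote " + seeds.getUrls().size() + " URLs to " + file);
    }
}
```

Keeping the list as structured data on the UI side means it can be validated and edited before anything touches the server's file system.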
+ 
  
  == Future Actions ==
   
