Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "SecondReport" page has been changed by FjodorVershinin:
https://wiki.apache.org/nutch/SecondReport?action=diff&rev1=10&rev2=11

   * --(If possible create a graphic of the REST API as it exists in his proposed patch for [[https://issues.apache.org/jira/browse/NUTCH-1769|NUTCH-1769 API refactoring]] this should only include the information included in his above commentary on the topic.)--
   * --(Provide links to the '''HTML Prototype''', I have not seen any of this code and therefore cannot assert that progress has been made as described above.)--
   * --(Provide links/patches for the '''application skeleton''' as stated above... I have yet to see any code.)--

- === Objectives ===
+ == Objectives ==
+ === Crawling cycle ===
-  * Add ability to get logs by REST API
-  * --(Implement generic crawl cycle in GUI)--
-  * --(Add ability upload seed files (or post seed data) by REST API)--
- === Contributions to Nutch community ===
  Last week (13.07-19.07) I worked on the most challenging task: implementing the crawling cycle in the GUI. The hardest part was tracking task status, which I solved with simple polling. Another option is to post the whole batch of jobs to the Nutch Server and shift all the responsibility to the server side. You can see my pull request on [[https://bitbucket.org/feodorv/uinutch/pull-request/2/implemented-crawling-script| bitbucket]]. I also proposed a minor API change and created an issue with a small patch for the generate component: [[https://issues.apache.org/jira/browse/NUTCH-1819|NUTCH-1819]]
  19.07-27.07: I created a page that allows creating and running remote crawls. The main issue was asynchronous execution and displaying progress. I implemented this using Spring's @Async annotation and a Spring executor. Progress reporting uses a polling mechanism, which can be replaced by wicket-atmosphere in the future, so that HTML5 WebSockets are used instead of polling.
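The simple polling mentioned above for tracking job status could look roughly like the following sketch. It is not the code from the pull request: `JobPoller`, `State`, and `pollUntilDone` are hypothetical names, and the status source is a plain `Supplier` standing in for whatever call the UI makes to the Nutch Server.

```java
import java.time.Duration;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical helper: polls a status source until the job leaves the RUNNING state. */
final class JobPoller {

    /** Illustrative job states; these are assumptions, not the Nutch REST API's values. */
    enum State { RUNNING, FINISHED, FAILED }

    /**
     * Asks statusSource for the job state up to maxAttempts times,
     * sleeping for the given interval after each RUNNING answer.
     */
    static State pollUntilDone(Supplier<State> statusSource, Duration interval, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            State state = statusSource.get();
            if (state != State.RUNNING) {
                return state; // terminal state reached
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag set
                throw new IllegalStateException("polling interrupted", e);
            }
        }
        throw new IllegalStateException("job still running after " + maxAttempts + " polls");
    }

    public static void main(String[] args) {
        // Stand-in for a remote status call: RUNNING twice, then FINISHED.
        Iterator<State> fake = List.of(State.RUNNING, State.RUNNING, State.FINISHED).iterator();
        System.out.println(pollUntilDone(fake::next, Duration.ofMillis(10), 5)); // prints FINISHED
    }
}
```

In a real UI the loop would run on a background executor rather than on the request thread, which is where an asynchronous execution mechanism comes in. A push-based transport would remove the loop entirely.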
  Also, some refactoring has been done and a bug in the test execution process has been fixed. [[https://bitbucket.org/feodorv/uinutch/pull-request/4/implemented-crawls-page| pull request]]
- The next task is seed upload, then we can run our application on Apache's VM. Concerning seed upload, I would propose not to upload files, but to add the ability to create seed lists on the UI side, which can be uploaded by API, and the Nutch server will create the seed file.
- This option can make management of seeding much easier. The second question is about the data store. Now the UI app has to store too much info in a plaintext properties file. I would propose to take an embedded H2 Java database, then data management wouldn't be an issue.
+ === Seed upload ===
+ A very important step towards running the application on a VM. Concerning seed upload, I proposed not to upload files, but to add the ability to create seed lists on the UI side, which can be uploaded by API, and the Nutch server will create the seed file. This option can make management of seeding much easier. The current implementation creates a directory in /tmp containing a file with the seed URLs, which works fine for now.
+ === Data store ===
+ Until now the UI app had to store too much info in a plaintext properties file. I proposed taking an embedded H2 Java database, so that data management wouldn't be an issue. The current implementation uses an H2 database with ORMLite as the persistence provider. It was a hard decision to take an ORM into this application, but SQL written in Java looks even worse. With an ORM and a database I got rid of the ugly properties file parsing. It also makes it possible to build more complex solutions for user and role management in the future.
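As a rough illustration of the seed handling described under Seed upload (a directory under /tmp holding a file of seed URLs), here is a minimal sketch using only the JDK. The class name `SeedListWriter`, the `seed-` directory prefix, and the `seed.txt` file name are assumptions made for the example, not names taken from the Nutch server code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: turns a posted seed list into a seed file in a fresh temp directory. */
final class SeedListWriter {

    /**
     * Creates a new directory under the JVM's temp location (usually /tmp on Linux)
     * and writes one URL per line into a file inside it.
     * Blank entries are dropped. Returns the path of the written file.
     */
    static Path writeSeedFile(List<String> urls) {
        List<String> cleaned = urls.stream()
                .map(String::trim)
                .filter(u -> !u.isEmpty())
                .collect(Collectors.toList());
        try {
            Path dir = Files.createTempDirectory("seed-"); // assumed prefix
            Path file = dir.resolve("seed.txt");           // assumed file name
            Files.write(file, cleaned, StandardCharsets.UTF_8);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path seed = writeSeedFile(List.of("http://nutch.apache.org/", "  ", "http://wiki.apache.org/nutch/"));
        System.out.println("Seed file written to " + seed);
    }
}
```

Using a fresh directory for every seed list keeps different crawls from overwriting each other's seeds. The directory that contains the file (`seed.getParent()`) is what a crawl would then be pointed at.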
+ 
+ == Contributions to Nutch community ==
+ [[https://issues.apache.org/jira/browse/NUTCH-1819|NUTCH-1819]]
+ [[https://bitbucket.org/feodorv/uinutch/pull-request/2/implemented-crawling-script| Crawling cycle]]
+ [[https://bitbucket.org/feodorv/uinutch/pull-request/4/implemented-crawls-page| Crawls management]]

  == Future Actions ==
   * Change entire build structure to Ant + Ivy as per existing 2.x codebase