[jira] [Commented] (NUTCH-841) Create a Wicket-based Web Application for Nutch

2013-05-19 Thread Ivan Vershinin (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661628#comment-13661628
 ] 

Ivan Vershinin commented on NUTCH-841:
--

Hi Chris,
I am a student from Estonia (Tartu University). I have experience in Java web 
application development.
Tools and frameworks: Wicket, Spring, JUnit, Mockito, RESTful services, Git, 
Mercurial, Linux.
I am looking forward to participating in this project during Google Summer of 
Code 2013.
Could you give me some advice on the next steps for continuing with a proposal?

Thanks,
Ivan

 Create a Wicket-based Web Application for Nutch
 ---

 Key: NUTCH-841
 URL: https://issues.apache.org/jira/browse/NUTCH-841
 Project: Nutch
  Issue Type: Improvement
  Components: web gui
Affects Versions: nutchgora
 Environment: Should work in both Nutch trunk and 2.0 branches.
Reporter: Chris A. Mattmann
Assignee: Chris A. Mattmann
  Labels: gsoc, gsoc2013
 Fix For: 2.3, 1.8


 In light of the conversation on NUTCH-837, we are removing the old Nutch 
 webapp and will replace it with a 2.0 one that works with GORA + Solr. 
 Apache Nutch versions prior to 1.3 used to ship with a web application that 
 allowed basic search, and browse of the information captured in the Nutch 
 index. Since 1.3, we deprecated and removed the webapp mainly due to the fact 
 that the segment API changed (we moved to Solr), and also due to the fact 
 that we didn't want to maintain a webapp b/c those JSPs were a pain.
 I am going to propose having a Nutch web application using Apache Wicket 
 http://wicket.apache.org/. This would be very cool and since I know Wicket, 
 I'm willing to help maintain it. 
 The webapp should implement all of the old web pages and functionality, and 
 also should support the basic views, and connection to Solr instead of to 
 Lucene, and should also consider both the trunk branch, and the 2.0 branch 
 (Gora based).
 I'm putting this out there as a GSoC project for 2013.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: GSOC 2013 project: Apache-Wicket based Nutch webapp

2013-05-19 Thread Ivan Vershinin
Hello!
I am a student from Estonia (Tartu University). I want to participate in GSoC
2013, and selected your project because I have experience in Java and
Wicket.
Can you give me some advice on where I can start my investigations?
Best regards,
Ivan Vershinin


[jira] [Commented] (NUTCH-1563) FetchSchedule#getFields is never used by GeneraterJob

2013-05-19 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661637#comment-13661637
 ] 

Lewis John McGibbney commented on NUTCH-1563:
-

+1 for commit. Feng, please commit this when you can and change the Fix Version 
to 2.2.

 FetchSchedule#getFields is never used by GeneraterJob
 -

 Key: NUTCH-1563
 URL: https://issues.apache.org/jira/browse/NUTCH-1563
 Project: Nutch
  Issue Type: Bug
  Components: generator
Affects Versions: 2.1
Reporter: lufeng
Assignee: lufeng
Priority: Minor
 Fix For: 2.3

 Attachments: NUTCH-1563.patch


 The getFields method in FetchSchedule is never used, so if a user extends 
 FetchSchedule and wants to get some fields of WebPage, those fields are always 
 null.
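
As a rough illustration of the problem described above, here is a minimal, self-contained sketch. It assumes the generator should load the union of its own fields and the fields the schedule asks for. The names Field, Schedule, JOB_FIELDS and fieldsToLoad are hypothetical stand-ins, not Nutch's real WebPage.Field, FetchSchedule or GeneratorJob, and this is not the attached NUTCH-1563.patch.

```java
import java.util.Collection;
import java.util.EnumSet;
import java.util.Set;

public class ScheduleFieldsSketch {

  /** Hypothetical stand-in for the stored columns of a page. */
  public enum Field { FETCH_TIME, SCORE, STATUS, MODIFIED_TIME, METADATA }

  /** Hypothetical stand-in for a fetch schedule that declares the fields it reads. */
  public interface Schedule {
    Collection<Field> getFields();
  }

  /** Fields the (hypothetical) generator job always needs. */
  public static final Set<Field> JOB_FIELDS =
      EnumSet.of(Field.FETCH_TIME, Field.SCORE, Field.STATUS);

  /**
   * Returns the job's own fields plus whatever the schedule requests.
   * If the schedule's request is ignored, any extra field it relies on
   * is never loaded from the store and is seen as null.
   */
  public static Set<Field> fieldsToLoad(Schedule schedule) {
    Set<Field> fields = EnumSet.copyOf(JOB_FIELDS);
    Collection<Field> extra = schedule.getFields();
    if (extra != null) {
      fields.addAll(extra);
    }
    return fields;
  }

  public static void main(String[] args) {
    // A custom schedule that also wants two extra fields.
    Schedule custom = () -> EnumSet.of(Field.MODIFIED_TIME, Field.METADATA);
    System.out.println(fieldsToLoad(custom));
  }
}
```

The point of the sketch is only that the set of fields passed to the query has to include the schedule's fields; how the real job builds that set is up to the patch.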


[jira] [Updated] (NUTCH-1573) Upgrade to most recent JUnit 4.x to improve test flexibility

2013-05-19 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-1573:


Attachment: NUTCH-1573.2.x.v2.patch

Updated patch.
This is quite large.
The process is as follows:

Add the following to each Test class

{code}
import org.junit.Test;
import static org.junit.Assert.*;
{code}

and of course 

{code}
import org.junit.After;
import org.junit.Before;
{code}

if so required.

Secondly, add annotations to individual tests.

{code}
@Test
{code}

Test suites are somewhat unfavoured in JUnit 4.x so you can also see the patch 
for cases where I've dealt with the previous test suites.

Finally, constructors are not required. Most of the constructor material can be 
moved into 

{code}
@Before
public void setUp()
{code}

methods.

That's about it!
I will commit this today.
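
Put together, the steps above might look like the following sketch of a JUnit 4 test class. TestSeedList and its contents are hypothetical and are not taken from the patch; the example assumes JUnit 4.x is on the classpath.

```java
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import java.util.ArrayList;
import java.util.List;

import org.junit.After;
import org.junit.Before;
import org.junit.Ignore;
import org.junit.Test;

// Hypothetical test class; it is not part of the NUTCH-1573 patch.
public class TestSeedList {

  private List<String> seeds;

  // Takes over what a constructor used to do: runs before every test.
  @Before
  public void setUp() {
    seeds = new ArrayList<String>();
    seeds.add("http://nutch.apache.org/");
  }

  // Runs after every test.
  @After
  public void tearDown() {
    seeds.clear();
  }

  // Tests are discovered through the annotation, so no TestCase superclass is needed.
  @Test
  public void testSeedIsPresent() {
    assertEquals(1, seeds.size());
    assertTrue(seeds.get(0).startsWith("http://"));
  }

  // @Ignore skips the test and reports it as ignored instead of failed.
  @Ignore("placeholder reason; illustrative only")
  @Test
  public void testNotYetWorking() {
    assertEquals(2, seeds.size());
  }
}
```

The ignored test is the reason for the upgrade mentioned in the issue description: @Ignore is only available from JUnit 4 onwards.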


 Upgrade to most recent JUnit 4.x to improve test flexibility
 

 Key: NUTCH-1573
 URL: https://issues.apache.org/jira/browse/NUTCH-1573
 Project: Nutch
  Issue Type: Improvement
  Components: build, test
Affects Versions: 1.6, 2.1
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 1.7, 2.2

 Attachments: NUTCH-1573.2.x.v1.patch, NUTCH-1573.2.x.v2.patch


 I wanted to try using the @Ignore functionality within JUnit; however, I don't 
 think it is available in the current JUnit version we use in Nutch. We should 
 upgrade.


[jira] [Commented] (NUTCH-1573) Upgrade to most recent JUnit 4.x to improve test flexibility

2013-05-19 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661659#comment-13661659
 ] 

Lewis John McGibbney commented on NUTCH-1573:
-

Committed @revision 1484348 in 2.x HEAD


[jira] [Commented] (NUTCH-1573) Upgrade to most recent JUnit 4.x to improve test flexibility

2013-05-19 Thread Tejas Patil (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661662#comment-13661662
 ] 

Tejas Patil commented on NUTCH-1573:


[~lewismc] great!
If only there were no homework, my life would be awesome and I could work on 
ASF projects whenever I wanted :(
Anyway, I will verify the patch on my system and update you soon. Let's get 
this change into the repo today!


Re: GSOC 2013 project: Apache-Wicket based Nutch webapp

2013-05-19 Thread Tejas Patil
This will help you get an idea of what is needed:
http://wiki.apache.org/nutch/NutchAdministrationUserInterface

REST API in Nutch: (the JIRA comments and the patch will help you here)
https://issues.apache.org/jira/browse/NUTCH-880

Also, it's worth investing some time to get to know Nutch.
This is an old paper by Doug Cutting on Nutch:
http://www.master.netseven.it/files/262-Nutch.pdf

Here is a video of a presentation by Julien @ Lucene Eurocon last year:
http://vimeopro.com/user11514798/apache-lucene-eurocon-2012/video/55566234

After that, roll up your sleeves, get the source code and start off
crawling. These are the relevant tutorials:
http://wiki.apache.org/nutch/NutchTutorial
http://wiki.apache.org/nutch/Nutch2Tutorial

Also, you will find some config and feature-centric documentation over the
wiki pages. Here is the wiki main page:
http://wiki.apache.org/nutch/

I think that your work would be a great contribution to Nutch. Looking
forward to seeing this feature in the next release cycle.

Thanks,
Tejas Patil


On Sun, May 19, 2013 at 12:30 PM, Ivan Vershinin i...@vershinin.net wrote:

 Hello!
 I am student from Estonia (Tartu University). I want to participate in
 GSoC 2013, and selected your project because i have experience in Java and
 Wicket.
 Can you give me some advice, where i can start my investigations?
 Best regards,
 Ivan Vershinin



Re: GSOC 2013 project: Apache-Wicket based Nutch webapp

2013-05-19 Thread Tejas Patil
@dev,

I just realised that the images on the wiki page [0] are missing. Those
were displayed as external images from 101tec.com
(http://101tec.com/wp-content/themes/101tec/images/instanceNew.jpg), which
is down. Is there any other place where those images might still be
present?

[0] : http://wiki.apache.org/nutch/NutchAdministrationUserInterface


On Sun, May 19, 2013 at 2:23 PM, Tejas Patil tejas.patil...@gmail.com wrote:

 This will help for getting an idea about what is needed:
 http://wiki.apache.org/nutch/NutchAdministrationUserInterface

 Rest API in nutch: (the jira comments and the patch will help you here)
 https://issues.apache.org/jira/browse/NUTCH-880

 Also, its worth to invest some time to get to know nutch.
 This is an old paper by Doug Cutting on Nutch:
 http://www.master.netseven.it/files/262-Nutch.pdf

 Here is a video of a presentation by Julien @ Lucene Eurocon last year:
 http://vimeopro.com/user11514798/apache-lucene-eurocon-2012/video/55566234

 After that, roll up your sleeves, get the source code and start off
 crawling. These are the relevant tutorials:
 http://wiki.apache.org/nutch/NutchTutorial
 http://wiki.apache.org/nutch/Nutch2Tutorial

 Also, you will find some config and feature-centric documentation over the
 wiki pages. Here is the wiki main page:
 http://wiki.apache.org/nutch/

 I think that your work would be a great contribution to Nutch. Looking
 forward to see this feature in next release cycle.

 Thanks,
 Tejas Patil


[jira] [Updated] (NUTCH-1569) Upgrade 2.x to Gora 0.3

2013-05-19 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-1569:


Attachment: NUTCH-1569.v2.patch

New updated patch for the upgrade to Gora 0.3 artifacts.
This patch uses the MemStore for some tests. 
It also introduces annotations to temporarily ignore some tests which extend 
AbstractNutchTest (which previously used the now deprecated gora-sql 
0.1.1-incubating artifact).
Finally, this patch implements better Exception handling for our client code. 
There were some changes made in the Gora 0.3 API and these have been baked 
into this patch.
Please test and we can hopefully upgrade before pushing 2.2 RC.
Thank you

 Upgrade 2.x to Gora 0.3
 ---

 Key: NUTCH-1569
 URL: https://issues.apache.org/jira/browse/NUTCH-1569
 Project: Nutch
  Issue Type: Bug
  Components: build, storage
Affects Versions: 2.2
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.2

 Attachments: NUTCH-1569.patch, NUTCH-1569.v2.patch


 We just released the Maven artifacts and I would like to upgrade before we 
 push the RC for 2.2 :)
 Patch coming up


[jira] [Resolved] (NUTCH-1573) Upgrade to most recent JUnit 4.x to improve test flexibility

2013-05-19 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved NUTCH-1573.
-

Resolution: Fixed

Hi Tejas, no problem. I committed so that I could work on NUTCH-1569.
We are very close to 2.2 RC now I think.


[REQUEST] (NUTCH-1569) Upgrade 2.x to Gora 0.3

2013-05-19 Thread Lewis John Mcgibbney
Hi All,
I submitted a patch to upgrade the Nutch 2.x Branch codebase to the newly
released Gora 0.3.
The patch can be found here [0].
It would be excellent if folks could please test this patch and provide
feedback to the dev@ list.
The feedback will be very helpful in allowing us to progress towards a
Nutch 2.2 Release.
Thank you very much.
Lewis

[0] https://issues.apache.org/jira/browse/NUTCH-1569

-- 
*Lewis*


[jira] [Updated] (NUTCH-1569) Upgrade 2.x to Gora 0.3

2013-05-19 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-1569:


Issue Type: Improvement  (was: Bug)


[jira] [Comment Edited] (NUTCH-1569) Upgrade 2.x to Gora 0.3

2013-05-19 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661671#comment-13661671
 ] 

Lewis John McGibbney edited comment on NUTCH-1569 at 5/19/13 9:41 PM:
--

New updated patch for the upgrade to Gora 0.3 artifacts.
This patch uses the MemStore for some tests. 
It also introduces annotations to temporarily ignore some tests which extend 
AbstractNutchTest (which previously used the now deprecated gora-sql 
0.1.1-incubating artifact).
It also uses GORA-191 to handle multiple Avro schemas which may be present; 
in our case, Host.avsc and WebPage.avsc.
Finally, this patch implements better Exception handling for our client code. 
There were some changes made in the Gora 0.3 API and these have been baked 
into this patch.
Please test and we can hopefully upgrade before pushing 2.2 RC.
Thank you


Re: GSOC 2013 project: Apache-Wicket based Nutch webapp

2013-05-19 Thread Mattmann, Chris A (398J)
Hi Tejas,

I was actually not thinking that this was a project for the Nutch
Admin GUI, but for the actual search web app no longer present.

But the Admin GUI would be icing too!

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-Original Message-
From: Tejas Patil tejas.patil...@gmail.com
Reply-To: dev@nutch.apache.org dev@nutch.apache.org
Date: Sunday, May 19, 2013 2:31 PM
To: dev@nutch.apache.org dev@nutch.apache.org
Subject: Re: GSOC 2013 project: Apache-Wicket based Nutch webapp

@dev,


I just realised that the images over the wiki page[0] are missing. Those
were displayed as an external image from 101tec.com
http://101tec.com/wp-content/themes/101tec/images/instanceNew.jpg which
 is down. Is there any other place where those images might still be
present ?


[0] : http://wiki.apache.org/nutch/NutchAdministrationUserInterface



[jira] [Commented] (NUTCH-1573) Upgrade to most recent JUnit 4.x to improve test flexibility

2013-05-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661676#comment-13661676
 ] 

Hudson commented on NUTCH-1573:
---

Integrated in Nutch-nutchgora #610 (See 
[https://builds.apache.org/job/Nutch-nutchgora/610/])
NUTCH-1573 Upgrade to most recent JUnit 4.x to improve test flexibility 
(Revision 1484348)

 Result = SUCCESS
lewismc : http://svn.apache.org/viewvc/nutch/branches/2.x/?view=rev&rev=1484348
Files : 
* /nutch/branches/2.x/CHANGES.txt
* /nutch/branches/2.x/ivy/ivy.xml
* /nutch/branches/2.x/src/plugin/creativecommons/src/test/org/creativecommons/nutch/TestCCParseFilter.java
* /nutch/branches/2.x/src/plugin/feed/src/test/org/apache/nutch/parse/feed/TestFeedParser.java
* /nutch/branches/2.x/src/plugin/index-anchor/src/test/org/apache/nutch/indexer/anchor/TestAnchorIndexingFilter.java
* /nutch/branches/2.x/src/plugin/index-basic/src/test/org/apache/nutch/indexer/basic/TestBasicIndexingFilter.java
* /nutch/branches/2.x/src/plugin/index-more/src/test/org/apache/nutch/indexer/more/TestMoreIndexingFilter.java
* /nutch/branches/2.x/src/plugin/language-identifier/src/test/org/apache/nutch/analysis/lang/TestHTMLLanguageParser.java
* /nutch/branches/2.x/src/plugin/lib-http/src/test/org/apache/nutch/protocol/http/api/TestRobotRulesParser.java
* /nutch/branches/2.x/src/plugin/lib-regex-filter/src/test/org/apache/nutch/urlfilter/api/RegexURLFilterBaseTest.java
* /nutch/branches/2.x/src/plugin/microformats-reltag/src/test/org/apache/nutch/microformats/reltag/TestRelTagIndexingFilter.java
* /nutch/branches/2.x/src/plugin/microformats-reltag/src/test/org/apache/nutch/microformats/reltag/TestRelTagParser.java
* /nutch/branches/2.x/src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext/TestExtParser.java
* /nutch/branches/2.x/src/plugin/parse-html/src/test/org/apache/nutch/parse/html/TestDOMContentUtils.java
* /nutch/branches/2.x/src/plugin/parse-html/src/test/org/apache/nutch/parse/html/TestRobotsMetaProcessor.java
* /nutch/branches/2.x/src/plugin/parse-js/src/test/org/apache/nutch/parse/js/TestJSParseFilter.java
* /nutch/branches/2.x/src/plugin/parse-swf/src/test/org/apache/nutch/parse/swf/TestSWFParser.java
* /nutch/branches/2.x/src/plugin/parse-tika/src/test/org/apache/nutch/parse/tika/DOMContentUtilsTest.java
* /nutch/branches/2.x/src/plugin/parse-tika/src/test/org/apache/nutch/parse/tika/TestMSWordParser.java
* /nutch/branches/2.x/src/plugin/parse-tika/src/test/org/apache/nutch/parse/tika/TestOOParser.java
* /nutch/branches/2.x/src/plugin/parse-tika/src/test/org/apache/nutch/parse/tika/TestPdfParser.java
* /nutch/branches/2.x/src/plugin/parse-tika/src/test/org/apache/nutch/parse/tika/TestRSSParser.java
* /nutch/branches/2.x/src/plugin/parse-tika/src/test/org/apache/nutch/parse/tika/TestRTFParser.java
* /nutch/branches/2.x/src/plugin/parse-zip/src/test/org/apache/nutch/parse/zip/TestZipParser.java
* /nutch/branches/2.x/src/plugin/protocol-file/src/test/org/apache/nutch/protocol/file/TestProtocolFile.java
* /nutch/branches/2.x/src/plugin/protocol-httpclient/src/test/org/apache/nutch/protocol/httpclient/TestProtocolHttpClient.java
* /nutch/branches/2.x/src/plugin/subcollection/src/test/org/apache/nutch/collection/TestSubcollection.java
* /nutch/branches/2.x/src/plugin/urlfilter-automaton/src/test/org/apache/nutch/urlfilter/automaton/TestAutomatonURLFilter.java
* /nutch/branches/2.x/src/plugin/urlfilter-domain/src/test/org/apache/nutch/urlfilter/domain/TestDomainURLFilter.java
* /nutch/branches/2.x/src/plugin/urlfilter-regex/src/test/org/apache/nutch/urlfilter/regex/TestRegexURLFilter.java
* /nutch/branches/2.x/src/plugin/urlfilter-suffix/src/test/org/apache/nutch/urlfilter/suffix/TestSuffixURLFilter.java
* /nutch/branches/2.x/src/plugin/urlnormalizer-basic/src/test/org/apache/nutch/net/urlnormalizer/basic/TestBasicURLNormalizer.java
* /nutch/branches/2.x/src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer/pass/TestPassURLNormalizer.java
* /nutch/branches/2.x/src/plugin/urlnormalizer-regex/src/test/org/apache/nutch/net/urlnormalizer/regex/TestRegexURLNormalizer.java
* /nutch/branches/2.x/src/test/org/apache/nutch/crawl/TestGenerator.java
* /nutch/branches/2.x/src/test/org/apache/nutch/crawl/TestInjector.java
* /nutch/branches/2.x/src/test/org/apache/nutch/crawl/TestSignatureFactory.java
* /nutch/branches/2.x/src/test/org/apache/nutch/crawl/TestURLPartitioner.java
* /nutch/branches/2.x/src/test/org/apache/nutch/crawl/TestUrlWithScore.java
* /nutch/branches/2.x/src/test/org/apache/nutch/fetcher/TestFetcher.java
* /nutch/branches/2.x/src/test/org/apache/nutch/indexer/TestIndexingFilters.java
* /nutch/branches/2.x/src/test/org/apache/nutch/metadata/TestMetadata.java
* /nutch/branches/2.x/src/test/org/apache/nutch/metadata/TestSpellCheckedMetadata.java
* /nutch/branches/2.x/src/test/org/apache/nutch/net/TestURLFilters.java
* 

[jira] [Commented] (NUTCH-841) Create a Wicket-based Web Application for Nutch

2013-05-19 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661684#comment-13661684
 ] 

Chris A. Mattmann commented on NUTCH-841:
-

Thanks Ivan. Unfortunately the deadline to participate in GSoC 2013 is behind 
us.

However, if you are still interested in the project, you are welcome to work on 
it, just not through GSoC.


[jira] [Commented] (NUTCH-841) Create a Wicket-based Web Application for Nutch

2013-05-19 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661685#comment-13661685
 ] 

Chris A. Mattmann commented on NUTCH-841:
-

Yuan Yun : yes we should expose and leverage Nutch REST APIs, and extend them 
using JAX-RS.


Re: GSOC 2013 project: Apache-Wicket based Nutch webapp

2013-05-19 Thread Mattmann, Chris A (398J)
Thanks Ivan.

I commented on JIRA too. Unfortunately, the deadline has passed
for student submission to GSoC.

But you are free to work on the project regardless, just not through
GSoC.

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-Original Message-
From: Ivan Vershinin i...@vershinin.net
Reply-To: dev@nutch.apache.org dev@nutch.apache.org
Date: Sunday, May 19, 2013 12:30 PM
To: dev@nutch.apache.org dev@nutch.apache.org
Subject: Re: GSOC 2013 project: Apache-Wicket based Nutch webapp

Hello!

I am student from Estonia (Tartu University). I want to participate in
GSoC 2013, and selected your project because i have experience in Java
and Wicket.

Can you give me some advice, where i can start my investigations?

Best regards,

Ivan Vershinin