[jira] [Commented] (NUTCH-2234) Upgrade to elasticsearch 2.1.1

2016-05-24 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298946#comment-15298946
 ] 

Lewis John McGibbney commented on NUTCH-2234:
-

The scoring plugins do not rely heavily upon Lucene. The upgrades would be 
trivial and may not even require any programmatic API changes. IMHO making the 
upgrades at different times can be somewhat problematic, so I would encourage 
an upgrade across the board. 

> Upgrade to elasticsearch 2.1.1
> --
>
> Key: NUTCH-2234
> URL: https://issues.apache.org/jira/browse/NUTCH-2234
> Project: Nutch
>  Issue Type: Improvement
>  Components: indexer
>Affects Versions: 1.11
>Reporter: Tien Nguyen Manh
>Assignee: Markus Jelsma
> Fix For: 1.13
>
> Attachments: NUTCH-2234.patch
>
>
> Currently we use Elasticsearch 1.x. We should upgrade to 2.x.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (NUTCH-2234) Upgrade to elasticsearch 2.1.1

2016-05-24 Thread Joseph Naegele (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298924#comment-15298924
 ] 

Joseph Naegele commented on NUTCH-2234:
---

Hmm, I'm a bit confused. ES 2.3.3 depends on Lucene 5.5.0 libraries. It appears 
indexer-solr does not depend on Lucene, only Solrj. lucene-analyzers-common 
4.10.2 is a Nutch-wide dependency in ivy/ivy.xml, but it appears to only be 
used by plugins: indexer-elastic, parsefilter-naivebayes, and 
scoring-similarity, of which indexer-elastic and parsefilter-naivebayes specify 
their Lucene dependencies in their own plugin.xml (scoring-similarity appears 
to rely on lucene-core 4.10.2 being a transitive dependency through 
lucene-analyzers-common). Changing the Lucene version in ivy/ivy.xml requires 
changes to the scoring-similarity plugin, which I think should be its own issue.








[jira] [Commented] (NUTCH-2234) Upgrade to elasticsearch 2.1.1

2016-05-24 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298842#comment-15298842
 ] 

Lewis John McGibbney commented on NUTCH-2234:
-

bq. I can update the patch or open a PR on Github.
Please do. Please make sure that you run the tests, as the dependencies have caught 
us out before. Please also consider that we want to keep indexer-elastic and 
indexer-solr (and any other indexers) relying upon the same underlying version 
of Lucene if possible.






[jira] [Commented] (NUTCH-2234) Upgrade to elasticsearch 2.1.1

2016-05-24 Thread Joseph Naegele (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298791#comment-15298791
 ] 

Joseph Naegele commented on NUTCH-2234:
---

Since this also adds support for multiple, comma-separated Elasticsearch hosts 
in {{elastic.host}}, the description in {{nutch-default.xml}} should be updated 
accordingly. Is there any reason not to update this to use the most recent 
version of Elasticsearch (2.3.3)? I can update the patch or open a PR on Github.
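
For illustration only, here is a minimal sketch of how a comma-separated {{elastic.host}} value could be split into individual host names. The class and method names (ElasticHostList, parse) are hypothetical and are not taken from the attached patch or from Nutch; the sketch assumes whitespace around entries should be trimmed and empty entries skipped.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of Nutch or of NUTCH-2234.patch. */
final class ElasticHostList {

  private ElasticHostList() {}

  /**
   * Splits a comma-separated host value into host names,
   * trimming whitespace and skipping empty entries.
   */
  static List<String> parse(String value) {
    List<String> hosts = new ArrayList<>();
    if (value == null) {
      return hosts;
    }
    for (String part : value.split(",")) {
      String host = part.trim();
      if (!host.isEmpty()) {
        hosts.add(host);
      }
    }
    return hosts;
  }

  public static void main(String[] args) {
    // Example value only; the host names are made up.
    System.out.println(parse("es1.example.org, es2.example.org,,es3.example.org"));
  }
}
```

Each resulting host name could then be handed to whatever client setup the indexer uses; that part is deliberately left out here because it depends on the Elasticsearch version chosen.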






[jira] [Commented] (NUTCH-1687) Pick queue in Round Robin

2016-05-24 Thread Joseph Naegele (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298757#comment-15298757
 ] 

Joseph Naegele commented on NUTCH-1687:
---

Any issues with Tien's updated patch?

> Pick queue in Round Robin
> -
>
> Key: NUTCH-1687
> URL: https://issues.apache.org/jira/browse/NUTCH-1687
> Project: Nutch
>  Issue Type: Improvement
>  Components: fetcher
>Reporter: Tien Nguyen Manh
>Priority: Minor
> Attachments: NUTCH-1687-2.patch, NUTCH-1687.patch, 
> NUTCH-1687.tejasp.v1.patch
>
>
> Currently we choose the queue to pick a URL from by starting at the head of 
> the queue list, so queues at the start of the list have more chance of being 
> picked first. This can cause a long-tail problem, where only a few queues, 
> each holding many URLs, remain available at the end.
> public synchronized FetchItem getFetchItem() {
>   final Iterator<Map.Entry<String, FetchItemQueue>> it =
>     queues.entrySet().iterator(); // ==> always resets to search from the 
>                                   // start of the queue list
>   while (it.hasNext()) {
> 
> I think it is better to pick queues in round robin. That can reduce the time 
> needed to find an available queue, ensures that all queues are picked in turn, 
> and, if we use TopN during generation, avoids long-tail queues at the end.
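
As a rough sketch of the round-robin idea described in the issue (not the code from any of the attached patches), the search for the next non-empty queue can resume just after the queue that served the previous item instead of restarting from the head. The class RoundRobinQueues and its methods are hypothetical stand-ins; the real fetcher also applies politeness delays and per-queue limits, which are omitted here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the fetcher's queue map; not the Nutch implementation. */
final class RoundRobinQueues {
  // Insertion-ordered map from queue id (e.g. a host name) to its pending URLs.
  private final Map<String, Deque<String>> queues = new LinkedHashMap<>();
  // Id of the queue that served the previous item; null before the first call.
  private String lastKey;

  synchronized void add(String queueId, String url) {
    queues.computeIfAbsent(queueId, k -> new ArrayDeque<>()).add(url);
  }

  /**
   * Returns the next URL, starting the search just after the queue used
   * last time, or null if every queue is empty.
   */
  synchronized String next() {
    // Copying the keys is O(n) per call; acceptable for a sketch.
    List<String> keys = new ArrayList<>(queues.keySet());
    int n = keys.size();
    int start = (lastKey == null) ? 0 : keys.indexOf(lastKey) + 1;
    for (int i = 0; i < n; i++) {
      String key = keys.get((start + i) % n);
      Deque<String> queue = queues.get(key);
      if (!queue.isEmpty()) {
        lastKey = key;
        return queue.poll();
      }
    }
    return null;
  }
}
```

With queues a, b and c, consecutive calls visit a, then b, then c, and only then return to a, so a queue at the head of the list no longer gets served ahead of the others on every call.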





[Nutch Wiki] Update of "bin/nutch nutchserver" by kamaci

2016-05-24 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "bin/nutch nutchserver" page has been changed by kamaci:
https://wiki.apache.org/nutch/bin/nutch%20nutchserver?action=diff&rev1=4&rev2=5

  nutchserver is an alias for org.apache.nutch.api.NutchServer
  
  = Nutch 1.X =
- NutchServer is not urrently not available in Nutch 1.X. There is however a 
[[https://issues.apache.org/jira/browse/NUTCH-1040|Jira ticket for backporting 
the REST API from Nutch 2.X to Nutch 1.X]]
+ NutchServer is not currently not available in Nutch 1.X. There is however a 
[[https://issues.apache.org/jira/browse/NUTCH-1040|Jira ticket for backporting 
the REST API from Nutch 2.X to Nutch 1.X]]
  
  = Nutch 2.X =
  Invoking the call to nutchserver ensures that a Nutch Server runs locally on 
a user defined port... by default this is set to 8081 if none is specified. 
This is a fully REST API for configuring and administering your Nutch crawler.


[jira] [Updated] (NUTCH-1800) Documentation for Nutch 1.X REST API

2016-05-24 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI updated NUTCH-1800:
-
Description: 
This issue should build on NUTCH-1769 with full Java documentation for all 
classes in the following packages

org.apache.nutch.api.*

I am assigning this one to [~fjodor.vershinin] as he is doing an excellent job 
on the REST API. His UML graphic in [0] and commantary shows that he has a good 
understanding of the REST API and its functionality.

Thank you [~fjodor.vershinin] great work.

[0] https://wiki.apache.org/nutch/NutchRESTAPI#UML_Graphic

  was:
This issue should build on NUTCH-1769 with full Java documentation for all 
classes in the following packages

org.apache.nutch.api.*

I am assigning this one to [~fjodor.vershinin] as he is doing an excellent job 
on the REST API. His UML graphic in [0] and commantary shows that he has a goo 
dunderstanding of the REST API and its functionality.

Thank you [~fjodor.vershinin] great work.

[0] https://wiki.apache.org/nutch/NutchRESTAPI#UML_Graphic


> Documentation for Nutch 1.X REST API
> 
>
> Key: NUTCH-1800
> URL: https://issues.apache.org/jira/browse/NUTCH-1800
> Project: Nutch
>  Issue Type: New Feature
>  Components: documentation, REST_api
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
> Fix For: 1.11
>
> Attachments: NUTCH-1800.patch
>
>
> This issue should build on NUTCH-1769 with full Java documentation for all 
> classes in the following packages
> org.apache.nutch.api.*
> I am assigning this one to [~fjodor.vershinin] as he is doing an excellent 
> job on the REST API. His UML graphic in [0] and commantary shows that he has 
> a good understanding of the REST API and its functionality.
> Thank you [~fjodor.vershinin] great work.
> [0] https://wiki.apache.org/nutch/NutchRESTAPI#UML_Graphic





[jira] [Commented] (NUTCH-2089) Move Nutch 2.x to compile on JDK 8

2016-05-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298177#comment-15298177
 ] 

Hudson commented on NUTCH-2089:
---

FAILURE: Integrated in Nutch-nutchgora #1559 (See 
[https://builds.apache.org/job/Nutch-nutchgora/1559/])
NUTCH-2089 Move Nutch to compile on JDK 8 (lewis.mcgibbney: rev 
581c5a4d1b1db2205f6ffe07cc439b7331dfac2f)
* default.properties


> Move Nutch 2.x to compile on JDK 8
> --
>
> Key: NUTCH-2089
> URL: https://issues.apache.org/jira/browse/NUTCH-2089
> Project: Nutch
>  Issue Type: Bug
>  Components: build
>Reporter: Lewis John McGibbney
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
> Attachments: java8output.txt, java8output.txt
>
>
> Public support updates for JDK 1.7 stopped in April of this year.
> https://www.java.com/en/download/faq/java_7.xml
> In our next release we should shift support to JDK 1.8.
>  





[jira] [Commented] (NUTCH-2266) Fix dead link in build.xml for javadoc

2016-05-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298176#comment-15298176
 ] 

Hudson commented on NUTCH-2266:
---

FAILURE: Integrated in Nutch-nutchgora #1559 (See 
[https://builds.apache.org/job/Nutch-nutchgora/1559/])
NUTCH-2266 Dead link in build.xml for javadoc is fixed. (furkankamaci: rev 
3f9ad62d53a974d810a78db8bcd33a8cc1eaf67d)
* default.properties
* build.xml


> Fix dead link in build.xml for javadoc
> --
>
> Key: NUTCH-2266
> URL: https://issues.apache.org/jira/browse/NUTCH-2266
> Project: Nutch
>  Issue Type: Bug
>  Components: build
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
>Priority: Minor
> Fix For: 2.4
>
>
> build.xml has a dead link for javadoc.link.lucene and should be fixed.





Build failed in Jenkins: Nutch-nutchgora #1559

2016-05-24 Thread Apache Jenkins Server
See 

Changes:

[lewis.mcgibbney] NUTCH-2089 Move Nutch to compile on JDK 8

[furkankamaci] NUTCH-2266 Dead link in build.xml for javadoc is fixed.

--
[...truncated 510 lines...]

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: lib-http

jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: protocol-http

jar:

deps-test:

deploy:

copy-generated-lib:

init:

init-plugin:

deps-jar:

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: lib-http

jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: protocol-httpclient

jar:

deps-test:

deploy:

copy-generated-lib:

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: protocol-sftp

jar:

deps-test:

deploy:

copy-generated-lib:

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: parse-js

jar:

deps-test:

init:

init-plugin:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:

jar:

deps-test:

deploy:

copy-generated-lib:

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: protocol-file

jar:

deps-test:

deploy:

copy-generated-lib:

deploy:

copy-generated-lib:

init:

init-plugin:

deps-jar:

init:

init-plugin:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:

jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: parse-html

jar:

deps-test:

init:

init-plugin:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:

jar:

deps-test:

deploy:

copy-generated-lib:

init:

init-plugin:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:

jar:

deps-test:

deploy:

copy-generated-lib:

deploy:

copy-generated-lib:

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: parse-tika
[javac] Compiling 1 source file to 

[javac] javac: invalid target release: 1.8
[javac] Usage: javac <options> <source files>
[javac] use -help for a list of possible options

BUILD FAILED
:113: The following error occurred while executing this line:
:52: The following error occurred while executing this line:
:117: Compile failed; see the compiler error output for details.

Total time: 50 seconds
Build step 'Invoke Ant' marked build as failure
Publishing Javadoc
Updating NUTCH-2266
Updating NUTCH-2089
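
The "invalid target release: 1.8" message above is what a javac older than JDK 8 reports when asked to compile for target 1.8, which suggests the build ran on an older JDK. The following is a minimal sketch of a check that could be run on a build machine to see which Java version is in use. The class JavaTargetCheck and its methods are hypothetical and not part of the Nutch build; the only real API used is the standard java.specification.version system property.

```java
/** Hypothetical pre-build check; not part of the Nutch build. */
final class JavaTargetCheck {

  private JavaTargetCheck() {}

  /**
   * Converts a java.specification.version value such as "1.7", "1.8"
   * or "11" to its major version number.
   */
  static int majorVersion(String specVersion) {
    String[] parts = specVersion.split("\\.");
    int first = Integer.parseInt(parts[0]);
    // Releases before Java 9 report "1.x"; later ones report the major number first.
    return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
  }

  /** True if the given specification version is Java 8 or newer. */
  static boolean isAtLeastJava8(String specVersion) {
    return majorVersion(specVersion) >= 8;
  }

  public static void main(String[] args) {
    String spec = System.getProperty("java.specification.version");
    System.out.println("java.specification.version=" + spec
        + ", at least Java 8: " + isAtLeastJava8(spec));
  }
}
```

Running this with the same JDK that Ant uses shows whether that JDK is new enough; if it reports a version below 8, the job would need to be pointed at a JDK 8 installation before the compile target can be 1.8.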


[GitHub] nutch pull request: NUTCH-2089 Move Nutch to compile on JDK 8

2016-05-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/116


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (NUTCH-2089) Move Nutch 2.x to compile on JDK 8

2016-05-24 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-2089:

Fix Version/s: (was: 2.5)
   2.4






[jira] [Commented] (NUTCH-2089) Move Nutch 2.x to compile on JDK 8

2016-05-24 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298119#comment-15298119
 ] 

Lewis John McGibbney commented on NUTCH-2089:
-

The PR for this issue has been merged; however, there are still many Javadoc 
warnings. 






[jira] [Commented] (NUTCH-2089) Move Nutch 2.x to compile on JDK 8

2016-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298117#comment-15298117
 ] 

ASF GitHub Bot commented on NUTCH-2089:
---

Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/116







[jira] [Updated] (NUTCH-2089) Move Nutch 2.x to compile on JDK 8

2016-05-24 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-2089:

Summary: Move Nutch 2.x to compile on JDK 8  (was: Move Nutch to compile on 
JDK 8)

> Move Nutch 2.x to compile on JDK 8
> --
>
> Key: NUTCH-2089
> URL: https://issues.apache.org/jira/browse/NUTCH-2089
> Project: Nutch
>  Issue Type: Bug
>  Components: build
>Reporter: Lewis John McGibbney
>Assignee: Furkan KAMACI
> Fix For: 2.5
>
> Attachments: java8output.txt, java8output.txt
>
>
> Public support updates for JDK 1.7 stopped in April of this year.
> https://www.java.com/en/download/faq/java_7.xml
> In our next release we should shift support to JDK 1.8.
>  





[jira] [Resolved] (NUTCH-2266) Fix dead link in build.xml for javadoc

2016-05-24 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved NUTCH-2266.
-
Resolution: Fixed

Thanks [~kamaci] :)






[jira] [Commented] (NUTCH-2266) Fix dead link in build.xml for javadoc

2016-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298110#comment-15298110
 ] 

ASF GitHub Bot commented on NUTCH-2266:
---

Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/117







[GitHub] nutch pull request: NUTCH-2266 Dead link in build.xml for javadoc ...

2016-05-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/117




[jira] [Updated] (NUTCH-2266) Fix dead link in build.xml for javadoc

2016-05-24 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-2266:

Fix Version/s: (was: 2.5)
   2.4






RE: [ANNOUNCE] New Nutch committer and PMC - Karanjeet Singh

2016-05-24 Thread Markus Jelsma
Welcome too, Karanjeet. Thanks for the good work on the HtmlUnit plugin.

Cheers,
Markus

 
 
-Original message-
> From:Karanjeet Singh 
> Sent: Monday 23rd May 2016 19:52
> To: dev@nutch.apache.org; u...@nutch.apache.org
> Subject: Re: [ANNOUNCE] New Nutch committer and PMC - Karanjeet Singh
> 
> Hi Sebastian, 
> Thanks for the invitation and warm welcome. 
> 
> Hello Everyone, 
> I am glad to be on board and having this opportunity to work with all of you. 
> I am a graduate student at the University of Southern California (USC) 
> pursuing my Master’s in Computer Science. Prior to this, I was working as a 
> web developer at Computer Sciences Corporation (CSC), India.  At CSC, I have 
> developed applications for a global payments technology company adhering to 
> PCI DSS standards.  
> And now, I am starting my summer internship at NASA JPL. 
> Last year, in 2015, I took a course named Information Retrieval (IR) under 
> Prof. Chris Mattmann where I got the opportunity to learn and work on Nutch 
> 1.x. This was the time when I started working on some of its bugs. The 
> semester ended but not the interest and therefore I moved ahead working on 
> Nutch plugins, particularly, HtmlUnit and Selenium. 
> During this summer, I plan to make more contributions and help the community 
> grow. Also, I plan to port Nutch backend on Spark for an improved performance 
> and better after-crawl analysis. I am also interested in working on real-time 
> crawl analysis in Nutch through a clean and easy to understand visual 
> interface. 
> I am excited to be a part of this community!!! 
> Regards, 
> Karanjeet Singh 
> USC 
> 
> 
> On Sun, May 22, 2016 at 12:51 PM, Sebastian Nagel wrote:
> Dear all,
>
> on behalf of the Nutch PMC it is my pleasure to announce
> that Karanjeet Singh has joined the Nutch team as committer
> and PMC member. Karanjeet, would you mind to introduce
> yourself and tell the Nutch community about your relation
> to Apache Nutch, what you have done or plan to do, etc.?
>
> Congratulations and welcome on board!
>
> Regards,
> Sebastian


RE: [ANNOUNCE] New Nutch committer and PMC - Thamme Gowda N.

2016-05-24 Thread Markus Jelsma
Welcome Thamme Gowda!

Cheers,
Markus

 
 
-Original message-
> From:Thamme Gowda 
> Sent: Monday 23rd May 2016 0:56
> To: dev@nutch.apache.org; u...@nutch.apache.org
> Subject: Re: [ANNOUNCE] New Nutch committer and PMC - Thamme Gowda N.
> 
> Hi Sebastian, 
>  thanks for the invitation and setting this up. 
> 
> Hello everybody, 
> 
> I am so glad to be on board. 
> 
> About me: 
>   I'm currently a grad student (masters) at Univ. of Southern California 
> (USC), Los Angeles. I'm fortunate to have met Professor Chris Mattmann at 
> USC. 
> Prior to my grad studies, I worked as a full-stack developer at a few startups 
> in Bangalore, India. I am also a tech co-founder of a text analysis platform, 
> http://datoin.com. I found my interest in A.I., so here I 
> am at USC grad school. I am on my way to an internship at NASA JPL this 
> summer. 
> 
> How I met Nutch: 
>  In 2014, with my team at Datoin.com, we integrated a Crawler/Input component into 
> our platform. We picked Nutch because we had the rest of the platform on Hadoop. 
> Boom! That was when I first put my hands on Nutch code. 
>  Last fall I took a graduate-level Information Retrieval (IR) course at USC 
> taught by Prof. Mattmann. I then joined hands with his team at NASA JPL to work 
> on IR-related projects. We use and improve Nutch. 
> 
> Some of my recent work related to Nutch: 
> Added an extension point and an extension to pass certain external URLs when 
> db.ignore.external is set. Fixed bugs and improved the common crawl dumper. A 
> clustering toolkit for clustering Nutch output based on CSS styles and DOM 
> structures [2]... 
> 
> More coming soon this summer! 
> 
> I am interested in after-crawl analysis and bringing them back to Nutch as 
> extensions. 
> I also presented "Clustering the output of Nutch " at recent ApacheCon NA 
> [1]. 
> 
> I would also love to work on these: 
> * Reusable JVM containers to make it fast and efficient. 
> * A Spark execution backend (a step ahead: a switchable execution backend to 
>   support MR and Spark, just like what Gora did for the storage backend). 
> * Stats and analytics of crawl jobs in real time.
> I am excited to be involved with the community to improve Nutch. 
> 
> - 
> Thanks and Regards, 
> Thamme 
> 
> [1] 
> http://www.slideshare.net/thammegowda/clustering-output-of-apache-nutch-using-apache-spark
>  
> [2]
>  https://github.com/uscdataScience/autoextractor/wiki/Clustering-Tutorial 
> 
> 
> -- 
> Thamme Gowda  
> Grad Student at USC   
> @thammegowda  | 213-536-3552 
> http://scf.usc.edu/~tnarayan/ 
> 
> On Sun, May 22, 2016 at 1:02 PM, Sebastian Nagel wrote:
> Dear all,
>
> it is my pleasure to announce that Thamme Gowda N. has joined us
> as committer and member of the Nutch PMC.  Congratulations on your
> new role within the Apache Nutch community!
>
> Thamme, would you mind telling us about yourself, your relation
> to Nutch, what you've done so far, etc.?
>
> Cheers and welcome on board!
>
> Sebastian (on behalf of the Nutch PMC)