[jira] [Updated] (NUTCH-2856) Implement a protocol-smb plugin based on hierynomus/smbj

2021-12-30 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-2856:

Summary: Implement a protocol-smb plugin based on hierynomus/smbj  (was: 
Implement a protocol-smb plugin based on )

> Implement a protocol-smb plugin based on hierynomus/smbj
> 
>
> Key: NUTCH-2856
> URL: https://issues.apache.org/jira/browse/NUTCH-2856
> Project: Nutch
>  Issue Type: New Feature
>  Components: external, plugin, protocol
>Reporter: Hiran Chaudhuri
>Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.19
>
>
> The plugin protocol-smb advertised on 
> [https://cwiki.apache.org/confluence/display/NUTCH/PluginCentral] actually 
> refers to the JCIFS library. According to this library's homepage 
> [https://www.jcifs.org/]:
> _If you're looking for the latest and greatest open source Java SMB library, 
> this is not it. JCIFS has been in maintenance-mode-only for several years and 
> although what it does support works fine (SMB1, NTLMv2, midlc, MSRPC and 
> various utility classes), jCIFS does not support the newer SMB2/3 variants of 
> the SMB protocol which is slowly becoming required (Windows 10 requires 
> SMB2/3). JCIFS only supports SMB1 but Microsoft has deprecated SMB1 in their 
> products. *So if SMB1 is disabled on your network, JCIFS' file related 
> operations will NOT work.*_
> Looking at 
> [https://en.wikipedia.org/wiki/Server_Message_Block#SMB_/_CIFS_/_SMB1]:
> _Microsoft added SMB1 to the Windows Server 2012 R2 deprecation list in June 
> 2013. Windows Server 2016 and some versions of Windows 10 Fall Creators 
> Update do not have SMB1 installed by default._
> In conclusion, the chances that the SMB1 protocol is installed and/or 
> configured are shrinking rapidly. Therefore some migration towards 
> SMB2/3 is required. Luckily the JCIFS homepage lists alternatives:
>  * [jcifs-codelibs|https://github.com/codelibs/jcifs]
>  * [jcifs-ng|https://github.com/AgNO3/jcifs-ng]
>  * [smbj|https://github.com/hierynomus/smbj]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (NUTCH-2856) Implement an appropriately licensed protocol-smb plugin

2021-12-30 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17467111#comment-17467111
 ] 

Lewis John McGibbney commented on NUTCH-2856:
-

Adding some notes from my research. 
* The smbj API looks very intuitive, and I think it will be a great fit.
* I was concerned about acquiring an SMB server which could be used for 
integration tests. Luckily the smbj project does have integration tests which 
show how this can be done, but there were some missing pieces. They create an 
SMB (Samba) server via Docker; however, they did not publish the image. Luckily a 
fellow Tika PMC member took the initiative to 
[clone|https://github.com/nddipiazza/smbj-docker] and 
[publish|https://hub.docker.com/r/ndipiazza/smbj-inttest] it.
* In the Gora project, we've been using 
[testcontainers|https://www.testcontainers.org/] for some time. This allows us 
to perform integration testing easily, as you can either run a precanned 
container or [arbitrarily define 
one|https://www.testcontainers.org/features/creating_container/]. In this case, 
I can simply reference _ndipiazza/smbj-inttest_ and then test against it. There 
is a downside to this, however: the host running the tests must have Docker 
installed. I therefore need to figure out a means of running this particular 
integration test only if the host has Docker installed and skipping it 
otherwise.
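
One possible way to approach the "skip when Docker is missing" question above is sketched below. It uses only the JDK and is not taken from the Nutch, smbj or testcontainers code bases; the class name DockerCheck, the helper commandSucceeds and the choice of `docker info` as the probe are all assumptions made for illustration. A test runner could call dockerAvailable() up front and skip the SMB integration test when it returns false.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Nutch: probes whether a Docker CLI and
// daemon appear to be usable on the current host.
class DockerCheck {

  /** Returns true only if the command starts, finishes in time and exits with 0. */
  static boolean commandSucceeds(long timeoutSeconds, String... command) {
    try {
      Process process = new ProcessBuilder(command)
          .redirectErrorStream(true)
          .redirectOutput(ProcessBuilder.Redirect.DISCARD) // output is not needed
          .start();
      if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
        process.destroyForcibly(); // probe hung, treat as unavailable
        return false;
      }
      return process.exitValue() == 0;
    } catch (IOException e) {
      return false; // typically: the binary is not on the PATH
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  /** Assumption: "docker info" succeeding is a good enough signal. */
  static boolean dockerAvailable() {
    return commandSucceeds(10, "docker", "info");
  }

  public static void main(String[] args) {
    if (dockerAvailable()) {
      System.out.println("Docker found: the SMB integration test can run");
    } else {
      System.out.println("Docker not found: skipping the SMB integration test");
    }
  }
}
```

In a JUnit-based build the same boolean could feed an assumption so the test is reported as skipped rather than failed; whether the build should also use any Docker detection offered by testcontainers itself is left open here.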

> Implement an appropriately licensed protocol-smb plugin
> ---
>
> Key: NUTCH-2856
> URL: https://issues.apache.org/jira/browse/NUTCH-2856
> Project: Nutch
>  Issue Type: New Feature
>  Components: external, plugin, protocol
>Reporter: Hiran Chaudhuri
>Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.19
>
>


[jira] [Updated] (NUTCH-2856) Implement a protocol-smb plugin based on

2021-12-30 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-2856:

Summary: Implement a protocol-smb plugin based on   (was: Implement an 
appropriately licensed protocol-smb plugin)

> Implement a protocol-smb plugin based on 
> -
>
> Key: NUTCH-2856
> URL: https://issues.apache.org/jira/browse/NUTCH-2856
> Project: Nutch
>  Issue Type: New Feature
>  Components: external, plugin, protocol
>Reporter: Hiran Chaudhuri
>Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.19
>
>


[jira] [Work started] (NUTCH-2856) Implement an appropriately licensed protocol-smb plugin

2021-12-30 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on NUTCH-2856 started by Lewis John McGibbney.
---
> Implement an appropriately licensed protocol-smb plugin
> ---
>
> Key: NUTCH-2856
> URL: https://issues.apache.org/jira/browse/NUTCH-2856
> Project: Nutch
>  Issue Type: New Feature
>  Components: external, plugin, protocol
>Reporter: Hiran Chaudhuri
>Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.19
>
>


[jira] [Updated] (NUTCH-2856) Implement an appropriately licensed protocol-smb plugin

2021-12-30 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-2856:

Issue Type: New Feature  (was: Bug)

> Implement an appropriately licensed protocol-smb plugin
> ---
>
> Key: NUTCH-2856
> URL: https://issues.apache.org/jira/browse/NUTCH-2856
> Project: Nutch
>  Issue Type: New Feature
>  Components: external, plugin, protocol
>Reporter: Hiran Chaudhuri
>Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.19
>
>


[jira] [Commented] (NUTCH-2429) Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers

2021-12-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17467069#comment-17467069
 ] 

ASF GitHub Bot commented on NUTCH-2429:
---

lewismc commented on pull request #720:
URL: https://github.com/apache/nutch/pull/720#issuecomment-1003244488


   This should also pave the way for me to work on 
[NUTCH-2856](https://issues.apache.org/jira/browse/NUTCH-2856)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@nutch.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers
> -
>
> Key: NUTCH-2429
> URL: https://issues.apache.org/jira/browse/NUTCH-2429
> Project: Nutch
>  Issue Type: Improvement
>  Components: commoncrawl
>Affects Versions: 1.14
> Environment: Tested on both Nutch 1.13 and 1.14 in Ubuntu Linux with 
> OpenJDK 1.8.
>Reporter: Hiran Chaudhuri
>Priority: Major
> Fix For: 1.19
>
>
> While trying to use the protocol-smb plugin (which is not part of the Nutch 
> distribution) I realized there are four steps to successfully make use of a 
> protocol plugin:
> 1 - put the artifact into the plugins directory
> 2 - modify the Nutch configuration files to allow smb:// URLs and add the 
> plugin to the list of loaded plugins
> 3 - extract jcifs.jar and place it on the system classpath
> 4 - run Nutch with the correct system property
> While steps 1 and 2 seem obvious, 3 and 4 require knowledge of plugin 
> internals, which does not feel right for Nutch and plugin users. Moreover, 
> jcifs.jar would then exist twice on the classpath and could cause further 
> problems at runtime.
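
For readers unfamiliar with why steps 3 and 4 are needed at all: on a stock JDK, `new URL("smb://...")` fails unless some URLStreamHandler for the scheme can be found. The sketch below shows one JDK-level way for an application to register a handler itself, using `URL.setURLStreamHandlerFactory`. It is an illustration only and not the code from pull request #720; SmbUrlDemo, StubHandler and SmbHandlerFactory are hypothetical names, and the stub handler does not talk to any SMB server.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;

// Hypothetical demo class, not part of Nutch.
class SmbUrlDemo {

  /** Stub handler: lets smb:// URLs be parsed, but cannot open them. */
  static class StubHandler extends URLStreamHandler {
    @Override
    protected URLConnection openConnection(URL url) throws IOException {
      throw new IOException("stub handler: no SMB client is wired in");
    }
  }

  /** Returns a handler for "smb" and defers every other scheme to the JDK. */
  static class SmbHandlerFactory implements URLStreamHandlerFactory {
    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
      return "smb".equals(protocol) ? new StubHandler() : null;
    }
  }

  private static boolean installed = false;

  /** The JVM accepts a factory only once, so guard against repeated calls. */
  static synchronized void install() {
    if (!installed) {
      URL.setURLStreamHandlerFactory(new SmbHandlerFactory());
      installed = true;
    }
  }

  /** True if the JVM can construct a URL object for the given string. */
  static boolean parses(String spec) {
    try {
      new URL(spec);
      return true;
    } catch (MalformedURLException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("before install: " + parses("smb://fileserver/share/a.txt"));
    install();
    System.out.println("after install:  " + parses("smb://fileserver/share/a.txt"));
  }
}
```

Because the factory can be set only once per JVM, a plugin system would presumably need a single central factory that delegates to whichever plugin claims a scheme; that design question is what the issue is about, and the sketch does not attempt to answer it.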





[GitHub] [nutch] lewismc commented on pull request #720: NUTCH-2429 Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers

2021-12-30 Thread GitBox


lewismc commented on pull request #720:
URL: https://github.com/apache/nutch/pull/720#issuecomment-1003244488


   This should also pave the way for me to work on 
[NUTCH-2856](https://issues.apache.org/jira/browse/NUTCH-2856)






[jira] [Updated] (NUTCH-2856) Implement an appropriately licensed protocol-smb plugin

2021-12-30 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-2856:

Summary: Implement an appropriately licensed protocol-smb plugin  (was: 
protocol-smb plugin is outdated)

> Implement an appropriately licensed protocol-smb plugin
> ---
>
> Key: NUTCH-2856
> URL: https://issues.apache.org/jira/browse/NUTCH-2856
> Project: Nutch
>  Issue Type: Bug
>  Components: external, plugin, protocol
>Reporter: Hiran Chaudhuri
>Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.19
>
>


[jira] [Commented] (NUTCH-2429) Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers

2021-12-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17467066#comment-17467066
 ] 

ASF GitHub Bot commented on NUTCH-2429:
---

lewismc commented on pull request #222:
URL: https://github.com/apache/nutch/pull/222#issuecomment-1003242122


   Superseded by #720 




> Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers
> -
>
> Key: NUTCH-2429
> URL: https://issues.apache.org/jira/browse/NUTCH-2429
> Project: Nutch
>  Issue Type: Improvement
>  Components: commoncrawl
>Affects Versions: 1.14
> Environment: Tested on both Nutch 1.13 and 1.14 in Ubuntu Linux with 
> OpenJDK 1.8.
>Reporter: Hiran Chaudhuri
>Priority: Major
> Fix For: 1.19
>
>


[jira] [Commented] (NUTCH-2429) Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers

2021-12-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17467065#comment-17467065
 ] 

ASF GitHub Bot commented on NUTCH-2429:
---

lewismc closed pull request #222:
URL: https://github.com/apache/nutch/pull/222


   




> Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers
> -
>
> Key: NUTCH-2429
> URL: https://issues.apache.org/jira/browse/NUTCH-2429
> Project: Nutch
>  Issue Type: Improvement
>  Components: commoncrawl
>Affects Versions: 1.14
> Environment: Tested on both Nutch 1.13 and 1.14 in Ubuntu Linux with 
> OpenJDK 1.8.
>Reporter: Hiran Chaudhuri
>Priority: Major
> Fix For: 1.19
>
>


[GitHub] [nutch] lewismc commented on pull request #222: NUTCH-2429 Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers

2021-12-30 Thread GitBox


lewismc commented on pull request #222:
URL: https://github.com/apache/nutch/pull/222#issuecomment-1003242122


   Superseded by #720 






[GitHub] [nutch] lewismc closed pull request #222: NUTCH-2429 Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers

2021-12-30 Thread GitBox


lewismc closed pull request #222:
URL: https://github.com/apache/nutch/pull/222


   






[jira] [Commented] (NUTCH-2429) Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers

2021-12-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17467064#comment-17467064
 ] 

ASF GitHub Bot commented on NUTCH-2429:
---

lewismc opened a new pull request #720:
URL: https://github.com/apache/nutch/pull/720


   This pull request addresses 
[NUTCH-2429](https://issues.apache.org/jira/browse/NUTCH-2429). Some notes:
   * supersedes #222 by updating everything which was done there (excellent 
work @ HiranChaudhuri )
   * incorporates @sebastian-nagel's work a la 
[/sebastian-nagel/nutch/tree/NUTCH-2429](https://github.com/sebastian-nagel/nutch/commit/e589f05ef42486892427d347ecd10abfa9e380d7)
   * organizes the imports for each class touched in this pull request
   * changes a couple of rogue classes which declared `public static final 
Logger` to `private static final Logger`




> Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers
> -
>
> Key: NUTCH-2429
> URL: https://issues.apache.org/jira/browse/NUTCH-2429
> Project: Nutch
>  Issue Type: Improvement
>  Components: commoncrawl
>Affects Versions: 1.14
> Environment: Tested on both Nutch 1.13 and 1.14 in Ubuntu Linux with 
> OpenJDK 1.8.
>Reporter: Hiran Chaudhuri
>Priority: Major
> Fix For: 1.19
>
>


[GitHub] [nutch] lewismc opened a new pull request #720: NUTCH-2429 Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers

2021-12-30 Thread GitBox


lewismc opened a new pull request #720:
URL: https://github.com/apache/nutch/pull/720


   This pull request addresses 
[NUTCH-2429](https://issues.apache.org/jira/browse/NUTCH-2429). Some notes:
   * supersedes #222 by updating everything which was done there (excellent 
work @ HiranChaudhuri )
   * incorporates @sebastian-nagel's work a la 
[/sebastian-nagel/nutch/tree/NUTCH-2429](https://github.com/sebastian-nagel/nutch/commit/e589f05ef42486892427d347ecd10abfa9e380d7)
   * organizes the imports for each class touched in this pull request
   * changes a couple of rogue classes which declared `public static final 
Logger` to `private static final Logger`






[jira] [Commented] (NUTCH-2924) Generate maxCount expr evaluated only once

2021-12-30 Thread Markus Jelsma (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17466942#comment-17466942
 ] 

Markus Jelsma commented on NUTCH-2924:
--

Updated patch: logging changed from INFO to DEBUG. Otherwise the reducers are 
slow due to excessive logging.

> Generate maxCount expr evaluated only once
> --
>
> Key: NUTCH-2924
> URL: https://issues.apache.org/jira/browse/NUTCH-2924
> Project: Nutch
>  Issue Type: Bug
>  Components: generator
>Affects Versions: 1.16
>Reporter: Markus Jelsma
>Assignee: Markus Jelsma
>Priority: Major
> Fix For: 1.19
>
> Attachments: NUTCH-2924-1.patch, NUTCH-2924.patch
>
>
> The generate.maxCount expression is evaluated only once in the generator's 
> reducer; instead, it must be set once per host.





[jira] [Updated] (NUTCH-2924) Generate maxCount expr evaluated only once

2021-12-30 Thread Markus Jelsma (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated NUTCH-2924:
-
Attachment: NUTCH-2924-1.patch

> Generate maxCount expr evaluated only once
> --
>
> Key: NUTCH-2924
> URL: https://issues.apache.org/jira/browse/NUTCH-2924
> Project: Nutch
>  Issue Type: Bug
>  Components: generator
>Affects Versions: 1.16
>Reporter: Markus Jelsma
>Assignee: Markus Jelsma
>Priority: Major
> Fix For: 1.19
>
> Attachments: NUTCH-2924-1.patch, NUTCH-2924.patch
>
>


[jira] [Updated] (NUTCH-2924) Generate maxCount expr evaluated only once

2021-12-30 Thread Markus Jelsma (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated NUTCH-2924:
-
Attachment: NUTCH-2924.patch

> Generate maxCount expr evaluated only once
> --
>
> Key: NUTCH-2924
> URL: https://issues.apache.org/jira/browse/NUTCH-2924
> Project: Nutch
>  Issue Type: Bug
>  Components: generator
>Affects Versions: 1.16
>Reporter: Markus Jelsma
>Assignee: Markus Jelsma
>Priority: Major
> Fix For: 1.19
>
> Attachments: NUTCH-2924.patch
>
>


[jira] [Commented] (NUTCH-2924) Generate maxCount expr evaluated only once

2021-12-30 Thread Markus Jelsma (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17466926#comment-17466926
 ] 

Markus Jelsma commented on NUTCH-2924:
--

Patch, again, only for 1.15, for now.

> Generate maxCount expr evaluated only once
> --
>
> Key: NUTCH-2924
> URL: https://issues.apache.org/jira/browse/NUTCH-2924
> Project: Nutch
>  Issue Type: Bug
>  Components: generator
>Affects Versions: 1.16
>Reporter: Markus Jelsma
>Assignee: Markus Jelsma
>Priority: Major
> Fix For: 1.19
>
> Attachments: NUTCH-2924.patch
>
>


[jira] [Created] (NUTCH-2924) Generate maxCount expr evaluated only once

2021-12-30 Thread Markus Jelsma (Jira)
Markus Jelsma created NUTCH-2924:


 Summary: Generate maxCount expr evaluated only once
 Key: NUTCH-2924
 URL: https://issues.apache.org/jira/browse/NUTCH-2924
 Project: Nutch
  Issue Type: Bug
  Components: generator
Affects Versions: 1.16
Reporter: Markus Jelsma
Assignee: Markus Jelsma
 Fix For: 1.19


The generate.maxCount expression is evaluated only once in the generator's 
reducer; instead, it must be set once per host.
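
To illustrate the difference between "evaluated once" and "once per host", here is a minimal sketch that is independent of the attached patches and of the actual Generator code. The class PerHostMaxCount and the function passed to it are hypothetical stand-ins: the real expression and the host data it reads are not shown in this thread. The idea is simply to evaluate the expression the first time a host is seen and to remember the result for that host only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

// Hypothetical sketch, not the Nutch Generator: caches one maxCount per host.
class PerHostMaxCount {

  // Stand-in for the configured expression; maps a host name to its limit.
  private final ToIntFunction<String> expression;
  private final Map<String, Integer> maxCountByHost = new HashMap<>();
  int evaluations = 0; // exposed only so the behaviour can be observed

  PerHostMaxCount(ToIntFunction<String> expression) {
    this.expression = expression;
  }

  /** Evaluates the expression on first sight of a host, then reuses the value. */
  int maxCountFor(String host) {
    return maxCountByHost.computeIfAbsent(host, h -> {
      evaluations++;
      return expression.applyAsInt(h);
    });
  }

  public static void main(String[] args) {
    // Toy expression for the demo: the limit is the length of the host name.
    PerHostMaxCount limits = new PerHostMaxCount(String::length);
    String[] urlsByHost = {"a.example", "a.example", "bb.example"};
    for (String host : urlsByHost) {
      System.out.println(host + " -> maxCount " + limits.maxCountFor(host));
    }
    System.out.println("expression evaluations: " + limits.evaluations);
  }
}
```

The bug described above would correspond to keeping a single cached value for all hosts, so that the first host's result is silently applied to every later host.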


