[jira] [Commented] (NUTCH-2665) Upgrade to Apache Tika 1.19.1

2018-12-28 Thread Markus Jelsma (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16730203#comment-16730203
 ] 

Markus Jelsma commented on NUTCH-2665:
--

Thanks!

> Upgrade to Apache Tika 1.19.1
> -
>
> Key: NUTCH-2665
> URL: https://issues.apache.org/jira/browse/NUTCH-2665
> Project: Nutch
> Issue Type: Task
> Components: parser
> Affects Versions: 2.3.1
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Priority: Major
> Attachments: NUTCH-2665.patch, NUTCH-2665.patch
>
>
> Borrowing from [~wastl-nagel]'s efforts on NUTCH-2651, 2.x can be upgraded to 
> Apache Tika 1.19.1 as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NUTCH-2665) Upgrade to Apache Tika 1.19.1

2018-10-24 Thread Markus Jelsma (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662234#comment-16662234
 ] 

Markus Jelsma commented on NUTCH-2665:
--

On my machine it really does fail with the latest patch, weird! With the patch 
removed everything passes; applying it again makes this one test fail.

Also, what about the duplicate, NUTCH-2667? It has a patch, but it doesn't 
entirely correspond to this one. One of the two issues should be closed as a 
duplicate.

> Upgrade to Apache Tika 1.19.1
> -
>
> Key: NUTCH-2665
> URL: https://issues.apache.org/jira/browse/NUTCH-2665
> Project: Nutch
> Issue Type: Task
> Components: parser
> Affects Versions: 2.3.1
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Priority: Major
> Fix For: 2.4
>
> Attachments: NUTCH-2665.patch, NUTCH-2665.patch
>
>
> Borrowing from [~wastl-nagel]'s efforts on NUTCH-2651, 2.x can be upgraded to 
> Apache Tika 1.19.1 as well.





[jira] [Commented] (NUTCH-2665) Upgrade to Apache Tika 1.19.1

2018-10-24 Thread Sebastian Nagel (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662187#comment-16662187
 ] 

Sebastian Nagel commented on NUTCH-2665:


Hi [~markus17], I do not see this test failure when applying the second patch 
and running {{ant clean runtime test}}. But all parse-tika tests fail with a 
NoSuchMethodError. Did you run "clean"?



[jira] [Commented] (NUTCH-2665) Upgrade to Apache Tika 1.19.1

2018-10-24 Thread Jorge Luis Betancourt Gonzalez (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662072#comment-16662072
 ] 

Jorge Luis Betancourt Gonzalez commented on NUTCH-2665:
---

+1 [~markus17] I think it's safe to update the test.



[jira] [Commented] (NUTCH-2665) Upgrade to Apache Tika 1.19.1

2018-10-24 Thread Markus Jelsma (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661983#comment-16661983
 ] 

Markus Jelsma commented on NUTCH-2665:
--

Hello [~axr], yes, it compiles fine; that is what the default.properties patch 
is for.

Running tests:
{code}
ContentType http://127.0.0.1:47501/basic-http.jsp expected:<[application/xhtml+x]ml> but was:<[text/ht]ml>
junit.framework.AssertionFailedError: ContentType http://127.0.0.1:47501/basic-http.jsp expected:<[application/xhtml+x]ml> but was:<[text/ht]ml>
	at org.apache.nutch.protocol.http.TestProtocolHttp.fetchPage(TestProtocolHttp.java:134)
	at org.apache.nutch.protocol.http.TestProtocolHttp.testStatusCode(TestProtocolHttp.java:79)
{code}

This test fails, but I am actually fine with this response. I propose changing 
the test to assert text/html instead. Opinions?



[jira] [Commented] (NUTCH-2665) Upgrade to Apache Tika 1.19.1

2018-10-23 Thread Akshar Dave (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661680#comment-16661680
 ] 

Akshar Dave commented on NUTCH-2665:


Were you able to commit this change and build successfully? I am trying to 
build locally after merging all the changes and am getting a 
dependency-related error:

[ivy:resolve] ::
[ivy:resolve] :: UNRESOLVED DEPENDENCIES ::
[ivy:resolve] ::
[ivy:resolve] :: javax.measure#unit-api;working@axr.local: not found
[ivy:resolve] ::



[jira] [Commented] (NUTCH-2665) Upgrade to Apache Tika 1.19.1

2018-10-23 Thread Markus Jelsma (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16660625#comment-16660625
 ] 

Markus Jelsma commented on NUTCH-2665:
--

I'll commit this one later today, if I don't forget, unless there are further 
objections.




[jira] [Commented] (NUTCH-2665) Upgrade to Apache Tika 1.19.1

2018-10-23 Thread Markus Jelsma (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16660525#comment-16660525
 ] 

Markus Jelsma commented on NUTCH-2665:
--

Updated patch defining the property in ivysettings.xml.



[jira] [Commented] (NUTCH-2665) Upgrade to Apache Tika 1.19.1

2018-10-23 Thread Sebastian Nagel (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16660518#comment-16660518
 ] 

Sebastian Nagel commented on NUTCH-2665:


+1 Thanks, [~markus17]!
For 1.x I needed several attempts to get the fix for the javax.ws dependency 
working on the [Jenkins builds|https://builds.apache.org/job/Nutch-trunk/]. 
Defining packaging.type=jar in default.properties didn't work, and neither did 
passing it as an Ant parameter (equivalent to {{ant -Dpackaging.type=jar ...}}). 
Defining the property in ivysettings.xml finally solved it, see 
[65c4fed|https://gitbox.apache.org/repos/asf?p=nutch.git;a=commitdiff;h=65c4fedfacdb873a050e97a50602ed366c7b5a98].
Can you integrate this change into your patch?
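
For illustration, a property definition of this kind might look like the 
following in ivysettings.xml. This is only a minimal sketch, assuming Ivy's 
standard {{<property>}} settings element; the exact placement, attributes and 
surrounding elements in the referenced commit may differ.

{code:xml}
<ivysettings>
  <!-- Assumed form: define packaging.type so that Ivy can resolve
       artifacts whose POM uses ${packaging.type} as the packaging. -->
  <property name="packaging.type" value="jar"/>

  <!-- ... existing settings, resolvers and modules stay unchanged ... -->
</ivysettings>
{code}

Setting the value in the Ivy settings file makes it available at the point 
where Ivy itself reads the dependency descriptors, which fits the observation 
that default.properties and the Ant command line did not help.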

> Upgrade to Apache Tika 1.19.1
> -
>
> Key: NUTCH-2665
> URL: https://issues.apache.org/jira/browse/NUTCH-2665
> Project: Nutch
> Issue Type: Task
> Components: parser
> Affects Versions: 2.3.1
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Priority: Major
> Fix For: 2.4
>
> Attachments: NUTCH-2665.patch
>
>
> Borrowing from [~wastl-nagel]'s efforts on NUTCH-2651, 2.x can be upgraded to 
> Apache Tika 1.19.1 as well.





[jira] [Commented] (NUTCH-2665) Upgrade to Apache Tika 1.19.1

2018-10-23 Thread Markus Jelsma (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16660455#comment-16660455
 ] 

Markus Jelsma commented on NUTCH-2665:
--

Patch for 2.x!
