[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203025#comment-15203025
]
ASF GitHub Bot commented on NUTCH-961:
--
GitHub user asfgit closed the pull request at:
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170557#comment-15170557
]
ASF GitHub Bot commented on NUTCH-961:
--
GitHub user lewismc commented on a diff in the pull request:
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170556#comment-15170556
]
ASF GitHub Bot commented on NUTCH-961:
--
GitHub user lewismc commented on a diff in the pull request:
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170555#comment-15170555
]
ASF GitHub Bot commented on NUTCH-961:
--
GitHub user lewismc commented on a diff in the pull request:
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170554#comment-15170554
]
ASF GitHub Bot commented on NUTCH-961:
--
GitHub user lewismc commented on a diff in the pull request:
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168821#comment-15168821
]
ASF GitHub Bot commented on NUTCH-961:
--
GitHub user jeremie70 opened a pull request:
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148758#comment-15148758
]
Hudson commented on NUTCH-961:
--
SUCCESS: Integrated in Nutch-trunk #3347 (See
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148642#comment-15148642
]
Markus Jelsma commented on NUTCH-961:
-
Tests pass as expected and Boilerpipe as well. Will commit
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117024#comment-15117024
]
Markus Jelsma commented on NUTCH-961:
-
Yes! :)
> Expose Tika's boilerpipe support
>
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117020#comment-15117020
]
Tien Nguyen Manh commented on NUTCH-961:
Can NUTCH-1233 (use Tika to extract outlinks) solve that
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116975#comment-15116975
]
Markus Jelsma commented on NUTCH-961:
-
With boilerpipe, you get only a very few outlinks, those found
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116772#comment-15116772
]
Tien Nguyen Manh commented on NUTCH-961:
Ah yes, could you explain why we need to parse it twice?
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114989#comment-15114989
]
Markus Jelsma commented on NUTCH-961:
-
That is probably due to the patch parsing twice. Once with BP
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114658#comment-15114658
]
Tien Nguyen Manh commented on NUTCH-961:
One note on boilerpipe support: it is significantly slower
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110373#comment-15110373
]
Markus Jelsma commented on NUTCH-961:
-
Hello - that doesn't seem related to this issue as it doesn't
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15111292#comment-15111292
]
Markus Jelsma commented on NUTCH-961:
-
Some news, the upstream Tika issue has been committed and
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110217#comment-15110217
]
Tien Nguyen Manh commented on NUTCH-961:
I'm using this patch, NUTCH-961-1.11-1.patch, it works fine
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106570#comment-15106570
]
Markus Jelsma commented on NUTCH-961:
-
Yes but it requires NUTCH-1233.
> Expose Tika's boilerpipe
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106783#comment-15106783
]
Markus Jelsma commented on NUTCH-961:
-
Update: I've updated NUTCH-1233 for current trunk as well as a
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106152#comment-15106152
]
Otis Gospodnetic commented on NUTCH-961:
Any chance we could commit this,
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391558#comment-14391558
]
Alexander Kingson commented on NUTCH-961:
-
Hello,
Since I was not getting
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900180#comment-13900180
]
Markus Jelsma commented on NUTCH-961:
-
I am sorry, I did not mean to speak for the
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899044#comment-13899044
]
Matzz commented on NUTCH-961:
-
{quote}We don't use it BP anymore {quote}
BP integration will
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789686#comment-13789686
]
Otis Gospodnetic commented on NUTCH-961:
Looks like [~kkrugler] is offering to help
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789735#comment-13789735
]
Markus Jelsma commented on NUTCH-961:
-
Hi Otis - there are no significant improvements
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789894#comment-13789894
]
Otis Gospodnetic commented on NUTCH-961:
bq. We don't use it BP anymore
What do
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13788911#comment-13788911
]
Nguyen Manh Tien commented on NUTCH-961:
I used patch NUTCH-961-2.1-v2.patch for
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13617739#comment-13617739
]
Miles Rowland commented on NUTCH-961:
-
Roland, thanks for porting to 2.1. I'm having an
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592056#comment-13592056
]
Roland commented on NUTCH-961:
--
Kiran, did you already start porting it to 2.x?
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592179#comment-13592179
]
kiran commented on NUTCH-961:
-
No Roland, not yet. I just switched to using the 1.x series, but I
Hey Kiran,
drop me a line prior to starting, I will give it a try tomorrow (I hope).
--Roland
On 04.03.2013 14:13, kiran (JIRA) wrote:
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13581459#comment-13581459
]
kiran commented on NUTCH-961:
-
Markus, do you think this patch can also work for the 2.x series?
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13581530#comment-13581530
]
Markus Jelsma commented on NUTCH-961:
-
Should work fine, parse plugins have not changed
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176194#comment-13176194
]
Markus Jelsma commented on NUTCH-961:
-
Fixed already. See NUTCH-1233 for a patch!
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047130#comment-13047130
]
Gabriele Kahlout commented on NUTCH-961:
{quote}it needs to use a different
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047490#comment-13047490
]
Ken Krugler commented on NUTCH-961:
---
The way that Boilerpipe in Tika works is that it
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047501#comment-13047501
]
Markus Jelsma commented on NUTCH-961:
-
Ah, that's great! Is this in 0.9 or trunk? We
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13025286#comment-13025286
]
Gabriele Kahlout commented on NUTCH-961:
@Markus - Thank you.
Watch out for [1] in
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13025295#comment-13025295
]
Markus Jelsma commented on NUTCH-961:
-
Not safely, there are still issues regarding
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987575#action_12987575
]
Markus Jelsma commented on NUTCH-961:
-
Boilerpipe comes with several algorithms for