[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-05-27 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13667642#comment-13667642
 ] 

Markus Jelsma commented on NUTCH-1527:
--

Hi Luca, sure, you can help out. The patch should be rewritten to work with 
NUTCH-1047 as a pluggable indexer. It would be great to have this in svn.

> Port nutch-elasticsearch-indexer to Nutch
> -
>
> Key: NUTCH-1527
> URL: https://issues.apache.org/jira/browse/NUTCH-1527
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer
>Affects Versions: 1.6, 2.1
>Reporter: Lewis John McGibbney
>Assignee: lufeng
>Priority: Minor
> Fix For: 2.4
>
> Attachments: NUTCH-1527.patch
>
>
> The source repos for this can be found here [0].
> This issue should be inline with the work already done by Julien and others 
> over at NUTCH-1047.
> [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-05-27 Thread lufeng (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13667766#comment-13667766
 ] 

lufeng commented on NUTCH-1527:
---

Hi Luca, sorry for my delayed reply. Yes, you can improve this patch following
your suggestion. Can I assign this issue to you? I am willing to test it.
Thanks, Luca.








[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-05-27 Thread Luca Cavanna (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13667770#comment-13667770
 ] 

Luca Cavanna commented on NUTCH-1527:
-

OK guys, I will look into this in the coming days.



[jira] [Updated] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-05-27 Thread lufeng (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lufeng updated NUTCH-1527:
--

Assignee: (was: lufeng)

> Port nutch-elasticsearch-indexer to Nutch
> -
>
> Key: NUTCH-1527
> URL: https://issues.apache.org/jira/browse/NUTCH-1527
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer
>Affects Versions: 1.6, 2.1
>Reporter: Lewis John McGibbney
>Priority: Minor
> Fix For: 2.4
>
> Attachments: NUTCH-1527.patch
>
>
> The source repos for this can be found here [0].
> This issue should be inline with the work already done by Julien and others 
> over at NUTCH-1047.
> [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer



[jira] [Commented] (NUTCH-1527) Port nutch-elasticsearch-indexer to Nutch

2013-05-27 Thread lufeng (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13667775#comment-13667775
 ] 

lufeng commented on NUTCH-1527:
---

Hi Luca, you can now click "Assign to me" and then attach your improved patch.
Thanks, Luca.



[jira] [Commented] (NUTCH-1495) -normalize and -filter for updatedb command in nutch 2.x

2013-05-27 Thread Alexey K (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13667824#comment-13667824
 ] 

Alexey K commented on NUTCH-1495:
-

I've tried your patch on version 2.1.
I'm using updatedb -filter, but the rules from regex-urlfilter.txt aren't 
applied and I'm getting extra new URLs in the database.

At the same time, the rules are applied correctly when I use the inject command.
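
For readers unfamiliar with the filter file, here is a rough sketch of the behaviour expected from -filter. It is not Nutch code. It assumes the usual regex-urlfilter.txt convention: each rule line starts with + (accept) or - (reject) followed by a regular expression, the first matching rule decides, and a URL that matches no rule is dropped. The rules and the example.org host are hypothetical, and Python's re module stands in for Java regular expressions, which differ in some details.

```python
import re

# Hypothetical rules in the "+pattern / -pattern" style of regex-urlfilter.txt.
RULES_TEXT = r"""
# skip common image suffixes
-\.(gif|jpg|png)$
# accept anything under example.org (hypothetical host)
+^https?://([a-z0-9-]+\.)*example\.org/
"""


def parse_rules(text):
    """Turn rule lines into (accept, compiled_pattern) pairs, keeping file order."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no rule
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with + or -: " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accept(url, rules):
    """Return True if the first matching rule is an accept rule (assumed semantics)."""
    for include, pattern in rules:
        if pattern.search(url):
            return include
    return False  # assumption: a URL that matches no rule is rejected


if __name__ == "__main__":
    rules = parse_rules(RULES_TEXT)
    for url in ("http://www.example.org/page.html",
                "http://www.example.org/logo.png",
                "http://other.test/"):
        print(url, accept(url, rules))
```

With rules like these, a working updatedb -filter would be expected to keep only the first of the three URLs, which is the behaviour reported above for inject.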

> -normalize and -filter for updatedb command in nutch 2.x
> 
>
> Key: NUTCH-1495
> URL: https://issues.apache.org/jira/browse/NUTCH-1495
> Project: Nutch
>  Issue Type: Improvement
>Affects Versions: 2.2
>Reporter: Nathan Gass
> Fix For: 2.3
>
> Attachments: patch-updatedb-normalize-filter-2012-11-09.txt, 
> patch-updatedb-normalize-filter-2012-11-13.txt
>
>
> AFAIS in nutch 1.x you could change your url filters and normalizers during 
> the crawl, and update the db using crawldb -normalize -filter. There does not 
> seem to be a way to achieve the same in nutch 2.x?
> Anyway, I went ahead and tried to implement -normalize and -filter for the 
> nutch 2.x updatedb command. I have no experience with any of the used 
> technologies including java, so please check the attached code carefully 
> before using it. I'm very interested to hear if this is the right approach or 
> any other comments.
