RE: fetch deletes all metadata except _csh_ and _rs_

2016-02-23 Thread Adnane Benjelloun
Thank you, Lewis.

I already reported it last week on JIRA:

https://issues.apache.org/jira/browse/NUTCH-

Please let me know if you have any questions or need more details.

Best regards,

Adnane


-Original Message-
From: Lewis John Mcgibbney [mailto:lewis.mcgibb...@gmail.com] 
Sent: February 23, 2016 9:18 PM
To: user@nutch.apache.org
Subject: Re: fetch deletes all metadata except _csh_ and _rs_

Hi Adnane,
Yes, we were getting your mail. Just too busy to respond, so thank you for your
patience.
OK, so this sounds like a bug IMHO. No metadata should be deleted; at most,
updates should occur, that is all.
Can you please log an issue at the Nutch Jira instance describing your Nutch
2.X search stack, along with an entire log if possible and any queries that
would allow us to better understand the issue at hand?
Thanks in advance.
lewis




Re: fetch deletes all metadata except _csh_ and _rs_

2016-02-23 Thread Lewis John Mcgibbney
Hi Adnane,
Yes, we were getting your mail. Just too busy to respond, so thank you for
your patience.
OK, so this sounds like a bug IMHO. No metadata should be deleted; at
most, updates should occur, that is all.
Can you please log an issue at the Nutch Jira instance describing your
Nutch 2.X search stack, along with an entire log if possible and any queries
that would allow us to better understand the issue at hand?
Thanks in advance.
lewis

On Wed, Feb 17, 2016 at 10:34 AM,  wrote:

> From: Adnane Benjelloun 
> To: 
> Cc:
> Date: Tue, 16 Feb 2016 22:03:53 -0500
> Subject: fetch deletes all metadata except _csh_ and _rs_
> Hello,
>
> This problem happens the second time I crawl a page:
>
> bin/nutch inject urls/
> bin/nutch generate -topN 1000
> bin/nutch fetch -all
> bin/nutch parse -force -all
> bin/nutch updatedb -all
>
> Second time:
>
> bin/nutch generate -topN 1000 --> batchid changes for all existing pages
> bin/nutch fetch -all --> *** metadata is deleted for all pages already
> crawled ***
> bin/nutch parse -force -all
> bin/nutch updatedb -all
>
> I'm using MongoDB.
>
> Any help, please? I'm not sure if it's a Nutch bug or my
> misunderstanding of Nutch.
>
> Best regards,
>
> Adnane


RE: fetch deletes all metadata except _csh_ and _rs_

2016-02-23 Thread Adnane Benjelloun
Thank you Markus.

-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
Sent: February 23, 2016 5:56 AM
To: user@nutch.apache.org
Subject: RE: fetch deletes all metadata except _csh_ and _rs_

Hello Adnane - your mails are received on the mailing list. There is probably 
no one that has read your mail and can respond to it.
Markus
 



RE: fetch deletes all metadata except _csh_ and _rs_

2016-02-23 Thread Markus Jelsma
Hello Adnane - your mails are received on the mailing list. There is probably 
no one that has read your mail and can respond to it.
Markus
 
-Original message-
> From:Adnane Benjelloun 
> Sent: Tuesday 23rd February 2016 1:25
> To: user@nutch.apache.org
> Subject: Re: fetch deletes all metadata except _csh_ and _rs_
> 
> Hi,
> 
> Can you please confirm whether you receive my emails?
> 


Re: fetch deletes all metadata except _csh_ and _rs_

2016-02-22 Thread Adnane Benjelloun
Hi,

Can you please confirm whether you receive my emails?

> On Feb 22, 2016, at 9:23 AM, Adnane Benjelloun  
> wrote:
> 
> Hi everybody,
> 
> No one has tried to help me. Any suggestions, please?
>
> Is there another place where I can ask my question if I'm not on the right
> list?
>
> Best regards,
>  
> Adnane
> 


RE: fetch deletes all metadata except _csh_ and _rs_

2016-02-22 Thread Adnane Benjelloun
Hi everybody,

No one has tried to help me. Any suggestions, please?

Is there another place where I can ask my question if I'm not on the right
list?

Best regards,
 
Adnane

-

From: Adnane Benjelloun [mailto:adn...@mediaplusplus.com] 
Sent: February 16, 2016 10:04 PM
To: user@nutch.apache.org
Subject: fetch deletes all metadata except _csh_ and _rs_

Hello,

This problem happens the second time I crawl a page:

bin/nutch inject urls/
bin/nutch generate -topN 1000
bin/nutch fetch -all
bin/nutch parse -force -all
bin/nutch updatedb -all

Second time:

bin/nutch generate -topN 1000 --> batchid changes for all existing pages
bin/nutch fetch -all --> *** metadata is deleted for all pages already
crawled ***
bin/nutch parse -force -all
bin/nutch updatedb -all

I'm using MongoDB.

Any help, please? I'm not sure if it's a Nutch bug or my
misunderstanding of Nutch.

Best regards,

Adnane
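
For anyone trying to reproduce the report, the two rounds described above could be sketched as a small shell script. This is only a sketch and not part of the original message: it uses just the bin/nutch commands and flags quoted in this thread, while NUTCH, DRY_RUN, run and crawl_round are hypothetical names introduced here for illustration. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the two crawl rounds described in the report.
# Assumptions (not from the original mail): NUTCH, DRY_RUN, run and
# crawl_round are illustrative names; only the bin/nutch commands and
# their flags are taken from the message itself.

NUTCH="${NUTCH:-bin/nutch}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, anything else = execute

# Print the command, and execute it unless DRY_RUN is 1.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

# One generate/fetch/parse/updatedb cycle, as quoted in the report.
crawl_round() {
  run "$NUTCH" generate -topN 1000 &&
  run "$NUTCH" fetch -all &&
  run "$NUTCH" parse -force -all &&
  run "$NUTCH" updatedb -all
}

run "$NUTCH" inject urls/ || exit 1
crawl_round || exit 1   # first round
crawl_round || exit 1   # second round: the report says fetch drops metadata here
```

Setting DRY_RUN=0 and running it from the Nutch runtime directory would execute the commands. Inspecting the stored page metadata after each round would then show whether the second fetch is the step that removes it.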