Re: Option to skip documents

2018-10-09 Thread Karl Wright
r1843343 adds this condition to the list of caught conditions.

In the future it would be better to create a ticket.

Karl


On Tue, Oct 9, 2018 at 3:06 PM Karl Wright wrote:

> I can make it retry then skip if it doesn't succeed in a while.
>
> Karl
>
>
> On Tue, Oct 9, 2018 at 11:38 AM Romaric Pighetti <
> romaric.pighe...@francelabs.com> wrote:
>
>> Hi Karl,
>>
>> You're right, it might be better to reschedule the file for later in this
>> case.
>>
>> In my case, I was able to crawl the files the first time I tried.
>> When launching another crawl a few days later, the same files were locked.
>> I tried to crawl them several times during the day but could never reach
>> them, always with the same error.
>> So it is indeed a temporary lock, but we can't tell how long it will last.
>>
>> Currently MCF retries accessing the file several times in a row, then
>> gives up and stops the job with a message reporting the SmbException
>> encountered.
>>
>> Thanks for your answer,
>> Romaric
>>
>> On 09/10/2018 at 17:04, Karl Wright wrote:
>>
>> Hi Romaric,
>> If the error is transient, then the right thing to do is *not* to skip
>> the file, but to retry later.  What currently happens?
>>
>> Karl
>>
>>
>> On Tue, Oct 9, 2018 at 10:05 AM Romaric Pighetti <
>> romaric.pighe...@francelabs.com> wrote:
>>
>>> Hi Karl,
>>> Along the lines of this ticket
>>> https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1455?filter=allissues
>>> submitted by Julien, I recently stumbled across another SMB exception
>>> thrown when dealing with some kind of locked file. The error was:
>>> SmbException tossed processing smb://path/to/some/file.pst
>>> jcifs.smb.SmbException: 0xC054
>>> MSDN documentation about this error can be found on this page:
>>> https://msdn.microsoft.com/en-us/library/ee441884.aspx?f=255=-2147217396
>>>
>>> This happens, for example, with large PST files (Outlook archives) that
>>> are in use.
>>> In my opinion, this is a case where the file should be skipped rather
>>> than the job stopped.
>>> What do you think?
>>>
>>> Thanks,
>>> Romaric
>>>
>>> --
>>> Romaric Pighetti
>>> France Labs – The Search experts
>>> Meet us at the Enterprise Search & Discovery Summit
>>> in Washington DC
>>>
>>> www.francelabs.com
>>>
>>
>


Re: Option to skip documents

2018-10-09 Thread Karl Wright
I can make it retry then skip if it doesn't succeed in a while.

Karl
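
The retry-then-skip behaviour proposed above could look roughly like the following. This is a minimal sketch under stated assumptions, not the change actually made in ManifoldCF: the class name, the enum, and both durations are hypothetical and chosen only for illustration.

```java
import java.time.Duration;

/**
 * Hypothetical sketch of a "retry, then skip" policy for a document that
 * keeps failing with a transient error (for example, a locked file).
 * None of these names come from ManifoldCF; they are illustrative only.
 */
final class RetryThenSkipSketch {

    /** What the crawler should do with the document after a failed fetch. */
    enum Action { RETRY_LATER, SKIP_DOCUMENT }

    // Assumed value: how long to keep retrying before giving up on the document.
    static final Duration GIVE_UP_AFTER = Duration.ofHours(3);

    /**
     * Decide what to do given how long the document has been failing.
     * While the failure is younger than GIVE_UP_AFTER the document is
     * rescheduled; after that it is skipped so the job can continue.
     */
    static Action decide(Duration failingFor) {
        if (failingFor.compareTo(GIVE_UP_AFTER) < 0) {
            return Action.RETRY_LATER;
        }
        return Action.SKIP_DOCUMENT;
    }

    public static void main(String[] args) {
        // A document that has been locked for ten minutes is retried later.
        System.out.println(decide(Duration.ofMinutes(10)));
        // A document that has been locked for four hours is skipped.
        System.out.println(decide(Duration.ofHours(4)));
    }
}
```

The point of the design is that neither outcome stops the job: a locked file is either rescheduled or, after the assumed time limit, left out of the crawl.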


On Tue, Oct 9, 2018 at 11:38 AM Romaric Pighetti <
romaric.pighe...@francelabs.com> wrote:

> Hi Karl,
>
> You're right, it might be better to reschedule the file for later in this
> case.
>
> In my case, I was able to crawl the files the first time I tried.
> When launching another crawl a few days later, the same files were locked.
> I tried to crawl them several times during the day but could never reach
> them, always with the same error.
> So it is indeed a temporary lock, but we can't tell how long it will last.
>
> Currently MCF retries accessing the file several times in a row, then
> gives up and stops the job with a message reporting the SmbException
> encountered.
>
> Thanks for your answer,
> Romaric
>
> On 09/10/2018 at 17:04, Karl Wright wrote:
>
> Hi Romaric,
> If the error is transient, then the right thing to do is *not* to skip the
> file, but to retry later.  What currently happens?
>
> Karl
>
>
> On Tue, Oct 9, 2018 at 10:05 AM Romaric Pighetti <
> romaric.pighe...@francelabs.com> wrote:
>
>> Hi Karl,
>> Along the lines of this ticket
>> https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1455?filter=allissues
>> submitted by Julien, I recently stumbled across another SMB exception
>> thrown when dealing with some kind of locked file. The error was:
>> SmbException tossed processing smb://path/to/some/file.pst
>> jcifs.smb.SmbException: 0xC054
>> MSDN documentation about this error can be found on this page:
>> https://msdn.microsoft.com/en-us/library/ee441884.aspx?f=255=-2147217396
>>
>> This happens, for example, with large PST files (Outlook archives) that
>> are in use.
>> In my opinion, this is a case where the file should be skipped rather
>> than the job stopped.
>> What do you think?
>>
>> Thanks,
>> Romaric
>>
>


Re: Option to skip documents

2018-10-09 Thread Romaric Pighetti

Hi Karl,

You're right, it might be better to reschedule the file for later in this
case.

In my case, I was able to crawl the files the first time I tried.
When launching another crawl a few days later, the same files were locked.
I tried to crawl them several times during the day but could never reach
them, always with the same error.
So it is indeed a temporary lock, but we can't tell how long it will last.

Currently MCF retries accessing the file several times in a row, then
gives up and stops the job with a message reporting the SmbException
encountered.

Thanks for your answer,
Romaric

On 09/10/2018 at 17:04, Karl Wright wrote:

Hi Romaric,
If the error is transient, then the right thing to do is *not* to skip 
the file, but to retry later.  What currently happens?


Karl


On Tue, Oct 9, 2018 at 10:05 AM Romaric Pighetti wrote:


Hi Karl,

Along the lines of this ticket

https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1455?filter=allissues
submitted by Julien, I recently stumbled across another SMB
exception thrown when dealing with some kind of locked file. The
error was:
SmbException tossed processing smb://path/to/some/file.pst
jcifs.smb.SmbException: 0xC054
MSDN documentation about this error can be found on this page:

https://msdn.microsoft.com/en-us/library/ee441884.aspx?f=255=-2147217396

This happens, for example, with large PST files (Outlook archives)
that are in use.
In my opinion, this is a case where the file should be skipped
rather than the job stopped.
What do you think?

Thanks,
Romaric



Re: Option to skip documents

2018-10-09 Thread Karl Wright
Hi Romaric,
If the error is transient, then the right thing to do is *not* to skip the
file, but to retry later.  What currently happens?

Karl


On Tue, Oct 9, 2018 at 10:05 AM Romaric Pighetti <
romaric.pighe...@francelabs.com> wrote:

> Hi Karl,
> Along the lines of this ticket
> https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1455?filter=allissues
> submitted by Julien, I recently stumbled across another SMB exception
> thrown when dealing with some kind of locked file. The error was:
> SmbException tossed processing smb://path/to/some/file.pst
> jcifs.smb.SmbException: 0xC054
> MSDN documentation about this error can be found on this page:
> https://msdn.microsoft.com/en-us/library/ee441884.aspx?f=255=-2147217396
>
> This happens, for example, with large PST files (Outlook archives) that
> are in use.
> In my opinion, this is a case where the file should be skipped rather
> than the job stopped.
> What do you think?
>
> Thanks,
> Romaric
>


Option to skip documents

2018-10-09 Thread Romaric Pighetti

Hi Karl,

Along the lines of this ticket 
https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1455?filter=allissues 
submitted by Julien, I recently stumbled across another SMB exception
thrown when dealing with some kind of locked file. The error was:

SmbException tossed processing smb://path/to/some/file.pst
jcifs.smb.SmbException: 0xC054
MSDN documentation about this error can be found on this page:
https://msdn.microsoft.com/en-us/library/ee441884.aspx?f=255=-2147217396

This happens, for example, with large PST files (Outlook archives) that
are in use.
In my opinion, this is a case where the file should be skipped rather
than the job stopped.

What do you think?

Thanks,
Romaric



Re: Query to get the number of documents processed from PostgreSQL

2018-10-09 Thread Romaric Pighetti

Thanks Karl for the quick answer.

I guess that to get only the documents completed while the job is running,
I will have to work with the status, whose possible values are defined in
the JobQueue class.
I noticed that sometimes (mainly when pausing and restarting a job),
selecting only the rows with status 'C' does not give exactly the same
value as the "Completed" column of the UI.
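
One way to investigate the mismatch described above is to count rows per status instead of for a single status code. The following is a minimal sketch, assuming that `jobqueue` has a column named `status` (suggested by the wording above but not confirmed in this thread) and that the job id is numeric. The class and method names are hypothetical, and which status codes the UI adds up for its "Completed" column is not stated here.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for counting jobqueue rows of a single job. */
final class JobQueueCountSketch {

    /** Parameterized form of the total-count query quoted in this thread. */
    static String countAllSql() {
        return "SELECT count(*) FROM jobqueue WHERE jobid = ?";
    }

    /** Per-status breakdown; the "status" column name is an assumption. */
    static String countByStatusSql() {
        return "SELECT status, count(*) FROM jobqueue WHERE jobid = ? "
             + "GROUP BY status ORDER BY status";
    }

    /** Runs the breakdown query and returns status code -> row count. */
    static Map<String, Long> countByStatus(Connection conn, long jobId) throws SQLException {
        Map<String, Long> counts = new LinkedHashMap<>();
        try (PreparedStatement ps = conn.prepareStatement(countByStatusSql())) {
            ps.setLong(1, jobId); // bind the job id instead of concatenating it
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    counts.put(rs.getString(1), rs.getLong(2));
                }
            }
        }
        return counts;
    }
}
```

Because the query filters on a single job id, it touches only that job's rows, which matches the goal of limiting the load on PostgreSQL. Comparing the per-status counts with the UI figure may show which codes besides 'C' are involved.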

If you have any hints, I am interested; otherwise I will try to figure it out.

Thanks again,
Romaric

P.S.: I am copying my answer to the mailing list. Sorry, Karl, for also
mailing you directly; I meant to reply to the list.


On 08/10/2018 at 12:17, Karl Wright wrote:

If you want all the documents for a specific job, the query is:

select count(*) from jobqueue where jobid=

Karl


On Mon, Oct 8, 2018 at 4:23 AM Romaric Pighetti wrote:


Hi Karl,

I currently need to get the number of documents processed by MCF in a
specific job.
This number is getting bigger than the limit set for the web interface,
and I don't want to increase this limit because of the stress it would
put on the database (opening the tab in the UI triggers queries for all
the jobs, and I know from previous readings that these queries are heavy
for PostgreSQL to process).
I would therefore like to know whether you can provide me with the query
used in the interface to display the number of processed documents, so
that I can run it against PostgreSQL manually and request it only for the
job I am interested in, lowering the impact on PostgreSQL.

Thanks for your help.
Romaric

-- 
Romaric Pighetti

France Labs – The Search experts

The creators of Datafari 4, THE enterprise search solution

www.francelabs.com 


