Participate in the ASF 25th Anniversary Campaign

2024-04-03 Thread Brian Proffitt
Hi everyone,

As part of The ASF’s 25th anniversary campaign[1], we will be celebrating
projects and communities in multiple ways.

We invite all projects and contributors to participate in the following
ways:

* Individuals - submit your first contribution:
https://news.apache.org/foundation/entry/the-asf-launches-firstasfcontribution-campaign
* Projects - share your public good story:
https://docs.google.com/forms/d/1vuN-tUnBwpTgOE5xj3Z5AG1hsOoDNLBmGIqQHwQT6k8/viewform?edit_requested=true
* Projects - submit a project spotlight for the blog:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=278466116
* Projects - contact the Voice of Apache podcast (formerly Feathercast) to
be featured: https://feathercast.apache.org/help/
* Projects - use the 25th anniversary template and the #ASF25Years hashtag
on social media:
https://docs.google.com/presentation/d/1oDbMol3F_XQuCmttPYxBIOIjRuRBksUjDApjd8Ve3L8/edit#slide=id.g26b0919956e_0_13

If you have questions, email the Marketing & Publicity team at
mark...@apache.org.

Peace,
BKP

[1] https://apache.org/asf25years/

[NOTE: You are receiving this message because you are a contributor to an
Apache Software Foundation project. The ASF will very occasionally send out
messages relating to the Foundation to contributors and members, such as
this one.]

Brian Proffitt
VP, Marketing & Publicity
VP, Conferences


Community Over Code NA 2024 Travel Assistance Applications now open!

2024-03-27 Thread Gavin McDonald
Hello to all users, contributors and Committers!

[ You are receiving this email as a subscriber to one or more ASF project
  dev or user mailing lists; it is not being sent to you directly. It is
  important that we reach all of our users and contributors/committers so
  that they may get a chance to benefit from this. We apologise in advance
  if this doesn't interest you, but it is on topic for the mailing lists of
  the Apache Software Foundation; please do not mark this as spam in your
  email client. Thank You! ]

The Travel Assistance Committee (TAC) are pleased to announce that
travel assistance applications for Community over Code NA 2024 are now
open!

We will be supporting Community over Code NA in Denver, Colorado,
October 7th to 10th, 2024.

TAC exists to help those that would like to attend Community over Code
events, but are unable to do so for financial reasons. For more info
on this year's applications and qualifying criteria, please visit the
TAC website at < https://tac.apache.org/ >. Applications are already
open on https://tac-apply.apache.org/, so don't delay!

The Apache Travel Assistance Committee will only be accepting
applications from those people that are able to attend the full event.

Important: Applications close on Monday 6th May, 2024.

Applicants have until the closing date above to submit their
applications (which should contain as much supporting material as is
required to process their request efficiently and accurately); this
will enable TAC to announce successful applications shortly
afterwards.

As usual, TAC expects to deal with a range of applications from a
diverse range of backgrounds; therefore, we encourage (as always)
anyone thinking about sending in an application to do so ASAP.

For those who will need a visa to enter the country, we advise you to
apply now so that you have enough time in case of interview delays. Do
not wait until you know whether you have been accepted.

We look forward to greeting many of you in Denver, Colorado, in October 2024!

Kind Regards,

Gavin

(On behalf of the Travel Assistance Committee)


Jetty Config changes

2024-03-11 Thread ritika jain
Hi All,

When ManifoldCF starts with start.jar, it creates an entry in the system's
tmp folder, but that entry does not automatically get cleaned up when the
server/ManifoldCF stops.

On my live server I am using a dockerised environment, where we have a
mechanism that restarts ManifoldCF whenever required. Every time ManifoldCF
restarts, it creates an entry in the tmp directory of the Linux-based
operating system on the Docker host, and when we try to clean up that tmp
directory we get permission issues.

[image: image.png]

Is there any way we can change where the Jetty server instance writes this
entry, for example to a custom folder that we can manage ourselves (with
permissions we control)?

Or is there any way to change this default path in the config files?
Thanks
Ritika
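
One workaround (a sketch, not from this thread; the paths and service user are assumptions) is to point the JVM's temp directory at a folder you own before launching with start.jar, since Jetty work files and temporary extraction normally fall back to java.io.tmpdir:

```
# Hypothetical example: run the ManifoldCF start.jar with a custom,
# self-managed temp directory instead of the system /tmp.
MCF_TMP=/opt/manifoldcf/tmp               # assumed path on a volume you control
mkdir -p "$MCF_TMP"
chown manifoldcf:manifoldcf "$MCF_TMP"    # assumed service user

# java.io.tmpdir is the standard JVM property; Jetty work files and most
# temporary extraction land under it unless configured otherwise.
java -Djava.io.tmpdir="$MCF_TMP" -jar start.jar
```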


Community Over Code Asia 2024 Travel Assistance Applications now open!

2024-02-20 Thread Gavin McDonald
Hello to all users, contributors and Committers!

The Travel Assistance Committee (TAC) are pleased to announce that
travel assistance applications for Community over Code Asia 2024 are now
open!

We will be supporting Community over Code Asia in Hangzhou, China,
July 26th to 28th, 2024.

TAC exists to help those that would like to attend Community over Code
events, but are unable to do so for financial reasons. For more info
on this year's applications and qualifying criteria, please visit the
TAC website at < https://tac.apache.org/ >. Applications are already
open on https://tac-apply.apache.org/, so don't delay!

The Apache Travel Assistance Committee will only be accepting
applications from those people that are able to attend the full event.

Important: Applications close on Friday, May 10th, 2024.

Applicants have until the closing date above to submit their
applications (which should contain as much supporting material as is
required to process their request efficiently and accurately); this
will enable TAC to announce successful applications shortly
afterwards.

As usual, TAC expects to deal with a range of applications from a
diverse range of backgrounds; therefore, we encourage (as always)
anyone thinking about sending in an application to do so ASAP.

For those who will need a visa to enter the country, we advise you to
apply now so that you have enough time in case of interview delays. Do
not wait until you know whether you have been accepted.

We look forward to greeting many of you in Hangzhou, China in July, 2024!

Kind Regards,

Gavin

(On behalf of the Travel Assistance Committee)


Community over Code EU 2024 Travel Assistance Applications now open!

2024-02-03 Thread Gavin McDonald
Hello to all users, contributors and Committers!

The Travel Assistance Committee (TAC) are pleased to announce that
travel assistance applications for Community over Code EU 2024 are now
open!

We will be supporting Community over Code EU in Bratislava, Slovakia,
June 3rd to 5th, 2024.

TAC exists to help those that would like to attend Community over Code
events, but are unable to do so for financial reasons. For more info
on this year's applications and qualifying criteria, please visit the
TAC website at < https://tac.apache.org/ >. Applications are already
open on https://tac-apply.apache.org/, so don't delay!

The Apache Travel Assistance Committee will only be accepting
applications from those people that are able to attend the full event.

Important: Applications close on Friday, March 1st, 2024.

Applicants have until the closing date above to submit their
applications (which should contain as much supporting material as is
required to process their request efficiently and accurately); this
will enable TAC to announce successful applications shortly
afterwards.

As usual, TAC expects to deal with a range of applications from a
diverse range of backgrounds; therefore, we encourage (as always)
anyone thinking about sending in an application to do so ASAP.

For those who will need a visa to enter the country, we advise you to
apply now so that you have enough time in case of interview delays. Do
not wait until you know whether you have been accepted.

We look forward to greeting many of you in Bratislava, Slovakia in June,
2024!

Kind Regards,

Gavin

(On behalf of the Travel Assistance Committee)


Re: Documents Out Of Scope and hop count

2023-09-26 Thread Karl Wright
No, only the seed URLs get updated with that option.


On Tue, Sep 26, 2023 at 10:09 AM Marisol Redondo <
marisol.redondo.gar...@gmail.com> wrote:

> Thanks a lot for the explanation, Karl, really useful.
>
> I will wait for your reply at the end of the week, but I thought that the
> main reason for the option "Reset seeding" was for that, for reevaluating
> all pages, as a new fresh execution.
>
>
> On Tue, 26 Sept 2023 at 13:30, Karl Wright  wrote:
>
>> Okay, that is good to know.
>> The hopcount assessment occurs when documents are added to the queue.
>> Hopcounts are stored for each document in the hopcount table.  So if you
>> change a hopcount limit, it is quite possible that nothing will change
>> unless documents that are at the previous hopcount limit are re-evaluated.
>> I believe there is no logic in ManifoldCF for that at this time, but I'd
>> have to review the codebase to be certain of that.
>>
>> What that means is that you can't increase the hopcount limit and expect
>> the next crawl to pick up the documents you excluded before with the
>> hopcount mechanism.  Only when the documents need to be rescanned for some
>> other reason would that happen as it stands now.  But I will get back to
>> you after a review at the end of the week.
>>
>> Karl
>>
>> Karl
>>
>>
>> On Tue, Sep 26, 2023 at 8:04 AM Marisol Redondo <
>> marisol.redondo.gar...@gmail.com> wrote:
>>
>>> No, I haven't used this options, I have it configured as "Keep
>>> unreachable documents, for now", but it's also ignoring them because they
>>> were already kept?. With this option, when the unreachable document for now
>>> are converted to forever?
>>>
>>> The only solution I can think on is creating a new job with the exact
>>> same characteristics and run it.
>>>
>>> Regards and thanks
>>>Marisol
>>>
>>>
>>>
>>> On Tue, 26 Sept 2023 at 12:35, Karl Wright  wrote:
>>>
 If you ever set "Ignore unreachable documents forever" for the job, you
 can't go back and stop ignoring them.  The data that the job would need to
 have recorded for this is gone.  The only way to get it back is if you can
 convince the ManifoldCF to recrawl all documents in the job.


 On Tue, Sep 26, 2023 at 4:51 AM Marisol Redondo <
 marisol.redondo.gar...@gmail.com> wrote:

>
> Hi, I had a problem with document out of scope
>
> I change the Maximum hop count for type "redirect" in one of my job to
> 5, and saw that the job is not processing some pages because of that, so I
> removed the value to get them injecting into the output connector (Solr
> connector)
> After that, the same pages are still out of scope like the limit has
> been set to 1, and they are not indexed.
>
> I have tried to "Reset seeding" thinking that maybe the pages need to
> be check again, but still having the same problem, I don't think the
> problem is with the output, but I have also use the option "Re-index all
> associated documents" and "Remove all associated records" with the same
> result
> I don't want to clear the history in the repository, that it's a
> website connector, as I don't want to lost all the history.
>
> Is this a bug in Manifold? Is there any option to fix this issue?
>
> I'm using Manifold version 2.24.
>
> Thanks
> Marisol
>
>


Re: Documents Out Of Scope and hop count

2023-09-26 Thread Marisol Redondo
Thanks a lot for the explanation, Karl, really useful.

I will wait for your reply at the end of the week, but I thought that the
main reason for the "Reset seeding" option was exactly that: re-evaluating
all pages, as in a fresh execution.


On Tue, 26 Sept 2023 at 13:30, Karl Wright  wrote:

> Okay, that is good to know.
> The hopcount assessment occurs when documents are added to the queue.
> Hopcounts are stored for each document in the hopcount table.  So if you
> change a hopcount limit, it is quite possible that nothing will change
> unless documents that are at the previous hopcount limit are re-evaluated.
> I believe there is no logic in ManifoldCF for that at this time, but I'd
> have to review the codebase to be certain of that.
>
> What that means is that you can't increase the hopcount limit and expect
> the next crawl to pick up the documents you excluded before with the
> hopcount mechanism.  Only when the documents need to be rescanned for some
> other reason would that happen as it stands now.  But I will get back to
> you after a review at the end of the week.
>
> Karl
>
> Karl
>
>
> On Tue, Sep 26, 2023 at 8:04 AM Marisol Redondo <
> marisol.redondo.gar...@gmail.com> wrote:
>
>> No, I haven't used this options, I have it configured as "Keep
>> unreachable documents, for now", but it's also ignoring them because they
>> were already kept?. With this option, when the unreachable document for now
>> are converted to forever?
>>
>> The only solution I can think on is creating a new job with the exact
>> same characteristics and run it.
>>
>> Regards and thanks
>>Marisol
>>
>>
>>
>> On Tue, 26 Sept 2023 at 12:35, Karl Wright  wrote:
>>
>>> If you ever set "Ignore unreachable documents forever" for the job, you
>>> can't go back and stop ignoring them.  The data that the job would need to
>>> have recorded for this is gone.  The only way to get it back is if you can
>>> convince the ManifoldCF to recrawl all documents in the job.
>>>
>>>
>>> On Tue, Sep 26, 2023 at 4:51 AM Marisol Redondo <
>>> marisol.redondo.gar...@gmail.com> wrote:
>>>

 Hi, I had a problem with document out of scope

 I change the Maximum hop count for type "redirect" in one of my job to
 5, and saw that the job is not processing some pages because of that, so I
 removed the value to get them injecting into the output connector (Solr
 connector)
 After that, the same pages are still out of scope like the limit has
 been set to 1, and they are not indexed.

 I have tried to "Reset seeding" thinking that maybe the pages need to
 be check again, but still having the same problem, I don't think the
 problem is with the output, but I have also use the option "Re-index all
 associated documents" and "Remove all associated records" with the same
 result
 I don't want to clear the history in the repository, that it's a
 website connector, as I don't want to lost all the history.

 Is this a bug in Manifold? Is there any option to fix this issue?

 I'm using Manifold version 2.24.

 Thanks
 Marisol




Re: Documents Out Of Scope and hop count

2023-09-26 Thread Karl Wright
Okay, that is good to know.
The hopcount assessment occurs when documents are added to the queue.
Hopcounts are stored for each document in the hopcount table.  So if you
change a hopcount limit, it is quite possible that nothing will change
unless documents that are at the previous hopcount limit are re-evaluated.
I believe there is no logic in ManifoldCF for that at this time, but I'd
have to review the codebase to be certain of that.

What that means is that you can't increase the hopcount limit and expect
the next crawl to pick up the documents you excluded before with the
hopcount mechanism.  Only when the documents need to be rescanned for some
other reason would that happen as it stands now.  But I will get back to
you after a review at the end of the week.

Karl

Karl


On Tue, Sep 26, 2023 at 8:04 AM Marisol Redondo <
marisol.redondo.gar...@gmail.com> wrote:

> No, I haven't used this options, I have it configured as "Keep unreachable
> documents, for now", but it's also ignoring them because they were already
> kept?. With this option, when the unreachable document for now are
> converted to forever?
>
> The only solution I can think on is creating a new job with the exact same
> characteristics and run it.
>
> Regards and thanks
>Marisol
>
>
>
> On Tue, 26 Sept 2023 at 12:35, Karl Wright  wrote:
>
>> If you ever set "Ignore unreachable documents forever" for the job, you
>> can't go back and stop ignoring them.  The data that the job would need to
>> have recorded for this is gone.  The only way to get it back is if you can
>> convince the ManifoldCF to recrawl all documents in the job.
>>
>>
>> On Tue, Sep 26, 2023 at 4:51 AM Marisol Redondo <
>> marisol.redondo.gar...@gmail.com> wrote:
>>
>>>
>>> Hi, I had a problem with document out of scope
>>>
>>> I change the Maximum hop count for type "redirect" in one of my job to
>>> 5, and saw that the job is not processing some pages because of that, so I
>>> removed the value to get them injecting into the output connector (Solr
>>> connector)
>>> After that, the same pages are still out of scope like the limit has
>>> been set to 1, and they are not indexed.
>>>
>>> I have tried to "Reset seeding" thinking that maybe the pages need to be
>>> check again, but still having the same problem, I don't think the problem
>>> is with the output, but I have also use the option "Re-index all associated
>>> documents" and "Remove all associated records" with the same result
>>> I don't want to clear the history in the repository, that it's a website
>>> connector, as I don't want to lost all the history.
>>>
>>> Is this a bug in Manifold? Is there any option to fix this issue?
>>>
>>> I'm using Manifold version 2.24.
>>>
>>> Thanks
>>> Marisol
>>>
>>>


Re: Documents Out Of Scope and hop count

2023-09-26 Thread Marisol Redondo
No, I haven't used that option; I have it configured as "Keep unreachable
documents, for now". But is it also ignoring them because they were already
kept? With this option, when are documents that are unreachable "for now"
converted to "forever"?

The only solution I can think of is creating a new job with the exact same
characteristics and running it.

Regards and thanks
   Marisol



On Tue, 26 Sept 2023 at 12:35, Karl Wright  wrote:

> If you ever set "Ignore unreachable documents forever" for the job, you
> can't go back and stop ignoring them.  The data that the job would need to
> have recorded for this is gone.  The only way to get it back is if you can
> convince the ManifoldCF to recrawl all documents in the job.
>
>
> On Tue, Sep 26, 2023 at 4:51 AM Marisol Redondo <
> marisol.redondo.gar...@gmail.com> wrote:
>
>>
>> Hi, I had a problem with document out of scope
>>
>> I change the Maximum hop count for type "redirect" in one of my job to 5,
>> and saw that the job is not processing some pages because of that, so I
>> removed the value to get them injecting into the output connector (Solr
>> connector)
>> After that, the same pages are still out of scope like the limit has been
>> set to 1, and they are not indexed.
>>
>> I have tried to "Reset seeding" thinking that maybe the pages need to be
>> check again, but still having the same problem, I don't think the problem
>> is with the output, but I have also use the option "Re-index all associated
>> documents" and "Remove all associated records" with the same result
>> I don't want to clear the history in the repository, that it's a website
>> connector, as I don't want to lost all the history.
>>
>> Is this a bug in Manifold? Is there any option to fix this issue?
>>
>> I'm using Manifold version 2.24.
>>
>> Thanks
>> Marisol
>>
>>


Re: Documents Out Of Scope and hop count

2023-09-26 Thread Karl Wright
If you ever set "Ignore unreachable documents forever" for the job, you
can't go back and stop ignoring them.  The data that the job would need to
have recorded for this is gone.  The only way to get it back is if you can
convince the ManifoldCF to recrawl all documents in the job.


On Tue, Sep 26, 2023 at 4:51 AM Marisol Redondo <
marisol.redondo.gar...@gmail.com> wrote:

>
> Hi, I had a problem with document out of scope
>
> I change the Maximum hop count for type "redirect" in one of my job to 5,
> and saw that the job is not processing some pages because of that, so I
> removed the value to get them injecting into the output connector (Solr
> connector)
> After that, the same pages are still out of scope like the limit has been
> set to 1, and they are not indexed.
>
> I have tried to "Reset seeding" thinking that maybe the pages need to be
> check again, but still having the same problem, I don't think the problem
> is with the output, but I have also use the option "Re-index all associated
> documents" and "Remove all associated records" with the same result
> I don't want to clear the history in the repository, that it's a website
> connector, as I don't want to lost all the history.
>
> Is this a bug in Manifold? Is there any option to fix this issue?
>
> I'm using Manifold version 2.24.
>
> Thanks
> Marisol
>
>


Documents Out Of Scope and hop count

2023-09-26 Thread Marisol Redondo
Hi, I have a problem with documents out of scope.

I changed the maximum hop count for type "redirect" in one of my jobs to 5,
and saw that the job was not processing some pages because of that, so I
removed the value to get them injected into the output connector (Solr
connector).
After that, the same pages are still out of scope, as if the limit had been
set to 1, and they are not indexed.

I have tried "Reset seeding", thinking that maybe the pages needed to be
checked again, but I still have the same problem. I don't think the problem
is with the output, but I have also used the options "Re-index all
associated documents" and "Remove all associated records", with the same
result.
I don't want to clear the history in the repository connection (it's a
website connector), as I don't want to lose all the history.

Is this a bug in Manifold? Is there any option to fix this issue?

I'm using Manifold version 2.24.

Thanks
Marisol


R: web crawler https

2023-09-26 Thread Bisonti Mario
Thanks a lot Karl!
I uploaded the SSL certificate, flagged it as “always trust”, and it works.

Mario


From: Karl Wright 
Sent: Monday, 25 September 2023 20:41
To: user@manifoldcf.apache.org
Subject: Re: web crawler https

See this article:

https://stackoverflow.com/questions/6784463/error-trustanchors-parameter-must-be-non-empty

ManifoldCF web crawler configuration allows you to drop certs into a local 
trust store for the connection.  You need to either do that (adding whatever 
certificate authority cert you think might be missing), or by checking the 
"trust https" checkbox.

You can generally debug what certs a site might need by trying to fetch a page 
with curl and using verbose debug mode.

Karl


On Mon, Sep 25, 2023 at 10:48 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hi,
I would like to try indexing a Wordpress internal site.
I tried to configure Repository Web, Job with seeds but I always obtain:

WARN 2023-09-25T16:31:50,905 (Worker thread '4') - Service interruption 
reported for job 1695649924581 connection 'Wp': IO exception 
(javax.net.ssl.SSLException)reading header: Unexpected error: 
java.security.InvalidAlgorithmParameterException: the trustAnchors parameter 
must be non-empty

How could I solve?
Thanks a lot
Mario


Re: web crawler https

2023-09-25 Thread Karl Wright
See this article:

https://stackoverflow.com/questions/6784463/error-trustanchors-parameter-must-be-non-empty

ManifoldCF web crawler configuration allows you to drop certs into a local
trust store for the connection.  You need to either do that (adding
whatever certificate authority cert you think might be missing), or by
checking the "trust https" checkbox.

You can generally debug what certs a site might need by trying to fetch a
page with curl and using verbose debug mode.

Karl
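
A sketch of that debugging flow from a shell (the host name is a placeholder, not taken from this thread): inspect the chain the server presents, then export the certificate you decide to trust and add it via the web connection's certificate section in the UI.

```
# Hypothetical host; replace with your internal WordPress site.
HOST=intranet.example.com

# Show the certificate chain the server presents; a short or self-signed
# chain usually explains a missing trust anchor.
openssl s_client -connect "$HOST:443" -showcerts </dev/null

# Verbose fetch with curl; look for TLS/issuer errors in the output.
curl -v "https://$HOST/" -o /dev/null

# Save the first certificate in the chain to a PEM file (for a self-signed
# site this is the one to trust; for an internal CA, export the CA cert
# instead), then add it to the web connection's trust store in the UI.
openssl s_client -connect "$HOST:443" -showcerts </dev/null \
  | openssl x509 -outform PEM > cert-to-trust.pem
```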


On Mon, Sep 25, 2023 at 10:48 AM Bisonti Mario 
wrote:

> Hi,
>
> I would like to try indexing a Wordpress internal site.
>
> I tried to configure Repository Web, Job with seeds but I always obtain:
>
>
>
> WARN 2023-09-25T16:31:50,905 (Worker thread '4') - Service interruption
> reported for job 1695649924581 connection 'Wp': IO exception
> (javax.net.ssl.SSLException)reading header: Unexpected error:
> java.security.InvalidAlgorithmParameterException: the trustAnchors
> parameter must be non-empty
>
>
>
> How could I solve?
>
> Thanks a lot
>
> Mario
>
>


web crawler https

2023-09-25 Thread Bisonti Mario
Hi,
I would like to try indexing a Wordpress internal site.
I tried to configure Repository Web, Job with seeds but I always obtain:

WARN 2023-09-25T16:31:50,905 (Worker thread '4') - Service interruption 
reported for job 1695649924581 connection 'Wp': IO exception 
(javax.net.ssl.SSLException)reading header: Unexpected error: 
java.security.InvalidAlgorithmParameterException: the trustAnchors parameter 
must be non-empty

How could I solve this?
Thanks a lot
Mario


Documentation issue?

2023-09-14 Thread Bisonti Mario
Hi, I would like to report that at the url: 
https://manifoldcf.apache.org/release/release-2.25/en_US/index.html I obtain:

Not Found
The requested URL was not found on this server.

Thank you
Mario


Registration open for Community Over Code North America

2023-08-28 Thread Rich Bowen
Hello! Registration is still open for the upcoming Community Over Code
NA event in Halifax, NS! We invite you to register for the event at
https://communityovercode.org/registration/

Apache Committers, note that you have a special discounted rate for the
conference at US$250. To take advantage of this rate, use the special
code sent to the committers@ list by Brian Proffitt earlier this month.

If you are in need of an invitation letter, please consult the
information at https://communityovercode.org/visa-letter/

Please see https://communityovercode.org/ for more information about
the event, including how to make reservations for discounted hotel
rooms in Halifax. Discounted rates will only be available until Sept.
5, so reserve soon!

--Rich, for the event planning team


Re: Duplicate key value violates unique constraint "repohistory_pkey"

2023-06-16 Thread Marisol Redondo
Hi,

Did you find any solution for this, or do you still have the history
disabled?

I'm having the same problem, and we are using postgresql as the db.

Regards

On Sun, 29 Jan 2023 at 05:48, Artem Abeleshev 
wrote:

> Hi everyone!
>
> We are using ManifoldCF 2.22.1 with multiple nodes in our production. I
> am investigating a problem we've hit recently (it has happened at least 5-6
> times already). A couple of our jobs end up with the following error:
>
> ```
> Error: ERROR: duplicate key value violates unique constraint
> "repohistory_pkey" Detail: Key (id)=(1672652357009) already exists.
> ```
>
> and following log entry appears in the logs of the one of the nodes:
>
> ```
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: ERROR:
> duplicate key value violates unique constraint "repohistory_pkey"
>   Detail: Key (id)=(1673507409625) already exists.
> at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.reinterpretException(DBInterfacePostgreSQL.java:638)
> ~[mcf-core.jar:?]
> at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performModification(DBInterfacePostgreSQL.java:665)
> ~[mcf-core.jar:?]
> at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performInsert(DBInterfacePostgreSQL.java:187)
> ~[mcf-core.jar:?]
> at
> org.apache.manifoldcf.core.database.BaseTable.performInsert(BaseTable.java:68)
> ~[mcf-core.jar:?]
> at
> org.apache.manifoldcf.crawler.repository.RepositoryHistoryManager.addRow(RepositoryHistoryManager.java:202)
> ~[mcf-pull-agent.jar:?]
> at
> org.apache.manifoldcf.crawler.repository.RepositoryConnectionManager.recordHistory(RepositoryConnectionManager.java:706)
> ~[mcf-pull-agent.jar:?]
> at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.recordActivity(WorkerThread.java:1878)
> ~[mcf-pull-agent.jar:?]
> at
> org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.processDocument(WebcrawlerConnector.java:1470)
> ~[?:?]
> at
> org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.processDocuments(WebcrawlerConnector.java:753)
> ~[?:?]
> at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:402)
> [mcf-pull-agent.jar:?]
> Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value
> violates unique constraint "repohistory_pkey"
>   Detail: Key (id)=(1673507409625) already exists.
> at
> org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2476)
> ~[postgresql-42.1.3.jar:42.1.3]
> at
> org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2189)
> ~[postgresql-42.1.3.jar:42.1.3]
> at
> org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:300)
> ~[postgresql-42.1.3.jar:42.1.3]
> at
> org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:428)
> ~[postgresql-42.1.3.jar:42.1.3]
> at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:354)
> ~[postgresql-42.1.3.jar:42.1.3]
> at
> org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:169)
> ~[postgresql-42.1.3.jar:42.1.3]
> at
> org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:136)
> ~[postgresql-42.1.3.jar:42.1.3]
> at
> org.apache.manifoldcf.core.database.Database.execute(Database.java:916)
> ~[mcf-core.jar:?]
> at
> org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)
> ~[mcf-core.jar:?]
> ```
>
> First, I noticed that the IDs of the entities in ManifoldCF are actually
> timestamps. So I became curious how it handles duplicates and started
> digging into the sources to get an idea of how the ids are generated. I
> found that ids are generated by the `IDFactory`
> (`org.apache.manifoldcf.core.interfaces.IDFactory`). `IDFactory` uses an
> id pool: each time we need a new id it is taken from the pool, and when
> the pool is empty `IDFactory` generates another 100 entries. To make sure
> ids do not overlap, the last generated id is stored in ZooKeeper, so each
> time `IDFactory` starts generating the next batch of ids, it starts from
> the last id generated. This part looks clean to me.
>
> The next investigation concerned locking. Obviously, during id generation
> we have to handle synchronization at the thread level (local JVM) and at
> the global level (ZooKeeper). Both global and local locking also look fine.
>
> The other observation I made is that all cases happen while saving
> repository history records. So the next idea was that the same record was
> probably being stored repeatedly. But it seems quite hard to investigate
> this part, as a lot of service layers can call it.
>
> For now I have just disabled history completely by setting the
> `org.apache.manifoldcf.crawler.repository.store_history` property to
> `false` 

TAC Applications for Community Over Code North America and Asia now open

2023-06-16 Thread Gavin McDonald
Hi All,

(This email goes out to all our user and dev project mailing lists, so you
may receive this
email more than once.)

The Travel Assistance Committee has opened up applications to help get
people to the following events:


*Community Over Code Asia 2023 - *
*August 18th to August 20th in Beijing , China*

Applications for this event close on the 6th of July, so time is short;
please apply as soon as possible. TAC is prioritising applications from the Asia
and Oceania regions.

More details on this event can be found at:
https://apachecon.com/acasia2023/

For more information on how to apply, please read: https://tac.apache.org/


*Community Over Code North America - *
*October 7th to October 10th in Halifax, Canada*

Applications for this event close on the 22nd of July. We expect many
applications, so please do apply as soon as you can. TAC is prioritising
applications from the North and South America regions.

More details on this event can be found at: https://communityovercode.org/

For more information on how to apply, please read: https://tac.apache.org/


*Have you applied to be a Speaker?*

If you have applied, or intend to apply, as a Speaker at either of these
events, and think you may require assistance for travel and/or
accommodation, TAC advises that you do not wait until you have been
notified of your speaker status: apply early. Should you not be accepted
as a speaker and still wish to attend, you can amend your application to
include conference fees, or you may withdraw your application.

The call for presentations for Halifax is here:
https://communityovercode.org/call-for-presentations/
and you have until the 13th of July to apply.

The call for presentations for Beijing is here:
https://apachecon.com/acasia2023/cfp.html
and you have until the 18th June to apply.

*IMPORTANT Note on Visas:*

It is important that you apply for a visa as soon as possible. Do not wait
until you know whether you have been accepted for Travel Assistance; due to
current wait times for interviews in some countries, waiting that long may
be too late, so please apply for a visa right away. Contact
tac-ap...@tac.apache.org if you need any more information or assistance in
this area.

*Spread the Word!!*

TAC encourages you to spread the word about Travel Assistance to get to
these events, so feel free to repost as you see fit on social media, at
work, at schools and universities, and so on.

Thank You and hope to see you all soon

Gavin McDonald on behalf of the ASF Travel Assistance Committee.


Re: Solr connector authentication issue

2023-06-07 Thread Karl Wright
But if those are set, and the connection health check passes, then I can't
tell you why Solr is unhappy with your connection.  It's clearly working
sometimes.  I'd look on the Solr end to figure out whether its rejection is
coming from just one of your instances.
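
One quick way to narrow this down outside ManifoldCF is to hit each Solr node directly with the same Basic credentials; the node names, collection and credentials below are placeholders, not taken from this thread.

```
# Hypothetical values - substitute your own nodes, collection and credentials.
SOLR_USER=mcf
SOLR_PASS=secret
COLLECTION=documents

for NODE in solr-0.example:8983 solr-1.example:8983 solr-2.example:8983; do
  echo "== $NODE =="
  # 200 from every node but 401 from ManifoldCF points at the connector
  # configuration; 401 from only one node points at inconsistent security
  # settings across the instances.
  curl -s -o /dev/null -w '%{http_code}\n' \
       -u "$SOLR_USER:$SOLR_PASS" \
       "http://$NODE/solr/$COLLECTION/admin/ping"
done
```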



On Wed, Jun 7, 2023 at 7:49 AM Karl Wright  wrote:

> The Solr output connection configuration contains all credentials that are
> sent to Solr.  If those aren't set Solr won't get them.
>
> Karl
>
>
> On Wed, Jun 7, 2023 at 7:23 AM Marisol Redondo <
> marisol.redondo.gar...@gmail.com> wrote:
>
>> Hi,
>>
>> We are using Solr 8 with basic authentication, and when checking the
>> output connection I'm getting an Exception "Solr authorization failure,
>> code 401: aborting job"
>>
>> The solr type is Solrcloud, as we have 3 server (installed in AWS
>> Kubernette containers), I have set the user ID and password in the Sever
>> tab and can connect to Zookeeper and solr, as, if I unchecked the option
>> "Bock anonymous request", the connector is working.
>>
>> How can I make the connection working? I can't unchecked the "Block
>> anonymous request"
>> Am I missing any other configuration?
>> Is there any other place where I have to set the user and password?
>>
>> Thanks
>> Marisol
>>
>>


Re: Solr connector authentication issue

2023-06-07 Thread Karl Wright
The Solr output connection configuration contains all credentials that are
sent to Solr.  If those aren't set Solr won't get them.

Karl


On Wed, Jun 7, 2023 at 7:23 AM Marisol Redondo <
marisol.redondo.gar...@gmail.com> wrote:

> Hi,
>
> We are using Solr 8 with basic authentication, and when checking the
> output connection I'm getting an Exception "Solr authorization failure,
> code 401: aborting job"
>
> The solr type is Solrcloud, as we have 3 server (installed in AWS
> Kubernette containers), I have set the user ID and password in the Sever
> tab and can connect to Zookeeper and solr, as, if I unchecked the option
> "Bock anonymous request", the connector is working.
>
> How can I make the connection working? I can't unchecked the "Block
> anonymous request"
> Am I missing any other configuration?
> Is there any other place where I have to set the user and password?
>
> Thanks
> Marisol
>
>


Solr connector authentication issue

2023-06-07 Thread Marisol Redondo
Hi,

We are using Solr 8 with basic authentication, and when checking the output
connection I'm getting the exception "Solr authorization failure, code 401:
aborting job".

The Solr type is SolrCloud, as we have 3 servers (installed in AWS
Kubernetes containers). I have set the user ID and password in the Server
tab and can connect to ZooKeeper and Solr; in fact, if I uncheck the option
"Block anonymous request", the connector works.

How can I make the connection work? I can't uncheck "Block anonymous
request".
Am I missing any other configuration?
Is there any other place where I have to set the user and password?

Thanks
Marisol


R: Long Job on Windows Share

2023-06-07 Thread Bisonti Mario
In the manifoldcf.log I see many:
WARN 2023-06-05T21:36:51,630 (Worker thread '31') - JCIFS: Possibly transient 
exception detected on attempt 2 while getting share security: All pipe 
instances are busy.
jcifs.smb.SmbException: All pipe instances are busy.
at jcifs.smb.SmbTransportImpl.checkStatus2(SmbTransportImpl.java:1441) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.checkStatus(SmbTransportImpl.java:1552) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.sendrecv(SmbTransportImpl.java:1007) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.send(SmbTransportImpl.java:1523) 
~[jcifs-ng-2.1.2.jar:?]


I don’t see any information about whether documents are reindexed or not.

Do I have to search in a different file?

Thanks a lot Mario


R: Long Job on Windows Share

2023-05-26 Thread Bisonti Mario
Thanks a lot Karl

In the “Simple History” in ManifoldCF I see the following every day, for
every document, even if it has not been modified:

26/05/23, 08:47:47 document ingest (SolrShare) 
file:/...Avanzato%202014.pptx
26/05/23, 08:47:46 extract [TikaTrasform]  
file:/...Avanzato%202014.pptx
26/05/23, 08:47:45 access  
file:/...Avanzato%202014.pptx


In Solr, I execute the query to search for the document and I see (omitting
the extended result):

{
  "responseHeader":{
"status":0,
"QTime":977,
"params":{
  "q":"id:*Avanzato*202014*",
  "_":"1685082709862"}},
  "response":{"numFound":1,"start":0,"docs":[
  {
"id":file:/...Avanzato%202014.pptx,
"last_modified":"2015-03-25T17:27:22Z",
"resourcename":"...Avanzato 2014.pptx",

"content_type":["application/vnd.openxmlformats-officedocument.presentationml.presentation"],
"allow_token_document":["Active+Directory:S-1-5-21-…..",
  "Active+Directory:S-1-..."],
"deny_token_document":["Active+Directory:DEAD_AUTHORITY"],
"allow_token_share":["Active+Directory:S-1-1-0"],
"deny_token_share":["Active+Directory:DEAD_AUTHORITY"],
"deny_token_parent":["__nosecurity__"],
"allow_token_parent":["__nosecurity__"],
"content":["ESER..
"_version_":1766940934228934656}]
  }}


Is this what you meant when you mentioned the “activity log”?

I see that document in Solr, so I suppose it is indexed.

What could I investigate further?
Thanks a lot

Mario



From: Karl Wright 
Sent: Friday, 26 May 2023 07:20
To: user@manifoldcf.apache.org
Subject: Re: Long Job on Windows Share

The jcifs connector does not include a lot of information in the version string 
for a file - basically, the length, and the modified date.  So I would not 
expect there to be lot of actual work involved if there are no changes to a 
document.

The activity "access" does imply that the system believes that the document 
does need to be reindexed.  It clearly reads the document properly.  I would 
check to be sure it actually indexes the document.  I suspect that your job may 
be reading the file but determining it is not suitable for indexing and then 
repeating that every day.  You can see this by looking for the document in the 
activity log to see what ManifoldCF decided to do with it.

Karl


On Thu, May 25, 2023 at 6:03 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hi,
I would like to understand how recrawl works

My job scan, using “Connection Type”  “Windows shares” works for near 18 hours.
My document numebr a little bit of 1 million.

If I check the documents scan from MifoldCF I see, for example:
[cid:image001.png@01D98FB1.12689F10]

It seems that re work on the document every day even if it hadn’t been modified.
So, is it right or I chose a wrong job to crawl the documents?

Thanks a lot
Mario




Re: Long Job on Windows Share

2023-05-25 Thread Karl Wright
The jcifs connector does not include a lot of information in the version
string for a file - basically, the length, and the modified date.  So I
would not expect there to be lot of actual work involved if there are no
changes to a document.

The activity "access" does imply that the system believes that the document
does need to be reindexed.  It clearly reads the document properly.  I
would check to be sure it actually indexes the document.  I suspect that
your job may be reading the file but determining it is not suitable for
indexing and then repeating that every day.  You can see this by looking
for the document in the activity log to see what ManifoldCF decided to do
with it.

Karl



On Thu, May 25, 2023 at 6:03 AM Bisonti Mario 
wrote:

> Hi,
>
> I would like to understand how recrawl works
>
>
>
> My job scan, using “Connection Type”  “Windows shares” works for near 18
> hours.
>
> My document numebr a little bit of 1 million.
>
>
>
> If I check the documents scan from MifoldCF I see, for example:
>
>
>
> It seems that re work on the document every day even if it hadn’t been
> modified.
>
> So, is it right or I chose a wrong job to crawl the documents?
>
>
>
> Thanks a lot
>
> Mario
>
>
>
>
>


Long Job on Windows Share

2023-05-25 Thread Bisonti Mario
Hi,
I would like to understand how recrawling works.

My job scan, using "Connection Type" "Windows shares", runs for nearly 18 hours.
My document count is around 1 million.

If I check the documents scanned from ManifoldCF I see, for example:
[cid:image001.png@01D98F00.F3071580]

It seems that it reworks the documents every day even if they haven't been modified.
So, is this right, or did I choose the wrong kind of job to crawl the documents?

Thanks a lot
Mario




Re: Apache Manifold Documentum connector

2023-03-17 Thread Rasťa Šíša
Thanks a lot for your kind and elaborate response!
I will do some further investigation on my own on the Documentum side.
Best regards,
Rasta

On Fri, 17 Mar 2023 at 12:08, Karl Wright  wrote:

> It was open-sourced back in 2012 at the same time ManifoldCF was
> open-sourced.  It was written by a contractor paid by MetaCarta, who also
> paid for the development of ManifoldCF itself (I developed that).  It was
> spun off as open source when MetaCarta was bought by Nokia who had no
> interest in the framework or the connectors.
>
> I do not, off the top of my head, remember the contractor's name nor have
> his contact information any longer.
>
> There are many users of the Documentum Connector, however, and I would
> hope one of them with more DQL experience will respond.
>
> Karl
>
>
>
> On Fri, Mar 17, 2023 at 5:41 AM Rasťa Šíša  wrote:
>
>> Hi Karl, thanks for your answer! Would you be able to point me towards
>> the author/git branch of the documentum connector?
>> Best regards, Rasta
>>
>> On Thu, 16 Mar 2023 at 20:58, Karl Wright 
>> wrote:
>>
>>> Hi,
>>>
>>> I didn't write the documentum connector initially, so I trust that the
>>> engineer who did knew how to construct the proper DQL.  I've not seen any
>>> bugs related to it so it does seem to work.
>>>
>>> Karl
>>>
>>>
>>> On Thu, Mar 16, 2023 at 8:23 AM Rasťa Šíša  wrote:
>>>
 Hello,
 i would like to ask how does Documentum Manifold connector select the
 latest version from Documentum system?

 The first query that gets composed collects list of i_chronicle_id in
 DCTM.java. I would like to know though, how does the Manifold recognize the
 latest version of the document(e.g. Effective status).
 From the ui, i am able to select some of the objecttypes, but not
 objecttypes (all).

 In dql it is just e.g.
 *select i_chronicle_id from   *
 instead of *select i_chronicle_id from  (all)
 . *

 This "(all) object" returns all of them. The internal functioning of
 documentum though, with the first type of query, does not select
 i_chronicle_id of documents, that have a newly created version e.g. the
 document is created approved and effective, but someone already created a
 new draft for it. with the (all) in the dql, it brings in all the documents
 and their r_object_id, among which we can select the effective version by
 status.
 Is this a bug in manifold documentum connector, that it does not allow
 you to select those (all) objects and select those documents with new
 versions?
 Best regards,
 Rastislav Sisa

>>>


Re: Apache Manifold Documentum connector

2023-03-17 Thread Karl Wright
It was open-sourced back in 2012 at the same time ManifoldCF was
open-sourced.  It was written by a contractor paid by MetaCarta, who also
paid for the development of ManifoldCF itself (I developed that).  It was
spun off as open source when MetaCarta was bought by Nokia who had no
interest in the framework or the connectors.

I do not, off the top of my head, remember the contractor's name nor have
his contact information any longer.

There are many users of the Documentum Connector, however, and I would hope
one of them with more DQL experience will respond.

Karl



On Fri, Mar 17, 2023 at 5:41 AM Rasťa Šíša  wrote:

> Hi Karl, thanks for your answer! Would you be able to point me towards the
> author/git branch of the documentum connector?
> Best regards, Rasta
>
> On Thu, 16 Mar 2023 at 20:58, Karl Wright  wrote:
>
>> Hi,
>>
>> I didn't write the documentum connector initially, so I trust that the
>> engineer who did knew how to construct the proper DQL.  I've not seen any
>> bugs related to it so it does seem to work.
>>
>> Karl
>>
>>
>> On Thu, Mar 16, 2023 at 8:23 AM Rasťa Šíša  wrote:
>>
>>> Hello,
>>> i would like to ask how does Documentum Manifold connector select the
>>> latest version from Documentum system?
>>>
>>> The first query that gets composed collects list of i_chronicle_id in
>>> DCTM.java. I would like to know though, how does the Manifold recognize the
>>> latest version of the document(e.g. Effective status).
>>> From the ui, i am able to select some of the objecttypes, but not
>>> objecttypes (all).
>>>
>>> In dql it is just e.g.
>>> *select i_chronicle_id from   *
>>> instead of *select i_chronicle_id from  (all)
>>> . *
>>>
>>> This "(all) object" returns all of them. The internal functioning of
>>> documentum though, with the first type of query, does not select
>>> i_chronicle_id of documents, that have a newly created version e.g. the
>>> document is created approved and effective, but someone already created a
>>> new draft for it. with the (all) in the dql, it brings in all the documents
>>> and their r_object_id, among which we can select the effective version by
>>> status.
>>> Is this a bug in manifold documentum connector, that it does not allow
>>> you to select those (all) objects and select those documents with new
>>> versions?
>>> Best regards,
>>> Rastislav Sisa
>>>
>>


Re: Apache Manifold Documentum connector

2023-03-17 Thread Rasťa Šíša
Hi Karl, thanks for your answer! Would you be able to point me towards the
author/git branch of the documentum connector?
Best regards, Rasta

On Thu, 16 Mar 2023 at 20:58, Karl Wright  wrote:

> Hi,
>
> I didn't write the documentum connector initially, so I trust that the
> engineer who did knew how to construct the proper DQL.  I've not seen any
> bugs related to it so it does seem to work.
>
> Karl
>
>
> On Thu, Mar 16, 2023 at 8:23 AM Rasťa Šíša  wrote:
>
>> Hello,
>> i would like to ask how does Documentum Manifold connector select the
>> latest version from Documentum system?
>>
>> The first query that gets composed collects list of i_chronicle_id in
>> DCTM.java. I would like to know though, how does the Manifold recognize the
>> latest version of the document(e.g. Effective status).
>> From the ui, i am able to select some of the objecttypes, but not
>> objecttypes (all).
>>
>> In dql it is just e.g.
>> *select i_chronicle_id from   *
>> instead of *select i_chronicle_id from  (all)
>> . *
>>
>> This "(all) object" returns all of them. The internal functioning of
>> documentum though, with the first type of query, does not select
>> i_chronicle_id of documents, that have a newly created version e.g. the
>> document is created approved and effective, but someone already created a
>> new draft for it. with the (all) in the dql, it brings in all the documents
>> and their r_object_id, among which we can select the effective version by
>> status.
>> Is this a bug in manifold documentum connector, that it does not allow
>> you to select those (all) objects and select those documents with new
>> versions?
>> Best regards,
>> Rastislav Sisa
>>
>


Re: Apache Manifold Documentum connector

2023-03-16 Thread Karl Wright
Hi,

I didn't write the documentum connector initially, so I trust that the
engineer who did knew how to construct the proper DQL.  I've not seen any
bugs related to it so it does seem to work.

Karl


On Thu, Mar 16, 2023 at 8:23 AM Rasťa Šíša  wrote:

> Hello,
> i would like to ask how does Documentum Manifold connector select the
> latest version from Documentum system?
>
> The first query that gets composed collects list of i_chronicle_id in
> DCTM.java. I would like to know though, how does the Manifold recognize the
> latest version of the document(e.g. Effective status).
> From the ui, i am able to select some of the objecttypes, but not
> objecttypes (all).
>
> In dql it is just e.g.
> *select i_chronicle_id from   *
> instead of *select i_chronicle_id from  (all)
> . *
>
> This "(all) object" returns all of them. The internal functioning of
> documentum though, with the first type of query, does not select
> i_chronicle_id of documents, that have a newly created version e.g. the
> document is created approved and effective, but someone already created a
> new draft for it. with the (all) in the dql, it brings in all the documents
> and their r_object_id, among which we can select the effective version by
> status.
> Is this a bug in manifold documentum connector, that it does not allow you
> to select those (all) objects and select those documents with new versions?
> Best regards,
> Rastislav Sisa
>


Apache Manifold Documentum connector

2023-03-16 Thread Rasťa Šíša
Hello,
I would like to ask how the Documentum ManifoldCF connector selects the
latest version of a document from the Documentum system.

The first query that gets composed collects a list of i_chronicle_id values
in DCTM.java. I would like to know, though, how ManifoldCF recognizes the
latest version of a document (e.g. Effective status).
From the UI, I am able to select some of the object types, but not
object types (all).

In DQL it is just, e.g.,
select i_chronicle_id from <object_type>
instead of
select i_chronicle_id from <object_type> (all).

This "(all) object" returns all of them. The internal functioning of
documentum though, with the first type of query, does not select
i_chronicle_id of documents, that have a newly created version e.g. the
document is created approved and effective, but someone already created a
new draft for it. with the (all) in the dql, it brings in all the documents
and their r_object_id, among which we can select the effective version by
status.
Is this a bug in manifold documentum connector, that it does not allow you
to select those (all) objects and select those documents with new versions?
Best regards,
Rastislav Sisa


Fwd: Defining an attribute that is an array in documentum connector

2023-03-14 Thread Rasťa Šíša
Hello, I tried to define a condition in the ManifoldCF UI where the Documentum
attribute is a repeating attribute (an array) on the other side:
applicable_sites = 'desired_value'
But it keeps giving me a Documentum error:
[DM_QUERY_E_REPEATING_USED] error: "You have specified a repeating attribute
(applicable_sites) where it is not allowed."

How can I resolve this? How can I define in the UI something like
applicable_sites contains 'desired_value'? Is there special syntax for
this?
Best regards,
Rastislav Sisa


Database configuration

2023-02-17 Thread Macek, Radek via user
Hello, please advise how to set up ManifoldCF to use a database that runs on
another machine.

Thx Radek
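
For reference, a hedged sketch: the ManifoldCF processes read their database settings from properties.xml, so using a PostgreSQL server on another machine is mainly a matter of setting the hostname/port properties there. The property names below are as I recall them from the ManifoldCF deployment documentation for PostgreSQL (verify against your release's docs); the host, database name and credentials are placeholders.

```
# Assumed properties.xml fragment, shown via a heredoc only to keep the
# example in shell form; these lines go inside the existing <configuration>
# element of each node's properties.xml.
cat <<'EOF'
  <property name="org.apache.manifoldcf.databaseimplementationclass"
            value="org.apache.manifoldcf.core.database.DBInterfacePostgreSQL"/>
  <property name="org.apache.manifoldcf.postgresql.hostname" value="db.example.internal"/>
  <property name="org.apache.manifoldcf.postgresql.port" value="5432"/>
  <property name="org.apache.manifoldcf.database.name" value="dbname"/>
  <property name="org.apache.manifoldcf.database.username" value="manifoldcf"/>
  <property name="org.apache.manifoldcf.database.password" value="secret"/>
EOF
```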


R: JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy

2023-02-09 Thread Bisonti Mario

Hi, could you give me any suggestions to solve my issue?

I note that indexing 1 million documents (Office, PDF, etc.), for which I use
Tika, finishes after nearly 18 hours.

My host is an Ubuntu server with 8 CPUs and 68 GB of RAM.

Thanks a lot
Mario



From: Bisonti Mario
Sent: Wednesday, 1 February 2023 17:50
To: user@manifoldcf.apache.org
Subject: R: JCIFS: Possibly transient exception detected on attempt 1 while
getting share security: All pipe instances are busy

I don't understand.

Would you explain to me what "running with a profiler" means, please?

I start agent running a start-agents.sh script, and zookeeper too.

/opt/manifoldcf/multiprocess-zk-example-proprietary/runzookeeper.sh
/opt/manifoldcf/multiprocess-zk-example-proprietary/start-agents.sh

Where start-agents.sh is:

#!/bin/bash -e

cd /opt/manifoldcf/multiprocess-zk-example-proprietary/

if [ -e "$JAVA_HOME"/bin/java ] ; then
if [ -f ./properties.xml ] ; then
./executecommand.sh -Dorg.apache.manifoldcf.processid=A 
org.apache.manifoldcf.agents.AgentRun
   exit $?

else
echo "Working directory contains no properties.xml file." 1>&2
exit 1
fi

else
echo "Environment variable JAVA_HOME is not properly set." 1>&2
exit 1
fi

Thanks a lot Karl.



From: Karl Wright mailto:daddy...@gmail.com>>
Sent: Wednesday, 1 February 2023 17:38
To: user@manifoldcf.apache.org
Subject: Re: JCIFS: Possibly transient exception detected on attempt 1 while
getting share security: All pipe instances are busy

It looks like you are running with a profiler?  That uses a lot of memory.
Karl


On Wed, Feb 1, 2023 at 8:06 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
This is my hs_err_pid_.log

Command Line: -Xms32768m -Xmx32768m 
-Dorg.apache.manifoldcf.configfile=./properties.xml 
-Djava.security.auth.login.con
fig= -Dorg.apache.manifoldcf.processid=A org.apache.manifoldcf.agents.AgentRun

.
.
.
CodeHeap 'non-profiled nmethods': size=120032Kb used=23677Kb max_used=23677Kb 
free=96354Kb
CodeHeap 'profiled nmethods': size=120028Kb used=20405Kb max_used=27584Kb 
free=99622Kb
CodeHeap 'non-nmethods': size=5700Kb used=1278Kb max_used=1417Kb free=4421Kb
Memory: 4k page, physical 72057128k(7300332k free), swap 4039676k(4039676k free)
.
.

Perhaps it could be a RAM problem?

Thanks a lot




From: Bisonti Mario
Sent: Friday, 20 January 2023 10:28
To: user@manifoldcf.apache.org
Subject: R: JCIFS: Possibly transient exception detected on attempt 1 while
getting share security: All pipe instances are busy

I see that the agent crashed:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (g1ConcurrentMark.cpp:1665), pid=2537463, tid=2537470
#  fatal error: Overflow during reference processing, can not continue. Please 
increase MarkStackSizeMax (current value: 16777216) and restart.
#
# JRE version: OpenJDK Runtime Environment (11.0.16+8) (build 
11.0.16+8-post-Ubuntu-0ubuntu120.04)
# Java VM: OpenJDK 64-Bit Server VM (11.0.16+8-post-Ubuntu-0ubuntu120.04, mixed 
mode, tiered, g1 gc, linux-amd64)
# Core dump will be written. Default location: Core dumps may be processed with 
"/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping 
to /opt/manifoldcf/multiprocess-zk-example-proprietary/core.2537463)
#
# If you would like to submit a bug report, please visit:
#   
https://bugs.launchpad.net/ubuntu/+source/openjdk-lts
#

---  S U M M A R Y 

Command Line: -Xms32768m -Xmx32768m 
-Dorg.apache.manifoldcf.configfile=./properties.xml 
-Djava.security.auth.login.config= -Dorg.apache.manifoldcf.processid=A 
org.apache.manifoldcf.agents.AgentRun

Host: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, 8 cores, 68G, Ubuntu 20.04.4 LTS
Time: Fri Jan 20 09:38:54 2023 CET elapsed time: 54532.106681 seconds (0d 15h 
8m 52s)

---  T H R E A D  ---

Current thread (0x7f051940a000):  VMThread "VM Thread" [stack: 
0x7f051c50a000,0x7f051c60a000] [id=2537470]

Stack: [0x7f051c50a000,0x7f051c60a000],  sp=0x7f051c608080,  free 
space=1016k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, 
Vv=VM code, C=native code)
V  [libjvm.so+0xe963a9]
V  [libjvm.so+0x67b504]
V  [libjvm.so+0x7604e6]


So, where could I change that parameter?
Is it an Agent configuration?

Thanks a lot
Mario


From: Karl Wright <daddy...@gmail.com>
Sent: Wednesday, January 18, 2023 14:59
To: 

Performance problems

2023-02-05 Thread Artem Abeleshev
Hi everyone!

I've been struggling with a performance problem for a couple of weeks
already. We have two environments:

- `dev` with 2 nodes of ManifoldCF agent + 1 node of Zookeeper
- `prod` with 4 nodes of ManifoldCF agent + 3 nodes of Zookeeper

ManifoldCF agent settings are identical at the moment; we have explicitly
set the following:

- 200 db handles (`org.apache.manifoldcf.database.maxhandles`)
- 100 worker threads (`org.apache.manifoldcf.crawler.threads`)
- 10 expire threads (`org.apache.manifoldcf.crawler.expirethreads`)
- 10 cleanup threads (`org.apache.manifoldcf.crawler.cleanupthreads`)
- 10 document delete threads (`org.apache.manifoldcf.crawler.deletethreads`)

(but I have tried prod with various configs, the result is the same)
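
(For reference, a sketch of how the settings listed above would typically appear in
properties.xml, assuming the usual `<property name="..." value="..."/>` form of the
ManifoldCF configuration file; the values simply restate the figures given above.)

```xml
<!-- Sketch only: the thread/handle settings quoted above, not a tuning recommendation. -->
<configuration>
  <property name="org.apache.manifoldcf.database.maxhandles" value="200"/>
  <property name="org.apache.manifoldcf.crawler.threads" value="100"/>
  <property name="org.apache.manifoldcf.crawler.expirethreads" value="10"/>
  <property name="org.apache.manifoldcf.crawler.cleanupthreads" value="10"/>
  <property name="org.apache.manifoldcf.crawler.deletethreads" value="10"/>
</configuration>
```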

In the Postgres config, at the moment we have `max_connections` of `840`
and `shared_buffers` of `244559`.

I have a job that is running really slowly on the production environment
compared to the development environment. I have monitored the JVM using
VisualVM and noticed that the worker threads spend almost all of their time
in `WAITING` or `TIMED_WAITING` states. I grabbed a lot of thread dumps,
and almost every time I found the worker threads waiting on `LockGate`.
What could be the possible cause of this, and what can I do about it?

Another thing that makes threads sleep for a long time is concurrent
modification failures caused by PostgreSQL due to the usage of the
`SYNCHRONIZED` isolation level. After a failure the thread is sent to sleep
for a random time of up to `6` millis. This is by design, but is there a way
to reduce the number of these failures?

I will be grateful for any hints or ideas.

Thank you!

With respect,
Abeleshev Artem


Re: Job stucked with cleaning up status

2023-02-03 Thread Karl Wright
The shutdown procedure for ManifoldCF involves sending interruptions (or
socket interruptions) to all worker threads.  These then put the threads in
the "terminated" state, one by one.  So you should only get this if you
shut down the agents process, or try to.  The handling for this is correct,
although sometimes embedded libraries do not handle thread shutdown
requests properly.

Anyhow, the cause of the problem is actually the fact that the output
connection cannot talk to the service, as stated.

Karl


On Fri, Feb 3, 2023 at 12:54 AM Artem Abeleshev <
artem.abeles...@rondhuit.com> wrote:

> Karl, good day!
>
> Thank you for the hint! It was very useful! Actually, you were right and
> the actual problem was about the connection. But I didn't expect it would
> be so dramatic. Here is what I found using some debugging:
>
> First I have found the actual code that was responsible for the deletion
> of the documents. It was called by the `DocumentDeleteThread`
> (`org.apache.manifoldcf.crawler.system.DocumentDeleteThread`). Then I
> checked how many `DocumentDeleteThread` threads were supposed to be started. I
> haven't overridden the value, so I got the default of 10 threads. Then I grabbed
> a thread dump and checked those threads. I found two strange things:
>
> 1. Not all threads were alive. Some of them were terminated.
> 2. Some live threads have a huge number of supplementary zk threads like
> `Worker thread 'x'-EventThread` and `Worker thread 'n'-EventThread(...)`.
> Even the threads that have already been terminated leave behind their
> supplementary threads (since they are daemon threads). As a result I have
> from 1000 to 2000 threads in total.
>
> I started to debug the live threads and came to the `deletePost`
> method of `HttpPoster`
> (`org.apache.manifoldcf.agents.output.solr.HttpPoster.deletePost(String,
> IOutputRemoveActivity)`). Here I was always getting an exception:
>
> ```java
> org.apache.solr.client.solrj.SolrServerException: IOException occurred
> when talking to server at: http://10.78.11.71:8983/solr/jaweb
> org.apache.http.conn.ConnectTimeoutException: Connect to 10.78.11.71:8983
> [/10.78.11.71] failed: connect timed out
> ```
>
> The exception was due to Solr being unavailable (i.e. shut down), so no
> surprise there. But the following was a true surprise for me. The exception
> I got is of type `IOException`. Inside the `HttpPoster` that exception
> in the end is handled by the method `handleIOException`
> (org.apache.manifoldcf.agents.output.solr.HttpPoster.handleIOException(IOException,
> String)):
>
> ```java
>   protected static void handleIOException(IOException e, String context)
> throws ManifoldCFException, ServiceInterruption
>   {
> if ((e instanceof InterruptedIOException) && (!(e instanceof
> java.net.SocketTimeoutException)))
>   throw new ManifoldCFException(e.getMessage(),
> ManifoldCFException.INTERRUPTED);
> ...
>   }
> ```
>
> As we can see, the exception is wrapped in a `ManifoldCFException`
> and assigned the `INTERRUPTED` error code. Then this
> exception bubbles up until it ends up in the main loop of the
> `DocumentDeleteThread`. Here is the full stack I extracted while debugging
> (unfortunately not a single exception is logged along the way):
>
> ```java
>
> org.apache.manifoldcf.agents.output.solr.HttpPoster.handleIOException(HttpPoster.java:514),
>
> org.apache.manifoldcf.agents.output.solr.HttpPoster.handleSolrServerException(HttpPoster.java:427),
>
> org.apache.manifoldcf.agents.output.solr.HttpPoster.deletePost(HttpPoster.java:817),
>
> org.apache.manifoldcf.agents.output.solr.SolrConnector.removeDocument(SolrConnector.java:594),
>
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.removeDocument(IncrementalIngester.java:2296),
>
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentDeleteMultiple(IncrementalIngester.java:1037),
>
> org.apache.manifoldcf.crawler.system.DocumentDeleteThread.run(DocumentDeleteThread.java:134)
> ```
>
> Inside the main loop of the `DocumentDeleteThread` that exception is
> handled like this:
>
> ```java
> public void run()
>   {
> try
> {
>   ...
>   // Loop
>   while (true)
>   {
> // Do another try/catch around everything in the loop
> try
> {
>   ...
> }
> catch (ManifoldCFException e)
> {
>   if (e.getErrorCode() == ManifoldCFException.INTERRUPTED)
> break;
> ...
> }
>   ...
>   }
> }
> catch (Throwable e)
> {
>   ...
> }
>   }
> ```
>
> It just breaks the loop, making the thread terminate normally! In quite a
> short time I always end up with no `DocumentDeleteThread`s at all and the
> framework transitions to an inconsistent state.
>
> In the end, I brought Solr back online and managed to finish the deletion
> successfully. But I think this case should be handled in some way.
>
> With respect,
> Abeleshev 

Re: Job stucked with cleaning up status

2023-02-02 Thread Artem Abeleshev
Karl, good day!

Thank you for the hint! It was very useful! Actually, you were right and the
actual problem was about the connection. But I didn't expect it would be
so dramatic. Here is what I found using some debugging:

First I have found the actual code that was responsible for the deletion of
the documents. It was called by the `DocumentDeleteThread`
(`org.apache.manifoldcf.crawler.system.DocumentDeleteThread`). Then I
checked how many `DocumentDeleteThread` threads were supposed to be started. I
haven't overridden the value, so I got the default of 10 threads. Then I grabbed
a thread dump and checked those threads. I found two strange things:

1. Not all threads were alive. Some of them were terminated.
2. Some live threads have a huge number of supplementary zk threads like
`Worker thread 'x'-EventThread` and `Worker thread 'n'-EventThread(...)`.
Even the threads that have already been terminated leave behind their
supplementary threads (since they are daemon threads). As a result I have
from 1000 to 2000 threads in total.

I started to debug the live threads and came to the `deletePost`
method of `HttpPoster`
(`org.apache.manifoldcf.agents.output.solr.HttpPoster.deletePost(String,
IOutputRemoveActivity)`). Here I was always getting an exception:

```java
org.apache.solr.client.solrj.SolrServerException: IOException occurred when
talking to server at: http://10.78.11.71:8983/solr/jaweb
org.apache.http.conn.ConnectTimeoutException: Connect to 10.78.11.71:8983 [/
10.78.11.71] failed: connect timed out
```

The exception was due to Solr being unavailable (i.e. shut down), so no
surprise there. But the following was a true surprise for me. The exception
I got is of type `IOException`. Inside the `HttpPoster` that exception
in the end is handled by the method `handleIOException`
(org.apache.manifoldcf.agents.output.solr.HttpPoster.handleIOException(IOException,
String)):

```java
  protected static void handleIOException(IOException e, String context)
throws ManifoldCFException, ServiceInterruption
  {
if ((e instanceof InterruptedIOException) && (!(e instanceof
java.net.SocketTimeoutException)))
  throw new ManifoldCFException(e.getMessage(),
ManifoldCFException.INTERRUPTED);
...
  }
```

As we can see, the exception is wrapped in a `ManifoldCFException`
and assigned the `INTERRUPTED` error code. Then this
exception bubbles up until it ends up in the main loop of the
`DocumentDeleteThread`. Here is the full stack I extracted while debugging
(unfortunately not a single exception is logged along the way):

```java
org.apache.manifoldcf.agents.output.solr.HttpPoster.handleIOException(HttpPoster.java:514),
org.apache.manifoldcf.agents.output.solr.HttpPoster.handleSolrServerException(HttpPoster.java:427),
org.apache.manifoldcf.agents.output.solr.HttpPoster.deletePost(HttpPoster.java:817),
org.apache.manifoldcf.agents.output.solr.SolrConnector.removeDocument(SolrConnector.java:594),
org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.removeDocument(IncrementalIngester.java:2296),
org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentDeleteMultiple(IncrementalIngester.java:1037),
org.apache.manifoldcf.crawler.system.DocumentDeleteThread.run(DocumentDeleteThread.java:134)
```

Inside the main loop of the `DocumentDeleteThread` that exception is
handled like this:

```java
public void run()
  {
try
{
  ...
  // Loop
  while (true)
  {
// Do another try/catch around everything in the loop
try
{
  ...
}
catch (ManifoldCFException e)
{
  if (e.getErrorCode() == ManifoldCFException.INTERRUPTED)
break;
...
}
  ...
  }
}
catch (Throwable e)
{
  ...
}
  }
```

It just breaks the loop, making the thread terminate normally! In quite a
short time I always end up with no `DocumentDeleteThread`s at all and the
framework transitions to an inconsistent state.

In the end, I brought Solr back online and managed to finish the deletion
successfully. But I think this case should be handled in some way.
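
(For illustration, a minimal sketch of the kind of handling being asked for here: map only a
genuine interruption to `ManifoldCFException.INTERRUPTED`, and turn connection-level failures
into a retryable `ServiceInterruption` so the delete worker threads stay alive while the output
is unreachable. This is a hypothetical variant, not ManifoldCF's actual fix; check the import
paths and the `ServiceInterruption` constructor signature against your version.)

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import org.apache.manifoldcf.core.interfaces.ManifoldCFException;
import org.apache.manifoldcf.agents.interfaces.ServiceInterruption;

// Sketch only; requires the mcf-core and mcf-agents jars on the classpath.
public class RetryingIOExceptionHandler
{
  protected static void handleIOException(IOException e, String context)
    throws ManifoldCFException, ServiceInterruption
  {
    // A real thread interruption still aborts, exactly as before.
    if ((e instanceof InterruptedIOException) && (!(e instanceof java.net.SocketTimeoutException)))
      throw new ManifoldCFException(e.getMessage(), ManifoldCFException.INTERRUPTED);
    // Anything else (e.g. a connect timeout while Solr is down) becomes a
    // service interruption, so the framework retries later instead of letting
    // the DocumentDeleteThread break out of its main loop.
    long currentTime = System.currentTimeMillis();
    throw new ServiceInterruption("IO error during " + context + ": " + e.getMessage(),
      currentTime + 300000L);  // ask for a retry in roughly five minutes
  }
}
```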

With respect,
Abeleshev Artem

On Sun, Jan 29, 2023 at 10:36 PM Karl Wright  wrote:

> Hi,
>
> 2.22 makes no changes to the way document deletions are processed over
> probably 10 previous versions of ManifoldCF.
>
> What likely is the case is that the connection to the output for the job
> you are cleaning up is down.  When that happens, the documents are queued
> but the delete worker threads cannot make any progress.
>
> You can see this maybe by looking at the "Simple Reports" for the job in
> question and see what it is doing and why the deletions are not succeeding.
>
> Karl
>
>
> On Sun, Jan 29, 2023 at 8:18 AM Artem Abeleshev <
> artem.abeles...@rondhuit.com> wrote:
>
>> Hi, everyone!
>>
>> Another problem that I got sometimes. We are using ManifoldCF 2.22.1 with
>> multiple nodes in our production. The creation of the MCF 

R: JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy

2023-02-01 Thread Bisonti Mario
I don't understand.

Would you like to explain to me what "running with a profiler" means, please?

I start the agent by running a start-agents.sh script, and ZooKeeper too.

/opt/manifoldcf/multiprocess-zk-example-proprietary/runzookeeper.sh
/opt/manifoldcf/multiprocess-zk-example-proprietary/start-agents.sh

Where start-agents.sh is:

#!/bin/bash -e

cd /opt/manifoldcf/multiprocess-zk-example-proprietary/

if [ -e "$JAVA_HOME"/bin/java ] ; then
    if [ -f ./properties.xml ] ; then
        ./executecommand.sh -Dorg.apache.manifoldcf.processid=A org.apache.manifoldcf.agents.AgentRun
        exit $?
    else
        echo "Working directory contains no properties.xml file." 1>&2
        exit 1
    fi
else
    echo "Environment variable JAVA_HOME is not properly set." 1>&2
    exit 1
fi

Thanks a lot Karl.



From: Karl Wright 
Sent: Wednesday, February 1, 2023 17:38
To: user@manifoldcf.apache.org
Subject: Re: JCIFS: Possibly transient exception detected on attempt 1 while
getting share security: All pipe instances are busy

It looks like you are running with a profiler?  That uses a lot of memory.
Karl


On Wed, Feb 1, 2023 at 8:06 AM Bisonti Mario <mario.biso...@vimar.com> wrote:
This is my hs_err_pid_.log

Command Line: -Xms32768m -Xmx32768m
-Dorg.apache.manifoldcf.configfile=./properties.xml
-Djava.security.auth.login.config= -Dorg.apache.manifoldcf.processid=A
org.apache.manifoldcf.agents.AgentRun

.
.
.
CodeHeap 'non-profiled nmethods': size=120032Kb used=23677Kb max_used=23677Kb 
free=96354Kb
CodeHeap 'profiled nmethods': size=120028Kb used=20405Kb max_used=27584Kb 
free=99622Kb
CodeHeap 'non-nmethods': size=5700Kb used=1278Kb max_used=1417Kb free=4421Kb
Memory: 4k page, physical 72057128k(7300332k free), swap 4039676k(4039676k free)
.
.

Perhaps it could be a RAM problem?

Thanks a lot




From: Bisonti Mario
Sent: Friday, January 20, 2023 10:28
To: user@manifoldcf.apache.org
Subject: R: JCIFS: Possibly transient exception detected on attempt 1 while
getting share security: All pipe instances are busy

I see that the agent crashed:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (g1ConcurrentMark.cpp:1665), pid=2537463, tid=2537470
#  fatal error: Overflow during reference processing, can not continue. Please 
increase MarkStackSizeMax (current value: 16777216) and restart.
#
# JRE version: OpenJDK Runtime Environment (11.0.16+8) (build 
11.0.16+8-post-Ubuntu-0ubuntu120.04)
# Java VM: OpenJDK 64-Bit Server VM (11.0.16+8-post-Ubuntu-0ubuntu120.04, mixed 
mode, tiered, g1 gc, linux-amd64)
# Core dump will be written. Default location: Core dumps may be processed with 
"/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping 
to /opt/manifoldcf/multiprocess-zk-example-proprietary/core.2537463)
#
# If you would like to submit a bug report, please visit:
#   
https://bugs.launchpad.net/ubuntu/+source/openjdk-lts
#

---  S U M M A R Y 

Command Line: -Xms32768m -Xmx32768m 
-Dorg.apache.manifoldcf.configfile=./properties.xml 
-Djava.security.auth.login.config= -Dorg.apache.manifoldcf.processid=A 
org.apache.manifoldcf.agents.AgentRun

Host: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, 8 cores, 68G, Ubuntu 20.04.4 LTS
Time: Fri Jan 20 09:38:54 2023 CET elapsed time: 54532.106681 seconds (0d 15h 
8m 52s)

---  T H R E A D  ---

Current thread (0x7f051940a000):  VMThread "VM Thread" [stack: 
0x7f051c50a000,0x7f051c60a000] [id=2537470]

Stack: [0x7f051c50a000,0x7f051c60a000],  sp=0x7f051c608080,  free 
space=1016k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, 
Vv=VM code, C=native code)
V  [libjvm.so+0xe963a9]
V  [libjvm.so+0x67b504]
V  [libjvm.so+0x7604e6]


So, where could I change that parameter?
Is it an Agent configuration?

Thanks a lot
Mario


From: Karl Wright <daddy...@gmail.com>
Sent: Wednesday, January 18, 2023 14:59
To: user@manifoldcf.apache.org
Subject: Re: JCIFS: Possibly transient exception detected on attempt 1 while
getting share security: All pipe instances are busy

When you get a hang like this, getting a thread dump of the agents process is 
essential to figure out what the issue is.  You can't assume that a transient 
error would block anything because that's not how ManifoldCF works, at all.  
Errors push the document in question back onto the queue with a retry time.

Karl


On Wed, Jan 18, 2023 at 6:15 AM Bisonti Mario 

Re: JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy

2023-02-01 Thread Karl Wright
It looks like you are running with a profiler?  That uses a lot of memory.
Karl


On Wed, Feb 1, 2023 at 8:06 AM Bisonti Mario 
wrote:

> This is my hs_err_pid_.log
>
>
>
> Command Line: -Xms32768m -Xmx32768m
> -Dorg.apache.manifoldcf.configfile=./properties.xml
> -Djava.security.auth.login.config= -Dorg.apache.manifoldcf.processid=A
> org.apache.manifoldcf.agents.AgentRun
>
>
>
> .
>
> .
>
> .
>
> CodeHeap 'non-profiled nmethods': size=120032Kb used=23677Kb
> max_used=23677Kb free=96354Kb
>
> CodeHeap 'profiled nmethods': size=120028Kb used=20405Kb max_used=27584Kb
> free=99622Kb
>
> CodeHeap 'non-nmethods': size=5700Kb used=1278Kb max_used=1417Kb
> free=4421Kb
>
> Memory: 4k page, physical 72057128k(7300332k free), swap 4039676k(4039676k
> free)
>
> .
>
> .
>
>
>
> Perhaps it could be a RAM problem?
>
>
>
> Thanks a lot
>
>
>
>
>
>
>
>
>
> *From:* Bisonti Mario
> *Sent:* Friday, January 20, 2023 10:28
> *To:* user@manifoldcf.apache.org
> *Subject:* R: JCIFS: Possibly transient exception detected on attempt 1
> while getting share security: All pipe instances are busy
>
>
>
> I see that the agent crashed:
>
> #
>
> # A fatal error has been detected by the Java Runtime Environment:
>
> #
>
> #  Internal Error (g1ConcurrentMark.cpp:1665), pid=2537463, tid=2537470
>
> #  fatal error: Overflow during reference processing, can not continue.
> Please increase MarkStackSizeMax (current value: 16777216) and restart.
>
> #
>
> # JRE version: OpenJDK Runtime Environment (11.0.16+8) (build
> 11.0.16+8-post-Ubuntu-0ubuntu120.04)
>
> # Java VM: OpenJDK 64-Bit Server VM (11.0.16+8-post-Ubuntu-0ubuntu120.04,
> mixed mode, tiered, g1 gc, linux-amd64)
>
> # Core dump will be written. Default location: Core dumps may be processed
> with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E"
> (or dumping to
> /opt/manifoldcf/multiprocess-zk-example-proprietary/core.2537463)
>
> #
>
> # If you would like to submit a bug report, please visit:
>
> #   https://bugs.launchpad.net/ubuntu/+source/openjdk-lts
>
> #
>
>
>
> ---  S U M M A R Y 
>
>
>
> Command Line: -Xms32768m -Xmx32768m
> -Dorg.apache.manifoldcf.configfile=./properties.xml
> -Djava.security.auth.login.config= -Dorg.apache.manifoldcf.processid=A
> org.apache.manifoldcf.agents.AgentRun
>
>
>
> Host: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, 8 cores, 68G, Ubuntu
> 20.04.4 LTS
>
> Time: Fri Jan 20 09:38:54 2023 CET elapsed time: 54532.106681 seconds (0d
> 15h 8m 52s)
>
>
>
> ---  T H R E A D  ---
>
>
>
> Current thread (0x7f051940a000):  VMThread "VM Thread" [stack:
> 0x7f051c50a000,0x7f051c60a000] [id=2537470]
>
>
>
> Stack: [0x7f051c50a000,0x7f051c60a000],  sp=0x7f051c608080,
> free space=1016k
>
> Native frames: (J=compiled Java code, A=aot compiled Java code,
> j=interpreted, Vv=VM code, C=native code)
>
> V  [libjvm.so+0xe963a9]
>
> V  [libjvm.so+0x67b504]
>
> V  [libjvm.so+0x7604e6]
>
>
>
>
>
> So, where could I change that parameter?
>
> Is it an Agent configuration?
>
> Thanks a lot
>
> Mario
>
>
>
>
>
> *From:* Karl Wright 
> *Sent:* Wednesday, January 18, 2023 14:59
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: JCIFS: Possibly transient exception detected on attempt 1
> while getting share security: All pipe instances are busy
>
>
>
> When you get a hang like this, getting a thread dump of the agents process
> is essential to figure out what the issue is.  You can't assume that a
> transient error would block anything because that's not how ManifoldCF
> works, at all.  Errors push the document in question back onto the queue
> with a retry time.
>
>
>
> Karl
>
>
>
>
>
> On Wed, Jan 18, 2023 at 6:15 AM Bisonti Mario 
> wrote:
>
> Hi Karl.
>
> But I noted that the job was hanging; the documents-processed count was stuck
> at the same number, with no further document processing from 6 a.m. until I
> restarted the Agent
>
>
>
>
>
>
>
>
>
> *From:* Karl Wright 
> *Sent:* Wednesday, January 18, 2023 12:10
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: JCIFS: Possibly transient exception detected on attempt 1
> while getting share security: All pipe instances are busy
>
>
>
> Hi, "Possibly transient issue" means that the error will be retried
> anyway, according to a schedule.  There should not need to be any
> requirement to shut down the agents process and restart.
>
> Karl
>
>
>
> On Wed, Jan 18, 2023 at 5:08 AM Bisonti Mario 
> wrote:
>
> Hi.
>
> Often, I obtain the error:
>
> WARN 2023-01-18T06:18:19,316 (Worker thread '89') - JCIFS: Possibly
> transient exception detected on attempt 1 while getting share security: All
> pipe instances are busy.
>
> jcifs.smb.SmbException: All pipe instances are busy.
>
> at
> jcifs.smb.SmbTransportImpl.checkStatus2(SmbTransportImpl.java:1441)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at
> jcifs.smb.SmbTransportImpl.checkStatus(SmbTransportImpl.java:1552)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at 

R: JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy

2023-02-01 Thread Bisonti Mario
This is my hs_err_pid_.log

Command Line: -Xms32768m -Xmx32768m
-Dorg.apache.manifoldcf.configfile=./properties.xml
-Djava.security.auth.login.config= -Dorg.apache.manifoldcf.processid=A
org.apache.manifoldcf.agents.AgentRun

.
.
.
CodeHeap 'non-profiled nmethods': size=120032Kb used=23677Kb max_used=23677Kb 
free=96354Kb
CodeHeap 'profiled nmethods': size=120028Kb used=20405Kb max_used=27584Kb 
free=99622Kb
CodeHeap 'non-nmethods': size=5700Kb used=1278Kb max_used=1417Kb free=4421Kb
Memory: 4k page, physical 72057128k(7300332k free), swap 4039676k(4039676k free)
.
.

Perhaps it could be a RAM problem?

Thanks a lot




From: Bisonti Mario
Sent: Friday, January 20, 2023 10:28
To: user@manifoldcf.apache.org
Subject: R: JCIFS: Possibly transient exception detected on attempt 1 while
getting share security: All pipe instances are busy

I see that the agent crashed:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (g1ConcurrentMark.cpp:1665), pid=2537463, tid=2537470
#  fatal error: Overflow during reference processing, can not continue. Please 
increase MarkStackSizeMax (current value: 16777216) and restart.
#
# JRE version: OpenJDK Runtime Environment (11.0.16+8) (build 
11.0.16+8-post-Ubuntu-0ubuntu120.04)
# Java VM: OpenJDK 64-Bit Server VM (11.0.16+8-post-Ubuntu-0ubuntu120.04, mixed 
mode, tiered, g1 gc, linux-amd64)
# Core dump will be written. Default location: Core dumps may be processed with 
"/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping 
to /opt/manifoldcf/multiprocess-zk-example-proprietary/core.2537463)
#
# If you would like to submit a bug report, please visit:
#   https://bugs.launchpad.net/ubuntu/+source/openjdk-lts
#

---  S U M M A R Y 

Command Line: -Xms32768m -Xmx32768m 
-Dorg.apache.manifoldcf.configfile=./properties.xml 
-Djava.security.auth.login.config= -Dorg.apache.manifoldcf.processid=A 
org.apache.manifoldcf.agents.AgentRun

Host: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, 8 cores, 68G, Ubuntu 20.04.4 LTS
Time: Fri Jan 20 09:38:54 2023 CET elapsed time: 54532.106681 seconds (0d 15h 
8m 52s)

---  T H R E A D  ---

Current thread (0x7f051940a000):  VMThread "VM Thread" [stack: 
0x7f051c50a000,0x7f051c60a000] [id=2537470]

Stack: [0x7f051c50a000,0x7f051c60a000],  sp=0x7f051c608080,  free 
space=1016k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, 
Vv=VM code, C=native code)
V  [libjvm.so+0xe963a9]
V  [libjvm.so+0x67b504]
V  [libjvm.so+0x7604e6]


So, where could I change that parameter?
Is it an Agent configuration?

Thanks a lot
Mario


From: Karl Wright <daddy...@gmail.com>
Sent: Wednesday, January 18, 2023 14:59
To: user@manifoldcf.apache.org
Subject: Re: JCIFS: Possibly transient exception detected on attempt 1 while
getting share security: All pipe instances are busy

When you get a hang like this, getting a thread dump of the agents process is 
essential to figure out what the issue is.  You can't assume that a transient 
error would block anything because that's not how ManifoldCF works, at all.  
Errors push the document in question back onto the queue with a retry time.

Karl


On Wed, Jan 18, 2023 at 6:15 AM Bisonti Mario <mario.biso...@vimar.com> wrote:
Hi Karl.
But I noted that the job was hanging; the documents-processed count was stuck at
the same number, with no further document processing from 6 a.m. until I restarted the Agent




From: Karl Wright <daddy...@gmail.com>
Sent: Wednesday, January 18, 2023 12:10
To: user@manifoldcf.apache.org
Subject: Re: JCIFS: Possibly transient exception detected on attempt 1 while
getting share security: All pipe instances are busy

Hi, "Possibly transient issue" means that the error will be retried anyway, 
according to a schedule.  There should not need to be any requirement to shut 
down the agents process and restart.
Karl

On Wed, Jan 18, 2023 at 5:08 AM Bisonti Mario <mario.biso...@vimar.com> wrote:
Hi.
Often, I obtain the error:

WARN 2023-01-18T06:18:19,316 (Worker thread '89') - JCIFS: Possibly transient 
exception detected on attempt 1 while getting share security: All pipe 
instances are busy.
jcifs.smb.SmbException: All pipe instances are busy.
at jcifs.smb.SmbTransportImpl.checkStatus2(SmbTransportImpl.java:1441) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.checkStatus(SmbTransportImpl.java:1552) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.sendrecv(SmbTransportImpl.java:1007) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.send(SmbTransportImpl.java:1523) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbSessionImpl.send(SmbSessionImpl.java:409) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeImpl.send(SmbTreeImpl.java:472) 
~[jcifs-ng-2.1.2.jar:?]
at 

Re: Job stucked with cleaning up status

2023-01-29 Thread Karl Wright
Hi,

2.22 makes no changes to the way document deletions are processed over
probably 10 previous versions of ManifoldCF.

What likely is the case is that the connection to the output for the job
you are cleaning up is down.  When that happens, the documents are queued
but the delete worker threads cannot make any progress.

You can see this maybe by looking at the "Simple Reports" for the job in
question and see what it is doing and why the deletions are not succeeding.

Karl


On Sun, Jan 29, 2023 at 8:18 AM Artem Abeleshev <
artem.abeles...@rondhuit.com> wrote:

> Hi, everyone!
>
> Another problem that I got sometimes. We are using ManifoldCF 2.22.1 with
> multiple nodes in our production. The creation of the MCF job pipeline is
> handled via the API calls from our service. We create jobs, repositories
> and output repositories. The crawler extracts documents and then they are
> pushed to the Solr. The pipeline works OK.
>
> The problem is about deleting the job. Sometimes the job gets stuck with
> a `Cleaning up` status (in the DB it has status `e`, which corresponds to status
> `STATUS_DELETING`). This time I used the MCF Web Admin to delete the job
> (pressed the delete button on the job list page).
>
> I have checked the sources and debugged it a bit. The method
> `deleteJobsReadyForDelete()`
> (`org.apache.manifoldcf.crawler.jobs.JobManager.deleteJobsReadyForDelete()`)
> works OK. It is unable to delete the job because it still finds some
> documents in the document queue table. The following SQL is executed
> within this method:
>
> ```sql
> select id from jobqueue where jobid = '1658215015582' and (status = 'E' or
> status = 'D') limit 1;
> ```
>
> where the `E` status stands for `STATUS_ELIGIBLEFORDELETE` and the `D` status
> stands for `STATUS_BEINGDELETED`. If at least one such document is
> found in the queue, it will do nothing. At the moment I had a lot of
> documents residing in the `jobqueue` with those statuses (actually
> all of them had the `D` status).
>
> I see that the `Documents delete stuffer thread` is running, and it sets the status
> `STATUS_BEINGDELETED` on the documents via the
> `getNextDeletableDocuments()` method
> (`org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(String,
> int, long)`). But I can't find any logic that actually deletes the
> documents. I've searched through the sources, but the status
> `STATUS_BEINGDELETED` is mentioned mostly in `NOT EXISTS ...` queries.
> Searching in reverse order from `JobQueue`
> (`org.apache.manifoldcf.crawler.jobs.JobQueue`) also doesn't give me results.
> I would appreciate it if someone could point out where to look, so I can
> debug and check what conditions are preventing the documents from being removed.
>
> Thank you!
>
> With respect,
> Artem Abeleshev
>


Job stucked with cleaning up status

2023-01-29 Thread Artem Abeleshev
Hi, everyone!

Another problem that I got sometimes. We are using ManifoldCF 2.22.1 with
multiple nodes in our production. The creation of the MCF job pipeline is
handled via the API calls from our service. We create jobs, repositories
and output repositories. The crawler extracts documents and then they are
pushed to the Solr. The pipeline works OK.

The problem is about deleting the job. Sometimes the job gets stuck with
a `Cleaning up` status (in the DB it has status `e`, which corresponds to status
`STATUS_DELETING`). This time I used the MCF Web Admin to delete the job
(pressed the delete button on the job list page).

I have checked the sources and debugged it a bit. The method
`deleteJobsReadyForDelete()`
(`org.apache.manifoldcf.crawler.jobs.JobManager.deleteJobsReadyForDelete()`)
works OK. It is unable to delete the job because it still finds some
documents in the document queue table. The following SQL is executed
within this method:

```sql
select id from jobqueue where jobid = '1658215015582' and (status = 'E' or
status = 'D') limit 1;
```

where the `E` status stands for `STATUS_ELIGIBLEFORDELETE` and the `D` status
stands for `STATUS_BEINGDELETED`. If at least one such document is
found in the queue, it will do nothing. At the moment I had a lot of
documents residing in the `jobqueue` with those statuses (actually
all of them had the `D` status).
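
(As a small diagnostic aid, a sketch of a companion query using the same `jobqueue` table and
columns referenced above; the literal job id is just the example id from this thread.)

```sql
-- Sketch: count the queued entries per status for the job being cleaned up.
select status, count(*)
from jobqueue
where jobid = '1658215015582'
group by status;
```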

I see that the `Documents delete stuffer thread` is running, and it sets the status
`STATUS_BEINGDELETED` on the documents via the
`getNextDeletableDocuments()` method
(`org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(String,
int, long)`). But I can't find any logic that actually deletes the
documents. I've searched through the sources, but the status
`STATUS_BEINGDELETED` is mentioned mostly in `NOT EXISTS ...` queries.
Searching in reverse order from `JobQueue`
(`org.apache.manifoldcf.crawler.jobs.JobQueue`) also doesn't give me results.
I would appreciate it if someone could point out where to look, so I can
debug and check what conditions are preventing the documents from being removed.

Thank you!

With respect,
Artem Abeleshev


Duplicate key value violates unique constraint "repohistory_pkey"

2023-01-28 Thread Artem Abeleshev
Hi everyone!

We are using ManifoldCF 2.22.1 with multiple nodes in our production, and I
am investigating a problem we've had recently (it has happened at least 5-6
times already). A couple of our jobs end up with the following error:

```
Error: ERROR: duplicate key value violates unique constraint
"repohistory_pkey" Detail: Key (id)=(1672652357009) already exists.
```

and the following log entry appears in the logs of one of the nodes:

```
org.apache.manifoldcf.core.interfaces.ManifoldCFException: ERROR: duplicate
key value violates unique constraint "repohistory_pkey"
  Detail: Key (id)=(1673507409625) already exists.
at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.reinterpretException(DBInterfacePostgreSQL.java:638)
~[mcf-core.jar:?]
at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performModification(DBInterfacePostgreSQL.java:665)
~[mcf-core.jar:?]
at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performInsert(DBInterfacePostgreSQL.java:187)
~[mcf-core.jar:?]
at
org.apache.manifoldcf.core.database.BaseTable.performInsert(BaseTable.java:68)
~[mcf-core.jar:?]
at
org.apache.manifoldcf.crawler.repository.RepositoryHistoryManager.addRow(RepositoryHistoryManager.java:202)
~[mcf-pull-agent.jar:?]
at
org.apache.manifoldcf.crawler.repository.RepositoryConnectionManager.recordHistory(RepositoryConnectionManager.java:706)
~[mcf-pull-agent.jar:?]
at
org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.recordActivity(WorkerThread.java:1878)
~[mcf-pull-agent.jar:?]
at
org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.processDocument(WebcrawlerConnector.java:1470)
~[?:?]
at
org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.processDocuments(WebcrawlerConnector.java:753)
~[?:?]
at
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:402)
[mcf-pull-agent.jar:?]
Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value
violates unique constraint "repohistory_pkey"
  Detail: Key (id)=(1673507409625) already exists.
at
org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2476)
~[postgresql-42.1.3.jar:42.1.3]
at
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2189)
~[postgresql-42.1.3.jar:42.1.3]
at
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:300)
~[postgresql-42.1.3.jar:42.1.3]
at
org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:428)
~[postgresql-42.1.3.jar:42.1.3]
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:354)
~[postgresql-42.1.3.jar:42.1.3]
at
org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:169)
~[postgresql-42.1.3.jar:42.1.3]
at
org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:136)
~[postgresql-42.1.3.jar:42.1.3]
at
org.apache.manifoldcf.core.database.Database.execute(Database.java:916)
~[mcf-core.jar:?]
at
org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)
~[mcf-core.jar:?]
```

First, I noticed that the IDs of the entities in ManifoldCF are actually
timestamps. So I became curious how it handles duplicates and started
to dig through the sources to get an idea of how the ids are generated. I found that
ids are generated by the `IDFactory`
(`org.apache.manifoldcf.core.interfaces.IDFactory`). `IDFactory` uses
an id pool. Each time we need a new id, it is extracted from the
pool. When the pool is empty, `IDFactory` generates another 100
entries. To make sure ids do not overlap, the last generated id is
stored in ZooKeeper, so each time `IDFactory` starts generating the
next batch of ids, it starts from the last id generated. This part
looks clean to me.
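
(For illustration only, a simplified paraphrase of the allocation scheme described above: a
local pool refilled in batches, with the high-water mark kept in a shared store such as
ZooKeeper. This is not the actual `IDFactory` source; the class and interface names below are
invented for the sketch.)

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of a batched id allocator. Every node must reserve its batches through the same
// shared counter; if two nodes could ever reserve overlapping ranges, they would hand out
// the same id, which is exactly what a duplicate-key error on insert would look like.
public class BatchIdAllocator
{
  /** Minimal stand-in for the cluster-wide counter (ZooKeeper-backed in ManifoldCF). */
  public interface SharedCounter
  {
    long reserve(int count);  // atomically reserves 'count' ids and returns the first one
  }

  private static final int BATCH_SIZE = 100;
  private final Deque<Long> pool = new ArrayDeque<>();
  private final SharedCounter counter;

  public BatchIdAllocator(SharedCounter counter)
  {
    this.counter = counter;
  }

  public synchronized long next()
  {
    if (pool.isEmpty())
    {
      long start = counter.reserve(BATCH_SIZE);
      for (long i = 0; i < BATCH_SIZE; i++)
        pool.add(start + i);
    }
    return pool.poll();
  }
}
```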

The next investigation concerned locking. It is obvious that during id
generation we should handle synchronization at the thread level (local JVM)
and at the global level (ZooKeeper). Both global and local locking also look fine.

The other observation I made is that all cases happen while saving the
repository history records. So the next idea was that probably the same
record was being stored repeatedly. But it seems quite hard to
investigate this part, as a lot of service layers can call it.

For now I have simply disabled history completely by setting the
`org.apache.manifoldcf.crawler.repository.store_history` property to
`false` in ZooKeeper. If you have some ideas or experience
that can shed some light on the problem, I would greatly appreciate it.

Thank you!

With respect,
Artem Abeleshev


R: JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy

2023-01-20 Thread Bisonti Mario
I see that the agent crashed:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (g1ConcurrentMark.cpp:1665), pid=2537463, tid=2537470
#  fatal error: Overflow during reference processing, can not continue. Please 
increase MarkStackSizeMax (current value: 16777216) and restart.
#
# JRE version: OpenJDK Runtime Environment (11.0.16+8) (build 
11.0.16+8-post-Ubuntu-0ubuntu120.04)
# Java VM: OpenJDK 64-Bit Server VM (11.0.16+8-post-Ubuntu-0ubuntu120.04, mixed 
mode, tiered, g1 gc, linux-amd64)
# Core dump will be written. Default location: Core dumps may be processed with 
"/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping 
to /opt/manifoldcf/multiprocess-zk-example-proprietary/core.2537463)
#
# If you would like to submit a bug report, please visit:
#   https://bugs.launchpad.net/ubuntu/+source/openjdk-lts
#

---  S U M M A R Y 

Command Line: -Xms32768m -Xmx32768m 
-Dorg.apache.manifoldcf.configfile=./properties.xml 
-Djava.security.auth.login.config= -Dorg.apache.manifoldcf.processid=A 
org.apache.manifoldcf.agents.AgentRun

Host: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, 8 cores, 68G, Ubuntu 20.04.4 LTS
Time: Fri Jan 20 09:38:54 2023 CET elapsed time: 54532.106681 seconds (0d 15h 
8m 52s)

---  T H R E A D  ---

Current thread (0x7f051940a000):  VMThread "VM Thread" [stack: 
0x7f051c50a000,0x7f051c60a000] [id=2537470]

Stack: [0x7f051c50a000,0x7f051c60a000],  sp=0x7f051c608080,  free 
space=1016k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, 
Vv=VM code, C=native code)
V  [libjvm.so+0xe963a9]
V  [libjvm.so+0x67b504]
V  [libjvm.so+0x7604e6]


So, where could I change that parameter?
Is it an Agent configuration?

Thanks a lot
Mario


From: Karl Wright 
Sent: Wednesday, January 18, 2023 14:59
To: user@manifoldcf.apache.org
Subject: Re: JCIFS: Possibly transient exception detected on attempt 1 while
getting share security: All pipe instances are busy

When you get a hang like this, getting a thread dump of the agents process is 
essential to figure out what the issue is.  You can't assume that a transient 
error would block anything because that's not how ManifoldCF works, at all.  
Errors push the document in question back onto the queue with a retry time.

Karl


On Wed, Jan 18, 2023 at 6:15 AM Bisonti Mario <mario.biso...@vimar.com> wrote:
Hi Karl.
But I noted that the job was hanging; the documents-processed count was stuck at
the same number, with no further document processing from 6 a.m. until I restarted the Agent




From: Karl Wright <daddy...@gmail.com>
Sent: Wednesday, January 18, 2023 12:10
To: user@manifoldcf.apache.org
Subject: Re: JCIFS: Possibly transient exception detected on attempt 1 while
getting share security: All pipe instances are busy

Hi, "Possibly transient issue" means that the error will be retried anyway, 
according to a schedule.  There should not need to be any requirement to shut 
down the agents process and restart.
Karl

On Wed, Jan 18, 2023 at 5:08 AM Bisonti Mario <mario.biso...@vimar.com> wrote:
Hi.
Often, I obtain the error:

WARN 2023-01-18T06:18:19,316 (Worker thread '89') - JCIFS: Possibly transient 
exception detected on attempt 1 while getting share security: All pipe 
instances are busy.
jcifs.smb.SmbException: All pipe instances are busy.
at jcifs.smb.SmbTransportImpl.checkStatus2(SmbTransportImpl.java:1441) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.checkStatus(SmbTransportImpl.java:1552) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.sendrecv(SmbTransportImpl.java:1007) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.send(SmbTransportImpl.java:1523) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbSessionImpl.send(SmbSessionImpl.java:409) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeImpl.send(SmbTreeImpl.java:472) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send0(SmbTreeConnection.java:399) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:314) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:294) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:130) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:117) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbFile.openUnshared(SmbFile.java:665) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbPipeHandleImpl.ensureOpen(SmbPipeHandleImpl.java:169) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbPipeHandleImpl.sendrecv(SmbPipeHandleImpl.java:250) 
~[jcifs-ng-2.1.2.jar:?]
at 
jcifs.dcerpc.DcerpcPipeHandle.doSendReceiveFragment(DcerpcPipeHandle.java:113) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:243) 

Re: JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy

2023-01-18 Thread Karl Wright
When you get a hang like this, getting a thread dump of the agents process
is essential to figure out what the issue is.  You can't assume that a
transient error would block anything because that's not how ManifoldCF
works, at all.  Errors push the document in question back onto the queue
with a retry time.

Karl


On Wed, Jan 18, 2023 at 6:15 AM Bisonti Mario 
wrote:

> Hi Karl.
>
> But I noted that the job was hanging; the documents-processed count was stuck
> at the same number, with no further document processing from 6 a.m. until I
> restarted the Agent
>
>
>
>
>
>
>
>
>
> *From:* Karl Wright 
> *Sent:* Wednesday, January 18, 2023 12:10
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: JCIFS: Possibly transient exception detected on attempt 1
> while getting share security: All pipe instances are busy
>
>
>
> Hi, "Possibly transient issue" means that the error will be retried
> anyway, according to a schedule.  There should not need to be any
> requirement to shut down the agents process and restart.
>
> Karl
>
>
>
> On Wed, Jan 18, 2023 at 5:08 AM Bisonti Mario 
> wrote:
>
> Hi.
>
> Often, I obtain the error:
>
> WARN 2023-01-18T06:18:19,316 (Worker thread '89') - JCIFS: Possibly
> transient exception detected on attempt 1 while getting share security: All
> pipe instances are busy.
>
> jcifs.smb.SmbException: All pipe instances are busy.
>
> at
> jcifs.smb.SmbTransportImpl.checkStatus2(SmbTransportImpl.java:1441)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at
> jcifs.smb.SmbTransportImpl.checkStatus(SmbTransportImpl.java:1552)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTransportImpl.sendrecv(SmbTransportImpl.java:1007)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTransportImpl.send(SmbTransportImpl.java:1523)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbSessionImpl.send(SmbSessionImpl.java:409)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeImpl.send(SmbTreeImpl.java:472)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeConnection.send0(SmbTreeConnection.java:399)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:314)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:294)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:130)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:117)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbFile.openUnshared(SmbFile.java:665)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at
> jcifs.smb.SmbPipeHandleImpl.ensureOpen(SmbPipeHandleImpl.java:169)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at
> jcifs.smb.SmbPipeHandleImpl.sendrecv(SmbPipeHandleImpl.java:250)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at
> jcifs.dcerpc.DcerpcPipeHandle.doSendReceiveFragment(DcerpcPipeHandle.java:113)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:243)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.dcerpc.DcerpcHandle.bind(DcerpcHandle.java:216)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:234)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbFile.getShareSecurity(SmbFile.java:2337)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecurity(SharedDriveConnector.java:2468)
> [mcf-jcifs-connector.jar:?]
>
> at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecuritySet(SharedDriveConnector.java:1243)
> [mcf-jcifs-connector.jar:?]
>
> at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:647)
> [mcf-jcifs-connector.jar:?]
>
>
>
> So, I have to stop the agent, restart it, and the crawling continues.
>
>
>
> How could I solve my issue?
> Thanks a lot.
>
> Mario
>
>


R: JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy

2023-01-18 Thread Bisonti Mario
Hi Karl.
But I noted that the job was hanging; the documents-processed count was stuck at
the same number, with no further document processing from 6 a.m. until I restarted the Agent




From: Karl Wright 
Sent: Wednesday, January 18, 2023 12:10
To: user@manifoldcf.apache.org
Subject: Re: JCIFS: Possibly transient exception detected on attempt 1 while
getting share security: All pipe instances are busy

Hi, "Possibly transient issue" means that the error will be retried anyway, 
according to a schedule.  There should not need to be any requirement to shut 
down the agents process and restart.
Karl

On Wed, Jan 18, 2023 at 5:08 AM Bisonti Mario <mario.biso...@vimar.com> wrote:
Hi.
Often, I obtain the error:

WARN 2023-01-18T06:18:19,316 (Worker thread '89') - JCIFS: Possibly transient 
exception detected on attempt 1 while getting share security: All pipe 
instances are busy.
jcifs.smb.SmbException: All pipe instances are busy.
at jcifs.smb.SmbTransportImpl.checkStatus2(SmbTransportImpl.java:1441) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.checkStatus(SmbTransportImpl.java:1552) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.sendrecv(SmbTransportImpl.java:1007) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.send(SmbTransportImpl.java:1523) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbSessionImpl.send(SmbSessionImpl.java:409) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeImpl.send(SmbTreeImpl.java:472) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send0(SmbTreeConnection.java:399) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:314) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:294) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:130) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:117) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbFile.openUnshared(SmbFile.java:665) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbPipeHandleImpl.ensureOpen(SmbPipeHandleImpl.java:169) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbPipeHandleImpl.sendrecv(SmbPipeHandleImpl.java:250) 
~[jcifs-ng-2.1.2.jar:?]
at 
jcifs.dcerpc.DcerpcPipeHandle.doSendReceiveFragment(DcerpcPipeHandle.java:113) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:243) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.dcerpc.DcerpcHandle.bind(DcerpcHandle.java:216) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:234) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbFile.getShareSecurity(SmbFile.java:2337) 
~[jcifs-ng-2.1.2.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecurity(SharedDriveConnector.java:2468)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecuritySet(SharedDriveConnector.java:1243)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:647)
 [mcf-jcifs-connector.jar:?]

So, I have to stop the agent, restart it, and the crawling continues.

How could I solve my issue?
Thanks a lot.
Mario


Re: JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy

2023-01-18 Thread Karl Wright
Hi, "Possibly transient issue" means that the error will be retried anyway,
according to a schedule.  There should not need to be any requirement to
shut down the agents process and restart.
Karl

On Wed, Jan 18, 2023 at 5:08 AM Bisonti Mario 
wrote:

> Hi.
>
> Often, I obtain the error:
>
> WARN 2023-01-18T06:18:19,316 (Worker thread '89') - JCIFS: Possibly
> transient exception detected on attempt 1 while getting share security: All
> pipe instances are busy.
>
> jcifs.smb.SmbException: All pipe instances are busy.
>
> at
> jcifs.smb.SmbTransportImpl.checkStatus2(SmbTransportImpl.java:1441)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at
> jcifs.smb.SmbTransportImpl.checkStatus(SmbTransportImpl.java:1552)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTransportImpl.sendrecv(SmbTransportImpl.java:1007)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTransportImpl.send(SmbTransportImpl.java:1523)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbSessionImpl.send(SmbSessionImpl.java:409)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeImpl.send(SmbTreeImpl.java:472)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeConnection.send0(SmbTreeConnection.java:399)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:314)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:294)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:130)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:117)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbFile.openUnshared(SmbFile.java:665)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at
> jcifs.smb.SmbPipeHandleImpl.ensureOpen(SmbPipeHandleImpl.java:169)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at
> jcifs.smb.SmbPipeHandleImpl.sendrecv(SmbPipeHandleImpl.java:250)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at
> jcifs.dcerpc.DcerpcPipeHandle.doSendReceiveFragment(DcerpcPipeHandle.java:113)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:243)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.dcerpc.DcerpcHandle.bind(DcerpcHandle.java:216)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:234)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbFile.getShareSecurity(SmbFile.java:2337)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecurity(SharedDriveConnector.java:2468)
> [mcf-jcifs-connector.jar:?]
>
> at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecuritySet(SharedDriveConnector.java:1243)
> [mcf-jcifs-connector.jar:?]
>
> at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:647)
> [mcf-jcifs-connector.jar:?]
>
>
>
> So, I have to stop the agent, restart it, and the crawling continues.
>
>
>
> How could I solve my issue?
> Thanks a lot.
>
> Mario
>


JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy

2023-01-18 Thread Bisonti Mario
Hi.
Often, I obtain the error:

WARN 2023-01-18T06:18:19,316 (Worker thread '89') - JCIFS: Possibly transient 
exception detected on attempt 1 while getting share security: All pipe 
instances are busy.
jcifs.smb.SmbException: All pipe instances are busy.
at jcifs.smb.SmbTransportImpl.checkStatus2(SmbTransportImpl.java:1441) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.checkStatus(SmbTransportImpl.java:1552) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.sendrecv(SmbTransportImpl.java:1007) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTransportImpl.send(SmbTransportImpl.java:1523) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbSessionImpl.send(SmbSessionImpl.java:409) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeImpl.send(SmbTreeImpl.java:472) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send0(SmbTreeConnection.java:399) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:314) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:294) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:130) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:117) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbFile.openUnshared(SmbFile.java:665) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbPipeHandleImpl.ensureOpen(SmbPipeHandleImpl.java:169) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbPipeHandleImpl.sendrecv(SmbPipeHandleImpl.java:250) 
~[jcifs-ng-2.1.2.jar:?]
at 
jcifs.dcerpc.DcerpcPipeHandle.doSendReceiveFragment(DcerpcPipeHandle.java:113) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:243) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.dcerpc.DcerpcHandle.bind(DcerpcHandle.java:216) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:234) 
~[jcifs-ng-2.1.2.jar:?]
at jcifs.smb.SmbFile.getShareSecurity(SmbFile.java:2337) 
~[jcifs-ng-2.1.2.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecurity(SharedDriveConnector.java:2468)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecuritySet(SharedDriveConnector.java:1243)
 [mcf-jcifs-connector.jar:?]
at 
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:647)
 [mcf-jcifs-connector.jar:?]

So, I have to stop the agent, restart it, and the crawling continues.

How could I solve my issue?
Thanks a lot.
Mario


Re: Help for subscribing the user mailing list of MCF

2023-01-10 Thread Koji Sekiguchi
Hi Karl,

I agree. BTW, Artem, the colleague, finally succeeded in subscribing. He
tried to subscribe a few more times before opening a JIRA ticket with
INFRA, and he finally got some responses from the ML system. Maybe
they restarted the system or did something else.

Thanks!

Koji

2023年1月10日(火) 20:17 Karl Wright :
>
> Hmm - I haven't heard of difficulties like this before.  The mail manager is 
> used apache-wide; if it doesn't work the best thing to do would be to create 
> an infra ticket in JIRA.
>
> Karl
>
>
> On Tue, Jan 10, 2023 at 3:50 AM Koji Sekiguchi  
> wrote:
>>
>> Hi Karl, everyone!
>>
>> I'm writing to the moderator of the MCF mailing list.
>>
>> I'd like you to help my colleague to subscribe to MCF user mailing list.
>> He's tried to subscribe several times by sending the request to
>> user-subscr...@manifoldcf.apache.org but he said that it seemed that
>> they were just ignored and he couldn't get any responses from the
>> system.
>> The email address is abeleshev at gmail dot com.
>>
>> He has some questions and wants to contribute something if possible.
>>
>> Thanks!
>>
>> Koji


Re: Help for subscribing the user mailing list of MCF

2023-01-10 Thread Karl Wright
Hmm - I haven't heard of difficulties like this before.  The mail manager
is used apache-wide; if it doesn't work the best thing to do would be to
create an infra ticket in JIRA.

Karl


On Tue, Jan 10, 2023 at 3:50 AM Koji Sekiguchi 
wrote:

> Hi Karl, everyone!
>
> I'm writing to the moderator of the MCF mailing list.
>
> I'd like you to help my colleague to subscribe to MCF user mailing list.
> He's tried to subscribe several times by sending the request to
> user-subscr...@manifoldcf.apache.org but he said that it seemed that
> they were just ignored and he couldn't get any responses from the
> system.
> The email address is abeleshev at gmail dot com.
>
> He has some questions and wants to contribute something if possible.
>
> Thanks!
>
> Koji
>


Help for subscribing the user mailing list of MCF

2023-01-10 Thread Koji Sekiguchi
Hi Karl, everyone!

I'm writing to the moderator of the MCF mailing list.

I'd like you to help my colleague subscribe to the MCF user mailing list.
He has tried to subscribe several times by sending requests to
user-subscr...@manifoldcf.apache.org, but they seem to have simply been
ignored and he couldn't get any response from the system.
The email address is abeleshev at gmail dot com.

He has some questions and wants to contribute something if possible.

Thanks!

Koji


Re: Is Manifold capable of handling these kind of files

2022-12-23 Thread Karl Wright
The internals of ManifoldCF will handle this fine if you are sure to set
the encoding of your database to be UTF-8.  However, I don't know about the
JCIFS library, and whether there might be a restriction on characters in
that code base.  I think you'd have to just try it and see, frankly.

Karl


On Fri, Dec 23, 2022 at 6:52 AM Priya Arora  wrote:

> Hi
>
> Is Manifold capable of handling this kind (ingesting) of file in window
> shares connector which has special characters like these
>
> demo/11208500/11208550/I. Proposal/PHASE II/220808
> Input/__MACOSX/虎尾/._62A33A6377CF08B472CC2AB562BD8B5D.JPG
>
>
> Any reply would be appreciated
>


Is Manifold capable of handling these kind of files

2022-12-23 Thread Priya Arora
Hi

Is Manifold capable of handling (ingesting) this kind of file in the Windows
Shares connector, when the path has special characters like these:

demo/11208500/11208550/I. Proposal/PHASE II/220808
Input/__MACOSX/虎尾/._62A33A6377CF08B472CC2AB562BD8B5D.JPG


Any reply would be appreciated


Re: Manifoldcf -XML parsing error: Character reference "" is an invalid XML character.

2022-12-22 Thread ritika jain
Can anybody provide any clue on this? It would be of great help.

On Thu, Dec 22, 2022 at 5:33 PM ritika jain 
wrote:

> Hi all,
>
> I am using Manifoldcf 2.21 version with Windows shares connector and
> Output as Elastic.
> I am facing this error while clicking "List all jobs", Manifoldcf,  jobs
> are being run/create in such a way that our API is creating a manifold job
> object and thus creating/starting a job in manifold (on server) from API
> hit,  these are being processed/automated via cron jobs. Everytime job is
> being created from API (in Symphony) in manifold , job does not start and
> looks like the below screenshot (2),  & when clicking on "List all jobs" to
> view job structure at least , it straight away gives this error/.
> 1)
> [image: image.png]
> 2)
>
>
> [image: image.png]
>
> I have tried all exclusions -to exclude xml files , exclude any files
> which can have special characters like (–), also exclude tild sign etc
> (~) , because when like searched it looks like this issue , but still
> after this issue persists.
>
> Can anybody help out why manifold gives this error , when a certain job
> (from cron) is created. How to handle it
>
> Thanks
> Ritika
>
>
>


Manifoldcf -XML parsing error: Character reference "" is an invalid XML character.

2022-12-22 Thread ritika jain
Hi all,

I am using ManifoldCF 2.21 with the Windows Shares connector and Elastic as
the output.
I am facing this error when clicking "List all jobs". Jobs are created and run
in such a way that our API builds a ManifoldCF job object and then
creates/starts the job in ManifoldCF (on the server) from an API call; these
runs are automated via cron jobs. Every time a job is created from the API (in
Symphony), the job does not start and looks like screenshot (2) below, and
when clicking "List all jobs" to at least view the job structure, it
immediately gives this error.
1)
[image: image.png]
2)


[image: image.png]

I have tried all the exclusions - excluding XML files, excluding any files
that can have special characters like (–), and also excluding the tilde sign
(~) - because from searching it looked like that kind of issue, but the
problem still persists.

Can anybody help with why ManifoldCF gives this error when a certain job
(from cron) is created, and how to handle it?

Thanks
Ritika


CVE-2022-45910: Apache ManifoldCF: LDAP Injection Vulnerability - ActiveDirectory Authorities

2022-12-06 Thread Markus Schuch
Description:

Improper neutralization of special elements used in an LDAP query ('LDAP 
Injection') vulnerability in ActiveDirectory and Sharepoint ActiveDirectory 
authority connectors of Apache ManifoldCF allows an attacker to manipulate the 
LDAP search queries (DoS, additional queries, filter manipulation) during user 
lookup, if the username or the domain string are passed to the UserACLs servlet 
without validation.

This issue affects Apache ManifoldCF version 2.23 and prior versions.

Credit:

4ra1n of Chaitin Tech (finder)

References:

https://manifoldcf.apache.org/
https://cve.org/CVERecord?id=CVE-2022-45910



Re: Unscribe

2022-10-22 Thread Muhammed Olgun
Hi Ronny,

Unsubscribing is self-service. Please follow the instructions here:

https://manifoldcf.apache.org/en_US/mail.html


On 22 Oct 2022 Sat at 08:55 Ronny Heylen  wrote:

> Hi,
> Please unscribe me from these emails, I don't work anymore.
>
> Regards,
>
> Ronny
>


Unscribe

2022-10-21 Thread Ronny Heylen
Hi,
Please unsubscribe me from these emails; I don't work anymore.

Regards,

Ronny


Re: Frequent error while window shares job

2022-08-22 Thread Karl Wright
You will need to contact the current maintainers of the Jcifs library to
get answers to these questions.

Karl


On Mon, Aug 22, 2022 at 3:27 AM ritika jain 
wrote:

> Hi All,
>
> I have a Windows shared job to crawl files from samba server, it's a  huge
> job to crawl documents in millions(about 10). While running a job , we
> encounter two types of errors very frequently.
>
> 1)  WARN 2022-08-19T17:17:05,175 (Worker thread '7') - JCIFS: Possibly
> transient exception detected on attempt 3 while getting share security:
> Disconnecting during tree connect
> jcifs.smb.SmbException: Disconnecting during tree connect-- in what case
> it can come
>
> 2) WARN 2019-08-25T15:02:27,416 (Worker thread '11') - Service
> interruption reported for job 1565115290083 connection 'fs_vwoaahvp319':
> Timeout or other service interruption: The process cannot access the file
> because it is being used by another process.
>
> What can be the reason for this?. Can anybody please help how we can make
> the job skip the error (if for any particular file), and then let the job
> run without an abort.
>
> Thanks
> Ritika
>


Frequent error while window shares job

2022-08-22 Thread ritika jain
Hi All,

I have a Windows Shares job to crawl files from a Samba server; it's a huge
job, crawling documents in the millions (about 10 million). While running the
job, we encounter two types of errors very frequently.

1)  WARN 2022-08-19T17:17:05,175 (Worker thread '7') - JCIFS: Possibly
transient exception detected on attempt 3 while getting share security:
Disconnecting during tree connect
jcifs.smb.SmbException: Disconnecting during tree connect -- in what case can
this occur?

2) WARN 2019-08-25T15:02:27,416 (Worker thread '11') - Service interruption
reported for job 1565115290083 connection 'fs_vwoaahvp319': Timeout or
other service interruption: The process cannot access the file because it
is being used by another process.

What can be the reason for this? Can anybody please help with how we can make
the job skip the error (for any particular file) and then let the job
run without aborting?

Thanks
Ritika


[FINAL CALL] - Travel Assistance to ApacheCon New Orleans 2022

2022-06-27 Thread Gavin McDonald
 To all committers and non-committers.

This is a final call to apply for travel/hotel assistance to get to and
stay in New Orleans
for ApacheCon 2022.

Applications have been extended by one week and so the application deadline
is now the 8th July 2022.

The rest of this email is a copy of what has been sent out previously.

We will be supporting ApacheCon North America in New Orleans, Louisiana,
on October 3rd through 6th, 2022.

TAC exists to help those that would like to attend ApacheCon events, but
are unable to do so for financial reasons. This year, We are supporting
both committers and non-committers involved with projects at the
Apache Software Foundation, or open source projects in general.

For more info on this year's applications and qualifying criteria, please
visit the TAC website at http://www.apache.org/travel/
Applications have been extended until the 8th of July 2022.

Important: Applicants have until the closing date above to submit their
applications (which should contain as much supporting material as required
to efficiently and accurately process their request), this will enable TAC
to announce successful awards shortly afterwards.

As usual, TAC expects to deal with a range of applications from a diverse
range of backgrounds. We therefore encourage (as always) anyone thinking
about sending in an application to do so ASAP.

Why should you attend as a TAC recipient? We encourage you to read stories
from
past recipients at https://apache.org/travel/stories/ . Also note that
previous TAC recipients have gone on to become Committers, PMC Members, ASF
Members, Directors of the ASF Board and Infrastructure Staff members.
Others have gone from Committer to full time Open Source Developers!

How far can you go! - Let TAC help get you there.


===

Gavin McDonald on behalf of the Travel Assistance Committee.


Re: Can't delete a job when solr output connection can't connect to the instance.

2022-06-14 Thread Karl Wright
Remember, there is already a "forget" button on the output connection,
which will remove everything associated with the connection.  It's meant to
be used when the output index has been reset and is empty.  I'm not sure
what you'd do differently, functionally.

Karl


On Tue, Jun 14, 2022 at 2:04 AM Koji Sekiguchi 
wrote:

> +1.
>
> I respect for the design concept of ManifoldCF, but I think force delete
> options make MCF more
> useful for those who use MCF as crawler. Adding force delete options
> doesn't change default
> behaviors and it doesn't break back-compatibility.
>
> Koji
>
> On 2022/06/14 14:46, Ricardo Ruiz wrote:
> > Hi Karl
> > We are using  ManifoldCF as a crawler more than a synchronizer. We are
> thinking of contributing to
> > ManifoldCf by including a force job delete and force output connector
> delete, considering of course
> > the things that need to be deleted with them (BD, etc). Do you think
> this is possible?
> > We think that not only us but the community would be benefited from this
> kind of functionality.
> >
> > Ricardo.
> >
> > On Mon, Jun 13, 2022 at 7:34 PM Karl Wright <daddy...@gmail.com> wrote:
> >
> > Because ManifoldCF is not just a crawler, but a synchonizer, a job
> represents and includes a
> > list of documents that have been indexed.  Deleting the job requires
> deleting the documents that
> > have been indexed also.  It's part of the basic model.
> >
> > So if you tear down your target output instance and then try to tear
> down the job, it won't
> > work.  ManifoldCF won't just throw away the memory of those
> documents and act as if nothing
> > happened.
> >
> > If you're just using ManifoldCF as a crawler, therefore, your fix is
> about as good as it gets.
> >
> > You can get into similar trouble if, for example, you reinstall
> ManifoldCF but forget to include
> > a connector class that was there before.  Carnage ensues.
> >
> > Karl
> >
> >
> > On Mon, Jun 13, 2022 at 1:39 AM Ricardo Ruiz  > > wrote:
> >
> > Hi all
> > My team uses mcf to crawl documents and index into solr
> instances, but for reasons beyond
> > our control, sometimes the instances or collections are deleted.
> > When we try to delete a job and the solr instance or collection
> doesn't exist anymore, the
> > job reaches the "End notification" status and gets stuck there.
> No other job can be aborted
> > or deleted until the initial error is fixed.
> >
> > We are able to clean up the errors following the next steps:
> >
> > 1.  Reconfigure the output connector to an existing Solr
> instance and collection
> > 2.  Reset the output connection, so it forgets any indexed
> documents.
> > 3.  Reset the job, so it forgets any indexed documents.
> > 4.  Restart the ManifoldCF server.
> >
> > Is there any other way we can solve this error? Is there any way
> we can force delete the job
> > if we don't care about the job's documents anymore?
> >
> > Thanks in advance.
> > Ricardo.
> >
>


Re: Can't delete a job when solr output connection can't connect to the instance.

2022-06-14 Thread Koji Sekiguchi

+1.

I respect the design concept of ManifoldCF, but I think force delete options would make MCF more
useful for those who use MCF as a crawler. Adding force delete options doesn't change default
behaviors and it doesn't break backward compatibility.


Koji

On 2022/06/14 14:46, Ricardo Ruiz wrote:

Hi Karl
We are using  ManifoldCF as a crawler more than a synchronizer. We are thinking of contributing to 
ManifoldCf by including a force job delete and force output connector delete, considering of course 
the things that need to be deleted with them (BD, etc). Do you think this is possible?

We think that not only us but the community would be benefited from this kind 
of functionality.

Ricardo.

On Mon, Jun 13, 2022 at 7:34 PM Karl Wright <daddy...@gmail.com> wrote:

Because ManifoldCF is not just a crawler, but a synchonizer, a job 
represents and includes a
list of documents that have been indexed.  Deleting the job requires 
deleting the documents that
have been indexed also.  It's part of the basic model.

So if you tear down your target output instance and then try to tear down 
the job, it won't
work.  ManifoldCF won't just throw away the memory of those documents and 
act as if nothing
happened.

If you're just using ManifoldCF as a crawler, therefore, your fix is about 
as good as it gets.

You can get into similar trouble if, for example, you reinstall ManifoldCF 
but forget to include
a connector class that was there before.  Carnage ensues.

Karl


On Mon, Jun 13, 2022 at 1:39 AM Ricardo Ruiz <ricrui3s...@gmail.com> wrote:

Hi all
My team uses mcf to crawl documents and index into solr instances, but 
for reasons beyond
our control, sometimes the instances or collections are deleted.
When we try to delete a job and the solr instance or collection doesn't 
exist anymore, the
job reaches the "End notification" status and gets stuck there. No 
other job can be aborted
or deleted until the initial error is fixed.

We are able to clean up the errors following the next steps:

1.  Reconfigure the output connector to an existing Solr instance and 
collection
2.  Reset the output connection, so it forgets any indexed documents.
3.  Reset the job, so it forgets any indexed documents.
4.  Restart the ManifoldCF server.

Is there any other way we can solve this error? Is there any way we can 
force delete the job
if we don't care about the job's documents anymore?

Thanks in advance.
Ricardo.



Re: Can't delete a job when solr output connection can't connect to the instance.

2022-06-13 Thread Ricardo Ruiz
Hi Karl
We are using ManifoldCF as a crawler more than a synchronizer. We are
thinking of contributing to ManifoldCF by including a force job delete and a
force output connector delete, considering of course the things that need
to be deleted with them (DB records, etc.). Do you think this is possible?
We think that not only we but the whole community would benefit from this
kind of functionality.

Ricardo.

On Mon, Jun 13, 2022 at 7:34 PM Karl Wright  wrote:

> Because ManifoldCF is not just a crawler, but a synchonizer, a job
> represents and includes a list of documents that have been indexed.
> Deleting the job requires deleting the documents that have been indexed
> also.  It's part of the basic model.
>
> So if you tear down your target output instance and then try to tear down
> the job, it won't work.  ManifoldCF won't just throw away the memory of
> those documents and act as if nothing happened.
>
> If you're just using ManifoldCF as a crawler, therefore, your fix is about
> as good as it gets.
>
> You can get into similar trouble if, for example, you reinstall ManifoldCF
> but forget to include a connector class that was there before.  Carnage
> ensues.
>
> Karl
>
>
> On Mon, Jun 13, 2022 at 1:39 AM Ricardo Ruiz 
> wrote:
>
>> Hi all
>> My team uses mcf to crawl documents and index into solr instances, but
>> for reasons beyond our control, sometimes the instances or collections are
>> deleted.
>> When we try to delete a job and the solr instance or collection doesn't
>> exist anymore, the job reaches the "End notification" status and gets stuck
>> there. No other job can be aborted or deleted until the initial error is
>> fixed.
>>
>> We are able to clean up the errors following the next steps:
>>
>> 1.  Reconfigure the output connector to an existing Solr instance and
>> collection
>> 2.  Reset the output connection, so it forgets any indexed documents.
>> 3.  Reset the job, so it forgets any indexed documents.
>> 4.  Restart the ManifoldCF server.
>>
>> Is there any other way we can solve this error? Is there any way we can
>> force delete the job if we don't care about the job's documents anymore?
>>
>> Thanks in advance.
>> Ricardo.
>>
>


Re: Can't delete a job when solr output connection can't connect to the instance.

2022-06-13 Thread Karl Wright
Because ManifoldCF is not just a crawler but a synchronizer, a job
represents and includes a list of documents that have been indexed.
Deleting the job requires deleting the documents that have been indexed
as well.  It's part of the basic model.

So if you tear down your target output instance and then try to tear down
the job, it won't work.  ManifoldCF won't just throw away the memory of
those documents and act as if nothing happened.

If you're just using ManifoldCF as a crawler, therefore, your fix is about
as good as it gets.

You can get into similar trouble if, for example, you reinstall ManifoldCF
but forget to include a connector class that was there before.  Carnage
ensues.

Karl


On Mon, Jun 13, 2022 at 1:39 AM Ricardo Ruiz  wrote:

> Hi all
> My team uses mcf to crawl documents and index into solr instances, but for
> reasons beyond our control, sometimes the instances or collections are
> deleted.
> When we try to delete a job and the solr instance or collection doesn't
> exist anymore, the job reaches the "End notification" status and gets stuck
> there. No other job can be aborted or deleted until the initial error is
> fixed.
>
> We are able to clean up the errors following the next steps:
>
> 1.  Reconfigure the output connector to an existing Solr instance and
> collection
> 2.  Reset the output connection, so it forgets any indexed documents.
> 3.  Reset the job, so it forgets any indexed documents.
> 4.  Restart the ManifoldCF server.
>
> Is there any other way we can solve this error? Is there any way we can
> force delete the job if we don't care about the job's documents anymore?
>
> Thanks in advance.
> Ricardo.
>


Can't delete a job when solr output connection can't connect to the instance.

2022-06-12 Thread Ricardo Ruiz
Hi all
My team uses mcf to crawl documents and index into solr instances, but for
reasons beyond our control, sometimes the instances or collections are
deleted.
When we try to delete a job and the solr instance or collection doesn't
exist anymore, the job reaches the "End notification" status and gets stuck
there. No other job can be aborted or deleted until the initial error is
fixed.

We are able to clean up the errors by following these steps:

1.  Reconfigure the output connector to an existing Solr instance and
collection
2.  Reset the output connection, so it forgets any indexed documents.
3.  Reset the job, so it forgets any indexed documents.
4.  Restart the ManifoldCF server.

Is there any other way we can solve this error? Is there any way we can
force delete the job if we don't care about the job's documents anymore?

Thanks in advance.
Ricardo.


Final reminder: ApacheCon North America call for presentations closing soon

2022-05-19 Thread Rich Bowen
[Note: You're receiving this because you are subscribed to one or more
Apache Software Foundation project mailing lists.]

This is your final reminder that the Call for Presentations for
ApacheCon North America 2022 will close at 00:01 GMT on Monday, May
23rd, 2022. Please don't wait! Get your talk proposals in now!

Details here: https://apachecon.com/acna2022/cfp.html

--Rich, for the ApacheCon Planners




REMINDER - Travel Assistance available for ApacheCon NA New Orleans 2022

2022-05-03 Thread Gavin McDonald
Hi All Contributors and Committers,

This is a first reminder email that travel
assistance applications for ApacheCon NA 2022 are now open!

We will be supporting ApacheCon North America in New Orleans, Louisiana,
on October 3rd through 6th, 2022.

TAC exists to help those that would like to attend ApacheCon events, but
are unable to do so for financial reasons. This year, We are supporting
both committers and non-committers involved with projects at the
Apache Software Foundation, or open source projects in general.

For more info on this year's applications and qualifying criteria, please
visit the TAC website at http://www.apache.org/travel/
Applications are open and will close on the 1st of July 2022.

Important: Applicants have until the closing date above to submit their
applications (which should contain as much supporting material as required
to efficiently and accurately process their request), this will enable TAC
to announce successful awards shortly afterwards.

As usual, TAC expects to deal with a range of applications from a diverse
range of backgrounds. We therefore encourage (as always) anyone thinking
about sending in an application to do so ASAP.

Why should you attend as a TAC recipient? We encourage you to read stories
from
past recipients at https://apache.org/travel/stories/ . Also note that
previous TAC recipients have gone on to become Committers, PMC Members, ASF
Members, Directors of the ASF Board and Infrastructure Staff members.
Others have gone from Committer to full time Open Source Developers!

How far can you go! - Let TAC help get you there.


Re: Job Service Interruption- and stops

2022-04-29 Thread Karl Wright
" repeated service interruption" means that it happens again and again.

For this particular document, the problem is that the error we are seeing
is: "The process cannot access the file because it is being used by another
process."

ManifoldCF assumes that if it retries enough it should be able to read the
document eventually.  In this case, if it cannot read the document after 6+
hours, it assumes something is wrong and stops the job.  We can make it
continue at this point but the issue is that you shouldn't be seeing such
an error for such a long period of time.  Perhaps you might want to
research why this is taking place.

Karl


On Fri, Apr 29, 2022 at 4:54 AM ritika jain 
wrote:

> Hi All,
>
> With the window shares connector, on the server I am getting this
> exception and due to repeated service interruption *job stops.*
>
> Error: Repeated service interruptions - failure processing document: The
> process cannot access the file because it is being used by another process.
>
> How we can prevent this.
> I read in the code that it retries the document. But due to service
> interruptions, the jobs stopped.
>
>
> Thanks
> Ritika
>


Job Service Interruption- and stops

2022-04-29 Thread ritika jain
Hi All,

With the Windows Shares connector, on the server I am getting this exception,
and due to repeated service interruptions the *job stops*.

Error: Repeated service interruptions - failure processing document: The
process cannot access the file because it is being used by another process.

How can we prevent this?
I read in the code that it retries the document, but due to the service
interruptions the job stops.


Thanks
Ritika


Call for Presentations now open, ApacheCon North America 2022

2022-03-30 Thread Rich Bowen
[You are receiving this because you are subscribed to one or more user
or dev mailing list of an Apache Software Foundation project.]

ApacheCon draws participants at all levels to explore “Tomorrow’s
Technology Today” across 300+ Apache projects and their diverse
communities. ApacheCon showcases the latest developments in ubiquitous
Apache projects and emerging innovations through hands-on sessions,
keynotes, real-world case studies, trainings, hackathons, community
events, and more.

The Apache Software Foundation will be holding ApacheCon North America
2022 at the New Orleans Sheraton, October 3rd through 6th, 2022. The
Call for Presentations is now open, and will close at 00:01 UTC on May
23rd, 2022.

We are accepting presentation proposals for any topic that is related
to the Apache mission of producing free software for the public good.
This includes, but is not limited to:

Community
Big Data
Search
IoT
Cloud
Fintech
Pulsar
Tomcat

You can submit your session proposals starting today at
https://cfp.apachecon.com/

Rich Bowen, on behalf of the ApacheCon Planners
apachecon.com
@apachecon


Re: Log4j Update Doubt

2022-03-15 Thread Karl Wright
We cannot back-patch older versions of ManifoldCF.  There is a new
release, which shipped in January, that addresses the log4j issues.  I suggest
updating to that.

Karl


On Tue, Mar 15, 2022 at 8:59 AM ritika jain 
wrote:

> Hi,
>
> How manifoldcf uses log4j files in bin directory/distribution. If this is
> the location "D:\\Manifoldcf\apache-manifoldcf-2.14\lib" that is the lib
> folder only.(for physical file presence)
>
> Also if the log4j dependency issue has been resolved and the version 2.15
> or higher is updated, then will it be reflected in Manifoldcf 2.14 version
> also. If not , can you help to let me know at what places the log4j2.4.1
> jar files should be replaced with 2.15.
>
> When the log4j2.15 jar has been manually replaced in 'lib' folder  and the
> older version is deleted (2.4.1), got this as error
> [image: image.png]
>
> What other location is needed to have the latest jar file.
>
> Thanks
> Ritika
>


Log4j Update Doubt

2022-03-15 Thread ritika jain
Hi,

How does ManifoldCF use the log4j files in the bin directory/distribution? Is
the location "D:\\Manifoldcf\apache-manifoldcf-2.14\lib" - that is, the lib
folder only - where the physical files need to be?

Also, if the log4j dependency issue has been resolved and version 2.15 or
higher is adopted, will that be reflected in ManifoldCF 2.14 as well? If not,
can you help me understand in which places the log4j 2.4.1 jar files should be
replaced with 2.15?

When the log4j 2.15 jar was manually placed in the 'lib' folder and the older
version (2.4.1) was deleted, I got this error:
[image: image.png]

What other locations need to have the latest jar file?

Thanks
Ritika


Re: Manifoldcf freezes and sit idle

2022-01-31 Thread Karl Wright
As I've mentioned before, the best way to diagnose problems like this is to
get a thread dump of the agents process.  There are many potential reasons
it could occur, ranging from stuck locks to resource starvation.  What
locking model are you using?

Karl


On Mon, Jan 31, 2022 at 6:02 AM ritika jain 
wrote:

> Hi,
>
> I am using Manifoldcf 2.14, web connector and Elastic as output.
> I have observed after a certain time period of continuous run job freezes
> and does not do/process anything. Simple history shows nothing after a
> certain process, and it's not for one job it has been observed for 3
> different jobs , also checked for a certain document (that seems NOT to be
> the case).
>
> Only after restarting the docker container helps. After restarting the job
> continues
>
> What can be the possible reason for this? How it can be prevented for PROD.
>
> Thanks
> Ritika
>


Manifoldcf freezes and sit idle

2022-01-31 Thread ritika jain
Hi,

I am using ManifoldCF 2.14, the Web connector, and Elastic as output.
I have observed that after a certain period of continuous running the job
freezes and does not process anything. The simple history shows nothing after
a certain point, and it's not just one job - it has been observed for 3
different jobs. I also checked whether a particular document is the cause
(that does NOT seem to be the case).

Only restarting the Docker container helps; after a restart the job continues.

What can be the possible reason for this? How can it be prevented in PROD?

Thanks
Ritika


Re: Log4j dependency

2021-12-14 Thread Karl Wright
ManifoldCF framework and connectors use log4j 2.x to dump information to
the ManifoldCF log file.

Please read the following page:

https://logging.apache.org/log4j/2.x/security.html

Specifically, this part:

'Description: Apache Log4j2 <=2.14.1 JNDI features used in configuration,
log messages, and parameters do not protect against attacker controlled
LDAP and other JNDI related endpoints. An attacker who can control log
messages or log message parameters can execute arbitrary code loaded from
LDAP servers when message lookup substitution is enabled. From log4j
2.15.0, this behavior has been disabled by default.'

In other words, unless you are allowing external people access to the
crawler UI or to the API, it's impossible to exploit this in ManifoldCF.

However, in the interest of assuring people, we are updating this core
dependency to the recommended 2.15.0 anyway.  The release is scheduled by
the end of December.

Karl


On Tue, Dec 14, 2021 at 4:41 AM ritika jain 
wrote:

> .Hi All,
>
> How does manifold.cf use log4j. When I checked pom.xml of ES connector ,
> it is shown as an *exclusion *of maven dependency.
> [image: image.png]
>
> But when checked in Project's downloaded Dependencies, It shows it being
> used and downloaded.
>
> [image: image.png]
> How does manifold use log 4j and how can we change the version of it.
>
> Thanks
> Ritika
>


Re: Log4j dependency

2021-12-14 Thread Furkan KAMACI
Hi Ritika,

For maven check here:

https://github.com/apache/manifoldcf/blob/trunk/pom.xml#L80

For Ant check here:

https://github.com/apache/manifoldcf/blob/trunk/build.xml#L87
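
As a rough sketch of what the Maven side of such an update could look like
(generic dependency coordinates only; the actual entry at the linked line may
be organized differently, e.g. via a version property):

  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
    <version>2.15.0</version>
  </dependency>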

Kind Regards,
Furkan KAMACI

On Tue, Dec 14, 2021 at 12:41 PM ritika jain 
wrote:

> .Hi All,
>
> How does manifold.cf use log4j. When I checked pom.xml of ES connector ,
> it is shown as an *exclusion *of maven dependency.
> [image: image.png]
>
> But when checked in Project's downloaded Dependencies, It shows it being
> used and downloaded.
>
> [image: image.png]
> How does manifold use log 4j and how can we change the version of it.
>
> Thanks
> Ritika
>


Log4j dependency

2021-12-14 Thread ritika jain
Hi All,

How does ManifoldCF use log4j? When I checked the pom.xml of the ES connector,
it is shown as an *exclusion* of the Maven dependency.
[image: image.png]

But when checking the project's downloaded dependencies, it shows it being
used and downloaded.

[image: image.png]
How does ManifoldCF use log4j, and how can we change its version?

Thanks
Ritika


Two profiles of manifoldcf

2021-12-03 Thread ritika jain
Hi All,

Can we create two different username/password logins for the ManifoldCF crawler UI?

I tried configuring two user profiles in properties.xml, but it's not
working.






Is there a way to do that?
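
For reference, a minimal sketch of the stock single-user login entries in
properties.xml (property names recalled from the example configuration and
possibly different in other versions; as far as I can tell the properties-based
login only defines a single UI user):

  <property name="org.apache.manifoldcf.login.name" value="admin"/>
  <property name="org.apache.manifoldcf.login.password" value="admin"/>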

Thanks
Ritika


Re: Manifoldcf background process

2021-11-18 Thread Karl Wright
The degree of parallelism can be controlled in two ways.
The first way is to set the number of worker threads to something
reasonable.  Usually, this is no more than about 2x the number of
processors you have.
The second way is to control the number of connections in your jcifs
connector to keep it at something reasonable, e.g. 4 (because windows SMB
is really not good at handling more than that anyway).

These two controls are independent of each other.  From your description,
it sounds like the parameter you want to set is not the number of worker
threads but rather the number of connections.  But setting both properly
certainly will help.  The reason that a high worker thread count is bad is
that you use up some amount of memory for each active thread, which means
that if you give too big a value you have to give ManifoldCF far too much
memory, and you won't be able to compute that amount in advance either.
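
As a rough sketch, the worker thread count is normally set in properties.xml
(the property name below is recalled from the example configuration and should
be checked against your version), while the connection limit is set on the
repository connection itself, under its throttling settings:

  <!-- cap worker threads at roughly 2x the CPU count -->
  <property name="org.apache.manifoldcf.crawler.threads" value="8"/>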


Karl


On Thu, Nov 18, 2021 at 2:49 AM ritika jain 
wrote:

> Hi All,
>
> I would like to understand the background process of Manifoldcf windows
> shares jobs , and how it processes the path mentioned in the jobs
> configuration.
>
> I am creating a dynamic job via API using PHP which will pick up approx
> 70k of documents and a dynamic job with  70k of different paths mentioned
> in the job and mention folder-subfolders path otherwise and file name in
> filespec.
>
> My question is, how does manifold work in the background to access all
> different folders at a time. Because mostly all files correspond to
> different folders. Does manifold loads while fetching all folder
> permissions and accessing folder/subfolders files. How does it fetch
> permission for one folder say for path 1 and simultaneously fetch different
> folder permission/access for say path2.
> Does it load the manifold. Because when this job is running then
> manifoldcf seems to be under heavy load and it gets really really slow and
> has to restart the docker container every 15-20 min.
>
> How can a job be run efficiently?
>
> Thanks
> Ritika
>
>


Manifoldcf background process

2021-11-17 Thread ritika jain
Hi All,

I would like to understand the background processing of ManifoldCF Windows
Shares jobs, and how it processes the paths mentioned in the job
configuration.

I am creating a dynamic job via the API using PHP which will pick up
approximately 70k documents - a dynamic job with 70k different paths mentioned
in the job, specifying the folder/subfolder paths and the file names in the
filespec.

My question is: how does ManifoldCF work in the background to access all the
different folders at a time, given that most files correspond to different
folders? Does ManifoldCF come under load while fetching all the folder
permissions and accessing the folder/subfolder files? How does it fetch
permissions for one folder (say, path 1) and simultaneously fetch the
permissions/access for a different folder (say, path 2)?
Does this overload ManifoldCF? When this job is running, ManifoldCF seems to
be under heavy load, it gets really, really slow, and I have to restart the
Docker container every 15-20 minutes.

How can a job be run efficiently?

Thanks
Ritika


Re: Manifold Job process isssue

2021-11-15 Thread Karl Wright
SMB exceptions with jcifs in the trace tell us that JCIFS couldn't talk to
your windows share server.  That's all we can tell though.

Karl


On Mon, Nov 15, 2021 at 7:24 AM ritika jain 
wrote:

> Hi,
>
> Raising the concern above again, to process only 60k of document (when
> clock issue is fixed too), job process is not progressing , its being stuck
> for like days. So had to restart the docker container every time for it to
> process.
> This time now we are getting this :- Timeout Exception. What we can be the
> reason for it and how it can be fixed .?
>   ... 24 more
> [Worker thread '23'] WARN jcifs.util.transport.Transport - sendrecv failed
> jcifs.util.transport.RequestTimeoutException: Transport40 timedout waiting
> for response to
> command=SMB2_TREE_CONNECT,status=0,flags=0x,mid=4,wordCount=0,byteCount=86
> at
> jcifs.util.transport.Transport.waitForResponses(Transport.java:365)
> at jcifs.util.transport.Transport.sendrecv(Transport.java:232)
> at jcifs.smb.SmbTransportImpl.sendrecv(SmbTransportImpl.java:1021)
> at jcifs.smb.SmbTransportImpl.send(SmbTransportImpl.java:1539)
> at jcifs.smb.SmbSessionImpl.send(SmbSessionImpl.java:409)
> at jcifs.smb.SmbSessionImpl.send(SmbSessionImpl.java:347)
> at jcifs.smb.SmbTreeImpl.treeConnect(SmbTreeImpl.java:611)
> at
> jcifs.smb.SmbTreeConnection.connectTree(SmbTreeConnection.java:614)
> at
> jcifs.smb.SmbTreeConnection.connectHost(SmbTreeConnection.java:568)
> at
> jcifs.smb.SmbTreeConnection.connectHost(SmbTreeConnection.java:489)
> at jcifs.smb.SmbTreeConnection.connect(SmbTreeConnection.java:465)
> at
> jcifs.smb.SmbTreeConnection.connectWrapException(SmbTreeConnection.java:426)
> at jcifs.smb.SmbFile.ensureTreeConnected(SmbFile.java:551)
> at jcifs.smb.SmbFile.length(SmbFile.java:1541)
> at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.fileLength(SharedDriveConnector.java:2340)
> at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector$ProcessDocumentsFilter.accept(SharedDriveConnector.java:4935)
> at
> jcifs.smb.SmbEnumerationUtil$ResourceFilterWrapper.accept(SmbEnumerationUtil.java:331)
> at
> jcifs.smb.FileEntryAdapterIterator.advance(FileEntryAdapterIterator.java:82)
> at
> jcifs.smb.FileEntryAdapterIterator.(FileEntryAdapterIterator.java:52)
> at
> jcifs.smb.DirFileEntryAdapterIterator.(DirFileEntryAdapterIterator.java:37)
> at jcifs.smb.SmbEnumerationUtil.doEnum(SmbEnumerationUtil.java:223)
> at
> jcifs.smb.SmbEnumerationUtil.listFiles(SmbEnumerationUtil.java:279)
> at jcifs.smb.SmbFile.listFiles(SmbFile.java:1273)
> at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.fileListFiles(SharedDriveConnector.java:2380)
> at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:818)
> at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> [Worker thread '23'] WARN jcifs.smb.SmbTransportImpl - Disconnecting
> transport while still in use Transport40[backup002.directory.intra/
> 136.231.158.172:445,state=5,signingEnforced=false,usage=5]:
> [SmbSession[credentials=svc_EScrawl,targetHost=backup002.directory.intra,targetDomain=null,uid=0,connectionState=2,usage=3]]
> [Worker thread '23'] WARN jcifs.smb.SmbSessionImpl - Logging off session
> while still in use
> SmbSession[credentials=svc_EScrawl,targetHost=backup002.directory.intra,targetDomain=null,uid=0,connectionState=3,usage=3]:[SmbTree[share=WINPROJECTS,service=?,tid=-1,inDfs=false,inDomainDfs=false,connectionState=1,usage=1]]
> [Worker thread '10'] WARN jcifs.util.transport.Transport - sendrecv failed
> jcifs.util.transport.RequestTimeoutException: Transport41 timedout waiting
> for response to
> command=SMB2_TREE_CONNECT,status=0,flags=0x,mid=4,wordCount=0,byteCount=80
> at
> jcifs.util.transport.Transport.waitForResponses(Transport.java:365)
> at jcifs.util.transport.Transport.sendrecv(Transport.java:232)
> at jcifs.smb.SmbTransportImpl.sendrecv(SmbTransportImpl.java:1021)
> at jcifs.smb.SmbTransportImpl.send(SmbTransportImpl.java:1539)
> at jcifs.smb.SmbSessionImpl.send(SmbSessionImpl.java:409)
> at jcifs.smb.SmbSessionImpl.send(SmbSessionImpl.java:347)
> at jcifs.smb.SmbTreeImpl.treeConnect(SmbTreeImpl.java:611)
> at
> jcifs.smb.SmbTreeConnection.connectTree(SmbTreeConnection.java:614)
> at
> jcifs.smb.SmbTreeConnection.connectHost(SmbTreeConnection.java:568)
> at
> jcifs.smb.SmbTreeConnection.connectHost(SmbTreeConnection.java:489)
> at jcifs.smb.SmbTreeConnection.connect(SmbTreeConnection.java:465)
> at
> jcifs.smb.SmbTreeConnection.connectWrapException(SmbTreeConnection.java:426)
> at 

Re: Manifold Job process isssue

2021-11-15 Thread ritika jain
Hi,

Raising the concern above again: to process only 60k documents (with the
clock issue fixed too), the job is not progressing; it has been stuck for
days, so I have had to restart the Docker container every time to get it to
process.
This time we are getting a timeout exception. What can be the reason for it,
and how can it be fixed?
  ... 24 more
[Worker thread '23'] WARN jcifs.util.transport.Transport - sendrecv failed
jcifs.util.transport.RequestTimeoutException: Transport40 timedout waiting
for response to
command=SMB2_TREE_CONNECT,status=0,flags=0x,mid=4,wordCount=0,byteCount=86
at
jcifs.util.transport.Transport.waitForResponses(Transport.java:365)
at jcifs.util.transport.Transport.sendrecv(Transport.java:232)
at jcifs.smb.SmbTransportImpl.sendrecv(SmbTransportImpl.java:1021)
at jcifs.smb.SmbTransportImpl.send(SmbTransportImpl.java:1539)
at jcifs.smb.SmbSessionImpl.send(SmbSessionImpl.java:409)
at jcifs.smb.SmbSessionImpl.send(SmbSessionImpl.java:347)
at jcifs.smb.SmbTreeImpl.treeConnect(SmbTreeImpl.java:611)
at
jcifs.smb.SmbTreeConnection.connectTree(SmbTreeConnection.java:614)
at
jcifs.smb.SmbTreeConnection.connectHost(SmbTreeConnection.java:568)
at
jcifs.smb.SmbTreeConnection.connectHost(SmbTreeConnection.java:489)
at jcifs.smb.SmbTreeConnection.connect(SmbTreeConnection.java:465)
at
jcifs.smb.SmbTreeConnection.connectWrapException(SmbTreeConnection.java:426)
at jcifs.smb.SmbFile.ensureTreeConnected(SmbFile.java:551)
at jcifs.smb.SmbFile.length(SmbFile.java:1541)
at
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.fileLength(SharedDriveConnector.java:2340)
at
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector$ProcessDocumentsFilter.accept(SharedDriveConnector.java:4935)
at
jcifs.smb.SmbEnumerationUtil$ResourceFilterWrapper.accept(SmbEnumerationUtil.java:331)
at
jcifs.smb.FileEntryAdapterIterator.advance(FileEntryAdapterIterator.java:82)
at
jcifs.smb.FileEntryAdapterIterator.(FileEntryAdapterIterator.java:52)
at
jcifs.smb.DirFileEntryAdapterIterator.(DirFileEntryAdapterIterator.java:37)
at jcifs.smb.SmbEnumerationUtil.doEnum(SmbEnumerationUtil.java:223)
at
jcifs.smb.SmbEnumerationUtil.listFiles(SmbEnumerationUtil.java:279)
at jcifs.smb.SmbFile.listFiles(SmbFile.java:1273)
at
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.fileListFiles(SharedDriveConnector.java:2380)
at
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:818)
at
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
[Worker thread '23'] WARN jcifs.smb.SmbTransportImpl - Disconnecting
transport while still in use Transport40[backup002.directory.intra/
136.231.158.172:445,state=5,signingEnforced=false,usage=5]:
[SmbSession[credentials=svc_EScrawl,targetHost=backup002.directory.intra,targetDomain=null,uid=0,connectionState=2,usage=3]]
[Worker thread '23'] WARN jcifs.smb.SmbSessionImpl - Logging off session
while still in use
SmbSession[credentials=svc_EScrawl,targetHost=backup002.directory.intra,targetDomain=null,uid=0,connectionState=3,usage=3]:[SmbTree[share=WINPROJECTS,service=?,tid=-1,inDfs=false,inDomainDfs=false,connectionState=1,usage=1]]
[Worker thread '10'] WARN jcifs.util.transport.Transport - sendrecv failed
jcifs.util.transport.RequestTimeoutException: Transport41 timedout waiting
for response to
command=SMB2_TREE_CONNECT,status=0,flags=0x,mid=4,wordCount=0,byteCount=80
at
jcifs.util.transport.Transport.waitForResponses(Transport.java:365)
at jcifs.util.transport.Transport.sendrecv(Transport.java:232)
at jcifs.smb.SmbTransportImpl.sendrecv(SmbTransportImpl.java:1021)
at jcifs.smb.SmbTransportImpl.send(SmbTransportImpl.java:1539)
at jcifs.smb.SmbSessionImpl.send(SmbSessionImpl.java:409)
at jcifs.smb.SmbSessionImpl.send(SmbSessionImpl.java:347)
at jcifs.smb.SmbTreeImpl.treeConnect(SmbTreeImpl.java:611)
at
jcifs.smb.SmbTreeConnection.connectTree(SmbTreeConnection.java:614)
at
jcifs.smb.SmbTreeConnection.connectHost(SmbTreeConnection.java:568)
at
jcifs.smb.SmbTreeConnection.connectHost(SmbTreeConnection.java:489)
at jcifs.smb.SmbTreeConnection.connect(SmbTreeConnection.java:465)
at
jcifs.smb.SmbTreeConnection.connectWrapException(SmbTreeConnection.java:426)
at jcifs.smb.SmbFile.ensureTreeConnected(SmbFile.java:551)
at jcifs.smb.SmbFile.exists(SmbFile.java:845)
at
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.fileExists(SharedDriveConnector.java:2220)
at
org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:610)
at

Re: Manifold Job process isssue

2021-11-09 Thread Karl Wright
One hour is quite a lot and will wreak havoc on the document queue.
Karl


On Tue, Nov 9, 2021 at 7:08 AM ritika jain  wrote:

> I have checked, there is only one hour time difference between docker
> container and docker host
>
> On Tue, Nov 9, 2021 at 4:41 PM Karl Wright  wrote:
>
>> If your docker image's clock is out of sync badly with the real world,
>> then System.currentTimeMillis() may give bogus values, and ManifoldCF uses
>> that to manage throttling etc.  I don't know if that is the correct
>> explanation but it's the only thing I can think of.
>>
>> Karl
>>
>>
>> On Tue, Nov 9, 2021 at 4:56 AM ritika jain 
>> wrote:
>>
>>>
>>> Hi All,
>>>
>>> I am using window shares connector , manifoldcf 2.14 and ES as output. I
>>> have configured a job to process 60k of documents, Also these documents are
>>> new and do not have corresponding values in DB and ES index.
>>>
>>> So ideally it should process/Index the documents as soon as the job
>>> starts.
>>> But Manifoldcf does not process anything for many hours of job start
>>> up.I have tried restarting the docker container as well. But it didn't help
>>> much. Also logs only correspond to Long running queries.
>>>
>>> Why does the manifold behave like that?
>>>
>>> Thanks
>>> Ritika
>>>
>>


Re: Manifold Job process isssue

2021-11-09 Thread ritika jain
I have checked; there is only a one-hour time difference between the Docker
container and the Docker host.

On Tue, Nov 9, 2021 at 4:41 PM Karl Wright  wrote:

> If your docker image's clock is out of sync badly with the real world,
> then System.currentTimeMillis() may give bogus values, and ManifoldCF uses
> that to manage throttling etc.  I don't know if that is the correct
> explanation but it's the only thing I can think of.
>
> Karl
>
>
> On Tue, Nov 9, 2021 at 4:56 AM ritika jain 
> wrote:
>
>>
>> Hi All,
>>
>> I am using window shares connector , manifoldcf 2.14 and ES as output. I
>> have configured a job to process 60k of documents, Also these documents are
>> new and do not have corresponding values in DB and ES index.
>>
>> So ideally it should process/Index the documents as soon as the job
>> starts.
>> But Manifoldcf does not process anything for many hours of job start up.I
>> have tried restarting the docker container as well. But it didn't help
>> much. Also logs only correspond to Long running queries.
>>
>> Why does the manifold behave like that?
>>
>> Thanks
>> Ritika
>>
>


Re: Manifold Job process isssue

2021-11-09 Thread Karl Wright
If your docker image's clock is out of sync badly with the real world, then
System.currentTimeMillis() may give bogus values, and ManifoldCF uses that
to manage throttling etc.  I don't know if that is the correct explanation
but it's the only thing I can think of.
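
A tiny illustration of the failure mode (not ManifoldCF code, just the general
pattern of clock-based scheduling that breaks when the clock is skewed):

// Sketch: a retry time computed from the system clock
long retryAt = System.currentTimeMillis() + 60_000L;   // "retry in one minute"
// If the container clock later jumps back an hour, the check below
// stays false for an extra hour, so the document just sits in the queue.
boolean readyToRetry = System.currentTimeMillis() >= retryAt;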

Karl


On Tue, Nov 9, 2021 at 4:56 AM ritika jain  wrote:

>
> Hi All,
>
> I am using window shares connector , manifoldcf 2.14 and ES as output. I
> have configured a job to process 60k of documents, Also these documents are
> new and do not have corresponding values in DB and ES index.
>
> So ideally it should process/Index the documents as soon as the job starts.
> But Manifoldcf does not process anything for many hours of job start up.I
> have tried restarting the docker container as well. But it didn't help
> much. Also logs only correspond to Long running queries.
>
> Why does the manifold behave like that?
>
> Thanks
> Ritika
>


Manifold Job process isssue

2021-11-09 Thread ritika jain
Hi All,

I am using the Windows Shares connector, ManifoldCF 2.14, and ES as output. I
have configured a job to process 60k documents; these documents are new and do
not have corresponding entries in the DB or the ES index.

So ideally it should process/index the documents as soon as the job starts.
But ManifoldCF does not process anything for many hours after job start-up. I
have tried restarting the Docker container as well, but it didn't help much.
Also, the logs only show long-running queries.

Why does ManifoldCF behave like that?

Thanks
Ritika


Re: Duplicate key error

2021-10-27 Thread Karl Wright
We see errors like this only because MCF is a highly multithreaded
application, and two threads sometimes are able to collide in what they are
doing even though they are transactionally separated.  That is because of
bugs in the database software.  So if you restart the job it should not
encounter the same problem.

If the problem IS repeatable, we will of course look deeper into what is
going on.

Karl


On Wed, Oct 27, 2021 at 9:52 AM Karl Wright  wrote:

> Is it repeatable?  My guess is it is not repeatable.
> Karl
>
> On Wed, Oct 27, 2021 at 4:43 AM ritika jain 
> wrote:
>
>> So , it can be left as it is.. ? because it is preventing job to complete
>> and its stopping.
>>
>> On Tue, Oct 26, 2021 at 8:40 PM Karl Wright  wrote:
>>
>>> That's a database bug.  All of our underlying databases have some bugs
>>> of this kind.
>>>
>>> Karl
>>>
>>>
>>> On Tue, Oct 26, 2021 at 9:17 AM ritika jain 
>>> wrote:
>>>
 Hi All,

 While using Manifoldcf 2.14 with Web connector and ES connector. After
 a certain time of continuing the job (jobs ingest some documents in lakhs),
 we got this error on PROD.

 Can anybody suggest what could be the problem?

 PRODUCTION MANIFOLD ERROR:

 Error: ERROR: duplicate key value violates unique constraint
 "ingeststatus_pkey" Detail: Key (id)=(1624***7) already exists.


 Thanks





Re: Duplicate key error

2021-10-27 Thread Karl Wright
Is it repeatable?  My guess is it is not repeatable.
Karl

On Wed, Oct 27, 2021 at 4:43 AM ritika jain 
wrote:

> So , it can be left as it is.. ? because it is preventing job to complete
> and its stopping.
>
> On Tue, Oct 26, 2021 at 8:40 PM Karl Wright  wrote:
>
>> That's a database bug.  All of our underlying databases have some bugs of
>> this kind.
>>
>> Karl
>>
>>
>> On Tue, Oct 26, 2021 at 9:17 AM ritika jain 
>> wrote:
>>
>>> Hi All,
>>>
>>> While using Manifoldcf 2.14 with Web connector and ES connector. After a
>>> certain time of continuing the job (jobs ingest some documents in lakhs),
>>> we got this error on PROD.
>>>
>>> Can anybody suggest what could be the problem?
>>>
>>> PRODUCTION MANIFOLD ERROR:
>>>
>>> Error: ERROR: duplicate key value violates unique constraint
>>> "ingeststatus_pkey" Detail: Key (id)=(1624***7) already exists.
>>>
>>>
>>> Thanks
>>>
>>>
>>>


Re: Duplicate key error

2021-10-27 Thread ritika jain
So, can it be left as it is? Because it is preventing the job from completing,
and it is stopping.

On Tue, Oct 26, 2021 at 8:40 PM Karl Wright  wrote:

> That's a database bug.  All of our underlying databases have some bugs of
> this kind.
>
> Karl
>
>
> On Tue, Oct 26, 2021 at 9:17 AM ritika jain 
> wrote:
>
>> Hi All,
>>
>> While using Manifoldcf 2.14 with Web connector and ES connector. After a
>> certain time of continuing the job (jobs ingest some documents in lakhs),
>> we got this error on PROD.
>>
>> Can anybody suggest what could be the problem?
>>
>> PRODUCTION MANIFOLD ERROR:
>>
>> Error: ERROR: duplicate key value violates unique constraint
>> "ingeststatus_pkey" Detail: Key (id)=(1624***7) already exists.
>>
>>
>> Thanks
>>
>>
>>


Re:

2021-10-26 Thread Karl Wright
That's a database bug.  All of our underlying databases have some bugs of
this kind.

Karl


On Tue, Oct 26, 2021 at 9:17 AM ritika jain 
wrote:

> Hi All,
>
> While using Manifoldcf 2.14 with Web connector and ES connector. After a
> certain time of continuing the job (jobs ingest some documents in lakhs),
> we got this error on PROD.
>
> Can anybody suggest what could be the problem?
>
> PRODUCTION MANIFOLD ERROR:
>
> Error: ERROR: duplicate key value violates unique constraint
> "ingeststatus_pkey" Detail: Key (id)=(1624***7) already exists.
>
>
> Thanks
>
>
>


[no subject]

2021-10-26 Thread ritika jain
Hi All,

While using ManifoldCF 2.14 with the Web connector and the ES connector,
after the job had been running for some time (the jobs ingest documents in the
lakhs, i.e. hundreds of thousands), we got this error on PROD.

Can anybody suggest what could be the problem?

PRODUCTION MANIFOLD ERROR:

Error: ERROR: duplicate key value violates unique constraint
"ingeststatus_pkey" Detail: Key (id)=(1624***7) already exists.


Thanks


Re: Windows Shares job-Limit on defining no of paths

2021-10-25 Thread Karl Wright
The only limit is that the more you add, the slower it gets.

Karl


On Mon, Oct 25, 2021 at 6:06 AM ritika jain 
wrote:

> Hi ,
> Is there any limit on the number of paths we can define in job using
> Repository as Window Shares and ES as Output
>
> Thanks
>


Re: Null Pointer Exception

2021-10-25 Thread Karl Wright
The API should really catch this situation.  Basically, you are calling a
function that requires an input but you are not providing one.  In that
case the API sets the input to "null", and the detailed operation is
called.  The detailed operation is not expecting a null input.

This is the API piece that is not flagging the error properly:

// Parse the input
Configuration input;

if (protocol.equals("json"))
{
  if (argument.length() != 0)
  {
input = new Configuration();
input.fromJSON(argument);
  }
  else
input = null;
}
else
{
  response.sendError(response.SC_BAD_REQUEST,"Unknown API protocol:
"+protocol);
  return;
}

Since this is POST, it should assume that the input cannot be null, and if
it is, it's a bad request.
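
A minimal sketch of that guard (hypothetical placement, immediately after the
parsing block above):

if (input == null)
{
  response.sendError(response.SC_BAD_REQUEST,"Expected a non-empty request body");
  return;
}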

Karl


On Mon, Oct 25, 2021 at 2:44 AM ritika jain 
wrote:

> Hi,
>
> I am getting Null pointer exceptions while creating a job programmatic
> approach via PHP.
> Can anybody suggest the reason for this?.
>
>Error 500 Server Error 
> HTTP ERROR 500 Problem accessing
> /mcf-api-service/json/jobs. Reason:  Server ErrorCaused
> by:java.lang.NullPointerException at
> org.apache.manifoldcf.agents.system.ManifoldCF.findConfigurationNode(ManifoldCF.java:208)
> at
> org.apache.manifoldcf.crawler.system.ManifoldCF.apiPostJob(ManifoldCF.java:3539)
> at
> org.apache.manifoldcf.crawler.system.ManifoldCF.executePostCommand(ManifoldCF.java:3585)
> at
> org.apache.manifoldcf.apiservlet.APIServlet.executePost(APIServlet.java:576)
> at org.apache.manifoldcf.apiservlet.APIServlet.doPost(APIServlet.java:175)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at
> javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769) at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
> at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
> at org.eclipse.jetty.server.Server.handle(Server.java:497) at
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311) at
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
> at
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)
> at java.lang.Thread.run(Thread.java:748)  Powered by
> Jetty://  
>
>


Windows Shares job-Limit on defining no of paths

2021-10-25 Thread ritika jain
Hi,
Is there any limit on the number of paths we can define in a job using
Windows Shares as the repository and ES as the output?

Thanks


Null Pointer Exception

2021-10-25 Thread ritika jain
Hi,

I am getting a Null Pointer Exception while creating a job programmatically
via PHP.
Can anybody suggest the reason for this?

Error 500 Server Error
HTTP ERROR 500 Problem accessing /mcf-api-service/json/jobs. Reason: Server Error
Caused by: java.lang.NullPointerException at
org.apache.manifoldcf.agents.system.ManifoldCF.findConfigurationNode(ManifoldCF.java:208)
at
org.apache.manifoldcf.crawler.system.ManifoldCF.apiPostJob(ManifoldCF.java:3539)
at
org.apache.manifoldcf.crawler.system.ManifoldCF.executePostCommand(ManifoldCF.java:3585)
at
org.apache.manifoldcf.apiservlet.APIServlet.executePost(APIServlet.java:576)
at org.apache.manifoldcf.apiservlet.APIServlet.doPost(APIServlet.java:175)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at
javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769) at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:497) at
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311) at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
at
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)
at java.lang.Thread.run(Thread.java:748)  Powered by
Jetty://  


Re: Error: Repeated service interruptions - failure processing document: Read timed out

2021-09-30 Thread Karl Wright
Hi,

You say this is a "Tika error".  Is this Tika as a stand-alone service?  I
do not recognize any ManifoldCF classes whatsoever in this thread dump.

If this is Tika, I suggest contacting the Tika team.

Karl


On Thu, Sep 30, 2021 at 3:02 AM Bisonti Mario 
wrote:

> Additional info.
>
>
>
> I am using 2.17-dev version
>
>
>
>
>
>
>
> *From:* Bisonti Mario
> *Sent:* Tuesday, 28 September 2021 17:01
> *To:* user@manifoldcf.apache.org
> *Subject:* Error: Repeated service interruptions - failure processing
> document: Read timed out
>
>
>
> Hello
>
>
>
> I have an error on a job that parses a network folder.
>
>
>
> This is the Tika error:
> 2021-09-28 16:14:50 INFO  Server:415 - Started @1367ms
>
> 2021-09-28 16:14:50 WARN  ContextHandler:1671 - Empty contextPath
>
> 2021-09-28 16:14:50 INFO  ContextHandler:916 - Started
> o.e.j.s.h.ContextHandler@3dd69f5a{/,null,AVAILABLE}
>
> 2021-09-28 16:14:50 INFO  TikaServerCli:413 - Started Apache Tika server
> at http://sengvivv02.vimar.net:9998/
>
> 2021-09-28 16:15:04 INFO  MetadataResource:484 - meta (application/pdf)
>
> 2021-09-28 16:26:46 INFO  MetadataResource:484 - meta (application/pdf)
>
> 2021-09-28 16:26:46 INFO  TikaResource:484 - tika (application/pdf)
>
> 2021-09-28 16:27:23 INFO  MetadataResource:484 - meta (application/pdf)
>
> 2021-09-28 16:27:24 INFO  TikaResource:484 - tika (application/pdf)
>
> 2021-09-28 16:27:26 INFO  MetadataResource:484 - meta (application/pdf)
>
> 2021-09-28 16:27:26 INFO  TikaResource:484 - tika (application/pdf)
>
> 2021-09-28 16:30:28 WARN  PhaseInterceptorChain:468 - Interceptor for {
> http://resource.server.tika.apache.org/}MetadataResource has thrown
> exception, unwinding now
>
> org.apache.cxf.interceptor.Fault: Could not send Message.
>
> at
> org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor.handleMessage(MessageSenderInterceptor.java:67)
>
> at
> org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
>
> at
> org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:90)
>
> at
> org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
>
> at
> org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
>
> at
> org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265)
>
> at
> org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247)
>
> at
> org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79)
>
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
>
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1435)
>
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:190)
>
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1350)
>
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)
>
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>
> at org.eclipse.jetty.server.Server.handle(Server.java:516)
>
> at
> org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
>
> at
> org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
>
> at
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
>
> at
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
>
> at
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
>
> at
> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
>
> at
> org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
>
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)
>
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)
>
> at java.base/java.lang.Thread.run(Thread.java:834)
>
> Caused by: org.eclipse.jetty.io.EofException
>
> at
> org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279)
>
> at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:422)
>
> at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:277)
>
> at
> org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:381)
>
> at
> org.eclipse.jetty.server.HttpConnection$SendCallback.process(HttpConnection.java:826)
>
> at
> 

Re: Error: Repeated service interruptions - failure processing document: Read timed out

2021-09-30 Thread Bisonti Mario
Additional info.

I am using 2.17-dev version



From: Bisonti Mario
Sent: Tuesday, 28 September 2021 17:01
To: user@manifoldcf.apache.org
Subject: Error: Repeated service interruptions - failure processing document:
Read timed out

Hello

I have an error on a job that parses a network folder.

This is the Tika error:
2021-09-28 16:14:50 INFO  Server:415 - Started @1367ms
2021-09-28 16:14:50 WARN  ContextHandler:1671 - Empty contextPath
2021-09-28 16:14:50 INFO  ContextHandler:916 - Started 
o.e.j.s.h.ContextHandler@3dd69f5a{/,null,AVAILABLE}
2021-09-28 16:14:50 INFO  TikaServerCli:413 - Started Apache Tika server at 
http://sengvivv02.vimar.net:9998/
2021-09-28 16:15:04 INFO  MetadataResource:484 - meta (application/pdf)
2021-09-28 16:26:46 INFO  MetadataResource:484 - meta (application/pdf)
2021-09-28 16:26:46 INFO  TikaResource:484 - tika (application/pdf)
2021-09-28 16:27:23 INFO  MetadataResource:484 - meta (application/pdf)
2021-09-28 16:27:24 INFO  TikaResource:484 - tika (application/pdf)
2021-09-28 16:27:26 INFO  MetadataResource:484 - meta (application/pdf)
2021-09-28 16:27:26 INFO  TikaResource:484 - tika (application/pdf)
2021-09-28 16:30:28 WARN  PhaseInterceptorChain:468 - Interceptor for 
{http://resource.server.tika.apache.org/}MetadataResource has thrown exception, 
unwinding now
org.apache.cxf.interceptor.Fault: Could not send Message.
at 
org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor.handleMessage(MessageSenderInterceptor.java:67)
at 
org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
at 
org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:90)
at 
org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
at 
org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
at 
org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265)
at 
org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247)
at 
org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1435)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:190)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1350)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.Server.handle(Server.java:516)
at 
org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.eclipse.jetty.io.EofException
at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279)
at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:422)
at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:277)
at 
org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:381)
at 
org.eclipse.jetty.server.HttpConnection$SendCallback.process(HttpConnection.java:826)
at 
org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241)
at 
org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:223)
at org.eclipse.jetty.server.HttpConnection.send(HttpConnection.java:550)
at 
org.eclipse.jetty.server.HttpChannel.sendResponse(HttpChannel.java:915)
at org.eclipse.jetty.server.HttpChannel.write(HttpChannel.java:987)
at org.eclipse.jetty.server.HttpOutput.channelWrite(HttpOutput.java:285)
at 

Error: Repeated service interruptions - failure processing document: Read timed out

2021-09-28 Thread Bisonti Mario
Hello

I have an error on a job that parses a network folder.

This is the Tika error:
2021-09-28 16:14:50 INFO  Server:415 - Started @1367ms
2021-09-28 16:14:50 WARN  ContextHandler:1671 - Empty contextPath
2021-09-28 16:14:50 INFO  ContextHandler:916 - Started 
o.e.j.s.h.ContextHandler@3dd69f5a{/,null,AVAILABLE}
2021-09-28 16:14:50 INFO  TikaServerCli:413 - Started Apache Tika server at 
http://sengvivv02.vimar.net:9998/
2021-09-28 16:15:04 INFO  MetadataResource:484 - meta (application/pdf)
2021-09-28 16:26:46 INFO  MetadataResource:484 - meta (application/pdf)
2021-09-28 16:26:46 INFO  TikaResource:484 - tika (application/pdf)
2021-09-28 16:27:23 INFO  MetadataResource:484 - meta (application/pdf)
2021-09-28 16:27:24 INFO  TikaResource:484 - tika (application/pdf)
2021-09-28 16:27:26 INFO  MetadataResource:484 - meta (application/pdf)
2021-09-28 16:27:26 INFO  TikaResource:484 - tika (application/pdf)
2021-09-28 16:30:28 WARN  PhaseInterceptorChain:468 - Interceptor for 
{http://resource.server.tika.apache.org/}MetadataResource has thrown exception, 
unwinding now
org.apache.cxf.interceptor.Fault: Could not send Message.
at 
org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor.handleMessage(MessageSenderInterceptor.java:67)
at 
org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
at 
org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:90)
at 
org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
at 
org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
at 
org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265)
at 
org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247)
at 
org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1435)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:190)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1350)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.Server.handle(Server.java:516)
at 
org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.eclipse.jetty.io.EofException
at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279)
at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:422)
at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:277)
at 
org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:381)
at 
org.eclipse.jetty.server.HttpConnection$SendCallback.process(HttpConnection.java:826)
at 
org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241)
at 
org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:223)
at org.eclipse.jetty.server.HttpConnection.send(HttpConnection.java:550)
at 
org.eclipse.jetty.server.HttpChannel.sendResponse(HttpChannel.java:915)
at org.eclipse.jetty.server.HttpChannel.write(HttpChannel.java:987)
at org.eclipse.jetty.server.HttpOutput.channelWrite(HttpOutput.java:285)
at org.eclipse.jetty.server.HttpOutput.close(HttpOutput.java:638)
at 
org.apache.cxf.transport.http_jetty.JettyHTTPDestination$JettyOutputStream.close(JettyHTTPDestination.java:329)
at 

ApacheCon starts tomorrow!

2021-09-20 Thread Rich Bowen
ApacheCon @Home starts tomorrow! Details at 
https://www.apachecon.com/acah2021/index.html


(Note: You're receiving this because you are subscribed to one or more 
user lists for Apache Software Foundation projects.)


We've got three days of great content lined up for you, spanning 14 
project communities. And we're very excited about the keynotes, with 
presentations from David Nalley, Ashley Wolfe, Mark Cox, Alison Parker, 
and Michael Weinberg. And we'll be hearing from our Platinum sponsors in 
their keynotes as well! (Schedule is at 
https://www.apachecon.com/acah2021/tracks/)


You can still register today, at 
https://www.apachecon.com/acah2021/register.html


We especially want to thank our sponsors, who have made this event possible:

Strategic sponsor: Google
Platinum sponsors: Huawei, Tencent, Instaclustr, and Apple
Gold sponsors: AWS, Aiven, Gradle, Replicated, Red 
Hat, Baidu, Fiter, Cerner, Dremio, and Didi
Silver sponsors: Bamboo, SphereEx, Microsoft, Imply, Securonix, DataStax, 
and Crafter Software

Bronze sponsor: Technical Arts

Please join us on our Slack for discussion before, during, and after the 
event! http://s.apache.org/apachecon-slack


And follow us on Twitter - https://twitter.com/apachecon - for tips and 
announcements during the event.


See you tomorrow!


--
Rich Bowen, VP Conferences
The Apache Software Foundation
https://apachecon.com/
@apachecon

