Notification connector error

2021-05-11 Thread julien.massiera
Hello,

 

I am trying to use an email notification connector, but without success. When
the connector tries to send an email, I keep getting the following error:

 

Email: Error sending email: Could not convert socket to TLS

javax.mail.MessagingException: Could not convert socket to TLS
    at com.sun.mail.smtp.SMTPTransport.startTLS(SMTPTransport.java:1918) ~[mail-1.4.5.jar:1.4.5]
    at com.sun.mail.smtp.SMTPTransport.protocolConnect(SMTPTransport.java:652) ~[mail-1.4.5.jar:1.4.5]
    at javax.mail.Service.connect(Service.java:317) ~[mail-1.4.5.jar:1.4.5]
    at javax.mail.Service.connect(Service.java:176) ~[mail-1.4.5.jar:1.4.5]
    at javax.mail.Service.connect(Service.java:125) ~[mail-1.4.5.jar:1.4.5]
    at javax.mail.Transport.send0(Transport.java:194) ~[mail-1.4.5.jar:1.4.5]
    at javax.mail.Transport.send(Transport.java:124) ~[mail-1.4.5.jar:1.4.5]
    at org.apache.manifoldcf.crawler.notifications.email.EmailSession.send(EmailSession.java:112) ~[?:?]
    at org.apache.manifoldcf.crawler.notifications.email.EmailConnector$SendThread.run(EmailConnector.java:963) ~[?:?]
Caused by: javax.net.ssl.SSLHandshakeException: No appropriate protocol (protocol is disabled or cipher suites are inappropriate)
    at sun.security.ssl.HandshakeContext.<init>(HandshakeContext.java:170) ~[?:?]
    at sun.security.ssl.ClientHandshakeContext.<init>(ClientHandshakeContext.java:98) ~[?:?]
    at sun.security.ssl.TransportContext.kickstart(TransportContext.java:221) ~[?:?]
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:433) ~[?:?]
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:411) ~[?:?]
    at com.sun.mail.util.SocketFetcher.configureSSLSocket(SocketFetcher.java:548) ~[mail-1.4.5.jar:1.4.5]
    at com.sun.mail.util.SocketFetcher.startTLS(SocketFetcher.java:485) ~[mail-1.4.5.jar:1.4.5]
    at com.sun.mail.smtp.SMTPTransport.startTLS(SMTPTransport.java:1913) ~[mail-1.4.5.jar:1.4.5]
    ... 8 more

 

 

The connector is configured with the Gmail SMTP server, using the configuration
recommended by the documentation:

 

Hostname: smtp.gmail.com

Port: 587

 

Configuration properties:

mail.smtp.ssl.trust : smtp.gmail.com

mail.smtp.starttls.enable : true

mail.smtp.auth : true

 

 

The username and password I use are correct. I also tried with the
Office 365 SMTP server and I get the same error.

 

I am using openjdk version "11.0.11" 2021-04-20. Do you have any idea what
could be causing this?
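
One possible reading of the "No appropriate protocol" cause, offered as an assumption rather than a confirmed diagnosis: OpenJDK 11.0.11 disables TLS 1.0 and 1.1 by default, and an old JavaMail release such as 1.4.5 may offer only one of those versions during STARTTLS. The sketch below uses only JDK classes (the class and method names are hypothetical) to print which protocol versions the running JVM enables by default and what the jdk.tls.disabledAlgorithms security property contains.

```java
import java.security.NoSuchAlgorithmException;
import java.security.Security;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;

public class TlsDiagnostics {

    /** Protocol versions the JVM enables by default for TLS connections. */
    static List<String> enabledByDefault() {
        try {
            return Arrays.asList(
                SSLContext.getDefault().getDefaultSSLParameters().getProtocols());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }

    /** Raw value of the JDK security property listing disabled protocols and algorithms. */
    static String disabledAlgorithms() {
        return Security.getProperty("jdk.tls.disabledAlgorithms");
    }

    public static void main(String[] args) {
        System.out.println("Enabled by default: " + enabledByDefault());
        System.out.println("jdk.tls.disabledAlgorithms: " + disabledAlgorithms());
    }
}
```

If TLSv1 and TLSv1.1 show up in the disabled list, one thing worth trying (to be verified against the JavaMail version in use) is adding a configuration property such as mail.smtp.ssl.protocols : TLSv1.2 to the connector.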

 

Julien

 



RE: Password admin UI

2020-12-17 Thread julien.massiera
I should mention that I used the obfuscation method provided by 
org.apache.manifoldcf.core.system.ManifoldCF.obfuscate(String) and set the 
obfuscated password in the org.apache.manifoldcf.login.password.obfuscated and 
org.apache.manifoldcf.apilogin.password.obfuscated properties of the 
properties.xml file.

 

I can also confirm that I passed the password to the obfuscate method as UTF-8, 
and that the deobfuscate method returns the right password, including its 
non-ASCII characters.

 

Julien

 

From: Karl Wright  
Sent: Wednesday, December 16, 2020 19:40
To: user@manifoldcf.apache.org
Subject: Re: Password admin UI

 

Hi Julien,
The properties file is read as UTF-8, so as long as you make sure that the 
encoding in your editor is UTF-8, it should work.

Many editors default to the Microsoft code page, so use something like SciTE or 
Emacs.
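
The encoding mismatch described above can be reproduced with plain JDK classes. This is only a sketch: the password is a made-up example, the class and method names are hypothetical, and windows-1252 stands in for "the Microsoft code page". It simulates a value saved by an editor in one charset and read back as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PasswordEncodingDemo {

    /** Simulates text saved to a file in one charset and read back in another. */
    static String saveThenRead(String text, Charset savedAs, Charset readAs) {
        return new String(text.getBytes(savedAs), readAs);
    }

    public static void main(String[] args) {
        // Hypothetical password with accented characters, written with
        // Unicode escapes so this source file itself is encoding-independent.
        String password = "p\u00e4ssw\u00f6rd\u00e9";
        Charset cp1252 = Charset.forName("windows-1252");

        // Saved and read as UTF-8: the password survives.
        System.out.println(
            saveThenRead(password, StandardCharsets.UTF_8, StandardCharsets.UTF_8)
                .equals(password)); // true

        // Saved as windows-1252 but read as UTF-8: the accented bytes are
        // not valid UTF-8, so the decoded value no longer matches.
        System.out.println(
            saveThenRead(password, cp1252, StandardCharsets.UTF_8)
                .equals(password)); // false
    }
}
```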


Karl

 


 



Password admin UI

2020-12-16 Thread julien.massiera
Hi,

 

I tried different types of passwords for the admin UI, and it appears that
passwords containing accented characters or special characters do not
work. Is this expected or not?

 

Regards,

Julien 

 



RE: Web connector login sequence

2020-06-02 Thread julien.massiera
Hi Karl,

 

Thanks for your answer. 

 

The login sequence I configured was the problem, but not because some parts were 
missing. The main problem was that I entered the same regular expression for 
two different login page types: a login form page and a redirect page. 
I did not check the code, but it seems that the connector stores the login 
sequence in a HashMap with the login regex as the key. So my redirect rule 
“other-site\/cas\/login = redirect” was overridden by the form rule 
“other-site\/cas\/login = form”. This is why, in the debug log, the other-site 
302 response was not recognized by the login sequence.
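
The behaviour suspected above can be illustrated with a plain java.util.HashMap. This sketch does not reproduce the Web connector's actual code (which, as said, was not checked); the class and method names are hypothetical, and it only shows what happens to two rules that share the same regex if they are stored in a map keyed by that regex.

```java
import java.util.HashMap;
import java.util.Map;

public class LoginRuleMapDemo {

    /** Stores {regex, type} rules in a map keyed by regex (the suspected layout). */
    static Map<String, String> storeRules(String[][] rules) {
        Map<String, String> byRegex = new HashMap<>();
        for (String[] rule : rules) {
            // A later put with an equal key replaces the earlier value.
            byRegex.put(rule[0], rule[1]);
        }
        return byRegex;
    }

    public static void main(String[] args) {
        String[][] sameRegex = {
            {"site\\/login", "redirect"},
            {"other-site\\/cas\\/login", "redirect"},
            {"other-site\\/cas\\/login", "form"},
        };
        String[][] distinctRegex = {
            {"site\\/login", "redirect"},
            {"other-site\\/cas\\/logi", "redirect"},
            {"other-site\\/cas\\/login", "form"},
        };

        Map<String, String> stored = storeRules(sameRegex);
        System.out.println(stored.size());                            // 2
        System.out.println(stored.get("other-site\\/cas\\/login"));   // form
        System.out.println(storeRules(distinctRegex).size());         // 3
    }
}
```

With identical regexes only two of the three rules survive, and the redirect rule for other-site is the one that is lost.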

 

I have modified the two rules so that the regexes are different, and it works!

 

I hope my use case will help other people if they encounter the same problem.

 

Note that what I implemented feels to me more like a workaround than a 
solution. Let me explain: I was able to differentiate the regex rules by 
removing a letter from one of them:
“other-site\/cas\/logi = redirect” vs “other-site\/cas\/login = form”. But this 
does not feel like a “clean” solution.

 

Regards,
Julien

 



 

From: Karl Wright  
Sent: Friday, May 29, 2020 22:32
To: user@manifoldcf.apache.org
Subject: Re: Web connector login sequence

 

Hi Julien,

The login sequence must include all parts of the login sequence, from 
initiation (the first 302 that you get when you load /site) all the way through 
to the last action that sets the cookie.  After the login sequence is 
completed, the /site URL will be fetched again.  If you need more than one 
fetch to set more than one cookie, ALL the fetches must match your description 
of the login sequence or it will abort early.  If the cookie gets set on a 
final redirection, be sure to include that redirection too.

 

Karl

 

 


 



Web connector login sequence

2020-05-29 Thread julien.massiera
Hi MCF community,

 

I need some help with the configuration of a login sequence with the Web
connector. Here is the login sequence as seen in a web browser:

 

GET site/

302 -> site/login

302 -> other-site/cas/login

401 other-site/cas/login

POST other-site/cas/login (set cookie)

302 -> site/login?param1=value (set cookie)

302 -> site/login?param1=value (set cookie)

302 -> site/

 

I tested the following configuration:

 

Session: site

  site\/login = redirect

  other-site\/cas\/login = redirect

  other-site\/cas\/login = form

    username=john

    password=***

 

This configuration works up to the form POST. After the POST, the job
correctly retrieves the first cookie, but then it ends up in an infinite
loop. Here are the debug logs:

 

..

DEBUG 2020-05-29T15:07:25,560 (Worker thread '11') - MCF|MCF-agent|apache.manifoldcf.connectors|WEB: For https://other-site/cas/login, setting virtual host to other-site

DEBUG 2020-05-29T15:07:25,560 (Worker thread '11') - MCF|MCF-agent|apache.manifoldcf.connectors|WEB: Got an HttpClient object after 1 ms.

DEBUG 2020-05-29T15:07:25,560 (Worker thread '11') - MCF|MCF-agent|apache.manifoldcf.connectors|WEB: Post method for '/cas/login'

...

DEBUG 2020-05-29T15:07:18,442 (Worker thread '11') - MCF|MCF-agent|apache.manifoldcf.connectors|WEB: Retrieving cookies...

DEBUG 2020-05-29T15:07:18,442 (Worker thread '11') - MCF|MCF-agent|apache.manifoldcf.connectors|WEB:   Cookie '[version: 0]xx

INFO 2020-05-29T15:07:18,448 (Worker thread '11') - MCF|MCF-agent|apache.manifoldcf.connectors|WEB: FETCH LOGIN|https://other-site/cas/login|1590764838416+31|302|0|

DEBUG 2020-05-29T15:07:18,448 (Worker thread '11') - MCF|MCF-agent|apache.manifoldcf.connectors|WEB: Document 'https://other-site/cas/login' did not match expected form, link, redirection, or content for sequence 'site'

..

 

It seems that the job does not take the redirection after the form POST into
account, but I don't know why. After that, there is an infinite loop: the
cookie is sent with the GET "site/login", which redirects to
"other-site/login", but this time, when "other-site/login" receives the cookie
in the request, it responds with a 200 OK instead of a 302 redirect.

 

I don't know why it behaves this way, and I would be glad to have your
advice!

 

Thanks for your help

 

Julien

 



Re: Job Multiple Outputs

2019-09-10 Thread julien.massiera
Thanks for your answer Karl. I was unsure about that concerning the output 
connections but it is still the same pipeline after all.
Original message
From: Karl Wright
Date: 10/09/2019 20:08 (GMT+01:00)
To: user@manifoldcf.apache.org
Subject: Re: Job Multiple Outputs

Hi Julien,

You must understand that a job with a complex pipeline is really not running N 
independent jobs; it's running ONE job.  Every document is processed through 
the pipeline only once.  The pipeline may have faster components and slower 
components; it doesn't matter: the document takes the sum total of the time all 
components need to fetch and process the document.

Karl

On Tue, Sep 10, 2019 at 12:48 PM Julien Massiera wrote:
  

  
  
OK, so to be sure I understood what you are saying:

Suppose a job has two output connections, and one of the outputs
  indexes documents twice as fast as the other. At a given time t,
  both outputs will have indexed the same number of documents,
  regardless of which one is faster.
  In other words: the faster output will not have indexed all the
  crawled documents while the slower one still has half of them
  left to index.

Am I wrong?

On 10/09/2019 18:09, Karl Wright wrote:


  
The output connection contract is that a request to index is made to the
connector, and the connector returns when it is done.

When there are multiple output connections, these are each handed a copy of
the document, one after the other, and told to index it.  This is all done by
one worker thread.  Multiple worker threads are not used for multiple outputs
of the same document.

The framework is smart enough to not hand a document to a connector if it
hasn't changed (according to how the connector computes the
connector-specific output version string).
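
The hand-off described above can be modelled in a few lines. This is a simplified sketch, not ManifoldCF code: the Output class, its millisPerDocument cost, and the process method are hypothetical, and time is simulated by adding up per-document costs instead of actually indexing anything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SequentialOutputsDemo {

    /** Hypothetical stand-in for an output connection; records what it indexed. */
    static class Output {
        final String name;
        final long millisPerDocument; // simulated indexing cost
        final List<String> indexed = new ArrayList<>();

        Output(String name, long millisPerDocument) {
            this.name = name;
            this.millisPerDocument = millisPerDocument;
        }
    }

    /**
     * One worker hands each document to every output in turn and only then
     * moves on to the next document. Returns the simulated elapsed time.
     */
    static long process(List<String> docs, List<Output> outputs) {
        long elapsed = 0;
        for (String doc : docs) {
            for (Output out : outputs) {
                out.indexed.add(doc);             // "index" returns when done
                elapsed += out.millisPerDocument; // costs add up, they do not overlap
            }
        }
        return elapsed;
    }

    public static void main(String[] args) {
        Output fast = new Output("fast", 10);
        Output slow = new Output("slow", 20);
        long elapsed = process(Arrays.asList("a", "b", "c", "d"),
                               Arrays.asList(fast, slow));
        // 4 documents * (10 + 20) ms = 120 ms; both outputs hold all 4 documents.
        System.out.println(elapsed + " " + fast.indexed.size() + " " + slow.indexed.size());
    }
}
```

In this model the faster output never runs ahead by more than the one document currently in flight, and the total time per document is the sum of both outputs' costs.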


Karl


  
  
  
On Tue, Sep 10, 2019 at 11:00 AM Julien Massiera wrote:

Hi,
  
  I would like an explanation of how a job behaves when several outputs
  are configured. My main question is: for each output, how is document
  ingestion managed? More precisely, are the ingestion processes
  synchronized or not? (In other words, does the ingestion of the next
  document wait until the current document has been ingested by both
  outputs?) Also, if one output is configured to send a commit at the end
  of the job, is this commit held until the last ingestion has occurred
  in the other output?
  
  Thanks for your help,
  Julien

  

-- 
Julien MASSIERA
Product Development Director
France Labs – The Search Experts
Datafari – Winner of the Big Data 2018 trophy at the Digital Innovation Makers 
Summit
www.francelabs.com