[jira] [Comment Edited] (CONNECTORS-1529) Add "url" output element to ES Output Connector (required when used with the Web Repository Connector)

2018-11-01 Thread Steph van Schalkwyk (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672487#comment-16672487
 ] 

Steph van Schalkwyk edited comment on CONNECTORS-1529 at 11/2/18 1:56 AM:
--

I added it as an addField in the Web Connector, and allowed for it to be 
renamed in the Elasticsearch Connector. 
So it adds "documentId": "http://localhost:8000/10.pdf" to the metadata 
output by the Web Connector, but as an addField, so it doesn't break anything 
(at least in theory, unless there is a metadata rename to the same field name 
later in the pipeline).
Let me know if this works for you.

Its value is always lowercased as per Locale.ROOT.
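
A rough illustration of the behaviour described in this comment, written in Python rather than the connector's Java. It is only a sketch under stated assumptions: add_document_id and apply_renames are hypothetical helpers, the metadata is modelled as a plain dict, and str.lower() stands in for locale-independent lowercasing.

```python
def add_document_id(metadata, url):
    """Return a copy of metadata with the URL added as a new 'documentId' field."""
    enriched = dict(metadata)
    # Only a new key is added; fields already produced upstream are untouched.
    enriched["documentId"] = url.lower()
    return enriched


def apply_renames(metadata, renames):
    """Rename fields per {old: new}; a rename onto an existing name overwrites it."""
    renamed = dict(metadata)
    for old, new in renames.items():
        if old in renamed:
            renamed[new] = renamed.pop(old)
    return renamed


# Hypothetical input; the upper-case extension only demonstrates the lowercasing.
doc = add_document_id({"title": "Microsoft Word - 48428.doc"},
                      "http://localhost:8000/10.PDF")
# Renaming documentId to url in the output stage keeps the value intact.
as_url = apply_renames(doc, {"documentId": "url"})
# A later rename that targets the same field name silently replaces the value.
clobbered = apply_renames(doc, {"title": "documentId"})
```

The last call shows the caveat mentioned above: a rename onto "documentId" later in the pipeline would replace the URL.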



was (Author: svanschalkwyk):
I added it as an addField in the Web Connector, and allowed for it to be 
renamed in the Elasticsearch Connector. 
So it adds "documentId": "http://localhost:8000/10.pdf" to the metadata 
output by the Web Connector, but as an addField, so it doesn't break anything 
(at least in theory, unless there is a metadata rename to the same field name 
later in the pipeline).
Let me know if this works for you.


> Add "url" output element to ES Output Connector (required when used with the 
> Web Repository Connector)
> --
>
> Key: CONNECTORS-1529
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1529
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Elastic Search connector
> Affects Versions: ManifoldCF 2.10
> Reporter: Steph van Schalkwyk
> Assignee: Steph van Schalkwyk
> Priority: Major
> Fix For: ManifoldCF 2.12
>
> Attachments: elasticsearch.patch, image-2018-09-06-10-28-45-008.png
>
>
> Add "url" (copy of the _id field) to ES Output.
> ES no longer supports copying from _id (copy-to) in the schema.
> As per 
> !image-2018-09-06-10-28-45-008.png!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1529) Add "url" output element to ES Output Connector (required when used with the Web Repository Connector)

2018-11-01 Thread Steph van Schalkwyk (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672487#comment-16672487
 ] 

Steph van Schalkwyk commented on CONNECTORS-1529:
-

I added it as an addField in the Web Connector, and allowed for it to be 
renamed in the Elasticsearch Connector. 
So it adds "documentId": "http://localhost:8000/10.pdf" to the metadata 
output by the Web Connector, but as an addField, so it doesn't break anything 
(at least in theory, unless there is a metadata rename to the same field name 
later in the pipeline).
Let me know if this works for you.



[jira] [Commented] (CONNECTORS-1552) Apache ManifoldCF Elastic Connector for Basic Authorisation

2018-11-01 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672435#comment-16672435
 ] 

Karl Wright commented on CONNECTORS-1552:
-

Looks good, but I'd suggest making sure the text capitalization style is 
consistent with everything else in the connector.


> Apache ManifoldCF Elastic Connector for Basic Authorisation
> ---
>
> Key: CONNECTORS-1552
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1552
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Elastic Search connector
> Affects Versions: ManifoldCF 2.10
> Reporter: Krishna Agrawal
> Assignee: Steph van Schalkwyk
> Priority: Major
> Fix For: ManifoldCF 2.12
>
> Attachments: screenshot-1.png
>
>
> We are using Apache ManifoldCF to connect to Elasticsearch. Because our 
> Elasticsearch server is behind a protected URL, we are unable to connect from 
> the Admin console.
> If we remove the authentication, the connector works well, but we want to 
> access it by passing a username and password.
> Please guide us so that we can complete our setup.





[jira] [Commented] (CONNECTORS-1546) Optimize Elasticsearch performance by removing 'forcemerge'

2018-11-01 Thread Steph van Schalkwyk (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672431#comment-16672431
 ] 

Steph van Schalkwyk commented on CONNECTORS-1546:
-

Removed.

> Optimize Elasticsearch performance by removing 'forcemerge'
> ---
>
> Key: CONNECTORS-1546
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1546
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Elastic Search connector
> Reporter: Hans Van Goethem
> Assignee: Steph van Schalkwyk
> Priority: Major
>
> After crawling with ManifoldCF, forcemerge is applied to optimize the 
> Elasticsearch index. This optimization makes Elasticsearch faster for 
> read operations but not for write operations. On the contrary, performance of 
> write operations becomes worse after every forcemerge. 
> Can you remove this forcemerge in ManifoldCF to optimize performance for 
> recurrent crawling to Elasticsearch?
> If someone needs this forcemerge, it can be applied manually against 
> Elasticsearch directly.

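For anyone who still wants the merge, a manual call against Elasticsearch could look roughly like the sketch below. It assumes the index force merge endpoint (POST /<index>/_forcemerge with the max_num_segments parameter); the helper name build_forcemerge_request, the host and the index name are placeholders for illustration, and the request is only built here, not sent.

```python
import urllib.request


def build_forcemerge_request(base_url, index, max_num_segments=1):
    """Build (but do not send) a POST request for the index force merge API."""
    url = f"{base_url.rstrip('/')}/{index}/_forcemerge?max_num_segments={max_num_segments}"
    return urllib.request.Request(url, method="POST")


# Placeholder host and index name; adjust to the actual cluster.
request = build_forcemerge_request("http://localhost:9200", "index_cpt_all")
# To actually run the merge: urllib.request.urlopen(request)
```

Running it outside the crawl keeps the cost of the merge away from recurrent indexing jobs.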




[jira] [Updated] (CONNECTORS-1552) Apache ManifoldCF Elastic Connector for Basic Authorisation

2018-11-01 Thread Steph van Schalkwyk (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steph van Schalkwyk updated CONNECTORS-1552:

Attachment: screenshot-1.png


[jira] [Commented] (CONNECTORS-1552) Apache ManifoldCF Elastic Connector for Basic Authorisation

2018-11-01 Thread Steph van Schalkwyk (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672430#comment-16672430
 ] 

Steph van Schalkwyk commented on CONNECTORS-1552:
-

 !screenshot-1.png! 


[jira] [Issue Comment Deleted] (CONNECTORS-1552) Apache ManifoldCF Elastic Connector for Basic Authorisation

2018-11-01 Thread Steph van Schalkwyk (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steph van Schalkwyk updated CONNECTORS-1552:

Comment: was deleted

(was: !image-2018-11-01-20-00-35-913.png!)


[jira] [Commented] (CONNECTORS-1552) Apache ManifoldCF Elastic Connector for Basic Authorisation

2018-11-01 Thread Steph van Schalkwyk (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672428#comment-16672428
 ] 

Steph van Schalkwyk commented on CONNECTORS-1552:
-

!image-2018-11-01-20-00-35-913.png!


[jira] [Commented] (CONNECTORS-1552) Apache ManifoldCF Elastic Connector for Basic Authorisation

2018-11-01 Thread Steph van Schalkwyk (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672427#comment-16672427
 ] 

Steph van Schalkwyk commented on CONNECTORS-1552:
-

I have added username and userpassword to the ES connector. This allows the 
following usage:

http(s)://username:userpassword@localhost:9200

!image-2018-11-01-20-00-21-283.png!
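
For illustration only, credentials embedded in a URL of that form can be turned into an HTTP Basic Authorization header along these lines. This is a Python sketch, not the connector's Java implementation; basic_auth_from_url is a hypothetical helper, and the username and password are the placeholders from the URL above.

```python
import base64
from urllib.parse import urlsplit, unquote


def basic_auth_from_url(url):
    """Split credentials out of a URL; return (clean_url, headers)."""
    parts = urlsplit(url)
    # Rebuild the network location without the user info (IPv6 literals not handled).
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    clean_url = parts._replace(netloc=host).geturl()
    headers = {}
    if parts.username is not None:
        user = unquote(parts.username)
        password = unquote(parts.password or "")
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        headers["Authorization"] = f"Basic {token}"
    return clean_url, headers


url, headers = basic_auth_from_url("https://username:userpassword@localhost:9200")
```

The cleaned URL and the headers can then be passed to whichever HTTP client is in use; a URL without credentials yields an empty header dict.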


[jira] [Commented] (CONNECTORS-1529) Add "url" output element to ES Output Connector (required when used with the Web Repository Connector)

2018-11-01 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672425#comment-16672425
 ] 

Karl Wright commented on CONNECTORS-1529:
-

As long as it's a new field, seems that backwards compatibility is preserved, 
so I'm OK with it.



[jira] [Commented] (CONNECTORS-1529) Add "url" output element to ES Output Connector (required when used with the Web Repository Connector)

2018-11-01 Thread Steph van Schalkwyk (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672423#comment-16672423
 ] 

Steph van Schalkwyk commented on CONNECTORS-1529:
-

I have added the "documentId" metatag to the Web Connector:
 * "documentId": "http://localhost:8000/10.pdf"

Will this work for everybody?

Steph

 

 

 
 * "_index": "index_cpt_all",
 * "_type": "catalogline",
 * "_id": "http://localhost:8000/10.pdf",
 * "_version": 1,
 * "_score": 1,
 * "_source": {
 ** "date": "2005-05-05T21:19:55Z",
 ** "pdf:PDFVersion": "1.3",
 ** "pdf:docinfo:title": "Microsoft Word - 48428.doc",
 ** "xmp:CreatorTool": "PScript5.dll Version 5.2",
 ** "Server": "SimpleHTTP/0.6 Python/3.5.2",
 ** "access_permission:modify_annotations": "true",
 ** "access_permission:can_print_degraded": "true",
 ** "dc:creator": "edocslib",
 ** "dcterms:created": "2005-05-05T21:19:55Z",
 ** "Last-Modified": "2005-05-05T21:19:55Z",
 ** "dcterms:modified": "2005-05-05T21:19:55Z",
 ** "dc:format": "application/pdf; version=1.3",
 ** "title": "Microsoft Word - 48428.doc",
 ** "Last-Save-Date": "2005-05-05T21:19:55Z",
 ** "pdf:docinfo:creator_tool": "PScript5.dll Version 5.2",
 ** "access_permission:fill_in_form": "true",
 ** "pdf:docinfo:modified": "2005-05-05T21:19:55Z",
 ** "stream_name": "10.pdf",
 ** "meta:save-date": "2005-05-05T21:19:55Z",
 ** "pdf:encrypted": "false",
 ** "dc:title": "Microsoft Word - 48428.doc",
 ** "modified": "2005-05-05T21:19:55Z",
 ** "Content-Length": "120441",
 ** "Content-Type": "application/pdf",
 ** "stream_size": "120441",
 ** "pdf:docinfo:creator": "edocslib",
 ** "X-Parsed-By": "org.apache.tika.parser.DefaultParser",
 ** "creator": "edocslib",
 ** "meta:author": "edocslib",
 ** "meta:creation-date": "2005-05-05T21:19:55Z",
 ** "created": "Thu May 05 16:19:55 CDT 2005",
 ** "documentId": "http://localhost:8000/10.pdf",
 ** "access_permission:extract_for_accessibility": "true",
 ** "access_permission:assemble_document": "true",
 ** "xmpTPg:NPages": "4",
 ** "Creation-Date": "2005-05-05T21:19:55Z",
 ** "resourceName": "10.pdf",
 ** "access_permission:extract_content": "true",
 ** "access_permission:can_print": "true",
 ** "Content-type": "application/pdf",
 ** "Author": "edocslib",
 ** "producer": "Acrobat Distiller 5.0 (Windows)",
 ** "access_permission:can_modify": "true",
 ** "pdf:docinfo:producer": "Acrobat Distiller 5.0 (Windows)",
 ** "pdf:docinfo:created": "2005-05-05T21:19:55Z",
 ** "indexed": "2018-11-02T00:50:48.053+",
 ** "mime-type": "application/pdf",
 ** "allow_token_document": "__nosecurity__",
 ** "deny_token_document": "__nosecurity__",
 ** "allow_token_share": "__nosecurity__",
 ** "deny_token_share": "__nosecurity__",
 ** "allow_token_parent": "__nosecurity__",
 ** "deny_token_parent": "__nosecurity__",
 ** "content": " Federal Communications Commission DA 05