[jira] [Comment Edited] (CONNECTORS-1529) Add "url" output element to ES Output Connector (required when used with the Web Repository Connector)
[ https://issues.apache.org/jira/browse/CONNECTORS-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672487#comment-16672487 ]

Steph van Schalkwyk edited comment on CONNECTORS-1529 at 11/2/18 1:56 AM:
--------------------------------------------------------------------------

I added it as an addField in the Web Connector. It can then be renamed in the Elasticsearch Connector.
So it adds "documentId": "http://localhost:8000/10.pdf" to the metadata output by the Web Connector, but as an addField, so it doesn't break anything (at least in theory, unless there is a metadata rename with the same field name later in the pipeline).
Let me know if this works for you.
Its value is always lowercased as per Locale.ROOT.

was (Author: svanschalkwyk):
I added it as an addField in the Web Connector. It can then be renamed in the Elasticsearch Connector.
So it adds "documentId": "http://localhost:8000/10.pdf" to the metadata output by the Web Connector, but as an addField, so it doesn't break anything (at least in theory, unless there is a metadata rename with the same field name later in the pipeline).
Let me know if this works for you.

> Add "url" output element to ES Output Connector (required when used with the Web Repository Connector)
> ------------------------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1529
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1529
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Elastic Search connector
>    Affects Versions: ManifoldCF 2.10
>            Reporter: Steph van Schalkwyk
>            Assignee: Steph van Schalkwyk
>            Priority: Major
>             Fix For: ManifoldCF 2.12
>
>         Attachments: elasticsearch.patch, image-2018-09-06-10-28-45-008.png
>
>
> Add "url" (copy of the _id field) to ES Output.
> ES no longer supports copying from _id (copy_to) in the schema.
> As per
> !image-2018-09-06-10-28-45-008.png!

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
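The approach described in this comment (expose the document URL as an extra metadata field, then optionally rename it further down the pipeline) can be sketched roughly as follows. This is only an illustration in Python using plain dictionaries, not ManifoldCF code: `add_field`, `rename_field`, and the target name `url` are hypothetical, and Python's locale-independent `str.lower()` stands in for the Java-side lowercasing mentioned in the comment.

```python
def add_field(metadata, name, value):
    """Return a copy of `metadata` with one extra field; never overwrite."""
    if name in metadata:
        raise KeyError(f"field {name!r} already exists")
    updated = dict(metadata)
    updated[name] = value
    return updated


def rename_field(metadata, old, new):
    """Return a copy with `old` renamed to `new`; refuse to clobber `new`."""
    if old not in metadata:
        return dict(metadata)
    if new in metadata:
        raise KeyError(f"cannot rename {old!r}: {new!r} already exists")
    updated = {key: val for key, val in metadata.items() if key != old}
    updated[new] = metadata[old]
    return updated


# Repository side (hypothetical): add the document URL as "documentId".
doc_url = "http://localhost:8000/10.pdf"
web_output = add_field({"stream_name": "10.pdf"}, "documentId", doc_url.lower())

# Output side (hypothetical): rename it to the field the index expects.
es_document = rename_field(web_output, "documentId", "url")
```

Because both helpers refuse to overwrite an existing key, the collision the comment warns about (a later rename onto the same field name) surfaces as an error in this sketch instead of silently replacing data.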
[jira] [Commented] (CONNECTORS-1529) Add "url" output element to ES Output Connector (required when used with the Web Repository Connector)
[ https://issues.apache.org/jira/browse/CONNECTORS-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672487#comment-16672487 ]

Steph van Schalkwyk commented on CONNECTORS-1529:
-------------------------------------------------

I added it as an addField in the Web Connector. It can then be renamed in the Elasticsearch Connector.
So it adds "documentId": "http://localhost:8000/10.pdf" to the metadata output by the Web Connector, but as an addField, so it doesn't break anything (at least in theory, unless there is a metadata rename with the same field name later in the pipeline).
Let me know if this works for you.
[jira] [Commented] (CONNECTORS-1552) Apache ManifoldCF Elastic Connector for Basic Authorisation
[ https://issues.apache.org/jira/browse/CONNECTORS-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672435#comment-16672435 ]

Karl Wright commented on CONNECTORS-1552:
-----------------------------------------

Looks good, but I'd suggest making sure the text capitalization style is consistent with everything else in the connector.

> Apache ManifoldCF Elastic Connector for Basic Authorisation
> -----------------------------------------------------------
>
>                 Key: CONNECTORS-1552
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1552
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Elastic Search connector
>    Affects Versions: ManifoldCF 2.10
>            Reporter: Krishna Agrawal
>            Assignee: Steph van Schalkwyk
>            Priority: Major
>             Fix For: ManifoldCF 2.12
>
>         Attachments: screenshot-1.png
>
>
> We are using Apache ManifoldCF to connect to Elasticsearch. Because our
> Elasticsearch server sits behind a protected URL, we are unable to connect
> from the Admin console.
> If we remove the authentication, the connector works well, but we want to
> access it by passing a username and password.
> Please guide us so that we can complete our setup.
[jira] [Commented] (CONNECTORS-1546) Optimize Elasticsearch performance by removing 'forcemerge'
[ https://issues.apache.org/jira/browse/CONNECTORS-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672431#comment-16672431 ]

Steph van Schalkwyk commented on CONNECTORS-1546:
-------------------------------------------------

Removed.

> Optimize Elasticsearch performance by removing 'forcemerge'
> -----------------------------------------------------------
>
>                 Key: CONNECTORS-1546
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1546
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Elastic Search connector
>            Reporter: Hans Van Goethem
>            Assignee: Steph van Schalkwyk
>            Priority: Major
>
> After crawling with ManifoldCF, forcemerge is applied to optimize the
> Elasticsearch index. This optimization makes Elasticsearch faster for
> read operations but not for write operations. On the contrary, performance
> of write operations becomes worse after every forcemerge.
> Can you remove this forcemerge in ManifoldCF to optimize performance for
> recurrent crawling to Elasticsearch?
> If someone needs this forcemerge, it can be applied manually against
> Elasticsearch directly.
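For anyone who still wants the merge after it is removed from the connector, applying it manually means sending a POST to the index's `_forcemerge` endpoint. The sketch below only builds that request using the Python standard library; the base URL and the index name `index_cpt_all` are placeholders chosen for illustration, and the request is not sent unless the commented lines are enabled.

```python
import urllib.request
from urllib.parse import quote, urlencode


def build_forcemerge_request(base_url, index, max_num_segments=None):
    """Build (but do not send) a POST request to <base_url>/<index>/_forcemerge."""
    url = f"{base_url.rstrip('/')}/{quote(index, safe='')}/_forcemerge"
    if max_num_segments is not None:
        # Optional query parameter: merge down to this many segments.
        url += "?" + urlencode({"max_num_segments": max_num_segments})
    return urllib.request.Request(url, method="POST")


# Placeholder host and index; adjust to the actual cluster.
request = build_forcemerge_request("http://localhost:9200", "index_cpt_all")

# To run the merge against a live cluster, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode())
```

Running this outside the crawl keeps the write-heavy indexing path free of the merge, in line with the request in the issue.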
[jira] [Updated] (CONNECTORS-1552) Apache ManifoldCF Elastic Connector for Basic Authorisation
[ https://issues.apache.org/jira/browse/CONNECTORS-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steph van Schalkwyk updated CONNECTORS-1552:
--------------------------------------------
    Attachment: screenshot-1.png
[jira] [Commented] (CONNECTORS-1552) Apache ManifoldCF Elastic Connector for Basic Authorisation
[ https://issues.apache.org/jira/browse/CONNECTORS-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672430#comment-16672430 ]

Steph van Schalkwyk commented on CONNECTORS-1552:
-------------------------------------------------

!screenshot-1.png!
[jira] [Issue Comment Deleted] (CONNECTORS-1552) Apache ManifoldCF Elastic Connector for Basic Authorisation
[ https://issues.apache.org/jira/browse/CONNECTORS-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steph van Schalkwyk updated CONNECTORS-1552:
--------------------------------------------
    Comment: was deleted

(was: !image-2018-11-01-20-00-35-913.png!)
[jira] [Commented] (CONNECTORS-1552) Apache ManifoldCF Elastic Connector for Basic Authorisation
[ https://issues.apache.org/jira/browse/CONNECTORS-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672428#comment-16672428 ]

Steph van Schalkwyk commented on CONNECTORS-1552:
-------------------------------------------------

!image-2018-11-01-20-00-35-913.png!
[jira] [Commented] (CONNECTORS-1552) Apache ManifoldCF Elastic Connector for Basic Authorisation
[ https://issues.apache.org/jira/browse/CONNECTORS-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672427#comment-16672427 ]

Steph van Schalkwyk commented on CONNECTORS-1552:
-------------------------------------------------

I have added username and userpassword to the ES connector.
This allows the following usage: http(s)://username:userpassword@localhost:9200

!image-2018-11-01-20-00-21-283.png!
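A URL of the form shown in this comment carries credentials that an HTTP client typically turns into a Basic `Authorization` header. The following sketch shows that mapping with the Python standard library only. It is not the connector's implementation: `split_credentials` is a hypothetical helper, and it makes no attempt to handle IPv6 literal hosts.

```python
import base64
from urllib.parse import unquote, urlsplit, urlunsplit


def split_credentials(url):
    """Split 'scheme://user:password@host:port' into (clean_url, headers)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    netloc = f"{host}:{parts.port}" if parts.port else host
    # Rebuild the URL without the user:password@ prefix.
    clean_url = urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
    headers = {}
    if parts.username is not None:
        user = unquote(parts.username)
        password = unquote(parts.password or "")
        token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
        headers["Authorization"] = "Basic " + token.decode("ascii")
    return clean_url, headers


clean_url, headers = split_credentials(
    "https://username:userpassword@localhost:9200"
)
```

Basic authentication only encodes the credentials, it does not encrypt them, so the https form of the URL is the safer choice when the server supports it.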
[jira] [Commented] (CONNECTORS-1529) Add "url" output element to ES Output Connector (required when used with the Web Repository Connector)
[ https://issues.apache.org/jira/browse/CONNECTORS-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672425#comment-16672425 ]

Karl Wright commented on CONNECTORS-1529:
-----------------------------------------

As long as it's a new field, seems that backwards compatibility is preserved, so I'm OK with it.
[jira] [Commented] (CONNECTORS-1529) Add "url" output element to ES Output Connector (required when used with the Web Repository Connector)
[ https://issues.apache.org/jira/browse/CONNECTORS-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672423#comment-16672423 ]

Steph van Schalkwyk commented on CONNECTORS-1529:
-------------------------------------------------

I have added the "documentId": metatag to the Web Connector.
 * "documentId": "http://localhost:8000/10.pdf"

Will this work for everybody?

Steph
 * "_index": "index_cpt_all",
 * "_type": "catalogline",
 * "_id": "http://localhost:8000/10.pdf",
 * "_version": 1,
 * "_score": 1,
 * "_source": {
 ** "date": "2005-05-05T21:19:55Z",
 ** "pdf:PDFVersion": "1.3",
 ** "pdf:docinfo:title": "Microsoft Word - 48428.doc",
 ** "xmp:CreatorTool": "PScript5.dll Version 5.2",
 ** "Server": "SimpleHTTP/0.6 Python/3.5.2",
 ** "access_permission:modify_annotations": "true",
 ** "access_permission:can_print_degraded": "true",
 ** "dc:creator": "edocslib",
 ** "dcterms:created": "2005-05-05T21:19:55Z",
 ** "Last-Modified": "2005-05-05T21:19:55Z",
 ** "dcterms:modified": "2005-05-05T21:19:55Z",
 ** "dc:format": "application/pdf; version=1.3",
 ** "title": "Microsoft Word - 48428.doc",
 ** "Last-Save-Date": "2005-05-05T21:19:55Z",
 ** "pdf:docinfo:creator_tool": "PScript5.dll Version 5.2",
 ** "access_permission:fill_in_form": "true",
 ** "pdf:docinfo:modified": "2005-05-05T21:19:55Z",
 ** "stream_name": "10.pdf",
 ** "meta:save-date": "2005-05-05T21:19:55Z",
 ** "pdf:encrypted": "false",
 ** "dc:title": "Microsoft Word - 48428.doc",
 ** "modified": "2005-05-05T21:19:55Z",
 ** "Content-Length": "120441",
 ** "Content-Type": "application/pdf",
 ** "stream_size": "120441",
 ** "pdf:docinfo:creator": "edocslib",
 ** "X-Parsed-By": "org.apache.tika.parser.DefaultParser",
 ** "creator": "edocslib",
 ** "meta:author": "edocslib",
 ** "meta:creation-date": "2005-05-05T21:19:55Z",
 ** "created": "Thu May 05 16:19:55 CDT 2005",
 ** "documentId": "http://localhost:8000/10.pdf",
 ** "access_permission:extract_for_accessibility": "true",
 ** "access_permission:assemble_document": "true",
 ** "xmpTPg:NPages": "4",
 ** "Creation-Date": "2005-05-05T21:19:55Z",
 ** "resourceName": "10.pdf",
 ** "access_permission:extract_content": "true",
 ** "access_permission:can_print": "true",
 ** "Content-type": "application/pdf",
 ** "Author": "edocslib",
 ** "producer": "Acrobat Distiller 5.0 (Windows)",
 ** "access_permission:can_modify": "true",
 ** "pdf:docinfo:producer": "Acrobat Distiller 5.0 (Windows)",
 ** "pdf:docinfo:created": "2005-05-05T21:19:55Z",
 ** "indexed": "2018-11-02T00:50:48.053+",
 ** "mime-type": "application/pdf",
 ** "allow_token_document": "__nosecurity__",
 ** "deny_token_document": "__nosecurity__",
 ** "allow_token_share": "__nosecurity__",
 ** "deny_token_share": "__nosecurity__",
 ** "allow_token_parent": "__nosecurity__",
 ** "deny_token_parent": "__nosecurity__",
 ** "content": " Federal Communications Commission DA 05