[ https://issues.apache.org/jira/browse/CONNECTORS-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774146#comment-16774146 ]
Tim Steenbeke commented on CONNECTORS-1584:
-------------------------------------------

{panel:title=Failure notice sent by mailer-dae...@apache.org}
Hi. This is the qmail-send program at apache.org.
I'm afraid I wasn't able to deliver your message to the following addresses.
This is a permanent error; I've given up. Sorry it didn't work out.

<u...@manifoldcf.apache.org>:
Must be sent from an @apache.org address or a subscriber address or an address in LDAP.

--- Below this line is a copy of the message.

Return-Path: <Tim.Steenbeke@formica.digital>
From: Steenbeke Tim <Tim.Steenbeke@formica.digital>
To: "u...@manifoldcf.apache.org" <u...@manifoldcf.apache.org>
Subject: Regex support
Date: Mon, 18 Feb 2019 10:35:40 +0000

Dear ManifoldCF support,

What type of regexes do the ManifoldCF include and exclude fields support, and what regex support is there in general?
At the moment I'm using a web repository connection and an Elastic output connection.
I'm trying to exclude URLs that link to documents,
e.g.: http://website.com/something/path/file.pdf and http://website.com/something/path/file.PDF

The issue I'm having is that the regex that I have found so far doesn't work case-insensitively, so for every possible case I have to add a new line,
e.g.: .*.pdf$ and .*.PDF$ and .*.Pdf and ... .

Is it possible to add documentation on what type of regex can be used, or maybe a tool to test your regex and see whether it is supported by ManifoldCF?
kind regards

Tim Steenbeke
Consultant
M: tim.steenbeke@formica.digital
T: +32 497 03 66 69
https://www.formica.digital/
{panel}

> regex documentation
> -------------------
>
>                 Key: CONNECTORS-1584
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1584
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Web connector
>    Affects Versions: ManifoldCF 2.12
>            Reporter: Tim Steenbeke
>            Priority: Minor
>
> What type of regexes do the ManifoldCF include and exclude fields support, and what regex support is there in general?
> At the moment I'm using a web repository connection and an Elastic output connection.
> I'm trying to exclude URLs that link to documents,
> e.g. website.com/document/path/this.pdf and website.com/document/path/other.PDF
> The issue I'm having is that the regex that I have found so far doesn't work case-insensitively, so for every possible case I have to add a new line.
> e.g.:
> {code:java}
> .*.pdf$ and .*.PDF$ and .*.Pdf and ... .{code}
> Is it possible to add documentation on what type of regex can be used, or maybe a tool to test your regex and see whether it is supported by ManifoldCF?
> I tried mailing this question to [u...@manifoldcf.apache.org|mailto:u...@manifoldcf.apache.org] but this mail address returns a failure notice.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
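
On the case-sensitivity problem described in the issue, here is a hedged sketch rather than a confirmed answer. It assumes the include/exclude expressions are evaluated with Java's java.util.regex, which this thread does not confirm. Under that assumption, the inline flag (?i) would let one expression cover every capitalisation of the extension. The class name ExcludePatternSketch and the method isExcluded are hypothetical and exist only for illustration. The dot before pdf is escaped so that it matches a literal period instead of any character.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of ManifoldCF: it only demonstrates how a
// single java.util.regex expression can match ".pdf" in any letter case.
public class ExcludePatternSketch {

    // (?i) switches on case-insensitive matching for the whole expression;
    // \\. matches a literal dot; $ anchors the match at the end of the URL.
    static final Pattern PDF_URL = Pattern.compile("(?i).*\\.pdf$");

    // Returns true when the URL ends in .pdf, .PDF, .Pdf, and so on.
    static boolean isExcluded(String url) {
        return PDF_URL.matcher(url).find();
    }

    public static void main(String[] args) {
        System.out.println(isExcluded("http://website.com/something/path/file.pdf"));  // true
        System.out.println(isExcluded("http://website.com/something/path/file.PDF"));  // true
        System.out.println(isExcluded("http://website.com/something/path/file.html")); // false
    }
}
```

If that assumption holds, the single line (?i).*\.pdf$ would replace the separate .*.pdf$, .*.PDF$ and .*.Pdf entries. A small class like this one can also serve as the "tool to test your regex" asked for in the issue, at least for standard Java syntax.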