[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771663#comment-16771663 ] Karl Wright commented on CONNECTORS-1563: - Hi Subasini, Are you now Tika-extracting in ManifoldCF, or in Solr? The text field looks like it contains properly extracted content, along with other stuff you do not want. Is this correct? If the extraction is happening in Solr, then I have no idea what this is coming from. If the extraction is happening in ManifoldCF, then if you have placed a Metadata Adjuster transformer in the pipeline between the Tika Extractor and the Solr Output Connector, I'd say you had set it up to concatenate many fields together into a text field. The Metadata Adjuster has that ability. The choice of how metadata (or content) fields get mapped to Solr schema is set up in your Solr output connection configuration. The Tika extraction basically replaces a binary input document with a character-sequence output document plus metadata fields. The character-sequence output document then must be sent to Solr not using the extracting update handler, but just the standard handler, so the handler should be changed from /update/extract to just /update, and the "Use extracting update handler" option should be turned off. The actual field name used for the extracted content body can also be changed, if desired, in the "Schema" part of the configuration. But what is there by default works with Solr as it's set up by default. 
> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: Document simple history.docx, managed-schema, manifold > settings.docx, manifoldcf.log, solr.log, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771607#comment-16771607 ] Subasini Rath commented on CONNECTORS-1563: --- Hi Karl, Could you please guide me - to which field manifold writes the actual textual content of the document. Currently I am using the _text_ field but it has been found that _text_ does not contain the actual data. Rather it added some extra values to the actual content. In my managed-schema : After my indexing in Solr, the value looks like : (The first 4 lines are appended before the content of file) "title":["NETWORK PLANNING\u"], "_text_":[" \n \n stream_size 34070 \n X-Parsed-By org.apache.tika.parser.DefaultParser \n X-Parsed-By org.apache.tika.parser.txt.TXTParser \n stream_content_type application/pdf \n stream_name cs.exe?bmsdocid=9.2.1=eebms.docdownload \n stream_source_info cs.exe?bmsdocid=9.2.1=eebms.docdownload \n Content-Encoding UTF-8 \n resourceName cs.exe?bmsdocid=9.2.1=eebms.docdownload \n Content-Type text/plain; charset=UTF-8 \n \n \n 9.2.1 UNCONTROLLED IF PRINTED Page 1 of 13\nCompany Policy\nNETWORK\nDocument No Amendment No Approved By Approval Date Review Date\n: : : : :\n9.2.1 9 CEO 23/05/2016 23/05/2019\n9.2.1 NETWORK PLANNING\n1.0 POLICY STATEMENT\nThe company will plan the expansion and augmentation of its electrical network to achieve levels of safety, reliability and quality of supply commensurate with community, regulator, customer and shareholder expectations.\nThe company will coordinate its planning with the NSW transmission utility Transgrid and neighbouring distribution utilities to develop effective solutions to satisfy load growth within the company’s supply area and in adjacent franchise areas where the company’s network has influence.\n2.0 PURPOSE\nTo provide principles for planning network Thanks & Regards, Subasini Rath
Re: Error integrity constraint violation
Hello, Wright: Thank you for your answer. I had not edited my own connector's build.xml, which I had copied from the WebCrawler Connector. After I edited the build.xml to change my class name, MCF runs fine. Again, I appreciate your help. Sincerely, Kaya
[jira] [Resolved] (CONNECTORS-1584) regex documentation
[ https://issues.apache.org/jira/browse/CONNECTORS-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1584. - Resolution: Not A Problem > regex documentation > --- > > Key: CONNECTORS-1584 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1584 > Project: ManifoldCF > Issue Type: Improvement > Components: Web connector >Affects Versions: ManifoldCF 2.12 >Reporter: Tim Steenbeke >Priority: Minor > > What type of regexs does manifold include and exclude support and also in > general regex support? > At the moment i'm using a web repository connection and an Elastic output > connection. > I'm trying to exclude urls that link to documents. > e.g. website.com/document/path/this.pdf and > website.com/document/path/other.PDF > The issue i'm having is that the regex that I have found so far doesn't work > case insensitive, so for every possible case i have to add a new line. > e.g.: > {code:java} > .*.pdf$ and .*.PDF$ and .*.Pdf and ... .{code} > Is it possible to add documentation what type of regex is able to be used or > maybe a tool to test your regex and see if it is supported by manifold ? > I tried mailing this question to > [u...@manifoldcf.apache.org|mailto:u...@manifoldcf.apache.org] but this mail > adress returns a failure notice. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Error integrity constraint violation
Hi Kaya, Database constraint violations, as you know, occur because you're trying to put more than one identical value into a table column that must hold unique values. For the table in question, registering two different connectors under the same class name would produce exactly this error. Karl On Sun, Feb 17, 2019 at 11:33 PM Kaya Ota wrote: > Hello, folks: > > I am new to ManifoldCF, and trying to make my own connector. > For now, I could successfully build ManifoldCF including my own connector. > However, when I tried to run, I have exceptions. > > The exception I am facing is : > > org.apache.manifoldcf.core.interfaces.ManifoldCFException: integrity > constraint violation: unique constraint or index violation: I1549774667196 > at > > org.apache.manifoldcf.core.database.DBInterfaceHSQLDB.reinterpretException(DBInterfaceHSQLDB.java:734) > at > > org.apache.manifoldcf.core.database.DBInterfaceHSQLDB.performModification(DBInterfaceHSQLDB.java:754) > at > > org.apache.manifoldcf.core.database.DBInterfaceHSQLDB.performInsert(DBInterfaceHSQLDB.java:230) > at > > org.apache.manifoldcf.core.database.BaseTable.performInsert(BaseTable.java:68) > at > > org.apache.manifoldcf.crawler.connmgr.ConnectorManager.registerConnector(ConnectorManager.java:172) > at > > org.apache.manifoldcf.crawler.system.ManifoldCF.registerConnectors(ManifoldCF.java:672) > at > > org.apache.manifoldcf.crawler.system.ManifoldCF.reregisterAllConnectors(ManifoldCF.java:160) > at > > org.apache.manifoldcf.jettyrunner.ManifoldCFJettyRunner.main(ManifoldCFJettyRunner.java:239) > Caused by: java.sql.SQLIntegrityConstraintViolationException: integrity > constraint violation: unique constraint or index violation: I1549774667196 > at org.hsqldb.jdbc.JDBCUtil.sqlException(Unknown Source) > at org.hsqldb.jdbc.JDBCUtil.sqlException(Unknown Source) > at org.hsqldb.jdbc.JDBCPreparedStatement.fetchResult(Unknown > Source) > at org.hsqldb.jdbc.JDBCPreparedStatement.executeUpdate(Unknown > Source) > at > 
org.apache.manifoldcf.core.database.Database.execute(Database.java:916) > at > > org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696) > Caused by: org.hsqldb.HsqlException: integrity constraint violation: unique > constraint or index violation: I1549774667196 > at org.hsqldb.error.Error.error(Unknown Source) > at org.hsqldb.error.Error.error(Unknown Source) > at org.hsqldb.index.IndexAVL.insert(Unknown Source) > at org.hsqldb.persist.RowStoreAVL.indexRow(Unknown Source) > at org.hsqldb.persist.RowStoreAVLDisk.indexRow(Unknown Source) > at org.hsqldb.TransactionManagerMVCC.addInsertAction(Unknown > Source) > at org.hsqldb.Session.addInsertAction(Unknown Source) > at org.hsqldb.Table.insertSingleRow(Unknown Source) > at org.hsqldb.StatementDML.insertSingleRow(Unknown Source) > at org.hsqldb.StatementInsert.getResult(Unknown Source) > at org.hsqldb.StatementDMQL.execute(Unknown Source) > at org.hsqldb.Session.executeCompiledStatement(Unknown Source) > at org.hsqldb.Session.execute(Unknown Source) > ... 4 more > > > I am guessing my class-path would have a problem, but do not have a > confidence. > What is the cause of this error? > > I would appreciate for any of your help. > > > Sincerely, > Kaya >
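The failure mode Karl describes can be sketched without a database. The following is only an in-memory analogy, not ManifoldCF's registration code or its schema: the class ConnectorRegistrySketch, its register method, and the class name org.example.WebConnector are all hypothetical. A map keyed by class name stands in for a table column with a unique constraint, so a second registration under the same class name is rejected much as the HSQLDB insert was.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a table whose class-name column must be unique.
final class ConnectorRegistrySketch {
    private final Map<String, String> descriptionByClassName = new HashMap<>();

    // Rejects a second registration that reuses an existing class name,
    // analogous to a unique-constraint violation on insert.
    void register(String description, String className) {
        String previous = descriptionByClassName.putIfAbsent(className, description);
        if (previous != null) {
            throw new IllegalStateException(
                "unique constraint violation: " + className
                    + " is already registered as \"" + previous + "\"");
        }
    }

    public static void main(String[] args) {
        ConnectorRegistrySketch registry = new ConnectorRegistrySketch();
        registry.register("Web", "org.example.WebConnector");
        try {
            // A copied build file that still names the original class would
            // lead to this second registration under the same class name.
            registry.register("My connector", "org.example.WebConnector");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This matches the resolution reported in the thread: once the copied build.xml named the new connector's own class, the duplicate disappeared.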
[jira] [Commented] (CONNECTORS-1584) regex documentation
[ https://issues.apache.org/jira/browse/CONNECTORS-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771462#comment-16771462 ] Karl Wright commented on CONNECTORS-1584: - The mailing list is us...@manifoldcf.apache.org. The regular expressions are standard Java regular expressions. The documentation is widely available. You can also experiment with regular expressions in a java applet online at: https://www.cis.upenn.edu/~matuszek/General/RegexTester/regex-tester.html
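Since the expressions are standard Java regular expressions, the reporter's case problem can probably be handled with the embedded flag (?i), which java.util.regex.Pattern supports. The sketch below is a standalone illustration, not ManifoldCF code: the class PdfExclusionDemo and the method isExcluded are hypothetical, and the use of find() rather than matches() is an assumption, since the thread does not say how the connector applies its patterns. It also escapes the dot, because an unescaped "." matches any character.

```java
import java.util.regex.Pattern;

// Hypothetical helper showing one pattern that covers .pdf, .PDF, .Pdf, ...
final class PdfExclusionDemo {
    // (?i) enables case-insensitive matching; \. matches a literal dot;
    // $ anchors the extension at the end of the URL.
    static final Pattern PDF_URL = Pattern.compile("(?i).*\\.pdf$");

    static boolean isExcluded(String url) {
        return PDF_URL.matcher(url).find();
    }

    public static void main(String[] args) {
        String[] urls = {
            "website.com/document/path/this.pdf",
            "website.com/document/path/other.PDF",
            "website.com/document/path/page.html"
        };
        for (String url : urls) {
            System.out.println(url + " -> excluded: " + isExcluded(url));
        }
    }
}
```

In an exclusion list this would mean a single line, (?i).*\.pdf$, in place of one line per capitalization, assuming the connector passes the expression to Java's regex engine unchanged.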
[jira] [Resolved] (CONNECTORS-1585) MCF Admin page shows 404 error frequently
[ https://issues.apache.org/jira/browse/CONNECTORS-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1585. - Resolution: Cannot Reproduce > MCF Admin page shows 404 error frequently > - > > Key: CONNECTORS-1585 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1585 > Project: ManifoldCF > Issue Type: Task >Reporter: Pavithra Dhakshinamurthy >Priority: Critical > > Hi Team, > I'm getting 404 Page not found error on a frequent basis in Manifold CF home > page. Not able to trace any error logs as well. Please let me know on what > scenarios 404 error will occur. > http://{hostname}:8345/mcf-crawler-ui/login.jsp > Regards, > Pavithra D -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1585) MCF Admin page shows 404 error frequently
[ https://issues.apache.org/jira/browse/CONNECTORS-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771461#comment-16771461 ] Karl Wright commented on CONNECTORS-1585: - 404 errors have nothing to do with ManifoldCF. They have to do with your app server environment -- either that, or your network/proxy. MCF is just a web app and does not have any magic in it.
[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771345#comment-16771345 ] Michael Osipov commented on CONNECTORS-1564: Go ahead and create that ticket! > Support preemptive authentication to Solr connector > --- > > Key: CONNECTORS-1564 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1564 > Project: ManifoldCF > Issue Type: Improvement > Components: Lucene/SOLR connector >Reporter: Erlend Garåsen >Assignee: Karl Wright >Priority: Major > Attachments: CONNECTORS-1564.patch > > > We should post preemptively in case the Solr server requires basic > authentication. This will make the communication between ManifoldCF and Solr > much more effective instead of the following: > * Send a HTTP POST request to Solr > * Solr sends a 401 response > * Send the same request, but with a "{{Authorization: Basic}}" header > With preemptive authentication, we can send the header in the first request. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
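The header that the issue proposes sending on the first request can be sketched as follows. This is not the attached patch and not an HTTP client API: the class PreemptiveBasicAuthDemo and the method basicAuthHeader are hypothetical, and the code only builds the value of the Authorization header as defined for HTTP Basic authentication (Base64 of "user:password"). How the Solr connector would attach it to its requests is outside this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper that builds the value of an "Authorization" header
// so it can be sent with the first request instead of after a 401.
final class PreemptiveBasicAuthDemo {
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
            .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Credentials here are the well-known example pair from the
        // Basic authentication specification, not real ones.
        System.out.println("Authorization: " + basicAuthHeader("Aladdin", "open sesame"));
    }
}
```

Sending this header up front removes the 401 round trip listed in the issue, at the cost of transmitting credentials with every request, so it is normally combined with HTTPS.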
[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771090#comment-16771090 ] Erlend Garåsen commented on CONNECTORS-1564: [~michael-o], unfortunately not. No responses on my post to the Solr list. I'll get back to this in a couple of days. Perhaps I should just create a Solr ticket. I have been very busy the last days, but have more time to follow up in a few days.
[jira] [Created] (CONNECTORS-1585) MCF Admin page shows 404 error frequently
Pavithra Dhakshinamurthy created CONNECTORS-1585: Summary: MCF Admin page shows 404 error frequently Key: CONNECTORS-1585 URL: https://issues.apache.org/jira/browse/CONNECTORS-1585 Project: ManifoldCF Issue Type: Task Reporter: Pavithra Dhakshinamurthy Hi Team, I'm getting 404 Page not found error on a frequent basis in Manifold CF home page. Not able to trace any error logs as well. Please let me know on what scenarios 404 error will occur. http://{hostname}:8345/mcf-crawler-ui/login.jsp Regards, Pavithra D -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CONNECTORS-1584) regex documentation
[ https://issues.apache.org/jira/browse/CONNECTORS-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Steenbeke updated CONNECTORS-1584: -- Description: What type of regexs does manifold include and exclude support and also in general regex support? At the moment i'm using a web repository connection and an Elastic output connection. I'm trying to exclude urls that link to documents. e.g. website.com/document/path/this.pdf and website.com/document/path/other.PDF The issue i'm having is that the regex that I have found so far doesn't work case insensitive, so for every possible case i have to add a new line. e.g.: {code:java} .*.pdf$ and .*.PDF$ and .*.Pdf and ... .{code} Is it possible to add documentation what type of regex is able to be used or maybe a tool to test your regex and see if it is supported by manifold ? I tried mailing this question to [u...@manifoldcf.apache.org|mailto:u...@manifoldcf.apache.org] but this mail adress returns a failure notice. was: What type of regexs does manifold include and exclude support and also in general regex support? At the moment i'm using a web repository connection and an Elastic output connection. I'm trying to exclude urls that link to documents. e.g. website.com/document/path/this.pdf and website.com/document/path/other.PDF The issue i'm having is that the regex that I have found so far doesn't work case insensitive, so for every possible case i have to add a new line. e.g.: .*.pdf$ and .*.PDF$ and .*.Pdf and ... . Is it possible to add documentation what type of regex is able to be used or maybe a tool to test your regex and see if it is supported by manifold ? I tried mailing this question to [u...@manifoldcf.apache.org|mailto:u...@manifoldcf.apache.org] but this mail adress returns a failure notice. 
[jira] [Updated] (CONNECTORS-1584) regex documentation
[ https://issues.apache.org/jira/browse/CONNECTORS-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Steenbeke updated CONNECTORS-1584: -- Description: What type of regexs does manifold include and exclude support and also in general regex support? At the moment i'm using a web repository connection and an Elastic output connection. I'm trying to exclude urls that link to documents. e.g. website.com/document/path/this.pdf and website.com/document/path/other.PDF The issue i'm having is that the regex that I have found so far doesn't work case insensitive, so for every possible case i have to add a new line. e.g.: .*.pdf$ and .*.PDF$ and .*.Pdf and ... . Is it possible to add documentation what type of regex is able to be used or maybe a tool to test your regex and see if it is supported by manifold ? I tried mailing this question to [u...@manifoldcf.apache.org|mailto:u...@manifoldcf.apache.org] but this mail adress returns a failure notice. was: What type of regexs does manifold include and exclude support and also in general regex support? At the moment i'm using a web repository connection and an Elastic output connection. I'm trying to exclude urls that link to documents. e.g. website.com/document/path/this.pdf and website.com/document/path/other.PDF The issue i'm having is that the regex that I have found so far doesn't work case insensitive, so for every possible case i have to add a new line. e.g.: .*.pdf$ and .*.PDF$ and .*.Pdf and ... . Is it possible to add documentation what type of regex is able to be used or maybe a tool to test your regex and see if it is supported by manifold ? I tried mailing this question to [u...@manifoldcf.apache.org|mailto:u...@manifoldcf.apache.org] but this mail adress returns a failure notice. 
[jira] [Created] (CONNECTORS-1584) regex documentation
Tim Steenbeke created CONNECTORS-1584: - Summary: regex documentation Key: CONNECTORS-1584 URL: https://issues.apache.org/jira/browse/CONNECTORS-1584 Project: ManifoldCF Issue Type: Improvement Components: Web connector Affects Versions: ManifoldCF 2.12 Reporter: Tim Steenbeke What type of regexes do ManifoldCF's include and exclude rules support, and what regex support is there in general? At the moment I'm using a web repository connection and an Elastic output connection. I'm trying to exclude URLs that link to documents, e.g. website.com/document/path/this.pdf and website.com/document/path/other.PDF. The issue I'm having is that the regex I have found so far does not match case-insensitively, so for every possible case I have to add a new line, e.g.: .*.pdf$ and .*.PDF$ and .*.Pdf and ... . Is it possible to add documentation on what type of regex can be used, or maybe a tool to test your regex and see if it is supported by ManifoldCF? I tried mailing this question to [u...@manifoldcf.apache.org|mailto:u...@manifoldcf.apache.org] but this mail address returns a failure notice. -- This message was sent by Atlassian JIRA (v7.6.3#76005)