[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-02-18 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771663#comment-16771663
 ] 

Karl Wright commented on CONNECTORS-1563:
-

Hi Subasini,

Are you now Tika-extracting in ManifoldCF, or in Solr?
The text field looks like it contains properly extracted content, along with 
other stuff you do not want.  Is this correct?

If the extraction is happening in Solr, then I have no idea where this is coming 
from.  If the extraction is happening in ManifoldCF, and you have placed a 
Metadata Adjuster transformer in the pipeline between the Tika Extractor and 
the Solr Output Connector, I'd say you have set it up to concatenate many fields 
together into a text field.  The Metadata Adjuster has that ability.

How metadata (or content) fields get mapped to the Solr schema is set up in 
your Solr output connection configuration.  The Tika extraction basically 
replaces a binary input document with a character-sequence output document plus 
metadata fields.  The character-sequence output document must then be sent to 
Solr using the standard update handler, not the extracting update handler.  So 
the handler should be changed from /update/extract to just /update, and the 
"Use extracting update handler" option should be turned off.  The actual field 
name used for the extracted content body can also be changed, if desired, in 
the "Schema" part of the configuration.  But what is there by default works 
with Solr as it's set up by default.
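
The handler choice described above can be sketched as a small decision rule. 
This is only an illustration: the class, enum and method names are 
hypothetical and are not part of ManifoldCF or Solr. Only the two handler 
paths and the "Use extracting update handler" option come from the comment 
above.

```java
// Hypothetical sketch: none of these names exist in ManifoldCF or Solr.
final class SolrHandlerChoiceSketch {

    // Where Tika extraction takes place in the pipeline.
    enum ExtractionSite { MANIFOLDCF, SOLR }

    // Update handler path to configure in the Solr output connection.
    static String updateHandlerPath(ExtractionSite site) {
        // Text already extracted in ManifoldCF goes to the standard handler;
        // binary documents go to the extracting handler.
        return site == ExtractionSite.MANIFOLDCF ? "/update" : "/update/extract";
    }

    // Whether the "Use extracting update handler" option should be on.
    static boolean useExtractingUpdateHandler(ExtractionSite site) {
        return site == ExtractionSite.SOLR;
    }

    public static void main(String[] args) {
        for (ExtractionSite site : ExtractionSite.values()) {
            System.out.println(site + ": handler=" + updateHandlerPath(site)
                + ", extracting=" + useExtractingUpdateHandler(site));
        }
    }
}
```

The point of the sketch is that the two settings move together: mixing 
ManifoldCF-side extraction with the extracting handler would make Solr run 
Tika a second time on text that has already been extracted.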





> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: Document simple history.docx, managed-schema, manifold 
> settings.docx, manifoldcf.log, solr.log, solrconfig.xml
>
>
> I am encountering this problem:
> When I check the "Use the Extract Update Handler:" parameter, I get an 
> error on Solr, i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore the Tika exception, my documents get indexed but don't have a 
> content field in Solr.
> I am using Solr 7.3.1 and ManifoldCF 2.8.1.
> I am using Solr Cell and hence have not configured an external Tika extractor 
> in the ManifoldCF pipeline.
> Please help me with this problem.
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-02-18 Thread Subasini Rath (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771607#comment-16771607
 ] 

Subasini Rath commented on CONNECTORS-1563:
---

Hi Karl,
Could you please tell me which field ManifoldCF writes the actual textual 
content of the document to?

Currently I am using the _text_ field, but I have found that _text_ does not 
contain only the actual data. Instead, some extra values are added to the 
actual content.

In my managed-schema : 



After indexing in Solr, the value looks like this (the first 4 lines are 
prepended to the content of the file):

"title":["NETWORK PLANNING\u"],
"_text_":[" \n \n stream_size 34070  \n X-Parsed-By 
org.apache.tika.parser.DefaultParser  \n X-Parsed-By 
org.apache.tika.parser.txt.TXTParser  \n stream_content_type application/pdf  
\n stream_name cs.exe?bmsdocid=9.2.1=eebms.docdownload  \n 
stream_source_info cs.exe?bmsdocid=9.2.1=eebms.docdownload  \n 
Content-Encoding UTF-8  \n resourceName 
cs.exe?bmsdocid=9.2.1=eebms.docdownload  \n Content-Type text/plain; 
charset=UTF-8  \n  \n \n  9.2.1 UNCONTROLLED IF PRINTED Page 1 of 13\nCompany 
Policy\nNETWORK\nDocument No Amendment No Approved By Approval Date Review 
Date\n: : : : :\n9.2.1 9 CEO 23/05/2016 23/05/2019\n9.2.1 NETWORK PLANNING\n1.0 
POLICY STATEMENT\nThe company will plan the expansion and augmentation of its 
electrical network to achieve levels of safety, reliability and quality of 
supply commensurate with community, regulator, customer and shareholder 
expectations.\nThe company will coordinate its planning with the NSW 
transmission utility Transgrid and neighbouring distribution utilities to 
develop effective solutions to satisfy load growth within the company’s supply 
area and in adjacent franchise areas where the company’s network has 
influence.\n2.0 PURPOSE\nTo provide principles for planning network



Thanks & Regards,
Subasini Rath
O: +91-33 6636-8889 
M: +91 983-1234-341
Email: subasini.r...@endeavourenergy.com.au








Re: Error integrity constraint violation

2019-02-18 Thread Kayak28
Hello, Wright:

Thank you for your answer.

I had not edited my own connector's build.xml, which I had copied from the
WebCrawler Connector.
After I edited build.xml to change my class name, MCF runs fine.

Again, I appreciate your help.

Sincerely,
Kaya




On Tue, Feb 19, 2019 at 10:12 AM Karl Wright wrote:

> Hi Kaya,
>
> Database constraint violations, as you know, occur because you're trying to
> put more than one identical value into a table column that cannot hold
> duplicate values.  For the table in question, if you have the same class
> name for two different connectors, this would be what you'd expect.
>
> Karl


[jira] [Resolved] (CONNECTORS-1584) regex documentation

2019-02-18 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1584.
-
Resolution: Not A Problem

> regex documentation
> ---
>
> Key: CONNECTORS-1584
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1584
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Web connector
>Affects Versions: ManifoldCF 2.12
>Reporter: Tim Steenbeke
>Priority: Minor
>
> What type of regexes do ManifoldCF's include and exclude fields support, and 
> what regex support is there in general?
> At the moment I'm using a web repository connection and an Elastic output 
> connection.
>  I'm trying to exclude URLs that link to documents.
>           e.g. website.com/document/path/this.pdf and 
> website.com/document/path/other.PDF
> The issue I'm having is that the regexes I have found so far are not 
> case-insensitive, so for every possible case I have to add a new line.
>     e.g.:
> {code:java}
> .*.pdf$ and .*.PDF$ and .*.Pdf and ... .{code}
> Is it possible to add documentation on what type of regex can be used, or 
> maybe a tool to test your regex and see if it is supported by ManifoldCF?
> I tried mailing this question to 
> [u...@manifoldcf.apache.org|mailto:u...@manifoldcf.apache.org] but this mail 
> address returns a failure notice.





Re: Error integrity constraint violation

2019-02-18 Thread Karl Wright
Hi Kaya,

Database constraint violations, as you know, occur because you're trying to
put more than one identical value into a table column that cannot hold
duplicate values.  For the table in question, if you have the same class
name for two different connectors, this would be what you'd expect.
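
A minimal sketch of the failure mode, assuming a table with a unique index
on the connector class name. This is a simplified model, not ManifoldCF's
actual schema or registration code; the class, method and connector names
are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a table with a unique index on the class-name column.
// It is not ManifoldCF's actual schema or registration code.
final class UniqueClassNameSketch {

    private final Map<String, String> descriptionByClassName = new HashMap<>();

    // Inserts a row; fails if the class name is already present,
    // which is the situation a unique index rejects.
    void insert(String description, String className) {
        String existing = descriptionByClassName.putIfAbsent(className, description);
        if (existing != null) {
            throw new IllegalStateException(
                "unique constraint violation: " + className
                    + " is already registered as '" + existing + "'");
        }
    }

    public static void main(String[] args) {
        UniqueClassNameSketch table = new UniqueClassNameSketch();
        table.insert("Web crawler", "com.example.WebConnector");
        try {
            // A copied connector that still uses the original class name.
            table.insert("My connector", "com.example.WebConnector");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A connector copied from an existing one and registered without changing its
class name would hit the second insert in this model.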

Karl


On Sun, Feb 17, 2019 at 11:33 PM Kaya Ota  wrote:

> Hello, folks:
>
> I am new to ManifoldCF, and trying to make my own connector.
> For now, I could successfully build ManifoldCF including my own connector.
> However, when I tried to run, I have exceptions.
>
> The exception I am facing is :
>
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: integrity
> constraint violation: unique constraint or index violation: I1549774667196
> at
>
> org.apache.manifoldcf.core.database.DBInterfaceHSQLDB.reinterpretException(DBInterfaceHSQLDB.java:734)
> at
>
> org.apache.manifoldcf.core.database.DBInterfaceHSQLDB.performModification(DBInterfaceHSQLDB.java:754)
> at
>
> org.apache.manifoldcf.core.database.DBInterfaceHSQLDB.performInsert(DBInterfaceHSQLDB.java:230)
> at
>
> org.apache.manifoldcf.core.database.BaseTable.performInsert(BaseTable.java:68)
> at
>
> org.apache.manifoldcf.crawler.connmgr.ConnectorManager.registerConnector(ConnectorManager.java:172)
> at
>
> org.apache.manifoldcf.crawler.system.ManifoldCF.registerConnectors(ManifoldCF.java:672)
> at
>
> org.apache.manifoldcf.crawler.system.ManifoldCF.reregisterAllConnectors(ManifoldCF.java:160)
> at
>
> org.apache.manifoldcf.jettyrunner.ManifoldCFJettyRunner.main(ManifoldCFJettyRunner.java:239)
> Caused by: java.sql.SQLIntegrityConstraintViolationException: integrity
> constraint violation: unique constraint or index violation: I1549774667196
> at org.hsqldb.jdbc.JDBCUtil.sqlException(Unknown Source)
> at org.hsqldb.jdbc.JDBCUtil.sqlException(Unknown Source)
> at org.hsqldb.jdbc.JDBCPreparedStatement.fetchResult(Unknown
> Source)
> at org.hsqldb.jdbc.JDBCPreparedStatement.executeUpdate(Unknown
> Source)
> at
> org.apache.manifoldcf.core.database.Database.execute(Database.java:916)
> at
>
> org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)
> Caused by: org.hsqldb.HsqlException: integrity constraint violation: unique
> constraint or index violation: I1549774667196
> at org.hsqldb.error.Error.error(Unknown Source)
> at org.hsqldb.error.Error.error(Unknown Source)
> at org.hsqldb.index.IndexAVL.insert(Unknown Source)
> at org.hsqldb.persist.RowStoreAVL.indexRow(Unknown Source)
> at org.hsqldb.persist.RowStoreAVLDisk.indexRow(Unknown Source)
> at org.hsqldb.TransactionManagerMVCC.addInsertAction(Unknown
> Source)
> at org.hsqldb.Session.addInsertAction(Unknown Source)
> at org.hsqldb.Table.insertSingleRow(Unknown Source)
> at org.hsqldb.StatementDML.insertSingleRow(Unknown Source)
> at org.hsqldb.StatementInsert.getResult(Unknown Source)
> at org.hsqldb.StatementDMQL.execute(Unknown Source)
> at org.hsqldb.Session.executeCompiledStatement(Unknown Source)
> at org.hsqldb.Session.execute(Unknown Source)
> ... 4 more
>
>
> I am guessing my classpath has a problem, but I am not confident about
> that.
> What is the cause of this error?
>
> I would appreciate any help.
>
>
> Sincerely,
> Kaya
>


[jira] [Commented] (CONNECTORS-1584) regex documentation

2019-02-18 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771462#comment-16771462
 ] 

Karl Wright commented on CONNECTORS-1584:
-

The mailing list is us...@manifoldcf.apache.org.

The regular expressions are standard Java regular expressions.  The 
documentation is widely available.  You can also experiment with regular 
expressions in a Java applet online at: 
https://www.cis.upenn.edu/~matuszek/General/RegexTester/regex-tester.html
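
Since these are standard Java regular expressions, the per-case patterns in 
the issue can probably be collapsed with the inline (?i) flag, and the dot 
before "pdf" escaped so that it matches only a literal dot. The sketch below 
checks this with java.util.regex only. The class and method names are 
hypothetical, and whether ManifoldCF applies a pattern with find() or 
matches() is not stated in this thread, so the sketch assumes find().

```java
import java.util.regex.Pattern;

// Hypothetical sketch; only java.util.regex is real API here.
final class PdfExclusionRegexSketch {

    // (?i) turns on case-insensitive matching; \\. matches a literal dot;
    // $ anchors the match at the end of the input.
    static final Pattern PDF_URL = Pattern.compile("(?i).*\\.pdf$");

    // Returns true if the URL ends in ".pdf" in any letter case.
    static boolean isPdfUrl(String url) {
        return PDF_URL.matcher(url).find();
    }

    public static void main(String[] args) {
        String[] urls = {
            "website.com/document/path/this.pdf",
            "website.com/document/path/other.PDF",
            "website.com/document/path/index.html"
        };
        for (String url : urls) {
            System.out.println(url + " -> " + isPdfUrl(url));
        }
    }
}
```

With the unescaped form .*.pdf$ from the issue, a URL ending in "notapdf" 
would also match, because an unescaped dot matches any character.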







[jira] [Resolved] (CONNECTORS-1585) MCF Admin page shows 404 error frequently

2019-02-18 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1585.
-
Resolution: Cannot Reproduce

> MCF Admin page shows 404 error frequently
> -
>
> Key: CONNECTORS-1585
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1585
> Project: ManifoldCF
>  Issue Type: Task
>Reporter: Pavithra Dhakshinamurthy
>Priority: Critical
>
> Hi Team,
> I'm frequently getting a 404 "Page not found" error on the ManifoldCF home 
> page. I am not able to find any errors in the logs either. Please let me know 
> in what scenarios a 404 error will occur.
> http://{hostname}:8345/mcf-crawler-ui/login.jsp
> Regards,
> Pavithra D





[jira] [Commented] (CONNECTORS-1585) MCF Admin page shows 404 error frequently

2019-02-18 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771461#comment-16771461
 ] 

Karl Wright commented on CONNECTORS-1585:
-

404 errors have nothing to do with ManifoldCF.  They have to do with your app 
server environment -- either that, or your network/proxy.  MCF is just a web 
app and does not have any magic in it.







[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector

2019-02-18 Thread Michael Osipov (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771345#comment-16771345
 ] 

Michael Osipov commented on CONNECTORS-1564:


Go ahead and create that ticket!

> Support preemptive authentication to Solr connector
> ---
>
> Key: CONNECTORS-1564
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1564
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Lucene/SOLR connector
>Reporter: Erlend Garåsen
>Assignee: Karl Wright
>Priority: Major
> Attachments: CONNECTORS-1564.patch
>
>
> We should post preemptively in case the Solr server requires basic 
> authentication. This will make the communication between ManifoldCF and Solr 
> much more efficient than the following sequence:
>  * Send an HTTP POST request to Solr
>  * Solr sends a 401 response
>  * Send the same request, but with an "{{Authorization: Basic}}" header
> With preemptive authentication, we can send the header in the first request.
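
The header that a preemptive request would carry on its first attempt can be 
sketched as follows. This only shows how a Basic credentials value is built 
with the JDK's Base64 encoder; it is not the attached patch and does not use 
the Solr connector's HTTP client. The class and method names are 
hypothetical, and the credentials are the sample pair from the HTTP Basic 
authentication specification.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch; not taken from CONNECTORS-1564.patch.
final class PreemptiveBasicAuthSketch {

    // Builds the value of the Authorization header for Basic authentication:
    // "Basic " followed by base64("user:password").
    static String basicAuthHeaderValue(String user, String password) {
        String credentials = user + ":" + password;
        byte[] bytes = credentials.getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        // Sending this header with the first POST avoids the 401 round trip.
        System.out.println("Authorization: "
            + basicAuthHeaderValue("Aladdin", "open sesame"));
    }
}
```

Attaching this value to the very first POST is what removes the 401 
challenge and the repeated request from the sequence above.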





[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector

2019-02-18 Thread JIRA


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771090#comment-16771090
 ] 

Erlend Garåsen commented on CONNECTORS-1564:


[~michael-o], unfortunately not. No responses on my post to the Solr list. I'll 
get back to this in a couple of days. Perhaps I should just create a Solr 
ticket. I have been very busy the last days, but have more time to follow up in 
a few days.






[jira] [Created] (CONNECTORS-1585) MCF Admin page shows 404 error frequently

2019-02-18 Thread Pavithra Dhakshinamurthy (JIRA)
Pavithra Dhakshinamurthy created CONNECTORS-1585:


 Summary: MCF Admin page shows 404 error frequently
 Key: CONNECTORS-1585
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1585
 Project: ManifoldCF
  Issue Type: Task
Reporter: Pavithra Dhakshinamurthy


Hi Team,

I'm frequently getting a 404 "Page not found" error on the ManifoldCF home 
page. I am not able to find any errors in the logs either. Please let me know 
in what scenarios a 404 error will occur.

http://{hostname}:8345/mcf-crawler-ui/login.jsp

Regards,
Pavithra D





[jira] [Updated] (CONNECTORS-1584) regex documentation

2019-02-18 Thread Tim Steenbeke (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Steenbeke updated CONNECTORS-1584:
--
Description: 
What type of regexs does manifold include and exclude support and also in 
general regex support?

At the moment i'm using a web repository connection and an Elastic output 
connection.
 I'm trying to exclude urls that link to documents.

          e.g. website.com/document/path/this.pdf and 
website.com/document/path/other.PDF

The issue i'm having is that the regex that I have found so far doesn't work 
case insensitive, so for every possible case i have to add a new line.
    e.g.:
{code:java}
.*.pdf$ and .*.PDF$ and .*.Pdf and ... .{code}
Is it possible to add documentation what type of regex is able to be used or 
maybe a tool to test your regex and see if it is supported by manifold ?

I tried mailing this question to 
[u...@manifoldcf.apache.org|mailto:u...@manifoldcf.apache.org] but this mail 
adress returns a failure notice.

  was:
What type of regexs does manifold include and exclude support and also in 
general regex support?

At the moment i'm using a web repository connection and an Elastic output 
connection.
 I'm trying to exclude urls that link to documents.

          e.g. website.com/document/path/this.pdf and 
website.com/document/path/other.PDF

The issue i'm having is that the regex that I have found so far doesn't work 
case insensitive, so for every possible case i have to add a new line.
    e.g.: .*.pdf$ and .*.PDF$ and .*.Pdf and ... .

Is it possible to add documentation what type of regex is able to be used or 
maybe a tool to test your regex and see if it is supported by manifold ?

I tried mailing this question to 
[u...@manifoldcf.apache.org|mailto:u...@manifoldcf.apache.org] but this mail 
adress returns a failure notice.







[jira] [Updated] (CONNECTORS-1584) regex documentation

2019-02-18 Thread Tim Steenbeke (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Steenbeke updated CONNECTORS-1584:
--
Description: 
What type of regexs does manifold include and exclude support and also in 
general regex support?

At the moment i'm using a web repository connection and an Elastic output 
connection.
 I'm trying to exclude urls that link to documents.

          e.g. website.com/document/path/this.pdf and 
website.com/document/path/other.PDF

The issue i'm having is that the regex that I have found so far doesn't work 
case insensitive, so for every possible case i have to add a new line.
    e.g.: .*.pdf$ and .*.PDF$ and .*.Pdf and ... .

Is it possible to add documentation what type of regex is able to be used or 
maybe a tool to test your regex and see if it is supported by manifold ?

I tried mailing this question to 
[u...@manifoldcf.apache.org|mailto:u...@manifoldcf.apache.org] but this mail 
adress returns a failure notice.

  was:
What type of regexs does manifold include and exclude support and also in 
general regex support?

At the moment i'm using a web repository connection and an Elastic output 
connection.
I'm trying to exclude urls that link to documents.

          e.g. website.com/document/path/this.pdf and 
website.com/document/path/other.PDF

The issue i'm having is that the regex that I have found so far doesn't work 
case insensitive, so for every possible case i have to add a new line.
   e.g.: .*.pdf$ and .*.PDF$ and .*.Pdf and ... .

Is it possible to add documentation what type of regex is able to be used or 
maybe a tool to test your regex and see if it is supported by manifold ?

I tried mailing this question to 
[u...@manifoldcf.apache.org|mailto:u...@manifoldcf.apache.org] but this mail 
adress returns a failure notice.







[jira] [Created] (CONNECTORS-1584) regex documentation

2019-02-18 Thread Tim Steenbeke (JIRA)
Tim Steenbeke created CONNECTORS-1584:
-

 Summary: regex documentation
 Key: CONNECTORS-1584
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1584
 Project: ManifoldCF
  Issue Type: Improvement
  Components: Web connector
Affects Versions: ManifoldCF 2.12
Reporter: Tim Steenbeke


What type of regexes do ManifoldCF's include and exclude fields support, and 
what regex support is there in general?

At the moment I'm using a web repository connection and an Elastic output 
connection.
I'm trying to exclude URLs that link to documents.

          e.g. website.com/document/path/this.pdf and 
website.com/document/path/other.PDF

The issue I'm having is that the regexes I have found so far are not 
case-insensitive, so for every possible case I have to add a new line.
   e.g.: .*.pdf$ and .*.PDF$ and .*.Pdf and ... .

Is it possible to add documentation on what type of regex can be used, or 
maybe a tool to test your regex and see if it is supported by ManifoldCF?

I tried mailing this question to 
[u...@manifoldcf.apache.org|mailto:u...@manifoldcf.apache.org] but this mail 
address returns a failure notice.


