[jira] [Commented] (CONNECTORS-1588) Custom Jcifs Properties
[ https://issues.apache.org/jira/browse/CONNECTORS-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780473#comment-16780473 ] Karl Wright commented on CONNECTORS-1588: - Patch looks fine. I'll commit it. > Custom Jcifs Properties > --- > > Key: CONNECTORS-1588 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1588 > Project: ManifoldCF > Issue Type: Improvement > Components: JCIFS connector >Affects Versions: ManifoldCF 2.12 >Reporter: Cihad Guzel >Assignee: Karl Wright >Priority: Major > Fix For: ManifoldCF 2.13 > > Attachments: CONNECTORS-1588 > > > In some cases, "jcifs" is running slowly. In order to solve this problem, we > need to set custom some properties. > > For example; my problem was in my test environment: I have a windows server > and an ubuntu server in same network in AWS EC2 Service. The windows server > has Active Directory service, DNS Server and shared folder while the ubuntu > server has some instance such as manifoldcf, an db instance and solr. > > If the DNS settings are not defined on the ubuntu server, jcifs runs slowly. > Because the default resolver order is set as 'LMHOSTS,DNS,WINS'. It means[1] > ; firstly "jcifs" checks '/etc/hosts' files for linux/unix server'', then it > checks the DNS server. In my opinion, the linux server doesn't recognize the > DNS server and threads are waiting for every file for access to read. > > I suppose, WINS is used when accessing hosts on different subnets. So, I > have set "jcifs.resolveOrder = WINS" and my problem has been FIXED. > > Another suggestion for similar problem from [another > example|https://stackoverflow.com/a/18837754] : "-Djcifs.resolveOrder = DNS" > > We need to set custom resolveOrder variable. > ^[1]^ [https://www.jcifs.org/src/docs/resolver.html] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780373#comment-16780373 ] Karl Wright commented on CONNECTORS-1563: - Hi [~Subasini], The "excluded mime types" that you set are meant to exclude documents *entirely*, so changing that setting has no effect on *how* documents are indexed. You can look at the Simple History report to verify that this is taking place as you desire, because most connectors create a record when they reject a document for any reason. The Web Connector is no exception. > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: Document simple history.docx, Manifold and Solr > settings_CustomField.docx, managed-schema, manifold settings.docx, > manifoldcf.log, path.png, schema.png, solr.log, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1587) Unable to Crawl Documents Meta data
[ https://issues.apache.org/jira/browse/CONNECTORS-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16778989#comment-16778989 ] Karl Wright commented on CONNECTORS-1587: - It is simple; the crawler is requesting more metadata columns at one time than your Sharepoint instance is allowed to respond to. This is a SharePoint configuration issue, apparently, although it is one I've never heard of before. It's certainly *not* a SharePoint Connector issue, unless there's some hard-wired Microsoft limit that you are up against. > Unable to Crawl Documents Meta data > --- > > Key: CONNECTORS-1587 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1587 > Project: ManifoldCF > Issue Type: Bug >Reporter: Pavithra Dhakshinamurthy >Priority: Major > > I tried to crawl the meta data of Document section. but cannot able to crawl > the data. > I have facing error stating that " The query cannot be completed because the > number of lookup columns it contains exceeds the lookup column threshold > enforced by the administrator." > How can I resolve this issue.Is there any config needs for that. > Please assist the same. > While checking for documentation mentioned the meta data contents as > drop-down type but my connector(Manifold 2.9.1) is different. Is there any > version update is there for this lookup. > Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CONNECTORS-1587) Unable to Crawl Documents Meta data
[ https://issues.apache.org/jira/browse/CONNECTORS-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1587. - Resolution: Invalid Not a ManifoldCF bug > Unable to Crawl Documents Meta data > --- > > Key: CONNECTORS-1587 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1587 > Project: ManifoldCF > Issue Type: Bug >Reporter: Pavithra Dhakshinamurthy >Priority: Major > > I tried to crawl the meta data of Document section. but cannot able to crawl > the data. > I have facing error stating that " The query cannot be completed because the > number of lookup columns it contains exceeds the lookup column threshold > enforced by the administrator." > How can I resolve this issue.Is there any config needs for that. > Please assist the same. > While checking for documentation mentioned the meta data contents as > drop-down type but my connector(Manifold 2.9.1) is different. Is there any > version update is there for this lookup. > Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16778159#comment-16778159 ] Karl Wright commented on CONNECTORS-1564: - [~erlendfg], if ModifiedHttpSolrClient overrides this setting already, then I don't understand why it isn't working, unless the override isn't setting it. Is that the case? If so, then the obvious fix is to just set it there. > Support preemptive authentication to Solr connector > --- > > Key: CONNECTORS-1564 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1564 > Project: ManifoldCF > Issue Type: Improvement > Components: Lucene/SOLR connector >Reporter: Erlend Garåsen > Assignee: Karl Wright >Priority: Major > Attachments: CONNECTORS-1564.patch > > > We should post preemptively in case the Solr server requires basic > authentication. This will make the communication between ManifoldCF and Solr > much more effective instead of the following: > * Send a HTTP POST request to Solr > * Solr sends a 401 response > * Send the same request, but with a "{{Authorization: Basic}}" header > With preemptive authentication, we can send the header in the first request. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
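The three-step round trip described in the issue (POST, 401 challenge, retry with credentials) collapses to a single request when the Authorization header is attached preemptively. A minimal, dependency-free sketch of constructing that header follows; the class name is illustrative, and the user/password pair in the usage note is the standard RFC 7617 example, not a real credential:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PreemptiveBasicAuth
{
  // Preemptive basic auth simply means sending this header on the very
  // first request, instead of waiting for the server's 401 challenge.
  // The value is "Basic " followed by base64(user:password).
  static String basicAuthHeader(String user, String password)
  {
    String token = Base64.getEncoder().encodeToString(
        (user + ":" + password).getBytes(StandardCharsets.UTF_8));
    return "Basic " + token;
  }
}
```

For example, `basicAuthHeader("Aladdin", "open sesame")` yields the RFC 7617 value `Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==`. With Apache HttpClient 4.x, the same effect is typically achieved by priming an `AuthCache` with a `BasicScheme` for the target host so the client emits the header on the first request.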
Re: custom jcifs properties
Hi Cihad, I am usually not fond of solutions where all connections of a given type are affected by a single environment change. In this case, there is no good way to make the change connection-specific. I would insist, though, that any change to the functioning of the system be backwards-compatible. That means that even if there are no changes to start-options.env, the default behavior is still as it was. I can think of a way to do that: basically, in the static block, check whether each property is already set, and only set the default if it isn't. Karl On Mon, Feb 25, 2019 at 2:49 PM Cihad Guzel wrote: > Hi Karl, > > In some cases, "jcifs" is running slowly. In order to solve this problem, > we need to set custom some properties. > > For example; my problem was in my test environment: I have a windows > server and an ubuntu server in same network in AWS EC2 Service. The windows > server has Active Directory service, DNS Server and shared folder while the > ubuntu server has some instance such as manifoldcf, an db instance and > solr. > > If the DNS settings are not defined on the ubuntu server, jcifs runs > slowly. Because the default resolver order is set as 'LMHOSTS,DNS,WINS'. It > means[1]; firstly "jcifs" checks '/etc/hosts' files for linux/unix > server'', then it checks the DNS server. In my opinion, the linux server > doesn't recognize the DNS server and threads are waiting for every file for > access to read. > > I suppose, WINS is used when accessing hosts on different subnets. So, I > have set "jcifs.resolveOrder = WINS" and my problem has been FIXED.
> > Another suggestion for similar problem from another example[2]: > "-Djcifs.resolveOrder = DNS" > > Finally; I suggest these changes: > > Remove the line > (System.setProperty("jcifs.resolveOrder","LMHOSTS,DNS,WINS"); ) from > SharedDriveConnector.java > > Add "-Djcifs.resolveOrder = LMHOSTS,DNS,WINS" to "start-options.env" file. > > If you have been convinced about this, I can create a PR. > > [1] https://www.jcifs.org/src/docs/resolver.html > [2] https://stackoverflow.com/a/18837754 > > Regards, > Cihad Guzel > > On Sun, Feb 24, 2019 at 7:20 PM Karl Wright > wrote: > >> These settings were provided by the developer of jcifs, Michael Allen. >> You have to really understand the protocol well before you should consider >> changing them in any way. >> >> Thanks, >> Karl >> >> >> On Sun, Feb 24, 2019 at 9:53 AM Cihad Guzel wrote: >> >>> Hi, >>> >>> SharedDriveConnector have some hardcoded system properties as follow: >>> >>> static >>> { >>> System.setProperty("jcifs.smb.client.soTimeout","15"); >>> System.setProperty("jcifs.smb.client.responseTimeout","12"); >>> System.setProperty("jcifs.resolveOrder","LMHOSTS,DNS,WINS"); >>> System.setProperty("jcifs.smb.client.listCount","20"); >>> System.setProperty("jcifs.smb.client.dfs.strictView","true"); >>> } >>> >>> How can I override them when to start manifoldcf? >>> >>> It may be better to define these settings in the start-options.env file. >>> >>> Regards, >>> Cihad Guzel >>> >>
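The backwards-compatible approach discussed in this thread — keep the hard-coded defaults, but let a `-D` option in start-options.env win — can be sketched as follows. The class and helper names here are hypothetical, not actual ManifoldCF code, and only one of the quoted properties is shown:

```java
// Hypothetical sketch: apply a hard-coded jcifs default only when the
// property was not already supplied externally (e.g. via a -D option in
// start-options.env). Class and method names are illustrative.
public final class JcifsDefaults
{
  static void setIfUnset(String key, String value)
  {
    // Leave any externally supplied value untouched.
    if (System.getProperty(key) == null)
      System.setProperty(key, value);
  }

  static
  {
    // Default resolver order, overridable with -Djcifs.resolveOrder=...
    setIfUnset("jcifs.resolveOrder", "LMHOSTS,DNS,WINS");
  }
}
```

With this pattern, starting the JVM with `-Djcifs.resolveOrder=WINS` keeps that value, while an unmodified start-options.env leaves the default behavior exactly as it was.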
Re: custom jcifs properties
These settings were provided by the developer of jcifs, Michael Allen. You have to really understand the protocol well before you should consider changing them in any way. Thanks, Karl On Sun, Feb 24, 2019 at 9:53 AM Cihad Guzel wrote: > Hi, > > SharedDriveConnector have some hardcoded system properties as follow: > > static > { > System.setProperty("jcifs.smb.client.soTimeout","15"); > System.setProperty("jcifs.smb.client.responseTimeout","12"); > System.setProperty("jcifs.resolveOrder","LMHOSTS,DNS,WINS"); > System.setProperty("jcifs.smb.client.listCount","20"); > System.setProperty("jcifs.smb.client.dfs.strictView","true"); > } > > How can I override them when to start manifoldcf? > > It may be better to define these settings in the start-options.env file. > > Regards, > Cihad Guzel >
[jira] [Commented] (CONNECTORS-1587) Unable to Crawl Documents Meta data
[ https://issues.apache.org/jira/browse/CONNECTORS-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16775088#comment-16775088 ] Karl Wright commented on CONNECTORS-1587: - Can you amend your ticket to tell us what connectors you are using for your job? This ticket is very nearly incomprehensible, and unless it is amended I will close it on that basis. > Unable to Crawl Documents Meta data > --- > > Key: CONNECTORS-1587 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1587 > Project: ManifoldCF > Issue Type: Bug >Reporter: Pavithra Dhakshinamurthy >Priority: Major > > I tried to crawl the meta data of Document section. but cannot able to crawl > the data. > I have facing error stating that " The query cannot be completed because the > number of lookup columns it contains exceeds the lookup column threshold > enforced by the administrator." > How can I resolve this issue.Is there any config needs for that. > Please assist the same. > While checking for documentation mentioned the meta data contents as > drop-down type but my connector(Manifold 2.9.1) is different. Is there any > version update is there for this lookup. > Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: ManifoldCF Website Links
We haven't been updating "latest" since at least 2016. We now only include actual releases. Honestly, how do you tell Google to reindex?? On Fri, Feb 22, 2019 at 3:59 AM Furkan KAMACI wrote: > How about adding a path as latest, i.e.: > > https://manifoldcf.apache.org/release/latest/en_US/performance-tuning.html > > > On Fri, Feb 22, 2019 at 11:34 AM Karl Wright > wrote: > > > Hi Furkan, > > > > I am not sure why Google maintains these dead links but we simply cannot > > publish doc for every release going back to 2012. Generally we cycle > > releases and include the last two for each major release. We include the > > 1.10 docs as well as the 2.12 and 2.11 docs right now. It is > prohibitively > > expensive to include more than that; doing so would make it incrementally > > harder to update the site, and it's already not easy. > > > > Thanks, > > Karl > > > > > > On Fri, Feb 22, 2019 at 2:34 AM Furkan KAMACI > > wrote: > > > > > Hi All, > > > > > > When we search something on ManifoldCF on Google we get results > something > > > like that: > > > > > > > > > > > > https://manifoldcf.apache.org/release/release-2.9.1/en_US/performance-tuning.html > > > > > > However, such links are broken. Can we fix it someway i.e. creating a > > path > > > for latest release? > > > > > > Kind Regards, > > > Furkan KAMACI > > > > > >
Re: ManifoldCF Website Links
Hi Furkan, I am not sure why Google maintains these dead links but we simply cannot publish doc for every release going back to 2012. Generally we cycle releases and include the last two for each major release. We include the 1.10 docs as well as the 2.12 and 2.11 docs right now. It is prohibitively expensive to include more than that; doing so would make it incrementally harder to update the site, and it's already not easy. Thanks, Karl On Fri, Feb 22, 2019 at 2:34 AM Furkan KAMACI wrote: > Hi All, > > When we search something on ManifoldCF on Google we get results something > like that: > > > https://manifoldcf.apache.org/release/release-2.9.1/en_US/performance-tuning.html > > However, such links are broken. Can we fix it someway i.e. creating a path > for latest release? > > Kind Regards, > Furkan KAMACI >
[jira] [Commented] (CONNECTORS-1584) regex documentation
[ https://issues.apache.org/jira/browse/CONNECTORS-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774309#comment-16774309 ] Karl Wright commented on CONNECTORS-1584: - Have you subscribed to the list? Instructions are in the documentation for "contact us". You send mail to: user-subscr...@manifoldcf.apache.org > regex documentation > --- > > Key: CONNECTORS-1584 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1584 > Project: ManifoldCF > Issue Type: Improvement > Components: Web connector >Affects Versions: ManifoldCF 2.12 >Reporter: Tim Steenbeke >Priority: Minor > > What type of regexs does manifold include and exclude support and also in > general regex support? > At the moment i'm using a web repository connection and an Elastic output > connection. > I'm trying to exclude urls that link to documents. > e.g. website.com/document/path/this.pdf and > website.com/document/path/other.PDF > The issue i'm having is that the regex that I have found so far doesn't work > case insensitive, so for every possible case i have to add a new line. > e.g.: > {code:java} > .*.pdf$ and .*.PDF$ and .*.Pdf and ... .{code} > Is it possible to add documentation what type of regex is able to be used or > maybe a tool to test your regex and see if it is supported by manifold ? > I tried mailing this question to > [u...@manifoldcf.apache.org|mailto:u...@manifoldcf.apache.org] but this mail > adress returns a failure notice. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1584) regex documentation
[ https://issues.apache.org/jira/browse/CONNECTORS-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774105#comment-16774105 ] Karl Wright commented on CONNECTORS-1584: - Actually, it *is* user@ but so many people get mixed up with that that I got it backwards myself. What failure notice did you get when you mailed to user@? I receive email from this list a dozen times a day or more so I am not sure why you'd be having trouble. > regex documentation > --- > > Key: CONNECTORS-1584 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1584 > Project: ManifoldCF > Issue Type: Improvement > Components: Web connector >Affects Versions: ManifoldCF 2.12 >Reporter: Tim Steenbeke >Priority: Minor > > What type of regexs does manifold include and exclude support and also in > general regex support? > At the moment i'm using a web repository connection and an Elastic output > connection. > I'm trying to exclude urls that link to documents. > e.g. website.com/document/path/this.pdf and > website.com/document/path/other.PDF > The issue i'm having is that the regex that I have found so far doesn't work > case insensitive, so for every possible case i have to add a new line. > e.g.: > {code:java} > .*.pdf$ and .*.PDF$ and .*.Pdf and ... .{code} > Is it possible to add documentation what type of regex is able to be used or > maybe a tool to test your regex and see if it is supported by manifold ? > I tried mailing this question to > [u...@manifoldcf.apache.org|mailto:u...@manifoldcf.apache.org] but this mail > adress returns a failure notice. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772995#comment-16772995 ] Karl Wright commented on CONNECTORS-1563: - [~Subasini], we are trying to debug your setup. The first principle of debugging is to identify where exactly the problem is occurring. It eliminates one variable. The file system connector is quite simple and has few configuration options, so it should be easy to set something up we can use to evaluate your solr connection. Thanks, Karl > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: Document simple history.docx, Manifold and Solr > settings_CustomField.docx, managed-schema, manifold settings.docx, > manifoldcf.log, path.png, schema.png, solr.log, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772912#comment-16772912 ] Karl Wright commented on CONNECTORS-1563: - [~Subasini], the "error" is because it does not recognize a specific translation bundle for your language, so it defaults to English. It is harmless. I asked you to *try* working with a File System connection initially to narrow down where your problems were coming from. Please do so. [~shinichiro abe] and I both tried a configuration similar to the one you report at the end of last year, when we were debugging the 2.11 release of ManifoldCF. > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: Document simple history.docx, Manifold and Solr > settings_CustomField.docx, managed-schema, manifold settings.docx, > manifoldcf.log, path.png, schema.png, solr.log, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772729#comment-16772729 ] Karl Wright commented on CONNECTORS-1563: - In general in cases like this I recommend that people start with the simplest possible working configuration and then modify it until they achieve their goals. In this case that would mean starting with a file system job and a freshly-installed Solr instance, with no other changes whatsoever. [~shinichiro abe], can you help Mr. Rath by trying MCF 2.12 with a fresh single-process Solr instance, using the "/update" handler? He claims that this does not work and I do not have any time to work with him for the next few weeks. If it works for you please provide detailed steps describing what you did. Thanks in advance! > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: Document simple history.docx, Manifold and Solr > settings_CustomField.docx, managed-schema, manifold settings.docx, > manifoldcf.log, path.png, schema.png, solr.log, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771663#comment-16771663 ] Karl Wright commented on CONNECTORS-1563: - Hi Subasini, Are you now Tika-extracting in ManifoldCF, or in Solr? The text field looks like it contains properly extracted content, along with other stuff you do not want. Is this correct? If the extraction is happening in Solr, then I have no idea where this is coming from. If the extraction is happening in ManifoldCF, then if you have placed a Metadata Adjuster transformer in the pipeline between the Tika Extractor and the Solr Output Connector, I'd say you had set it up to concatenate many fields together into a text field. The Metadata Adjuster has that ability. The choice of how metadata (or content) fields get mapped to Solr schema is set up in your Solr output connection configuration. The Tika extraction basically replaces a binary input document with a character-sequence output document plus metadata fields. The character-sequence output document then must be sent to Solr not using the extracting update handler, but just the standard handler, so the handler should be changed from /update/extract to just /update, and the "Use extracting update handler" should be turned off. The actual field name used for the extracted content body can also be changed, if desired, in the "Schema" part of the configuration. But what is there by default works with Solr as it's set up by default. 
> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector > Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: Document simple history.docx, managed-schema, manifold > settings.docx, manifoldcf.log, solr.log, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CONNECTORS-1584) regex documentation
[ https://issues.apache.org/jira/browse/CONNECTORS-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1584. - Resolution: Not A Problem > regex documentation > --- > > Key: CONNECTORS-1584 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1584 > Project: ManifoldCF > Issue Type: Improvement > Components: Web connector >Affects Versions: ManifoldCF 2.12 >Reporter: Tim Steenbeke >Priority: Minor > > What type of regexs does manifold include and exclude support and also in > general regex support? > At the moment i'm using a web repository connection and an Elastic output > connection. > I'm trying to exclude urls that link to documents. > e.g. website.com/document/path/this.pdf and > website.com/document/path/other.PDF > The issue i'm having is that the regex that I have found so far doesn't work > case insensitive, so for every possible case i have to add a new line. > e.g.: > {code:java} > .*.pdf$ and .*.PDF$ and .*.Pdf and ... .{code} > Is it possible to add documentation what type of regex is able to be used or > maybe a tool to test your regex and see if it is supported by manifold ? > I tried mailing this question to > [u...@manifoldcf.apache.org|mailto:u...@manifoldcf.apache.org] but this mail > adress returns a failure notice. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Error integrity constraint violation
Hi Kaya, Database constraint violations, as you know, occur because you're trying to put more than one identical value into a table column that requires unique values. For the table in question, if you have the same class name for two different connectors, this would be what you'd expect. Karl On Sun, Feb 17, 2019 at 11:33 PM Kaya Ota wrote: > Hello, folks: > > I am new to ManifoldCF, and trying to make my own connector. > For now, I could successfully build ManifoldCF including my own connector. > However, when I tried to run, I have exceptions. > > The exception I am facing is : > > org.apache.manifoldcf.core.interfaces.ManifoldCFException: integrity > constraint violation: unique constraint or index violation: I1549774667196 > at > > org.apache.manifoldcf.core.database.DBInterfaceHSQLDB.reinterpretException(DBInterfaceHSQLDB.java:734) > at > > org.apache.manifoldcf.core.database.DBInterfaceHSQLDB.performModification(DBInterfaceHSQLDB.java:754) > at > > org.apache.manifoldcf.core.database.DBInterfaceHSQLDB.performInsert(DBInterfaceHSQLDB.java:230) > at > > org.apache.manifoldcf.core.database.BaseTable.performInsert(BaseTable.java:68) > at > > org.apache.manifoldcf.crawler.connmgr.ConnectorManager.registerConnector(ConnectorManager.java:172) > at > > org.apache.manifoldcf.crawler.system.ManifoldCF.registerConnectors(ManifoldCF.java:672) > at > > org.apache.manifoldcf.crawler.system.ManifoldCF.reregisterAllConnectors(ManifoldCF.java:160) > at > > org.apache.manifoldcf.jettyrunner.ManifoldCFJettyRunner.main(ManifoldCFJettyRunner.java:239) > Caused by: java.sql.SQLIntegrityConstraintViolationException: integrity > constraint violation: unique constraint or index violation: I1549774667196 > at org.hsqldb.jdbc.JDBCUtil.sqlException(Unknown Source) > at org.hsqldb.jdbc.JDBCUtil.sqlException(Unknown Source) > at org.hsqldb.jdbc.JDBCPreparedStatement.fetchResult(Unknown > Source) > at org.hsqldb.jdbc.JDBCPreparedStatement.executeUpdate(Unknown > Source) > at > 
org.apache.manifoldcf.core.database.Database.execute(Database.java:916) > at > > org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696) > Caused by: org.hsqldb.HsqlException: integrity constraint violation: unique > constraint or index violation: I1549774667196 > at org.hsqldb.error.Error.error(Unknown Source) > at org.hsqldb.error.Error.error(Unknown Source) > at org.hsqldb.index.IndexAVL.insert(Unknown Source) > at org.hsqldb.persist.RowStoreAVL.indexRow(Unknown Source) > at org.hsqldb.persist.RowStoreAVLDisk.indexRow(Unknown Source) > at org.hsqldb.TransactionManagerMVCC.addInsertAction(Unknown > Source) > at org.hsqldb.Session.addInsertAction(Unknown Source) > at org.hsqldb.Table.insertSingleRow(Unknown Source) > at org.hsqldb.StatementDML.insertSingleRow(Unknown Source) > at org.hsqldb.StatementInsert.getResult(Unknown Source) > at org.hsqldb.StatementDMQL.execute(Unknown Source) > at org.hsqldb.Session.executeCompiledStatement(Unknown Source) > at org.hsqldb.Session.execute(Unknown Source) > ... 4 more > > > I am guessing my class-path would have a problem, but do not have a > confidence. > What is the cause of this error? > > I would appreciate for any of your help. > > > Sincerely, > Kaya >
[jira] [Commented] (CONNECTORS-1584) regex documentation
[ https://issues.apache.org/jira/browse/CONNECTORS-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771462#comment-16771462 ] Karl Wright commented on CONNECTORS-1584: - The mailing list is us...@manifoldcf.apache.org. The regular expressions are standard Java regular expressions. The documentation is widely available. You can also experiment with regular expressions in a java applet online at: https://www.cis.upenn.edu/~matuszek/General/RegexTester/regex-tester.html > regex documentation > --- > > Key: CONNECTORS-1584 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1584 > Project: ManifoldCF > Issue Type: Improvement > Components: Web connector >Affects Versions: ManifoldCF 2.12 >Reporter: Tim Steenbeke >Priority: Minor > > What type of regexs does manifold include and exclude support and also in > general regex support? > At the moment i'm using a web repository connection and an Elastic output > connection. > I'm trying to exclude urls that link to documents. > e.g. website.com/document/path/this.pdf and > website.com/document/path/other.PDF > The issue i'm having is that the regex that I have found so far doesn't work > case insensitive, so for every possible case i have to add a new line. > e.g.: > {code:java} > .*.pdf$ and .*.PDF$ and .*.Pdf and ... .{code} > Is it possible to add documentation what type of regex is able to be used or > maybe a tool to test your regex and see if it is supported by manifold ? > I tried mailing this question to > [u...@manifoldcf.apache.org|mailto:u...@manifoldcf.apache.org] but this mail > adress returns a failure notice. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
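Since ManifoldCF uses standard Java regular expressions, the case-insensitivity question in this ticket has a one-line answer: the inline `(?i)` flag covers every capitalization of `.pdf`, so the separate `.*.pdf$` / `.*.PDF$` lines are unnecessary (note also that a literal dot should be escaped as `\.`). A small demonstration, with hypothetical class and method names:

```java
import java.util.regex.Pattern;

public class PdfExcludeDemo
{
  // (?i) turns on case-insensitive matching for the whole expression,
  // so one pattern replaces .*.pdf$, .*.PDF$, .*.Pdf$, etc.
  // \\. matches a literal dot rather than "any character".
  static final Pattern PDF = Pattern.compile("(?i).*\\.pdf$");

  static boolean isPdfUrl(String url)
  {
    return PDF.matcher(url).matches();
  }
}
```

Both example URLs from the ticket, `website.com/document/path/this.pdf` and `website.com/document/path/other.PDF`, match this single pattern.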
[jira] [Resolved] (CONNECTORS-1585) MCF Admin page shows 404 error frequently
[ https://issues.apache.org/jira/browse/CONNECTORS-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1585. - Resolution: Cannot Reproduce > MCF Admin page shows 404 error frequently > - > > Key: CONNECTORS-1585 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1585 > Project: ManifoldCF > Issue Type: Task >Reporter: Pavithra Dhakshinamurthy >Priority: Critical > > Hi Team, > I'm getting 404 Page not found error on a frequent basis in Manifold CF home > page. Not able to trace any error logs as well. Please let me know on what > scenarios 404 error will occur. > http://{hostname}:8345/mcf-crawler-ui/login.jsp > Regards, > Pavithra D -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1585) MCF Admin page shows 404 error frequently
[ https://issues.apache.org/jira/browse/CONNECTORS-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771461#comment-16771461 ] Karl Wright commented on CONNECTORS-1585: - 404 errors have nothing to do with ManifoldCF. They have to do with your app server environment -- either that, or your network/proxy. MCF is just a web app and does not have any magic in it. > MCF Admin page shows 404 error frequently > - > > Key: CONNECTORS-1585 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1585 > Project: ManifoldCF > Issue Type: Task >Reporter: Pavithra Dhakshinamurthy >Priority: Critical > > Hi Team, > I'm getting 404 Page not found error on a frequent basis in Manifold CF home > page. Not able to trace any error logs as well. Please let me know on what > scenarios 404 error will occur. > http://{hostname}:8345/mcf-crawler-ui/login.jsp > Regards, > Pavithra D -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CONNECTORS-1580) Issues in documentum connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1580. - Resolution: Won't Fix > Issues in documentum connector > -- > > Key: CONNECTORS-1580 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1580 > Project: ManifoldCF > Issue Type: Bug >Reporter: Pavithra Dhakshinamurthy >Priority: Blocker > Attachments: Job_Scheduling.png > > > Hi Team, > We are facing below issues in apache manifold documentum connector version > 2.9.1.kindly help us. > 1.During the first run of the job,documents are getting indexed to > ElasticSearch.If the same job is run after the completion,records are getting > seeded,processed but not updated to output connector.Once the document id is > indexed,same document id is not able to update it again in the same job. > > 2.We have scheduled incremental crawling for every 15 mins and document > count will vary for every 15 mins. But in seeding it is not resetting the > document count,once the job is completed.It's getting added to last scheduled > job count. >eg.1st schedule-10 documents > 2nd schedule-5 documents > In the 2nd scheduled of the job,the document count should be 5,but it is > having document count as 15. so it is keep on adding the dcouments id for > every schedule and it is processing -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1580) Issues in documentum connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766799#comment-16766799 ] Karl Wright commented on CONNECTORS-1580: - You are on your own here. You are trying to use it as a queuing engine, not an incremental indexer. You have not thought this out properly, clearly, because that's not what addSeedDocuments() does. So you must come up with a version string computation that reflects the fact that your documents have changed and need to be reconsidered. It will have to directly reference whatever external queue you are using to stuff changed documents in. You should maybe start by reading the book. It's free. Here: https://github.com/DaddyWri/manifoldcfinaction/tree/master/pdfs > Issues in documentum connector > -- > > Key: CONNECTORS-1580 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1580 > Project: ManifoldCF > Issue Type: Bug >Reporter: Pavithra Dhakshinamurthy >Priority: Blocker > Attachments: Job_Scheduling.png > > > Hi Team, > We are facing below issues in apache manifold documentum connector version > 2.9.1.kindly help us. > 1.During the first run of the job,documents are getting indexed to > ElasticSearch.If the same job is run after the completion,records are getting > seeded,processed but not updated to output connector.Once the document id is > indexed,same document id is not able to update it again in the same job. > > 2.We have scheduled incremental crawling for every 15 mins and document > count will vary for every 15 mins. But in seeding it is not resetting the > document count,once the job is completed.It's getting added to last scheduled > job count. >eg.1st schedule-10 documents > 2nd schedule-5 documents > In the 2nd scheduled of the job,the document count should be 5,but it is > having document count as 15. so it is keep on adding the dcouments id for > every schedule and it is processing -- This message was sent by Atlassian JIRA (v7.6.3#76005)
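The version-string advice above can be illustrated with a small sketch. The inputs (modification date, ACL hash, external queue stamp) are hypothetical stand-ins for whatever attributes should force re-processing when they change; a real connector would compute them from its repository and external queue:

```java
import java.util.Date;

public class VersionStringDemo {
    // Hypothetical helper: builds a version string from the attributes that
    // should trigger reindexing when they change. Field names are invented
    // for illustration, not taken from the Documentum connector.
    static String computeVersionString(Date modified, String aclHash, long externalQueueStamp) {
        StringBuilder sb = new StringBuilder();
        sb.append(modified.getTime()).append('+')
          .append(aclHash).append('+')
          // changes whenever the external queue re-enqueues the document,
          // so the framework sees a new version and re-processes it
          .append(externalQueueStamp);
        return sb.toString();
    }

    public static void main(String[] args) {
        String v1 = computeVersionString(new Date(1000L), "acl-abc", 1);
        String v2 = computeVersionString(new Date(1000L), "acl-abc", 2);
        // Same document, different queue stamps: version strings differ,
        // so the framework would reconsider the document.
        System.out.println(!v1.equals(v2)); // true
    }
}
```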
Re: Apache ManifoldCF: Get a history report for a repository connection over REST API
Yes, query parameters in any URL go after the fixed "path" part of the URL, and are of the form ?parameter=value&parameter2=value2... just like any other URL. My suspicion is that you aren't supplying the activity(s) that you want to match. The best way to figure out what activities make sense for the connection is to look at the report UI page for that connection and see what activities are available. Also please be aware that by default MCF purges history records that are more than 30 days old -- you can configure longer, but if they don't show up in the UI they aren't going to show up in the API. Finally, the reason you don't get an error when you use a connection name that is bogus is that the underlying implementation is merely doing a dumb query and not checking for the legality/existence of the connection name you give it. Thanks, and please let me know how it goes. Karl On Tue, Feb 12, 2019 at 7:43 PM Dave Fisher wrote: > Redirecting your query to dev@manifoldcf.apache.org > > Sent from my iPhone > > > On Feb 12, 2019, at 8:38 AM, Marta Gołąbek wrote: > > > > Dear Sir or Madam, > > > > I'm trying to get a history report for a repository connection over > > ManifoldCF REST API. According to the documentation: > > > > > https://manifoldcf.apache.org/release/release-2.11/en_US/programmatic-operation.html#History+query+parameters > > > > It should be possible with the following URL (connection name: > > myConnection): > > > > > http://localhost:8345/mcf-api-service/json/repositoryconnectionhistory/myConnection > > > > I have also tried to use some of the history query parameters: > > > > > http://localhost:8345/mcf-api-service/json/repositoryconnectionhistory/myConnection?report=simple > > > > But I am not sure if I am using them correctly or how they should be > > attached to the URL, because it is not mentioned in the documentation. The > > problem is also that I don't receive any error, but an empty object, so it > > is difficult to debug. 
The API returns an empty object even for a > > non-existing connection. > > > > However it works for resources, which don't have any attributes, e.g.: > > > > > http://localhost:8345/mcf-api-service/json/repositoryconnectionjobs/myConnection > > > > or > > > > > http://localhost:8345/mcf-api-service/json/repositoryconnections/myConnection > > > > Thanks in advance for any help. > > > > Best regards, > > > > Marta Golabek > >
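The rule that query parameters follow the fixed path part of the URL can be sketched as a small URL builder. The `report=simple` value comes from the question above; the `activity` parameter name is an assumption based on the activities discussion -- check the release documentation for the exact parameter names:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class HistoryUrlDemo {
    // Appends ?name=value&name2=value2... to the fixed "path" part,
    // URL-encoding each value (requires Java 10+ for the Charset overload).
    static String withQuery(String base, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(e.getKey()).append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
            sep = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("report", "simple");   // from the question above
        params.put("activity", "fetch");  // assumed parameter/value; verify against the docs
        System.out.println(withQuery(
            "http://localhost:8345/mcf-api-service/json/repositoryconnectionhistory/myConnection",
            params));
        // prints ...repositoryconnectionhistory/myConnection?report=simple&activity=fetch
    }
}
```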
[jira] [Commented] (CONNECTORS-1581) [Set priority thread] Error tossed: null during startup
[ https://issues.apache.org/jira/browse/CONNECTORS-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766037#comment-16766037 ] Karl Wright commented on CONNECTORS-1581: - It's possible that the problem is due to funkiness in MySQL. We've had a lot of trouble lately because MySQL no longer seems to be properly enforcing transaction integrity in at least some circumstances. OR it could be the open-source MySQL driver we're using; maybe that needs an upgrade? At any rate, removal of jobqueue rows MUST precede removal of job table rows; there's a constraint in place in fact. So if you get to the point where that constraint has been violated, you're pretty certain it's a database issue. :-( > [Set priority thread] Error tossed: null during startup > --- > > Key: CONNECTORS-1581 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1581 > Project: ManifoldCF > Issue Type: Bug > Environment: •ManifoldCF 2.12, running in a Docker Container > based on Redhat Linux, OpenJDK 8 > • AWS RDS Database (Aurora MySQL -> 5.6 compatible, utf8 (collation > utf8_bin)) > • Single Process Setup >Reporter: Markus Schuch >Assignee: Markus Schuch >Priority: Major > > We see the following {{NullPointerException}} at startup: > {code} > [Set priority thread] FATAL org.apache.manifoldcf.crawlerthreads- Error > tossed: null > java.lang.NullPointerException > at > org.apache.manifoldcf.crawler.system.ManifoldCF.writeDocumentPriorities(ManifoldCF.java:1202) > at > org.apache.manifoldcf.crawler.system.SetPriorityThread.run(SetPriorityThread.java:141) > {code} > What could be the cause of that? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1582) Unable to Crawl the Site Contents and Meta-Data
[ https://issues.apache.org/jira/browse/CONNECTORS-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16765959#comment-16765959 ] Karl Wright commented on CONNECTORS-1582: - The purpose is to decide whether the document matches the specified inclusion rules for the document. > Unable to Crawl the Site Contents and Meta-Data > --- > > Key: CONNECTORS-1582 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1582 > Project: ManifoldCF > Issue Type: Bug >Reporter: Pavithra Dhakshinamurthy > Assignee: Karl Wright >Priority: Major > > Hi, > Currently I'm using the ManifoldCF(2.9.1) SharePoint version 2003. I'm unable > to crawl the site contents data. I have facing some issues, hard to figure > out to resolve. > can you please assist the same. > There is a method(CheckMatch) for validating ASCII value for site contests > but unable to understand the usage of validation. I'm getting error "no > matching rule" because of failing the rule of CheckMatch(). > Even-though i tried path type as Library, List, Site, Folder but unable to > crawl the site contents and meta data. while putting logger i can able to see > the list of site contents > Thanks, -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1583) ManifoldCF getting hung frequently
[ https://issues.apache.org/jira/browse/CONNECTORS-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16765760#comment-16765760 ] Karl Wright commented on CONNECTORS-1583: - How have you deployed ManifoldCF? What app server are you using? What deployment model (e.g. which example)? The ManifoldCF UI runs underneath an application server. It appears to me like that application server is either inaccessible or has been shut down. This is not a ManifoldCF problem. > ManifoldCF getting hung frequently > -- > > Key: CONNECTORS-1583 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1583 > Project: ManifoldCF > Issue Type: Bug >Affects Versions: ManifoldCF 2.9.1 >Reporter: Pavithra Dhakshinamurthy >Priority: Major > Attachments: image-2019-02-12-11-59-52-131.png > > > Hi Team, > We are using Manifold 2.9.1 version for crawling the documents. The > ManifoldCF server is getting hung very frequently due to which crawling is > getting failed. > While accessing the Manifold application, it's throwing 404 error, but we > could see the process running at the background. > !image-2019-02-12-11-59-52-131.png|thumbnail! > Connectors used: > Repository :Documentum > Output : Elasticsearch > Kindly help us in resolving this issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CONNECTORS-1583) ManifoldCF getting hung frequently
[ https://issues.apache.org/jira/browse/CONNECTORS-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1583. - Resolution: Incomplete > ManifoldCF getting hung frequently > -- > > Key: CONNECTORS-1583 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1583 > Project: ManifoldCF > Issue Type: Bug >Affects Versions: ManifoldCF 2.9.1 >Reporter: Pavithra Dhakshinamurthy >Priority: Major > Attachments: image-2019-02-12-11-59-52-131.png > > > Hi Team, > We are using Manifold 2.9.1 version for crawling the documents. The > ManifoldCF server is getting hung very frequently due to which crawling is > getting failed. > While accessing the Manifold application, it's throwing 404 error, but we > could see the process running at the background. > !image-2019-02-12-11-59-52-131.png|thumbnail! > Connectors used: > Repository :Documentum > Output : Elasticsearch > Kindly help us in resolving this issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1581) [Set priority thread] Error tossed: null during startup
[ https://issues.apache.org/jira/browse/CONNECTORS-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16765178#comment-16765178 ] Karl Wright commented on CONNECTORS-1581: - Yes if the job ID doesn't show up anywhere it's safe to delete. How did you wind up in that situation though? Karl On Mon, Feb 11, 2019 at 12:15 PM Markus Schuch (JIRA) > [Set priority thread] Error tossed: null during startup > --- > > Key: CONNECTORS-1581 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1581 > Project: ManifoldCF > Issue Type: Bug > Environment: •ManifoldCF 2.12, running in a Docker Container > based on Redhat Linux, OpenJDK 8 > • AWS RDS Database (Aurora MySQL -> 5.6 compatible, utf8 (collation > utf8_bin)) > • Single Process Setup >Reporter: Markus Schuch >Assignee: Markus Schuch >Priority: Major > > We see the following {{NullPointerException}} at startup: > {code} > [Set priority thread] FATAL org.apache.manifoldcf.crawlerthreads- Error > tossed: null > java.lang.NullPointerException > at > org.apache.manifoldcf.crawler.system.ManifoldCF.writeDocumentPriorities(ManifoldCF.java:1202) > at > org.apache.manifoldcf.crawler.system.SetPriorityThread.run(SetPriorityThread.java:141) > {code} > What could be the cause of that? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1580) Issues in documentum connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16765083#comment-16765083 ] Karl Wright commented on CONNECTORS-1580: - So you modified the Documentum Connector to change what addSeedDocument returns? Did you change what getModel() returns? Did you change how the version string is calculated in processDocuments()? If you don't do that the framework will not detect changes and will not work properly. > Issues in documentum connector > -- > > Key: CONNECTORS-1580 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1580 > Project: ManifoldCF > Issue Type: Bug >Reporter: Pavithra Dhakshinamurthy >Priority: Blocker > Attachments: Job_Scheduling.png > > > Hi Team, > We are facing below issues in apache manifold documentum connector version > 2.9.1.kindly help us. > 1.During the first run of the job,documents are getting indexed to > ElasticSearch.If the same job is run after the completion,records are getting > seeded,processed but not updated to output connector.Once the document id is > indexed,same document id is not able to update it again in the same job. > > 2.We have scheduled incremental crawling for every 15 mins and document > count will vary for every 15 mins. But in seeding it is not resetting the > document count,once the job is completed.It's getting added to last scheduled > job count. >eg.1st schedule-10 documents > 2nd schedule-5 documents > In the 2nd scheduled of the job,the document count should be 5,but it is > having document count as 15. so it is keep on adding the dcouments id for > every schedule and it is processing -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (CONNECTORS-1582) Unable to Crawl the Site Contents and Meta-Data
[ https://issues.apache.org/jira/browse/CONNECTORS-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-1582: --- Assignee: Karl Wright > Unable to Crawl the Site Contents and Meta-Data > --- > > Key: CONNECTORS-1582 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1582 > Project: ManifoldCF > Issue Type: Bug >Reporter: Pavithra Dhakshinamurthy > Assignee: Karl Wright >Priority: Major > > Hi, > Currently I'm using the ManifoldCF(2.9.1) SharePoint version 2003. I'm unable > to crawl the site contents data. I have facing some issues, hard to figure > out to resolve. > can you please assist the same. > There is a method(CheckMatch) for validating ASCII value for site contests > but unable to understand the usage of validation. I'm getting error "no > matching rule" because of failing the rule of CheckMatch(). > Even-though i tried path type as Library, List, Site, Folder but unable to > crawl the site contents and meta data. while putting logger i can able to see > the list of site contents > Thanks, -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CONNECTORS-1582) Unable to Crawl the Site Contents and Meta-Data
[ https://issues.apache.org/jira/browse/CONNECTORS-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1582. - Resolution: Not A Problem > Unable to Crawl the Site Contents and Meta-Data > --- > > Key: CONNECTORS-1582 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1582 > Project: ManifoldCF > Issue Type: Bug >Reporter: Pavithra Dhakshinamurthy > Assignee: Karl Wright >Priority: Major > > Hi, > Currently I'm using the ManifoldCF(2.9.1) SharePoint version 2003. I'm unable > to crawl the site contents data. I have facing some issues, hard to figure > out to resolve. > can you please assist the same. > There is a method(CheckMatch) for validating ASCII value for site contests > but unable to understand the usage of validation. I'm getting error "no > matching rule" because of failing the rule of CheckMatch(). > Even-though i tried path type as Library, List, Site, Folder but unable to > crawl the site contents and meta data. while putting logger i can able to see > the list of site contents > Thanks, -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1582) Unable to Crawl the Site Contents and Meta-Data
[ https://issues.apache.org/jira/browse/CONNECTORS-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16765019#comment-16765019 ] Karl Wright commented on CONNECTORS-1582: - Hi [~Pavithrad], the problem is that you will need not just one rule, but a rule for sites, and a rule for libraries, and a rule for documents. So if the entity you need to decide whether it is included is a site, then you need a site rule, and the same for libraries or documents. And since you can't get to all document metadata without drilling down through sites and libraries, you need the rules for these in order to get to the metadata for each of these levels. The documentation is pretty clear about how these rules work, but I agree that the interface is complex to work with. > Unable to Crawl the Site Contents and Meta-Data > --- > > Key: CONNECTORS-1582 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1582 > Project: ManifoldCF > Issue Type: Bug >Reporter: Pavithra Dhakshinamurthy >Priority: Major > > Hi, > Currently I'm using the ManifoldCF(2.9.1) SharePoint version 2003. I'm unable > to crawl the site contents data. I have facing some issues, hard to figure > out to resolve. > can you please assist the same. > There is a method(CheckMatch) for validating ASCII value for site contests > but unable to understand the usage of validation. I'm getting error "no > matching rule" because of failing the rule of CheckMatch(). > Even-though i tried path type as Library, List, Site, Folder but unable to > crawl the site contents and meta data. while putting logger i can able to see > the list of site contents > Thanks, -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1580) Issues in documentum connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763725#comment-16763725 ] Karl Wright commented on CONNECTORS-1580: - {quote} The documents which have already got indexed are getting processed but not getting updated to Elasticsearch while re-running the same job {quote} What does the Simple History say here? Look for a document that you think should be updated but isn't getting updated. Do you see a document fetch? Do you see a document ingestion? If you see an ingestion BUT the ES index is not getting updated then your problem has to do with how ES is set up. I can imagine quite a few scenarios where that can occur. If you are seeing a fetch but no indexing, that means that the version string for your documentum documents is not changing for some reason. This would require more analysis, starting with learning exactly what has changed with the document in question that you expect should cause a reindex. It is possible you have some custom information that is not showing up in the version string and you are nonetheless expecting it to. We would need more details to be able to fix that. > Issues in documentum connector > -- > > Key: CONNECTORS-1580 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1580 > Project: ManifoldCF > Issue Type: Bug >Reporter: Pavithra Dhakshinamurthy >Priority: Blocker > Attachments: Job_Scheduling.png > > > Hi Team, > We are facing below issues in apache manifold documentum connector version > 2.9.1.kindly help us. > 1.During the first run of the job,documents are getting indexed to > ElasticSearch.If the same job is run after the completion,records are getting > seeded,processed but not updated to output connector.Once the document id is > indexed,same document id is not able to update it again in the same job. > > 2.We have scheduled incremental crawling for every 15 mins and document > count will vary for every 15 mins. 
But in seeding it is not resetting the > document count once the job is completed. It's getting added to the last scheduled > job count. > e.g. 1st schedule - 10 documents > 2nd schedule - 5 documents > In the 2nd schedule of the job, the document count should be 5, but it is > showing a document count of 15. So it keeps on adding the document ids for > every schedule and processing them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1581) [Set priority thread] Error tossed: null during startup
[ https://issues.apache.org/jira/browse/CONNECTORS-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763667#comment-16763667 ] Karl Wright commented on CONNECTORS-1581: - I am pretty concerned that the database layer is fundamentally not working right. The fact that the set priority thread recovers argues that there is a database failure that silently resolves. This is bizarre. If the thread actually finally starts, then you should be good to go other than the concerns expressed above. > [Set priority thread] Error tossed: null during startup > --- > > Key: CONNECTORS-1581 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1581 > Project: ManifoldCF > Issue Type: Bug > Environment: •ManifoldCF 2.12, running in a Docker Container > based on Redhat Linux, OpenJDK 8 > • AWS RDS Database (Aurora MySQL -> 5.6 compatible, utf8 (collation > utf8_bin)) > • Single Process Setup >Reporter: Markus Schuch >Priority: Major > > We see the following {{NullPointerException}} at startup: > {code} > [Set priority thread] FATAL org.apache.manifoldcf.crawlerthreads- Error > tossed: null > java.lang.NullPointerException > at > org.apache.manifoldcf.crawler.system.ManifoldCF.writeDocumentPriorities(ManifoldCF.java:1202) > at > org.apache.manifoldcf.crawler.system.SetPriorityThread.run(SetPriorityThread.java:141) > {code} > What could be the cause of that? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1581) [Set priority thread] Error tossed: null during startup
[ https://issues.apache.org/jira/browse/CONNECTORS-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763530#comment-16763530 ] Karl Wright commented on CONNECTORS-1581: - Here's the code that's throwing an NPE: {code} // Compute the list of connector instances we will need. // This has a side effect of fetching all job descriptions too. Set connectionNames = new HashSet(); for (int i = 0; i < descs.length; i++) { DocumentDescription dd = descs[i]; IJobDescription job = jobDescriptionMap.get(dd.getJobID()); if (job == null) { job = jobManager.load(dd.getJobID(),true); jobDescriptionMap.put(dd.getJobID(),job); } connectionNames.add(job.getConnectionName()); } {code} The problem is, apparently, that jobManager.load() is coming back null. I have no idea why this would happen but clearly the problem has to do with the database implementation, perhaps the mysql driver being used? > [Set priority thread] Error tossed: null during startup > --- > > Key: CONNECTORS-1581 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1581 > Project: ManifoldCF > Issue Type: Bug > Environment: •ManifoldCF 2.12, running in a Docker Container > based on Redhat Linux, OpenJDK 8 > • AWS RDS Database (Aurora MySQL -> 5.6 compatible, utf8 (collation > utf8_bin)) > • Single Process Setup >Reporter: Markus Schuch >Priority: Major > > We see the following {{NullPointerException}} at startup: > {code} > [Set priority thread] FATAL org.apache.manifoldcf.crawlerthreads- Error > tossed: null > java.lang.NullPointerException > at > org.apache.manifoldcf.crawler.system.ManifoldCF.writeDocumentPriorities(ManifoldCF.java:1202) > at > org.apache.manifoldcf.crawler.system.SetPriorityThread.run(SetPriorityThread.java:141) > {code} > What could be the cause of that? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
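One defensive shape for the quoted loop is to guard against load() returning null instead of dereferencing it. The sketch below uses simplified stand-in types rather than the real ManifoldCF interfaces, purely to show the pattern; it is not the committed fix:

```java
import java.util.HashMap;
import java.util.Map;

public class NullGuardDemo {
    // Simplified stand-in for jobManager.load(): returns null when the job
    // row has disappeared (the situation behind the NPE in the stack trace).
    static final Map<Long, String> JOBS = new HashMap<>();

    static String loadJob(Long jobId) {
        return JOBS.get(jobId);
    }

    // Defensive variant of the quoted loop: skip (and, in real code, log)
    // missing jobs instead of calling a method on a null job description.
    static int countResolvable(Long[] jobIds) {
        int resolved = 0;
        for (Long id : jobIds) {
            String job = loadJob(id);
            if (job == null) {
                continue; // a warning log would go here rather than an NPE
            }
            resolved++;
        }
        return resolved;
    }

    public static void main(String[] args) {
        JOBS.put(1L, "connectionA");
        // Job 2 no longer exists; the guarded loop survives it.
        System.out.println(countResolvable(new Long[]{1L, 2L})); // 1
    }
}
```

Of course, skipping the document only hides the symptom; the underlying database inconsistency described above still needs explaining.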
[jira] [Commented] (CONNECTORS-1580) Issues in documentum connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763404#comment-16763404 ] Karl Wright commented on CONNECTORS-1580: - Hi, I can make almost no sense of this ticket. Can you describe the job scheduling setup? Specifically is this "scan once" or "rescan dynamically"? What does this mean exactly? "We have scheduled incremental crawling for every 15 mins" You should be aware that the document count will vary because documents that are discovered are then processed and ManifoldCF may determine during processing that the document does not need to be indexed. The best way to figure out what MCF is doing is to look at the Simple History report and see what is happening. You can see what is fetched and what is reindexed that way. Can you include the Simple History for one incremental job run here, and describe what is wrong with it? > Issues in documentum connector > -- > > Key: CONNECTORS-1580 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1580 > Project: ManifoldCF > Issue Type: Bug >Reporter: Pavithra Dhakshinamurthy >Priority: Blocker > > Hi Team, > We are facing below issues in apache manifold documentum connector version > 2.9.1.kindly help us. > 1.During the first run of the job,documents are getting indexed to > ElasticSearch.If the same job is run after the completion,records are getting > seeded,processed but not updated to output connector.Once the document id is > indexed,same document id is not able to update it again in the same job. > > 2.We have scheduled incremental crawling for every 15 mins and document > count will vary for every 15 mins. But in seeding it is not resetting the > document count,once the job is completed.It's getting added to last scheduled > job count. >eg.1st schedule-10 documents > 2nd schedule-5 documents > In the 2nd scheduled of the job,the document count should be 5,but it is > having document count as 15. 
so it keeps on adding the document ids for > every schedule and processing them -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1579) Error when crawling a MSSQL table
[ https://issues.apache.org/jira/browse/CONNECTORS-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763395#comment-16763395 ] Karl Wright commented on CONNECTORS-1579: - You can either check out the entire current trunk source code and build that, or download the release source and libs, apply the patch, and build that. Which do you want to do? > Error when crawling a MSSQL table > - > > Key: CONNECTORS-1579 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1579 > Project: ManifoldCF > Issue Type: Bug > Components: JDBC connector >Affects Versions: ManifoldCF 2.12 >Reporter: Donald Van den Driessche >Assignee: Karl Wright >Priority: Major > Fix For: ManifoldCF 2.13 > > Attachments: 636_bb2.csv, CONNECTORS-1579.patch > > > When I'm crawling a MSSQL table through the JDBC connector I get following > error on multiple lines: > > {noformat} > FATAL 2019-02-05T13:21:58,929 (Worker thread '40') - Error tossed: Multiple > document primary component dispositions not allowed: document '636' > java.lang.IllegalStateException: Multiple document primary component > dispositions not allowed: document '636' > at > org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.checkMultipleDispositions(WorkerThread.java:2125) > ~[mcf-pull-agent.jar:?] > at > org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.noDocument(WorkerThread.java:1624) > ~[mcf-pull-agent.jar:?] > at > org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.noDocument(WorkerThread.java:1605) > ~[mcf-pull-agent.jar:?] > at > org.apache.manifoldcf.crawler.connectors.jdbc.JDBCConnector.processDocuments(JDBCConnector.java:944) > ~[?:?] > at > org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) > [mcf-pull-agent.jar:?]{noformat} > I looked this error up on the internet and it said that it might have > something to do with using the same key for different lines. 
> I checked, but I couldn't find any duplicates that match any of the selected > fields in the JDBC. > Hereby my queries: > Seeding query > {code:java} > SELECT pk1 as $(IDCOLUMN) > FROM dbo.bb2 > WHERE search_url IS NOT NULL > AND mimetype IS NOT NULL AND mimetype NOT IN ('unknown/unknown', > 'application/xml', 'application/zip'); > {code} > Version check query: none > Access token query: none > Data query: > > > {code:java} > SELECT > pk1 AS $(IDCOLUMN), > search_url AS $(URLCOLUMN), > ISNULL(content, '') AS $(DATACOLUMN), > doc_id, > search_url AS url, > ISNULL(title, '') as title, > ISNULL(groups,'') as groups, > ISNULL(type,'') as document_type, > ISNULL(users, '') as users > FROM dbo.bb2 > WHERE pk1 IN $(IDLIST); > {code} > The hereby added csv is the corresponding line from the table. > [^636_bb2.csv] > > Due to this problem, the whole crawling pipeline is being held up. It keeps > on retrying this line. > Could you help me understand this error? > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
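One quick way to test the duplicate-key theory discussed above is to scan the ids a query returns for repeats. A hypothetical diagnostic sketch (not the committed fix, and the example ids are invented around the document '636' from the error):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DuplicateIdCheck {
    // Returns the ids that occur more than once. A "multiple document
    // primary component dispositions" error typically points at the same
    // document id coming back twice from a query.
    static Set<String> duplicates(List<String> ids) {
        Set<String> seen = new HashSet<>();
        Set<String> dups = new LinkedHashSet<>();
        for (String id : ids) {
            if (!seen.add(id)) {
                dups.add(id);
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        // Feed in the $(IDCOLUMN) values returned by the seeding query.
        System.out.println(duplicates(Arrays.asList("635", "636", "636"))); // [636]
    }
}
```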
[jira] [Resolved] (CONNECTORS-1579) Error when crawling a MSSQL table
[ https://issues.apache.org/jira/browse/CONNECTORS-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1579.
-
Resolution: Fixed
Fix Version/s: ManifoldCF 2.13
r1853008
[jira] [Updated] (CONNECTORS-1579) Error when crawling a MSSQL table
[ https://issues.apache.org/jira/browse/CONNECTORS-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-1579:
Attachment: CONNECTORS-1579.patch
[jira] [Commented] (CONNECTORS-1579) Error when crawling a MSSQL table
[ https://issues.apache.org/jira/browse/CONNECTORS-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760848#comment-16760848 ] Karl Wright commented on CONNECTORS-1579:
-
It's a bug in the code. Whenever the JDBC connector rejects a document based on what the downstream pipeline tells it to do, it improperly accounts for that and you get this error. The fix is quite simple and I can attach a patch, and will do so shortly. Thanks!
[jira] [Commented] (CONNECTORS-1579) Error when crawling a MSSQL table
[ https://issues.apache.org/jira/browse/CONNECTORS-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760818#comment-16760818 ] Karl Wright commented on CONNECTORS-1579:
-
Hi, the proximate cause of the problem is that there are multiple "resolutions" occurring for one document in the JDBC crawl set. When a connector is asked to process a document, it must tell the framework what is to be done with it -- either it gets indexed, or it gets skipped, or it gets deleted. The problem is that the connector is telling the framework TWO things for the same document. The code in question:
{code}
// Now, go through the original id's, and see which ones are still in the map. These
// did not appear in the result and are presumed to be gone from the database, and thus must be deleted.
for (final String documentIdentifier : fetchDocuments) {
  if (!seenDocuments.contains(documentIdentifier)) {
    // Never saw it in the fetch attempt
    activities.deleteDocument(documentIdentifier);
  } else {
    // Saw it in the fetch attempt, and we might have fetched it
    final String documentVersion = map.get(documentIdentifier);
    if (documentVersion != null) {
      // This means we did not see it (or data for it) in the result set. Delete it!
      activities.noDocument(documentIdentifier,documentVersion);
{code}
It's failing on the last line. The connector thinks there is in fact no document that exists (based on the version query you gave it), BUT based on the results of the other queries, it thinks the document does exist (and was in fact processed). I will need to look carefully at the queries and at the connector code to figure out exactly how that can happen, and then I can let you know whether it's a bug in the code or a bug in your queries. Stay tuned.
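The invariant Karl describes -- each document gets exactly one primary disposition per processing pass -- can be sketched in isolation. The following is a hypothetical illustration only, not the actual WorkerThread code; the class and method names are invented for the example:

```java
import java.util.HashSet;
import java.util.Set;

public class DispositionCheck {
    // Tracks which documents have already received a primary disposition.
    private final Set<String> dispositioned = new HashSet<>();

    // A connector reports exactly one outcome (indexed, skipped, or deleted)
    // per document; a second report for the same identifier is rejected,
    // which is the kind of check that produces the IllegalStateException
    // in the stack trace above.
    void record(String documentIdentifier) {
        if (!dispositioned.add(documentIdentifier)) {
            throw new IllegalStateException(
                "Multiple document primary component dispositions not allowed: document '"
                + documentIdentifier + "'");
        }
    }

    public static void main(String[] args) {
        DispositionCheck check = new DispositionCheck();
        check.record("636");      // first disposition: accepted
        try {
            check.record("636");  // second disposition for the same document: rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In the bug above, the connector effectively called both `deleteDocument` and `noDocument` style dispositions for document '636', tripping exactly this kind of guard.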
[jira] [Assigned] (CONNECTORS-1579) Error when crawling a MSSQL table
[ https://issues.apache.org/jira/browse/CONNECTORS-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-1579:
---
Assignee: Karl Wright
Re: Apache CXF question
Thanks, Kishore -- but I already have the documentation. What I need is Apache CXF expertise. ;-) Karl

On Fri, Feb 1, 2019 at 2:28 PM Kishore Kumar wrote:
> Hi Karl,
>
> Good morning, I have shared you a Dropbox shared folder with OpenText Content Server Web Service Documentation.
>
> If you have not received the link from Dropbox in your inbox, check in Spam or let me know.
>
> Thanks,
> Kishore Kumar
Apache CXF question
I'm still working on the new OpenText connector, now using Apache CXF to handle the web services piece. I've never worked with this package before, but I've got the WSDLs generating what look like usable Java classes representing the WSDL interfaces. But the underlying transport is mysterious given what is generated. So, two questions:

(1) It doesn't appear to me that explicit generation of classes from the XSD is needed here. It looks like CXF does that too. Am I wrong?
(2) I want the transport to go via an HttpComponents/HttpClient HttpClient object that I create and initialize myself. How can I set that up?

If anyone on this list has a few snippets of code they can share it would be great.

Thanks in advance,
Karl
[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757517#comment-16757517 ] Karl Wright commented on CONNECTORS-1564:
-
[~michael-o], we have zero control over whether/when this gets addressed in SolrJ. Previous interactions with the SolrJ developers do not make me feel that a fix would be prompt. But I suggest that [~erlendfg] at least take the step of opening a ticket. We can afford to wait until the next MCF release is imminent before taking any action, but if there's no resolution in sight by then, I think we should implement the workaround for the time being.

> Support preemptive authentication to Solr connector
> ---
>
> Key: CONNECTORS-1564
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1564
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Lucene/SOLR connector
> Reporter: Erlend Garåsen
> Assignee: Karl Wright
> Priority: Major
> Attachments: CONNECTORS-1564.patch
>
> We should post preemptively in case the Solr server requires basic authentication. This will make the communication between ManifoldCF and Solr much more efficient than the following:
> * Send an HTTP POST request to Solr
> * Solr sends a 401 response
> * Send the same request, but with an "{{Authorization: Basic}}" header
> With preemptive authentication, we can send the header in the first request.
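The round-trip savings described in the issue come from attaching the credentials before the server ever challenges. As a minimal, hypothetical sketch using only the JDK (the real connector would do this through HttpClient's credentials-provider/auth-cache machinery rather than a hand-built header):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PreemptiveAuth {
    // Build the Authorization header up front so the very first request
    // carries the credentials, avoiding the 401 challenge round trip.
    static String basicAuthHeader(String user, String password) {
        String token = Base64.getEncoder()
            .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return "Basic " + token;
    }

    public static void main(String[] args) {
        // "solr"/"secret" are placeholder credentials for illustration.
        System.out.println(basicAuthHeader("solr", "secret"));
    }
}
```

The header produced here is exactly what the server would otherwise demand via its 401 `WWW-Authenticate: Basic` challenge; sending it preemptively halves the number of requests per document post.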
[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757137#comment-16757137 ] Karl Wright commented on CONNECTORS-1564:
-
[~erlendfg], if SolrJ is overriding our .setExpectContinue(true), then your workaround is pretty reasonable, and I'd be happy to commit that (as long as you include enough comment so that we can figure out what we were thinking later).
[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756074#comment-16756074 ] Karl Wright commented on CONNECTORS-1564:
-
The way you tell it is this:
{code}
request.setProtocolVersion(HttpVersion.HTTP_1_1);
{code}
I suspect there's a similar method in the RequestOptions builder. But I bet one of the things we're doing in the builder is convincing it that it's HTTP 1.0, and that's the problem. We need to figure out what it is.
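For comparison, pinning the protocol version is a one-liner in the JDK 11+ HTTP client as well. This is an illustrative analogue only, not the HttpComponents 4.x code the connector actually uses:

```java
import java.net.http.HttpClient;

public class ForceHttp11 {
    public static void main(String[] args) {
        // Explicitly pin the client to HTTP/1.1 -- the JDK-client analogue of
        // request.setProtocolVersion(HttpVersion.HTTP_1_1) in HttpClient 4.x.
        // With the version pinned, Expect: 100-continue semantics are valid,
        // whereas an HTTP/1.0 peer would never honor them.
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .build();
        System.out.println(client.version());
    }
}
```

The same idea applies in HttpComponents: if any builder step downgrades the effective protocol to 1.0, expect/continue is silently disabled, which matches the symptom discussed in this thread.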
[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756071#comment-16756071 ] Karl Wright commented on CONNECTORS-1564:
-
Oh, and I vaguely recall something -- that since the expect-continue header is for HTTP 1.1 (and not HTTP 1.0), there was code in HttpComponents/HttpClient that disabled it if the client thought it was working in an HTTP 1.0 environment. I wonder if we just need to tell it somehow that it's HTTP 1.1?
[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756069#comment-16756069 ] Karl Wright commented on CONNECTORS-1564:
-
[~erlendfg], forcing the header would be a last resort. But we can do it if we must. However, there are about a dozen connectors that rely on this functionality working properly, so I really want to know what is going wrong. Can you experiment with changing the order of the builder method invocations for HttpClient in HttpPoster? It's the only thing I can think of that might be germane. Perhaps if toString() isn't helpful, you can still inspect the property in question. Is there a getter method for useExpectContinue?
Re: About publishing in mvn central repository
There's a ticket outstanding for this but nobody could figure out how to do it, since the jars are built with Ant not Maven. If you want to work out how, please feel free to go ahead. Karl On Wed, Jan 30, 2019 at 7:08 AM Cihad Guzel wrote: > Hi, > > There aren't Manifoldcf jar packages in the mvn central repository. Maybe > they can be published in the repository? So we can add mcf-core or other > mfc jar packages to our projects as dependency. > > What do you think about that? > > [1] > https://maven.apache.org/repository/guide-central-repository-upload.html > > > Regards, > Cihad Güzel >
[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755582#comment-16755582 ] Karl Wright commented on CONNECTORS-1564:
-
[~erlendfg], are you in a position to build MCF and experiment with how the HttpClient is constructed in HttpPoster.java? I suspect that what is happening is that the expect/continue is indeed being set, but something that is later done to the builder is turning it back off again. So I would suggest adding a log.debug("httpclientbuilder = "+httpClientBuilder) line in there before we actually use the builder to construct the client, to see if this is the case, and if so, try to figure out which addition is causing the flag to be flipped back.
[jira] [Comment Edited] (CONNECTORS-1564) Support preemptive authentication to Solr connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1673#comment-1673 ] Karl Wright edited comment on CONNECTORS-1564 at 1/30/19 1:37 AM:
--
[~michael-o] We're using the standard setup code that was recommended by Oleg. If the builders have decent toString() methods, we can dump them to the log when we create the HttpClient object to confirm they are set up correctly. But from the beginning we could see nothing wrong with it. This was the test you said was working:
{code}
HttpClientBuilder builder = HttpClientBuilder.create();
RequestConfig rc = RequestConfig.custom().setExpectContinueEnabled(true).build();
builder.setDefaultRequestConfig(rc);
{code}
We will figure out what winds up canceling out the expect/continue flag, if that's what indeed is happening.

was (Author: kwri...@metacarta.com):
[~michael-o] We're using the standard setup code that was recommended by Oleg. If the builders have decent toString() methods, we can dump them to the log when we create the HttpClient object to confirm they are set up correctly. But from the beginning we could see nothing wrong with it. Can you include the test example here that you used to verify that expect-continue was working?
[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1673#comment-1673 ] Karl Wright commented on CONNECTORS-1564:
-
[~michael-o] We're using the standard setup code that was recommended by Oleg. If the builders have decent toString() methods, we can dump them to the log when we create the HttpClient object to confirm they are set up correctly. But from the beginning we could see nothing wrong with it. Can you include the test example here that you used to verify that expect-continue was working?
[jira] [Resolved] (CONNECTORS-1576) Running Multiple Jobs in ManifoldCF
[ https://issues.apache.org/jira/browse/CONNECTORS-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1576. - Resolution: Not A Problem > Running Multiple Jobs in ManifoldCF > --- > > Key: CONNECTORS-1576 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1576 > Project: ManifoldCF > Issue Type: Bug > Components: Documentum connector >Affects Versions: ManifoldCF 2.9.1 >Reporter: Pavithra Dhakshinamurthy >Priority: Major > Labels: features > Fix For: ManifoldCF 2.9.1 > > > Hi, > We have configured two jobs to index documentum contents. when running it in > parallel, seeding is working fine. But only one job processes the document > and pushes to ES. After the first job completes, the second job is processing > the document. > Is this the expected behavior? Or Are we missing anything? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1576) Running Multiple Jobs in ManifoldCF
[ https://issues.apache.org/jira/browse/CONNECTORS-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755215#comment-16755215 ] Karl Wright commented on CONNECTORS-1576: - The documents that are already queued at the time the second job starts must all be processed before any documents from the second job are picked up. This is because of how documents are assigned priorities in the database. Once you get past that initial batch of queued documents, both jobs will run simultaneously. > Running Multiple Jobs in ManifoldCF > --- > > Key: CONNECTORS-1576 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1576 > Project: ManifoldCF > Issue Type: Bug > Components: Documentum connector >Affects Versions: ManifoldCF 2.9.1 >Reporter: Pavithra Dhakshinamurthy >Priority: Major > Labels: features > Fix For: ManifoldCF 2.9.1 > > > Hi, > We have configured two jobs to index Documentum content. When running them in > parallel, seeding works fine, but only one job processes documents > and pushes them to ES. Only after the first job completes does the second job > process documents. > Is this the expected behavior? Or are we missing anything? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755212#comment-16755212 ] Karl Wright commented on CONNECTORS-1564: - [~michael-o], you need to be looking here: {code} https://svn.apache.org/repos/asf/manifoldcf/trunk/connectors/solr/connector/src/main/java/org/apache/manifoldcf/agents/output/solr/HttpPoster.java {code} ManifoldCF has its own HttpClient construction. > Support preemptive authentication to Solr connector > --- > > Key: CONNECTORS-1564 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1564 > Project: ManifoldCF > Issue Type: Improvement > Components: Lucene/SOLR connector >Reporter: Erlend Garåsen > Assignee: Karl Wright >Priority: Major > Attachments: CONNECTORS-1564.patch > > > We should post preemptively in case the Solr server requires basic > authentication. This will make the communication between ManifoldCF and Solr > much more efficient, avoiding the following exchange: > * Send an HTTP POST request to Solr > * Solr sends a 401 response > * Send the same request, but with an "{{Authorization: Basic}}" header > With preemptive authentication, we can send the header in the first request. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1575) inconsistant use of value-labels
[ https://issues.apache.org/jira/browse/CONNECTORS-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16753908#comment-16753908 ] Karl Wright commented on CONNECTORS-1575: - This is because there are two somewhat different internal representations involved. While it is unfortunate that they appear inconsistent, nothing can be done to change them, since doing so would be backwards-incompatible. > inconsistant use of value-labels > - > > Key: CONNECTORS-1575 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1575 > Project: ManifoldCF > Issue Type: Bug > Components: API >Affects Versions: ManifoldCF 2.12 >Reporter: Tim Steenbeke >Priority: Minor > Attachments: image-2019-01-28-11-57-46-738.png > > > When retrieving a job using the API, there seem to be inconsistencies in the > returned JSON of a job. > For the schedule values 'hourofday', 'minutesofhour', etc. the label of the > value is 'value', while for all other value-labels it is '_value_'. > > !image-2019-01-28-11-57-46-738.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1574) Performance tuning of manifold
[ https://issues.apache.org/jira/browse/CONNECTORS-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16753913#comment-16753913 ] Karl Wright commented on CONNECTORS-1574: - If you look in the ManifoldCF log, all queries that take more than a minute to execute are logged, along with an EXPLAIN plan. Could you look at your logs, find those queries, and provide their EXPLAIN plans? The quality of the query plans usually depends on the quality of the statistics that the database keeps. When the statistics are out of date, the plans sometimes get horribly bad. ManifoldCF *attempts* to keep up with this by re-analyzing tables after a fixed number of changes, but it can do no better than estimate the number of changes and their effects on the table statistics. So if you are experiencing problems with certain queries, you can set properties.xml values that increase the frequency of analyze operations for the affected table. But first we need to know what's going wrong. > Performance tuning of manifold > -- > > Key: CONNECTORS-1574 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1574 > Project: ManifoldCF > Issue Type: Bug > Components: File system connector, JCIFS connector, Solr 6.x > component >Affects Versions: ManifoldCF 2.5 > Environment: Apache manifold installed in Linux machine > Linux version 3.10.0-327.el7.ppc64le > Red Hat Enterprise Linux Server release 7.2 (Maipo) > Reporter: balaji >Assignee: Karl Wright >Priority: Critical > Labels: performance > > My team is using *Apache ManifoldCF 2.5 with SOLR Cloud* for indexing of > data. We currently have 450-500 jobs which need to run simultaneously. > We need to index JSON data, and we are using the *file system* connector type > along with *postgres* as the backend database. > We are facing several issues: > 1. Scheduling works for some jobs and doesn't work for others. > 2. Some jobs get completed while other jobs hang and never complete. > 3. With one job, 6 documents used to be indexed in 15 minutes, but > now even a directory path with 5 documents takes 20 minutes or sometimes > never completes. > 4. The "list all jobs" or "status and job management" page sometimes doesn't load, > and in pg_stat_activity we observe that 2 queries are in a waiting > state, because of which the page doesn't load; if we kill those > queries or restart manifold, the issue gets resolved and the page loads > properly. > Queries getting stuck: > 1. SELECT ID,FAILTIME, FAILCOUNT, SEEDINGVERSION, STATUS FROM JOBS WHERE > (STATUS=$1 OR STATUS=$2) FOR UPDATE > 2. UPDATE JOBS SET ERRORTEXT=NULL, ENDTIME=NULL, WINDOWEND=NULL, STATUS=$1 > WHERE ID=$2 > Note: we have deployed manifold on *linux*. Our major requirement is > scheduling of jobs which will run every 15 minutes. > Please help us in fine-tuning manifold so that it runs smoothly and acts as a > robust system. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
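Karl's suggestion about increasing analyze frequency corresponds to a properties.xml entry of roughly the following shape. The property name pattern, table name, and threshold below are assumptions drawn from the ManifoldCF performance-tuning documentation; check the documentation for your ManifoldCF version before relying on them:

```xml
<!-- Illustrative sketch only: re-analyze a heavily churned table after fewer
     changes than the default. The exact property name is an assumption. -->
<property name="org.apache.manifoldcf.db.postgres.analyze.jobqueue" value="5000"/>
```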
[jira] [Assigned] (CONNECTORS-1574) Performance tuning of manifold
[ https://issues.apache.org/jira/browse/CONNECTORS-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-1574: --- Assignee: Karl Wright > Performance tuning of manifold > -- > > Key: CONNECTORS-1574 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1574 > Project: ManifoldCF > Issue Type: Bug > Components: File system connector, JCIFS connector, Solr 6.x > component >Affects Versions: ManifoldCF 2.5 > Environment: Apache manifold installed in Linux machine > Linux version 3.10.0-327.el7.ppc64le > Red Hat Enterprise Linux Server release 7.2 (Maipo) > Reporter: balaji >Assignee: Karl Wright >Priority: Critical > Labels: performance > > My team is using *Apache ManifoldCF 2.5 with SOLR Cloud* for indexing of > data. We currently have 450-500 jobs which need to run simultaneously. > We need to index JSON data, and we are using the *file system* connector type > along with *postgres* as the backend database. > We are facing several issues: > 1. Scheduling works for some jobs and doesn't work for others. > 2. Some jobs get completed while other jobs hang and never complete. > 3. With one job, 6 documents used to be indexed in 15 minutes, but > now even a directory path with 5 documents takes 20 minutes or sometimes > never completes. > 4. The "list all jobs" or "status and job management" page sometimes doesn't load, > and in pg_stat_activity we observe that 2 queries are in a waiting > state, because of which the page doesn't load; if we kill those > queries or restart manifold, the issue gets resolved and the page loads > properly. > Queries getting stuck: > 1. SELECT ID,FAILTIME, FAILCOUNT, SEEDINGVERSION, STATUS FROM JOBS WHERE > (STATUS=$1 OR STATUS=$2) FOR UPDATE > 2. UPDATE JOBS SET ERRORTEXT=NULL, ENDTIME=NULL, WINDOWEND=NULL, STATUS=$1 > WHERE ID=$2 > Note: we have deployed manifold on *linux*. Our major requirement is > scheduling of jobs which will run every 15 minutes. > Please help us in fine-tuning manifold so that it runs smoothly and acts as a > robust system. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CONNECTORS-1575) inconsistant use of value-labels
[ https://issues.apache.org/jira/browse/CONNECTORS-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1575. - Resolution: Won't Fix > inconsistant use of value-labels > - > > Key: CONNECTORS-1575 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1575 > Project: ManifoldCF > Issue Type: Bug > Components: API >Affects Versions: ManifoldCF 2.12 >Reporter: Tim Steenbeke >Priority: Minor > Attachments: image-2019-01-28-11-57-46-738.png > > > When retrieving a job using the API, there seem to be inconsistencies in the > returned JSON of a job. > For the schedule values 'hourofday', 'minutesofhour', etc. the label of the > value is 'value', while for all other value-labels it is '_value_'. > > !image-2019-01-28-11-57-46-738.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Mambo CMS
We do not have Mambo connectors in MCF. I don't know anything about CMIS support in that offering either. Karl On Sat, Jan 26, 2019 at 8:17 AM Furkan KAMACI wrote: > Hi All, > > Mambo (http://mambo-foundation.org) is an open-source CMS which is > used by many companies. > > Do we have a Mambo integration via ManifoldCF, or does anybody know whether Mambo > supports our CMIS connector? > > If not, we can suggest it as a GSoC project for 2019. > > Kind Regards, > Furkan KAMACI >
Re: Axis question
I was able to get the wsdl->java compilation working without downloading a ton of additional dependencies, and with cxf version 2.6.2. Thanks, Rafa, for your help in getting this far. Karl On Fri, Jan 25, 2019 at 4:11 PM Karl Wright wrote: > That's one approach. I'm not thrilled with it; we cannot guarantee no > client wsdl changes over time. But if there's nothing better we'll have to > live with it. > > The real problem, of course, is that code generated with version X of cxf > requires runtime libraries from version X, and that's still a conflict. So > I need to get the WSDL2Java going for 2.6.2. > > Karl > > > On Fri, Jan 25, 2019 at 3:54 PM Rafa Haro wrote: > >> I would try to be pragmatic. If those wsdl are not likely to change in the >> future, I would build the client classes offline. Not sure if the >> generated >> class are going to use further classes of cxf and then the problem could >> end up being the same, but it is worth to try >> >> El El vie, 25 ene 2019 a las 21:14, Karl Wright >> escribió: >> >> > I downloaded the cxf binary, latest version. >> > The dependency list is huge and very likely conflicts with existing >> > connectors which have dependencies on cxf 2.x. I would estimate that >> > including all the new jars and dependencies would easily double our >> > download footprint. >> > >> > Surely there must be a list of the minimal jars needed to get >> WSDLToJava to >> > function somewhere? >> > >> > Karl >> > >> > >> > >> > >> > On Fri, Jan 25, 2019 at 2:14 PM Karl Wright wrote: >> > >> > > I'm not getting missing cxf jars. I'm getting problems with >> downstream >> > > dependencies. >> > > >> > > We don't usually ship more jars than we need to, is the short answer >> to >> > > your second question. >> > > >> > > Karl >> > > >> > > >> > > On Fri, Jan 25, 2019 at 11:38 AM Rafa Haro wrote: >> > > >> > >> which jars are you downloading?. Why not getting the whole release? 
>> > >> >> > >> On Fri, Jan 25, 2019 at 5:31 PM Rafa Haro wrote: >> > >> >> > >>> Not sure, Karl I just picked up last release. I can try to find the >> > >>> first version offering it but as long as they have backwards >> > compatibility >> > >>> we should be fine with the last version although we might need to >> > update >> > >>> the affected connectors >> > >>> >> > >>> Rafa >> > >>> >> > >>> On Fri, Jan 25, 2019 at 3:53 PM Karl Wright >> > wrote: >> > >>> >> > >>>> When did it first appear? We're currently on 2.6.2; this is set by >> > >>>> various dependencies by our connectors. >> > >>>> >> > >>>> Karl >> > >>>> >> > >>>> On Fri, Jan 25, 2019 at 9:52 AM Karl Wright >> > wrote: >> > >>>> >> > >>>>> The tools package doesn't seem to have it either. >> > >>>>> Karl >> > >>>>> >> > >>>>> >> > >>>>> On Fri, Jan 25, 2019 at 9:43 AM Karl Wright >> > >>>>> wrote: >> > >>>>> >> > >>>>>> Do you know what jar/maven package this is in? because I don't >> seem >> > >>>>>> to have it in our normal cxf jars... >> > >>>>>> >> > >>>>>> Karl >> > >>>>>> >> > >>>>>> >> > >>>>>> On Fri, Jan 25, 2019 at 9:08 AM Rafa Haro >> wrote: >> > >>>>>> >> > >>>>>>> I used a wsdl2java script that comes as an utility of the apache >> > cxf >> > >>>>>>> release, but basically is making use >> > >>>>>>> of org.apache.cxf.tools.wsdlto.WSDLToJava class. You can find >> here >> > >>>>>>> an usage >> > >>>>>>> example with ant: http://cxf.apache.org/docs/wsdl-to-java.html >> > >>>>>>> >> > >>>>>>> On Fri, Jan 25, 2019 at 2:59 PM Karl Wright > > >> > >>>>>>> wrote: >> > >>>>>
Re: Axis question
That's one approach. I'm not thrilled with it; we cannot guarantee no client wsdl changes over time. But if there's nothing better we'll have to live with it. The real problem, of course, is that code generated with version X of cxf requires runtime libraries from version X, and that's still a conflict. So I need to get the WSDL2Java going for 2.6.2. Karl On Fri, Jan 25, 2019 at 3:54 PM Rafa Haro wrote: > I would try to be pragmatic. If those wsdl are not likely to change in the > future, I would build the client classes offline. Not sure if the generated > class are going to use further classes of cxf and then the problem could > end up being the same, but it is worth to try > > El El vie, 25 ene 2019 a las 21:14, Karl Wright > escribió: > > > I downloaded the cxf binary, latest version. > > The dependency list is huge and very likely conflicts with existing > > connectors which have dependencies on cxf 2.x. I would estimate that > > including all the new jars and dependencies would easily double our > > download footprint. > > > > Surely there must be a list of the minimal jars needed to get WSDLToJava > to > > function somewhere? > > > > Karl > > > > > > > > > > On Fri, Jan 25, 2019 at 2:14 PM Karl Wright wrote: > > > > > I'm not getting missing cxf jars. I'm getting problems with downstream > > > dependencies. > > > > > > We don't usually ship more jars than we need to, is the short answer to > > > your second question. > > > > > > Karl > > > > > > > > > On Fri, Jan 25, 2019 at 11:38 AM Rafa Haro wrote: > > > > > >> which jars are you downloading?. Why not getting the whole release? > > >> > > >> On Fri, Jan 25, 2019 at 5:31 PM Rafa Haro wrote: > > >> > > >>> Not sure, Karl I just picked up last release. 
I can try to find the > > >>> first version offering it but as long as they have backwards > > compatibility > > >>> we should be fine with the last version although we might need to > > update > > >>> the affected connectors > > >>> > > >>> Rafa > > >>> > > >>> On Fri, Jan 25, 2019 at 3:53 PM Karl Wright > > wrote: > > >>> > > >>>> When did it first appear? We're currently on 2.6.2; this is set by > > >>>> various dependencies by our connectors. > > >>>> > > >>>> Karl > > >>>> > > >>>> On Fri, Jan 25, 2019 at 9:52 AM Karl Wright > > wrote: > > >>>> > > >>>>> The tools package doesn't seem to have it either. > > >>>>> Karl > > >>>>> > > >>>>> > > >>>>> On Fri, Jan 25, 2019 at 9:43 AM Karl Wright > > >>>>> wrote: > > >>>>> > > >>>>>> Do you know what jar/maven package this is in? because I don't > seem > > >>>>>> to have it in our normal cxf jars... > > >>>>>> > > >>>>>> Karl > > >>>>>> > > >>>>>> > > >>>>>> On Fri, Jan 25, 2019 at 9:08 AM Rafa Haro > wrote: > > >>>>>> > > >>>>>>> I used a wsdl2java script that comes as an utility of the apache > > cxf > > >>>>>>> release, but basically is making use > > >>>>>>> of org.apache.cxf.tools.wsdlto.WSDLToJava class. You can find > here > > >>>>>>> an usage > > >>>>>>> example with ant: http://cxf.apache.org/docs/wsdl-to-java.html > > >>>>>>> > > >>>>>>> On Fri, Jan 25, 2019 at 2:59 PM Karl Wright > > >>>>>>> wrote: > > >>>>>>> > > >>>>>>> > I was using ancient Axis 1.4 and none of them were working. > You > > >>>>>>> can > > >>>>>>> > exercise this with "ant classcreate-wsdls" in the csws > directory. > > >>>>>>> > > > >>>>>>> > If you can give instructions for invoking CXF, maybe we can do > > that > > >>>>>>> > instead. What's the main class, and what jars do we need to > > >>>>>>> include? > > >>>>>>> > > > >>>>>>> >
Re: Axis question
I downloaded the cxf binary, latest version. The dependency list is huge and very likely conflicts with existing connectors which have dependencies on cxf 2.x. I would estimate that including all the new jars and dependencies would easily double our download footprint. Surely there must be a list of the minimal jars needed to get WSDLToJava to function somewhere? Karl On Fri, Jan 25, 2019 at 2:14 PM Karl Wright wrote: > I'm not getting missing cxf jars. I'm getting problems with downstream > dependencies. > > We don't usually ship more jars than we need to, is the short answer to > your second question. > > Karl > > > On Fri, Jan 25, 2019 at 11:38 AM Rafa Haro wrote: > >> which jars are you downloading?. Why not getting the whole release? >> >> On Fri, Jan 25, 2019 at 5:31 PM Rafa Haro wrote: >> >>> Not sure, Karl I just picked up last release. I can try to find the >>> first version offering it but as long as they have backwards compatibility >>> we should be fine with the last version although we might need to update >>> the affected connectors >>> >>> Rafa >>> >>> On Fri, Jan 25, 2019 at 3:53 PM Karl Wright wrote: >>> >>>> When did it first appear? We're currently on 2.6.2; this is set by >>>> various dependencies by our connectors. >>>> >>>> Karl >>>> >>>> On Fri, Jan 25, 2019 at 9:52 AM Karl Wright wrote: >>>> >>>>> The tools package doesn't seem to have it either. >>>>> Karl >>>>> >>>>> >>>>> On Fri, Jan 25, 2019 at 9:43 AM Karl Wright >>>>> wrote: >>>>> >>>>>> Do you know what jar/maven package this is in? because I don't seem >>>>>> to have it in our normal cxf jars... >>>>>> >>>>>> Karl >>>>>> >>>>>> >>>>>> On Fri, Jan 25, 2019 at 9:08 AM Rafa Haro wrote: >>>>>> >>>>>>> I used a wsdl2java script that comes as an utility of the apache cxf >>>>>>> release, but basically is making use >>>>>>> of org.apache.cxf.tools.wsdlto.WSDLToJava class. 
You can find here >>>>>>> an usage >>>>>>> example with ant: http://cxf.apache.org/docs/wsdl-to-java.html >>>>>>> >>>>>>> On Fri, Jan 25, 2019 at 2:59 PM Karl Wright >>>>>>> wrote: >>>>>>> >>>>>>> > I was using ancient Axis 1.4 and none of them were working. You >>>>>>> can >>>>>>> > exercise this with "ant classcreate-wsdls" in the csws directory. >>>>>>> > >>>>>>> > If you can give instructions for invoking CXF, maybe we can do that >>>>>>> > instead. What's the main class, and what jars do we need to >>>>>>> include? >>>>>>> > >>>>>>> > Karl >>>>>>> > >>>>>>> > >>>>>>> > On Fri, Jan 25, 2019 at 7:28 AM Rafa Haro >>>>>>> wrote: >>>>>>> > >>>>>>> >> Yes, I did. I have only tested Authentication service with Apache >>>>>>> CXF and >>>>>>> >> it was apparently working fine. Which ones were failing for you? >>>>>>> >> >>>>>>> >> On Fri, Jan 25, 2019 at 12:38 PM Karl Wright >>>>>>> wrote: >>>>>>> >> >>>>>>> >>> Were you able to look at this yesterday at all? >>>>>>> >>> Karl >>>>>>> >>> >>>>>>> >>> On Thu, Jan 24, 2019 at 6:34 AM Karl Wright >>>>>>> wrote: >>>>>>> >>> >>>>>>> >>>> They're all checked in. >>>>>>> >>>> >>>>>>> >>>> See >>>>>>> >>>> >>>>>>> https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-1566/connectors/csws/wsdls >>>>>>> >>>> >>>>>>> >>>> Karl >>>>>>> >>>> >>>>>>> >>>> >>>>>>> >>>> On Thu, Jan 24, 2019 at 6:24 AM Rafa Haro >>>>>>> wrote: >>>>>>> >>>> >>>>>>> >>>>> Karl, can you share the WSDL, I can try to take a look later >>>>>>> today >>>>>>> >>>>> >>>>>>> >>>>> On Thu, Jan 24, 2019 at 12:13 PM Karl Wright < >>>>>>> daddy...@gmail.com> >>>>>>> >>>>> wrote: >>>>>>> >>>>> >>>>>>> >>>>> > I'm redeveloping the Livelink connector because the API code >>>>>>> has been >>>>>>> >>>>> > discontinued and the only API is now web services based. >>>>>>> The WSDLs >>>>>>> >>>>> and >>>>>>> >>>>> > XSDs have been exported and I'm trying to use the Axis tool >>>>>>> >>>>> WSDL2Java to >>>>>>> >>>>> > convert to Java code. 
Unfortunately, I haven't been able to >>>>>>> make >>>>>>> >>>>> this work >>>>>>> >>>>> > -- even though the WSDLs references have been made local and >>>>>>> the >>>>>>> >>>>> XSDs also >>>>>>> >>>>> > seem to be getting parsed, it complains about missing >>>>>>> definitions, >>>>>>> >>>>> even >>>>>>> >>>>> > though those definitions are clearly present in the XSD >>>>>>> files. >>>>>>> >>>>> > >>>>>>> >>>>> > Has anyone had enough experience with this tool, and web >>>>>>> services in >>>>>>> >>>>> > general, to figure out what's wrong? I've tried turning on >>>>>>> as >>>>>>> >>>>> verbose a >>>>>>> >>>>> > debugging level for WSDL2Java as I can and it's no help at >>>>>>> all. I >>>>>>> >>>>> suspect >>>>>>> >>>>> > namespace issues but I can't figure out what they are. >>>>>>> >>>>> > >>>>>>> >>>>> > Thanks in advance, >>>>>>> >>>>> > Karl >>>>>>> >>>>> > >>>>>>> >>>>> >>>>>>> >>>> >>>>>>> >>>>>>
Re: Axis question
I'm not getting missing cxf jars. I'm getting problems with downstream dependencies. We don't usually ship more jars than we need to, is the short answer to your second question. Karl On Fri, Jan 25, 2019 at 11:38 AM Rafa Haro wrote: > which jars are you downloading?. Why not getting the whole release? > > On Fri, Jan 25, 2019 at 5:31 PM Rafa Haro wrote: > >> Not sure, Karl I just picked up last release. I can try to find the first >> version offering it but as long as they have backwards compatibility we >> should be fine with the last version although we might need to update the >> affected connectors >> >> Rafa >> >> On Fri, Jan 25, 2019 at 3:53 PM Karl Wright wrote: >> >>> When did it first appear? We're currently on 2.6.2; this is set by >>> various dependencies by our connectors. >>> >>> Karl >>> >>> On Fri, Jan 25, 2019 at 9:52 AM Karl Wright wrote: >>> >>>> The tools package doesn't seem to have it either. >>>> Karl >>>> >>>> >>>> On Fri, Jan 25, 2019 at 9:43 AM Karl Wright wrote: >>>> >>>>> Do you know what jar/maven package this is in? because I don't seem >>>>> to have it in our normal cxf jars... >>>>> >>>>> Karl >>>>> >>>>> >>>>> On Fri, Jan 25, 2019 at 9:08 AM Rafa Haro wrote: >>>>> >>>>>> I used a wsdl2java script that comes as an utility of the apache cxf >>>>>> release, but basically is making use >>>>>> of org.apache.cxf.tools.wsdlto.WSDLToJava class. You can find here an >>>>>> usage >>>>>> example with ant: http://cxf.apache.org/docs/wsdl-to-java.html >>>>>> >>>>>> On Fri, Jan 25, 2019 at 2:59 PM Karl Wright >>>>>> wrote: >>>>>> >>>>>> > I was using ancient Axis 1.4 and none of them were working. You can >>>>>> > exercise this with "ant classcreate-wsdls" in the csws directory. >>>>>> > >>>>>> > If you can give instructions for invoking CXF, maybe we can do that >>>>>> > instead. What's the main class, and what jars do we need to >>>>>> include? 
>>>>>> > >>>>>> > Karl >>>>>> > >>>>>> > >>>>>> > On Fri, Jan 25, 2019 at 7:28 AM Rafa Haro wrote: >>>>>> > >>>>>> >> Yes, I did. I have only tested Authentication service with Apache >>>>>> CXF and >>>>>> >> it was apparently working fine. Which ones were failing for you? >>>>>> >> >>>>>> >> On Fri, Jan 25, 2019 at 12:38 PM Karl Wright >>>>>> wrote: >>>>>> >> >>>>>> >>> Were you able to look at this yesterday at all? >>>>>> >>> Karl >>>>>> >>> >>>>>> >>> On Thu, Jan 24, 2019 at 6:34 AM Karl Wright >>>>>> wrote: >>>>>> >>> >>>>>> >>>> They're all checked in. >>>>>> >>>> >>>>>> >>>> See >>>>>> >>>> >>>>>> https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-1566/connectors/csws/wsdls >>>>>> >>>> >>>>>> >>>> Karl >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> On Thu, Jan 24, 2019 at 6:24 AM Rafa Haro >>>>>> wrote: >>>>>> >>>> >>>>>> >>>>> Karl, can you share the WSDL, I can try to take a look later >>>>>> today >>>>>> >>>>> >>>>>> >>>>> On Thu, Jan 24, 2019 at 12:13 PM Karl Wright < >>>>>> daddy...@gmail.com> >>>>>> >>>>> wrote: >>>>>> >>>>> >>>>>> >>>>> > I'm redeveloping the Livelink connector because the API code >>>>>> has been >>>>>> >>>>> > discontinued and the only API is now web services based. The >>>>>> WSDLs >>>>>> >>>>> and >>>>>> >>>>> > XSDs have been exported and I'm trying to use the Axis tool >>>>>> >>>>> WSDL2Java to >>>>>> >>>>> > convert to Java code. Unfortunately, I haven't been able to >>>>>> make >>>>>> >>>>> this work >>>>>> >>>>> > -- even though the WSDLs references have been made local and >>>>>> the >>>>>> >>>>> XSDs also >>>>>> >>>>> > seem to be getting parsed, it complains about missing >>>>>> definitions, >>>>>> >>>>> even >>>>>> >>>>> > though those definitions are clearly present in the XSD files. >>>>>> >>>>> > >>>>>> >>>>> > Has anyone had enough experience with this tool, and web >>>>>> services in >>>>>> >>>>> > general, to figure out what's wrong? 
I've tried turning on as >>>>>> >>>>> verbose a >>>>>> >>>>> > debugging level for WSDL2Java as I can and it's no help at >>>>>> all. I >>>>>> >>>>> suspect >>>>>> >>>>> > namespace issues but I can't figure out what they are. >>>>>> >>>>> > >>>>>> >>>>> > Thanks in advance, >>>>>> >>>>> > Karl >>>>>> >>>>> > >>>>>> >>>>> >>>>>> >>>> >>>>>> >>>>>
Re: Axis question
I've been fighting with this pretty hard for a couple of hours now. I did find the proper cxf tools jar eventually but I'm getting one dependency problem after another. Currently I have: >>>>>> classcreate-wsdl-cxf: [mkdir] Created dir: /mnt/c/wip/mcf/CONNECTORS-1566/connectors/csws/build/wsdljava [java] Jan 25, 2019 4:09:02 PM org.apache.cxf.staxutils.StaxUtils createXMLInputFactory [java] WARNING: Could not create a secure Stax XMLInputFactory. Found class com.sun.xml.internal.stream.XMLInputFactoryImpl. Suggest Woodstox 4.2.0 or newer. [java] Jan 25, 2019 4:09:03 PM org.apache.cxf.staxutils.StaxUtils createXMLInputFactory [java] WARNING: Could not create a secure Stax XMLInputFactory. Found class com.sun.xml.internal.stream.XMLInputFactoryImpl. Suggest Woodstox 4.2.0 or newer. [java] [java] WSDLToJava Error: Could not find jaxws frontend within classpath [java] <<<<<< ... even though I have jaxws* in the path and woodstox 5.7 too. Rafa, can you tell me what classpath you are using and what the full dependencies are for this tool? Karl On Fri, Jan 25, 2019 at 9:53 AM Karl Wright wrote: > When did it first appear? We're currently on 2.6.2; this is set by > various dependencies by our connectors. > > Karl > > On Fri, Jan 25, 2019 at 9:52 AM Karl Wright wrote: > >> The tools package doesn't seem to have it either. >> Karl >> >> >> On Fri, Jan 25, 2019 at 9:43 AM Karl Wright wrote: >> >>> Do you know what jar/maven package this is in? because I don't seem to >>> have it in our normal cxf jars... >>> >>> Karl >>> >>> >>> On Fri, Jan 25, 2019 at 9:08 AM Rafa Haro wrote: >>> >>>> I used a wsdl2java script that comes as an utility of the apache cxf >>>> release, but basically is making use >>>> of org.apache.cxf.tools.wsdlto.WSDLToJava class. You can find here an >>>> usage >>>> example with ant: http://cxf.apache.org/docs/wsdl-to-java.html >>>> >>>> On Fri, Jan 25, 2019 at 2:59 PM Karl Wright wrote: >>>> >>>> > I was using ancient Axis 1.4 and none of them were working. 
You can >>>> > exercise this with "ant classcreate-wsdls" in the csws directory. >>>> > >>>> > If you can give instructions for invoking CXF, maybe we can do that >>>> > instead. What's the main class, and what jars do we need to include? >>>> > >>>> > Karl >>>> > >>>> > >>>> > On Fri, Jan 25, 2019 at 7:28 AM Rafa Haro wrote: >>>> > >>>> >> Yes, I did. I have only tested Authentication service with Apache >>>> CXF and >>>> >> it was apparently working fine. Which ones were failing for you? >>>> >> >>>> >> On Fri, Jan 25, 2019 at 12:38 PM Karl Wright >>>> wrote: >>>> >> >>>> >>> Were you able to look at this yesterday at all? >>>> >>> Karl >>>> >>> >>>> >>> On Thu, Jan 24, 2019 at 6:34 AM Karl Wright >>>> wrote: >>>> >>> >>>> >>>> They're all checked in. >>>> >>>> >>>> >>>> See >>>> >>>> >>>> https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-1566/connectors/csws/wsdls >>>> >>>> >>>> >>>> Karl >>>> >>>> >>>> >>>> >>>> >>>> On Thu, Jan 24, 2019 at 6:24 AM Rafa Haro >>>> wrote: >>>> >>>> >>>> >>>>> Karl, can you share the WSDL, I can try to take a look later today >>>> >>>>> >>>> >>>>> On Thu, Jan 24, 2019 at 12:13 PM Karl Wright >>>> >>>>> wrote: >>>> >>>>> >>>> >>>>> > I'm redeveloping the Livelink connector because the API code >>>> has been >>>> >>>>> > discontinued and the only API is now web services based. The >>>> WSDLs >>>> >>>>> and >>>> >>>>> > XSDs have been exported and I'm trying to use the Axis tool >>>> >>>>> WSDL2Java to >>>> >>>>> > convert to Java code. Unfortunately, I haven't been able to >>>> make >>>> >>>>> this work >>>> >>>>> > -- even though the WSDLs references have been made local and the >>>> >>>>> XSDs also >>>> >>>>> > seem to be getting parsed, it complains about missing >>>> definitions, >>>> >>>>> even >>>> >>>>> > though those definitions are clearly present in the XSD files. 
>>>> >>>>> > >>>> >>>>> > Has anyone had enough experience with this tool, and web >>>> services in >>>> >>>>> > general, to figure out what's wrong? I've tried turning on as >>>> >>>>> verbose a >>>> >>>>> > debugging level for WSDL2Java as I can and it's no help at >>>> all. I >>>> >>>>> suspect >>>> >>>>> > namespace issues but I can't figure out what they are. >>>> >>>>> > >>>> >>>>> > Thanks in advance, >>>> >>>>> > Karl >>>> >>>>> > >>>> >>>>> >>>> >>>> >>>> >>>
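The ant pattern from the cxf wsdl-to-java page referenced in this thread looks roughly like the sketch below. The directory and WSDL file names are illustrative, and the classpath contents are an assumption: it must include the cxf tools jars plus the JAX-WS frontend jar that the "Could not find jaxws frontend" error above complains about (cxf-tools-wsdlto-frontend-jaxws), along with a Stax implementation such as Woodstox:

```xml
<java classname="org.apache.cxf.tools.wsdlto.WSDLToJava" fork="true" failonerror="true">
  <arg value="-d"/>
  <arg value="build/wsdljava"/>
  <arg value="wsdls/Authentication.wsdl"/>
  <classpath>
    <!-- Needs cxf-tools-common, cxf-tools-wsdlto-*, the jaxws frontend jar,
         and Woodstox; the exact jar list is an assumption -->
    <fileset dir="lib">
      <include name="*.jar"/>
    </fileset>
  </classpath>
</java>
```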
Re: Axis question
When did it first appear? We're currently on 2.6.2; this version is set by various dependencies of our connectors.

Karl
Re: Axis question
The tools package doesn't seem to have it either.

Karl
Re: Axis question
Do you know what jar/maven package this is in? Because I don't seem to have it in our normal cxf jars...

Karl

On Fri, Jan 25, 2019 at 9:08 AM Rafa Haro wrote:
> I used a wsdl2java script that comes as a utility of the Apache CXF
> release, but basically it makes use of the
> org.apache.cxf.tools.wsdlto.WSDLToJava class. You can find a usage
> example with ant here: http://cxf.apache.org/docs/wsdl-to-java.html
Re: Axis question
The cxf stuff is already present, and is available in connector-common-lib as well, so all that might be needed is a new ant rule to invoke it:

01/17/2019 05:47 PM  1,400,339 cxf-core-3.2.6.jar
01/17/2019 05:46 PM    181,690 cxf-rt-bindings-soap-3.2.6.jar
01/17/2019 05:46 PM     38,307 cxf-rt-bindings-xml-3.2.6.jar
01/17/2019 05:46 PM    105,048 cxf-rt-databinding-jaxb-3.2.6.jar
01/17/2019 05:47 PM    680,120 cxf-rt-frontend-jaxrs-3.2.6.jar
01/17/2019 05:46 PM    346,308 cxf-rt-frontend-jaxws-3.2.6.jar
01/17/2019 05:46 PM    103,850 cxf-rt-frontend-simple-3.2.6.jar
01/17/2019 05:47 PM    179,790 cxf-rt-rs-client-3.2.6.jar
01/17/2019 05:47 PM    362,532 cxf-rt-transports-http-3.2.6.jar
01/17/2019 05:46 PM     75,478 cxf-rt-ws-addr-3.2.6.jar
01/17/2019 05:46 PM    214,507 cxf-rt-ws-policy-3.2.6.jar
01/17/2019 05:46 PM    173,359 cxf-rt-wsdl-3.2.6.jar

We'd also need XSD code generation, which is currently done by Castor (haven't even tried it yet), so if this package has that ability too, it would be fantastic.

Karl
Re: Axis question
I was using ancient Axis 1.4 and none of them were working. You can exercise this with "ant classcreate-wsdls" in the csws directory.

If you can give instructions for invoking CXF, maybe we can do that instead. What's the main class, and what jars do we need to include?

Karl

On Fri, Jan 25, 2019 at 7:28 AM Rafa Haro wrote:
> Yes, I did. I have only tested Authentication service with Apache CXF and
> it was apparently working fine. Which ones were failing for you?
Re: Axis question
Were you able to look at this yesterday at all?
Karl
[jira] [Resolved] (CONNECTORS-1573) Web Crawler exclude from index matches too much?
[ https://issues.apache.org/jira/browse/CONNECTORS-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1573. - Resolution: Not A Problem

> Web Crawler exclude from index matches too much?
>
>                 Key: CONNECTORS-1573
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1573
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Web connector
>    Affects Versions: ManifoldCF 2.10
>            Reporter: Korneel Staelens
>            Priority: Major
>
> Hello,
> I'm not sure whether this is a bug or my misinterpretation of the exclusion
> rules. I want to set up a rule so that it does NOT index a parent page, but
> does index all child pages of that parent.
> I'm setting up a rule:
> Inclusions:
> .*
>
> Exclusions:
> [http://www.website.com/nl/]
> (I've also tried: http://www.website.com/nl/(\s)* )
> No dice. If I'm looking at the logs, I see the pages are crawled, but not
> indexed due to job restriction. Is my rule wrong? Or is this a small bug?
>
> Thanks for advice!

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1573) Web Crawler exclude from index matches too much?
[ https://issues.apache.org/jira/browse/CONNECTORS-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751689#comment-16751689 ] Karl Wright commented on CONNECTORS-1573: - Questions like this should be asked on the us...@manifoldcf.apache.org list, not via a ticket.

The quick answer: if you look at the Simple History, you can tell whether the pages are fetched or not. If they are not fetched at all (that is, they do not appear), then your inclusion and exclusion list is wrong. That doesn't sound like the problem here; it sounds like it's being blocked *after* fetching. There are a number of reasons for that; the Simple History should give you a good idea which one it is. If it reports "JOBDESCRIPTION", that means the *indexing* inclusion/exclusion rule discarded it. This is not the same as the *fetching* inclusion/exclusion rules, which is what it sounds like you might be setting. They're on the same tabs, just farther down. The manual does not include the indexing rules sections; this should be addressed.

I suspect that, based on the regexps you've given, you're also overlooking the fact that if the regexp matches ANYWHERE in the URL it is considered a match. So if you want a very specific URL, you need to delimit it with ^ at the beginning and $ at the end, to ensure that the entire URL matches and ONLY that URL.
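The anywhere-match behavior described in that comment can be sketched with plain java.util.regex. This is an illustration of the matching semantics only (the rule strings are hypothetical, and this is not ManifoldCF's actual matching code):

```java
import java.util.regex.Pattern;

public class UrlRuleDemo {
    // A rule matches if the regexp is found ANYWHERE in the URL,
    // i.e. Matcher.find() semantics rather than a whole-string match.
    static boolean matchesAnywhere(String regex, String url) {
        return Pattern.compile(regex).matcher(url).find();
    }

    public static void main(String[] args) {
        String parent = "http://www.website.com/nl/";
        String child = "http://www.website.com/nl/some-child-page";

        // Unanchored exclusion: matches the child pages too
        System.out.println(matchesAnywhere("http://www\\.website\\.com/nl/", child));   // true

        // Anchored with ^ and $: matches only the parent page itself
        System.out.println(matchesAnywhere("^http://www\\.website\\.com/nl/$", child)); // false
        System.out.println(matchesAnywhere("^http://www\\.website\\.com/nl/$", parent)); // true
    }
}
```

So an exclusion of `^http://www\.website\.com/nl/$` would discard only the parent page while letting the child pages through.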
Re: Axis question
They're all checked in.

See
https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-1566/connectors/csws/wsdls

Karl

On Thu, Jan 24, 2019 at 6:24 AM Rafa Haro wrote:
> Karl, can you share the WSDL, I can try to take a look later today
Axis question
I'm redeveloping the Livelink connector because the API code has been discontinued and the only API is now web services based. The WSDLs and XSDs have been exported and I'm trying to use the Axis tool WSDL2Java to convert them to Java code. Unfortunately, I haven't been able to make this work -- even though the WSDL references have been made local and the XSDs also seem to be getting parsed, it complains about missing definitions, even though those definitions are clearly present in the XSD files.

Has anyone had enough experience with this tool, and web services in general, to figure out what's wrong? I've tried turning on as verbose a debugging level for WSDL2Java as I can and it's no help at all. I suspect namespace issues but I can't figure out what they are.

Thanks in advance,
Karl
Re: Do we support UTF-16 chars in version strings when using MySQL/MariaDB?
It's critical, with Manifold, that the database instance be capable of handling any characters it's likely to encounter. For Postgresql we tell people to install it with the utf-8 collation, for instance, and when we create database instances ourselves we try to specify that as well. For MariaDB, have a look at the database implementation we've got, and let me know if this is something we're missing anywhere?

Thanks,
Karl

On Wed, Jan 23, 2019 at 3:00 AM Markus Schuch wrote:
> Hi,
>
> while using MySQL/MariaDB for MCF I encountered a "deadlock" kind of
> situation caused by a supplementary UTF-16 character (e.g. U+1F3AE) in a
> string inserted into one of the varchar columns.
>
> In my case a connector wrote the title of a parent document into the
> version string of the processed document, which contained the character
> U+1F3AE - a gamepad :)
>
> This led to SQL error 22001 "Incorrect string value: '\xF0\x9F\x8E\xAE'
> for column 'lastversion' at row 1" in MySQL, because the utf8 collation
> encoding does not support that kind of character. (utf8mb4 does)
>
> The cause was hard to find, because it somehow led to a transaction
> abort loop in the incremental ingester and the error was not logged
> properly.
>
> My questions:
> - should we create the MySQL database with utf8mb4 by default?
> - or should inserted strings be sanitized of such characters?
> - or should 22001 be handled better?
>
> Thanks in advance
> Markus
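The failure mode Markus describes can be reproduced in a few lines of plain Java: U+1F3AE is a supplementary character, stored as a surrogate pair in Java's UTF-16 strings, and it needs four bytes in UTF-8 -- one more than MySQL's legacy utf8 (utf8mb3) encoding can store, which is exactly the `\xF0\x9F\x8E\xAE` seen in the 22001 error. A minimal demonstration:

```java
import java.nio.charset.StandardCharsets;

public class SupplementaryCharDemo {
    public static void main(String[] args) {
        String gamepad = new String(Character.toChars(0x1F3AE)); // U+1F3AE GAME PAD

        // Java holds it as a UTF-16 surrogate pair: two chars, one code point
        System.out.println(gamepad.length());                            // 2
        System.out.println(gamepad.codePointCount(0, gamepad.length())); // 1

        // UTF-8 needs 4 bytes -- the F0 9F 8E AE from the MySQL error.
        // MySQL's legacy "utf8" tops out at 3 bytes per character; utf8mb4 allows 4.
        byte[] utf8 = gamepad.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length);                                 // 4
        System.out.printf("%02X %02X %02X %02X%n",
                utf8[0] & 0xFF, utf8[1] & 0xFF, utf8[2] & 0xFF, utf8[3] & 0xFF);
    }
}
```

Any code point above U+FFFF (emoji, many CJK extension characters) hits the same limit, which is why creating the database as utf8mb4 fixes the whole class of errors rather than this one character.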
[jira] [Resolved] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1563. - Resolution: Not A Problem

User has a configuration that makes no sense.

> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream
> must have > 0 bytes
>
>                 Key: CONNECTORS-1563
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
>             Project: ManifoldCF
>          Issue Type: Task
>          Components: Lucene/SOLR connector
>            Reporter: Sneha
>            Assignee: Karl Wright
>            Priority: Major
>         Attachments: Document simple history.docx, managed-schema, manifold settings.docx, manifoldcf.log, solr.log, solrconfig.xml
>
> I am encountering this problem:
> I have checked the "Use the Extract Update Handler" param, and I am getting an
> error on Solr, i.e. null:org.apache.solr.common.SolrException:
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0
> bytes
> If I ignore the Tika exception, my documents get indexed but don't have a
> content field in Solr.
> I am using Solr 7.3.1 and ManifoldCF 2.8.1.
> I am using Solr Cell and hence have not configured an external Tika extractor
> in the ManifoldCF pipeline.
> Please help me with this problem.
> Thanks in advance
[jira] [Resolved] (CONNECTORS-1535) Documentum Connector cannot find dfc.properties
[ https://issues.apache.org/jira/browse/CONNECTORS-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1535. - Resolution: Fixed > Documentum Connector cannot find dfc.properties > --- > > Key: CONNECTORS-1535 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1535 > Project: ManifoldCF > Issue Type: Bug > Components: Documentum connector >Affects Versions: ManifoldCF 2.10, ManifoldCF 2.11 > Environment: Manifold 2.11 > CentOS Linux release 7.5.1804 (Core) > OpenJDK 64-Bit Server VM 18.9 (build 11+28, mixed mode) > >Reporter: James Thomas >Assignee: Karl Wright >Priority: Major > Fix For: ManifoldCF 2.13 > > > I have found that when installing a clean MCF instance I cannot get > Documentum repository connectors to connect to Documentum until I have added > this line to the processes/documentum-server/run.sh script before the call to > Java: > > {code:java} > CLASSPATH="$CLASSPATH""$PATHSEP""$DOCUMENTUM"{code} > Until I do this, attempts to save the connector will result in this output to > the console: > > {noformat} > 4 [RMI TCP Connection(2)-127.0.0.1] ERROR > com.documentum.fc.common.impl.preferences.PreferencesManager - > [DFC_PREFERENCE_LOAD_FAILED] Failed to load persistent preferences from null > java.io.FileNotFoundException: dfc.properties > at > com.documentum.fc.common.impl.preferences.PreferencesManager.locateMainPersistentStore(PreferencesManager.java:378) > at > com.documentum.fc.common.impl.preferences.PreferencesManager.readPersistentProperties(PreferencesManager.java:329) > at > com.documentum.fc.common.impl.preferences.PreferencesManager.(PreferencesManager.java:37) > at > com.documentum.fc.common.DfPreferences.initialize(DfPreferences.java:64) > ..{noformat} > and this message in the MCF UI: > > {noformat} > Connection failed: Documentum error: No DocBrokers are configured{noformat} > > > I mentioned this in #1512 for MCF 2.10 but it got lost in the other work done > in that ticket. 
While setting up 2.11 from scratch I encountered it again. > > Once I have edited the run.sh script I get this in the console, showing that > (for whatever reason) the change is significant: > > {noformat} > Reading DFC configuration from > "file:/opt/manifold/apache-manifoldcf-2.11/processes/documentum-server/dfc.properties" > {noformat} >
[jira] [Commented] (CONNECTORS-1535) Documentum Connector cannot find dfc.properties
[ https://issues.apache.org/jira/browse/CONNECTORS-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746262#comment-16746262 ] Karl Wright commented on CONNECTORS-1535: - [~jamesthomas], the registry process has no dependencies whatsoever on DFC, so any changes to this would be unnecessary. Last question: can the DFC properties location be provided as a -D switch parameter to the JVM?
[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745422#comment-16745422 ] Karl Wright commented on CONNECTORS-1564: - [~michael-o] thanks for trying this. I await Erlend's more precise description of his setup. We are in fact setting up the HttpClientBuilder exactly as you recommend:

{code}
RequestConfig.Builder requestBuilder = RequestConfig.custom()
  .setCircularRedirectsAllowed(true)
  .setSocketTimeout(socketTimeout)
  .setExpectContinueEnabled(true)
  .setConnectTimeout(connectionTimeout)
  .setConnectionRequestTimeout(socketTimeout);

HttpClientBuilder clientBuilder = HttpClients.custom()
  .setConnectionManager(connectionManager)
  .disableAutomaticRetries()
  .setDefaultRequestConfig(requestBuilder.build())
  .setRedirectStrategy(new LaxRedirectStrategy())
  .setRequestExecutor(new HttpRequestExecutor(socketTimeout));
{code}

> Support preemptive authentication to Solr connector
>
>                 Key: CONNECTORS-1564
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1564
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Lucene/SOLR connector
>            Reporter: Erlend Garåsen
>            Assignee: Karl Wright
>            Priority: Major
>         Attachments: CONNECTORS-1564.patch
>
> We should post preemptively in case the Solr server requires basic
> authentication. This will make the communication between ManifoldCF and Solr
> much more efficient, instead of the following:
> * Send an HTTP POST request to Solr
> * Solr sends a 401 response
> * Send the same request, but with an "{{Authorization: Basic}}" header
> With preemptive authentication, we can send the header in the first request.
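Preemptive basic auth ultimately means attaching the Authorization header on the first request instead of waiting for the 401 challenge. The header value itself can be sketched with the plain JDK (the credentials below are placeholders; the HttpClient-side wiring via an AuthCache is not shown here):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PreemptiveAuthDemo {
    // Build the value of the "Authorization" header that preemptive basic
    // auth sends on the very first request, skipping the 401 round trip.
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Placeholder credentials, not from the ticket
        System.out.println(basicAuthHeader("user", "pass")); // Basic dXNlcjpwYXNz
    }
}
```

Sending this header up front halves the request count against a Solr instance protected by basic auth, which is the efficiency gain the ticket is after.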
[jira] [Commented] (CONNECTORS-1535) Documentum Connector cannot find dfc.properties
[ https://issues.apache.org/jira/browse/CONNECTORS-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745418#comment-16745418 ] Karl Wright commented on CONNECTORS-1535: - Can you put dfc.properties in the same directory as the other DFC files and have it be found?
[jira] [Updated] (CONNECTORS-1535) Documentum Connector cannot find dfc.properties
[ https://issues.apache.org/jira/browse/CONNECTORS-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-1535: Fix Version/s: ManifoldCF 2.13 > Documentum Connector cannot find dfc.properties > --- > > Key: CONNECTORS-1535 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1535 > Project: ManifoldCF > Issue Type: Bug > Components: Documentum connector >Affects Versions: ManifoldCF 2.10, ManifoldCF 2.11 > Environment: Manifold 2.11 > CentOS Linux release 7.5.1804 (Core) > OpenJDK 64-Bit Server VM 18.9 (build 11+28, mixed mode) > >Reporter: James Thomas >Assignee: Karl Wright >Priority: Major > Fix For: ManifoldCF 2.13 > > > I have found that when installing a clean MCF instance I cannot get > Documentum repository connectors to connect to Documentum until I have added > this line to the processes/documentum-server/run.sh script before the call to > Java: > > {code:java} > CLASSPATH="$CLASSPATH""$PATHSEP""$DOCUMENTUM"{code} > Until I do this, attempts to save the connector will result in this output to > the console: > > {noformat} > 4 [RMI TCP Connection(2)-127.0.0.1] ERROR > com.documentum.fc.common.impl.preferences.PreferencesManager - > [DFC_PREFERENCE_LOAD_FAILED] Failed to load persistent preferences from null > java.io.FileNotFoundException: dfc.properties > at > com.documentum.fc.common.impl.preferences.PreferencesManager.locateMainPersistentStore(PreferencesManager.java:378) > at > com.documentum.fc.common.impl.preferences.PreferencesManager.readPersistentProperties(PreferencesManager.java:329) > at > com.documentum.fc.common.impl.preferences.PreferencesManager.(PreferencesManager.java:37) > at > com.documentum.fc.common.DfPreferences.initialize(DfPreferences.java:64) > ..{noformat} > and this message in the MCF UI: > > {noformat} > Connection failed: Documentum error: No DocBrokers are configured{noformat} > > > I mentioned this in #1512 for MCF 2.10 but it got lost in the other work done > in that 
ticket. While setting up 2.11 from scratch I encountered it again. > > Once I have edited the run.sh script I get this in the console, showing that > (for whatever reason) the change is significant: > > {noformat} > Reading DFC configuration from > "file:/opt/manifold/apache-manifoldcf-2.11/processes/documentum-server/dfc.properties" > {noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
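The classpath fix above works because DFC looks up dfc.properties as a classpath resource, so the directory that contains it (here, $DOCUMENTUM) must appear on the classpath passed to the JVM. A minimal sketch of that mechanism, assuming Java 11+ (the temp directory stands in for $DOCUMENTUM; this is an illustration, not DFC's actual loading code):

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class DfcPropertiesLookup {
    public static void main(String[] args) throws IOException {
        // Create a stand-in for the $DOCUMENTUM directory holding dfc.properties.
        Path dir = Files.createTempDirectory("dfc");
        Files.writeString(dir.resolve("dfc.properties"), "dfc.docbroker.host[0]=example\n");

        // Without the directory on the classpath, the resource lookup fails ...
        ClassLoader bare = new URLClassLoader(new URL[0], null);
        System.out.println(bare.getResource("dfc.properties")); // null

        // ... with it on the classpath, the lookup succeeds -- which is what
        // CLASSPATH="$CLASSPATH""$PATHSEP""$DOCUMENTUM" achieves in run.sh.
        ClassLoader withDir = new URLClassLoader(new URL[]{dir.toUri().toURL()}, null);
        System.out.println(withDir.getResource("dfc.properties") != null); // true
    }
}
```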
[jira] [Assigned] (CONNECTORS-1535) Documentum Connector cannot find dfc.properties
[ https://issues.apache.org/jira/browse/CONNECTORS-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-1535: --- Assignee: Karl Wright > Documentum Connector cannot find dfc.properties > --- > > Key: CONNECTORS-1535 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1535 > Project: ManifoldCF > Issue Type: Bug > Components: Documentum connector >Affects Versions: ManifoldCF 2.10, ManifoldCF 2.11 > Environment: Manifold 2.11 > CentOS Linux release 7.5.1804 (Core) > OpenJDK 64-Bit Server VM 18.9 (build 11+28, mixed mode) > >Reporter: James Thomas >Assignee: Karl Wright >Priority: Major > > I have found that when installing a clean MCF instance I cannot get > Documentum repository connectors to connect to Documentum until I have added > this line to the processes/documentum-server/run.sh script before the call to > Java: > > {code:java} > CLASSPATH="$CLASSPATH""$PATHSEP""$DOCUMENTUM"{code} > Until I do this, attempts to save the connector will result in this output to > the console: > > {noformat} > 4 [RMI TCP Connection(2)-127.0.0.1] ERROR > com.documentum.fc.common.impl.preferences.PreferencesManager - > [DFC_PREFERENCE_LOAD_FAILED] Failed to load persistent preferences from null > java.io.FileNotFoundException: dfc.properties > at > com.documentum.fc.common.impl.preferences.PreferencesManager.locateMainPersistentStore(PreferencesManager.java:378) > at > com.documentum.fc.common.impl.preferences.PreferencesManager.readPersistentProperties(PreferencesManager.java:329) > at > com.documentum.fc.common.impl.preferences.PreferencesManager.(PreferencesManager.java:37) > at > com.documentum.fc.common.DfPreferences.initialize(DfPreferences.java:64) > ..{noformat} > and this message in the MCF UI: > > {noformat} > Connection failed: Documentum error: No DocBrokers are configured{noformat} > > > I mentioned this in #1512 for MCF 2.10 but it got lost in the other work done > in that ticket. 
While setting up 2.11 from scratch I encountered it again. > > Once I have edited the run.sh script I get this in the console, showing that > (for whatever reason) the change is significant: > > {noformat} > Reading DFC configuration from > "file:/opt/manifold/apache-manifoldcf-2.11/processes/documentum-server/dfc.properties" > {noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: SharedDriveConnector - jcifs.smb.SmbException: A device attached to the system is not functioning
That error is coming from the server you're trying to index. Sounds like some kind of hardware problem is being detected. Karl On Thu, Jan 17, 2019 at 2:17 AM wrote: > > > Hi, > > I've got a problem with the WindowsShares-Connector. > While indexing data on a fileserver I get the following exception after > approximately 1 documents have been indexed. > > jcifs.smb.SmbException: A device attached to the system is not functioning. > at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:563) ~ > [jcifs.jar:?] > at jcifs.smb.SmbTransport.send(SmbTransport.java:640) ~ > [jcifs.jar:?] > at jcifs.smb.SmbSession.send(SmbSession.java:238) ~[jcifs.jar:?] > at jcifs.smb.SmbTree.send(SmbTree.java:119) ~[jcifs.jar:?] > at jcifs.smb.SmbFile.send(SmbFile.java:775) ~[jcifs.jar:?] > at jcifs.smb.SmbFile.doFindFirstNext(SmbFile.java:1989) ~ > [jcifs.jar:?] > at jcifs.smb.SmbFile.doEnum(SmbFile.java:1741) ~[jcifs.jar:?] > at jcifs.smb.SmbFile.listFiles(SmbFile.java:1718) ~[jcifs.jar:?] > at jcifs.smb.SmbFile.listFiles(SmbFile.java:1707) ~[jcifs.jar:?] > at > > org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.fileListFiles > (SharedDriveConnector.java:2318) [mcf-jcifs-connector.jar:?] > > at > > org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments > (SharedDriveConnector.java:798) [mcf-jcifs-connector.jar:?] > > at org.apache.manifoldcf.crawler.system.WorkerThread.run > (WorkerThread.java:399) [mcf-pull-agent.jar:?] > ERROR 2019-01-16T08:48:00,172 (Worker thread '14') - JCIFS: SmbException > tossed processing smb://??.??.??.???/dir/dir/dir > > I'm using ManifoldCF 2.11 and jcifs-1.3.19.jar > > Do you have an idea what I could do or even a solution for it? > > Kind regards, > > Florjana > > > The exchange of e-mail messages is for purposes of information only and > only intended for the recipient. > This medium may not be used to exchange legal declarations. If you are not > the intended recipient, please contact us immediately by e-mail or phone > and delete this message from your system.
[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743186#comment-16743186 ] Karl Wright commented on CONNECTORS-1564: - [~michael-o], any updates? > Support preemptive authentication to Solr connector > --- > > Key: CONNECTORS-1564 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1564 > Project: ManifoldCF > Issue Type: Improvement > Components: Lucene/SOLR connector >Reporter: Erlend Garåsen > Assignee: Karl Wright >Priority: Major > Attachments: CONNECTORS-1564.patch > > > We should post preemptively in case the Solr server requires basic > authentication. This will make the communication between ManifoldCF and Solr > much more efficient than the following exchange: > * Send an HTTP POST request to Solr > * Solr sends a 401 response > * Send the same request, but with an "{{Authorization: Basic}}" header > With preemptive authentication, we can send the header in the first request. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
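The round trip described above is avoided by computing the Basic credentials header eagerly and attaching it to the very first request. A minimal sketch using only the JDK; `basicAuthHeader` is a hypothetical helper name, not part of the connector:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PreemptiveBasicAuth {
    // Build the "Authorization: Basic <base64(user:password)>" value up front,
    // so the first POST already carries credentials and no 401 round trip occurs.
    static String basicAuthHeader(String user, String password) {
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return "Basic " + token;
    }

    public static void main(String[] args) {
        // Attach to every outgoing request, e.g. with java.net.http:
        //   HttpRequest.newBuilder(uri)
        //       .header("Authorization", basicAuthHeader("solr", "secret"))
        System.out.println(basicAuthHeader("user", "pass")); // Basic dXNlcjpwYXNz
    }
}
```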
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743033#comment-16743033 ] Karl Wright commented on CONNECTORS-1563: - Please also see this discussion: https://issues.apache.org/jira/browse/CONNECTORS-1533 > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: Document simple history.docx, managed-schema, manifold > settings.docx, manifoldcf.log, solr.log, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743028#comment-16743028 ] Karl Wright commented on CONNECTORS-1563: - First, I asked for the Simple History, not the manifoldcf logs. What does the simple history say about document ingestions for the connection in question with the new configuration? But, from your solr log: {code} 2019-01-15 11:51:54.211 ERROR (qtp592617454-22) [ x:eesolr_webcrawler] o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:234) {code} Note that the stack trace is from the ExtractingDocumentLoader, which is Tika. You did not manage to actually change the output handler to the non-extracting one, possibly because you have configured your Solr in a non-default way. I cannot debug that for you, sorry. Can you do the following: Download the current 7.x version of Solr, fresh, and extract it. Start it using the standard provided simple scripts. Point ManifoldCF at it and crawl some documents, using the setup for the connection I have described. Does that work? If it does, and I expect it to because that is what works for me here, then it is your job to figure out what you did to Solr to make that not work. 
> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: Document simple history.docx, managed-schema, manifold > settings.docx, manifoldcf.log, solr.log, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743006#comment-16743006 ] Karl Wright commented on CONNECTORS-1563: - Please include [INFO] messages from the Solr log for example indexing requests, and also include records from the Simple History for documents indexed with the new configuration. Thanks. > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: managed-schema, manifold settings.docx, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742949#comment-16742949 ] Karl Wright commented on CONNECTORS-1563: - Please view the Solr connection and click the button that tells it to forget about everything it has indexed. That will force reindexing. That's a standard step when you change configuration like this and you want all documents to be reindexed. > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: managed-schema, manifold settings.docx, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1570) ManifoldCF Documentum connector crawling performance
[ https://issues.apache.org/jira/browse/CONNECTORS-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741333#comment-16741333 ] Karl Wright commented on CONNECTORS-1570: - Please ask your question on the us...@manifoldcf.apache.org list. In our experience, the performance of Documentum itself is the bottleneck, and nothing can be done without optimizing for that. > ManifoldCF Documentum connector crawling performance > --- > > Key: CONNECTORS-1570 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1570 > Project: ManifoldCF > Issue Type: Bug > Components: Documentum connector >Affects Versions: ManifoldCF 2.9.1 >Reporter: Gomahti >Priority: Major > > We are crawling data from a DCTM repository using the ManifoldCF Documentum > connector and writing the crawled data to MongoDB. Crawling was triggered with > a throttling value of 500, but crawling speed is very slow: the connector is > fetching only 170 documents per minute. The server where MCF is installed is configured > with enough memory and 8 logical cores (CPU). Can someone help us here to > improve crawling speed? > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CONNECTORS-1570) ManifoldCF Documentum connector crawling performance
[ https://issues.apache.org/jira/browse/CONNECTORS-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1570. - Resolution: Not A Problem > ManifoldCF Documentum connector crawling performance > --- > > Key: CONNECTORS-1570 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1570 > Project: ManifoldCF > Issue Type: Bug > Components: Documentum connector >Affects Versions: ManifoldCF 2.9.1 >Reporter: Gomahti >Priority: Major > > We are crawling data from a DCTM repository using the ManifoldCF Documentum > connector and writing the crawled data to MongoDB. Crawling was triggered with > a throttling value of 500, but crawling speed is very slow: the connector is > fetching only 170 documents per minute. The server where MCF is installed is configured > with enough memory and 8 logical cores (CPU). Can someone help us here to > improve crawling speed? > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741004#comment-16741004 ] Karl Wright commented on CONNECTORS-1563: - {quote} I need to pass from manifold one custom field and value which I want to see in Solr index. That is the reason why I used metadata transformer where I can pass the custom field in job - tab metadata adjuster. {quote} Yes, people do that all the time. Just add the Metadata Adjuster any place in your pipeline and have it inject the field value you want. It will be faithfully transmitted to Solr. > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: managed-schema, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740587#comment-16740587 ] Karl Wright commented on CONNECTORS-1563: - The metadata extractor can go anywhere in your pipeline, after Tika extraction. There is absolutely no point in having *two* Tika extractions though -- and that's what you're trying to do with the setup you've got. What I'd recommend is that you use only the ManifoldCF-side Tika extractor, and inject content into Solr using the /update handler, not the /update/extract handler. There's also a checkbox you'd need to uncheck in the Solr connection configuration. It's all covered in the ManifoldCF end user documentation. > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: managed-schema, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740435#comment-16740435 ] Karl Wright commented on CONNECTORS-1563: - {quote} Solr cell with standard update handler... {quote} This is not Option 2; it's a combination of (1) and (2) and is not a model that we support. > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: managed-schema, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740330#comment-16740330 ] Karl Wright commented on CONNECTORS-1563: - Can you tell me which configuration you are attempting: (1) Solr Cell + extract update handler + no Tika content extraction in MCF, or (2) NO Solr Cell + standard update handler + Tika content extraction in MCF Which is it? > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: managed-schema, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1569) IBM WebSEAL authentication
[ https://issues.apache.org/jira/browse/CONNECTORS-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739541#comment-16739541 ] Karl Wright commented on CONNECTORS-1569: - I'm not sure what the best approach might be for this since almost everyone wants the expect-continue in place. It's essential, in fact, for authenticating properly via POST on many other systems. Adding a way of disabling this via the UI is plausible but it's significant work all around. Still, I think that would be the best approach to meet your needs. Unfortunately I'm already booked at least until March, so you may do best by trying to submit a patch that I can integrate and/or clean up. > IBM WebSEAL authentication > -- > > Key: CONNECTORS-1569 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1569 > Project: ManifoldCF > Issue Type: Bug > Components: Web connector >Affects Versions: ManifoldCF 2.11 > Environment: Manifold 2.11 > IBM WebSEAL >Reporter: Ferdi Klomp >Assignee: Karl Wright >Priority: Major > Labels: ManifoldCF > > Hi, > We have stumbled upon a problem with the Web Connector authentication in > relation to IBM WebSEAL. We were unable to perform a successful > authentication against WebSEAL. After some time debugging we figured out the > web connector sends out an "Expect: 100-Continue" header and this is not > supported by WebSEAL. > [https://www-01.ibm.com/support/docview.wss?uid=swg21626421 > ]1. Disabling the "Expect: 100-Continue" functionality by setting > setExpectContinueEnabled to false in "ThrottledFetcher.java" eventually > solved the problem. The exact line can be found here: > > [https://github.com/apache/manifoldcf/blob/trunk/connectors/webcrawler/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/webcrawler/ThrottledFetcher.java#L508] > I'm not sure if this option is required for other environments, or whether it can > be disabled by default, or made configurable? > 2.
Another option would be to make the timeout configurable, as the WebSEAL > docs state "The browser need to have some kind of timeout to to send the > request body before exceeding intra-connection-timeout.". By default, the web > connector request timeout exceeded the intra-connection-timeout of WebSEAL. > What is the best way to proceed and get a fix for this in the web connector? > Kind regards, > Ferdi -- This message was sent by Atlassian JIRA (v7.6.3#76005)
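For reference, the flag the reporter changed lives on Apache HttpClient's per-request configuration. An untested fragment of what a configurable switch might look like; `disableExpectContinue` is a hypothetical connection parameter, not an existing connector setting, and this assumes HttpClient 4.x on the classpath:

```java
// Sketch only -- hypothetical 'disableExpectContinue' parameter.
import org.apache.http.client.config.RequestConfig;

RequestConfig requestConfig = RequestConfig.custom()
    // WebSEAL rejects "Expect: 100-Continue", so allow turning it off;
    // most other targets want it enabled, which stays the default.
    .setExpectContinueEnabled(!disableExpectContinue)
    .build();
```

Plumbing such a parameter through the Web connection configuration UI is the "significant work" mentioned above; the one-line RequestConfig change itself is small.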
[jira] [Assigned] (CONNECTORS-1569) IBM WebSEAL authentication
[ https://issues.apache.org/jira/browse/CONNECTORS-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-1569: --- Assignee: Karl Wright > IBM WebSEAL authentication > -- > > Key: CONNECTORS-1569 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1569 > Project: ManifoldCF > Issue Type: Bug > Components: Web connector >Affects Versions: ManifoldCF 2.11 > Environment: Manifold 2.11 > IBM WebSEAL >Reporter: Ferdi Klomp >Assignee: Karl Wright >Priority: Major > Labels: ManifoldCF > > Hi, > We have stumbled upon a problem with the Web Connector authentication in > relation to IBM WebSEAL. We were unable to perform a successful > authentication against WebSEAL. After some time debugging we figured out the > web connector sends out an "Expect: 100-Continue" header and this is not > supported by WebSEAL. > [https://www-01.ibm.com/support/docview.wss?uid=swg21626421 > ]1. Disabling the "Expect: 100-Continue" functionality by setting > setExpectContinueEnabled to false in "ThrottledFetcher.java" eventually > solved the problem. The exact line can be found here: > > [https://github.com/apache/manifoldcf/blob/trunk/connectors/webcrawler/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/webcrawler/ThrottledFetcher.java#L508] > I'm not sure if this option is required for other environments, or whether it can > be disabled by default, or made configurable? > 2. Another option would be to make the timeout configurable, as the WebSEAL > docs state "The browser need to have some kind of timeout to to send the > request body before exceeding intra-connection-timeout.". By default, the web > connector request timeout exceeded the intra-connection-timeout of WebSEAL. > What is the best way to proceed and get a fix for this in the web connector? > Kind regards, > Ferdi -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1562) Documents unreachable due to hopcount are not considered unreachable on cleanup pass
[ https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16738271#comment-16738271 ] Karl Wright commented on CONNECTORS-1562: - The "Stream has been closed" issue is occurring because it is simply taking too long to read all the data from the sitemap page, and the webserver is closing the connection before it's complete. Alternatively, it might be because the server is configured to cut pages off after a certain number of bytes. I don't know which one it is. You will need to do some research to figure out what your server's rules look like. The preferred solution would be to simply relax the rules for that one page. However, if that's not possible, the best alternative would be to break the sitemap page up into pieces. If each piece was, say, 1/4 the size, it might be small enough to get past your current rules. > Documents unreachable due to hopcount are not considered unreachable on > cleanup pass > > > Key: CONNECTORS-1562 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1562 > Project: ManifoldCF > Issue Type: Bug > Components: Elastic Search connector, Web connector >Affects Versions: ManifoldCF 2.11 > Environment: Manifoldcf 2.11 > Elasticsearch 6.3.2 > Web input connector > elastic output connector > Job crawls website input and outputs content to elastic > Reporter: Tim Steenbeke >Assignee: Karl Wright >Priority: Critical > Labels: starter > Fix For: ManifoldCF 2.12 > > Attachments: Screenshot from 2018-12-31 11-17-29.png, > image-2019-01-09-14-20-50-616.png, manifoldcf.log.cleanup, > manifoldcf.log.init, manifoldcf.log.reduced > > Original Estimate: 4h > Remaining Estimate: 4h > > My documents aren't removed from the Elasticsearch index after rerunning the > changed seeds > I update my job to change the seed map and rerun it, or use the scheduler to > keep it running even after updating it. > After the rerun the unreachable documents don't get deleted.
> It only adds documents when they can be reached. -- This message was sent by Atlassian JIRA (v7.6.3#76005)