Amazon cloud search connector question

2014-10-25 Thread Karl Wright
Hi Minoru-san,

I am doing some work on the Amazon Cloud Search connector, and I noticed
that there is no field that is getting set that contains the original
un-hashed URI for the document.  I would like to know if there is a
standard field in Amazon where this URI should go?

Thanks in advance for your help!

Karl


[jira] [Commented] (CONNECTORS-1077) Add activity logging for decision and exception events across all connectors

2014-10-25 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184019#comment-14184019
 ] 

Karl Wright commented on CONNECTORS-1077:
-

r1634188 (trunk) Amazon Cloud Search
r1634189 (dev_1x)


 Add activity logging for decision and exception events across all connectors
 

 Key: CONNECTORS-1077
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1077
 Project: ManifoldCF
  Issue Type: Improvement
  Components: Alfresco connector
Affects Versions: ManifoldCF 2.0
Reporter: Karl Wright
Assignee: Karl Wright
 Fix For: ManifoldCF 2.0

 Attachments: Example.patch, capture, elasticsearch_review.patch, 
 elasticsearch_review2.patch


 Many document skip decisions or transient exceptions are only logged, and are 
 not recorded as history events.  This makes it necessary upon occasion to 
 refer to the manifoldcf log for basic diagnosis.  We should record activity 
 events for most decisions and exceptions in the history.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CONNECTORS-1077) Add activity logging for decision and exception events across all connectors

2014-10-25 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184021#comment-14184021
 ] 

Karl Wright commented on CONNECTORS-1077:
-

r1634193 (trunk) removes table upgrade code for Amazon Cloud Search table for 
MCF 2.0, since no upgrade code should be present there.


[jira] [Created] (CONNECTORS-1084) Missing resource 'WebcrawlerConnector.MatchMustHaveARegexpValue' in bundle 'org.apache.manifoldcf.crawler.connectors.webcrawler.common'

2014-10-25 Thread Mingchun Zhao (JIRA)
Mingchun Zhao created CONNECTORS-1084:
-

 Summary: Missing resource 
'WebcrawlerConnector.MatchMustHaveARegexpValue' in bundle 
'org.apache.manifoldcf.crawler.connectors.webcrawler.common'
 Key: CONNECTORS-1084
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1084
 Project: ManifoldCF
  Issue Type: Bug
  Components: Web connector
Affects Versions: ManifoldCF 2.0
Reporter: Mingchun Zhao
Assignee: Mingchun Zhao
Priority: Minor


The following error occurred in the web connector:

ERROR 2014-10-24 09:30:19,537 (qtp876209191-368) - Missing resource 
'WebcrawlerConnector.MatchMustHaveARegexpValue' in bundle 
'org.apache.manifoldcf.crawler.connectors.webcrawler.common' for locale 'ja'
java.util.MissingResourceException: Can't find resource for bundle 
java.util.PropertyResourceBundle, key 
WebcrawlerConnector.MatchMustHaveARegexpValue
at java.util.ResourceBundle.getObject(ResourceBundle.java:395)
at java.util.ResourceBundle.getString(ResourceBundle.java:355)
... ...
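
The trace above shows ResourceBundle.getString throwing because the key is
absent from the bundle resolved for locale 'ja'. The following is a minimal
sketch of that failure mode and of one defensive lookup. It is not
ManifoldCF code and not the committed fix: the class MissingKeyDemo, the
helpers bundleOf and lookup, the key WebcrawlerConnector.SomeOtherKey and
the "!key!" placeholder convention are all assumptions made for
illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.MissingResourceException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

/** Illustration only; class and method names are hypothetical, not ManifoldCF's. */
public class MissingKeyDemo {

  /** Builds an in-memory bundle from properties text, standing in for a locale-specific file. */
  public static ResourceBundle bundleOf(String propertiesText) {
    try {
      return new PropertyResourceBundle(new StringReader(propertiesText));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Returns the message for the key, or a visible placeholder if the bundle lacks it. */
  public static String lookup(ResourceBundle bundle, String key) {
    try {
      return bundle.getString(key);
    } catch (MissingResourceException e) {
      // Same exception type as in the trace; degrade to a placeholder instead of failing.
      return "!" + key + "!";
    }
  }

  public static void main(String[] args) {
    // Hypothetical 'ja' bundle that defines one key but not the one from the log.
    ResourceBundle ja = bundleOf("WebcrawlerConnector.SomeOtherKey=value\n");
    System.out.println(lookup(ja, "WebcrawlerConnector.SomeOtherKey"));
    System.out.println(lookup(ja, "WebcrawlerConnector.MatchMustHaveARegexpValue"));
  }
}
```

A placeholder keeps the UI usable while making the missing translation
visible; adding the key to the locale's properties file remains the real
remedy.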



Difference between trunk and dev_1x

2014-10-25 Thread Maurizio Pillitu
Hi everyone,

I'd like to create a branch to work on CONNECTORS-1082, but I'm unsure
whether to svn copy from dev_1x or from trunk; the role of dev_1x, and
the cases in which it is used, are not clear to me.

Maybe someone can shed some light on this.
Thanks in advance.

mao


[jira] [Resolved] (CONNECTORS-1084) Missing resource 'WebcrawlerConnector.MatchMustHaveARegexpValue' in bundle 'org.apache.manifoldcf.crawler.connectors.webcrawler.common'

2014-10-25 Thread Mingchun Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingchun Zhao resolved CONNECTORS-1084.
---
   Resolution: Fixed
Fix Version/s: ManifoldCF 2.0

Committed r1634202 (trunk).


Re: New committer: Rafa Haro

2014-10-25 Thread Mingchun Zhao
Hi Rafa,

A warm welcome!

Mingchun Zhao

2014-10-06 18:47 GMT+09:00 Karl Wright daddy...@gmail.com:
 The Project Management Committee (PMC) for Apache ManifoldCF has asked
 Rafa Haro to become a committer and we are pleased to announce that
 they have accepted.

 Rafa has been working to include the new Alfresco web-script connector
 and authority into the ManifoldCF project as an officially
 supported connector.  Rafa is also an existing committer for the
 Apache Stanbol project.

 Being a committer enables easier contribution to the project since
 there is no need to go via the patch submission process. This should
 enable better productivity.

 Thanks,
 Karl


Re: New committer: Alessandro Benedetti

2014-10-25 Thread Mingchun Zhao
Hi Alessandro,

A warm welcome!

Mingchun Zhao

2014-10-17 17:12 GMT+09:00 Karl Wright daddy...@gmail.com:
 The Project Management Committee (PMC) for Apache ManifoldCF
 has asked Alessandro Benedetti to become a committer and we are pleased
 to announce that they have accepted.

 Alessandro has been active in using ManifoldCF to integrate
 clients with Apache Solr over several years.

 Being a committer enables easier contribution to the
 project since there is no need to go via the patch
 submission process. This should enable better productivity.

 Thanks,
 The ManifoldCF PMC


Re: New committer: Maurizio Pillitu

2014-10-25 Thread Mingchun Zhao
Hi Maurizio,

A warm welcome!

Mingchun Zhao

2014-10-07 2:44 GMT+09:00 Karl Wright daddy...@gmail.com:
 The Project Management Committee (PMC) for Apache ManifoldCF has asked
 Maurizio Pillitu to become a committer and we are pleased to announce
 that they have accepted.

 Maurizio has worked with Rafa Haro to develop the Alfresco Webscript
 connector and authority.  As an Alfresco developer, he will be able
 to keep us up to date with changes in the Alfresco product offerings
 and maintenance of that connector.

 Being a committer enables easier contribution to the project since
 there is no need to go via the patch submission process. This should
 enable better productivity.

 Thanks,

 The ManifoldCF PMC


[jira] [Reopened] (CONNECTORS-1079) the parsing in TikaExtractor always return empty result

2014-10-25 Thread Mingchun Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingchun Zhao reopened CONNECTORS-1079:
---

Hi Karl,

Thank you for your help. I've tried your fix.
Unfortunately, the symptom still occurs even though tika-core.jar is present in
both the lib and connector-lib directories.
It looks like the two identical copies of the jar conflict with each other.
I tried to fix it using a ClassLoader, but eventually gave up because that
made things more confusing.

Could you please confirm my suggestions below:

1. Remove tika-core.jar from the lib directory (need to modify build.xml?)

2. Call Tika().detect directly to get the MIME type instead of calling
ExtensionMimeMap.mapToMimeType.
The affected connectors are as follows (4 files):
connectors/filesystem/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/filesystem/FileConnector.java
connectors/hdfs/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/hdfs/HDFSRepositoryConnector.java
connectors/jcifs/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/sharedrive/SharedDriveConnector.java
connectors/sharepoint/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/sharepoint/SharePointRepository.java

3. Delete the then-unused ExtensionMimeMap class, which contains just one method
that calls Tika().detect to get the MIME type.
framework/core/src/main/java/org/apache/manifoldcf/core/extmimemap/ExtensionMimeMap.java

Thanks.

 the parsing in TikaExtractor always return empty result
 ---

 Key: CONNECTORS-1079
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1079
 Project: ManifoldCF
  Issue Type: Bug
  Components: Tika extractor
Affects Versions: ManifoldCF 2.0
Reporter: Mingchun Zhao
Assignee: Karl Wright
 Fix For: ManifoldCF 1.8, ManifoldCF 2.0


 When I use the latest trunk source (2.0) to try the Tika content extractor, it
 does not return any of the expected results.
 Looking at it with debugging tools, I found that the parser of the Tika content
 extractor does not return any data.
 I tried moving lib/tika-core-1.6.jar into connector-lib/; after that, the Tika
 content extractor returned data as expected.
 My configuration is as follows:
 ==
 Transformation:
  Type: Tika content extractor
 Output:
  Type: Solr (Use extract update handler=false)
 Repository:
  type: Web
 Job:
  1.type: repository
  2.type: transformation
  3.type: output
 ==
 This may be related to CONNECTORS-1074.
 It looks like the location of tika-core-1.6.jar affects the result of
 TikaExtractor.
  



[jira] [Comment Edited] (CONNECTORS-1079) the parsing in TikaExtractor always return empty result

2014-10-25 Thread Mingchun Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184165#comment-14184165
 ] 

Mingchun Zhao edited comment on CONNECTORS-1079 at 10/25/14 5:10 PM:
-

Hi Karl,

Thank you for your help. I've tried your fix.
Unfortunately, the symptom still occurs even though tika-core.jar is present in
both the lib and connector-lib directories.
It looks like the two identical copies of the jar conflict with each other.
I tried to fix it using a ClassLoader, but eventually gave up because that
made things more confusing.

Could you please confirm my suggestions below:

1. Remove tika-core.jar from the lib directory (need to modify build.xml?)

2. Call Tika().detect directly to get the MIME type instead of calling
ExtensionMimeMap.mapToMimeType.
The affected connectors are as follows (4 files):
connectors/filesystem/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/filesystem/FileConnector.java
connectors/hdfs/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/hdfs/HDFSRepositoryConnector.java
connectors/jcifs/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/sharedrive/SharedDriveConnector.java
connectors/sharepoint/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/sharepoint/SharePointRepository.java

3. Delete the then-unused ExtensionMimeMap class, which contains just one method
that calls Tika().detect to get the MIME type.
framework/core/src/main/java/org/apache/manifoldcf/core/extmimemap/ExtensionMimeMap.java

Thanks.


[jira] [Comment Edited] (CONNECTORS-1079) the parsing in TikaExtractor always return empty result

2014-10-25 Thread Mingchun Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184165#comment-14184165
 ] 

Mingchun Zhao edited comment on CONNECTORS-1079 at 10/25/14 5:11 PM:
-

Hi Karl,

Thank you for your help. I've tried your fix.
Unfortunately, the symptom still occurs even though tika-core.jar is present in
both the lib and connector-lib directories.
It looks like the two identical copies of the jar conflict with each other.
I tried to fix it using a ClassLoader, but eventually gave up because that
made things more confusing.

Could you please confirm my suggestions below:

1. Remove tika-core.jar from the lib directory (need to modify build.xml?)

2. Call Tika().detect directly to get the MIME type instead of calling
ExtensionMimeMap.mapToMimeType.
The affected connectors are as follows (4 files):
connectors/filesystem/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/filesystem/FileConnector.java
connectors/hdfs/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/hdfs/HDFSRepositoryConnector.java
connectors/jcifs/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/sharedrive/SharedDriveConnector.java
connectors/sharepoint/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/sharepoint/SharePointRepository.java

3. Delete the then-unused ExtensionMimeMap class, which contains just one method
that calls Tika().detect to get the MIME type.
framework/core/src/main/java/org/apache/manifoldcf/core/extmimemap/ExtensionMimeMap.java

Thanks.


[jira] [Commented] (CONNECTORS-1079) the parsing in TikaExtractor always return empty result

2014-10-25 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184227#comment-14184227
 ] 

Karl Wright commented on CONNECTORS-1079:
-

The first thing to check is how big the build binary will be if every
tika jar is at the root level.


[jira] [Commented] (CONNECTORS-1079) the parsing in TikaExtractor always return empty result

2014-10-25 Thread Mingchun Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184307#comment-14184307
 ] 

Mingchun Zhao commented on CONNECTORS-1079:
---

 The first thing to check is how big the build binary will be if every
 tika jar is at the root level.

Thanks, I'll confirm this.


RE: Difference between trunk and dev_1x

2014-10-25 Thread Karl Wright
Hi Maurizio,

The best way to develop is to do it against trunk. We then pull up any
needed changes to the dev_1x branch, depending on what they are.
So make your branch by svn copy from trunk.

Thanks,
Karl


[jira] [Commented] (CONNECTORS-1079) the parsing in TikaExtractor always return empty result

2014-10-25 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184350#comment-14184350
 ] 

Karl Wright commented on CONNECTORS-1079:
-

The size I get is OK, but is 30% larger than without:

{code}
10/25/2014  09:32 PM   243,448,556 apache-manifoldcf-2.0-dev-bin.zip
{code}

Still, I think I will commit it this way for now.  We can try other ways of 
cutting back on size after things are working again.


[jira] [Commented] (CONNECTORS-1079) the parsing in TikaExtractor always return empty result

2014-10-25 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184352#comment-14184352
 ] 

Karl Wright commented on CONNECTORS-1079:
-

r1634264 (trunk)
r1634265 (dev_1x)

Mingchun, please verify whether this works.  Thanks!


[jira] [Updated] (CONNECTORS-1084) Missing resource 'WebcrawlerConnector.MatchMustHaveARegexpValue' in bundle 'org.apache.manifoldcf.crawler.connectors.webcrawler.common'

2014-10-25 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-1084:

Fix Version/s: ManifoldCF 1.8


[jira] [Created] (CONNECTORS-1085) Introduce a mcf-connector-common.jar to save binary delivery space

2014-10-25 Thread Karl Wright (JIRA)
Karl Wright created CONNECTORS-1085:
---

 Summary: Introduce a mcf-connector-common.jar to save binary 
delivery space
 Key: CONNECTORS-1085
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1085
 Project: ManifoldCF
  Issue Type: Improvement
  Components: Framework core
Affects Versions: ManifoldCF 2.0
Reporter: Karl Wright
Assignee: Karl Wright
 Fix For: ManifoldCF 2.0


The ManifoldCF 2.0 deliverable provides a number of connector-only services in 
mcf-core, such as:

- ISO 8601 date parsing and formatting
- Axis SOAP transport support via HttpComponents HttpClient
- extension-to-MIME-type mapping

Unfortunately, these functions require many (large) jars to be included at the
root level. Since root-level jars wind up in all of the various war files, this
really bloats the binary deliverable.

For MCF 2.0, we can fix this by moving this functionality to a 
mcf-connector-common.jar, which would be included in connector-lib rather than 
at the root level.

This can't be done for MCF 1.8, because of backwards compatibility reasons.



[jira] [Commented] (CONNECTORS-1079) the parsing in TikaExtractor always return empty result

2014-10-25 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184403#comment-14184403
 ] 

Karl Wright commented on CONNECTORS-1079:
-

I'm going to resolve this ticket; Mingchun, please reopen if it still does not
work.

The space issue I plan to deal with as described in CONNECTORS-1085.



[jira] [Comment Edited] (CONNECTORS-1079) the parsing in TikaExtractor always return empty result

2014-10-25 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184403#comment-14184403
 ] 

Karl Wright edited comment on CONNECTORS-1079 at 10/26/14 5:14 AM:
---

I'm going to resolve this ticket; Mingchun, please reopen if it still does not
work.

The size issue I plan to deal with as described in CONNECTORS-1085.



