Amazon cloud search connector question
Hi Minoru-san, I am doing some work on the Amazon Cloud Search connector, and I noticed that there is no field that is getting set that contains the original un-hashed URI for the document. I would like to know if there is a standard field in Amazon where this URI should go? Thanks in advance for your help! Karl
[jira] [Commented] (CONNECTORS-1077) Add activity logging for decision and exception events across all connectors
[ https://issues.apache.org/jira/browse/CONNECTORS-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184019#comment-14184019 ]
Karl Wright commented on CONNECTORS-1077:
r1634188 (trunk) Amazon Cloud Search r1634189 (dev_1x)
Add activity logging for decision and exception events across all connectors
Key: CONNECTORS-1077
URL: https://issues.apache.org/jira/browse/CONNECTORS-1077
Project: ManifoldCF
Issue Type: Improvement
Components: Alfresco connector
Affects Versions: ManifoldCF 2.0
Reporter: Karl Wright
Assignee: Karl Wright
Fix For: ManifoldCF 2.0
Attachments: Example.patch, capture, elasticsearch_review.patch, elasticsearch_review2.patch
Many document skip decisions or transient exceptions are only logged, and are not recorded as history events. This makes it necessary upon occasion to refer to the manifoldcf log for basic diagnosis. We should record activity events for most decisions and exceptions in the history.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CONNECTORS-1077) Add activity logging for decision and exception events across all connectors
[ https://issues.apache.org/jira/browse/CONNECTORS-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184021#comment-14184021 ]
Karl Wright commented on CONNECTORS-1077:
r1634193 (trunk) removes table upgrade code for Amazon Cloud Search table for MCF 2.0, since no upgrade code should be present there.
[jira] [Created] (CONNECTORS-1084) Missing resource 'WebcrawlerConnector.MatchMustHaveARegexpValue' in bundle 'org.apache.manifoldcf.crawler.connectors.webcrawler.common'
Mingchun Zhao created CONNECTORS-1084:
Summary: Missing resource 'WebcrawlerConnector.MatchMustHaveARegexpValue' in bundle 'org.apache.manifoldcf.crawler.connectors.webcrawler.common'
Key: CONNECTORS-1084
URL: https://issues.apache.org/jira/browse/CONNECTORS-1084
Project: ManifoldCF
Issue Type: Bug
Components: Web connector
Affects Versions: ManifoldCF 2.0
Reporter: Mingchun Zhao
Assignee: Mingchun Zhao
Priority: Minor
An error occurred in the web connector, as shown below:
ERROR 2014-10-24 09:30:19,537 (qtp876209191-368) - Missing resource 'WebcrawlerConnector.MatchMustHaveARegexpValue' in bundle 'org.apache.manifoldcf.crawler.connectors.webcrawler.common' for locale 'ja'
java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key WebcrawlerConnector.MatchMustHaveARegexpValue
at java.util.ResourceBundle.getObject(ResourceBundle.java:395)
at java.util.ResourceBundle.getString(ResourceBundle.java:355)
... ...
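For readers unfamiliar with the failure in the stack trace above, here is a minimal sketch of how `ResourceBundle.getString` behaves when a key is missing, and one defensive way to handle it. This is not the change committed for CONNECTORS-1084; the class `SafeMessages`, the nested `DemoBundle`, the key `WebcrawlerConnector.Present`, and the `!key!` fallback convention are all hypothetical names chosen for illustration. Only the standard JDK is used.

```java
import java.util.ListResourceBundle;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

// Hypothetical illustration only; not ManifoldCF code.
final class SafeMessages {

    // Stand-in bundle that, like the 'ja' bundle in the report,
    // does not define WebcrawlerConnector.MatchMustHaveARegexpValue.
    static final class DemoBundle extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"WebcrawlerConnector.Present", "A translated message"}
            };
        }
    }

    // Returns the message for the key, or a visible "!key!" marker
    // instead of letting MissingResourceException propagate.
    static String lookup(ResourceBundle bundle, String key) {
        try {
            return bundle.getString(key);
        } catch (MissingResourceException e) {
            return "!" + key + "!";
        }
    }

    public static void main(String[] args) {
        ResourceBundle bundle = new DemoBundle();
        System.out.println(lookup(bundle, "WebcrawlerConnector.Present"));
        System.out.println(lookup(bundle, "WebcrawlerConnector.MatchMustHaveARegexpValue"));
    }
}
```

A fallback marker like this only hides the symptom in the UI; the key still has to be added to the locale's properties file for a proper translation to appear.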
Difference between trunk and dev_1x
Hi everyone, I'd like to create a branch to work on CONNECTORS-1082, but I'm unsure whether to svn copy from dev_1x or from trunk; it is not clear to me what the role of dev_1x is and in which cases it is used. Maybe someone can shed some light on this. Thanks in advance. mao
[jira] [Resolved] (CONNECTORS-1084) Missing resource 'WebcrawlerConnector.MatchMustHaveARegexpValue' in bundle 'org.apache.manifoldcf.crawler.connectors.webcrawler.common'
[ https://issues.apache.org/jira/browse/CONNECTORS-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mingchun Zhao resolved CONNECTORS-1084.
Resolution: Fixed
Fix Version/s: ManifoldCF 2.0
Committed r1634202 (trunk).
Re: New committer: Rafa Haro
Hi Rafa, A warm welcome! Mingchun Zhao 2014-10-06 18:47 GMT+09:00 Karl Wright daddy...@gmail.com: The Project Management Committee (PMC) for Apache ManifoldCF has asked Rafa Haro to become a committer and we are pleased to announce that they have accepted. Rafa has been working to include the new Alfresco web-script connector and authority into the ManifoldCF project as an officially supported connector. Rafa is also an existing committer for the Apache Stanbol project. Being a committer enables easier contribution to the project since there is no need to go via the patch submission process. This should enable better productivity. Thanks, Karl
Re: New committer: Alessandro Benedetti
Hi Alessandro, A warm welcome! Mingchun Zhao 2014-10-17 17:12 GMT+09:00 Karl Wright daddy...@gmail.com: The Project Management Committee (PMC) for Apache ManifoldCF has asked Alessandro Benedetti to become a committer and we are pleased to announce that they have accepted. Alessandro has been active in using ManifoldCF to integrate clients with Apache Solr over several years. Being a committer enables easier contribution to the project since there is no need to go via the patch submission process. This should enable better productivity. Thanks, The ManifoldCF PMC
Re: New committer: Maurizio Pillitu
Hi Maurizio, A warm welcome! Mingchun Zhao 2014-10-07 2:44 GMT+09:00 Karl Wright daddy...@gmail.com: The Project Management Committee (PMC) for Apache ManifoldCF has asked Maurizio Pillitu to become a committer and we are pleased to announce that they have accepted. Maurizio has worked with Rafa Haro to develop the Alfresco Webscript connector and authority. As an Alfresco developer, he will be able to keep us up to date with changes in the Alfresco product offerings and maintenance of that connector. Being a committer enables easier contribution to the project since there is no need to go via the patch submission process. This should enable better productivity. Thanks, The ManifoldCF PMC
[jira] [Reopened] (CONNECTORS-1079) the parsing in TikaExtractor always return empty result
[ https://issues.apache.org/jira/browse/CONNECTORS-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mingchun Zhao reopened CONNECTORS-1079:
Hi Karl, Thank you for your help. I've tried your fix. Unfortunately, the symptom still occurs even though tika-core.jar is present in both the lib and connector-lib directories. It looks like the two identical jars cause a jar conflict. I tried to use a ClassLoader to fix it, but eventually gave up because that made things more confusing. Could you please confirm my suggestions below:
1. Get rid of the tika-core.jar from the lib directory (need to modify build.xml?).
2. Directly call Tika().detect to get the MimeType instead of calling ExtensionMimeMap.mapToMimeType. The related connectors are as below (4 files):
connectors/filesystem/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/filesystem/FileConnector.java
connectors/hdfs/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/hdfs/HDFSRepositoryConnector.java
connectors/jcifs/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/sharedrive/SharedDriveConnector.java
connectors/sharepoint/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/sharepoint/SharePointRepository.java
3. Delete the unused ExtensionMimeMap class, which just contains one method that calls Tika().detect to get the MimeType:
framework/core/src/main/java/org/apache/manifoldcf/core/extmimemap/ExtensionMimeMap.java
Thanks.
the parsing in TikaExtractor always return empty result
Key: CONNECTORS-1079
URL: https://issues.apache.org/jira/browse/CONNECTORS-1079
Project: ManifoldCF
Issue Type: Bug
Components: Tika extractor
Affects Versions: ManifoldCF 2.0
Reporter: Mingchun Zhao
Assignee: Karl Wright
Fix For: ManifoldCF 1.8, ManifoldCF 2.0
When I use the latest trunk source (2.0) to try the Tika content extractor, it does not return the expected results. Using debugging tools, I found that the parser of the Tika content extractor does not return any data. When I moved lib/tika-core-1.6.jar into connector-lib/, the Tika content extractor returned data as expected. My configuration is as below:
==
Transformation: Type: Tika content extractor
Output: Type: Solr (Use extract update handler=false)
Repository: type: Web
Job: 1.type: repository 2.type: transformation 3.type: output
==
Maybe it is related to CONNECTORS-1074(?). It looks like the location of tika-core-1.6.jar affects the result of TikaExtractor.
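When the same jar appears in more than one directory, as reported above, it can help to ask the JVM which copy a class was actually loaded from and by which class loader. The following is a small diagnostic sketch using only standard JDK calls; `ClassOrigin` and `describe` are hypothetical names, and nothing here is part of ManifoldCF or Tika. It assumes the class of interest (for example `org.apache.tika.Tika`) is on the classpath of the process being inspected.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper; not ManifoldCF code.
final class ClassOrigin {

    // Describes where a class came from: its code source location
    // (typically a jar URL) and the class loader that defined it.
    static String describe(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        String location = (source == null || source.getLocation() == null)
            ? "<bootstrap or unknown>"
            : source.getLocation().toString();
        ClassLoader loader = type.getClassLoader();
        String loaderName = (loader == null) ? "<bootstrap>" : loader.getClass().getName();
        return type.getName() + " loaded from " + location + " by " + loaderName;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name as the first argument;
        // defaults to a JDK class so the sketch runs without extra jars.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(describe(Class.forName(name)));
    }
}
```

If two call sites report different locations or different loaders for the same class name, the JVM treats them as distinct classes, which is one plausible (but here unconfirmed) explanation for the conflict described in this issue.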
[jira] [Commented] (CONNECTORS-1079) the parsing in TikaExtractor always return empty result
[ https://issues.apache.org/jira/browse/CONNECTORS-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184227#comment-14184227 ]
Karl Wright commented on CONNECTORS-1079:
The first thing to check is how big the build binary will be if every tika jar is at the root level.
[jira] [Commented] (CONNECTORS-1079) the parsing in TikaExtractor always return empty result
[ https://issues.apache.org/jira/browse/CONNECTORS-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184307#comment-14184307 ]
Mingchun Zhao commented on CONNECTORS-1079:
"The first thing to check is how big the build binary will be if every tika jar is at the root level."
Thanks, I'll confirm this.
RE: Difference between trunk and dev_1x
Hi Maurizio, The best way to develop is to do it against trunk. We pull up any needed changes to the dev_1x branch depending on what it is. So make your branch by svn copy from trunk. Thanks, Karl Sent from my Windows Phone From: Maurizio Pillitu Sent: 10/25/2014 6:20 AM To: dev@manifoldcf.apache.org Subject: Difference between trunk and dev_1x Hi everyone, I'd like to create a branch to work on CONNECTORS-1082, but I'm doubtful whether to svn copy from dev_1x or from trunk; I currently don't have very clear what is the role of dev_1x and in which cases it is used. Maybe someone can shed some light on my doubt. Thanks in advance. mao
[jira] [Commented] (CONNECTORS-1079) the parsing in TikaExtractor always return empty result
[ https://issues.apache.org/jira/browse/CONNECTORS-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184350#comment-14184350 ]
Karl Wright commented on CONNECTORS-1079:
The size I get is OK, but is 30% larger than without:
{code}
10/25/2014 09:32 PM 243,448,556 apache-manifoldcf-2.0-dev-bin.zip
{code}
Still, I think I will commit it this way for now. We can try other ways of cutting back on size after things are working again.
[jira] [Commented] (CONNECTORS-1079) the parsing in TikaExtractor always return empty result
[ https://issues.apache.org/jira/browse/CONNECTORS-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184352#comment-14184352 ]
Karl Wright commented on CONNECTORS-1079:
r1634264 (trunk) r1634265 (dev_1x)
Mingchun, please verify whether this works. Thanks!
[jira] [Updated] (CONNECTORS-1084) Missing resource 'WebcrawlerConnector.MatchMustHaveARegexpValue' in bundle 'org.apache.manifoldcf.crawler.connectors.webcrawler.common'
[ https://issues.apache.org/jira/browse/CONNECTORS-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wright updated CONNECTORS-1084:
Fix Version/s: ManifoldCF 1.8
Fix For: ManifoldCF 1.8, ManifoldCF 2.0
[jira] [Created] (CONNECTORS-1085) Introduce a mcf-connector-common.jar to save binary delivery space
Karl Wright created CONNECTORS-1085:
Summary: Introduce a mcf-connector-common.jar to save binary delivery space
Key: CONNECTORS-1085
URL: https://issues.apache.org/jira/browse/CONNECTORS-1085
Project: ManifoldCF
Issue Type: Improvement
Components: Framework core
Affects Versions: ManifoldCF 2.0
Reporter: Karl Wright
Assignee: Karl Wright
Fix For: ManifoldCF 2.0
The ManifoldCF 2.0 deliverable provides a number of connector-only services in mcf-core, such as:
- ISO 8601 date parsing and formatting
- Axis SOAP transport support via Httpcomponents Httpclient
- extension to mime type mapping
These functions have the unfortunate requirement that many (large) jar packages wind up needing to be included at the root level, which, since these wind up in all of the various war files, really bloats the binary deliverable. For MCF 2.0, we can fix this by moving this functionality to a mcf-connector-common.jar, which would be included in connector-lib rather than at the root level. This can't be done for MCF 1.8, for backwards compatibility reasons.
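As a rough illustration of the first kind of connector-only service listed above, ISO 8601 parsing and formatting can be sketched with the JDK's `java.time` package. This is not the mcf-core implementation, whose API is not shown in this thread; `Iso8601Demo`, `parse`, and `format` are hypothetical names, and the sketch only handles UTC instants of the form `yyyy-MM-ddTHH:mm:ssZ`.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration only; not the ManifoldCF date utility.
final class Iso8601Demo {

    // Parses an ISO 8601 UTC instant such as 2014-10-26T05:14:00Z.
    // Throws java.time.format.DateTimeParseException on malformed input.
    static Instant parse(String text) {
        return Instant.parse(text);
    }

    // Formats an instant back to its ISO 8601 UTC representation.
    static String format(Instant instant) {
        return DateTimeFormatter.ISO_INSTANT.format(instant);
    }

    public static void main(String[] args) {
        Instant when = parse("2014-10-26T05:14:00Z");
        System.out.println(format(when));
        System.out.println(format(when.plusSeconds(60)));
    }
}
```

A helper this small has no third-party dependencies; the size concern in the issue comes from services such as the SOAP transport and MIME mapping that pull in large jars.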
[jira] [Commented] (CONNECTORS-1079) the parsing in TikaExtractor always return empty result
[ https://issues.apache.org/jira/browse/CONNECTORS-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184403#comment-14184403 ]
Karl Wright commented on CONNECTORS-1079:
I'm going to resolve this ticket; Mingchun, please reopen if it still does not work. The space issue I plan to deal with as described in CONNECTORS-1085.