[jira] [Created] (CONNECTORS-986) Error While Editing a job involving a pipeline
Rafa Haro created CONNECTORS-986:

Summary: Error While Editing a job involving a pipeline
Key: CONNECTORS-986
URL: https://issues.apache.org/jira/browse/CONNECTORS-986
Project: ManifoldCF
Issue Type: Bug
Components: Framework core
Affects Versions: ManifoldCF 1.7
Reporter: Rafa Haro

To reproduce the error:
1. Create a FileSystem Repository Connector
2. Create a Solr Output Connector
3. Create a Transformation Connector, for example Allowed Documents
4. Create a job, configure a pipeline including the transformation connector
5. Save the Job
6. Edit the Job. Go to Repository Paths. Try to Add a root path.
7. Save the job
8. Error in the UI:

Error! Output name 'null' removed from job; not allowed

Exception: org.apache.manifoldcf.core.interfaces.ManifoldCFException: Output name 'null' removed from job; not allowed
    at org.apache.manifoldcf.crawler.jobs.PipelineManager.compareRows(PipelineManager.java:267)
    at org.apache.manifoldcf.crawler.jobs.Jobs.save(Jobs.java:988)
    at org.apache.manifoldcf.crawler.jobs.JobManager.save(JobManager.java:848)
    at org.apache.jsp.execute_jsp._jspService(execute_jsp.java:1809)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:388)
    at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
    at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:547)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:480)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:520)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:227)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:941)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:409)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:186)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:875)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:110)
    at org.eclipse.jetty.server.Server.handle(Server.java:349)
    at org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:441)
    at org.eclipse.jetty.server.HttpConnection$RequestHandler.content(HttpConnection.java:936)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:801)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:224)
    at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:51)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:586)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:44)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:598)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:533)
    at java.lang.Thread.run(Thread.java:744)

--
This message was sent by Atlassian JIRA (v6.2#6252)
Testing Pipelines. Conclusions so far and Some Doubts
Hi,

I have spent a couple of hours testing pipelines in ManifoldCF 1.7. Before describing the problems I ran into and asking some questions, I would like to explain the tests I have performed so far:

1. I tested with a File System connector, for simplicity.
2. I used two instances of the Solr Output Connector to test multiple outputs. Both point to the same Solr instance, but each is configured with a different Solr core (collection1 and collection2).
3. I used Allowed Documents and Tika Extractor as transformation connectors. Allowed Documents was configured to allow only PDF files (MIME type + extension).
4. The processing pipeline I wanted to configure is quite simple: filter and extract content (with Tika) for collection1, and a normal crawl for collection2. In other words, both transformation connectors were configured for the collection1 Solr output, and no transformation connector was configured for collection2.

I have two files in the configured repository path for the File System connector: a PDF file and an ODS file. I expected only the PDF file to be indexed in collection1 and both files in collection2.

The results of the experiment were the following:

1. All the files were indexed in both collections. Apparently the Allowed Documents transformation connector does not work with the File System repository connector.
2. For the collection1 output connector, I first changed the update handler from /update/extract to /update, because Tika Extractor was going to be configured for it. This change produces an error in Solr while indexing (Unsupported ContentType: application/octet-stream Not in: [application/xml, text/csv, text/json, application/csv, application/javabin, text/xml, application/json]).
3. Therefore, I set the update handler back to /update/extract. Because exactly the same content is being indexed in both cores, I have no way to know whether the Tika transformation connector is working properly.

Those are the testing outcomes so far. Now I would like to share some conclusions from the point of view of our use case. Although the pipeline approach is great, as far as I understand it we still cannot use it for our purposes. Specifically, what we would like is to create different repository documents at any point in the chain and send them to different output connectors.

A simple use case: we want to process documents to extract named entities: persons, places, and organizations. The first transformation in the pipeline can use any NER system to extract the named entities. Then I want separate repositories (outputs): one for the raw content and one for each entity type; let's say four different Solr cores. Of course, with the current approach I could send the same repository document to all the outputs and filter at each one, but that does not sound like a good solution to me.

Cheers,
Rafa
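For reference, the outcome expected from the test described above can be modelled in a few lines. This is only a sketch of the intended behaviour, not ManifoldCF code: `Doc`, `allowed_documents`, `expected_index`, and the two file names are hypothetical, and the MIME types are the standard ones for PDF and ODS files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    name: str
    mimetype: str


def allowed_documents(doc, mimetypes, extensions):
    """Hypothetical stand-in for the Allowed Documents filter:
    a document passes only if both its MIME type and extension are allowed."""
    ext = doc.name.rsplit(".", 1)[-1].lower() if "." in doc.name else ""
    return doc.mimetype in mimetypes and ext in extensions


def expected_index(docs):
    """Return the document names each Solr core was expected to receive."""
    pdf_only = [d.name for d in docs
                if allowed_documents(d, {"application/pdf"}, {"pdf"})]
    return {
        "collection1": pdf_only,                 # filtered, then Tika-extracted
        "collection2": [d.name for d in docs],   # plain crawl, no transformations
    }


# Hypothetical file names standing in for the PDF and ODS test files.
docs = [
    Doc("report.pdf", "application/pdf"),
    Doc("sheet.ods", "application/vnd.oasis.opendocument.spreadsheet"),
]
print(expected_index(docs))
```

The observed result reported in the message differs from this model in that collection1 also received the ODS file.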
[jira] [Created] (CONNECTORS-987) Chinese Localization(Documentation, Help screens)
Mingchun Zhao created CONNECTORS-987:

Summary: Chinese Localization(Documentation, Help screens)
Key: CONNECTORS-987
URL: https://issues.apache.org/jira/browse/CONNECTORS-987
Project: ManifoldCF
Issue Type: Improvement
Components: Documentation
Reporter: Mingchun Zhao
[jira] [Updated] (CONNECTORS-987) Chinese Localization(Documentation, Help screens)
[ https://issues.apache.org/jira/browse/CONNECTORS-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingchun Zhao updated CONNECTORS-987:

Description: In this issue, I will deal with the documentation and help screens for Chinese localization.
[jira] [Updated] (CONNECTORS-987) Chinese Localization(Documentation, Help screens)
[ https://issues.apache.org/jira/browse/CONNECTORS-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingchun Zhao updated CONNECTORS-987:

Attachment: CONNECTORS-987.patch

A first patch (5 files added).
RE: Testing Pipelines. Conclusions so far and Some Doubts
Hi Rafa,

I am out of town at the moment, but frankly I can see no reason why the architecture as implemented would not meet your use case. A transformation connection is not limited to passing along the input repository document object; it can modify it extensively and even replace it.

Karl

Sent from my Windows Phone

From: Rafa Haro
Sent: 6/30/2014 6:48 AM
To: dev@manifoldcf.apache.org
Subject: Testing Pipelines. Conclusions so far and Some Doubts
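Karl's point that a transformation connection can either modify the repository document or replace it outright can be illustrated with a small model. The sketch below is not the ManifoldCF API: `RepoDoc`, `add_language`, `strip_markup`, and `run_pipeline` are hypothetical names chosen only to show the two kinds of stage in a sequential pipeline.

```python
import re
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class RepoDoc:
    """Hypothetical, simplified stand-in for a repository document."""
    content: bytes
    mimetype: str
    metadata: dict = field(default_factory=dict)


def add_language(doc):
    # "Modify": same content and MIME type, one extra metadata field.
    return replace(doc, metadata={**doc.metadata, "language": "en"})


def strip_markup(doc):
    # "Replace": build a new document with different content and MIME type,
    # carrying the metadata over.
    text = doc.content.decode("utf-8", errors="replace")
    plain = re.sub(r"<[^>]+>", "", text)
    return RepoDoc(plain.encode("utf-8"), "text/plain", dict(doc.metadata))


def run_pipeline(doc, stages):
    """Apply each stage in order; each stage returns the document for the next."""
    for stage in stages:
        doc = stage(doc)
    return doc


result = run_pipeline(RepoDoc(b"<p>hello</p>", "text/html"),
                      [add_language, strip_markup])
print(result.mimetype, result.content, result.metadata)
```

In this model each stage still hands exactly one document to the next stage, which is the sequential shape discussed in the thread.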
RE: [jira] [Created] (CONNECTORS-986) Error While Editing a job involving a pipeline
Hi,

This problem was corrected last week and committed to trunk. Please synch up and try again.

Thanks,
Karl

Sent from my Windows Phone

From: Rafa Haro (JIRA)
Sent: 6/30/2014 5:53 AM
To: dev@manifoldcf.apache.org
Subject: [jira] [Created] (CONNECTORS-986) Error While Editing a job involving a pipeline
[jira] [Commented] (CONNECTORS-986) Error While Editing a job involving a pipeline
[ https://issues.apache.org/jira/browse/CONNECTORS-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047680#comment-14047680 ]

Karl Wright commented on CONNECTORS-986:

Hi,

This problem was corrected last week and committed to trunk. Please synch up and try again.

Thanks,
Karl
Re: Testing Pipelines. Conclusions so far and Some Doubts
Hi Karl,

I could go into more detail about the reasons, but the short summary is that we need more complex pipelines that support, for example, splitters and aggregators, not only sequential components. Of course, everything can be hacked, and we have decided to change our current approach by implementing some transformation connectors, but for upcoming versions of our product we will be using our own processor architecture.

Thanks.
Rafa

On 30/06/14 16:05, Karl Wright wrote:

Hi Rafa, I am out of town at the moment, but frankly I can see no reason why the architecture as implemented would not meet your use case. A transformation connection is not limited to passing along the input repository document object; it can modify it extensively and even replace it. Karl
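As a rough illustration of the "splitter" requirement from the named-entity use case in this thread, the sketch below fans one input document out into several derived documents, each routed to its own output. It is a model of the idea only: `split_by_entity_type`, the core names in `ENTITY_CORES`, and the document field names are hypothetical, and the entity list is assumed to come from whatever NER system is used.

```python
from collections import defaultdict

# Hypothetical mapping from entity type to the Solr core that should receive it.
ENTITY_CORES = {
    "PERSON": "persons",
    "PLACE": "places",
    "ORGANIZATION": "organizations",
}


def split_by_entity_type(doc_id, text, entities):
    """Yield (core, document) pairs: the raw document plus one derived
    document per entity type found.

    `entities` is a list of (type, value) pairs, assumed to be the output
    of an NER step. Unknown entity types raise KeyError.
    """
    yield "raw", {"id": doc_id, "content": text}

    grouped = defaultdict(list)
    for etype, value in entities:
        grouped[etype].append(value)

    for etype, values in grouped.items():
        yield ENTITY_CORES[etype], {
            "id": f"{doc_id}#{etype.lower()}",
            "source": doc_id,
            "names": values,
        }


routed = dict(split_by_entity_type(
    "doc1",
    "Ada Lovelace visited London.",
    [("PERSON", "Ada Lovelace"), ("PLACE", "London")],
))
for core, document in routed.items():
    print(core, document)
```

The point of the sketch is the shape of the data flow: one document in, several different documents out, each bound for a different output, rather than the same document sent everywhere and filtered at each output.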
[jira] [Commented] (CONNECTORS-981) Solr Connector - classic Solrj SolrInputDocument support
[ https://issues.apache.org/jira/browse/CONNECTORS-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047695#comment-14047695 ]

Alessandro Benedetti commented on CONNECTORS-981:

Hi Karl,

I was on holiday! Let me take a look.

Cheers

Solr Connector - classic Solrj SolrInputDocument support

Key: CONNECTORS-981
URL: https://issues.apache.org/jira/browse/CONNECTORS-981
Project: ManifoldCF
Issue Type: Improvement
Components: Lucene/SOLR connector
Affects Versions: ManifoldCF 1.7
Reporter: Alessandro Benedetti
Assignee: Karl Wright
Fix For: ManifoldCF 1.7
Attachments: CONNECTORS-981.patch

The Solr connector, in line with the development of the Tika connector processor, should be able to operate in two ways:
1) as usual
2) using the classic SolrJ SolrInputDocument approach with already-extracted metadata

To allow the choice, a flag will be added to the UI in the mapping tab (as it relates to how the fields will be processed).
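The difference between the two modes in the issue description can be sketched at the HTTP level. This is an analogy, not the connector's code: `build_update_request` is a hypothetical helper that only builds a request description and sends nothing, and it assumes Solr's standard behaviour that /update/extract takes the raw file (with the id passed as a `literal.id` parameter) while /update takes already-extracted fields, here as JSON.

```python
import json
from urllib.parse import quote


def build_update_request(base_url, doc_id, raw_bytes=None, fields=None):
    """Describe the Solr request for one document (nothing is sent).

    - fields given: already-extracted metadata/content, posted as JSON to
      /update (the analogue of building a SolrInputDocument).
    - raw_bytes given: the original binary, posted to /update/extract so
      that Solr performs the extraction itself.
    """
    if fields is not None:
        body = json.dumps([{"id": doc_id, **fields}]).encode("utf-8")
        return {
            "url": f"{base_url}/update",
            "content_type": "application/json",
            "body": body,
        }
    if raw_bytes is None:
        raise ValueError("either raw_bytes or fields is required")
    return {
        "url": f"{base_url}/update/extract?literal.id={quote(doc_id, safe='')}",
        "content_type": "application/octet-stream",
        "body": raw_bytes,
    }


# Hypothetical core URL and field names, for illustration only.
core = "http://localhost:8983/solr/collection1"
print(build_update_request(core, "doc 1", fields={"title": "T", "content": "text"})["url"])
print(build_update_request(core, "doc 1", raw_bytes=b"%PDF-1.4")["url"])
```

Posting raw bytes to /update instead of /update/extract is the combination that produces the "Unsupported ContentType: application/octet-stream" error quoted earlier in this archive.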
RE: [jira] [Updated] (CONNECTORS-987) Chinese Localization(Documentation, Help screens)
Hi Mingchun,

You should have committer rights; please just go ahead and commit!

Karl

Sent from my Windows Phone

From: Mingchun Zhao (JIRA)
Sent: 6/30/2014 10:01 AM
To: dev@manifoldcf.apache.org
Subject: [jira] [Updated] (CONNECTORS-987) Chinese Localization(Documentation, Help screens)
[jira] [Commented] (CONNECTORS-987) Chinese Localization(Documentation, Help screens)
[ https://issues.apache.org/jira/browse/CONNECTORS-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047732#comment-14047732 ]

Karl Wright commented on CONNECTORS-987:

Hi Mingchun,

You should have committer rights; please just go ahead and commit!

Karl
[jira] [Commented] (CONNECTORS-981) Solr Connector - classic Solrj SolrInputDocument support
[ https://issues.apache.org/jira/browse/CONNECTORS-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047756#comment-14047756 ]

Alessandro Benedetti commented on CONNECTORS-981:

A couple of observations:

1) Simply replacing the Solr connector jar in my deployment produces:
   javax.servlet.ServletException: java.lang.NoClassDefFoundError: Could not initialize class org.apache.http.impl.conn.ManagedHttpClientConnectionFactory
   Is this normal? Am I missing some other component, so that I cannot simply rebuild only the Solr connector?
2) I saw that you moved the setting for whether to use the extract update handler from the job configuration to the connector configuration. Of course it is a matter of choice, but can you explain the advantages of this approach?
3) After a brief code review it seems OK, by the way.

Cheers
[jira] [Commented] (CONNECTORS-981) Solr Connector - classic Solrj SolrInputDocument support
[ https://issues.apache.org/jira/browse/CONNECTORS-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047901#comment-14047901 ]

Karl Wright commented on CONNECTORS-981:

I have no idea why you are getting class-not-found errors; you will need to diagnose that yourself. The reason I put the configuration information in the configuration part of the UI is that it relates to how indexing is done, rather than what is indexed.

Karl

Sent from my Windows Phone

From: Alessandro Benedetti (JIRA)
Sent: 6/30/2014 11:35 AM
To: daddy...@gmail.com
Subject: [jira] [Commented] (CONNECTORS-981) Solr Connector - classic Solrj SolrInputDocument support
[jira] [Resolved] (CONNECTORS-986) Error While Editing a job involving a pipeline
[ https://issues.apache.org/jira/browse/CONNECTORS-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright resolved CONNECTORS-986.

Resolution: Fixed
Assignee: Karl Wright

Was resolved last week.