[jira] [Created] (CONNECTORS-236) Tests and test server needed for CMIS connector

2011-08-03 Thread Piergiorgio Lucidi (JIRA)
Tests and test server needed for CMIS connector
---

 Key: CONNECTORS-236
 URL: https://issues.apache.org/jira/browse/CONNECTORS-236
 Project: ManifoldCF
  Issue Type: Bug
  Components: CMIS connector
Reporter: Piergiorgio Lucidi


The CMIS connector needs tests, and a CMIS test server to run against.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CONNECTORS-235) item description element not indexed

2011-08-03 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078766#comment-13078766
 ] 

Karl Wright commented on CONNECTORS-235:


Thanks for the info.  The fix, as structured, should generally apply to 
PostgreSQL too.  Please let me know if it works for you.  But I'll need to 
research how this problem could have gotten past the tests regardless.


 item description element not indexed
 

 Key: CONNECTORS-235
 URL: https://issues.apache.org/jira/browse/CONNECTORS-235
 Project: ManifoldCF
  Issue Type: Improvement
  Components: RSS connector
Affects Versions: ManifoldCF 0.2
Reporter: Kate McGonigal
Assignee: Karl Wright
 Fix For: ManifoldCF 0.3


 The RSS feed's *item* description is not written to any field in the Solr 
 index. 
 I have a typical RSS feed with the general structure:
 <rss>
   <channel>
     <title></title>
     <link></link>
     <description></description>
     <item>
       <title></title>
       <link></link>
       <pubDate></pubDate>
       <description> *** the description I do want *** </description>
       <author></author>
       <category></category>
     </item>
   </channel>
 </rss>
 Example:
 For the RSS feed: 
 http://www.onemansjazz.ca/component/option,com_rss/feed,RSS2.0/no_html,1/
 the rss/channel/item/description field is not indexed into Solr.
 Example notes:
   - what does get written to the Solr description field is the description 
 metadata from the website, i.e. "Jazz radio show from Winnipeg on CKUW 95.9 
 FM, hosted by Maurice Hogue." in this case.
   - on the Dechromed Content tab of the job, "No dechromed content" is 
 selected. I'm not sure if that is relevant.
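
The distinction the report draws, between the channel-level description and the per-item description, can be illustrated with a small standalone sketch. This is not the RSS connector's parsing code; it only uses the JDK's DOM parser, and the class and method names (ItemDescriptionSketch, itemDescriptions) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of ManifoldCF.
class ItemDescriptionSketch {
  /** Returns the trimmed text of every rss/channel/item/description element, in document order. */
  static List<String> itemDescriptions(String rssXml) {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(rssXml.getBytes(StandardCharsets.UTF_8)));
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse feed", e);
    }
    List<String> result = new ArrayList<>();
    NodeList items = doc.getElementsByTagName("item");
    for (int i = 0; i < items.getLength(); i++) {
      // Only direct children of <item> are inspected, so the channel-level
      // <description> is never picked up by mistake.
      NodeList children = items.item(i).getChildNodes();
      for (int j = 0; j < children.getLength(); j++) {
        Node n = children.item(j);
        if (n instanceof Element && "description".equals(n.getNodeName())) {
          result.add(n.getTextContent().trim());
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    String xml = "<rss><channel><title>t</title><description>channel text</description>"
        + "<item><title>a</title><description>item text</description></item>"
        + "</channel></rss>";
    System.out.println(itemDescriptions(xml)); // prints [item text]
  }
}
```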





Re: Reseting Manifoldcf

2011-08-03 Thread Karl Wright
Can you clarify what you mean by user data?  There's no such data
stored by ManifoldCF in any kind of persistent way.

There are command-line commands which clear out various kinds of
things like jobs and connections.  There's also the ManifoldCF API
Service.  But I can't help further unless you are more specific.

Karl

On Wed, Aug 3, 2011 at 10:57 AM, Farzad Valad ho...@farzad.net wrote:
 What would be the sequence of commands to automate to reset ManifoldCF and
 flush out user data?  I've been doing it through the UI and it is very
 tedious.  Much rather have a batch file to do the job.  Thanks!



Re: CMIS Connector - Tests

2011-08-03 Thread Piergiorgio Lucidi
I'm trying to implement tests, but I ran into a problem setting the
parameters that the CMIS Repository Connector needs: a username, a
password and the endpoint (URL).

I need to know how to create the configuration nodes for the connector. In
the connector code, I handle the configuration parameters this way in the
processConfigurationPost method:

  public String processConfigurationPost(IThreadContext threadContext,
      IPostParameters variableContext, ConfigParams parameters)
      throws ManifoldCFException {
    // Copy each posted value into the connection parameters only when it is non-empty.
    String username = variableContext.getParameter(CONFIG_PARAM_USERNAME);
    if (StringUtils.isNotEmpty(username))
      parameters.setParameter(CONFIG_PARAM_USERNAME, username);
    String password = variableContext.getParameter(CONFIG_PARAM_PASSWORD);
    if (StringUtils.isNotEmpty(password))
      parameters.setParameter(CONFIG_PARAM_PASSWORD, password);
    String endpoint = variableContext.getParameter(CONFIG_PARAM_ENDPOINT);
    if (StringUtils.isNotEmpty(endpoint) && endpoint.length() > 0)
      parameters.setParameter(CONFIG_PARAM_ENDPOINT, endpoint);
    String repositoryId = variableContext
        .getParameter(CONFIG_PARAM_REPOSITORY_ID);
    if (StringUtils.isNotEmpty(repositoryId))
      parameters.setParameter(CONFIG_PARAM_REPOSITORY_ID, repositoryId);
    return null;
  }


Now I have to set up the same parameters inside my test class APISanityTest,
but it doesn't accept the following snippet; it works only if the CMIS
parameters are commented out, as follows:

  @Test
  public void sanityCheck()
    throws Exception
  {
    try
    {
      // Hey, we were able to install the file system connector etc.
      // Now, create a local test job and run it.
      IThreadContext tc = ThreadContextFactory.make();
      int i;
      IJobManager jobManager = JobManagerFactory.make(tc);
      // Create a basic file system connection, and save it.
      ConfigurationNode connectionObject;
      ConfigurationNode child;
      Configuration requestObject;
      Configuration result;

      connectionObject = new ConfigurationNode("repositoryconnection");

      child = new ConfigurationNode("name");
      child.setValue("CMIS Connection");
      connectionObject.addChild(connectionObject.getChildCount(),child);

      child = new ConfigurationNode("class_name");
      child.setValue(
        "org.apache.manifoldcf.crawler.connectors.cmis.CmisRepositoryConnector");
      connectionObject.addChild(connectionObject.getChildCount(),child);

      child = new ConfigurationNode("description");
      child.setValue("CMIS Connection");
      connectionObject.addChild(connectionObject.getChildCount(),child);
      child = new ConfigurationNode("max_connections");
      child.setValue("10");
      connectionObject.addChild(connectionObject.getChildCount(),child);

      //setting the CMIS specific parameters
      //  child = new ConfigurationNode("username");
      //  child.setValue(CMIS_USERNAME);
      //  connectionObject.addChild(connectionObject.getChildCount(),child);
      //
      //  child = new ConfigurationNode("password");
      //  child.setValue(CMIS_PASSWORD);
      //  connectionObject.addChild(connectionObject.getChildCount(),child);
      //
      //  child = new ConfigurationNode("endpoint");
      //  child.setValue(CMIS_ENDPOINT_TEST_SERVER);
      //  connectionObject.addChild(connectionObject.getChildCount(),child);
      requestObject = new Configuration();
      requestObject.addChild(0,connectionObject);

      result = performAPIPutOperationViaNodes(
        "repositoryconnections/CMIS%20Connection",201,requestObject);


How can I set the username, password and endpoint for the CMIS Repository
Connector parameters in this test class?

Thank you.

Piergiorgio
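
One possible shape for the answer, offered as an assumption to verify against the ManifoldCF API documentation rather than something confirmed in this thread: connector-specific settings are usually not direct children of the repositoryconnection node, but sit under a single "configuration" child as "_PARAMETER_" nodes that carry a "name" attribute and a value. The sketch below uses a hypothetical stand-in class (SketchNode) instead of the real ConfigurationNode so that it runs on its own; the parameter names, values and URL are placeholders.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for ManifoldCF's ConfigurationNode, so the sketch is self-contained.
class SketchNode {
  final String type;
  String value;
  final Map<String, String> attributes = new LinkedHashMap<>();
  final List<SketchNode> children = new ArrayList<>();

  SketchNode(String type) { this.type = type; }
  void setValue(String v) { this.value = v; }
  void setAttribute(String k, String v) { attributes.put(k, v); }
  int getChildCount() { return children.size(); }
  void addChild(int index, SketchNode child) { children.add(index, child); }
}

class CmisConfigSketch {
  // Builds one "_PARAMETER_" node with a "name" attribute and a value (assumed layout).
  static SketchNode parameter(String name, String value) {
    SketchNode p = new SketchNode("_PARAMETER_");
    p.setAttribute("name", name);
    p.setValue(value);
    return p;
  }

  // Groups all connector-specific settings under a single "configuration" child.
  static SketchNode configuration(Map<String, String> params) {
    SketchNode config = new SketchNode("configuration");
    for (Map.Entry<String, String> e : params.entrySet()) {
      config.addChild(config.getChildCount(), parameter(e.getKey(), e.getValue()));
    }
    return config;
  }

  public static void main(String[] args) {
    Map<String, String> params = new LinkedHashMap<>();
    params.put("username", "dummyuser");                        // placeholder value
    params.put("password", "dummysecret");                      // placeholder value
    params.put("endpoint", "http://localhost:9090/cmis/atom");  // hypothetical URL
    SketchNode connection = new SketchNode("repositoryconnection");
    connection.addChild(connection.getChildCount(), configuration(params));
    System.out.println(connection.children.get(0).getChildCount()); // prints 3
  }
}
```

If that layout is right, the commented-out lines in the test would become one "configuration" child added to connectionObject, with the parameter names matching the CONFIG_PARAM_* constants used in processConfigurationPost.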


2011/8/2 Karl Wright daddy...@gmail.com

 Thanks for the status report.  I hope to see your patch soon!

 Also, FWIW, once the documentation is also done I'd like to consider
 solidifying the 0.3 release.  It's got a lot of good stuff in it and I
 think as soon as we've finished off the new CMIS connector in all
 dimensions we should go ahead.  Thoughts, anyone?

 Karl


 On Tue, Aug 2, 2011 at 5:00 AM, Piergiorgio Lucidi
 piergiorgioluc...@gmail.com wrote:
  Yesterday I started to work on an end-to-end integration test for the CMIS
  Connector and now I have a fully running OpenCMIS test server integrated
  with the ManifoldCF Maven build process.
 
  Now I have to implement:
  - a setup method to create the test documents in the CMIS server
  - a null output connector using the ManifoldCF api
  - tests using the ManifoldCF api to create a mock configuration against
  the test CMIS server
 
  I'll let you know when it works.
 
  Regards,
  Piergiorgio
 
  2011/7/29 Piergiorgio Lucidi piergiorgioluc...@gmail.com
 
  Hi Karl,
 
  thank you for the details and as soon as I finish a first version of
  integration and/or unit test I will create a new ticket in the CMIS
  Component to release the patch.
 
  I hope to release this 

[jira] [Issue Comment Edited] (CONNECTORS-235) item description element not indexed

2011-08-03 Thread Kate McGonigal (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079085#comment-13079085
 ] 

Kate McGonigal edited comment on CONNECTORS-235 at 8/3/11 10:31 PM:


I'm afraid these problems still exist for me. 

A few hours ago I built the latest from trunk. It is running on PostgreSQL.

Just in case, I also started from a fresh install of Solr 3.3.0.  I'm using the 
example that comes with the distribution. It is thus running on Derby. I 
realize the schema is not optimal for RSS feeds, but it does include a 
"description" field, which is what I'm interested in at the moment.

Problem 1) When I try running the example job (see original post) with 
Dechromed Content set to "No dechromed content", what shows up in the 
description field (for all documents) is "Jazz radio show from Winnipeg on CKUW 
95.9 FM, hosted by Maurice Hogue.", which is not the item-description in the RSS 
feed's XML, but rather the website's metadata description element in the 
HTML.  I have tried another RSS feed, with the same result.

Problem 2) When I try running the example job with Dechromed Content set to 
"if present, in 'description' field", it still hangs, with the log file showing:
{quote}FATAL 2011-08-03 16:08:21,703 (Worker thread '10') - Error tossed: 
java.lang.String cannot be cast to 
org.apache.manifoldcf.core.interfaces.CharacterInput
java.lang.ClassCastException: java.lang.String cannot be cast to 
org.apache.manifoldcf.core.interfaces.CharacterInput
at 
org.apache.manifoldcf.crawler.jobs.Carrydown.getDataValuesAsFiles(Carrydown.java:611)
at 
org.apache.manifoldcf.crawler.jobs.JobManager.retrieveParentDataAsFiles(JobManager.java:4263)
at 
org.apache.manifoldcf.crawler.system.WorkerThread$VersionActivity.retrieveParentDataAsFiles(WorkerThread.java:1221)
at 
org.apache.manifoldcf.crawler.connectors.rss.RSSConnector.getDocumentVersions(RSSConnector.java:824)
at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:321){quote}

And just to be clear on what I am ultimately trying to do: I'd like to be able 
to show my searchers the description from the RSS feed for each of the 
documents that match their searches. I actually only need to index the 
item-description field (as opposed to what is at the item link) since my RSS 
feeds are of scientific papers that will have a detailed abstract in the 
item-description.
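
The ClassCastException in Problem 2 is the generic failure mode of casting a value that can arrive in more than one representation. The sketch below is not the fix that was checked in; it only illustrates a type check before the cast, using java.io.Reader as a hypothetical stand-in for ManifoldCF's CharacterInput, with made-up class and method names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical sketch: Reader stands in for a streamed value, String for an in-memory one.
class CarrydownValueSketch {
  // Accepts either representation and always hands back a Reader,
  // instead of casting blindly and failing at runtime.
  static Reader asReader(Object value) {
    if (value instanceof Reader) {
      return (Reader) value;
    }
    if (value instanceof String) {
      return new StringReader((String) value);
    }
    throw new IllegalArgumentException("Unsupported value type: "
        + (value == null ? "null" : value.getClass().getName()));
  }

  // Drains a Reader into a String.
  static String readAll(Reader reader) {
    StringBuilder sb = new StringBuilder();
    char[] buf = new char[1024];
    try {
      int n;
      while ((n = reader.read(buf)) != -1) {
        sb.append(buf, 0, n);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(readAll(asReader("in-memory value")));
    System.out.println(readAll(asReader(new StringReader("streamed value"))));
  }
}
```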


[jira] [Commented] (CONNECTORS-235) item description element not indexed

2011-08-03 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079097#comment-13079097
 ] 

Karl Wright commented on CONNECTORS-235:


Hmm, I'm using the very same feed you are, with PostgreSQL, and seeing perfect 
results.
Can you attach a screen shot of the View Job page of the job in question?  
Also, the View Connection page for both the output connection and the 
repository connection?







[jira] [Commented] (CONNECTORS-235) item description element not indexed

2011-08-03 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079118#comment-13079118
 ] 

Karl Wright commented on CONNECTORS-235:


One problem I found is that due to a rebuild I was not using PostgreSQL after 
all, so here's another check-in to fix its handling of streamed carrydown info. 
 r1153702.






[jira] [Commented] (CONNECTORS-235) item description element not indexed

2011-08-03 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079134#comment-13079134
 ] 

Karl Wright commented on CONNECTORS-235:


Ok, another mystery solved.  The RSS chromed data mode of "None" was not 
properly tried because of the inadvertent database switch, and I found that 
recrawling vs. crawling fresh generated incorrect version information.  I've 
fixed that problem but I can't check it in because it causes the following 
error against a plain-vanilla Solr installation:

ERROR: [http://www.onemansjazz.ca/content/view/330/50/] multiple values 
encountered for non multiValued field description: [Jazz radio show from 
Winnipeg on CKUW 95.9 FM, hosted by Maurice Hogue., I have created a Listener 
Survey and if you have the time to complete it, that would be terrific. I&#39;m 
trying to do an evaluation of One Man&#39;s Jazz as well as considering some 
new options that have arisen. Your feedback would be most appreciate.This 
survey is in two parts and is a total of twenty parts, most of them just 
require a click of your mouse. Click here 
(http://www.surveymonkey.com/s/C3DZ3JK) for Part One, and here 
(http://www.surveymonkey.com/s/C38FVH8) for Part Two. Thanks again for your 
input. ]

I'm not sure why Solr is interpreting this long field as multivalued, but 
clearly it would be much better if I used a metadata name that wasn't 
"description", since Solr's example configuration has dibs on that.  I'll 
experiment and post further.
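
The error text above lists two values for the single field, one resembling the page's own description and one resembling the feed item text. One plausible reading, which is an assumption and not confirmed in this thread, is that two sources both supplied a value under the same field name. The standalone sketch below only models that name collision and how a different metadata name avoids it; the class and method names are made up, and nothing here calls Solr.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of field-name collisions; not Solr or ManifoldCF code.
class FieldCollisionSketch {
  // Groups (field name, value) pairs by field name, keeping insertion order.
  static Map<String, List<String>> collect(String[][] pairs) {
    Map<String, List<String>> fields = new LinkedHashMap<>();
    for (String[] p : pairs) {
      fields.computeIfAbsent(p[0], k -> new ArrayList<>()).add(p[1]);
    }
    return fields;
  }

  // Returns the names of fields that ended up with more than one value.
  static List<String> multiValued(Map<String, List<String>> fields) {
    List<String> names = new ArrayList<>();
    for (Map.Entry<String, List<String>> e : fields.entrySet()) {
      if (e.getValue().size() > 1) {
        names.add(e.getKey());
      }
    }
    return names;
  }

  public static void main(String[] args) {
    String[][] clashing = {
        {"description", "text taken from the page"},
        {"description", "text taken from the feed item"}};
    String[][] renamed = {
        {"description", "text taken from the page"},
        {"summary", "text taken from the feed item"}};
    System.out.println(multiValued(collect(clashing))); // prints [description]
    System.out.println(multiValued(collect(renamed)));  // prints []
  }
}
```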







[jira] [Commented] (CONNECTORS-235) item description element not indexed

2011-08-03 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079137#comment-13079137
 ] 

Karl Wright commented on CONNECTORS-235:


I switched the name to "summary".  r1153705.







[jira] [Commented] (CONNECTORS-235) item description element not indexed

2011-08-03 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079141#comment-13079141
 ] 

Karl Wright commented on CONNECTORS-235:


Just to be clear, here's an example of the Solr log line for indexing one of 
the documents from the above mentioned feed.  You can, of course, configure the 
job to map the field names to whatever you like.  This is with no mapping 
whatsoever.

INFO: [] webapp=/solr path=/update/extract 
params={literal.source=http://www.onemansjazz.ca/component/option,com_rss/feed,RSS2.0/no_html,1/&literal.category=Radio+-+Play+lists&literal.summary=I+had+a+lot+of+fun+putting+this+show+together+this+week.+Hope+you+enjoy+it,+too.&literal.id=http://www.onemansjazz.ca/content/view/332/30/&literal.title=July+23,+2011+Playlist&literal.pubdate=1311339967000}
 status=0 QTime=13

I'm pretty certain you must have a metadata value set for description in your 
job, because there is absolutely no mechanism (and never was one) for picking 
up the channel description from the feed. So you will have to remove that in 
order to get all this to work for you.
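
The log line shows each metadata value travelling to Solr as a literal.<name> request parameter. How the Solr output connector actually assembles the request is not shown in this thread, so the following is only a sketch of the general pattern, with made-up class and method names: prefix each metadata name with "literal." and URL-encode the value. Note that a wire-level request is percent-encoded, whereas the log above displays the parameters in a more readable form.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the ManifoldCF Solr connector's code.
class LiteralParamsSketch {
  // Turns metadata name/value pairs into "literal.name=value" pairs joined by '&'.
  static String toQuery(Map<String, String> metadata) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, String> e : metadata.entrySet()) {
      if (sb.length() > 0) {
        sb.append('&');
      }
      sb.append("literal.").append(e.getKey()).append('=').append(encode(e.getValue()));
    }
    return sb.toString();
  }

  // URL-encodes a value as UTF-8 (spaces become '+').
  static String encode(String s) {
    try {
      return URLEncoder.encode(s, "UTF-8");
    } catch (UnsupportedEncodingException ex) {
      throw new IllegalStateException(ex); // UTF-8 is always available
    }
  }

  public static void main(String[] args) {
    Map<String, String> metadata = new LinkedHashMap<>();
    metadata.put("category", "Radio - Play lists");
    metadata.put("pubdate", "1311339967000");
    System.out.println(toQuery(metadata));
    // prints literal.category=Radio+-+Play+lists&literal.pubdate=1311339967000
  }
}
```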

