I have no reason to believe that this patch won't work for MCF 2.2 as well. Please let me know of any problems and I will issue a revised patch against that branch.
Thanks,
Karl

On Tue, Jun 14, 2016 at 12:18 AM, Tomoko Uchida <[email protected]> wrote:
> I've run and deleted the job in question on ManifoldCF 2.4 and current
> trunk (2.5-dev with CONNECTORS-1323.patch).
> Our problem can be reproduced with 2.4 and seems to be resolved with
> the trunk version.
>
> Operation:
>
> 1. Create a job with the eight outputs below:
>    - ds_solr_forum_en-eu
>    - ds_solr_forum_en-in
>    - ds_solr_forum_en-sg
>    - ds_solr_forum_en-us
>    - ds_solr_forum_ko-kr_en
>    - ds_solr_forum_zh-cn_en
>    - ds_solr_forum_zh-tw_en
>    - ds_solr_forum_pt-br_en
>
> 2. Run the job for a while.
>
> 3. Abort the job.
>
> 4. Delete the job.
>
> With ManifoldCF 2.4, the SQLException and stack trace below were logged
> and the job remained in "clean up" status.
>
> ERROR 2016-06-14 09:33:19,714 (Document delete thread '0') - Document
> delete thread aborting and restarting due to database connection
> reset: Database exception: SQLException doing query (22001): ERROR:
> value too long for type character varying(64)
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database
> exception: SQLException doing query (22001): ERROR: value too long for
> type character varying(64)
>   at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.finishUp(Database.java:715)
>   at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:741)
>   at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:803)
>   at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)
>   at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
>   at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>   at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performModification(DBInterfacePostgreSQL.java:661)
>   at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performInsert(DBInterfacePostgreSQL.java:187)
>   at org.apache.manifoldcf.core.database.BaseTable.performInsert(BaseTable.java:68)
>   at org.apache.manifoldcf.crawler.repository.RepositoryHistoryManager.addRow(RepositoryHistoryManager.java:203)
>   at org.apache.manifoldcf.crawler.repository.RepositoryConnectionManager.recordHistory(RepositoryConnectionManager.java:706)
>   at org.apache.manifoldcf.crawler.system.DocumentDeleteThread$OutputRemoveActivity.recordActivity(DocumentDeleteThread.java:295)
>   at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$OutputRecordingActivity.recordActivity(IncrementalIngester.java:2383)
>   at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$OutputRecordingActivity.recordActivity(IncrementalIngester.java:2383)
>   at org.apache.manifoldcf.agents.output.solr.HttpPoster.deletePost(HttpPoster.java:720)
>   at org.apache.manifoldcf.agents.output.solr.SolrConnector.removeDocument(SolrConnector.java:605)
>   at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.removeDocument(IncrementalIngester.java:2306)
>   at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentDeleteMultiple(IncrementalIngester.java:1042)
>
> With 2.5-dev, there were no errors and the job was completely removed.
>
> Thank you for the fix.
>
> So I want to apply the same fix (CONNECTORS-1323) to ManifoldCF 2.2,
> because our production system cannot be upgraded to the latest version
> immediately, though we should plan to do so.
> I'll try it.
>
> Best regards,
> Tomoko
>
> 2016-06-13 18:34 GMT+09:00 Tomoko Uchida <[email protected]>:
> > Some additional information is here.
> >
> > I use ManifoldCF 2.2.
> >
> >> (1) Which underlying database are you using?
> >
> > I use PostgreSQL 9.4.5.
> >
> >> (2) Have you modified the MCF schema in any way?
> >
> > No, I did not modify any MCF db schema.
> >
> >> (3) What are the actual names of the output connections in question?
> >
> > For example, one job has the 8 outputs below.
> > There are other jobs that
> > cannot be deleted for the same reason.
> > - ds_solr_forum_en-eu
> > - ds_solr_forum_en-in
> > - ds_solr_forum_en-sg
> > - ds_solr_forum_en-us
> > - ds_solr_forum_ko-kr_en
> > - ds_solr_forum_zh-cn_en
> > - ds_solr_forum_zh-tw_en
> > - ds_solr_forum_pt-br_en
> >
> > For business requirements, I crawl a web site and post to multiple
> > (eight) Solr cores.
> >
> > The whole job definition is below (I deleted the seeds/includes/excludes
> > URLs from the original JSON data):
> >
> > {
> >   "job": {
> >     "description": "ds_forum_en",
> >     "document_specification": {
> >       "excludes": "…",
> >       "excludescontentindex": "",
> >       "excludesindex": "",
> >       "includes": "…",
> >       "includesindex": ".*",
> >       "limittoseeds": {
> >         "_attribute_value": "true",
> >         "_value_": ""
> >       },
> >       "seeds": "…"
> >     },
> >     "expiration_interval": "infinite",
> >     "hopcount_mode": "accurate",
> >     "id": "1464673266530",
> >     "pipelinestage": [
> >       {
> >         "stage_connectionname": "ds_solr_forum_en-eu",
> >         "stage_id": "0",
> >         "stage_isoutput": "true",
> >         "stage_specification": {}
> >       },
> >       {
> >         "stage_connectionname": "ds_solr_forum_en-in",
> >         "stage_id": "1",
> >         "stage_isoutput": "true",
> >         "stage_specification": {}
> >       },
> >       {
> >         "stage_connectionname": "ds_solr_forum_en-sg",
> >         "stage_id": "2",
> >         "stage_isoutput": "true",
> >         "stage_specification": {}
> >       },
> >       {
> >         "stage_connectionname": "ds_solr_forum_en-us",
> >         "stage_id": "3",
> >         "stage_isoutput": "true",
> >         "stage_specification": {}
> >       },
> >       {
> >         "stage_connectionname": "ds_solr_forum_ko-kr_en",
> >         "stage_id": "4",
> >         "stage_isoutput": "true",
> >         "stage_specification": {}
> >       },
> >       {
> >         "stage_connectionname": "ds_solr_forum_zh-cn_en",
> >         "stage_id": "5",
> >         "stage_isoutput": "true",
> >         "stage_specification": {}
> >       },
> >       {
> >         "stage_connectionname": "ds_solr_forum_zh-tw_en",
> >         "stage_id": "6",
> >         "stage_isoutput": "true",
> >         "stage_specification": {}
> >       },
> >       {
> >         "stage_connectionname": "ds_solr_forum_pt-br_en",
> >         "stage_id": "7",
> >         "stage_isoutput": "true",
> >         "stage_specification": {}
> >       }
> >     ],
> >     "priority": "5",
> >     "recrawl_interval": "86400000",
> >     "repository_connection": "ds_forum_en",
> >     "reseed_interval": "3600000",
> >     "run_mode": "continuous",
> >     "start_mode": "manual"
> >   }
> > }
> >
> > Thank you,
> > Tomoko
> >
> > 2016-06-13 18:09 GMT+09:00 Tomoko Uchida <[email protected]>:
> >> Hi Karl,
> >>
> >> Thank you for the rapid response! I'll try the patch soon.
> >>
> >> Regards,
> >> Tomoko
> >>
> >> 2016-06-13 16:20 GMT+09:00 Karl Wright <[email protected]>:
> >>> Ok, some further exploration yields the following:
> >>> (1) A check was put into the code a while ago to prevent overly long
> >>> activity names from blowing things up. That is why we no longer see this
> >>> problem.
> >>> (2) There was a problem with activity logging for deletions across
> >>> multiple output connections. See CONNECTORS-1323. I've provided a patch.
> >>>
> >>> Karl
> >>>
> >>> On Mon, Jun 13, 2016 at 1:55 AM, Karl Wright <[email protected]> wrote:
> >>>
> >>>> Hi Tomoko,
> >>>>
> >>>> Sorry, I missed this post when it was originally made.
> >>>>
> >>>> The activitytype column is provided by the framework for only a small
> >>>> number of specific events. In no case does the activitytype contain
> >>>> anything other than a fixed-length string; it's meant to be queried on.
> >>>> That string may include the name of a single output connection or of a
> >>>> transformation connection, but only one. The maximum length of an output
> >>>> or transformation connection name is 32, so the total length available for
> >>>> the rest of the activitytype column is 30.
> >>>>
> >>>> The string "document deletion" is 17 characters, so that's nowhere near
> >>>> the limit here. So this makes no sense.
> >>>>
> >>>> Can you be more specific about the following:
> >>>>
> >>>> (1) Which underlying database are you using?
> >>>> (2) Have you modified the MCF schema in any way?
> >>>> (3) What are the actual names of the output connections in question?
> >>>>
> >>>> Thanks,
> >>>> Karl
> >>>>
> >>>> On Sun, Jun 12, 2016 at 10:42 PM, Tomoko Uchida <[email protected]> wrote:
> >>>>
> >>>>> Hi, any suggestions?
> >>>>>
> >>>>> Is this a known limitation, or
> >>>>> should I create a ticket about that?
> >>>>>
> >>>>> Thanks,
> >>>>> Tomoko
> >>>>>
> >>>>> 2016-06-09 10:44 GMT+09:00 Tomoko Uchida <[email protected]>:
> >>>>> > Hello developers,
> >>>>> >
> >>>>> > I have sent the same message to the user mailing list but there has
> >>>>> > been no reply. Could anyone help me?
> >>>>> > Some jobs in our customer's production environment can no longer be
> >>>>> > deleted because of this problem.
> >>>>> >
> >>>>> > We are looking for solutions to delete the jobs safely.
> >>>>> > If my question was not clear, I am ready to provide a more detailed
> >>>>> > explanation.
> >>>>> >
> >>>>> > ----
> >>>>> >
> >>>>> > Hello,
> >>>>> > I encountered an SQLException when I deleted a job with many output
> >>>>> > connections.
> >>>>> >
> >>>>> > ERROR 2016-06-02 09:41:49,492 (Document delete thread '9') - Document
> >>>>> > delete thread aborting and restarting due to database connection
> >>>>> > reset: Database exception: SQLException doing query (22001): ERROR:
> >>>>> > value too long for type character varying(64)
> >>>>> >
> >>>>> > I've found that the error occurred because ManifoldCF tries to
> >>>>> > insert a long string (more than 64 characters) into the 'activitytype'
> >>>>> > column of the 'repohistory' table while deleting documents associated
> >>>>> > with the job.
> >>>>> >
> >>>>> > As a trial, I altered the 'activitytype' column type to 'text' with
> >>>>> > this statement:
> >>>>> >
> >>>>> > ALTER TABLE repohistory ALTER COLUMN activitytype TYPE text;
> >>>>> >
> >>>>> > After altering the table I restarted ManifoldCF, and the deletion
> >>>>> > histories were successfully added and the job seemed to be safely
> >>>>> > deleted.
> >>>>> >
> >>>>> > The inserted 'activitytype' values look like this:
> >>>>> > document deletion (outputA) (outputB) (outputC) (outputD) (outputE) ...
> >>>>> >
> >>>>> > For application requirements, I cannot limit the number of output
> >>>>> > connectors (to shorten history records).
> >>>>> >
> >>>>> > Is that OK? Or are there good solutions for that?
> >>>>> >
> >>>>> > Thank you in advance,
> >>>>> > Tomoko
> >>>>>
> >>>>
> >>>>
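[Editor's note] To make the overflow concrete, here is a quick sketch (plain Python, not ManifoldCF source code) that rebuilds a composite activity-type string in the format quoted in this thread, using the eight output connection names from the job above. The exact separator and parenthesization are assumptions based on the quoted example "document deletion (outputA) (outputB) ...".

```python
# Hypothetical reconstruction of the composite activity-type value
# reported in this thread: "document deletion" followed by each output
# connection name in parentheses. The eight names come from the job
# definition quoted above.
outputs = [
    "ds_solr_forum_en-eu",
    "ds_solr_forum_en-in",
    "ds_solr_forum_en-sg",
    "ds_solr_forum_en-us",
    "ds_solr_forum_ko-kr_en",
    "ds_solr_forum_zh-cn_en",
    "ds_solr_forum_zh-tw_en",
    "ds_solr_forum_pt-br_en",
]

activity_type = "document deletion " + " ".join(f"({name})" for name in outputs)

# The repohistory.activitytype column is character varying(64); this
# composite value is far longer, which triggers PostgreSQL error 22001.
print(len(activity_type))  # → 205
```

This shows why the single-output length budget Karl describes (a 64-character column, minus a 32-character connection name) breaks down as soon as several output connection names are appended to one activity-type value.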
