Here is some additional information.
I use ManifoldCF 2.2.
> (1) Which underlying database are you using?
I use PostgreSQL 9.4.5.
> (2) Have you modified the MCF schema in any way?
No, I did not modify the MCF DB schema.
> (3) What are the actual names of the output connections in question?
For example, one job has the 8 outputs below. There are other jobs that
cannot be deleted for the same reason.
- ds_solr_forum_en-eu
- ds_solr_forum_en-in
- ds_solr_forum_en-sg
- ds_solr_forum_en-us
- ds_solr_forum_ko-kr_en
- ds_solr_forum_zh-cn_en
- ds_solr_forum_zh-tw_en
- ds_solr_forum_pt-br_en
Because of business requirements, I crawl a web site and post to multiple
(eight) Solr cores.
The whole job definition is below (I removed the seeds/includes/excludes
URLs from the original JSON data):
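To show why these eight names overflow the column, here is a rough Python sketch. It assumes the logged value is built the way the values in my first mail looked ("document deletion" followed by each output connection name in parentheses) and that the limit is the character varying(64) from the error message. The function name activity_type is made up for this illustration; it is not ManifoldCF code.

```python
# Hypothetical reconstruction of the logged value; NOT ManifoldCF's actual code.
OUTPUTS = [
    "ds_solr_forum_en-eu",
    "ds_solr_forum_en-in",
    "ds_solr_forum_en-sg",
    "ds_solr_forum_en-us",
    "ds_solr_forum_ko-kr_en",
    "ds_solr_forum_zh-cn_en",
    "ds_solr_forum_zh-tw_en",
    "ds_solr_forum_pt-br_en",
]

# Taken from the error message: character varying(64).
COLUMN_LIMIT = 64


def activity_type(outputs):
    """Build 'document deletion (a) (b) ...' in the form seen in repohistory."""
    return "document deletion" + "".join(f" ({name})" for name in outputs)


single = activity_type(OUTPUTS[:1])   # one output connection only
combined = activity_type(OUTPUTS)     # all eight output connections

print(len(single), len(single) <= COLUMN_LIMIT)
print(len(combined), len(combined) <= COLUMN_LIMIT)
```

Under these assumptions, the value for a single output fits in the column, while the value for all eight outputs is far longer than 64 characters.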
{
  "job": {
    "description": "ds_forum_en",
    "document_specification": {
      "excludes": "…",
      "excludescontentindex": "",
      "excludesindex": "",
      "includes": "…",
      "includesindex": ".*",
      "limittoseeds": {
        "_attribute_value": "true",
        "_value_": ""
      },
      "seeds": "…"
    },
    "expiration_interval": "infinite",
    "hopcount_mode": "accurate",
    "id": "1464673266530",
    "pipelinestage": [
      {
        "stage_connectionname": "ds_solr_forum_en-eu",
        "stage_id": "0",
        "stage_isoutput": "true",
        "stage_specification": {}
      },
      {
        "stage_connectionname": "ds_solr_forum_en-in",
        "stage_id": "1",
        "stage_isoutput": "true",
        "stage_specification": {}
      },
      {
        "stage_connectionname": "ds_solr_forum_en-sg",
        "stage_id": "2",
        "stage_isoutput": "true",
        "stage_specification": {}
      },
      {
        "stage_connectionname": "ds_solr_forum_en-us",
        "stage_id": "3",
        "stage_isoutput": "true",
        "stage_specification": {}
      },
      {
        "stage_connectionname": "ds_solr_forum_ko-kr_en",
        "stage_id": "4",
        "stage_isoutput": "true",
        "stage_specification": {}
      },
      {
        "stage_connectionname": "ds_solr_forum_zh-cn_en",
        "stage_id": "5",
        "stage_isoutput": "true",
        "stage_specification": {}
      },
      {
        "stage_connectionname": "ds_solr_forum_zh-tw_en",
        "stage_id": "6",
        "stage_isoutput": "true",
        "stage_specification": {}
      },
      {
        "stage_connectionname": "ds_solr_forum_pt-br_en",
        "stage_id": "7",
        "stage_isoutput": "true",
        "stage_specification": {}
      }
    ],
    "priority": "5",
    "recrawl_interval": "86400000",
    "repository_connection": "ds_forum_en",
    "reseed_interval": "3600000",
    "run_mode": "continuous",
    "start_mode": "manual"
  }
}
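In case it helps to check the other affected jobs, here is a small Python sketch that lists the output connection names from a job definition shaped like the one above. The sample JSON is a trimmed copy with only two of the eight stages, and the function name output_connection_names is hypothetical, chosen just for this example.

```python
import json

# Trimmed copy of the job definition above; only two of the eight stages are kept.
JOB_JSON = """
{
  "job": {
    "description": "ds_forum_en",
    "pipelinestage": [
      {
        "stage_connectionname": "ds_solr_forum_en-eu",
        "stage_id": "0",
        "stage_isoutput": "true",
        "stage_specification": {}
      },
      {
        "stage_connectionname": "ds_solr_forum_en-in",
        "stage_id": "1",
        "stage_isoutput": "true",
        "stage_specification": {}
      }
    ]
  }
}
"""


def output_connection_names(job_json):
    """Return the connection names of all pipeline stages marked as outputs."""
    stages = json.loads(job_json)["job"]["pipelinestage"]
    return [
        stage["stage_connectionname"]
        for stage in stages
        if stage.get("stage_isoutput") == "true"
    ]


for name in output_connection_names(JOB_JSON):
    print(name)
```

This only reads the exported JSON text; it does not call ManifoldCF or touch the database.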
Thank you,
Tomoko
2016-06-13 18:09 GMT+09:00 Tomoko Uchida <[email protected]>:
> Hi Karl,
>
> Thank you for the rapid response! I'll try the patch soon.
>
> Regards,
> Tomoko
>
> 2016-06-13 16:20 GMT+09:00 Karl Wright <[email protected]>:
>> Ok, some further exploration yields the following:
>> (1) A check was put into the code a while ago to prevent overly long
>> activity names from blowing things up. That is why we no longer see this
>> problem.
>> (2) There was a problem with activity logging for deletions across multiple
>> output connections. See CONNECTORS-1323. I've provided a patch.
>>
>> Karl
>>
>>
>> On Mon, Jun 13, 2016 at 1:55 AM, Karl Wright <[email protected]> wrote:
>>
>>> Hi Tomoko,
>>>
>>> Sorry, I missed this post when it was originally made.
>>>
>>> The activitytype column is provided by the framework for only a small
>>> number of specific events. In no case does the activitytype contain
>>> anything other than a fixed-length string; it's meant to be queried on.
>>> That string may include the name of a single output connection or of a
>>> transformation connection, but only one. The maximum length of an output
>>> or transformation connection name is 32, so the total length available for
>>> the rest of the activitytype column is 30.
>>>
>>> The string "document deletion" is 17 characters, so that's nowhere near
>>> the limit here. So this makes no sense.
>>>
>>> Can you be more specific about the following:
>>>
>>> (1) Which underlying database are you using?
>>> (2) Have you modified the MCF schema in any way?
>>> (3) What are the actual names of the output connections in question?
>>>
>>> Thanks,
>>> Karl
>>>
>>>
>>>
>>>
>>> On Sun, Jun 12, 2016 at 10:42 PM, Tomoko Uchida <
>>> [email protected]> wrote:
>>>
>>>> Hi, any suggestions?
>>>>
>>>> Is this a known limitation, or
>>>> should I create a ticket about that?
>>>>
>>>> Thanks,
>>>> Tomoko
>>>>
>>>> 2016-06-09 10:44 GMT+09:00 Tomoko Uchida <[email protected]>:
>>>> > Hello developers,
>>>> >
>>>> > I sent the same message to the user mailing list but there has been
>>>> > no reply. Could anyone help me?
>>>> > Some jobs in our customer's production environment can no longer be
>>>> > deleted because of this problem.
>>>> >
>>>> > We are looking for solutions to delete the jobs safely.
>>>> > If my question was not clear, I am ready to provide a more detailed
>>>> > explanation.
>>>> >
>>>> > ----
>>>> >
>>>> > Hello,
>>>> > I encountered an SQLException when I deleted a job with many output
>>>> > connections.
>>>> >
>>>> > ERROR 2016-06-02 09:41:49,492 (Document delete thread '9') - Document
>>>> > delete thread aborting and restarting due to database connection
>>>> > reset: Database exception: SQLException doing query (22001): ERROR:
>>>> > value too long for type character varying(64)
>>>> >
>>>> >
>>>> > I've found that the error occurred because ManifoldCF tries to
>>>> > insert a long string (more than 64 characters) into the 'activitytype'
>>>> > column of the 'repohistory' table while deleting documents associated
>>>> > with the job.
>>>> >
>>>> > As a trial, I altered the type of the 'activitytype' column to 'text'
>>>> > with this statement:
>>>> >
>>>> > ALTER TABLE repohistory ALTER COLUMN activitytype TYPE text;
>>>> >
>>>> > After altering the table I restarted ManifoldCF; the deletion
>>>> > history records were then added successfully and the job seemed to
>>>> > be safely deleted.
>>>> >
>>>> > Inserted 'activitytype' values are like this:
>>>> > document deletion (outputA) (outputB) (outputC) (outputD) (outputE) ...
>>>> >
>>>> > Because of application requirements, I cannot reduce the number of
>>>> > output connections (to shorten the history records).
>>>> >
>>>> > Is that OK? Or is there a better solution?
>>>> >
>>>> > Thank you in advance,
>>>> > Tomoko
>>>>
>>>
>>>