[jira] [Commented] (CONNECTORS-1549) Include and exclude rules order lost

2018-10-18 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656073#comment-16656073
 ] 

Karl Wright commented on CONNECTORS-1549:
-

I found the issue and have attached a patch.  Thanks!


> Include and exclude rules order lost
> 
>
> Key: CONNECTORS-1549
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1549
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: API, JCIFS connector
>Affects Versions: ManifoldCF 2.11
>Reporter: Julien Massiera
>Assignee: Karl Wright
>Priority: Critical
> Attachments: image-2018-10-18-18-28-14-547.png, 
> image-2018-10-18-18-33-01-577.png, image-2018-10-18-18-34-01-542.png
>
>
> The include and exclude rules that can be defined in the job configuration
> for the JCIFS connector can be combined, and the order in which they are
> defined matters.
> The problem is that when one retrieves the job configuration as a JSON object
> through the API, the include and exclude rules are split into two different
> arrays (one for each type of rule) instead of one. So, the order is
> completely lost when one tries to recreate the job through the API from that
> JSON object.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1549) Include and exclude rules order lost

2018-10-18 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655986#comment-16655986
 ] 

Karl Wright commented on CONNECTORS-1549:
-

Hi [~julienFL]

Sorry for the delay.

First note that you can always use the order-preserving form even if MCF 
outputs the JSON in the other "sugary" form.  So this should unblock you.

Second, I'm looking at the code that generates the output in Configuration.java:

{code}
// The new JSON parser uses hash order for object keys.  So it isn't good enough to just detect that there's an
// intermingling.  Instead we need to detect the existence of more than one key; that implies that we need to do order preservation.
String lastChildType = null;
boolean needAlternate = false;
int i = 0;
while (i < getChildCount())
{
  ConfigurationNode child = findChild(i++);
  String key = child.getType();
  List list = childMap.get(key);
  if (list == null)
  {
    // We found no existing list, so create one
    list = new ArrayList();
    childMap.put(key,list);
    childList.add(key);
  }
  // Key order comes into play when we have elements of different types within the same child.
  if (lastChildType != null && !lastChildType.equals(key))
  {
    needAlternate = true;
    break;
  }
  list.add(child);
  lastChildType = key;
}

if (needAlternate)
{
  // Can't use the array representation.  We'll need to use a _children_ object, and enumerate
  // each child.  So, the JSON will look like:
  // :{_attribute_:xxx,_children_:[{_type_:, ...},{_type_:, ...}, ...]}
...
{code}

The (needAlternate) clause is the one that writes the specification in the 
verbose form.  The logic seems like it would detect any time there's a subtree 
with a different key under a given level and set "needAlternate".  I'll stare 
at it some more but right now I'm having trouble seeing how this fails.
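
As a rough illustration of the check described above (this is not the Configuration.java code, and the class and method names are hypothetical), the decision can be reduced to asking whether the children at one level carry more than one distinct type:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FormChooser {
  // Hypothetical helper: returns true when the children of a node have more
  // than one distinct type, i.e. when grouping them into per-type arrays
  // (the "sugary" form) would lose their relative order.
  public static boolean needsOrderPreservingForm(List<String> childTypes) {
    Set<String> distinct = new HashSet<>(childTypes);
    return distinct.size() > 1;
  }

  public static void main(String[] args) {
    // Two excludes followed by an include: order matters.
    System.out.println(needsOrderPreservingForm(
        Arrays.asList("exclude", "exclude", "include")));
    // Only includes: per-type arrays lose nothing.
    System.out.println(needsOrderPreservingForm(
        Arrays.asList("include", "include")));
  }
}
```

Under this reading, any child list that mixes "include" and "exclude" entries, in whatever order, would call for the verbose form.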




[jira] [Commented] (CONNECTORS-1549) Include and exclude rules order lost

2018-10-18 Thread Julien Massiera (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655562#comment-16655562
 ] 

Julien Massiera commented on CONNECTORS-1549:
-

After some tests, I noticed that the problem only happens when all exclude
rules are defined BEFORE the include rules.
For example, here is my original job configuration:
!image-2018-10-18-18-34-01-542.png!

and here is an extract of the JSON generated by the API for this job:
{code:java}
"startpoint": {
  "include": {
    "_attribute_filespec": "*",
    "_value_": "",
    "_attribute_type": "file"
  },
  "_attribute_path": "ocr",
  "_value_": "",
  "exclude": [
    {
      "_attribute_filespec": "*.pst",
      "_value_": "",
      "_attribute_type": "file"
    },
    {
      "_attribute_filespec": "*",
      "_value_": "",
      "_attribute_type": "directory"
    }
  ]
}
{code}

When re-creating the job from this same JSON, here is the new job
configuration:
!image-2018-10-18-18-33-01-577.png!

When executing my original job, the pst files are correctly filtered, but when 
executing the job created from the generated JSON, they are not excluded from 
the process.

However, if I create a job that combines include and exclude rules, the JSON
generated by the API uses a different format for the filters:
{code:java}
"startpoint": {
  "_children_": [
    {
      "_type_": "include",
      "_attribute_filespec": "/subfolder1/",
      "_value_": "",
      "_attribute_type": "directory"
    },
    {
      "_type_": "exclude",
      "_attribute_filespec": "*",
      "_value_": "",
      "_attribute_type": "directory"
    },
    {
      "_type_": "include",
      "_attribute_filespec": "*",
      "_value_": "",
      "_attribute_type": "file"
    }
  ],
  "_attribute_path": "ocr",
  "_value_": ""
}
{code}
In that case, the order is respected and the job behavior is what I expect.
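
For anyone who needs to build this order-preserving form themselves, here is a minimal sketch that serializes an ordered rule list into the `_children_` layout shown above. The `StartpointJson` and `Rule` names are hypothetical, the string building is hand-rolled only to keep the example self-contained, and the output is just the value of the "startpoint" key, not a full job definition:

```java
import java.util.ArrayList;
import java.util.List;

public class StartpointJson {
  // Hypothetical holder for one rule: kind is "include" or "exclude",
  // type is "file" or "directory", filespec is the match pattern.
  public static final class Rule {
    final String kind;
    final String type;
    final String filespec;

    public Rule(String kind, String type, String filespec) {
      this.kind = kind;
      this.type = type;
      this.filespec = filespec;
    }
  }

  // Minimal JSON string quoting: escapes backslashes and double quotes only.
  static String quote(String s) {
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  // Emits the rules as a single _children_ array, in list order.
  public static String toJson(String path, List<Rule> rules) {
    StringBuilder sb = new StringBuilder();
    sb.append("{\"_children_\":[");
    for (int i = 0; i < rules.size(); i++) {
      Rule r = rules.get(i);
      if (i > 0) {
        sb.append(',');
      }
      sb.append("{\"_type_\":").append(quote(r.kind))
        .append(",\"_attribute_filespec\":").append(quote(r.filespec))
        .append(",\"_value_\":\"\"")
        .append(",\"_attribute_type\":").append(quote(r.type))
        .append('}');
    }
    sb.append("],\"_attribute_path\":").append(quote(path))
      .append(",\"_value_\":\"\"}");
    return sb.toString();
  }

  public static void main(String[] args) {
    List<Rule> rules = new ArrayList<>();
    rules.add(new Rule("exclude", "file", "*.pst"));
    rules.add(new Rule("exclude", "directory", "*"));
    rules.add(new Rule("include", "file", "*"));
    System.out.println(toJson("ocr", rules));
  }
}
```

Because every rule carries its own `_type_` and all rules sit in one array, the exclude-before-include ordering from my original job would survive a round trip in this form.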

 



[jira] [Commented] (CONNECTORS-1549) Include and exclude rules order lost

2018-10-18 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655223#comment-16655223
 ] 

Karl Wright commented on CONNECTORS-1549:
-

Hi [~julienFL], there was a similar ticket a while back for the file system 
connector.  Let me explain what the solution was and see if you still think 
there is a problem.

(1) The actual internal representation of a Document Specification is XML.
(2) For the API, we convert the XML to JSON and back.
(3) Because a complete and unambiguous conversion between these formats is 
quite ugly, we have multiple ways of doing the conversion, so that we allow 
"syntactic sugar" in the JSON for specific cases where the conversion can be 
done simply.
(4) A while back, there was a bug in the code that determined whether it was 
possible to use syntactic sugar of the specific kind that would lead to two 
independent lists for the File System Connector's document specification, so 
for a while what was *output* when you exported the Job was incorrect, and 
order would be lost if you re-imported it.

The solution was to (a) fix the bug, and (b) get the person using the API to
use the correct, unambiguous JSON format instead of the "sugary" format.  This
preserves order.

The way to see if this is what you are up against is to create a JCIFS job with
a complex rule set that has both inclusions and exclusions, and export it.  If
the exported JSON looks different from what you are expecting, then try
replicating that format when you import via the API.

