UIMAv3 & WebAnno

2018-01-03 Thread Richard Eckart de Castilho
Hi again,

I have once again switched my local environment to a UIMA v3 mode:

- UIMA SDK v3 (3.0.1-beta-SNAPSHOT v3 branch)
- uimaFIT (3.0.0-SNAPSHOT v3 branch)
- DKPro Core (2.0.x branch)
- WebAnno (feature/issue1115-uimav3 branch)

Last time, I ran into trouble because the IDs loaded from serialized CAS files 
were no longer accessible.
I programmatically set "uima.default_v2_id_references" to "true" during startup 
now to avoid that.
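
A minimal sketch of setting that property programmatically. Only the property name comes from the message above; the helper class `V2IdStartup` and the idea of calling it from a startup hook are assumptions for illustration, and the property presumably has to be set before any UIMA classes read it.

```java
// Hypothetical startup helper; only the property name is taken from the message.
final class V2IdStartup {
    static final String PROP = "uima.default_v2_id_references";

    // Call early during application startup, before any CAS is created.
    static void enableV2IdReferences() {
        System.setProperty(PROP, "true");
    }
}
```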


But what seems to be happening even before getting there is that I again run 
into JCas <-> Type System problems.
When a user opens a document for annotation in WebAnno, WebAnno loads the 
serialized CAS (CasCompleteSerializer),
serializes the CAS into a byte array (compressed form 6), creates a new CAS 
with the current type system definition,
and deserializes the data again into that CAS. The idea is that the lenient 
loading of the compressed form 6 allows

  a) new types / features to be added in that way
  b) unreachable FSes to be garbage collected
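
The intent of a) and b) can be illustrated with a toy model. This is not UIMA code and does not use the form 6 serializer: feature structures are modelled as plain maps, and the class name `LenientUpgrade` and its method are hypothetical. Only the structures handed to `upgrade` are copied, so anything not passed in is dropped, and features unknown to the old data are left unset.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model, not UIMA code: a feature structure is a map from feature name to value.
final class LenientUpgrade {
    // Copies the given (reachable) feature structures into new ones that follow
    // the current feature list. Features missing in the old data stay null.
    static List<Map<String, Object>> upgrade(List<Map<String, Object>> reachable,
                                             List<String> currentFeatures) {
        List<Map<String, Object>> upgraded = new ArrayList<>();
        for (Map<String, Object> oldFs : reachable) {
            Map<String, Object> newFs = new LinkedHashMap<>();
            for (String feature : currentFeatures) {
                newFs.put(feature, oldFs.get(feature)); // null for newly added features
            }
            upgraded.add(newFs);
        }
        return upgraded;
    }
}
```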

So it is not uncommon here that the data stored with the 
CasCompleteSerializer used a different type system than the CAS into which it 
is loaded. In fact, the data stored with the CasCompleteSerializer may have 
used different JCas wrappers at the time than what is available at the time of 
loading the data again. AFAIK there should be no truly incompatible changes in 
the type system though, i.e. only new features / types were added; no features 
were removed. Still, I get a lot of this type of error:

> org.apache.uima.cas.CASRuntimeException: The JCas cannot be initialized.  The following errors occurred:
> In JCAS class "de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.morph.MorphologicalFeatures", UIMA field "de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.morph.MorphologicalFeatures:verbForm" was set up when this class was previously loaded and initialized, to have an adjusted offset of "-1" but now the feature has a different adjusted offset of "5"; this may be due to something else other than type system commit actions loading and initializing the JCas class, or to having a different non-compatible type system for this class, trying to use a common JCas cover class, which is not supported.
>
>   at org.apache.uima.cas.impl.FSClassRegistry.reportErrors(FSClassRegistry.java:870) ~[classes/:?]
>   at org.apache.uima.cas.impl.FSClassRegistry.loadJCasForTSandClassLoader(FSClassRegistry.java:342) ~[classes/:?]
>   at org.apache.uima.cas.impl.FSClassRegistry.getGeneratorsForClassLoader(FSClassRegistry.java:904) ~[classes/:?]
>   at org.apache.uima.cas.impl.TypeSystemImpl.getGeneratorsForClassLoader(TypeSystemImpl.java:2651) ~[classes/:?]
>   at org.apache.uima.cas.impl.TypeSystemImpl.commit(TypeSystemImpl.java:1393) ~[classes/:?]
>   at org.apache.uima.cas.impl.CASImpl.commitTypeSystem(CASImpl.java:1607) ~[classes/:?]
>   at org.apache.uima.util.CasCreationUtils.doCreateCas(CasCreationUtils.java:614) ~[classes/:?]
>   at org.apache.uima.util.CasCreationUtils.createCas(CasCreationUtils.java:362) ~[classes/:?]
>   at org.apache.uima.util.CasCreationUtils.createCas(CasCreationUtils.java:313) ~[classes/:?]
>   at org.apache.uima.fit.factory.JCasFactory.createJCas(JCasFactory.java:147) ~[classes/:?]
>   at de.tudarmstadt.ukp.clarin.webanno.api.dao.AnnotationSchemaServiceImpl.upgradeCas(AnnotationSchemaServiceImpl.java:640) ~[classes/:?]

I have the feeling that this is what happens:

1) a CasCompleteSerialized-CAS is loaded - it was created at a time when the 
MorphologicalFeatures did not yet have a feature called "verbForm".
2) I create a new JCas, now using a type system description where 
MorphologicalFeatures includes the "verbForm" feature

At step 2, the above error seems to be triggered. I actually do not even get to 
the point where I would temporarily serialize into form 6 and back. The code 
already crashes when trying to set up the target task with the updated type 
system.
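
The suspected sequence in steps 1) and 2) can be mimicked with a toy model. This is not the FSClassRegistry logic, only a guess at the mechanism that is consistent with the error message: a cover class caches a feature offset once per class, so binding it first to a type system without "verbForm" (offset -1) and then to one that has the feature yields a mismatch. The class name `OffsetConflictDemo` and the feature lists are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, not UIMA code: offsets are cached once per class, not per type system.
final class OffsetConflictDemo {
    // Offset recorded the first time the cover class is bound to a type system.
    private static final Map<String, Integer> cachedOffsets = new HashMap<>();

    // Binds the cover class to a type system given as an ordered feature list.
    // Returns an error message on mismatch, or null if the binding is consistent.
    static String bind(String feature, List<String> typeSystemFeatures) {
        int offset = typeSystemFeatures.indexOf(feature); // -1 if the feature is absent
        Integer previous = cachedOffsets.putIfAbsent(feature, offset);
        if (previous != null && previous != offset) {
            return "feature " + feature + " was set up with offset " + previous
                    + " but now has offset " + offset;
        }
        return null;
    }
}
```

In this model, the first binding against the old feature list succeeds and records -1; the second binding against the extended list reports the conflict.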

Any ideas?

Cheers,

-- Richard

Jenkins build is back to normal : UIMA-SDK #992

2018-01-03 Thread Apache Jenkins Server



anyone know why java 8 now seems to be required by our jenkins maven builds?

2018-01-03 Thread Marshall Schor
Both the DUCC and UIMA-SDK builds failed when using Maven 3.3.9. Was that Maven
compiled with Java 8 (so that it no longer runs with Java 7), or is it something
else?

Workaround: configuring Jenkins to use Java 1.8 (latest) seems to work.

-Marshall



[jira] [Commented] (UIMA-5662) uv3 support CAS deserialization subsequent low level access

2018-01-03 Thread Marshall Schor (JIRA)

[ https://issues.apache.org/jira/browse/UIMA-5662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310069#comment-16310069 ]

Marshall Schor commented on UIMA-5662:
--

Current v2 design for Xmi deserialization creates the internal CAS structures 
in a different order from the original CAS, so the "addresses" do not match.  
However, the deserialization can return extra metadata, so that a subsequent 
reserialization will have the original Xmi ids.   For now, V3 will keep this 
same behavior.  (so no changes are needed to the XmiCasDeserializer code).

> uv3 support CAS deserialization subsequent low level access
> ---
>
>                 Key: UIMA-5662
>                 URL: https://issues.apache.org/jira/browse/UIMA-5662
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>    Affects Versions: 3.0.0SDK-beta
>            Reporter: Marshall Schor
>            Assignee: Marshall Schor
>            Priority: Minor
>             Fix For: 3.0.0SDK
>
>
> Some users depend on 1) constant v2 ids for FSs being preserved across 
> deserialization and serialization, and 2) low-level CAS API access to these.
> V3 normally doesn't maintain tables linking ids to FSs, as these (unless weak 
> refs are used) prevent GC of unreachable FSs.
> Based on a mode, set by -Duima.deserialize_perserve_ids, and also 
> controllable by a new config option per deserialize call, alter the 
> deserialization for those deserializers which know about v2 ids, to put these 
> into the map used for low-level CAS access, using the actual v2 ids, and 
> change the v3 next available id for future new FSs to be 1 beyond the end.
> The -Duima.deserialize-preserve_ids global setting is needed to handle the 
> use case of some annotators using low-level APIs, when part of a pipeline is 
> "remoted". 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (UIMA-5691) hex to byte conversion routine wrong for lower case hex

2018-01-03 Thread Marshall Schor (JIRA)

[ https://issues.apache.org/jira/browse/UIMA-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marshall Schor closed UIMA-5691.

Resolution: Fixed

> hex to byte conversion routine wrong for lower case hex
> ---
>
>                 Key: UIMA-5691
>                 URL: https://issues.apache.org/jira/browse/UIMA-5691
>             Project: UIMA
>          Issue Type: Bug
>          Components: Core Java Framework
>    Affects Versions: 3.0.0SDK-beta, 2.10.2SDK
>            Reporter: Marshall Schor
>            Assignee: Marshall Schor
>            Priority: Minor
>             Fix For: 3.0.0SDK, 2.10.3SDK
>
>
> bug in XmiDeserialization code in hex char to byte when converting lower-case 
> hex chars - using wrong lower bound char (should be 'a', but is using '1').  
> This bug is from 2008.  Since no one has noticed, it's probably true that 
> lower case hex representations are never being used in Xmi byte array 
> serializations.  But this should be fixed anyways.





Build failed in Jenkins: UIMA-SDK #991

2018-01-03 Thread Apache Jenkins Server

Changes:

[schor] [UIMA-5691] fix hex char to byte conversion for lower case hex char 
format

--
[...truncated 277.13 KB...]
A uimaj-json/src/test/java/org/apache/uima/json/JsonCasSerializerTest.java
A uimaj-json/src/test/java/org/apache/uima/json/JsonMetaDataObjectTest.java
A uimaj-json/src/test/java/org/apache/uima/test
A uimaj-json/src/test/java/org/apache/uima/test/AllTypes.java
A uimaj-json/src/test/java/org/apache/uima/test/AllTypes_Type.java
A uimaj-json/src/test/java/org/apache/uima/test/RefTypes.java
A uimaj-json/src/test/java/org/apache/uima/test/RefTypes_Type.java
A uimaj-json/src/test/resources
A uimaj-json/src/test/resources/CasSerialization
A uimaj-json/src/test/resources/CasSerialization/desc
A uimaj-json/src/test/resources/CasSerialization/desc/allTypes.xml
A uimaj-json/src/test/resources/CasSerialization/desc/nameSpaceNeeded.xml
A uimaj-json/src/test/resources/CasSerialization/desc/refTypes.xml
A uimaj-json/src/test/resources/CasSerialization/expected
A uimaj-json/src/test/resources/CasSerialization/expected/json
A uimaj-json/src/test/resources/CasSerialization/expected/json/nameSpaceCollisionOmits.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/topAndTokenOnlyNoSubtypes.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/emptyCAS.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/array-self-items-all-embeddable-l.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/twoListMerge.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/array-non-embeddable-a.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/indexedSingleListStatic.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/array-self-non-embeddable-l.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/topAndTokenOnly.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/array-a1-not-a.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/array-a2-not-a.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/nameSpaceCollision2.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/topAndTokenOnlyNoContext.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/array-a3-not-a.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/twoListMergeStatic.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/array-non-embeddable-l.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/allValuesNoOmits.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/topNoContext.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/array-all-embeddable-a.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/array-a1-not-l.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/array-a2-not-l.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/nameSpaceCollision2Omits.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/array-a3-not-l.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/multipleViews.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/allValuesStaticNoOmits.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/topExpandedNamesNoViews.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/topWithNamedViewOmits.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/nameSpaceNoCollsionFiltered.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/nameSpaceCollsionFiltered.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/array-all-embeddable-l.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/topAndTokenOnlyNoExpandedTypeNames.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/nameSpaceCollision2pp.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/topNoExpandedTypeNames.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/indexedAndRef.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/allValuesOmits.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/nameSpaceCollision2ppOmits.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/nameSpaceCollision.txt
A uimaj-json/src/test/resources/CasSerialization/expected/json/array-self-items-all-embeddable-a.txt
A 

Jenkins build is back to normal : UIMA-v3-sdk #333

2018-01-03 Thread Apache Jenkins Server




[jira] [Commented] (UIMA-5542) UIMA-DUCC: upgrade JNA or correct LICENSE

2018-01-03 Thread Jerry Cwiklik (JIRA)

[ https://issues.apache.org/jira/browse/UIMA-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309765#comment-16309765 ]

Jerry Cwiklik commented on UIMA-5542:
-

To use OS-based login for WS while running with IBM Java, the minimum JDK 
version is Java 8 SR4 FP5 (8.0.4.5)

> UIMA-DUCC: upgrade JNA or correct LICENSE
> -
>
>                 Key: UIMA-5542
>                 URL: https://issues.apache.org/jira/browse/UIMA-5542
>             Project: UIMA
>          Issue Type: Bug
>          Components: DUCC
>            Reporter: Jerry Cwiklik
>            Assignee: Jerry Cwiklik
>             Fix For: 2.2.2-Ducc
>
>
> DUCC License says we depend on jna-4.2.2 which was true at some point. The 
> ibm jvm had some issues with that causing a hang. The parent pom dependency 
> was changed to jna 4.0.0 but LICENSE file had not been updated. Apparently 
> ibm jvm was fixed (jdk8?) and is now happy with a newer jna. If so we can 
> upgrade jna for the next release.





[jira] [Created] (UIMA-5691) hex to byte conversion routine wrong for lower case hex

2018-01-03 Thread Marshall Schor (JIRA)
Marshall Schor created UIMA-5691:


 Summary: hex to byte conversion routine wrong for lower case hex
 Key: UIMA-5691
 URL: https://issues.apache.org/jira/browse/UIMA-5691
 Project: UIMA
  Issue Type: Bug
  Components: Core Java Framework
Affects Versions: 2.10.2SDK, 3.0.0SDK-beta
Reporter: Marshall Schor
Assignee: Marshall Schor
Priority: Minor
 Fix For: 3.0.0SDK, 2.10.3SDK


bug in XmiDeserialization code in hex char to byte when converting lower-case 
hex chars - using wrong lower bound char (should be 'a', but is using '1').  
This bug is from 2008.  Since no one has noticed, it's probably true that lower 
case hex representations are never being used in Xmi byte array serializations. 
 But this should be fixed anyways.
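
For illustration, a hex-character conversion that handles both cases might look like the sketch below. It is not the actual XmiDeserialization code; the class and method names are hypothetical. The point is only that the lower-case range starts at 'a'.

```java
// Sketch of a hex-character-to-value conversion; not the actual UIMA code.
final class HexNibble {
    // Converts one hex character (0-9, a-f, A-F) to its value 0..15.
    static int toNibble(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10; // lower bound is 'a'
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        throw new IllegalArgumentException("not a hex digit: " + c);
    }

    // Combines two hex characters (high nibble first) into one byte.
    static byte toByte(char high, char low) {
        return (byte) ((toNibble(high) << 4) | toNibble(low));
    }
}
```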





[jira] [Commented] (UIMA-5662) uv3 support CAS deserialization subsequent low level access

2018-01-03 Thread Marshall Schor (JIRA)

[ https://issues.apache.org/jira/browse/UIMA-5662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309678#comment-16309678 ]

Marshall Schor commented on UIMA-5662:
--

I'm trying to support XCAS and Xmi in this new mode, as well.  

For Xmi, the serialized form may contain sequences of UIMA Lists, encoded as 
just the item values; this serialization doesn't have any fsId information for 
these.  (Note: some list elements may be multiply referenced; these will have 
fsIds).  For the missing fsId case, I'm thinking of assigning fsIds to these, 
following the deserialization.  

XCAS should be OK - all Feature Structures (I believe) have id's in the 
serialized format.
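
The idea of keeping the serialized ids and numbering the remaining elements afterwards can be sketched with a toy model. This is not UIMA code; the class `IdAssigner` and the representation of feature structures as list positions are assumptions for illustration. Items that arrive without an id receive fresh ids starting 1 beyond the largest serialized id.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model, not UIMA code: keep serialized ids, then number the rest.
final class IdAssigner {
    // Each entry is the id from the serialized form, or null when none was serialized.
    // Returns a map from assigned id to the position of the item in the input list.
    static Map<Integer, Integer> assign(List<Integer> serializedIds) {
        Map<Integer, Integer> idToIndex = new LinkedHashMap<>();
        int maxId = 0;
        // First pass: keep the ids that came with the serialized form.
        for (int i = 0; i < serializedIds.size(); i++) {
            Integer id = serializedIds.get(i);
            if (id != null) {
                idToIndex.put(id, i);
                maxId = Math.max(maxId, id);
            }
        }
        // Second pass: items without an id get fresh ids, 1 beyond the largest one.
        int nextId = maxId + 1;
        for (int i = 0; i < serializedIds.size(); i++) {
            if (serializedIds.get(i) == null) {
                idToIndex.put(nextId++, i);
            }
        }
        return idToIndex;
    }
}
```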



