Thanks Richard.

This could probably work. We haven’t tried converting via XML yet, because we
had problems with some documents containing characters outside the allowed XML
1.0 range, but it should be possible now that XML 1.1 is supported by the XMI
serialiser. Of course, a corpus has to be sufficiently large before this extra
work and processing overhead pays off, but we have some that are.
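
To find out in advance which documents would need XML 1.1, the character ranges can be checked directly. This is only a minimal sketch in plain Java with no UIMA dependency; the class name XmlCharCheck and its methods are made up for illustration. The ranges follow the Char productions of the XML 1.0 and XML 1.1 specifications.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper, not part of UIMA: compares the XML 1.0 and 1.1 Char ranges. */
final class XmlCharCheck {

    /** True if the code point matches the Char production of XML 1.0. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** True if the code point matches the Char production of XML 1.1. */
    static boolean isXml11Char(int cp) {
        return (cp >= 0x1 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the code points that XML 1.0 rejects but XML 1.1 accepts. */
    static List<Integer> needsXml11(String text) {
        return text.codePoints()
                .filter(cp -> !isXml10Char(cp) && isXml11Char(cp))
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // A form feed (U+000C) and U+0001 are typical offenders in extracted text.
        String documentText = "form\ffeed and \u0001 control";
        System.out.println(needsXml11(documentText));
    }
}
```

An empty result means the text fits XML 1.0 as it is. U+0000 and unpaired surrogates are not reported, because neither XML version allows them.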

Cheers
Mario


> On 21 Oct 2020, at 22.34, Richard Eckart de Castilho <r...@apache.org> wrote:
>
> Hi Mario,
>
>> On 21. Oct 2020, at 21:26, Mario Juric <mario.ju...@cactusglobal.com> wrote:
>>
>> We never had problems migrating from one type system to another as long as
>> the types were either extended or something was deleted. The problem we had
>> was when an attribute changed type, e.g. a change from a simple FSArray to a
>> wrapper type with a custom Java object and an FSArray. We tried something
>> similar last year where a type A had an FSArray attribute with elements of
>> another type B that previously inherited from Annotation, and we changed
>> that to inherit from TOP instead, while all of the attributes that we had
>> declared on B remained unchanged. Not surprisingly, the deserialiser
>> couldn’t load the old CAS leniently with this change, and we never figured
>> out how to do a conversion, if that is possible at all, since A can only
>> take one form, i.e. we haven’t figured out how to have two versions of A
>> simultaneously in order to make a conversion. Maybe there are some
>> lower-level CAS possibilities that we are not aware of yet. The problem
>> should be the same when changing the type of an attribute from FSArray to a
>> wrapper type with custom Java objects.
>
> Ok, I think I get the picture now. I was imagining creating a new type that
> would replace the old one and basically copying the data over into the new
> structure. You are thinking of basically modifying a type "in-place".
>
> I think this is doable in the following way:
>
> 1) create a CAS "oldCas" with your existing type system
>
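> // CasFactory and TypeSystemDescriptionFactory come from uimaFIT
> // (package org.apache.uima.fit.factory)
> CAS oldCas = CasFactory.createCas(
>     TypeSystemDescriptionFactory.createTypeSystemDescriptionFromPath("old_typesystem.xml"));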
>
> 2) create a CAS "newCas" with your new type system
>
> CAS newCas = CasFactory.createCas(
>     TypeSystemDescriptionFactory.createTypeSystemDescriptionFromPath("new_typesystem.xml"));
>
> 3) implement a method taking two CASes and copying the data from one to the
>    other while massaging relevant feature structures according to the changes
>    in the type system
>
> void copyAndUpgradeCas(CAS oldCas, CAS newCas) {
>   // Recursively collect all accessible feature structures in oldCas.
>   // For each feature structure, create a copy in newCas:
>   //   - if its type changed, copy the data according to the changes;
>   //   - otherwise, copy it 1-to-1 (or at least the primitive values).
>   // Record which old FS was mapped to which new FS, so that FS references
>   //   can be connected in a second pass.
>   // In a second pass, copy/convert the FS references (i.e. non-primitive
>   //   features).
>   // Optionally repeat the process for other views in the CAS.
> }
>
> (Basically step 3 is in a sense CasCopier - just a custom one where you apply
> a data transformation instead of just copying the data.)
>
> Important for this to work is that you are using the CAS API and stay away 
> from the JCas API!
>
> If you had XMI data instead of binary CASes, I would have suggested that 
> DKPro Cassis might be a route to explore. With this library, you can load XMI 
> CAS objects into Python and Python objects are notoriously flexible and 
> malleable - much more so than CAS / JCas objects. I didn't dig into it, but I 
> could imagine that a CAS and type system loaded using DKPro Cassis could be 
> monkey-patched in-place into a new structure. I haven't tried using Cassis
> for this purpose, though, whereas I am quite confident that the Java-based
> approach I outlined above should be doable.
>
> Cheerio,
>
> -- Richard
>
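
The two-pass outline in step 3 of the quoted message can be illustrated without any UIMA dependency. This is only a sketch on a toy model: ToyFs, TwoPassCopy, and the type and feature names "A", "items", "Wrapper" and "elements" are hypothetical stand-ins, not UIMA classes or the actual type system. A real implementation would go through the CAS API (Type, Feature, FeatureStructure) instead.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a feature structure; NOT the UIMA FeatureStructure API. */
final class ToyFs {
    final String type;
    final Map<String, String> primitives = new LinkedHashMap<>();
    final Map<String, List<ToyFs>> references = new LinkedHashMap<>();

    ToyFs(String type) {
        this.type = type;
    }
}

final class TwoPassCopy {

    /** Pass 1: copy every reachable structure with its primitive values only. */
    static void copyPrimitives(ToyFs old, Map<ToyFs, ToyFs> oldToNew) {
        if (oldToNew.containsKey(old)) {
            return; // already visited, which also guards against cycles
        }
        ToyFs copy = new ToyFs(old.type);
        copy.primitives.putAll(old.primitives);
        oldToNew.put(old, copy);
        for (List<ToyFs> targets : old.references.values()) {
            for (ToyFs target : targets) {
                copyPrimitives(target, oldToNew);
            }
        }
    }

    /** Pass 2: reconnect references, converting the feature that changed. */
    static void copyReferences(Map<ToyFs, ToyFs> oldToNew) {
        for (Map.Entry<ToyFs, ToyFs> entry : oldToNew.entrySet()) {
            ToyFs old = entry.getKey();
            ToyFs copy = entry.getValue();
            for (Map.Entry<String, List<ToyFs>> ref : old.references.entrySet()) {
                List<ToyFs> mapped = new ArrayList<>();
                for (ToyFs target : ref.getValue()) {
                    mapped.add(oldToNew.get(target));
                }
                // Hypothetical change: A.items used to be a plain array and
                // now points to a single wrapper that holds the elements.
                if (old.type.equals("A") && ref.getKey().equals("items")) {
                    ToyFs wrapper = new ToyFs("Wrapper");
                    wrapper.references.put("elements", mapped);
                    mapped = new ArrayList<>();
                    mapped.add(wrapper);
                }
                copy.references.put(ref.getKey(), mapped);
            }
        }
    }

    /** Returns the old-to-new mapping, keyed by object identity. */
    static Map<ToyFs, ToyFs> copyAndUpgrade(List<ToyFs> roots) {
        Map<ToyFs, ToyFs> oldToNew = new IdentityHashMap<>();
        for (ToyFs root : roots) {
            copyPrimitives(root, oldToNew);
        }
        copyReferences(oldToNew);
        return oldToNew;
    }

    public static void main(String[] args) {
        ToyFs b = new ToyFs("B");
        b.primitives.put("label", "first");
        ToyFs a = new ToyFs("A");
        List<ToyFs> items = new ArrayList<>();
        items.add(b);
        a.references.put("items", items);

        List<ToyFs> roots = new ArrayList<>();
        roots.add(a);
        ToyFs newA = copyAndUpgrade(roots).get(a);
        System.out.println(newA.references.get("items").get(0).type);
    }
}
```

The identity-keyed map plays the role of the old-to-new record from the outline: because every copy exists before any reference is resolved, shared and cyclic references end up pointing at the same new object.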

