[ 
https://issues.apache.org/jira/browse/AIRAVATA-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Christie updated AIRAVATA-2741:
--------------------------------------
    Description: 
Just want to capture some details of recent conversations with [~eroma_a] and 
[~spamidig] on how to improve Airavata capabilities so we can move beyond using 
ARCHIVE.  The ARCHIVE capability is a bit of a hack and causes some issues for 
us. Briefly, the problems are:
* it pulls back absolutely every file, even though some aren't needed and some 
intermediate files are very large. For some applications it isn't even 
practical to use ARCHIVE
* it pulls back duplicates of Application Output files, further filling gateway 
data storage
* the archived files are basically opaque to Airavata, which limits what can be 
done with them programmatically

Here are some potential improvements:
* improve wildcard support: allow specifying a wildcard that can match one or 
more files. When multiple files match, they can all be registered as a single 
URI_COLLECTION type data output. (Side note: I'm not sure what the current 
wildcard support covers, need to investigate)
* show all of the job files in the portal, including ones that aren't defined 
as Application Outputs and haven't been staged back to the portal, and allow 
the user to request pulling back any of these other files. This would be 
useful because there will certainly be cases where a file is generated that 
wasn't anticipated (either because of a gap in configuration or because it 
truly couldn't be anticipated). This would mean registering every file in the 
job directory, not just the Application Outputs (not sure where, replica 
catalog?). It would also mean we need backend task execution support for 
fetching these files on demand.
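
A rough sketch of how the two ideas above might fit together, in Python purely 
for illustration (this is not Airavata code). Everything named here is 
hypothetical: OutputSpec, ResolvedOutput, resolve_output and unstaged_files are 
made-up names, the job file listing is assumed to already be available as 
relative paths, and using a plain URI type for a single match is an assumption. 
Only URI_COLLECTION comes from the discussion above.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass
class OutputSpec:
    """Hypothetical application output definition with a wildcard pattern."""
    name: str
    pattern: str


@dataclass
class ResolvedOutput:
    """Hypothetical result of matching one OutputSpec against the job files."""
    name: str
    data_type: str      # "URI" (assumed) or "URI_COLLECTION"
    paths: List[str]    # matched paths, relative to the job directory
    uris: List[str]     # the same files as full URIs


def resolve_output(spec: OutputSpec, job_files: List[str],
                   base_uri: str) -> Optional[ResolvedOutput]:
    """Match a wildcard against the job file listing.

    One match is treated as a single URI output, several matches as a
    URI_COLLECTION. Returns None when nothing matches.
    """
    # fnmatchcase is case-sensitive on every platform; note that "*" also
    # matches "/" here, so patterns can reach into subdirectories.
    paths = sorted(f for f in job_files if fnmatchcase(f, spec.pattern))
    if not paths:
        return None
    uris = [f"{base_uri.rstrip('/')}/{p}" for p in paths]
    data_type = "URI" if len(paths) == 1 else "URI_COLLECTION"
    return ResolvedOutput(spec.name, data_type, paths, uris)


def unstaged_files(job_files: List[str],
                   resolved: List[ResolvedOutput]) -> List[str]:
    """Job files not covered by any output: candidates to show in the
    portal and fetch only when the user asks for them."""
    staged = {p for r in resolved for p in r.paths}
    return sorted(f for f in job_files if f not in staged)
```

With a listing such as out.log, frame_001.dat, frame_002.dat and scratch.tmp, 
the pattern frame_*.dat would resolve to one URI_COLLECTION output with two 
entries, and scratch.tmp would be left in the unstaged list. Where that listing 
is stored (replica catalog or elsewhere) and how the on-demand fetch is 
executed are left open here, as they are above.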



> Ideas for better way to deal with arbitrary output files than ARCHIVE
> ---------------------------------------------------------------------
>
>                 Key: AIRAVATA-2741
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2741
>             Project: Airavata
>          Issue Type: Improvement
>            Reporter: Marcus Christie
>            Assignee: Marcus Christie
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
