Re: Apache Airavata-OODT Integration
Hi Chris,

I have started looking at changing the current implementation to use the file name instead of the product id. As per the current PGETask wrapper implementation, it takes one of two inputs: a product ID, or a file path at the remote location. If a file path is used, force staging should be set, but I am not quite sure what force staging means.

If I am to use the current provisions in PGETaskWrapper, then the remote file path (not the file name) has to be given as input. I am not quite sure whether it is ideal to use the file path instead of the file name.

If the file name is to be used as input, then FilesStager needs to be customized to retrieve product references from the file name. The File Manager client doesn't have a mechanism to retrieve a product by file name, but it does have a mechanism to retrieve a product by product name. I guess typically both are the same. One drawback of this approach is that it doesn't support a list of product names. The method getProductReferences, which returns a list of products, relies on a back-end implementation that is keyed on product id, though the actual input is a Product (a Product with just the product name set cannot be used as input).

Please let me know your thoughts.

Best Regards,
Sanjaya

On Mon, Jun 17, 2013 at 5:52 PM, Sanjaya Medonsa sanjaya...@gmail.com wrote:

Thanks Chris. I'll update the implementation to use the file name instead of the OODT product id.

Cheers,
Sanjaya

On Sun, Jun 16, 2013 at 12:51 AM, Mattmann, Chris A (398J) chris.a.mattm...@jpl.nasa.gov wrote:

Hey Sanjaya,

Sure, +1, use the Filename. It's not guaranteed to be unique, but you can easily just pop the first one off the top (latest) and take that (since it's sorted by product received time). You may check out the pcs-core module and some of its internal classes like FileManagerUtils to see some cool helper functions that could aid in this regard.

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++

-----Original Message-----
From: Sanjaya Medonsa sanjaya...@gmail.com
Reply-To: d...@airavata.apache.org d...@airavata.apache.org
Date: Saturday, June 15, 2013 4:04 AM
To: Airavata Dev d...@airavata.apache.org
Subject: Re: Apache Airavata-OODT Integration

Thanks Chris for your help!

The working directory is available in JobExecutionContext in Airavata and can easily be retrieved. The issue in my case is that, from the XBaya GUI, I take the product id as input, not the file name. Internally, the file stager queries the file manager using the product id to retrieve the product reference and the corresponding file name in order to stage the file into the input dir. Since this product id to file name mapping happens internally during file staging, my implementation doesn't have access to the file name unless I query the file manager to retrieve the corresponding file name using the product id. One of the major issues in my implementation seems to be that I use the OODT product id as input, not the file name. Should I change my implementation to use the file name instead of the product id?

Best Regards,
Sanjaya

On Fri, Jun 14, 2013 at 8:51 PM, Mattmann, Chris A (398J) chris.a.mattm...@jpl.nasa.gov wrote:

Hey Sanjaya,

Easy, see the attached PGEConfig.xml here: http://paste.apache.org/6OGW

In that file:

1. We compute the staged file path by computing JobDir
2. We create in the exe block a staged input dir
3. We stage the files just using cp commands in the exeBlock (could have just as easily used fileStager)
4. We know that the file is [JobInputDir]/[Filename]

HTH.

Cheers,
Chris

-----Original Message-----
From: Sanjaya Medonsa sanjaya...@gmail.com
Reply-To: d...@airavata.apache.org d...@airavata.apache.org
Date: Friday, June 14, 2013 5:02 AM
To: Airavata Dev d...@airavata.apache.org
Subject: Re: Apache Airavata-OODT Integration

Thanks Chris for your input. I actually use the PGETaskInstance for file staging with minimal additional code. But my issue
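
The [JobInputDir]/[Filename] convention from the PGEConfig.xml steps quoted in this message can be sketched as follows. This is only an illustration in plain Python, not OODT or Airavata code: the function name staged_file_path and the example paths are hypothetical, and it assumes the stager keeps the original file name when it copies the file.

```python
from pathlib import PurePosixPath


def staged_file_path(job_input_dir: str, filename: str) -> str:
    """Return [JobInputDir]/[Filename] for a staged input file.

    Hypothetical helper for illustration; not part of OODT or Airavata.
    """
    # Keep only the last path component, so a caller who passes a full
    # remote path still resolves to the copy inside the job input dir.
    name = PurePosixPath(filename).name
    if not name:
        raise ValueError("filename must not be empty")
    return str(PurePosixPath(job_input_dir) / name)


# Example call with made-up paths.
print(staged_file_path("/tmp/airavata/job-42/input", "data.dat"))
```

With a helper like this, the staged path can be derived from the working directory and the file name alone, without asking the stager what it did.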
Re: Apache Airavata-OODT Integration
Hi Sanjaya,

-----Original Message-----
From: Sanjaya Medonsa sanjaya...@gmail.com
Reply-To: d...@airavata.apache.org d...@airavata.apache.org
Date: Monday, July 8, 2013 12:09 AM
To: Airavata Dev d...@airavata.apache.org
Cc: dev@oodt.apache.org dev@oodt.apache.org
Subject: Re: Apache Airavata-OODT Integration

> Hi Chris, I have started looking at changing the current implementation to use the file name instead of the product id. As per the current PGETask wrapper implementation, it takes one of two inputs: a product ID, or a file path at the remote location. If a file path is used, force staging should be set, but I am not quite sure what force staging means.

Force staging, I believe, controls whether or not the staged files are overwritten.

> If I am to use the current provisions in PGETaskWrapper, then the remote file path (not the file name) has to be given as input. I am not quite sure whether it is ideal to use the file path instead of the file name.

You can easily generate the file path. It does not have to be remote; in fact, if you think about it, it could easily be local. In Apache OODT we typically ensure it's local by using distributed filesystems like HDFS, NFS, or Gluster to make remote files appear local, pushing that portion down into the distributed filesystem, which we think does a better job of data movement :). To generate the file path you can use the CAS-PGE SQLQuery facility, which lets you look up, e.g., $FileLocation/$Filename based on met fields, which in turn you can feed into the path.

> If the file name is to be used as input, then FilesStager needs to be customized to retrieve product references from the file name.

See above for an alternative.

> The File Manager client doesn't have a mechanism to retrieve a product by file name, but it does have a mechanism to retrieve a product by product name. I guess typically both are the same.

Yeah, or the other easy mechanism is simply to issue a query, e.g., build yourself a Filename query and then query the FM Catalog.

> One drawback of this approach is that it doesn't support a list of product names. The method getProductReferences, which returns a list of products, relies on a back-end implementation that is keyed on product id, though the actual input is a Product (a Product with just the product name set cannot be used as input). Please let me know your thoughts.

See above.

Cheers,
Chris

On Mon, Jun 17, 2013 at 5:52 PM, Sanjaya Medonsa sanjaya...@gmail.com wrote:

Thanks Chris. I'll update the implementation to use the file name instead of the OODT product id.

Cheers,
Sanjaya

On Sun, Jun 16, 2013 at 12:51 AM, Mattmann, Chris A (398J) chris.a.mattm...@jpl.nasa.gov wrote:

Hey Sanjaya,

Sure, +1, use the Filename. It's not guaranteed to be unique, but you can easily just pop the first one off the top (latest) and take that (since it's sorted by product received time). You may check out the pcs-core module and some of its internal classes like FileManagerUtils to see some cool helper functions that could aid in this regard.

Cheers,
Chris

-----Original Message-----
From: Sanjaya Medonsa sanjaya...@gmail.com
Reply-To: d...@airavata.apache.org d...@airavata.apache.org
Date: Saturday, June 15, 2013 4:04 AM
To: Airavata Dev d...@airavata.apache.org
Subject: Re: Apache Airavata-OODT Integration

Thanks Chris for your help!

The working directory is available in JobExecutionContext in Airavata and can easily be retrieved. The issue in my case is that, from the XBaya GUI, I take the product id as input, not the file name. Internally, the file stager queries the file manager using the product id to retrieve the product reference and the corresponding file name in order to stage the file into the input dir. Since this product id to file name mapping happens internally during file staging, my implementation doesn't have access to the file name unless I query the file manager to retrieve the corresponding file name using the product id. One
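
Two points from Chris's reply in this message, building the source path from the FileLocation and Filename met fields and force staging as an overwrite switch, can be sketched together. This is a plain-Python illustration under stated assumptions, not the CAS-PGE implementation: the functions source_path and stage are hypothetical, the met fields are modeled as a simple dict, and the overwrite behaviour follows Chris's "I believe" description rather than the OODT source.

```python
import shutil
from pathlib import Path


def source_path(met: dict) -> Path:
    """Join the FileLocation and Filename met fields into one path."""
    return Path(met["FileLocation"]) / met["Filename"]


def stage(met: dict, input_dir: str, force: bool = False) -> Path:
    """Copy the product file into input_dir.

    Assumption for this sketch: an already staged copy is left alone
    unless force is set, in which case it is overwritten.
    """
    dest = Path(input_dir) / met["Filename"]
    dest.parent.mkdir(parents=True, exist_ok=True)
    if dest.exists() and not force:
        return dest  # keep the copy that is already staged
    shutil.copy2(source_path(met), dest)
    return dest
```

Because the destination is always input_dir plus Filename, the caller knows the staged path before staging even runs.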
Re: Apache Airavata-OODT Integration
Thanks Chris. I'll update the implementation to use the file name instead of the OODT product id.

Cheers,
Sanjaya

On Sun, Jun 16, 2013 at 12:51 AM, Mattmann, Chris A (398J) chris.a.mattm...@jpl.nasa.gov wrote:

Hey Sanjaya,

Sure, +1, use the Filename. It's not guaranteed to be unique, but you can easily just pop the first one off the top (latest) and take that (since it's sorted by product received time). You may check out the pcs-core module and some of its internal classes like FileManagerUtils to see some cool helper functions that could aid in this regard.

Cheers,
Chris

-----Original Message-----
From: Sanjaya Medonsa sanjaya...@gmail.com
Reply-To: d...@airavata.apache.org d...@airavata.apache.org
Date: Saturday, June 15, 2013 4:04 AM
To: Airavata Dev d...@airavata.apache.org
Subject: Re: Apache Airavata-OODT Integration

Thanks Chris for your help!

The working directory is available in JobExecutionContext in Airavata and can easily be retrieved. The issue in my case is that, from the XBaya GUI, I take the product id as input, not the file name. Internally, the file stager queries the file manager using the product id to retrieve the product reference and the corresponding file name in order to stage the file into the input dir. Since this product id to file name mapping happens internally during file staging, my implementation doesn't have access to the file name unless I query the file manager to retrieve the corresponding file name using the product id. One of the major issues in my implementation seems to be that I use the OODT product id as input, not the file name. Should I change my implementation to use the file name instead of the product id?

Best Regards,
Sanjaya

On Fri, Jun 14, 2013 at 8:51 PM, Mattmann, Chris A (398J) chris.a.mattm...@jpl.nasa.gov wrote:

Hey Sanjaya,

Easy, see the attached PGEConfig.xml here: http://paste.apache.org/6OGW

In that file:

1. We compute the staged file path by computing JobDir
2. We create in the exe block a staged input dir
3. We stage the files just using cp commands in the exeBlock (could have just as easily used fileStager)
4. We know that the file is [JobInputDir]/[Filename]

HTH.

Cheers,
Chris

-----Original Message-----
From: Sanjaya Medonsa sanjaya...@gmail.com
Reply-To: d...@airavata.apache.org d...@airavata.apache.org
Date: Friday, June 14, 2013 5:02 AM
To: Airavata Dev d...@airavata.apache.org
Subject: Re: Apache Airavata-OODT Integration

Thanks Chris for your input. I actually use the PGETaskInstance for file staging with minimal additional code, but my issue is not with the file staging. In my current implementation, the application takes a product id as input. Then, using the capabilities of the PGETaskInstance class, it does the file staging. My issue is that during file staging the product is mapped to a file in the specified working directory. I don't have a way to retrieve the staged file name, as it is not recorded in Metadata (for this purpose, I query the FileManager again to get the corresponding reference name for a given product id). I need the staged file path, since I replace the input product id with the staged file path prior to the actual workflow invocation. Basically, I am looking for some implementation where I can easily retrieve the staged file path for a given product id.

Cheers,
Sanjaya

On Wed, Jun 12, 2013 at 10:04 PM, Mattmann, Chris A (398J) chris.a.mattm...@jpl.nasa.gov wrote:

Hi Sanjaya,

-----Original Message-----
From: Sanjaya Medonsa sanjaya...@gmail.com
Reply-To: d...@airavata.apache.org d...@airavata.apache.org
Date: Monday, June 10, 2013 5:20 PM
To: d...@airavata.apache.org d...@airavata.apache.org
Cc: dev@oodt.apache.org dev@oodt.apache.org
Subject: Re: Apache Airavata-OODT Integration

> Hi Chris, On configuration, I have got rid of all
Re: Apache Airavata-OODT Integration
Hey Sanjaya,

Sure, +1, use the Filename. It's not guaranteed to be unique, but you can easily just pop the first one off the top (latest) and take that (since it's sorted by product received time). You may check out the pcs-core module and some of its internal classes like FileManagerUtils to see some cool helper functions that could aid in this regard.

Cheers,
Chris

-----Original Message-----
From: Sanjaya Medonsa sanjaya...@gmail.com
Reply-To: d...@airavata.apache.org d...@airavata.apache.org
Date: Saturday, June 15, 2013 4:04 AM
To: Airavata Dev d...@airavata.apache.org
Subject: Re: Apache Airavata-OODT Integration

Thanks Chris for your help!

The working directory is available in JobExecutionContext in Airavata and can easily be retrieved. The issue in my case is that, from the XBaya GUI, I take the product id as input, not the file name. Internally, the file stager queries the file manager using the product id to retrieve the product reference and the corresponding file name in order to stage the file into the input dir. Since this product id to file name mapping happens internally during file staging, my implementation doesn't have access to the file name unless I query the file manager to retrieve the corresponding file name using the product id. One of the major issues in my implementation seems to be that I use the OODT product id as input, not the file name. Should I change my implementation to use the file name instead of the product id?

Best Regards,
Sanjaya

On Fri, Jun 14, 2013 at 8:51 PM, Mattmann, Chris A (398J) chris.a.mattm...@jpl.nasa.gov wrote:

Hey Sanjaya,

Easy, see the attached PGEConfig.xml here: http://paste.apache.org/6OGW

In that file:

1. We compute the staged file path by computing JobDir
2. We create in the exe block a staged input dir
3. We stage the files just using cp commands in the exeBlock (could have just as easily used fileStager)
4. We know that the file is [JobInputDir]/[Filename]

HTH.

Cheers,
Chris

-----Original Message-----
From: Sanjaya Medonsa sanjaya...@gmail.com
Reply-To: d...@airavata.apache.org d...@airavata.apache.org
Date: Friday, June 14, 2013 5:02 AM
To: Airavata Dev d...@airavata.apache.org
Subject: Re: Apache Airavata-OODT Integration

Thanks Chris for your input. I actually use the PGETaskInstance for file staging with minimal additional code, but my issue is not with the file staging. In my current implementation, the application takes a product id as input. Then, using the capabilities of the PGETaskInstance class, it does the file staging. My issue is that during file staging the product is mapped to a file in the specified working directory. I don't have a way to retrieve the staged file name, as it is not recorded in Metadata (for this purpose, I query the FileManager again to get the corresponding reference name for a given product id). I need the staged file path, since I replace the input product id with the staged file path prior to the actual workflow invocation. Basically, I am looking for some implementation where I can easily retrieve the staged file path for a given product id.

Cheers,
Sanjaya

On Wed, Jun 12, 2013 at 10:04 PM, Mattmann, Chris A (398J) chris.a.mattm...@jpl.nasa.gov wrote:

Hi Sanjaya,

-----Original Message-----
From: Sanjaya Medonsa sanjaya...@gmail.com
Reply-To: d...@airavata.apache.org d...@airavata.apache.org
Date: Monday, June 10, 2013 5:20 PM
To: d...@airavata.apache.org d...@airavata.apache.org
Cc: dev@oodt.apache.org dev@oodt.apache.org
Subject: Re: Apache Airavata-OODT Integration

> Hi Chris, On configuration, I have got rid of all the configuration files, including pge-config.xml. All the required configuration is set programmatically. Settings such as the FileManagerServer URL are configured in the airavata-server.properties file. I'll update the review request with the modified details.

Great work!

> Still I am not quite clear on how
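
Chris's advice in this message, that a Filename may match several products and the latest one should be taken, can be sketched as below. This is an illustration only: ProductRecord and latest_by_filename are hypothetical names, the records are plain in-memory objects rather than File Manager results, and the sketch picks the maximum received time explicitly instead of relying on the catalog returning results sorted by product received time.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass
class ProductRecord:
    """Hypothetical stand-in for a catalog entry; not an OODT class."""
    product_id: str
    filename: str
    received: datetime


def latest_by_filename(records: Iterable[ProductRecord],
                       filename: str) -> Optional[ProductRecord]:
    """Return the most recently received record with this file name."""
    matches = [r for r in records if r.filename == filename]
    if not matches:
        return None  # no product carries this file name
    # File names are not guaranteed unique, so choose the newest match.
    return max(matches, key=lambda r: r.received)
```

Returning None for an unknown file name lets the caller decide whether a missing product is an error or simply nothing to stage.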
Re: Apache Airavata-OODT Integration
Hi Sanjaya,

-----Original Message-----
From: Sanjaya Medonsa sanjaya...@gmail.com
Reply-To: d...@airavata.apache.org d...@airavata.apache.org
Date: Monday, June 10, 2013 5:20 PM
To: d...@airavata.apache.org d...@airavata.apache.org
Cc: dev@oodt.apache.org dev@oodt.apache.org
Subject: Re: Apache Airavata-OODT Integration

> Hi Chris, On configuration, I have got rid of all the configuration files, including pge-config.xml. All the required configuration is set programmatically. Settings such as the FileManagerServer URL are configured in the airavata-server.properties file. I'll update the review request with the modified details.

Great work!

> I am still not quite clear on how to retrieve the staged file path properly. Currently I am using the getStagedFilePath method in ApacheAiravataWorkFlowInstanceImpl to regenerate the staged file path. While going through the OODT code, I saw a method in DataTransferer that notifies the FileManagerServer once a transfer is completed, but I couldn't see the same for product retrieval.

Example: http://svn.apache.org/repos/asf/oodt/trunk/pge/src/test/resources/pge-config.xml

Review Board tickets:
https://reviews.apache.org/r/4746/
https://reviews.apache.org/r/5382/

JIRA issue source (in OODT since 0.4): https://issues.apache.org/jira/browse/OODT-443

> As you suggested, I'll improve my workflow using Apache Tika. I'd like to continue this as a parallel task. While modifying the staging implementation based on community feedback, I am currently looking at ingesting output back into OODT.

See above for info on file staging. I would strongly encourage you not to reimplement CAS-PGE in Airavata -- it's pretty functional and expressive anyways, and I would work to figure out how to make Airavata leverage CAS-PGE.

Cheers,
Chris

On Wed, Jun 5, 2013 at 12:11 AM, Mattmann, Chris A (398J) chris.a.mattm...@jpl.nasa.gov wrote:

Hi Sanjaya,

I think starting out with /bin/ls would be good, maybe like a /bin/ls workflow, and then for each file returned, maybe run Apache Tika and extract its metadata and then pipe that to a file? How about that?

Cheers,
Chris

-----Original Message-----
From: Sanjaya Medonsa sanjaya...@gmail.com
Reply-To: d...@airavata.apache.org d...@airavata.apache.org
Date: Tuesday, June 4, 2013 5:31 AM
To: d...@airavata.apache.org d...@airavata.apache.org
Cc: dev@oodt.apache.org dev@oodt.apache.org
Subject: Re: Apache Airavata-OODT Integration

Hi Chris,

Please see my comments below on the two items.

Configuration: It should be possible to set them programmatically. Actually, I have partly implemented this for the file staging information. I'll work to get rid of the other configuration files.

Staged File Path: I'll work on the suggested approach, though I do not fully understand it at the moment. I guess I need to go through CAS-PGE a bit more and come back to you on the proposed approach.

Currently I am testing this by wrapping the /bin/ls command as a GFac service. I may need to test this with a real workflow. Could you please give me some guidance on a better scenario to test this?

Cheers,
Sanjaya

On Mon, Jun 3, 2013 at 8:17 PM, Mattmann, Chris A (398J) chris.a.mattm...@jpl.nasa.gov wrote:

Hi Sanjaya,

-----Original Message-----
From: Sanjaya Medonsa sanjaya...@gmail.com
Reply-To: d...@airavata.apache.org d...@airavata.apache.org
Date: Thursday, May 30, 2013 5:12 AM
To: dev@oodt.apache.org dev@oodt.apache.org, d...@airavata.apache.org d...@airavata.apache.org
Subject: Apache Airavata-OODT Integration

> Hi, I have worked on the Apache Airavata integration with Apache OODT. As a first step, I have implemented integration with the Apache OODT file manager component.

Great work!! Comments below:

> 1. Introduced a new GFac Schema type called OODTProduct, which takes Apache OODT product IDs
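
The test scenario Chris proposes in this message (list files, extract metadata for each one, write the result to a file) can be sketched in plain Python. To stay self-contained, this sketch swaps in the standard-library mimetypes module and file size as a stand-in for Apache Tika, which would extract far richer metadata; the names describe and list_and_describe and the JSON-lines output format are assumptions made for the example, not anything the thread specifies.

```python
import json
import mimetypes
from pathlib import Path


def describe(path: Path) -> dict:
    """Collect minimal metadata for one file (stand-in for Apache Tika)."""
    mime, _ = mimetypes.guess_type(path.name)
    return {
        "name": path.name,
        "size": path.stat().st_size,
        "type": mime or "application/octet-stream",
    }


def list_and_describe(directory: str, out_file: str) -> int:
    """List files like /bin/ls, then write one JSON record per file."""
    # The listing is taken before the output file is opened, so the
    # output file never describes itself.
    files = sorted(p for p in Path(directory).iterdir() if p.is_file())
    with open(out_file, "w", encoding="utf-8") as out:
        for p in files:
            out.write(json.dumps(describe(p)) + "\n")
    return len(files)
```

In a workflow, the listing step and the metadata step would be separate nodes; they are combined here only to keep the sketch short.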
Re: Apache Airavata-OODT Integration
+5000 great idea, as usual my friend.

Cheers,
Chris

-----Original Message-----
From: Suresh Marru sma...@apache.org
Reply-To: d...@airavata.apache.org d...@airavata.apache.org
Date: Wednesday, June 12, 2013 9:51 AM
To: d...@airavata.apache.org d...@airavata.apache.org
Cc: dev@oodt.apache.org dev@oodt.apache.org
Subject: Re: Apache Airavata-OODT Integration

On Jun 12, 2013, at 12:34 PM, Mattmann, Chris A (398J) chris.a.mattm...@jpl.nasa.gov wrote:

> See above for info on file staging. I would strongly encourage you not to reimplement CAS-PGE in Airavata -- it's pretty functional and expressive anyways, and I would work to figure out how to make Airavata leverage CAS-PGE.

+1.

Sanjaya, Airavata and OODT communities,

Any volunteers to write a paper on "A tale of two Apache workflow systems: Airavata and OODT"? Given the page limit, and to keep it in scope, I suggest leaving out the use cases of the systems and focusing on the software architectures: a detailed technical paper comparing and contrasting the features and identifying potential collaborative components.

If you want a deadline, how about August 15th for the WORKS workshop - http://works.cs.cardiff.ac.uk/

Suresh
Re: Apache Airavata-OODT Integration
Hi Chris,

On configuration, I have got rid of all the configuration files, including pge-config.xml. All the required configuration is set programmatically. Settings such as the FileManagerServer URL are configured in the airavata-server.properties file. I'll update the review request with the modified details.

I am still not quite clear on how to retrieve the staged file path properly. Currently I am using the getStagedFilePath method in ApacheAiravataWorkFlowInstanceImpl to regenerate the staged file path. While going through the OODT code, I saw a method in DataTransferer that notifies the FileManagerServer once a transfer is completed, but I couldn't see the same for product retrieval.

As you suggested, I'll improve my workflow using Apache Tika. I'd like to continue this as a parallel task. While modifying the staging implementation based on community feedback, I am currently looking at ingesting output back into OODT.

Best Regards,
Sanjaya

On Wed, Jun 5, 2013 at 12:11 AM, Mattmann, Chris A (398J) chris.a.mattm...@jpl.nasa.gov wrote:

Hi Sanjaya,

I think starting out with /bin/ls would be good, maybe like a /bin/ls workflow, and then for each file returned, maybe run Apache Tika and extract its metadata and then pipe that to a file? How about that?

Cheers,
Chris

-----Original Message-----
From: Sanjaya Medonsa sanjaya...@gmail.com
Reply-To: d...@airavata.apache.org d...@airavata.apache.org
Date: Tuesday, June 4, 2013 5:31 AM
To: d...@airavata.apache.org d...@airavata.apache.org
Cc: dev@oodt.apache.org dev@oodt.apache.org
Subject: Re: Apache Airavata-OODT Integration

Hi Chris,

Please see my comments below on the two items.

Configuration: It should be possible to set them programmatically. Actually, I have partly implemented this for the file staging information. I'll work to get rid of the other configuration files.

Staged File Path: I'll work on the suggested approach, though I do not fully understand it at the moment. I guess I need to go through CAS-PGE a bit more and come back to you on the proposed approach.

Currently I am testing this by wrapping the /bin/ls command as a GFac service. I may need to test this with a real workflow. Could you please give me some guidance on a better scenario to test this?

Cheers,
Sanjaya

On Mon, Jun 3, 2013 at 8:17 PM, Mattmann, Chris A (398J) chris.a.mattm...@jpl.nasa.gov wrote:

Hi Sanjaya,

-----Original Message-----
From: Sanjaya Medonsa sanjaya...@gmail.com
Reply-To: d...@airavata.apache.org d...@airavata.apache.org
Date: Thursday, May 30, 2013 5:12 AM
To: dev@oodt.apache.org dev@oodt.apache.org, d...@airavata.apache.org d...@airavata.apache.org
Subject: Apache Airavata-OODT Integration

> Hi, I have worked on the Apache Airavata integration with Apache OODT. As a first step, I have implemented integration with the Apache OODT file manager component.

Great work!! Comments below:

> 1. Introduced a new GFac Schema type called OODTProduct, which takes Apache OODT product IDs as input.
> 2. Implemented a new pre GFac Handler by extending Apache OODT PgeTaskInstance to stage the corresponding file into the working directory.
> 3. Once the file is staged, the input parameter with the OODT product id is replaced with the path of the staged file for downstream processing.
>
> I have tested the implementation with a GFac application which wraps the /bin/ls command. The application takes a product id as input, stages the corresponding file into the working directory, and /bin/ls is executed against the staged file. I hope this is a valid testing scenario.
>
> Concerns
> - Configurations: I have added a new configuration file named oodt-integration.properties in addition to the dynamic_metadata.met and pge-config.xml files used by OODT. But at the moment there is no item configured in oodt-integration.properties.

You probably only need the pge-config.xml file. Dynamic metadata and the task configuration properties can be specified programmatically, right?

> - Staged File Name: With the current implementation of PgeTaskInstance it is not possible to retrieve the path of the staged file. Due to this limitation, I query the FileManagerServer with the product id, retrieve the file name, and compute the file path using the working directory information.

I'm not sure I understand this? If you store
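
Step 3 of the original proposal quoted in this message, replacing an OODTProduct input's product id with the staged file path before the application runs, can be sketched as follows. This is a plain-Python illustration, not the Airavata GFac handler API: the parameter dictionaries, the resolve_inputs function, and the lookup_filename callback (standing in for a File Manager query) are all hypothetical.

```python
from pathlib import PurePosixPath
from typing import Callable, Dict, List


def resolve_inputs(inputs: List[Dict[str, str]],
                   lookup_filename: Callable[[str], str],
                   working_dir: str) -> List[Dict[str, str]]:
    """Swap product ids for staged file paths on OODTProduct inputs.

    lookup_filename is a hypothetical stand-in for querying the
    File Manager for the file name that belongs to a product id.
    """
    resolved = []
    for param in inputs:
        if param["type"] == "OODTProduct":
            filename = lookup_filename(param["value"])
            staged = str(PurePosixPath(working_dir) / filename)
            # Copy the parameter so the caller's list is left untouched.
            resolved.append({**param, "value": staged})
        else:
            resolved.append(dict(param))
    return resolved
```

Passing the lookup in as a callback keeps the sketch independent of how the file name is actually obtained, whether by product id or by a Filename query.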
Re: Apache Airavata-OODT Integration
Hi Chris,

I have just realized that the Airavata GFac Handler was recently updated to support handler-specific configuration. I think I should move the settings in airavata-server.properties into gfac-config.xml as properties of the GFac handler that performs the OODT file staging.

Best Regards,
Sanjaya

On Tue, Jun 11, 2013 at 5:50 AM, Sanjaya Medonsa sanjaya...@gmail.com wrote:

Hi Chris,

On configuration, I have got rid of all the configuration files, including pge-config.xml. All the required configuration is set programmatically. Settings such as the FileManagerServer URL are configured in the airavata-server.properties file. I'll update the review request with the modified details.

I am still not quite clear on how to retrieve the staged file path properly. Currently I am using the getStagedFilePath method in ApacheAiravataWorkFlowInstanceImpl to regenerate the staged file path. While going through the OODT code, I saw a method in DataTransferer that notifies the FileManagerServer once a transfer is completed, but I couldn't see the same for product retrieval.

As you suggested, I'll improve my workflow using Apache Tika. I'd like to continue this as a parallel task. While modifying the staging implementation based on community feedback, I am currently looking at ingesting output back into OODT.

Best Regards,
Sanjaya

On Wed, Jun 5, 2013 at 12:11 AM, Mattmann, Chris A (398J) chris.a.mattm...@jpl.nasa.gov wrote:

Hi Sanjaya,

I think starting out with /bin/ls would be good, maybe like a /bin/ls workflow, and then for each file returned, maybe run Apache Tika and extract its metadata and then pipe that to a file? How about that?

Cheers,
Chris

-----Original Message-----
From: Sanjaya Medonsa sanjaya...@gmail.com
Reply-To: d...@airavata.apache.org d...@airavata.apache.org
Date: Tuesday, June 4, 2013 5:31 AM
To: d...@airavata.apache.org d...@airavata.apache.org
Cc: dev@oodt.apache.org dev@oodt.apache.org
Subject: Re: Apache Airavata-OODT Integration

Hi Chris,

Please see my comments below on the two items.

Configuration: It should be possible to set them programmatically. Actually, I have partly implemented this for the file staging information. I'll work to get rid of the other configuration files.

Staged File Path: I'll work on the suggested approach, though I do not fully understand it at the moment. I guess I need to go through CAS-PGE a bit more and come back to you on the proposed approach.

Currently I am testing this by wrapping the /bin/ls command as a GFac service. I may need to test this with a real workflow. Could you please give me some guidance on a better scenario to test this?

Cheers,
Sanjaya

On Mon, Jun 3, 2013 at 8:17 PM, Mattmann, Chris A (398J) chris.a.mattm...@jpl.nasa.gov wrote:

Hi Sanjaya,

-----Original Message-----
From: Sanjaya Medonsa sanjaya...@gmail.com
Reply-To: d...@airavata.apache.org d...@airavata.apache.org
Date: Thursday, May 30, 2013 5:12 AM
To: dev@oodt.apache.org dev@oodt.apache.org, d...@airavata.apache.org d...@airavata.apache.org
Subject: Apache Airavata-OODT Integration

> Hi, I have worked on the Apache Airavata integration with Apache OODT. As a first step, I have implemented integration with the Apache OODT file manager component.

Great work!! Comments below:

> 1. Introduced a new GFac Schema type called OODTProduct, which takes Apache OODT product IDs as input.
> 2. Implemented a new pre GFac Handler by extending Apache OODT PgeTaskInstance to stage the corresponding file into the working directory.
> 3. Once the file is staged, the input parameter with the OODT product id is replaced with the path of the staged file for downstream processing.
>
> I have tested the implementation with a GFac application which wraps the /bin/ls command. The application takes a product id as input, stages the corresponding file into the working directory, and /bin/ls is executed against the staged file. I hope this is a valid testing scenario.
>
> Concerns
> - Configurations: I have added a new configuration file named oodt-integration.properties in addition to the dynamic_metadata.met and pge-config.xml files used by OODT. But at the moment there is no item configured in oodt-integration.properties.

You probably only need the pge-config.xml file. Dynamic metadata, and the task