Hi Chris,
       On configuration, I have gotten rid of all the configuration files,
including pge-config.xml. All the required configurations are now set
programmatically. Settings such as the FileManagerServer URL are
configured in the airavata-server.properties file. I'll update the review
request with the modified details.
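
       As a rough illustration of reading such a setting (a sketch only:
the class name OodtIntegrationConfig and the property key
oodt.filemanager.url are assumptions made for this example, not names
taken from the patch under review), the FileManagerServer URL could be
read from a loaded properties file like this:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class OodtIntegrationConfig {
    // Hypothetical key; the real key in airavata-server.properties may differ.
    static final String FM_URL_KEY = "oodt.filemanager.url";

    private final String fileManagerUrl;

    private OodtIntegrationConfig(String fileManagerUrl) {
        this.fileManagerUrl = fileManagerUrl;
    }

    // Builds the configuration from already-loaded properties and fails
    // fast when the FileManagerServer URL is missing or blank.
    static OodtIntegrationConfig fromProperties(Properties props) {
        String url = props.getProperty(FM_URL_KEY);
        if (url == null || url.trim().isEmpty()) {
            throw new IllegalStateException("Missing property: " + FM_URL_KEY);
        }
        return new OodtIntegrationConfig(url.trim());
    }

    String getFileManagerUrl() {
        return fileManagerUrl;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the contents of airavata-server.properties.
        Properties props = new Properties();
        props.load(new StringReader(FM_URL_KEY + "=http://localhost:9000\n"));
        System.out.println(fromProperties(props).getFileManagerUrl());
    }
}
```

The remaining values (dynamic metadata, task configuration) would be set
in code in the same way, so no separate OODT configuration file is needed.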
       I am still not quite clear on how to retrieve the staged file path
properly. Currently I am using the getStagedFilePath method
in ApacheAiravataWorkFlowInstanceImpl to regenerate the staged file path.
While going through the OODT code, I saw a method in DataTransferer that
notifies the FileManagerServer once a transfer is completed, but
I couldn't find an equivalent for product retrieval.
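
       To make the current approach concrete, here is a minimal sketch
of regenerating the path from the working directory and the file name
returned by the file manager. The class and method below are
hypothetical and written only for this illustration; this is not the
actual getStagedFilePath code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class StagedFilePathSketch {
    // Joins the working directory with the product's file name.
    // Only the last path segment of the name is used, in case the
    // catalog returns a full reference rather than a bare file name.
    static Path stagedFilePath(String workingDir, String productFileName) {
        if (productFileName == null || productFileName.isEmpty()) {
            throw new IllegalArgumentException("product file name is required");
        }
        Path fileName = Paths.get(productFileName).getFileName();
        if (fileName == null) {
            throw new IllegalArgumentException("no file name in: " + productFileName);
        }
        return Paths.get(workingDir).resolve(fileName).normalize();
    }

    public static void main(String[] args) {
        // Example values only; the directory and file name are made up.
        System.out.println(stagedFilePath("/tmp/airavata-work", "granule.dat"));
    }
}
```

This relies on the staged file keeping its original name in the working
directory, which is exactly the assumption I would like to avoid.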
       As you suggested, I'll improve my workflow using Apache Tika. I'd
like to continue this as a parallel task. While modifying the staging
implementation based on community feedback, I am currently looking at
ingesting the output back into OODT.

Best Regards,
Sanjaya



On Wed, Jun 5, 2013 at 12:11 AM, Mattmann, Chris A (398J) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> Hi Sanjaya,
>
> I think starting out with /bin/ls would be good, maybe like a /bin/ls
> workflow, and then for each file returned, maybe run Apache Tika and
> extract its metadata and then pipe that to a file?
>
> How about that?
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>
>
>
>
> -----Original Message-----
> From: Sanjaya Medonsa <sanjaya...@gmail.com>
> Reply-To: "d...@airavata.apache.org" <d...@airavata.apache.org>
> Date: Tuesday, June 4, 2013 5:31 AM
> To: "d...@airavata.apache.org" <d...@airavata.apache.org>
> Cc: "dev@oodt.apache.org" <dev@oodt.apache.org>
> Subject: Re: Apache Airavata-OODT Integration
>
> >Hi Chris,
> >     Please see my comments below on the two items.
> >
> >Configuration : It should be possible to set them programmatically.
> >Actually, I have partly implemented it for the file staging
> >information. I'll work to get rid of the other configuration files.
> >
> >Staged File Path : I'll work on the suggested approach, though I do not
> >fully understand it at the moment. I guess I need to go through CAS-PGE
> >a bit more and come back to you on the proposed approach.
> >
> >Currently I am testing this by wrapping the /bin/ls command as a GFac
> >service. I may need to test this with a real workflow. Could you please
> >give me some guidance on a better scenario to test this?
> >
> >Cheers,
> >Sanjaya
> >
> >
> >
> >
> >On Mon, Jun 3, 2013 at 8:17 PM, Mattmann, Chris A (398J) <
> >chris.a.mattm...@jpl.nasa.gov> wrote:
> >
> >> Hi Sanjaya,
> >>
> >> -----Original Message-----
> >>
> >> From: Sanjaya Medonsa <sanjaya...@gmail.com>
> >> Reply-To: "d...@airavata.apache.org" <d...@airavata.apache.org>
> >> Date: Thursday, May 30, 2013 5:12 AM
> >> To: "dev@oodt.apache.org" <dev@oodt.apache.org>,
> >>"d...@airavata.apache.org"
> >> <d...@airavata.apache.org>
> >> Subject: Apache Airavata-OODT Integration
> >>
> >> >Hi,
> >> >     I have worked on the Apache Airavata integration with Apache
> >> >OODT. As a first step, I have implemented integration with the
> >> >Apache OODT file manager component.
> >>
> >> Great work!!
> >>
> >> Comments below:
> >>
> >> >      1. Introduced a new GFac Schema type called OODTProduct which
> >> >takes Apache OODT product IDs as input.
> >> >      2. Implemented a new pre GFac Handler by extending Apache OODT
> >> >PgeTaskInstance to stage the corresponding file into the working
> >> >directory.
> >> >      3. Once the file is staged, the input parameter with the OODT
> >> >product ID is replaced with the path of the staged file for
> >> >downstream processing.
> >> >
> >> >I have tested the implementation with a GFac application which wraps
> >> >the /bin/ls command. The application takes a product ID as input,
> >> >stages the corresponding file into the working directory, and
> >> >executes /bin/ls against the staged file.
> >> >Hope this is a valid testing scenario.
> >> >
> >> >Concerns
> >> >- Configurations : I have added a new configuration file named
> >> >oodt-integration.properties in addition to the dynamic_metadata.met
> >> >and pge-config.xml files used by OODT. But at the moment no item is
> >> >configured in oodt-integration.properties.
> >>
> >> You probably only need the pge-config.xml file. Dynamic metadata and
> >> the task configuration properties can be specified programmatically,
> >> right?
> >>
> >> >- Staged File Name - With the current implementation of
> >> >PgeTaskInstance it is not possible to retrieve the path of the staged
> >> >file. Due to this limitation, I query the FileManagerServer with the
> >> >product ID, retrieve the file name, and compute the file path using
> >> >the working directory.
> >>
> >> I'm not sure I understand this? If you store and record the Filename
> >> and FileLocation metadata files, then you can easily retrieve the
> >> staged file path via a SQL query via CAS-PGE by simply setting the
> >> FORMAT=('$FileLocation/$Filename') in the response.
> >> Can you comment on this?
> >>
> >> >- Currently it is not possible to execute the workflow using XBaya
> >> >due to a validation failure caused by the new schema type. I have
> >> >commented out the relevant validation code for testing purposes.
> >>
> >> OK, will probably need to work on this.
> >>
> >> >
> >> >Currently I am having an issue with the Review Board client tool and
> >> >need to resolve it to upload the code for review.
> >>
> >> I see later that you got this working, so will head over and review that
> >> now.
> >>
> >> Thanks!
> >>
> >> Cheers,
> >> Chris
> >>
>
>
