Hi Zichuan,

Please see:

http://oodt.apache.org/components/maven/crawler/user/


And also see the UpdateWorkflowStatusToIngest crawler action.

Cheers,
Chris


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++






-----Original Message-----
From: Zichuan Wang <zichu...@usc.edu>
Date: Tuesday, October 28, 2014 at 9:37 PM
To: Luke <shuai...@usc.edu>
Cc: Chris Mattmann <chris.a.mattm...@jpl.nasa.gov>, "dev@oodt.apache.org"
<dev@oodt.apache.org>, Chris Mattmann <mattm...@usc.edu>,
"zhouj...@usc.edu" <zhouj...@usc.edu>, "xiaoy...@usc.edu"
<xiaoy...@usc.edu>
Subject: Re: Question about OODT file manager

>Dear Professor, 
>
>
>We are stuck with OODT. Our most critical problem right now is:
>
>
>“How do we make the crawler work with the workflow?”
>
>
>-- 
>Zichuan Wang
>University of Southern California, Department of Computer Science
>
>
>
>On Tuesday, October 28, 2014, at 12:52 PM, Luke wrote:
>
>Dear Professor Mattmann,
>Thank you very much for your kind help; it is appreciated. Sorry for only
>now getting back to you. I have been running tests with OODT based on
>your advice, but unfortunately I have run into another problem.
>
>
>I am following the steps at
>https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example
>to get a sense of how to get a workflow running.
>The problem is that the File-Concatenator-PGE (started from the wmgr-client
>command line) does not seem to be invoked or executed. Instead, tasks pile
>up in the workflow manager with status "RSUBMIT" or "QUEUED" and never
>run; please find attached workflow_monitor.jpg. Note that the default
>workflow min pool size is 6, which leads to a second problem: I have 6
>submitted tasks with status "RSUBMIT", and every new incoming task is
>forwarded to the waiting queue with status "QUEUED". See
>workflow_monitor.jpg for details, where I have 3 QUEUED workflow tasks
>and 6 RSUBMIT tasks.
>
>
>
>Question 1): I am not sure why the workflow is not being executed and
>hangs in the "RSUBMIT" state. After enabling more verbose logging, I see
>the following entry in the log. I am not sure whether it has anything to
>do with the workflow hanging in "RSUBMIT":
>Oct 28, 2014 3:35:07 AM
>org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread
>safeCheckJobComplete
>WARNING: Exception checking completion status for job:
>[2014-10-28T01:59:32.813-07:00]: Messsage: java.lang.Exception:
>java.lang.NullPointerException
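One way to narrow this down is to pull every WARNING or SEVERE record out of the workflow manager log. The sketch below is a minimal Python example, not part of OODT. It assumes the log uses the two-line java.util.logging layout seen in the excerpt above (a timestamp/class/method line followed by a "LEVEL: message" line) with English month names; the excerpt is unwrapped into `SAMPLE`, and the second record in `SAMPLE` is a placeholder added only for illustration.

```python
import re

# First line of a record: "Oct 28, 2014 3:35:07 AM some.package.Class method"
HEADER = re.compile(
    r"^(?P<time>[A-Z][a-z]{2} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) "
    r"(?P<source>\S+) (?P<method>\S+)$"
)
# Second line of a record: "LEVEL: message"
LEVEL = re.compile(
    r"^(?P<level>SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST): (?P<message>.*)$"
)

# The entry from the message above, unwrapped, plus one placeholder INFO record.
SAMPLE = (
    "Oct 28, 2014 3:35:07 AM "
    "org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread "
    "safeCheckJobComplete\n"
    "WARNING: Exception checking completion status for job: "
    "[2014-10-28T01:59:32.813-07:00]: Messsage: java.lang.Exception: "
    "java.lang.NullPointerException\n"
    "Oct 28, 2014 3:35:08 AM example.Placeholder run\n"
    "INFO: placeholder entry added for this sketch\n"
)


def find_problems(log_text, levels=("WARNING", "SEVERE")):
    """Return one dict per log record whose level is in `levels`."""
    problems = []
    header = None
    for line in log_text.splitlines():
        head_match = HEADER.match(line)
        if head_match:
            header = head_match.groupdict()
            continue
        level_match = LEVEL.match(line)
        if level_match and header and level_match["level"] in levels:
            problems.append({**header, **level_match.groupdict()})
        header = None  # any other line ends the current record
    return problems


if __name__ == "__main__":
    for problem in find_problems(SAMPLE):
        print(problem["time"], problem["source"], problem["method"])
        print("   ", problem["message"])
```

Grouping the results by `source` and `method` shows which component raises the exception most often, which is usually the place to start reading.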
>
>
>Question 2): I think that on my side any new workflow task I send with
>the following command is currently directed to the waiting queue because
>of the min pool size of 6 (I can increase this to a larger number,
>though):
>
>./wmgr-client --url http://localhost:9200 --operation --sendEvent
>--eventName fileconcatenator-pge --metaData --key RunID testNumber1
>If possible, I would like to know whether there is a way to purge the
>queue and get rid of the workflow tasks I have already sent that are in
>either the "RSUBMIT" or "QUEUED" state. Please kindly help.
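When sending several test events, it can help to build the command quoted above programmatically so that each run gets its own RunID. The following is a minimal Python sketch: the flags, URL, event name and metadata key are copied from that command, while the function name and the loop over run numbers are assumptions made for illustration. It only prints the commands; passing each list to `subprocess.run` would execute them.

```python
import shlex


def build_send_event_command(client, url, event_name, key, value):
    """Return the argument list for one sendEvent call.

    The flag order mirrors the wmgr-client command quoted in the message;
    exactly one metadata key/value pair is passed, as in that command.
    """
    return [
        client,
        "--url", url,
        "--operation", "--sendEvent",
        "--eventName", event_name,
        "--metaData", "--key", key, value,
    ]


if __name__ == "__main__":
    for run_number in range(1, 4):
        command = build_send_event_command(
            "./wmgr-client",
            "http://localhost:9200",
            "fileconcatenator-pge",
            "RunID",
            f"testNumber{run_number}",
        )
        # shlex.join renders the list as a shell-safe command line.
        print(shlex.join(command))
```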
>
>
>I am very sorry to trouble you with this. To be honest, I find OODT a bit
>challenging to grasp in a short time frame, probably because there is no
>"OODT in Action" book like the one for Solr, so what I am doing is trial
>and error mixed with guesswork. I do not want to guess blindly, so I would
>appreciate it if you could also shed some light on where I can get more
>logging information, or on other ways to troubleshoot. I think it would be
>worth tracking what happens when a workflow reaches the "RSUBMIT" status
>and how to get logging output specific to it.
>
>
>Again, your advice and kind help will be appreciated as usual.
>
>
>
>
>Thanks
>Luke
>
>
>
>-----Original Message-----
>From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
>Sent: October 26, 2014 22:18
>To: Luke; 'Zichuan Wang'
>Cc: 'Christian Alan Mattmann'; zhouj...@usc.edu;
>xiaoy...@usc.edu;
>dev@oodt.apache.org
>Subject: Re: re: Question about OODT file manager
>
>
>Hi Luke,
>
>
>Thanks and sorry it’s taken me a while to reply. Here are some details
>below:
>
>
>
>
>-----Original Message-----
>From: Luke <shuai...@usc.edu>
>Date: Sunday, October 26, 2014 at 6:19 PM
>To: Chris Mattmann <chris.a.mattm...@jpl.nasa.gov>, 'Zichuan Wang'
><zichu...@usc.edu>
>Cc: Chris Mattmann <mattm...@usc.edu>, "zhouj...@usc.edu"
><zhouj...@usc.edu>, "xiaoy...@usc.edu" <xiaoy...@usc.edu>,
>"dev@oodt.apache.org" <dev@oodt.apache.org>
>Subject: RE: re: Question about OODT file manager
>
>
>
>Hi Professor Mattmann and OODT DEV,
>
>
>Sorry to trouble you with this email. Our team has been struggling to get
>OODT to send JSON files to Solr.
>One of the remaining difficulties is getting an OODT workflow to call
>poster.py in etllib.
>
>
>
>
>
>Sorry that you’re having difficulty; let me try and help.
>
>
>
>
>
>I am not sure whether my understanding of the OODT requirement is
>correct; I hope you can kindly advise and help clear up our confusion.
>
>
>The goals I have in mind with OODT are as follows; please kindly
>confirm and clarify:
>
>
>1)
>Get the File-Manager up and running.
>
>
>
>
>
>Yep, hopefully as installed via OODT RADIX.
>
>
>
>2)
>Send all JSON files to the File Manager server with the wmgr-client
>command. (I believe we can do this with a bash or perhaps Python script
>that calls the command line sequentially, with each JSON file name as an
>argument?)
>
>
>
>
>
>Suggestion:
>
>
>1. Use the OODT crawler and file manager to crawl/index the JSON files
>(in-place data transfer).
>2. Take a look at CAS-PGE; it will help you write a workflow task that
>will wrap ETLlib and the poster command.
>3. Once you are confident with #2, whip up a script that pages through
>all of your indexed JSON files and then, for each one, submits a workflow
>event (you may need to look into aggregating them) that calls your
>CAS-PGE wrapped poster task from ETLlib.
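The aggregation mentioned in step 3 can be sketched in Python as follows. This is only an illustration under assumptions: `submit` stands for whatever actually sends one workflow event (for example a wrapper around the wmgr-client command line), the batch size is arbitrary, and the file paths in the demo are hypothetical.

```python
from itertools import islice


def chunked(items, size):
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def submit_in_batches(file_paths, submit, batch_size=100):
    """Call `submit` once per batch and return the number of events sent."""
    events = 0
    for batch in chunked(file_paths, batch_size):
        submit(batch)  # one workflow event covers the whole batch
        events += 1
    return events


if __name__ == "__main__":
    # Hypothetical paths standing in for the indexed JSON files.
    indexed = [f"/data/json/doc{n}.json" for n in range(5)]
    sent = []  # records what each "event" would have carried
    count = submit_in_batches(indexed, sent.append, batch_size=2)
    print(count, "events for", len(indexed), "files")
```

Sending one event per batch rather than one per file keeps the number of queued workflow instances small, which matters when the workflow manager pool is limited.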
>
>
>
>3)
>Once the JSON files are sent to and stored in the File Manager, we need to
>get the Workflow Manager up and running, and then we can create a workflow
>that sends those JSON files from the File Manager to Solr.
>
>
>
>
>
>See above.
>
>
>
>4)
>Create a workflow according to the
>Workflow2 User Guide
><https://cwiki.apache.org/confluence/display/OODT/Workflow2+User+Guide>.
>
>This is where the problem comes in.
>
>I am not sure how to create a workflow task that can call poster.py in
>the Python etllib. It looks like we need to create our own Java class
>that extends TaskInstance, an abstract Java class with one abstract
>method that has the following signature:
>
>
>
>
>protected abstract ResultsState performExecution(ControlMetadata crtlMetadata);
>However, that page leaves out where to find the corresponding libraries
>and where to put our implementation in the workflow manager. I am not
>sure whether we should use TaskInstance, but it seems the workflow needs
>an interface through which it can call the Python code, i.e. poster.py,
>and it looks like we need to implement TaskInstance::performExecution by
>adding the code that calls poster.py and returns the ResultsState.
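Whatever wraps the script (a custom task class, or CAS-PGE as suggested earlier in this thread), the core step is the same: start the script as a child process and turn its exit code into a result state. The Python sketch below shows only that idea. It is not OODT code; the "SUCCESS" and "FAILURE" strings are placeholders rather than actual ResultsState values, and the script arguments are left to the caller because the poster.py options are not given in this thread.

```python
import subprocess
import sys


def run_script_task(script_args, timeout_seconds=300):
    """Run a Python script as a child process and report a coarse state.

    `script_args` is the script path followed by its arguments.
    Returns a (state, detail) tuple; the state names are placeholders.
    """
    try:
        completed = subprocess.run(
            [sys.executable, *script_args],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return "FAILURE", "timed out"
    # Exit code 0 is treated as success; anything else as failure.
    state = "SUCCESS" if completed.returncode == 0 else "FAILURE"
    return state, completed.stderr.strip()


if __name__ == "__main__":
    # Stand-in one-liners instead of a real script, so the sketch runs anywhere.
    print(run_script_task(["-c", "print('posted')"]))
    print(run_script_task(["-c", "import sys; sys.exit(3)"]))
```

Capturing stderr alongside the state gives the caller something to log when the script fails, which helps when a task appears to hang or never completes.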
>
>
>
>
>It would be greatly appreciated if you could shed some light on this
>and advise how we can get a task instance to call poster.py. By the way,
>I am also not sure whether my understanding is correct; please kindly
>correct it where it is wrong. Your help will be appreciated as usual.
>
>
>
>
>
>
>Thanks
>Luke
>
>
>
>
>
>Thanks Luke, see above. Let me know if it helps.
>
>
>Cheers!
>
>
>Chris
>
>
>
>
>
>From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
>
>
>Sent: October 25, 2014 13:34
>To: Zichuan Wang
>Cc: Christian Alan Mattmann; Luke; zhouj...@usc.edu;
>xiaoy...@usc.edu
>Subject: Re: Re: Question about OODT file manager
>
>
>
>
>
>
>Please cc dev@oodt.apache.org <mailto:dev@oodt.apache.org>. I will reply
>in detail soon.
>
>
>Sent from my iPhone
>
>
>
>
>
>
>
>
>On Oct 25, 2014, at 1:26 PM, "Zichuan Wang" <zichu...@usc.edu> wrote:
>
>
>
>
>Dear Professor,
>
>
>
>
>
>
>Could you please also explain how I can crawl all JSON file names under a
>specific directory using CAS-PGE? I’ll work through this example:
>https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example
>It doesn’t mention anything about crawling, though; instead it sets the
>input file paths manually.
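For reference, collecting every JSON file name under a directory takes only a few lines of plain Python. This sketch does not use the OODT crawler or CAS-PGE; it only shows how such a list could be produced, for example to fill in the input file paths that the example sets by hand. The function name and the temporary demo directory are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def list_json_files(root):
    """Return every *.json file under `root` (recursively), sorted by path."""
    return sorted(p for p in Path(root).rglob("*.json") if p.is_file())


if __name__ == "__main__":
    # Build a small throwaway tree so the sketch runs without real data.
    with tempfile.TemporaryDirectory() as demo_root:
        nested = Path(demo_root) / "nested"
        nested.mkdir()
        (Path(demo_root) / "first.json").write_text("{}")
        (nested / "second.json").write_text("{}")
        (nested / "readme.txt").write_text("not JSON")
        for path in list_json_files(demo_root):
            print(path.relative_to(demo_root))
```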
>
>
>
>
>
>
>
>
>--
>
>
>Zichuan Wang
>
>
>University of Southern California, Department of Computer Science
>
>
>
>
>
>
>
>
>On Saturday, October 25, 2014, at 12:10 PM, Zichuan Wang wrote:
>
>
>Dear Professor,
>
>
>
>
>
>
>In the assignment 2 specification I noticed that you mentioned the OODT
>File Manager, but as I understand it, we are using the ETLlib poster, which
>talks directly to Solr. So how can we use the OODT File Manager in this
>assignment?
>
>
>
>
>
>
>--
>
>
>Zichuan Wang
>
>
>University of Southern California, Department of Computer Science
>
>
>
>
>
>
>
>
>
>Attachments:
>- workflow_monitor.jpg
>
>
>
>
>
