Jon,
  Those are good areas to check. A few things I have seen in those areas:

 1) JythonScriptEngine - The PythonInterpreter is static, so it is not suitable
for multiple runs if the script names are the same (I hit this issue in the
PIG-2433 unit tests).
 2) QueryParserDriver - There is a static cache mapping macro names to macro
files, so the same macro name with different file locations will cause
problems.
 3) FileLocalizer.relativeRoot - With a single cluster there are no issues. It
just needs to be reinitialized if multiple clusters are to be supported.
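
To make the failure mode in 2) concrete, here is a minimal sketch of how a
static name-to-file cache can hand a second run the first run's file. The class
MacroCacheSketch, its methods, and the paths are hypothetical stand-ins for
illustration only; this is not Pig's actual QueryParserDriver code, and the
first-writer-wins behavior is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a parser with a JVM-wide macro cache (not Pig code).
public class MacroCacheSketch {
    // Static, so it is shared by every run in this JVM.
    // A plain HashMap is also not safe for concurrent access; this sketch
    // only shows the stale-entry problem with sequential runs.
    private static final Map<String, String> MACRO_FILES = new HashMap<>();

    // Remembers the first file seen for a macro name and returns the cached file.
    static String resolve(String macroName, String filePath) {
        MACRO_FILES.putIfAbsent(macroName, filePath);
        return MACRO_FILES.get(macroName);
    }

    // One possible workaround: clear the cache between runs.
    static void reset() {
        MACRO_FILES.clear();
    }

    public static void main(String[] args) {
        // Run 1 defines my_macro in one file.
        System.out.println(resolve("my_macro", "/scripts/a/macros.pig"));
        // Run 2 uses the same macro name from a different file,
        // but still gets the location cached by run 1.
        System.out.println(resolve("my_macro", "/scripts/b/macros.pig"));
        // After clearing the cache, the new location is picked up.
        reset();
        System.out.println(resolve("my_macro", "/scripts/b/macros.pig"));
    }
}
```

Keying such a cache per run (or per PigServer instance) instead of per JVM
would avoid the reset, at the cost of re-reading macro files on every run.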

Regards,
Rohini


On Fri, Jan 25, 2013 at 9:37 AM, Jonathan Coveney <jcove...@gmail.com> wrote:

> user to bcc, +dev
>
> Cheolsoo,
>
> Can you make a JIRA for this? I can imagine a slightly heavier test suite,
> but I like where you started. If it's not far off, then I think it'll be a
> win to make it thread safe. But we need to make sure to test the most
> advanced features: UDFs (especially the same name but a different UDF in
> different invocations), scripting UDFs (same thing), and so on.
>
>
> 2013/1/25 Cheolsoo Park <cheol...@cloudera.com>
>
> > >> if you have multiple threads that run a query via PigServer, there is a
> > >> great chance of the internals clashing because of the use of static
> > >> variables within Pig.
> >
> > Recently, I spent some time on this, and what I found is that the Pig
> > front-end is quite thread-safe. Here is how I tested it:
> >
> > 1) Wrote a PigUnit test that runs in MR mode.
> > 2) Executed test cases concurrently in 4 threads using a JUnit extension
> > called tempus-fugit:
> > http://tempusfugitlibrary.org/documentation/junit/parallel/
> >
> > After fixing PIG-3096, I was able to successfully run Pig queries in
> > parallel. It's important to note that only the front-end needs to be
> > thread-safe since that's what is executed in parallel.
> >
> > I arbitrarily selected queries from e2e test cases, so they are probably
> > not complex enough to mimic real-world examples. Nevertheless, my test
> > program ran without a problem for a few days. I couldn't continue my
> > experiment because I was pulled onto something else. However, I think
> > that making the front-end thread-safe is an achievable goal.
> >
> > Thanks,
> > Cheolsoo
> >
> >
> >
> > On Thu, Jan 24, 2013 at 11:18 PM, Ramakrishna Nalam
> > <nramakris...@gmail.com>wrote:
> >
> > > That clarifies it for me, thanks a lot.
> > >
> > > Regards,
> > > Rama.
> > >
> > >
> > > > On Fri, Jan 25, 2013 at 10:09 AM, Jonathan Coveney <jcove...@gmail.com>
> > > > wrote:
> > >
> > > > > Well, when I say that Pig is not multi-threaded, what I mean is that if
> > > > > you have multiple threads that run a query via PigServer, there is a
> > > > > great chance of the internals clashing because of the use of static
> > > > > variables within Pig. Pig itself, when running a single query, is
> > > > > multi-threaded. It's just not "multi-threaded" in the sense that
> > > > > multiple instances can safely be run in the same JVM.
> > > >
> > > >
> > > > 2013/1/24 Ramakrishna Nalam <nramakris...@gmail.com>
> > > >
> > > > > Hi Jonathan,
> > > > >
> > > > > > Pardon me if it's a naive question, but it's interesting that you say
> > > > > > Pig is not multithreaded.
> > > > > > We're using Pig 0.10.0, and looking at the code, it seems to do the
> > > > > > right things to handle multi-threaded requests (ThreadLocal for
> > > > > > ScriptState, for example).
> > > > > >
> > > > > > It would be great if you could point out the kinds of issues there
> > > > > > could be.
> > > > >
> > > > >
> > > > > Regards,
> > > > > Rama.
> > > > >
> > > > >
> > > > >
> > > > > > On Thu, Jan 24, 2013 at 8:32 PM, Praveen M <lefthandma...@gmail.com>
> > > > > > wrote:
> > > > >
> > > > > > > Are there any plans to make PigServer multi-threaded?
> > > > > > >
> > > > > > > Since there is "PigProgressNotificationListener" to subscribe to
> > > > > > > async callbacks when the Pig job completes, is there any real need
> > > > > > > to keep the job-submitting thread waiting until the job completes?
> > > > > > >
> > > > > > > Is this just a shortcoming today, or are there more concrete reasons
> > > > > > > against providing a PigServer that can submit to the cluster in
> > > > > > > mapreduce mode asynchronously?
> > > > > >
> > > > > > Thanks,
> > > > > > Praveen
> > > > > >
> > > > > >
> > > > > >
> > > > > > > On Wed, Jan 23, 2013 at 10:56 PM, Jonathan Coveney <jcove...@gmail.com>
> > > > > > > wrote:
> > > > > >
> > > > > > > > I think whatever way you slice it, handling thousands of Pig jobs
> > > > > > > > asynchronously is going to be a bear. I mean, this is essentially
> > > > > > > > what the job tracker does, albeit with a lot less information.
> > > > > > > >
> > > > > > > > Either way, Pig is not multi-threaded, so having more than one
> > > > > > > > instance of Pig in the same JVM is going to start causing problems
> > > > > > > > (which is why, I imagine, there is no async way to call Pig). So
> > > > > > > > multiple processes is really the only way around it that I know of.
> > > > > > > >
> > > > > > > > At Twitter we have a deployment of Mesos, and our long-term
> > > > > > > > solution is going to be running all of our Pig jobs on Mesos, in
> > > > > > > > the short term by deploying daemons that run Pig jobs as local
> > > > > > > > processes.
> > > > > > >
> > > > > > >
> > > > > > > 2013/1/23 Prashant Kommireddi <prash1...@gmail.com>
> > > > > > >
> > > > > > > > Both. Think of it as an app server handling all of these
> > > requests.
> > > > > > > >
> > > > > > > > Sent from my iPhone
> > > > > > > >
> > > > > > > > > On Jan 23, 2013, at 9:09 PM, Jonathan Coveney <jcove...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thousands of requests, or thousands of Pig jobs? Or both?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 2013/1/23 Prashant Kommireddi <prash1...@gmail.com>
> > > > > > > > >
> > > > > > > > > >> Did not want to have several threads launched for this. We
> > > > > > > > > >> might have thousands of requests coming in, and the app is
> > > > > > > > > >> doing a lot more than only Pig.
> > > > > > > > >>
> > > > > > > > > >> On Wed, Jan 23, 2013 at 5:44 PM, Jonathan Coveney <jcove...@gmail.com>
> > > > > > > > > >> wrote:
> > > > > > > > >>
> > > > > > > > >>> start a separate Process which runs Pig?
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>> 2013/1/23 Prashant Kommireddi <prash1...@gmail.com>
> > > > > > > > >>>
> > > > > > > > >>>> Hey guys,
> > > > > > > > >>>>
> > > > > > > > >>>> I am trying to do the following:
> > > > > > > > >>>>
> > > > > > > > > >>>>   1. Launch a pig job asynchronously via Java program
> > > > > > > > > >>>>   2. Get a notification once the job is complete (something
> > > > > > > > > >>>>   similar to Hadoop callback with a servlet)
> > > > > > > > > >>>>
> > > > > > > > > >>>> I looked at PigServer.executeBatch() and it seems to be
> > > > > > > > > >>>> waiting until the job completes. This is not what I would
> > > > > > > > > >>>> like my app to do.
> > > > > > > > >>>>
> > > > > > > > >>>> Any ideas?
> > > > > > > > >>>>
> > > > > > > > >>>> Thanks,
> > > > > > > > >>>>
> > > > > > > > >>>
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > -Praveen
> > > > > >
> > > > >
> > > >
> > >
> >
>
