Thank you for the suggestions. I will file a JIRA and add our discussion there.
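
For the record, the separate-process approach Jonathan suggested further down this thread could look roughly like the sketch below. It is only an illustration, not something taken from Pig: the class name AsyncPigLauncher, the launch method, and the log file argument are hypothetical, and it assumes Java 9 or later (for Process.onExit). It starts a command as a child process and returns a future that completes with the exit code, so the submitting thread never blocks.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical helper for illustration; not part of Pig.
public class AsyncPigLauncher {

    /**
     * Starts the given command as a child process and returns immediately.
     * The returned future completes with the process exit code, or
     * exceptionally if the process could not be started.
     */
    public static CompletableFuture<Integer> launch(List<String> command, File logFile) {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Send stdout and stderr to a file so the child never blocks on a full pipe.
        builder.redirectErrorStream(true);
        builder.redirectOutput(ProcessBuilder.Redirect.appendTo(logFile));
        try {
            Process process = builder.start();
            // onExit() completes when the child terminates; map it to the exit code.
            return process.onExit().thenApply(Process::exitValue);
        } catch (IOException e) {
            return CompletableFuture.failedFuture(e);
        }
    }
}
```

Assuming a pig launcher script is on the PATH and a script named myscript.pig exists (both assumptions), a caller could do launch(List.of("pig", "-f", "myscript.pig"), new File("pig.log")).thenAccept(code -> ...) to get a completion callback. The cost is one JVM per job, which is the trade-off discussed below.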

On Fri, Jan 25, 2013 at 4:23 PM, Rohini Palaniswamy <rohini.adi...@gmail.com> wrote:

> Jon,
> Those are good areas to check. Few things I have seen regarding those are
>
> 1) JythonScriptEngine - PythonInterpreter is static and is not suitable for
> multiple runs if the script names are same (hit this issue in PIG-2433 unit
> tests).
> 2) QueryParserDriver - There is a static cache with macro name to macro
> file mapping. So same macro names with different file locations will cause
> problems.
> 3) FileLocalizer.relativeRoot - If single cluster no issues. Just need to
> reinitialize if supporting Multiple clusters.
>
> Regards,
> Rohini
>
>
> On Fri, Jan 25, 2013 at 9:37 AM, Jonathan Coveney <jcove...@gmail.com> wrote:
>
> > user to bcc, +dev
> >
> > Cheolsoo,
> >
> > Can you make a JIRA for this? I can imagine a slightly heavier test suite,
> > but I like where you started. If it's not far off, then I think it'll be a
> > win to make it thread safe. But we need to make sure to test the most
> > advanced features...UDF's (esp the same name but different udf in different
> > invocations), scripting UDFs (same thing), and so on.
> >
> >
> > 2013/1/25 Cheolsoo Park <cheol...@cloudera.com>
> >
> > > >> if you have multiple threads that run a query via PigServer, there is a
> > > >> great chance of the internals clashing because of the use of static
> > > >> variable within Pig.
> > >
> > > Recently, I spent some time on this, and what I found is that the Pig
> > > front-end is quite thread-safe. Here is how I tested it:
> > >
> > > 1) Wrote a PigUnit test that runs in MR mode.
> > > 2) Executed test cases concurrently in 4 threads using a JUnit extension
> > > called temps-fugit:
> > > http://tempusfugitlibrary.org/documentation/junit/parallel/
> > >
> > > After fixing PIG-3096, I was able to successfully run Pig queries in
> > > parallel. It's important to note that only the front-end needs to be
> > > thread-safe since that's what is executed in parallel.
> > >
> > > I arbitrarily selected queries from e2e test cases, so they are probably
> > > not complex enough to mimic real-world examples. Nevertheless, my test
> > > program ran without a problem for few days. I couldn't continue my
> > > experiment because I was pulled out into something else. However, I think
> > > that making the front-end thread-safe is an achievable goal.
> > >
> > > Thanks,
> > > Cheolsoo
> > >
> > >
> > > On Thu, Jan 24, 2013 at 11:18 PM, Ramakrishna Nalam <nramakris...@gmail.com> wrote:
> > >
> > > > That clarifies it for me, thanks a lot.
> > > >
> > > > Regards,
> > > > Rama.
> > > >
> > > >
> > > > On Fri, Jan 25, 2013 at 10:09 AM, Jonathan Coveney <jcove...@gmail.com> wrote:
> > > >
> > > > > Well, when I say that Pig is not multi-threaded, what I mean is that if
> > > > > you have multiple threads that run a query via PigServer, there is a great
> > > > > chance of the internals clashing because of the use of static variables
> > > > > within Pig. Pig itself, when running a single query, is multi-threaded.
> > > > > It's just not "multi-threaded" in the sense that multiple instances can
> > > > > safely be run in the same JVM.
> > > > >
> > > > >
> > > > > 2013/1/24 Ramakrishna Nalam <nramakris...@gmail.com>
> > > > >
> > > > > > Hi Jonathan,
> > > > > >
> > > > > > Pardon if it's a naive question, but Interesting that you say Pig is not
> > > > > > multithreaded.
> > > > > > We're using Pig 0.10.0, and looking at the code, it seems to do the right
> > > > > > things to handle multi threaded requests (ThreadLocal for ScriptState for
> > > > > > eg).
> > > > > >
> > > > > > Would be great if you can point out to the kind of issues there could be.
> > > > > >
> > > > > >
> > > > > > Regards,
> > > > > > Rama.
> > > > > >
> > > > > >
> > > > > > On Thu, Jan 24, 2013 at 8:32 PM, Praveen M <lefthandma...@gmail.com> wrote:
> > > > > >
> > > > > > > Are there any plans on making the pigserver multi-threaded?
> > > > > > >
> > > > > > > since there is "PigProcessNotificationListener" to subscribe for async
> > > > > > > callbacks when the pig job completes, is there any real need to keep the
> > > > > > > pig job submitting thread waiting until the job completes?
> > > > > > >
> > > > > > > Is this just a shortcoming today or are there more concrete reasons against
> > > > > > > providing with a pigserver which can submit to the cluster in mapreduce
> > > > > > > mode async?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Praveen
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Jan 23, 2013 at 10:56 PM, Jonathan Coveney <jcove...@gmail.com> wrote:
> > > > > > >
> > > > > > > > I think whatever way you slice it, handling thousands of pig jobs
> > > > > > > > asynchronously is going to be a bear. I mean, this is essentially what the
> > > > > > > > job tracker does, albeit with a lot less information.
> > > > > > > >
> > > > > > > > Either way, Pig is not multi-threaded so having more than one instance of
> > > > > > > > Pig in the same JVM is going to start causing problems (which is why, I
> > > > > > > > imagine, there is no async way to call Pig). So multiple processes is
> > > > > > > > really the only way around it that I know of.
> > > > > > > >
> > > > > > > > At Twitter we have a deployment of mesos, and our long term solution is
> > > > > > > > going to be running all of our pig jobs on mesos, in the short term by
> > > > > > > > deploying daemons that run pig jobs as local processes.
> > > > > > > >
> > > > > > > >
> > > > > > > > 2013/1/23 Prashant Kommireddi <prash1...@gmail.com>
> > > > > > > >
> > > > > > > > > Both. Think of it as an app server handling all of these requests.
> > > > > > > > >
> > > > > > > > > Sent from my iPhone
> > > > > > > > >
> > > > > > > > > On Jan 23, 2013, at 9:09 PM, Jonathan Coveney <jcove...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Thousands of requests, or thousands of Pig jobs? Or both?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 2013/1/23 Prashant Kommireddi <prash1...@gmail.com>
> > > > > > > > > >
> > > > > > > > > >> Did not want to have several threads launched for this. We might have
> > > > > > > > > >> thousands of requests coming in, and the app is doing a lot more than
> > > > > > > > > >> only Pig.
> > > > > > > > > >>
> > > > > > > > > >> On Wed, Jan 23, 2013 at 5:44 PM, Jonathan Coveney <jcove...@gmail.com> wrote:
> > > > > > > > > >>
> > > > > > > > > >>> start a separate Process which runs Pig?
> > > > > > > > > >>>
> > > > > > > > > >>>
> > > > > > > > > >>> 2013/1/23 Prashant Kommireddi <prash1...@gmail.com>
> > > > > > > > > >>>
> > > > > > > > > >>>> Hey guys,
> > > > > > > > > >>>>
> > > > > > > > > >>>> I am trying to do the following:
> > > > > > > > > >>>>
> > > > > > > > > >>>> 1. Launch a pig job asynchronously via Java program
> > > > > > > > > >>>> 2. Get a notification once the job is complete (something similar to
> > > > > > > > > >>>> Hadoop callback with a servlet)
> > > > > > > > > >>>>
> > > > > > > > > >>>> I looked at PigServer.executeBatch() and it seems to be waiting until
> > > > > > > > > >>>> job completes. This is not what I would like my app to do.
> > > > > > > > > >>>>
> > > > > > > > > >>>> Any ideas?
> > > > > > > > > >>>>
> > > > > > > > > >>>> Thanks,
> > > > > > >
> > > > > > > --
> > > > > > > -Praveen